Statit Custom QC Statistics
Descriptive Statistics
Univariate Statistics
The univariate statistics procedure computes various univariate statistics: mean, median, variance, maximum, minimum, coefficient of variation, corrected sum of squares, geometric mean, standard error of the geometric mean, harmonic mean, standard error of the harmonic mean, interquartile range, interquartile range of the median, kurtosis, standard error of the median, midrange, number of missing cases, first quartile, third quartile, range, sample size, skewness, standard deviation, standard error of the mean, sum, sum of case weights, and number of valid cases. Percentiles may also be computed. The results may be displayed separately for each variable or in summary form for all variables. The results may also be saved for use in other calculations.
These statistics are also available as functions
in Statit expressions.
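To make a few of these definitions concrete, here is a minimal Python sketch using only the standard library. It is not Statit syntax or Statit's implementation; the function name `univariate_summary` is hypothetical, and it assumes sample (n - 1) variance and strictly positive data for the geometric and harmonic means.

```python
import math
import statistics


def univariate_summary(values):
    """Compute a handful of the univariate statistics listed above.

    Assumes sample (n - 1) variance/standard deviation and strictly
    positive values (required for the geometric and harmonic means).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "std_dev": sd,
        "std_err_mean": sd / math.sqrt(n),
        # Coefficient of variation as a ratio; some packages report a percentage
        "cov": sd / mean,
        # Corrected sum of squares: squared deviations about the mean
        "css": sum((v - mean) ** 2 for v in values),
        "geometric_mean": statistics.geometric_mean(values),
        "harmonic_mean": statistics.harmonic_mean(values),
        "range": max(values) - min(values),
        "midrange": (max(values) + min(values)) / 2,
    }


summary = univariate_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(summary["mean"], summary["median"], summary["css"])
```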
Frequency Distributions
The frequency distribution procedure computes
a frequency distribution for measurement variables.
Rather than computing counts for individual
values, this procedure computes counts for values
that fall into continuous intervals. The output
consists of: lower and upper endpoints of the
intervals, frequency counts, and relative and
cumulative percentages.
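The interval counting described above can be sketched in Python as follows. This is an illustration only, not Statit's procedure: the function name, the equal-width intervals, and the choice to close the last interval on the right are all assumptions.

```python
def frequency_distribution(values, lower, width, n_bins):
    """Return (lower, upper, count, percent, cumulative percent) per interval.

    Intervals are [lower + k*width, lower + (k+1)*width); the last one is
    closed on the right. Values outside the overall range are ignored.
    """
    counts = [0] * n_bins
    upper_limit = lower + n_bins * width
    for v in values:
        if lower <= v <= upper_limit:
            # Clamp so the right endpoint (and any rounding) lands in the last bin
            k = min(int((v - lower) // width), n_bins - 1)
            counts[k] += 1
    total = sum(counts)
    rows, cumulative = [], 0
    for k, c in enumerate(counts):
        cumulative += c
        rows.append((lower + k * width, lower + (k + 1) * width, c,
                     100 * c / total, 100 * cumulative / total))
    return rows


for row in frequency_distribution([1, 2, 2, 3, 5, 7, 8, 10], 0, 5, 2):
    print(row)
```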
Frequency Tables
The frequency table procedure produces one-way to n-way frequency and crosstabulation tables and multiple response tables.
Frequency tables show the distribution of a variable's values as the number of occurrences of each unique value. Crosstabulation tables show combined frequencies for two or more variables. The results of the crosstabulation may be saved for later use.
When the Statistics Module is licensed, the frequency table procedure also performs tests and computes measures of association. For n-way tables, it does stratified analysis, computing statistics within and across strata.
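A two-way crosstabulation is just a count of each combination of values. The following Python sketch (hypothetical function name, not Statit syntax) shows the idea with sorted row and column levels.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Two-way crosstabulation: count each (row value, column value) pair."""
    counts = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


rows, cols, table = crosstab(["F", "M", "F", "F", "M"], ["Jr", "Sr", "Sr", "Jr", "Jr"])
print(rows, cols, table)
```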
Multiway Univariate Statistics
The multiway univariate statistics procedure
provides a technique for examining various statistics
for dependent or analysis variables among various
groupings in a sample or population. The groupings
are determined by using categorical class variables;
e.g., group the dependent variable GPA by Sex
and Class.
The default statistics are: frequency count,
mean, standard deviation, and number of valid
cases. The following statistics may be computed:
C.O.V., maximum, mean, midrange, minimum, missing
cases, valid cases, range, standard deviation,
standard error, sum, sum of case weights, and
variance.
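The grouping idea (for example, GPA by Sex and Class) can be sketched in Python as below. This is an illustration, not Statit's procedure; the function name and the record layout (a list of dictionaries, with `None` marking a missing analysis value) are assumptions.

```python
import statistics
from collections import defaultdict


def grouped_stats(records, class_vars, analysis_var):
    """Frequency count, valid cases, mean, and standard deviation per group."""
    counts = defaultdict(int)
    values = defaultdict(list)
    for rec in records:
        key = tuple(rec[v] for v in class_vars)
        counts[key] += 1
        if rec.get(analysis_var) is not None:  # None is treated as missing
            values[key].append(rec[analysis_var])
    result = {}
    for key in sorted(counts):
        vals = values[key]
        result[key] = {
            "count": counts[key],
            "valid": len(vals),
            "mean": statistics.fmean(vals) if vals else float("nan"),
            "std_dev": statistics.stdev(vals) if len(vals) > 1 else float("nan"),
        }
    return result


data = [
    {"Sex": "F", "Class": "Jr", "GPA": 3.0},
    {"Sex": "F", "Class": "Jr", "GPA": 3.5},
    {"Sex": "F", "Class": "Jr", "GPA": None},
    {"Sex": "M", "Class": "Sr", "GPA": 2.5},
]
print(grouped_stats(data, ["Sex", "Class"], "GPA"))
```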
Tabular Reporting
The Statit tabular report procedure builds tables
of descriptive statistics from classification
variables and analysis variables. Tables are
constructed in up to three dimensions: stub,
banner, and page. The stub (row dimension) and
banner (column dimension) may have multiple
variables, nested or concatenated.
The body of the table is made up of cells, which contain the information in the table: frequency counts, percentages, means, or other statistics. The cells are defined by the values of the variable, or combination of variables, for the table. In a one-dimensional table, the cells are formed by rows; in a two-dimensional table, they are formed by the intersection of rows and columns; and in a three-dimensional table, they are formed by the intersection of rows, columns, and pages.
Statistics for each cell are calculated on values from all cases defined by that cell. That is, each value of a classification variable such as Academic_class (freshman, sophomore, etc.) defines a cell. When calculating statistics for an analysis variable such as GPA, statistics are calculated for the values of GPA that correspond to each academic class.
Graphics Capabilities
Statit provides procedures to graphically explore
the shapes, patterns, and relationships of your
data. Graphics are available for:
• pie
• bar
• histogram
• dot
• box
• probability
• percentile
• scatter or curve, contour, bubble, sunflower
• scatterplot matrix
Inferential Statistics
One And Two-Sample Inference
Statit provides procedures for testing and estimation in one- or two-sample problems. These cover both “continuous” responses and exact tests and other inferences for proportions.
For the one-sample case, a confidence interval for the population mean is provided, along with an optional test of a hypothesized mean.
For the two-sample case and the paired-data case, a test for equal population means is provided, along with confidence limits for the difference in means. Some diagnostics are provided, indicating when the procedures may not be appropriate.
In these situations, more robust procedures may be used: the Location procedure provides inference about either the population mean or median, and the Dispersion procedure provides inference about either the population standard deviation or the interquartile range (IQR), based on a single sample.
The Location and Dispersion procedures include diagnostics to indicate when methods for normally distributed data are not suitable, along with suggestions as to how to proceed in such cases. An approximation to the Shapiro-Wilk W test is used to test for normality.
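The one-sample t interval and test of a hypothesized mean follow from the sample mean and its standard error. Here is a minimal Python sketch, not Statit's implementation; the function name is hypothetical, and because the Python standard library has no t distribution, the caller supplies the critical value (about 2.262 for a two-sided 95% interval with 9 degrees of freedom).

```python
import math
import statistics


def one_sample_t(values, mu0, t_crit):
    """Return the t statistic for H0: mean = mu0 and a confidence interval.

    t_crit is the two-sided critical value for n - 1 degrees of freedom,
    supplied by the caller (e.g. from a t table).
    """
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    t_stat = (mean - mu0) / se
    half_width = t_crit * se
    return t_stat, (mean - half_width, mean + half_width)


sample = [10, 12, 9, 11, 13, 10, 12, 11, 9, 13]
print(one_sample_t(sample, mu0=10, t_crit=2.262))
```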
The following procedures are also available for two samples:
• Compare Location provides inferences comparing the population means, medians, or geometric means.
• Compare Dispersion provides inferences comparing either the population standard deviations or the interquartile ranges.
• Guided Compare provides “interactive” measures of location for two samples, with guidance.
Each of these procedures includes diagnostics to indicate when methods for normally distributed data are not suitable, and suggestions as to how to proceed in such cases. The following rank methods are included:
• Wilcoxon test for comparing two independent samples
• Sign and Signed Rank tests for paired data
• Median test for two independent samples
• Runs test
• Kolmogorov-Smirnov test for comparing two samples
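As an example of an exact rank-type method, the sign test for paired data counts positive differences and refers that count to a Binomial(n, 1/2) distribution. The Python sketch below is illustrative only (hypothetical function name, not Statit syntax) and assumes the common convention of dropping zero differences.

```python
from math import comb


def sign_test(x, y):
    """Exact two-sided sign test for paired samples.

    Zero differences are dropped. Returns (positives, n used, p-value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    n_pos = sum(d > 0 for d in diffs)
    k = min(n_pos, n - n_pos)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n, min(1.0, 2 * tail)


print(sign_test([5, 6, 7, 8, 9, 10, 11, 12], [1] * 8))
```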
Enumerative Data
Statit provides enumerative data procedures for:
• Binomial data, including both one- and two-sample applications and regression models. The binomial regression performs maximum likelihood fitting of regression models where the data are proportions following the binomial distribution, using logistic (logit) or probit models.
• Poisson regression for maximum likelihood fitting using a log-linear model.
• Contingency tables, including one-way to n-way frequency and crosstabulation tables and multiple response tables.
For n-way tables, Statit does stratified analysis, computing statistics within and across strata. The following statistics can be requested:
• Chi-square
• Likelihood Ratio Chi-square
• Mantel-Haenszel Chi-square
• Phi Coefficient
• Contingency Coefficient
• Cramer's V
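The Pearson chi-square and the three chi-square-based measures in this list can be computed from any two-way table of counts. The following Python sketch is an illustration under stated assumptions, not Statit's code: the function name is hypothetical, all expected counts are assumed positive, and phi is returned unsigned.

```python
import math


def chi_square_stats(table):
    """Pearson chi-square, df, phi, contingency coefficient, and Cramer's V."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            chi2 += (observed - expected) ** 2 / expected
    n_rows, n_cols = len(table), len(col_totals)
    return {
        "chi2": chi2,
        "df": (n_rows - 1) * (n_cols - 1),
        "phi": math.sqrt(chi2 / n),
        "C": math.sqrt(chi2 / (chi2 + n)),
        "V": math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1))),
    }


print(chi_square_stats([[10, 20], [20, 10]]))
```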
For 2 X 2 tables, the following are also computed:
• Continuity Adjusted Chi-square
• Fisher Exact Test (1-tail and 2-tail)
• McNemar's Test (with continuity adjustment)
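Fisher's exact test conditions on the table margins, so the top-left cell follows a hypergeometric distribution. The Python sketch below (hypothetical function name, not Statit's implementation) returns both one-tailed p-values and a two-tailed p-value, using the common convention of summing the probabilities of all tables no more probable than the observed one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    Returns (p_less, p_greater, p_two_sided).
    """
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    p_less = sum(prob(x) for x in range(lo, a + 1))
    p_greater = sum(prob(x) for x in range(a, hi + 1))
    # The small relative tolerance guards against floating-point ties
    p_two = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return p_less, p_greater, min(1.0, p_two)


print(fisher_exact_2x2(3, 1, 1, 3))
```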
For tests across strata, the Cochran-Mantel-Haenszel correlation statistic (df=1) may be computed for an n-way table. If all of the tables are 2 X 2, summary estimates of the relative risk are also computed.
The following measures of association and their asymptotic standard errors can be requested:
• Gamma
• Kendall’s Tau b
• Stuart’s Tau c
• Somers’ D
• Pearson's Correlation
• Lambda Asymmetric
• Uncertainty Coefficient
• Uncertainty Coefficient Symmetric
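Gamma, the first measure in this list, compares concordant and discordant pairs of observations in a table whose rows and columns are both ordered. This Python sketch (hypothetical function name) computes only the point estimate, not the asymptotic standard error, and is not Statit's implementation.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table with ordered rows and columns."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair cell (i, j) with every cell in a lower row
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


print(goodman_kruskal_gamma([[10, 5], [5, 10]]))
```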
For 2 X 2 tables, relative risk estimates plus confidence intervals are computed. Also, log-linear models may be fitted via:
• The Parameter Estimates procedure, which uses a Newton-Raphson method to find parameter estimates and standard errors for such models.
• The Fitted Values procedure, which uses iterative proportional fitting and does not give parameter estimates. It is mainly used to determine whether interactions are significant, and to fit models assuming specified higher-order interactions are absent.
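Iterative proportional fitting repeatedly rescales a table of fitted values until its margins match the observed margins. The Python sketch below fits the three-way model with all two-way interactions but no three-way interaction; it is an illustration, not Statit's Fitted Values procedure, the function name is hypothetical, and all two-way margins are assumed positive.

```python
from itertools import product


def ipf_no_three_way(obs, max_iter=1000, tol=1e-10):
    """Fit the log-linear model [AB][AC][BC] to an I x J x K table by IPF.

    obs[i][j][k] holds the counts. Returns fitted values keyed by (i, j, k).
    """
    dims = (len(obs), len(obs[0]), len(obs[0][0]))
    cells = list(product(*(range(d) for d in dims)))
    observed = {c: obs[c[0]][c[1]][c[2]] for c in cells}
    fit = {c: 1.0 for c in cells}  # start from a table with no associations
    margins = [(0, 1), (0, 2), (1, 2)]  # AB, AC, and BC margins to match
    for _ in range(max_iter):
        max_change = 0.0
        for axes in margins:
            target, current = {}, {}
            for c in cells:
                key = tuple(c[a] for a in axes)
                target[key] = target.get(key, 0.0) + observed[c]
                current[key] = current.get(key, 0.0) + fit[c]
            # Rescale every cell so this fitted margin equals the observed one
            for c in cells:
                key = tuple(c[a] for a in axes)
                new = fit[c] * target[key] / current[key]
                max_change = max(max_change, abs(new - fit[c]))
                fit[c] = new
        if max_change < tol:
            break
    return fit


counts = [[[10, 20], [30, 40]], [[15, 25], [35, 5]]]
print(ipf_no_three_way(counts))
```

Comparing these fitted values with the observed counts (for example, with a likelihood ratio chi-square) is one way to judge whether the omitted three-way interaction is needed.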
Analysis Of Variance
Statit provides several parametric and nonparametric procedures for analysis of variance.
• The one-way procedure includes the post-hoc tests: Fisher's LSD, Tukey's W, Newman-Keuls, Duncan's New Multiple Range, and Scheffe's S.
• N-way factorial designs with either balanced or unbalanced data, provided there are no empty cells.
• Repeated measures designs, such as split-plot and changeover designs, with either balanced or unbalanced cell sizes; missing cells are not supported.
• Analysis of covariance for a one-way treatment design and one numerical covariable.
• The General Linear Models procedure, which fits regression models with factors specified by matrices, each containing one or more columns of covariables, and supports both univariate and multivariate analysis.
• Kruskal-Wallis one-way rank ANOVA.
• Friedman ANOVA by ranks for randomized block designs, including Kendall’s coefficient of concordance.
• Cochran’s Q test for matched frequencies.
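The one-way F statistic partitions variability into between-group and within-group sums of squares. Here is a minimal Python sketch (hypothetical function name, not Statit's implementation); turning F into a p-value would need an F distribution, which the standard library does not provide.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    group_means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```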
Correlation Analysis
Statit provides both parametric and nonparametric procedures for correlation analysis. The Pearson product-moment and Spearman rank-order correlation coefficients are calculated. Options for calculating t-tests and computing with case weights are also provided. Correlation matrices may be saved and used as input to other procedures.
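The Spearman coefficient is the Pearson coefficient applied to ranks, with tied values given their average rank. The Python sketch below (hypothetical function names, not Statit syntax) shows both, plus the usual t statistic with n - 2 degrees of freedom for testing zero correlation; case weights are not handled.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def midranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the midranks."""
    return pearson(midranks(x), midranks(y))


def correlation_t(r, n):
    """t statistic (n - 2 df) for testing that the population correlation is 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


x, y = [1, 2, 3, 4, 5], [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```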
Regression Analysis
Statit provides procedures for simple, multiple,
stepwise, all possible subset, binomial, Poisson,
Weibull, and nonlinear regression.
Statit’s simple and multiple linear regression models use least squares or weighted least squares methods. Optional statistics and output for simple regression include:
• Beta covariance and correlation matrices, variance inflation factor, partial correlations, and semi-partial correlations
• Collinearity diagnostics
• Influence statistics: residual, standard error of residual, Studentized residual, Studentized residual with current observation deleted, Cook’s D influence statistic, leverage, Durbin-Watson statistic, sum of residuals, sum of squared residuals, PRESS statistic, and the minimum and maximum residual
• Predicted diagnostics: predicted value, standard error of the individual predicted value, standard error of the mean predicted value, and 95% confidence intervals for individual and mean predicted values
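Several of these influence statistics have closed forms for a straight-line fit. The Python sketch below (hypothetical function name, not Statit's implementation) computes the least squares line, residuals, leverage, Cook's D, the PRESS statistic, and the Durbin-Watson statistic for one predictor; it assumes unweighted data with at least three observations.

```python
def simple_regression_diagnostics(x, y):
    """Ordinary least squares fit of y on x with basic influence statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    p = 2  # number of parameters: intercept and slope
    sse = sum(e ** 2 for e in resid)
    mse = sse / (n - p)
    leverage = [1 / n + (a - mx) ** 2 / sxx for a in x]
    cooks_d = [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]
    # PRESS: sum of squared leave-one-out prediction errors e_i / (1 - h_i)
    press = sum((e / (1 - h)) ** 2 for e, h in zip(resid, leverage))
    durbin_watson = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / sse
    return {"slope": slope, "intercept": intercept, "resid": resid,
            "leverage": leverage, "cooks_d": cooks_d, "press": press,
            "durbin_watson": durbin_watson}


d = simple_regression_diagnostics([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(d["slope"], d["intercept"], d["durbin_watson"])
```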
Statit's stepwise multiple regression includes weighted least squares, using the forward selection, backward elimination, stepwise, or maximum R² method. Options include those for simple regression, plus Mallows' Cp.
Statit's graphical diagnostics for multiple regression include:
• Partial residual plots for detecting nonlinearity.
• Leverage plots for detecting observations that may have inordinate influence on the regression fit.
• Residual analysis, which plots either the fitted values or any one of the independent variables against any one of: Cook’s D, leverage values, predicted values, or various versions of the residuals (standardized, studentized, studentized based on deletion, etc.).
• Ridge trace analysis, which shows how regression coefficients change in “ridge regression” as the value of the “ridge parameter” is increased.
• Linear and Polynomial plots, which display the ordinary least squares fit of Y on X, X², X³, or X⁴ superimposed over a scatterplot of the data.
Statit’s binomial regression performs
maximum likelihood fitting of regression models
where the data are proportions, following the
binomial distribution, using logistic (logit)
or probit models.
Statit's Poisson regression performs maximum likelihood fitting of regression models where the response is a Poisson variable, using a log-linear model.
Statit's nonlinear regression fits models by least squares or weighted least squares using one of four methods: Gauss-Newton, modified Gauss-Newton, Marquardt, or DUD (doesn't use derivatives). Grid searches for initial estimates may be requested, and a loss function to be minimized may be specified.
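The plain Gauss-Newton method linearizes the model at the current estimates and solves the resulting least squares problem for a correction. The Python sketch below fits the example model y = a·exp(b·x); the model, starting values, and function name are assumptions chosen for illustration, and it omits the step-size control that modified Gauss-Newton or Marquardt would add.

```python
import math


def gauss_newton_exp(x, y, a, b, max_iter=50, tol=1e-12):
    """Least squares fit of y = a * exp(b * x) by plain Gauss-Newton steps."""
    for _ in range(max_iter):
        # Residuals and the two Jacobian columns (df/da, df/db)
        r = [yi - a * math.exp(b * xi) for xi, yi in zip(x, y)]
        ja = [math.exp(b * xi) for xi in x]
        jb = [a * xi * math.exp(b * xi) for xi in x]
        # Normal equations (J^T J) delta = J^T r, solved by Cramer's rule
        s_aa = sum(u * u for u in ja)
        s_ab = sum(u * v for u, v in zip(ja, jb))
        s_bb = sum(v * v for v in jb)
        g_a = sum(u * ri for u, ri in zip(ja, r))
        g_b = sum(v * ri for v, ri in zip(jb, r))
        det = s_aa * s_bb - s_ab ** 2
        da = (s_bb * g_a - s_ab * g_b) / det
        db = (s_aa * g_b - s_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * xi) for xi in xs]
print(gauss_newton_exp(xs, ys, a=1.8, b=0.45))
```

Poor starting values can make plain Gauss-Newton diverge, which is one reason grid searches for initial estimates are useful.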
Statit's all possible subsets regression is performed using one of four methods: maximizing R², maximizing adjusted R², minimizing mean square error, or minimizing Mallows’ Cp.
Multivariate Analysis
Statit provides a variety of multivariate analysis procedures:
• Multivariate analysis of variance, including repeated measures and profile analysis.
• Principal components analysis, which provides standardized or unstandardized principal component scores.
• Factor analysis, which provides five methods of factor extraction: principal components, iterated principal components, image, alpha factor analysis, and principal factor analysis. A scree plot and Bartlett's sphericity test are also available. There are three methods of orthogonal rotation: varimax, equamax, and quartimax.
• The promax oblique rotation is also available. Plots of all loadings and rotated loadings can be requested.
• Factor scores can be calculated and saved.
• Canonical correlation analysis and canonical redundancy analysis, whose output consists of eigenvalues, canonical correlations, variance ratio, chi-square statistic, and standardized canonical coefficients. Options are provided for calculating among- and between-group correlations, canonical loadings, cross loadings, Stewart and Love redundancy analysis, orthogonal rotation of the loadings, and plots of the loadings.
• Cluster analysis using either centroid linkage (with a Euclidean, chi-square, or phi-square distance measure) or K-means clustering with initial cluster estimation.
• Discriminant analysis, which can optionally save the Mahalanobis distances of each observation to each group mean, probabilities for the Mahalanobis distances, classifications, posterior probabilities, and the group means and within-groups covariance matrix.
Other types of discriminant analysis include:
• Stepwise addition of the predictor variables, which can optionally save the classifications, posterior probabilities, and the group means and within-groups covariance matrix.
• Quadratic discriminant analysis, in which the data are assumed to come from a population that has a multivariate normal distribution but the covariance matrices of the groups are not assumed to be equal; it can optionally save the classifications, posterior probabilities, and the group means.
• K-nearest-neighbor discriminant analysis, which is nonparametric and makes no assumption about the underlying distribution of the data.
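K-nearest-neighbor classification assigns a new observation to the group most common among its k closest training observations, which is why it needs no distributional assumption. The Python sketch below is illustrative only: the function name is hypothetical, it uses Euclidean distance, and it breaks vote ties by whichever label was counted first.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, point, k=3):
    """Classify point by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(p, point), label) for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, (0.5, 0.5)), knn_classify(points, labels, (5.5, 5.5)))
```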
Time Series Analysis
Statit’s time series analysis procedures include:
• Estimating the parameters of an ARIMA (Box-Jenkins) model and generating forecasts for seasonal and nonseasonal models.
• Analyzing vector autoregressive models. This is suitable for forecasting, where typically one of the coordinates of the time series is the variable of primary interest and the others are associated variables that might aid in the forecast.
• Computing and plotting the autocorrelation function.
• Computing and plotting seasonal or periodic averages to assist in identifying seasonal trends.
• Computing and plotting the cross-correlation function.
• Computing the lagged difference of a variable.
• Performing a Difference-Sign test of randomness.
• Computing and plotting the partial autocorrelation function (used to help identify the AR parameters for the ARIMA procedure).
• Computing polynomial distributed lag regression, also known as an Almon lag. A regression is performed on the dependent variable and its lags and, optionally, other exogenous variables.
• Performing a test of randomness based on the ranks of the data, for detecting trends in data.
• Performing one or more of: moving average, single or double exponential smoothing, Holt’s two-parameter smoothing, Winters’ three-parameter smoothing, and Classical Decomposition forecasting.
• Performing a test of randomness based on the number of turning points in the data.
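Two of these procedures have short textbook formulas: the sample autocorrelation at lag k, and the turning point test, which compares the number of local peaks and troughs with its expected value 2(n - 2)/3 and variance (16n - 29)/90 under randomness. The Python sketch below uses hypothetical function names and is not Statit's implementation; the z value uses the usual normal approximation.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


def turning_point_test(x):
    """Return (number of turning points, z statistic) for a test of randomness."""
    n = len(x)
    turns = sum(
        1 for i in range(1, n - 1)
        if x[i - 1] < x[i] > x[i + 1] or x[i - 1] > x[i] < x[i + 1]
    )
    expected = 2 * (n - 2) / 3
    variance = (16 * n - 29) / 90
    return turns, (turns - expected) / math.sqrt(variance)


print(autocorrelation([1, 2, 3, 4, 5], 1), turning_point_test([1, 3, 2, 4, 3]))
```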
Reliability And Survival Analysis
These procedures are for the analysis of response-time data, also called survival analysis. They include:
• Kaplan-Meier estimator of the survival curve from censored data.
• Cox regression, which relates response times to explanatory variables in a way that does not require specification of the distribution of the response times.
• Weibull analysis, which offers a one-sample procedure to fit a Weibull distribution to possibly censored response-time data, and a regression procedure for relating response times to explanatory variables (which could include treatments and thus be used for two-sample problems).
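The Kaplan-Meier estimator multiplies, at each distinct event time, the conditional probability of surviving past that time given survival up to it. The Python sketch below (hypothetical function name, not Statit's implementation) follows the common convention that observations censored at time t are still at risk at t.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if a response was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every observation (event or censored) tied at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


print(kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1]))
```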
Even though the assumptions are different, the formulations of models for Weibull and Cox regression have strong similarities. Either can be considered a "proportional hazards" model. For the Weibull case, the hazard function is assumed to have a simple parametric form; for Cox regression, this form need not be specified. Weibull methods will often be more useful in reliability work, and Cox regression in biostatistics.
