Statit Custom QC Statistics
The univariate statistics procedure computes
the following statistics: mean, median,
variance, maximum, minimum, coefficient of variation,
corrected sum of squares, geometric mean, standard
error of the geometric mean, harmonic mean,
standard error of the harmonic mean, interquartile
range, interquartile range of the median, kurtosis,
standard error of the median, midrange, number
of missing cases, first quartile, third quartile,
range, sample size, skewness, standard deviation,
standard error of the mean, sum, sum of case
weights, and number of valid cases. Percentiles
may also be computed. The results may be displayed
separately for each variable or in summary form
for all variables. The results may also be saved
for use in other calculations.
These statistics are also available as functions
in Statit expressions.
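As a rough illustration of several of the statistics listed above, the following Python sketch computes them with the standard library. It is not Statit syntax; the function name `describe` and the sample data are hypothetical, and the quartile method is Python's default, which may differ from the one Statit uses.

```python
import math
import statistics

def describe(values):
    """Return a few univariate statistics for a list of positive numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    # Quartiles by Python's default "exclusive" method (an assumption here).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "std_dev": sd,
        "std_err_mean": sd / math.sqrt(n),
        "coef_of_variation": sd / mean,
        "geometric_mean": statistics.geometric_mean(values),
        "harmonic_mean": statistics.harmonic_mean(values),
        "midrange": (min(values) + max(values)) / 2,
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

stats = describe([2.0, 4.0, 4.0, 8.0])  # hypothetical data
```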
The frequency distribution procedure computes
a frequency distribution for measurement variables.
Rather than computing counts for individual
values, this procedure computes counts for values
that fall into continuous intervals. The output
consists of: lower and upper endpoints of the
intervals, frequency counts, and relative and
The frequency table procedure produces 1-way
to n-way frequency and crosstabulation tables
and multiple response tables.
Frequency tables show the distribution of the
values of a variable with the number of occurrences
of each unique value of the variable. Crosstabulation
tables show combined frequencies for two or
more variables. The results of the crosstabulation
may be saved for later use.
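The difference between a one-way frequency table and a crosstabulation can be sketched in Python as follows. The data and variable names are hypothetical and this is not Statit syntax.

```python
from collections import Counter

# Hypothetical cases: (sex, academic_class) pairs.
cases = [
    ("F", "freshman"), ("M", "freshman"), ("F", "sophomore"),
    ("F", "freshman"), ("M", "sophomore"),
]

# One-way frequency table: occurrences of each unique value of one variable.
one_way = Counter(sex for sex, _ in cases)

# Two-way crosstabulation: combined frequencies for two variables.
two_way = Counter(cases)

# Relative frequencies for the one-way table.
relative = {value: count / len(cases) for value, count in one_way.items()}
```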
When the Statistics Module is licensed, the
frequency table procedure also performs tests
and computes measures of association. For n-way
tables, it does stratified analysis, computing
statistics within and across strata.
The multi-way univariate statistics procedure
provides a technique for examining various statistics
for dependent or analysis variables among various
groupings in a sample or population. The groupings
are determined by using categorical class variables;
e.g., group the dependent variable GPA by Sex
The default statistics are: frequency count,
mean, standard deviation, and number of valid
cases. The following statistics may be computed:
coefficient of variation, maximum, mean, midrange, minimum, missing
cases, valid cases, range, standard deviation,
standard error, sum, sum of case weights, and
The Statit tabular report procedure builds tables
of descriptive statistics from classification
variables and analysis variables. Tables are
constructed in up to three dimensions: stub,
banner, and page. The stub (row dimension) and
banner (column dimension) may have multiple
variables, nested or concatenated.
The body of the table is made up of cells,
which contain the information in the table: frequency
counts, percentages, means, or other statistics.
The cells are defined by the values of the variable,
or combination of variables, for the table.
In a one-dimensional table, the cells are formed
by rows, in a two-dimensional table they are
formed by the intersection of rows and columns,
and in a three-dimensional table, cells are
formed by the intersection of rows, columns, and pages.
Statistics for each cell are calculated on
values from all cases defined by that cell.
That is, each value of a classification variable
such as Academic_class (freshman, sophomore,
etc.) defines a cell. When calculating statistics
for an analysis variable such as GPA, statistics
are calculated for the values of GPA that correspond
to the different academic classes.
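A minimal Python sketch of this cell-by-cell calculation is shown below. The data values are hypothetical, and the code only illustrates the idea of grouping an analysis variable by a classification variable; it is not Statit syntax.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical cases: (Academic_class, GPA).
cases = [
    ("freshman", 3.0), ("freshman", 3.5),
    ("sophomore", 2.5), ("sophomore", 3.5), ("sophomore", 3.0),
]

# Each value of the classification variable defines one cell.
cells = defaultdict(list)
for academic_class, gpa in cases:
    cells[academic_class].append(gpa)

# Statistics are calculated on the GPA values that fall in each cell.
table = {
    cls: {"count": len(values), "mean": fmean(values)}
    for cls, values in cells.items()
}
```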
Statit provides procedures to graphically explore
the shapes, patterns, and relationships of your
data. Graphics are available for:
- scatter or curve, contour, bubble, sunflower
- scatterplot matrix
One- And Two-Sample
Statit provides procedures for testing and estimation
in one- or two-sample problems. This includes
both "continuous" responses and exact
tests and other inferences for proportions.
For the one-sample case, a confidence interval
for the population mean is provided, along with
an optional test of a hypothesized mean.
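The one-sample interval can be sketched in Python as follows. This is an assumption-laden illustration: it uses a normal (z) quantile rather than Student's t, because the Python standard library has no t distribution, so it is only reasonable for larger samples. The function name and the data are hypothetical, and the method Statit actually uses is not stated here.

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Large-sample (normal approximation) confidence interval for the mean."""
    n = len(sample)
    m = fmean(sample)
    se = stdev(sample) / math.sqrt(n)          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    return m - z * se, m + z * se

# Hypothetical sample of 42 measurements centered on 10.
sample = [10 + 0.1 * ((i % 7) - 3) for i in range(42)]
low, high = mean_confidence_interval(sample)

# A hypothesized mean is rejected at the matching level if it lies outside the interval.
hypothesized_mean = 10.0
rejected = not (low <= hypothesized_mean <= high)
```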
For the two-sample case and the paired-data
case, a test for equal population means is provided
along with confidence limits for the differences
in means. Some diagnostics are provided, indicating
when the procedures may not be appropriate.
In these situations more robust procedures may
be used, such as the Location procedure, which
provides inference about either the population
mean or median; the Dispersion procedure provides
inference about either the population standard
deviation or interquartile range (IQR) of a
population based on a single sample.
The Location and Dispersion procedures include
diagnostics to indicate when methods for normally distributed
data are not suitable, along with suggestions
as to how to proceed in such cases. An approximation
to the Shapiro-Wilk W test is used to test for
The following procedures are also available
- Compare Location provides inferences comparing
either the population means, medians, or geometric
- Compare Dispersion provides inferences comparing
either the population standard deviations
or the interquartile ranges.
- Guided Compare provides "interactive"
measure of location for two samples with guidance.
Each of these procedures includes diagnostics
to indicate when methods for normally distributed
data are not suitable, and suggestions as to
how to proceed in such cases. The following
rank methods are included:
- Wilcoxon test for comparing two independent
- Sign and Signed Rank test for paired data
- Median test for two independent samples
- Runs test
- Kolmogorov-Smirnov test for comparing two
Statit provides enumerative data procedures
- Binomial data which includes both one- and
two-sample applications and regression models.
The binomial regression performs maximum likelihood
fitting of regression models where the data
are proportions, following the binomial distribution,
using logistic (logit) or probit models.
- Poisson Regression for maximum likelihood
fitting using a loglinear model.
- Contingency tables, including one-way to
n-way frequency and crosstabulation tables
and multiple response tables.
For n-way tables, Statit does stratified analysis,
computing statistics within and across strata. The
following statistics can be requested:
- Likelihood Ratio Chi-square
- Mantel-Haenszel Chi-square
- Phi Coefficient
- Contingency Coefficient
- Cramer's V
For 2 X 2 tables, the following are also computed:
- Continuity Adjusted Chi-square
- Fisher Exact Test (1-tail and 2-tail)
- McNemar's Test (+ continuity adjusted)
For tests across strata, the Cochran-Mantel-Haenszel
correlation statistic (df=1) may be computed
for an n-way table. If all of the tables are
2 X 2, then summary estimates of the relative
risk are also computed.
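For a single 2 X 2 table, several of the statistics named above follow from textbook formulas. The Python sketch below is illustrative only: the function name and the counts are hypothetical, and Statit's own output and conventions may differ.

```python
import math

def two_by_two_stats(a, b, c, d):
    """Chi-square statistics and association measures for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d   # row and column totals
    denom = r1 * r2 * c1 * c2
    cross = a * d - b * c
    chi2 = n * cross ** 2 / denom
    # Continuity adjusted chi-square (Yates correction).
    chi2_adj = n * max(abs(cross) - n / 2, 0) ** 2 / denom
    return {
        "chi_square": chi2,
        "continuity_adjusted": chi2_adj,
        "phi": cross / math.sqrt(denom),
        "contingency_coefficient": math.sqrt(chi2 / (chi2 + n)),
        # For a 2 x 2 table, min(rows, columns) - 1 equals 1.
        "cramers_v": math.sqrt(chi2 / n),
    }

result = two_by_two_stats(20, 10, 10, 20)  # hypothetical counts
```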
The following measures of association and their
asymptotic standard error can be requested:
- Gamma
- Kendall's Tau b
- Stuart's Tau c
- Somers' D
- Pearson's Correlation
- Lambda Asymmetric
- Uncertainty Coefficient
- Uncertainty Coefficient Symmetric
For 2 X 2 tables, relative risk estimates plus
confidence intervals are computed. Also, loglinear
models may be fitted via:
- The Parameter Estimates procedure which
uses a Newton-Raphson method to find parameter
estimates and standard errors for such models.
- The Fitted Values procedure which uses iterative
proportional fitting and does not give parameter
estimates. It is mainly used to determine
whether interactions are significant, and
to fit models assuming specified higher order
interactions are absent.
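Iterative proportional fitting can be sketched for the simplest case, the independence model for a two-way table, where the fitted values are scaled until their margins match the observed margins. The Python code below is a minimal sketch under that assumption; the function names and counts are hypothetical, and it does not reproduce the Fitted Values procedure itself.

```python
import math

def ipf_independence(table, iterations=10):
    """Fit the independence model to a two-way table by iterative proportional fitting."""
    rows, cols = len(table), len(table[0])
    row_totals = [sum(r) for r in table]
    col_totals = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    fitted = [[1.0] * cols for _ in range(rows)]  # start from a table of ones
    for _ in range(iterations):
        # Scale each row so fitted row totals match the observed row totals.
        for i in range(rows):
            s = sum(fitted[i])
            fitted[i] = [x * row_totals[i] / s for x in fitted[i]]
        # Scale each column so fitted column totals match the observed ones.
        for j in range(cols):
            s = sum(fitted[i][j] for i in range(rows))
            for i in range(rows):
                fitted[i][j] *= col_totals[j] / s
    return fitted

def likelihood_ratio_chi_square(observed, fitted):
    """G-squared = 2 * sum(O * ln(O / E)), skipping empty cells."""
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, fit_row in zip(observed, fitted)
        for o, e in zip(obs_row, fit_row)
        if o > 0
    )

observed = [[20, 10], [10, 20]]  # hypothetical counts
fitted = ipf_independence(observed)
g2 = likelihood_ratio_chi_square(observed, fitted)
```

A large G-squared relative to its degrees of freedom suggests that the omitted interaction is needed, which is the kind of comparison the text describes.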
Analysis Of Variance
Statit provides several parametric and nonparametric
procedures for analysis of variance.
- The one-way procedure includes the post-hoc
tests: Fisher's LSD, Tukey's W, Newman-Keuls,
Duncan's New Multiple Range and Scheffe's
- N-way factorial designs with either balanced
or unbalanced data, provided there are no
- Repeated measures such as split-plot and
changeover designs with either balanced or
unbalanced cell sizes; missing cells are not
- Analysis of Covariance for a one-way treatment
design and one numerical covariable.
- The General Linear Models procedure fits
regression models with factors specified by
matrices, each matrix containing one or more
columns of covariables. It supports both
univariate and multivariate analysis.
- Kruskal-Wallis one-way rank ANOVA.
- Friedman ANOVA by ranks for randomized block
designs, including Kendall's coefficient
- Cochran's Q test for matched frequencies.
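As one example from the nonparametric side, the Kruskal-Wallis H statistic can be computed from pooled ranks. The Python sketch below uses midranks for tied values but omits the tie correction factor, which is a simplifying assumption; the function name and data are hypothetical and this is not Statit's implementation.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    # Midrank of each distinct value: average of the 1-based positions it occupies.
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2
        i = j
    n = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # hypothetical data
```

Under the null hypothesis, H is compared with a chi-square distribution with (number of groups - 1) degrees of freedom.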
Statit provides both parametric and nonparametric
procedures for correlation analysis.
The Pearson product-moment and Spearman rank
order correlation coefficients are calculated.
Options for calculating t-tests and computing
with case weights are also provided. Correlation
matrices may be saved and used as input into
Statit provides procedures for simple, multiple,
stepwise, all possible subset, binomial, Poisson,
Weibull, and nonlinear regression.
Statit's simple and multiple linear regression
models use least squares or weighted least squares
methods. Optional statistics and output for
simple regression include:
- Beta covariance and correlation matrices,
variance inflation factor, partial correlations,
and semi-partial correlations
- Collinearity diagnostics
- Influence statistics: residual, standard
error of residual, Studentized residual, Studentized
residual with current observation deleted,
Cook's D influence statistic, leverage,
Durbin-Watson, sum of residuals, sum of squared
residuals, PRESS statistic, and the minimum
and maximum residual
- Predicted diagnostics: predicted value,
standard error of the individual predicted
value, standard error of the mean predicted
value, 95% confidence intervals for individual
and mean predicted value
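Several of the quantities listed above can be written out for simple (one-predictor) least squares regression. The Python sketch below uses textbook formulas; the function name and data are hypothetical, and it is not meant to reproduce Statit's output.

```python
from statistics import fmean

def simple_regression(x, y):
    """Least squares fit of y = intercept + slope * x with a few diagnostics."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    sse = sum(e * e for e in residuals)   # sum of squared residuals
    mse = sse / (n - 2)                   # two fitted parameters
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # Cook's D with p = 2 parameters: e^2 * h / (p * MSE * (1 - h)^2).
    cooks_d = [e * e * h / (2 * mse * (1 - h) ** 2)
               for e, h in zip(residuals, leverage)]
    durbin_watson = sum((residuals[i] - residuals[i - 1]) ** 2
                        for i in range(1, n)) / sse
    return {
        "slope": slope, "intercept": intercept, "predicted": predicted,
        "residuals": residuals, "leverage": leverage,
        "cooks_d": cooks_d, "durbin_watson": durbin_watson,
    }

fit = simple_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])  # hypothetical data
```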
Statit's stepwise multiple regression includes
weighted least squares, using either the forward
selection, backward elimination, stepwise, or
maximum R² method. Options include
those for simple regression and also Mallows'
Statit's graphical diagnostics for multiple
- Partial residual plots for detecting nonlinearity.
- Leverage plots for detecting observations
which may be having inordinate influence on
the regression fitting.
- Residual analysis which displays either
the fitted values or any one of the independent
variables plotted against any one of: Cook's
D, leverage values, predicted values, or various
versions of the residuals (standardized, studentized,
studentized based on deletion, etc.).
- Ridge trace analysis which shows how regression
coefficients change in "ridge regression"
as the value of the "ridge parameter"
- Linear and Polynomial display the ordinary
least squares line of Y with X, X²,
X³, or X⁴ superimposed
over a scatterplot of the data.
Statit's binomial regression performs
maximum likelihood fitting of regression models
where the data are proportions, following the
binomial distribution, using logistic (logit)
or probit models.
Statit's Poisson regression performs maximum
likelihood fitting of regression models where
the response is a Poisson variable, using a
Statit's nonlinear regression fits models by
least squares or weighted least squares using
one of four methods: Gauss-Newton, modified
Gauss-Newton, Marquardt, or DUD (doesn't use
derivatives). Grid searches for initial estimates
may be requested, and a loss function to be
minimized may be specified.
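To give a sense of what the Gauss-Newton method does, the Python sketch below fits the two-parameter model y = a * exp(b * x) by repeatedly solving the linearized least squares problem. It is plain Gauss-Newton with no step control, and the model, function name, starting values, and data are all hypothetical; it is not Statit's implementation.

```python
import math

def gauss_newton_exponential(x, y, a, b, iterations=20):
    """Fit y = a * exp(b * x) by Gauss-Newton from initial estimates (a, b)."""
    for _ in range(iterations):
        # Residuals and Jacobian columns at the current estimates.
        f = [a * math.exp(b * xi) for xi in x]
        r = [yi - fi for yi, fi in zip(y, f)]
        ja = [math.exp(b * xi) for xi in x]           # d f / d a
        jb = [a * xi * math.exp(b * xi) for xi in x]  # d f / d b
        # Normal equations (J'J) delta = J'r, solved directly for the 2 x 2 case.
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ra = sum(u * e for u, e in zip(ja, r))
        rb = sum(v * e for v, e in zip(jb, r))
        det = saa * sbb - sab * sab
        a += (sbb * ra - sab * rb) / det
        b += (saa * rb - sab * ra) / det
    return a, b

# Hypothetical noise-free data generated from a = 2, b = 0.5.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * xi) for xi in xs]
a_hat, b_hat = gauss_newton_exponential(xs, ys, a=1.5, b=0.4)
```

Poor starting values can make plain Gauss-Newton diverge, which is one reason a grid search for initial estimates is useful.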
Statit's all possible subsets regression is
performed using one of four methods: maximizing
R², maximizing adjusted R²,
minimizing mean square error, or minimizing
Statit provides a variety of multivariate analysis
- Multivariate analysis of variance, including
repeated measures and profile analysis.
- Principal components analysis which provides
standardized or unstandardized principal component
- Factor analysis which provides five methods
of factor extraction: principal components,
iterated principal components, image, alpha
factor analysis, and principal factor analysis.
A scree plot and Bartlett's sphericity test
are also available. There are three methods
of orthogonal rotation: varimax, equamax,
- The promax oblique rotation is also available.
Plots of all loadings and rotated loadings
can be requested.
- Factor scores can be calculated and saved.
- Canonical correlation analysis and canonical
redundancy analysis whose output consists
of eigenvalues, canonical correlations, variance
ratio, chi-square statistic, and standardized
canonical coefficients. Options are provided
for calculating among and between group correlations,
canonical loadings, cross loadings, Stewart
and Love redundancy analysis, orthogonal rotation
of the loadings, and plots of the loadings.
- Cluster analysis using either centroid linkage
with Euclidean, chi-square or phi-square distance
measure or K Means clustering with initial
- Discriminant analysis can optionally save
the Mahalanobis distances of each observation
to each group mean, probabilities for the
Mahalanobis distances, classifications,
posterior probabilities, and the group means
and within-groups covariance matrix.
Other types of discriminant analysis include:
- Stepwise addition of the predictor variables
can optionally save the classifications, posterior
probabilities, and the group means and
within-groups covariance matrix.
- Quadratic discriminant analysis, in which
the data are assumed to come from a population
that has a multivariate normal distribution
but the equality of the covariance matrices
of the groups is not assumed, can optionally
save the classifications, posterior probabilities,
and the group means.
- K nearest neighbor discriminant analysis
is non-parametric and makes no assumption
about the underlying distribution of the data.
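The K nearest neighbor idea can be sketched in a few lines of Python: a new case is assigned to the group that is most common among its k closest training cases. Euclidean distance, the function name, and the data below are assumptions made for illustration, not Statit's documented behavior.

```python
import math
from collections import Counter

def knn_classify(train, point, k=3):
    """Classify `point` by majority vote among its k nearest training cases.

    `train` is a list of (coordinates, group_label) pairs.
    """
    nearest = sorted(train, key=lambda case: math.dist(case[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training cases in two groups.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((4.0, 4.2), "B"), ((4.1, 3.9), "B"), ((3.8, 4.0), "B"),
]
label_near_a = knn_classify(train, (1.1, 1.0))
label_near_b = knn_classify(train, (4.0, 4.0))
```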
Statit's time series analysis procedures
- Estimating the parameters of an ARIMA model
(Box-Jenkins) and generating forecasts for
seasonal and nonseasonal models.
- Analyzing auto-regressive vector models.
This is suitable for forecasting, where typically
one of the coordinates of the time series
is the variable of primary interest and the
others are associated variables which might
aid in the forecast.
- Computing and plotting the autocorrelation
- Computing and plotting seasonal or periodic
averages to assist in identifying seasonal
- Computing and plotting the cross-correlation
- Computing the lagged difference of a variable.
- Performing a Difference-Sign test of randomness.
- Computing and plotting the partial autocorrelation
function (used to help identify the AR parameters
for the ARIMA procedure).
- Computing polynomial distributed lag regression,
also known as an Almon lag. A regression is
performed on the dependent variable and its
lags, and optionally, other exogenous variables.
- Performing a test of randomness based on
the ranks of the data for detecting trends
- Performing one or more of: moving average,
single or double exponential smoothing, Holt's
two-parameter smoothing, Winters' three-parameter
smoothing, and Classical Decomposition
- Performing a test of randomness based on
the number of turning points in the data.
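Three of the simpler items in this list, the lagged difference, the Difference-Sign test, and the turning point test, can be sketched in Python using their usual large-sample normal approximations. The function names and data are hypothetical, and the exact statistics Statit reports may differ.

```python
import math

def lagged_difference(series, lag=1):
    """Return series[t] - series[t - lag] for each t >= lag."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def difference_sign_test(series):
    """z statistic for the number of positive first differences."""
    n = len(series)
    positives = sum(1 for d in lagged_difference(series) if d > 0)
    # Under randomness: mean (n - 1) / 2, variance (n + 1) / 12.
    return (positives - (n - 1) / 2) / math.sqrt((n + 1) / 12)

def turning_point_test(series):
    """z statistic for the number of turning points (local peaks and troughs)."""
    n = len(series)
    turns = sum(
        1 for i in range(1, n - 1)
        if (series[i] - series[i - 1]) * (series[i + 1] - series[i]) < 0
    )
    # Under randomness: mean 2(n - 2) / 3, variance (16n - 29) / 90.
    return (turns - 2 * (n - 2) / 3) / math.sqrt((16 * n - 29) / 90)

series = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9, 8]  # hypothetical zigzag data
z_sign = difference_sign_test(series)
z_turn = turning_point_test(series)
```

For this zigzag series the difference-sign statistic sees nothing unusual, while the turning point statistic flags too many turns, which shows why more than one test of randomness is offered.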
And Survival Analysis
These procedures are for the analysis of response-time
data, also called survival analysis. They include:
estimator of the survival curve from censored
regression, which relates response times to
explanatory variables in a way which does
not require specification of the distribution
of the response times.
analysis which offers a one sample procedure
to fit a Weibull distribution to possibly
censored response-time data, and a regression
procedure for relating response times to explanatory
variables (which could include treatments
and thus be used for two-sample problems).
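One common estimator of a survival curve from censored data is the product-limit (Kaplan-Meier) estimator. The text above does not name the estimator Statit uses, so treating it as product-limit is an assumption; the function name and data in this Python sketch are hypothetical.

```python
def product_limit(times, events):
    """Product-limit estimate of the survival curve.

    events[i] is True if the response at times[i] was observed,
    False if that case was censored.
    Returns a list of (time, estimated survival probability) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = sum(1 for tt, e in data if tt == t and e)
        leaving = sum(1 for tt, _ in data if tt == t)
        if observed:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # observed and censored cases both leave the risk set
        i += leaving
    return curve

# Hypothetical response times; False marks a censored case.
curve = product_limit([1, 2, 2, 3, 4], [True, True, False, True, False])
```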
Even though the assumptions are different,
the formulations of models for Weibull and Cox
regression have strong similarities. Either
can be considered as "proportional hazards"
models. For the Weibull case, the hazard function
is assumed to have a simple parametric form,
and for Cox regression this form need not be
specified. Weibull methods will often be more
useful in reliability work and Cox regression