Statit Custom QC Statistics

Descriptive Statistics

Univariate Statistics
The univariate statistics procedure computes the following statistics: mean, median, variance, maximum, minimum, coefficient of variation, corrected sum of squares, geometric mean, standard error of the geometric mean, harmonic mean, standard error of the harmonic mean, interquartile range, interquartile range of the median, kurtosis, standard error of the median, midrange, number of missing cases, first quartile, third quartile, range, sample size, skewness, standard deviation, standard error of the mean, sum, sum of case weights, and number of valid cases. Percentiles may also be computed. The results may be displayed separately for each variable or in summary form for all variables. The results may also be saved for use in other calculations.

These statistics are also available as functions in Statit expressions.
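
As a rough illustration of what several of these statistics measure, here is a minimal Python sketch using only the standard library. It is not Statit syntax; the function name `univariate_summary` and the quartile convention (Python's default) are assumptions made for this example, and case weights are not handled.

```python
import math
import statistics

def univariate_summary(values):
    """Return a dict of common univariate statistics for a list of positive numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)                    # sample standard deviation (n - 1)
    q1, _, q3 = statistics.quantiles(values, n=4)    # one of several quartile conventions
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "std_dev": sd,
        "std_err_mean": sd / math.sqrt(n),
        "coef_of_variation": sd / mean,
        "geometric_mean": statistics.geometric_mean(values),
        "harmonic_mean": statistics.harmonic_mean(values),
        "range": max(values) - min(values),
        "midrange": (max(values) + min(values)) / 2,
        "iqr": q3 - q1,
        "corrected_ss": sum((x - mean) ** 2 for x in values),
    }

summary = univariate_summary([2.0, 4.0, 4.0, 5.0, 10.0])
for name, value in summary.items():
    print(f"{name:18s} {value:10.4f}")
```

Quartiles and percentiles in particular depend on the interpolation rule, so values from this sketch may differ slightly from those of any given package.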

Frequency Distributions
The frequency distribution procedure computes a frequency distribution for measurement variables. Rather than computing counts for individual values, this procedure computes counts for values that fall into continuous intervals. The output consists of: lower and upper endpoints of the intervals, frequency counts, and relative and cumulative percentages.
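
The interval-based counting described above can be sketched in Python as follows. This is an illustration only, not Statit's implementation: the function name, the equal-width half-open intervals `[lower, upper)`, and the choice to ignore out-of-range values are all assumptions.

```python
def frequency_distribution(values, lower, width, n_bins):
    """Count values in n_bins equal-width intervals [lo, hi) starting at lower."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lower) // width)
        if 0 <= idx < n_bins:              # values outside the range are ignored
            counts[idx] += 1
    total = sum(counts)
    rows, cumulative = [], 0.0
    for i, count in enumerate(counts):
        pct = 100.0 * count / total if total else 0.0
        cumulative += pct
        rows.append((lower + i * width, lower + (i + 1) * width, count, pct, cumulative))
    return rows

data = [1.2, 2.5, 2.7, 3.1, 4.8, 4.9, 5.0, 7.3]
rows = frequency_distribution(data, lower=0.0, width=2.0, n_bins=4)
for lo, hi, count, pct, cum in rows:
    print(f"[{lo:4.1f}, {hi:4.1f})  {count:3d}  {pct:6.1f}%  {cum:6.1f}%")
```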

Frequency Tables
The frequency table procedure produces 1-way to n-way frequency and crosstabulation tables and multiple response tables.

Frequency tables show the distribution of the values of a variable with the number of occurrences of each unique value of the variable. Crosstabulation tables show combined frequencies for two or more variables. The results of the crosstabulation may be saved for later use.
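
A small Python sketch of the difference between a one-way frequency table and a two-way crosstabulation. The variable names and data are hypothetical, and the helper `crosstab` is written for this example rather than taken from Statit.

```python
from collections import Counter

def crosstab(rows_var, cols_var):
    """Return (row_levels, col_levels, table) for two equal-length categorical lists."""
    counts = Counter(zip(rows_var, cols_var))
    row_levels = sorted(set(rows_var))
    col_levels = sorted(set(cols_var))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

sex = ["F", "M", "F", "F", "M", "M", "F"]
year = ["Fr", "Fr", "So", "Fr", "So", "So", "So"]

one_way = Counter(sex)                      # occurrences of each unique value
row_levels, col_levels, table = crosstab(sex, year)

print(dict(one_way))
print("     " + "  ".join(col_levels))
for level, row in zip(row_levels, table):
    print(f"{level:>3}  " + "   ".join(str(c) for c in row))
```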

When the Statistics Module is licensed, the frequency table procedure also performs tests and computes measures of association. For n-way tables, it does stratified analysis, computing statistics within and across strata.

Multi-Way Univariate Statistics
The multi-way univariate statistics procedure provides a technique for examining various statistics for dependent or analysis variables among various groupings in a sample or population. The groupings are determined by using categorical class variables; e.g., group the dependent variable GPA by Sex and Class.

The default statistics are: frequency count, mean, standard deviation, and number of valid cases. The following statistics may be computed: coefficient of variation, maximum, mean, midrange, minimum, missing cases, valid cases, range, standard deviation, standard error, sum, sum of case weights, and variance.
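
The GPA by Sex and Class example might look like the following in Python. This is a sketch under stated assumptions: the record layout, the function name `grouped_stats`, and the treatment of `None` as a missing case are all invented for illustration.

```python
import statistics
from collections import defaultdict

def grouped_stats(records, class_vars, analysis_var):
    """Valid-case count, mean, and standard deviation per combination of class variables."""
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(analysis_var)
        if value is not None:                        # skip missing cases
            groups[tuple(rec[v] for v in class_vars)].append(value)
    result = {}
    for key, vals in sorted(groups.items()):
        result[key] = {
            "n": len(vals),
            "mean": statistics.fmean(vals),
            # the sample standard deviation needs at least two valid cases
            "std_dev": statistics.stdev(vals) if len(vals) > 1 else float("nan"),
        }
    return result

records = [
    {"Sex": "F", "Class": "Freshman", "GPA": 3.0},
    {"Sex": "F", "Class": "Freshman", "GPA": 3.5},
    {"Sex": "M", "Class": "Freshman", "GPA": 2.5},
    {"Sex": "M", "Class": "Sophomore", "GPA": 3.0},
    {"Sex": "M", "Class": "Sophomore", "GPA": 4.0},
    {"Sex": "F", "Class": "Sophomore", "GPA": None},
]
stats = grouped_stats(records, ["Sex", "Class"], "GPA")
for key, s in stats.items():
    print(key, s)
```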

Tabular Reporting
The Statit tabular report procedure builds tables of descriptive statistics from classification variables and analysis variables. Tables are constructed in up to three dimensions: stub, banner, and page. The stub (row dimension) and banner (column dimension) may have multiple variables, nested or concatenated.

The body of the table is made up of cells, which contain the information in the table: frequency counts, percentages, means, or other statistics. The cells are defined by the values of the variable, or combination of variables, for the table. In a one-dimensional table, the cells are formed by rows; in a two-dimensional table, they are formed by the intersection of rows and columns; and in a three-dimensional table, they are formed by the intersection of rows, columns, and pages.

Statistics for each cell are calculated on values from all cases defined by that cell. That is, each value (freshman, sophomore, etc.) of a classification variable such as Academic_class defines a cell. When calculating statistics for an analysis variable such as GPA, statistics are calculated separately for the GPA values in each academic class.

Graphics Capabilities
Statit provides procedures to graphically explore the shapes, patterns, and relationships of your data. Graphics are available for:

  • pie
  • bar
  • histogram
  • dot
  • box
  • probability
  • percentile
  • scatter or curve, contour, bubble, sunflower
  • scatterplot matrix

Inferential Statistics

One- And Two-Sample Inference
Statit provides procedures for testing and estimation in one- or two-sample problems. These cover continuous responses as well as proportions, for which exact tests and other inferences are available.

For the one-sample case, a confidence interval for the population mean is provided, along with an optional test of a hypothesized mean.

For the two-sample case and the paired-data case, a test for equal population means is provided along with confidence limits for the difference in means. Some diagnostics are provided, indicating when the procedures may not be appropriate. In these situations more robust procedures may be used. The Location procedure provides inference about either the population mean or median. The Dispersion procedure provides inference about either the standard deviation or interquartile range (IQR) of a population, based on a single sample.

The Location and Dispersion procedures include diagnostics to indicate when methods for normally-distributed data are not suitable, along with suggestions as to how to proceed in such cases. An approximation to the Shapiro-Wilk W test is used to test for normality.

The following procedures are also available for two samples:

  • Compare Location provides inferences comparing the population means, medians, or geometric means.
  • Compare Dispersion provides inferences comparing either the population standard deviations or the interquartile ranges.
  • Guided Compare provides an interactive, guided comparison of location for two samples.

Each of these procedures includes diagnostics to indicate when methods for normally distributed data are not suitable, and suggestions as to how to proceed in such cases. The following rank methods are included:

  • Wilcoxon test for comparing two independent samples
  • Sign and Signed Rank test for paired data
  • Median test for two independent samples
  • Runs test
  • Kolmogorov-Smirnov test for comparing two samples

Enumerative Data
Statit provides enumerative data procedures for:

  • Binomial data which includes both one- and two-sample applications and regression models. The binomial regression performs maximum likelihood fitting of regression models where the data are proportions, following the binomial distribution, using logistic (logit) or probit models.
  • Poisson Regression for maximum likelihood fitting using a loglinear model.
  • Contingency tables, including one-way to n-way frequency and crosstabulation tables and multiple response tables.

For n-way tables, Statit does stratified analysis, computing statistics within and across strata. The following statistics can be requested:

  • Chi-square
  • Likelihood Ratio Chi-square
  • Mantel-Haenszel Chi-square
  • Phi Coefficient
  • Contingency Coefficient
  • Cramer's V

For 2 X 2 tables, the following are also computed:

  • Continuity Adjusted Chi-square
  • Fisher Exact Test (1-tail and 2-tail)
  • McNemar's Test (+ continuity adjusted)
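
Several of the table statistics listed above are simple functions of the Pearson chi-square. The following Python sketch shows the textbook formulas for an r x c table of counts; the function name is hypothetical, phi is reported unsigned, and expected counts are assumed to be nonzero. It is not a description of Statit's own computations.

```python
import math

def chi_square_stats(table):
    """Pearson chi-square and derived association measures for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return {
        "chi_square": chi2,
        "df": (len(table) - 1) * (len(table[0]) - 1),
        "phi": math.sqrt(chi2 / n),
        "contingency_coef": math.sqrt(chi2 / (chi2 + n)),
        "cramers_v": math.sqrt(chi2 / (n * (k - 1))),
    }

result = chi_square_stats([[10, 20], [20, 10]])
for name, value in result.items():
    print(f"{name:18s} {value:.4f}")
```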

For tests across strata, the Cochran-Mantel-Haenszel correlation statistic (df=1) may be computed for an n-way table. If all of the tables are 2 X 2, then summary estimates of the relative risk are also computed.

The following measures of association and their asymptotic standard error can be requested:

  • Gamma
  • Kendall’s Tau b
  • Stuart’s Tau c
  • Somers’ D
  • Pearson's Correlation
  • Lambda Asymmetric
  • Uncertainty Coefficient
  • Uncertainty Coefficient Symmetric

For 2 X 2 tables, relative risk estimates plus confidence intervals are computed. Also, loglinear models may be fitted via:

  • The Parameter Estimates procedure which uses a Newton-Raphson method to find parameter estimates and standard errors for such models.
  • The Fitted Values procedure which uses iterative proportional fitting and does not give parameter estimates. It is mainly used to determine whether interactions are significant, and to fit models assuming specified higher order interactions are absent.
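
To illustrate the idea behind iterative proportional fitting, here is a minimal Python sketch. The function `ipf`, its dictionary-based table layout, and its stopping rule are assumptions made for this example and say nothing about how the Fitted Values procedure is implemented. Each pass rescales the fitted cells so that one chosen margin matches the observed margin, cycling through the margins until the cells stop changing. All observed margins are assumed to be positive.

```python
def ipf(observed, margins, max_iter=200, tol=1e-10):
    """Fit a loglinear model by iterative proportional fitting.

    observed: dict mapping index tuples (one index per factor) to counts.
    margins:  list of tuples of axis positions; each tuple names a margin
              (a model term) whose observed totals the fit must reproduce.
    """
    fitted = {cell: 1.0 for cell in observed}
    for _ in range(max_iter):
        max_change = 0.0
        for axes in margins:
            target, current = {}, {}
            for cell, count in observed.items():
                key = tuple(cell[a] for a in axes)
                target[key] = target.get(key, 0.0) + count
                current[key] = current.get(key, 0.0) + fitted[cell]
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                new = fitted[cell] * target[key] / current[key]
                max_change = max(max_change, abs(new - fitted[cell]))
                fitted[cell] = new
        if max_change < tol:                 # no cell moved during a full cycle
            break
    return fitted

# Independence model for a 2 x 2 table: fit the row margin and the column margin.
obs2 = {(0, 0): 10, (0, 1): 20, (1, 0): 30, (1, 1): 40}
fit2 = ipf(obs2, margins=[(0,), (1,)])
print(fit2)
```

For a three-way table, passing `margins=[(0, 1), (0, 2), (1, 2)]` would fit the model in which the three-factor interaction is assumed absent. Note that the result is a table of fitted values only; no parameter estimates are produced.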

Analysis Of Variance
Statit provides several parametric and nonparametric procedures for analysis of variance.

  • The one-way procedure includes the post-hoc tests: Fisher's LSD, Tukey's W, Newman-Keuls, Duncan's New Multiple Range and Scheffe's S.
  • N-way factorial designs with either balanced or unbalanced data, provided there are no empty cells.
  • Repeated measures such as split-plot and changeover designs with either balanced or unbalanced cell sizes; missing cells are not supported.
  • Analysis of Covariance for a one-way treatment design and one numerical covariable.
  • The General Linear Models procedure supports regression models with factors specified by matrices, each matrix containing one or more columns of covariables. It provides both univariate and multivariate analysis.
  • Kruskal-Wallis one-way rank ANOVA.
  • Friedman ANOVA by ranks for randomized block designs, including Kendall’s coefficient of concordance.
  • Cochran’s Q test for matched frequencies.

Correlation Analysis
Statit provides both parametric and nonparametric procedures for correlation analysis. The Pearson product-moment and Spearman rank order correlation coefficients are calculated.

Options for calculating t-tests and computing with case weights are also provided. Correlation matrices may be saved and used as input into other procedures.
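
The relationship between the two coefficients can be sketched in Python: the Spearman coefficient is the Pearson coefficient computed on ranks, with tied values sharing an average rank. The function names are hypothetical, case weights are not handled, and the t statistic shown is the standard test of zero correlation with n - 2 degrees of freedom (undefined when the correlation is exactly 1 or -1).

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson(x, y)
print(r, spearman(x, y), t_statistic(r, len(x)))
```

Because `y` is a monotone but nonlinear function of `x` here, the rank correlation is perfect while the Pearson coefficient falls slightly short of 1.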

Regression Analysis
Statit provides procedures for simple, multiple, stepwise, all possible subset, binomial, Poisson, Weibull, and nonlinear regression.

Statit’s simple and multiple linear regression models use least squares or weighted least squares methods. Optional statistics and output for simple regression include:

  • Beta covariance and correlation matrices, variance inflation factor, partial correlations, and semi-partial correlations
  • Collinearity diagnostics
  • Influence statistics: residual, standard error of residual, Studentized residual, Studentized residual with current observation deleted, Cook’s D influence statistic, leverage, Durbin-Watson, sum of residuals, sum of squared residuals, PRESS statistic, and the minimum and maximum residual
  • Predicted diagnostics: predicted value, standard error of the individual predicted value, standard error of the mean predicted value, 95% confidence intervals for individual and mean predicted value
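
For the one-predictor case, a few of the diagnostics listed above have simple closed forms. The Python sketch below uses textbook formulas for an unweighted least squares fit; the function name and return layout are assumptions for illustration, and it is not meant to reproduce Statit's output.

```python
def simple_regression(x, y):
    """Unweighted least squares fit of y on x, with a few common diagnostics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    # Leverage of each observation in simple regression.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    sse = sum(e * e for e in residuals)
    # PRESS: sum of squared deleted residuals e_i / (1 - h_i).
    press = sum((e / (1.0 - h)) ** 2 for e, h in zip(residuals, leverage))
    # Durbin-Watson: squared successive differences of residuals over SSE.
    durbin_watson = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)
    ) / sse
    return {
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
        "leverage": leverage,
        "sse": sse,
        "press": press,
        "durbin_watson": durbin_watson,
    }

fit = simple_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(fit["slope"], fit["intercept"], fit["press"], fit["durbin_watson"])
```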

Statit's stepwise multiple regression includes weighted least squares, using either the forward selection, backward elimination, stepwise, or maximum R² method. Options include those for simple regression and also Mallows' Cp.

Statit's graphical diagnostics for multiple regression include:

  • Partial residual plots for detecting nonlinearity.
  • Leverage plots for detecting observations which may be having inordinate influence on the regression fitting.
  • Residual analysis which displays either the fitted values or any one of the independent variables plotted against any one of: Cook’s D, leverage values, predicted values, or various versions of the residuals (standardized, studentized, studentized based on deletion, etc.).
  • Ridge trace analysis which shows how regression coefficients change in “ridge regression” as the value of the “ridge parameter” is increased.
  • Linear and Polynomial display the ordinary least squares line of Y with X, X², X³, or X⁴ superimposed over a scatterplot of the data.

Statit’s binomial regression performs maximum likelihood fitting of regression models where the data are proportions, following the binomial distribution, using logistic (logit) or probit models.

Statit's Poisson regression performs maximum likelihood fitting of regression models where the response is a Poisson variable, using a loglinear model.

Statit's nonlinear regression fits models by least squares or weighted least squares using one of four methods: Gauss-Newton, modified Gauss-Newton, Marquardt, or DUD (doesn't use derivatives). Grid searches for initial estimates may be requested as well as specifying a loss function to be minimized.

Statit's all possible subsets regression is performed using one of four methods: maximizing R², maximizing adjusted R², minimizing mean square error, or minimizing Mallows’ Cp.

Multivariate Analysis
Statit provides a variety of multivariate analysis procedures:

  • Multivariate analysis of variance, including repeated measures and profile analysis.
  • Principal components analysis which provides standardized or unstandardized principal component scores.
  • Factor analysis which provides five methods of factor extraction: principal components, iterated principal components, image, alpha factor analysis, and principal factor analysis. A scree plot and Bartlett's sphericity test are also available. There are three methods of orthogonal rotation: varimax, equamax, and quartimax. The promax oblique rotation is also available. Plots of all loadings and rotated loadings can be requested. Factor scores can be calculated and saved.
  • Canonical correlation analysis and canonical redundancy analysis whose output consists of eigenvalues, canonical correlations, variance ratio, chi-square statistic, and standardized canonical coefficients. Options are provided for calculating among and between group correlations, canonical loadings, cross loadings, Stewart and Love redundancy analysis, orthogonal rotation of the loadings, and plots of the loadings.
  • Cluster analysis using either centroid linkage with Euclidean, chi-square, or phi-square distance measure or K-means clustering with initial cluster estimation.
  • Discriminant analysis can optionally save the Mahalanobis’ distances of each observation to each group mean, probabilities for the Mahalanobis’ distances, classifications, posterior probabilities, and the group means and within-groups covariance matrix.

Other types of discriminant analysis include:

  • Stepwise addition of the predictor variables can optionally save the classifications, posterior probabilities, and the group means and within-groups covariance matrix.
  • Quadratic discriminant analysis, in which the data are assumed to come from a population that has a multivariate normal distribution but the equality of the covariance matrices of the groups is not assumed, can optionally save the classifications, posterior probabilities, and the group means.
  • K nearest neighbor discriminant analysis is non-parametric and makes no assumption about the underlying distribution of the data.

Time Series Analysis
Statit’s time series analysis procedures include:

  • Estimating the parameters of an ARIMA model (Box-Jenkins) and generating forecasts for seasonal and nonseasonal models.
  • Analyzing auto-regressive vector models. This is suitable for forecasting, where typically one of the coordinates of the time series is the variable of primary interest and the others are associated variables which might aid in the forecast.
  • Computing and plotting the autocorrelation function.
  • Computing and plotting seasonal or periodic averages to assist in identifying seasonal trends.
  • Computing and plotting the cross-correlation function.
  • Computing the lagged difference of a variable.
  • Performing a Difference-Sign test of randomness.
  • Computing and plotting the partial autocorrelation function (used to help identify the AR parameters for the ARIMA procedure).
  • Computing polynomial distributed lag regression, also known as an Almon lag. A regression is performed on the dependent variable and its lags, and optionally, other exogenous variables.
  • Performing a test of randomness based on the ranks of the data for detecting trends in data.
  • Performing one or more of: moving average, single or double exponential smoothing, Holt’s two parameter smoothing, Winters’ three parameter smoothing, and Classical Decomposition forecasting.
  • Performing a test of randomness based on the number of turning points in the data.
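
Three of the simpler operations in the list above (lagged difference, autocorrelation function, and single exponential smoothing) can be sketched in Python using standard textbook definitions. The function names, the choice of the first observation as the initial smoothed value, and the normalization of the autocorrelation by the lag-0 sum of squares are assumptions for this illustration.

```python
def lagged_difference(series, lag=1):
    """Difference between each value and the value `lag` periods earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def autocorrelation(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag (lag 0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

def exponential_smoothing(series, alpha):
    """Single exponential smoothing, started at the first observation."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

diffs = lagged_difference([1, 4, 9, 16])
acf = autocorrelation([1, 2, 3, 4, 5], max_lag=2)
smooth = exponential_smoothing([10, 20, 30], alpha=0.5)
print(diffs, acf, smooth)
```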

Reliability And Survival Analysis
These procedures are for the analysis of response-time data, also called survival analysis. They include:

  • Kaplan-Meier estimator of the survival curve from censored data.
  • Cox regression, which relates response times to explanatory variables in a way which does not require specification of the distribution of the response times.
  • Weibull analysis which offers a one sample procedure to fit a Weibull distribution to possibly censored response-time data, and a regression procedure for relating response times to explanatory variables (which could include treatments and thus be used for two-sample problems).
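
As an illustration of the first item, the Kaplan-Meier product-limit estimate can be sketched in a few lines of Python. The function name and the event coding (1 for an observed response, 0 for a censored one) are assumptions for this example. Subjects censored at a given time are treated as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  observed times.
    events: 1 if the event occurred at that time, 0 if the case was censored.
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:   # gather all cases tied at time t
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve

curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.4f}")
```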

Even though the assumptions are different, the formulations of models for Weibull and Cox regression have strong similarities. Either can be considered as "proportional hazards" models. For the Weibull case, the hazard function is assumed to have a simple parametric form, and for Cox regression this form need not be specified. Weibull methods will often be more useful in reliability work and Cox regression in biostatistics.
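
In symbols, one common way to write the proportional hazards formulation is shown below. The notation (explanatory variables x, coefficients β, Weibull shape γ and scale α) is an assumption chosen for this illustration and may differ from the parameterization Statit uses.

```latex
\[
  h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)
\]
\[
  h_0(t) = \frac{\gamma}{\alpha}\left(\frac{t}{\alpha}\right)^{\gamma - 1}
  \quad \text{(Weibull case, shape } \gamma > 0,\ \text{scale } \alpha > 0\text{)}
\]
```

In Cox regression the baseline hazard h_0(t) is left unspecified, and only the coefficients β are estimated.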

Midas+ Statit™ Solutions Group | © MidasPlus Inc., All rights reserved