At Statit Software we get a lot of questions
having to do with importing spreadsheets into
Statit. Statit does a good job of importing
these files, however that data may be in a format
that can make it difficult to analyze or chart
the data. What I would like to do is give a
few tips on up-front formatting or designing
of the spreadsheet to make your life easier.
A few minutes thinking about how you will use
the data can make all the difference in the
world.
A primary idea to consider is the way Statit
sees data. Statit views columns of data as variables
and rows of data as cases or observations. Remembering
this will help you to design your spreadsheets
so it is easier to do the analysis. Thinking
of each row as an individual observation in
time will save you time. I will illustrate this
in a few examples.
Statit QC procedures take one data variable
as input for the measures. Each case of this
variable is one observation. In Variable charts,
you might group the observations based on another
variable such as month or doctor to create subgroups,
but each case is still a single observation
taken at that point in time. Other variables
can be used for X axis id and things like comments
or phases.
Spreadsheet formats
A problem we often see is measures by month,
with each month in a separate column. In this
case each row has several observations of the
same measure, one for each month. This violates
the single observation rule and requires some
work either in the spreadsheet or in Statit
to concatenate the different columns into a
single variable so that we have a proper input
variable to QC procs. We can alleviate this
by remembering the one row/one observation rule.
Figure 1 gives an example of a spreadsheet that
will make analysis
unnecessarily difficult.

Figure 1
A good rule of thumb is to have one column
for a date, one column for the observation on
that date. Depending on the data you may also
need a second measure column such as for subgroup
size or denominator. The spreadsheet in Figure
2 illustrates this.

Figure 2
Another situation that occurs is maintaining
the same type of measure in one spreadsheet
but with separate columns for different groups
such as location. In this case, you would not
expect to chart the different locations on the
same chart but would chart each group separately.

Figure 3
If you did find you needed to plot all the
data in one chart, you will need to manipulate
the data in some way. In the case illustrated
above you would probably want the sum of the
measure. In Statit Express QC, you would create
these sums in the spreadsheet by creating two
new columns, one with the sum of the numerators
and one with the sum of the denominators. In
other Statit products you also have the option
of creating the variables in the Statit workspace
as you need them by summing the Num variables
into Tot_Num and summing the Den variables into
Tot_Den.
As another example, suppose you had p chart
data in a spreadsheet such as the following:

Figure 4
If you wanted a p chart of Measure = Floor2
and were using Statit Express QC, you would
need to rearrange your spreadsheet to something
similar to Figure 3. Express does not have the
select capability to choose which subset of
measures you want to graph. However, the other
Statit products allow either a global selection
or a local selection. This format of spreadsheet
is workable in these situations, but it still
makes it a bit more difficult to aggregate the
data so that you can produce a p chart of both
Floor2 and Floor3 data.
Data Formatting
Notice that in both the sample spreadsheets
above, the Date Column is formatted as a date
and the other data variables are entered as
numeric. This is very important. For the data
variable, Statit requires numeric data. If the
spreadsheet does not have the measure column
formatted as numeric, you would need to redo
the spreadsheet for Statit Express QC or manipulate
the data in the other Statit products.
It should be pointed out that changing the
column format in a spreadsheet, does not ensure
that the column contains numeric values. Be
sure to check this as well.
For any type of sorting on date, the date variable
needs to be numeric. Otherwise you may have
April coming before January. This is a simple
thing, but something that can be forgotten.
Statit has several date formats you can use
to display the date as you would like. Since
the data above is taken on a monthly basis,
you would probably want to format it in a Mon-YY
(Jan_05) type format. Notice also that the actual
date (the first of the month) will not make
much difference if the date is formatted as
Mon-YY.
Additionally, it is always a good idea to have
the spreadsheet sorted in the manner that you
would use the data. Although this step can be
done in Statit, it will actually help in the
process of entering data in the spreadsheet.
We often see spreadsheets with these issues.
It is important, when designing a spreadsheet
to consider how you expect to use the spreadsheet
and the data within it. This is true whether
the data comes from a spreadsheet or from an
industrial strength database. If you cant
easily get the information you need and use
it effectively, no amount of normalized database
design will help you.
If you would like additional information, please
call our Support staff at (541) 752-4100 or
send email to
.