
Random Samples of Data

Abstract: Random samples of your data are relatively easy to obtain using some of Statit's built-in functions.

Occasionally you may want to obtain a random sample of your data. For example, you might have 1000 cases of a variable and want a random sample of approximately 200 cases, or one fifth.

Let's look at a Statit function that helps here. ranunfrm(a,b) returns a random number between "a" and "b" drawn from the uniform distribution. Either or both of "a" and "b" can be Statit variables. Using the 1000-case example, we could write:

## Create tmp with 1000 cases, each with the value 100
assi tmp 1000*100
## Create a random value between 0 and 100 for each of the 1000 cases
let ran = ranunfrm(0,tmp)
## Select a range of those values that represents about 1/5 of the cases
select push (ran > 20 and ran <=40)
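The same idea can be sketched outside Statit. The Python below is an illustrative analogue, not Statit code: `random.uniform(0, 100)` stands in for `ranunfrm(0,tmp)`, and the helper name `sample_by_range` is hypothetical. Because each value is uniform on [0, 100], the chance that it falls in (20, 40] is 20/100 = 0.2, so about 1000 × 0.2 = 200 cases are expected, although the exact count varies from run to run.

```python
import random

def sample_by_range(cases, low=20.0, high=40.0, upper=100.0, seed=None):
    """Hypothetical helper: keep each case whose uniform(0, upper) draw falls in (low, high]."""
    rng = random.Random(seed)
    # One uniform random value per case, analogous to ranunfrm(0, tmp)
    ran = [rng.uniform(0.0, upper) for _ in cases]
    # Keep only the cases whose random value lies in the chosen range
    return [case for case, r in zip(cases, ran) if low < r <= high]

cases = list(range(1, 1001))  # 1000 illustrative cases
subset = sample_by_range(cases, seed=42)
print(f"selected {len(subset)} of {len(cases)} cases")
```

Any range of width 20 would work equally well; (20, 40] is simply the one used in the Statit example.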

Now suppose our data contain samples. Perhaps we have a variable called sample_id, and each sample_id has three measurements:

Sample_ID   Measurement
1           45
1           56
1           43
2           82
2           93
2           45
3           56
3           44
3           93

The following macro script randomly selects about twenty percent of the samples, keeping all the measurements of each selected sample.

## Group by sample_id and save the group results
group measurement by sample_id /save
## How many samples do we have?
let $case = case(group.mean)
## Create a random value between 0 and 100 for each of these samples
assi ran = $case * 100
let ran = ranunfrm(0,ran)
## Identify the samples we don't want (about 80%) by assigning the missing value to them
if ran > 20 then ran = #_sysmiss
## Do a match merge to integrate the two sets of data. See match-merge in the
## Edit -> Data Management menu
match measurement with ran by sample_id Group.sample_id
## Now use global select, local select, or select permanent as you wish
stats measurement /select=(ran != #_sysmiss)
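For comparison, here is a minimal Python sketch of the same per-sample logic, using the data from the table. It is an analogue, not Statit code: the helper `sample_groups` is hypothetical, `None` plays the role of `#_sysmiss`, and a dictionary lookup stands in for the match merge. With only three samples, a 20% keep rate will often select none of them; the approach pays off when there are many samples.

```python
import random

# Example data from the table: (sample_id, measurement)
data = [(1, 45), (1, 56), (1, 43),
        (2, 82), (2, 93), (2, 45),
        (3, 56), (3, 44), (3, 93)]

def sample_groups(rows, keep_percent=20.0, seed=None):
    """Hypothetical helper: keep about keep_percent% of the sample_ids, with all their rows."""
    rng = random.Random(seed)
    # One entry per distinct sample_id, like grouping by sample_id
    ids = sorted({sample_id for sample_id, _ in rows})
    # One uniform value in [0, 100] per sample; None marks a rejected sample
    ran = {}
    for sample_id in ids:
        r = rng.uniform(0.0, 100.0)
        ran[sample_id] = r if r <= keep_percent else None
    # "Match merge" the per-sample value back onto every row, then select
    return [(sid, m) for sid, m in rows if ran[sid] is not None]

selected = sample_groups(data, seed=1)
for sample_id, measurement in selected:
    print(sample_id, measurement)
```

Because the random value is attached to the sample rather than to each measurement, a sample is kept or dropped as a whole, which is why the data are grouped first.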

Now, create some data in the workspace and work through the example above to see how it works step by step.

This is only one way to get a random sample. There are many others, and there are several other random number generators you could use as well.
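As one example of an alternative: the range method yields approximately 200 cases, so if you need exactly 200, sampling without replacement gives a fixed-size sample. The sketch below uses Python's standard `random.Random.sample`; it is not a Statit feature, and whether Statit offers a direct equivalent is not covered here.

```python
import random

cases = list(range(1, 1001))       # 1000 illustrative cases
rng = random.Random(2024)          # fixed seed so the run is repeatable
exact = rng.sample(cases, k=200)   # exactly 200 distinct cases, drawn without replacement
exact.sort()                       # restore the original case order
print(len(exact), exact[:5])
```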