
Occasionally you may want to obtain a random sample of your data. For example, suppose you have 1000 cases of a variable and you would like a random sample of approximately 200 cases, or 1/5 of the data.
Let's look at a Statit function that helps. ranunfrm(a,b) returns a random number between "a" and "b" drawn from the uniform distribution. Either or both of "a" and "b" can be Statit variables; when "b" is a variable with 1000 cases, ranunfrm returns one random value per case. Using the 1000-case example, we could:
## Create tmp with 1000 cases, each with the value 100
assi tmp 1000*100
## Create a random value between 0 and 100 for each of the 1000 cases
let ran = ranunfrm(0,tmp)
## Select a range of those values that represents about 1/5 of the cases
## (the interval from 20 to 40 covers 20 of the 100 units)
select push (ran > 20 and ran <=40)
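For experimenting outside Statit, the three steps above can be mirrored in Python. This is a sketch under stated assumptions, not Statit code. The standard library's `random.uniform(a, b)` stands in for `ranunfrm(a, b)`, and the helper names `uniform_each` and `select_fifth` are hypothetical. Each uniform value on 0 to 100 lands in (20, 40] with probability 20/100 = 1/5, so the expected number of selected cases is 1000 × 1/5 = 200. The actual count varies from run to run, with a binomial standard deviation of √(1000 × 0.2 × 0.8) ≈ 12.6.

```python
import random

# Illustrative Python analogue of the Statit steps above; not Statit code.

def uniform_each(low, highs, rng):
    """Draw one uniform value in [low, high] per case, like ranunfrm(0, tmp)."""
    return [rng.uniform(low, high) for high in highs]

def select_fifth(values):
    """Return the indices of cases whose value lies in (20, 40]."""
    return [i for i, v in enumerate(values) if 20 < v <= 40]

# Mirrors "assi tmp 1000*100": 1000 cases, each with the value 100.
tmp = [100.0] * 1000

rng = random.Random(12345)  # fixed seed (an assumption) so the run is reproducible
ran = uniform_each(0.0, tmp, rng)
selected = select_fifth(ran)

print(f"selected {len(selected)} of {len(ran)} cases")
```

Changing the seed, or dropping it, gives a different selection each time. This matches the "approximately 200" wording: the size of the sample is random, not fixed.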
Now suppose our data are organized into samples, and we want to select whole samples at random rather than individual cases. Perhaps we have a variable called sample_id, where each sample_id has 3 measurements:
| Sample_ID | Measurement |
|---|---|
| 1 | 45 |
| 1 | 56 |
| 1 | 43 |
| 2 | 82 |
| 2 | 93 |
| 2 | 45 |
| 3 | 56 |
| 3 | 44 |
| 3 | 93 |
The following macro script randomly selects about twenty percent of the samples.
## Group measurement by sample_id and save the results
group measurement by sample_id /save
## How many samples do we have?
let $case = case(group.mean)
## Create a random value between 0 and 100 for each of these samples
assi ran = $case * 100
let ran = ranunfrm(0,ran)
## Identify the samples we don't want by assigning them the missing value
if ran > 20 then ran = #_sysmiss
## Do a match merge to integrate the two sets of data.
## See match-merge in the Edit -> Data Management menu.
match measurement with ran by sample_id Group.sample_id
## Now use global select, local select, or select permanent as you wish
stats measurement /select=(ran != #_sysmiss)
Now, create some data in the workspace and
work through the example above to see how it
works step by step.
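The sample-level script can be mirrored in Python as well, again as a rough sketch rather than Statit code. It makes three assumptions. The rows of the table above are held in a list of `(sample_id, measurement)` tuples. A dictionary keyed by sample_id stands in for the saved group results and the match merge. `None` plays the role of `#_sysmiss`. The function name `sample_by_group` is hypothetical. Because each sample receives one random value, its three measurements are kept or dropped together.

```python
import random

# Illustrative Python analogue of the Statit macro; not Statit code.

# The (sample_id, measurement) rows from the table above.
data = [
    (1, 45), (1, 56), (1, 43),
    (2, 82), (2, 93), (2, 45),
    (3, 56), (3, 44), (3, 93),
]

def sample_by_group(rows, percent, rng):
    """Keep all rows of each sample whose random value is <= percent."""
    # One entry per distinct sample_id (like "group ... /save").
    ids = sorted({sample_id for sample_id, _ in rows})
    # One uniform value in [0, 100] per sample, like ranunfrm(0, ran).
    ran = {sample_id: rng.uniform(0, 100) for sample_id in ids}
    # Mark unwanted samples as missing, like "if ran > 20 then ran = #_sysmiss".
    for sample_id in ids:
        if ran[sample_id] > percent:
            ran[sample_id] = None
    # Look up each row's sample value (the match merge), keep non-missing rows.
    return [(sid, m) for sid, m in rows if ran[sid] is not None]

rng = random.Random(7)  # fixed seed (an assumption) for reproducibility
kept = sample_by_group(data, 20, rng)
print(kept)

# A larger illustration: 500 hypothetical samples with 3 measurements each.
big = [(sid, rng.randint(0, 100)) for sid in range(500) for _ in range(3)]
kept_big = sample_by_group(big, 20, rng)
kept_ids = {sid for sid, _ in kept_big}
print(len(kept_ids), "of 500 samples kept")
```

With only three samples, the small example often keeps nothing, because each sample survives with probability 0.2. The larger run shows the roughly 20 percent rate, with about 100 of 500 samples kept on average.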
This is only one way to get a random sample. There are many others, and there are several other random number generators in Statit that you could use.