Introduction
There are many reasons you might need to generate sequential data, such as ID numbers, in a dataset. In this blog post, we’ll show you how to do so in R, using a worked example.
Subjects: An Example
Imagine that you’ve copied in height and weight data for 200 subjects, but the data don’t include subject numbers. We used the following code to simulate these data. In itself, it shows how to use R’s rnorm() function to generate normally distributed numbers with a specified mean and standard deviation:
set.seed(42)  # optional: makes the random numbers reproducible
height <- rnorm(200, mean = 70, sd = 3)
weight <- rnorm(200, mean = 170, sd = 15)
Understandably, you don’t want to create a subject variable and type in the numbers 1 to 200 by hand. Fortunately, R’s colon operator generates a sequence of consecutive integers for you. Type in the following code:
subj <- 1:200
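Typing 200 works, but it has to be updated by hand if the number of subjects changes. A minimal sketch of an alternative, assuming the height vector from above (regenerated here so the block runs on its own), derives the count from the data with base R’s seq_along() and seq_len(); subj2 is only an illustrative name:

```r
set.seed(42)  # reproducible simulated data
height <- rnorm(200, mean = 70, sd = 3)

# One subject number per height value: 1, 2, ..., length(height)
subj <- seq_along(height)

# Same result, giving seq_len() the count directly
subj2 <- seq_len(length(height))
```

One reason to prefer these over 1:length(height) is the empty case: for a zero-length vector, seq_along() returns an empty sequence, whereas 1:0 counts down and returns c(1, 0).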
Next, if you haven’t done so already, combine these three variables into what R calls a data frame, a table in which each variable becomes a column:
heightweight <- data.frame(height, weight, subj)
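If your height and weight columns already sit in a data frame, you can add the numbering afterwards. The sketch below regenerates the data so it runs on its own, uses nrow() so the numbers always match the row count, and then moves subj to the front, which is a matter of taste rather than a requirement:

```r
set.seed(42)  # reproducible simulated data
heightweight <- data.frame(
  height = rnorm(200, mean = 70, sd = 3),
  weight = rnorm(200, mean = 170, sd = 15)
)

# Add a sequential subject number, one per row
heightweight$subj <- seq_len(nrow(heightweight))

# Optional: reorder columns so the ID comes first
heightweight <- heightweight[, c("subj", "height", "weight")]
```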
If you want to see what these data look like, type:
print(heightweight)
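With 200 rows, print() fills the console. Base R’s head(), tail(), str(), and nrow() give a quicker look; here is a short sketch using the same simulated data:

```r
set.seed(42)  # reproducible simulated data
height <- rnorm(200, mean = 70, sd = 3)
weight <- rnorm(200, mean = 170, sd = 15)
subj <- 1:200
heightweight <- data.frame(height, weight, subj)

head(heightweight)     # first 6 rows
tail(heightweight, 3)  # last 3 rows
str(heightweight)      # column names, types, and a preview of values
nrow(heightweight)     # number of rows: 200
```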
Now suppose that, for whatever reason, your subject numbering doesn’t start at 1 and run to n, where n is the total number of subjects. Say your numbering starts at 10 instead. With 200 subjects, it then ends at 209, so you would code the subject variable this way (10:210 would produce 201 numbers, and data.frame() would refuse to combine them with 200 heights and weights):
subj <- 10:209
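Choosing the end point by hand is an easy place for an off-by-one error; 10:210, for example, holds 201 numbers. A minimal sketch that avoids the arithmetic, assuming 200 subjects, passes seq() a starting value and the number of values you want via length.out (n and subj_colon are illustrative names):

```r
n <- 200  # number of subjects

# Start at 10 and generate exactly n consecutive numbers
subj <- seq(from = 10, length.out = n)

range(subj)   # 10 209
length(subj)  # 200

# Colon-operator equivalent, with the end point computed rather than typed
subj_colon <- 10:(10 + n - 1)
```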
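What if, for an equally unusual reason, your numbering goes up in steps of 10, so that subjects are numbered 10, 20, 30, and so on? In that case, use the seq() function, which generates a sequence from a start value to an end value in increments set by its by argument. For 200 subjects, the last number is 2000:
subj <- seq(10, 2000, by = 10)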
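The same length.out approach works for stepped numbering: give seq() the start, the step size, and how many values you need, and R works out the end point for you. A sketch, again assuming 200 subjects (n and subj_alt are illustrative names):

```r
n <- 200  # number of subjects

# 10, 20, 30, ... with exactly n values; the last one is 10 * n = 2000
subj <- seq(from = 10, by = 10, length.out = n)

head(subj)     # 10 20 30 40 50 60
tail(subj, 1)  # 2000

# Equivalent: multiply a 1..n index by the step size
subj_alt <- 10 * seq_len(n)
```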
In future posts, we will explore how R can be used to create more varied and complex data.
BridgeText can help you with all of your statistical analysis needs.