Introduction
A dummy variable designates subgroups within your analysis, typically coded as 0 and 1. In this blog, we’ll show you how to create dummy variables from both continuous variables and binary string variables in R.
Load Data
Let’s enter some data into R to experiment on. You can copy and paste the following code into R:
subject <- 1:20
gender <- c(rep('male', 10), rep('female', 10))
books <- c(12, 15, 0, 23, 18, 10, 10, 9, 8, 10,
           25, 21, 13, 18, 3, 21, 22, 12, 12, 11)
df <- data.frame(subject, gender, books)
print(df)
Let’s treat these as data on the number of books read in 2022 by 20 subjects, 10 male and 10 female.
Create Dummy Variable from Binary String Variable
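If we want to run a statistical analysis such as an ordinary least squares regression on the number of books as a function of gender, R’s lm() will accept the string variable directly: it converts gender to a factor behind the scenes and picks the reference category alphabetically (here, female). That is convenient, but it leaves the 0/1 coding implicit. Try the following code and note that the coefficient is labeled gendermale: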
lmbooks <- lm(books ~ gender, data = df)
lmbooks
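For comparison, here is a minimal sketch of how R encodes a two-level categorical predictor on its own, and how the reference level can be set by hand. The column name gender.f is a hypothetical name introduced for illustration. The data are rebuilt inside the block so it runs by itself.

```r
# Same toy data as above, rebuilt so this block runs on its own
gender <- c(rep("male", 10), rep("female", 10))
books <- c(12, 15, 0, 23, 18, 10, 10, 9, 8, 10,
           25, 21, 13, 18, 3, 21, 22, 12, 12, 11)
df <- data.frame(gender, books)

# Default behaviour: the character column is treated as a factor whose
# levels are sorted alphabetically, so "female" becomes the reference level
auto_fit <- lm(books ~ gender, data = df)
names(coef(auto_fit))   # "(Intercept)" "gendermale"

# Making "male" the reference level instead
# (gender.f is a hypothetical column name used only for this sketch)
df$gender.f <- factor(df$gender, levels = c("male", "female"))

# model.matrix() shows the 0/1 column that a model formula would generate
design <- model.matrix(~ gender.f, data = df)
head(design)
```

The design matrix contains a column that is 0 for males and 1 for females, which is the same coding the hand-made dummy variable in the next step makes explicit.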
To control the coding ourselves, we can turn the string variable of gender into a dummy variable with male = 0, female = 1. The regression coefficient then reads directly as the female-minus-male difference.
Try the following code:
df$gender.dummy <- ifelse(df$gender == "female", 1, 0)
print(df)
As the printed output shows, the dataset now contains a new dummy variable, gender.dummy, in which females = 1 and males = 0.
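A quick way to confirm that the recoding did what was intended is to cross-tabulate the original labels against the new codes. The sketch below rebuilds the gender column so it runs on its own. The object name alt is hypothetical, and as.integer() on a logical comparison is shown only as one equivalent alternative to ifelse().

```r
# Rebuild the gender column and the dummy so this block is self-contained
gender <- c(rep("male", 10), rep("female", 10))
df <- data.frame(gender)
df$gender.dummy <- ifelse(df$gender == "female", 1, 0)

# Cross-tabulate labels against codes: every "female" should fall under 1
# and every "male" under 0, with no counts in the off-diagonal cells
check <- table(df$gender, df$gender.dummy)
print(check)

# Equivalent recoding: TRUE/FALSE converts to 1/0
# (alt is a hypothetical name used only for this comparison)
alt <- as.integer(df$gender == "female")
all(alt == df$gender.dummy)
```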
We could now run that regression:
lmbooks <- lm(books ~ gender.dummy, data = df)
summary(lmbooks)
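In the summary output, the intercept is the average number of books for the group coded 0 (men), and the coefficient on gender.dummy is the difference for the group coded 1 (women). Because we coded females as 1 and males as 0, we see that women in this sample read, on average, 4.3 more books than men, but this effect of gender is not statistically significant, p = .154.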
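To see where the 4.3 comes from, the group means can be computed directly. With a single 0/1 predictor, the slope is the difference between the two group means, and the test on the slope should match a pooled-variance two-sample t-test. This is a minimal sketch with the data rebuilt so it runs on its own. The names group_means, diff_means, p_lm and p_t are hypothetical.

```r
# Rebuild the data and the dummy so this block is self-contained
gender <- c(rep("male", 10), rep("female", 10))
books <- c(12, 15, 0, 23, 18, 10, 10, 9, 8, 10,
           25, 21, 13, 18, 3, 21, 22, 12, 12, 11)
df <- data.frame(gender, books)
df$gender.dummy <- ifelse(df$gender == "female", 1, 0)

# Mean number of books per group; the names "0" and "1" are the dummy codes
group_means <- tapply(df$books, df$gender.dummy, mean)
group_means   # "0" (men) = 11.5, "1" (women) = 15.8

# Intercept = mean of the group coded 0, slope = difference between means
fit <- lm(books ~ gender.dummy, data = df)
coef(fit)
diff_means <- unname(group_means["1"] - group_means["0"])

# The slope's p-value agrees with an equal-variance two-sample t-test
p_lm <- summary(fit)$coefficients["gender.dummy", "Pr(>|t|)"]
p_t <- t.test(books ~ gender.dummy, data = df, var.equal = TRUE)$p.value
c(p_lm, p_t)
```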
Create Dummy Variable from Continuous Variable
Still working with the same data, let’s say we want to create a new dummy variable from the continuous variable of books. We want to code 0 to 10 books as 0 and 11 or more books as 1. Because books is a whole-number count, the condition books > 10 captures exactly that split. We can use the following R code to create the dummy variable:
df$books.dummy <- ifelse(df$books > 10, 1, 0)
print(df)
The printed dataset now includes a new dummy variable, books.dummy, which is 1 for subjects who read more than 10 books and 0 otherwise, based on your cutoff value of 10 books.
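When a dummy is built from a cutoff, it helps to check which side the boundary value lands on. The sketch below rebuilds the books vector so it runs on its own and confirms that a subject with exactly 10 books is coded 0. The object names counts and ranges are hypothetical.

```r
# Rebuild the books vector and the dummy so this block is self-contained
books <- c(12, 15, 0, 23, 18, 10, 10, 9, 8, 10,
           25, 21, 13, 18, 3, 21, 22, 12, 12, 11)
books.dummy <- ifelse(books > 10, 1, 0)

# How many subjects fall on each side of the cutoff
counts <- table(books.dummy)
print(counts)

# Smallest and largest number of books within each dummy value;
# the maximum for code 0 should be 10 and the minimum for code 1 should be 11
ranges <- tapply(books, books.dummy, range)
print(ranges)
```

If the intended rule were "10 or more books", the comparison would need to be books >= 10 instead, so this boundary check is worth running whenever the cutoff changes.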