Introduction
Continuously measured variables sometimes need to be converted into factor variables for statistical analysis, for example to serve as the grouping variable in an independent samples t test or an ANOVA. In this post, we’ll show you how to use R to create factor variables from continuous variables.
Load Data
You can load the cereals dataset in R as follows:
cereal_data <- read.csv("https://r-data.pmagunia.com/system/files/datasets/dataset-51500.csv")
Check the Dataset
head(cereal_data)
The output shows the first six rows of the dataset, which contains data on the nutritional values of cereals.
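Before choosing cut points, it can help to look at the type and range of the column you plan to convert. The following is a minimal sketch that uses a small made-up data frame (toy_cereals, a hypothetical stand-in with invented values) rather than the downloaded data, and assumes the calorie column is named Calories, as in the code used later in this post:

```r
# Hypothetical stand-in for cereal_data; the values are made up for illustration
toy_cereals <- data.frame(
  Name = c("A", "B", "C", "D", "E"),
  Calories = c(50, 70, 110, 120, 150)
)

# Structure: confirms that Calories is numeric and not yet a factor
str(toy_cereals)

# Five-number summary plus the mean, useful for choosing sensible breakpoints
summary(toy_cereals$Calories)

# Minimum and maximum: the breaks passed to cut() should span this range
calorie_range <- range(toy_cereals$Calories)
calorie_range
```

Running the same three calls on cereal_data$Calories should show whether breaks such as 0 and 160 actually cover every observed value.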
Create a Factor Variable
Let’s say we want to convert the continuous calories variable into a factor variable with three levels: low, moderate, and high in calories.
Try the following code:
cereal_data$calorie.levels <- cut(cereal_data$Calories, breaks = c(0,60,120,160), labels = c('Low','Moderate','High'))
We created a new variable, calorie.levels, designed to serve as the factor variable. Confirm that this new variable is indeed a factor:
is.factor(cereal_data$calorie.levels)
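To make the new variable part of the existing cereal_data data frame, we prefaced calorie.levels with cereal_data$. Next, we used the cut function to tell R to mark calories above 0 and up to 60 as low, above 60 and up to 120 as moderate, and above 120 and up to 160 as high. By default, each interval includes its upper bound but not its lower bound, so a cereal with exactly 60 calories is marked as low, and any value outside the breaks (0 or less, or above 160) becomes NA. We used the labels argument to specify what to call the levels of the new factor variable. Note what the data frame looks like now: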
head(cereal_data)
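How cut treats values that fall exactly on a breakpoint, or outside the breaks altogether, can be checked on a small vector. The sketch below uses made-up calorie values (not taken from the dataset) together with the same breaks and labels as above; include.lowest and an upper break of Inf are standard options of cut, shown here only as possible adjustments:

```r
# Made-up calorie values chosen to sit on and around the breakpoints
calories <- c(0, 50, 60, 61, 120, 160, 170)

# Default behaviour: the intervals are (0,60], (60,120], (120,160]
default_levels <- cut(calories, breaks = c(0, 60, 120, 160),
                      labels = c("Low", "Moderate", "High"))

# 0 and 170 fall outside every interval and become NA;
# 60 is Low and 120 is Moderate because intervals are closed on the right
table(default_levels, useNA = "ifany")

# include.lowest = TRUE closes the first interval on the left: [0,60]
with_lowest <- cut(calories, breaks = c(0, 60, 120, 160),
                   labels = c("Low", "Moderate", "High"),
                   include.lowest = TRUE)

# An upper break of Inf places every value above 120 in the High level
open_top <- cut(calories, breaks = c(0, 60, 120, Inf),
                labels = c("Low", "Moderate", "High"),
                include.lowest = TRUE)
```

Counting missing values with table(..., useNA = "ifany") after any call to cut is a quick way to spot observations that the breaks failed to cover.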
We now have a factor variable for calorie levels in the data frame. Because this factor variable has three levels (low, moderate, and high), it could be used as the predictor variable for an ANOVA. If the factor variable had only two levels, it could be used as the grouping variable in an independent samples t test. The code above created a factor variable with three levels. You could also create a two-level factor variable using the following code:
cereal_data$calorie.levels.dual <- cut(cereal_data$Calories, breaks = c(0,80,160), labels = c('Low','High'))
head(cereal_data)
With that option, the data frame gains calorie.levels.dual as an additional column, a factor with the two levels low and high.
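As a rough sketch of how these factor variables might feed into the tests mentioned earlier, the example below builds both factors on simulated data and passes them to aov and t.test. The data frame toy and the outcome column Sugars are hypothetical and generated only for illustration; with the real data you would substitute whichever outcome column you want to compare across calorie groups:

```r
set.seed(1)  # make the simulated outcome reproducible

# Simulated stand-in for cereal_data; Sugars is a hypothetical outcome column
toy <- data.frame(
  Calories = c(40, 55, 60, 70, 75, 90, 100, 110,
               115, 125, 130, 140, 150, 155, 160)
)
toy$Sugars <- 2 + 0.05 * toy$Calories + rnorm(nrow(toy))

# Same breaks and labels as in the post
toy$calorie.levels <- cut(toy$Calories, breaks = c(0, 60, 120, 160),
                          labels = c("Low", "Moderate", "High"))
toy$calorie.levels.dual <- cut(toy$Calories, breaks = c(0, 80, 160),
                               labels = c("Low", "High"))

# One-way ANOVA: three-level factor as the predictor
anova_fit <- aov(Sugars ~ calorie.levels, data = toy)
summary(anova_fit)

# Independent samples t test: two-level factor as the grouping variable
# (t.test uses the Welch correction unless var.equal = TRUE is set)
t_result <- t.test(Sugars ~ calorie.levels.dual, data = toy)
t_result
```

The formula interface of t.test requires the grouping factor to have exactly two levels, which is why the two-level version is used for that call.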