Introduction
In statistical analysis, you will often work with categorical variables whose values are best described by qualitative labels, such as ‘Single,’ ‘Married,’ or ‘Divorced’ for marital status. However, you also want these categories stored as numbers rather than as a string variable, because commands such as anova cannot analyze string variables. In this blog, we’ll show you how to label values in Stata so that you see the qualitative labels while the dataset retains the underlying numbers.
Example
Let’s say you have data on the income levels of 150 people who completed their undergraduate studies with one of three majors: English, philosophy, or business. First, let’s create the income data; then we’ll show you why storing the majors as strings gets in the way of statistical analysis.
clear
set seed 12345
set obs 150
drawnorm a, mean(40000) sd(5000)
drawnorm b, mean(70000) sd(10000)
replace a = . in 101/150
replace b = . in 1/100
egen income = rowmax(a b)
drop a b
gen major = "English" in 1/50
replace major = "philosophy" in 51/100
replace major = "business" in 101/150
order major income
To inspect the first 20 observations, open them in the Data Browser:
browse in 1/20
In the Data Browser, Stata displays string variables in red, which tells you that major is stored as text. Now try running an analysis of variance (ANOVA) on the data:
anova income major
Stata returns an error instead of a result, because anova cannot use a string variable as a factor.
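If you’d rather not rely on the Data Browser’s colors, you can check a variable’s storage type from the command line. The sketch below builds a tiny hypothetical dataset (three made-up observations, not the simulated data above) so that it runs on its own. confirm exits with an error when a variable is not of the stated type, and capture keeps the do-file running so you can inspect the return code in _rc.

```stata
* Tiny stand-alone dataset (hypothetical values) with a string major
clear
input str10 major income
"English"    41000
"philosophy" 39500
"business"   72000
end

* describe reports the storage type; a str# type means the variable is a string
describe major

* confirm succeeds silently when major is a string variable...
confirm string variable major

* ...and fails when we claim it is numeric; capture stores the error code in _rc
capture confirm numeric variable major
display "return code: " _rc
```

A nonzero return code from the last check is the command-line equivalent of the red font: anova would reject this variable.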
Creating Labels
If you’ve already created a string variable that you need to turn into a numeric variable, use Stata’s encode command:
encode major, gen(major2)
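encode gives each distinct string a numeric code (1, 2, 3) and attaches the original text to those codes as a value label, so major2 displays the same names as major. Note that you keep your original string variable, but you now also have major2, a numeric version that you can use in an ANOVA and other forms of statistical analysis. Try: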
anova income major2
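To see exactly which number encode assigned to each major, you can list the value label it created. This is a minimal sketch on a hypothetical three-observation dataset; label list prints the code-to-text mapping, and the nolabel option of list shows the stored numbers instead of the labels.

```stata
* Hypothetical three-observation dataset with a string major
clear
input str10 major
"English"
"philosophy"
"business"
end

* encode builds a numeric variable plus a value label of the same name (major2)
encode major, generate(major2)

* Show the code-to-text mapping that encode chose
label list major2

* Compare the displayed labels with the underlying numeric codes
list major major2
list major major2, nolabel
```

Because encode picks the codes itself, check this mapping before interpreting any coefficient or contrast that depends on which category is coded 1.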
If you haven’t created the string variable yet, you can instead build a numeric variable directly and then attach labels for the three majors:
gen major3 = 1
replace major3 = 2 in 51/100
replace major3 = 3 in 101/150
label define major3 1 "English" 2 "Philosophy" 3 "Business"
label values major3 major3
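A quick way to confirm that the labels are only a display layer over numbers is to tabulate the variable twice, once with labels and once without. The sketch below uses a hypothetical three-observation dataset so that it runs on its own; the label definition is the same one used in the example.

```stata
* Hypothetical three-observation numeric variable
clear
input major3
1
2
3
end

* Same label definition as in the example above
label define major3 1 "English" 2 "Philosophy" 3 "Business"
label values major3 major3

* Frequencies shown with the value labels...
tabulate major3

* ...and the same frequencies shown as the stored numbers
tabulate major3, nolabel
```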
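Either way, the majors are displayed by name while the dataset stores numbers. The two variables are not identical, though: encode assigns the codes in sorted order of the strings and keeps their original capitalization, whereas label define uses exactly the codes and text you supply. The choice comes down to your starting point: label define lets you create the labels from scratch, while encode converts a string variable you already have.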
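If you start from a string variable but want to control the numeric codes yourself, encode can reuse a value label you define in advance through its label() option. The sketch below assumes a hypothetical label name, majorlbl, and a three-observation toy dataset. The label text must match the strings exactly, including capitalization; the noextend option makes encode stop with an error instead of silently adding codes for any string that is missing from the label.

```stata
* Hypothetical three-observation dataset with a string major
clear
input str10 major
"English"
"philosophy"
"business"
end

* Define the codes first (majorlbl is a hypothetical name);
* each label must match the string exactly, case included
label define majorlbl 1 "English" 2 "philosophy" 3 "business"

* encode reuses majorlbl, so the codes follow it rather than sort order;
* noextend refuses to add new codes for unmatched strings
encode major, generate(major2) label(majorlbl) noextend

* Show the stored codes next to the original strings
list major major2, nolabel
```

This combines the two approaches: you keep the convenience of converting an existing string variable while fixing which category receives which number.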
Now let’s clean up the dataset, keeping only the hand-labeled numeric variable under the original name, major:
drop major major2
rename major3 major
order major income
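With major now stored as a labeled numeric variable, the ANOVA that failed on the string version should run. The sketch below rebuilds a comparable dataset from scratch so it is self-contained; note that it swaps in rnormal() for the drawnorm approach used earlier, and the label name majorlbl and the seed are hypothetical choices, so the simulated incomes will not match those above.

```stata
* Self-contained rebuild of the example (rnormal() used instead of drawnorm)
clear
set seed 12345
set obs 150

* Numeric major with value labels (majorlbl is a hypothetical label name)
generate major = 1
replace major = 2 in 51/100
replace major = 3 in 101/150
label define majorlbl 1 "English" 2 "Philosophy" 3 "Business"
label values major majorlbl

* English and philosophy: mean 40,000; business: mean 70,000
generate income = rnormal(40000, 5000) if major < 3
replace income = rnormal(70000, 10000) if major == 3

* anova treats major as categorical and reports results by label
anova income major
```

Because anova treats major as a factor, the model uses two degrees of freedom for the three majors, and the output identifies each category by its label rather than by its code.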
BridgeText can help you with all of your statistical analysis needs.