Introduction
In the language of statistical software, string variables are variables whose values are stored as text rather than as numbers. When working on an academic essay, research paper, or thesis, you might have string data that cause difficulty when you attempt to run a statistical analysis. In this blog post, we’ll show you how to use Stata to overcome this difficulty.
Example Data and Code
Here’s an easy example. You have data from 30 subjects, 15 men and 15 women, on the number of dates each subject has gone on in the past 12 months. Perhaps your goal is to conduct an independent samples t-test to test whether, as you strongly suspect, women and men differ in how many dates they tend to go on.
However, in order to conduct an independent samples t-test, you need your predictor variable (here, gender) to be in numeric rather than string format. Unfortunately, your gender values are in string format, because you wrote Male or Female for each subject. Here’s an example of what that dataset might look like in Stata:
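* Start from an empty dataset with 30 observations
clear
set obs 30
* Store gender as text: subjects 1-15 are Male, subjects 16-30 are Female
gen gender = "Male"
replace gender = "Female" in 16/30
* Draw a random value between 0 and 20 and round it to a whole number of dates
gen dates2 = runiform(0,20)
gen dates = round(dates2)
drop dates2
label variable dates "Number of Dates"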
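Before recoding, it can be worth confirming that gender really is stored as a string. The following is a minimal sketch rather than part of the original walkthrough. It rebuilds the same mock dataset so that it runs on its own, and it generates dates in a single step, which should be equivalent to the three-line version above.

```stata
* Rebuild the mock dataset so this check can be run on its own
clear
set obs 30
gen gender = "Male"
replace gender = "Female" in 16/30
gen dates = round(runiform(0,20))
label variable dates "Number of Dates"

* Show the storage type; string variables are listed with a str# type
describe gender

* Count the subjects in each group; the values are still plain text here
tabulate gender

* Stops with an error if gender is not a string variable
confirm string variable gender
```

If confirm returns without an error, gender is a string variable and is a candidate for the recoding described next.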
Stata makes it easy to recode a string variable like gender into a numeric variable that keeps the original text as value labels. Assuming you’ve already used the code above to create the mock dataset, type the following commands into the Stata Command window, pressing Enter after each one, or run them together from a do-file:
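* Create a numeric version of gender; the original text becomes its value labels
encode gender, gen(newgen)
* Replace the string variable with the numeric one under the original name
drop gender
rename newgen gender
label variable gender "Gender"
* Independent samples t-test of dates by gender
ttest dates, by(gender)
* ciplot is a user-written command; if it is missing, install it first with: ssc install ciplot
ciplot dates, by(gender)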
Here, the encode command turns your string variable, gender, into a new numerically coded variable, newgen. By default, encode assigns integer codes in alphabetical order (Female = 1, Male = 2) and attaches the original text as value labels. After that transformation, you can drop the string variable, rename the new variable, label it, run your independent samples t-test, and generate your 95% confidence interval plot.
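If you would rather choose the numeric codes yourself, encode can reuse a value label that you define beforehand. The sketch below is one way to do that and is not part of the original walkthrough. The names gender_default, gender_num, and gender_lbl, and the 0/1 coding, are assumptions chosen for illustration.

```stata
* Rebuild the string variable so this sketch runs on its own
clear
set obs 30
gen gender = "Male"
replace gender = "Female" in 16/30

* Default behavior: codes follow alphabetical order (Female = 1, Male = 2)
encode gender, gen(gender_default)
label list gender_default

* Hypothetical alternative: define the mapping first, then pass it to encode
label define gender_lbl 0 "Male" 1 "Female"
encode gender, gen(gender_num) label(gender_lbl)

* Compare the text values with the underlying numeric codes
tabulate gender gender_num, nolabel
```

Defining the label first keeps the coding explicit, which can matter later if the variable is used as a 0/1 indicator in a regression.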
Model Results
As it turned out, there was no statistically significant difference between the mean number of dates for women (M = 10.87, SD = 6.06) and for men (M = 8.93, SD = 6.15), t(28) = 0.87, p = .39. Because the mock data are randomly generated without a fixed seed, your own numbers will differ.
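The figures in a write-up like this one can also be read from the results that ttest stores after it runs. The sketch below assumes an arbitrary seed of 12345, which is not from the original post, so its output will not match the values reported here. The variable name gender_num is likewise chosen only for illustration.

```stata
* Arbitrary seed (an assumption) so that repeated runs give the same data
clear
set seed 12345
set obs 30
gen gender = "Male"
replace gender = "Female" in 16/30
gen dates = round(runiform(0,20))

* Recode the string variable and run the t-test
encode gender, gen(gender_num)
ttest dates, by(gender_num)

* ttest leaves its results in r(); group 1 is Female, group 2 is Male
display "t(" r(df_t) ") = " %5.2f r(t) ", p = " %5.3f r(p)
display "Female: M = " %5.2f r(mu_1) ", SD = " %5.2f r(sd_1)
display "Male:   M = " %5.2f r(mu_2) ", SD = " %5.2f r(sd_2)
```

Setting a seed before generating random data is what makes a reported result reproducible by a reader who reruns the same commands.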
To get these results, you had to recode your string variable, gender, into a numeric version of the same variable. Note how easily Stata achieved this.
BridgeText can help you with all of your statistical analysis needs.