Introduction
Removing variables you don’t need can simplify data analysis. In this blog entry, we’ll show you how to use Stata’s keep command to retain only the variables of interest to you in a given dataset.
Create Data
First, we’ll create mock data: 30 subjects, each with four question scores from 1 to 7 and a total score. Then we’ll show you how to use the keep command.
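* Start from an empty dataset (this discards any data currently in memory)
clear
set obs 30
* Subject identifier: 1 through 30
gen subj = _n
label variable subj "Subject #"
* Draw one continuous value between 1 and 7 per question
gen q1_a = runiform(1,7)
gen q2_a = runiform(1,7)
gen q3_a = runiform(1,7)
gen q4_a = runiform(1,7)
* Round each draw to a whole-number score from 1 to 7
gen q1 = round(q1_a)
gen q2 = round(q2_a)
gen q3 = round(q3_a)
gen q4 = round(q4_a)
* Remove the unrounded helper variables
drop q1_a q2_a q3_a q4_a
* Sum the four question scores for each subject
egen total = rowtotal(q1 q2 q3 q4)
* Display all 30 observations
list in 1/30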
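One detail of the mock data is worth knowing: rounding a uniform draw on the interval from 1 to 7 makes the endpoint scores 1 and 7 about half as likely as the scores 2 through 6, because each endpoint collects draws from only half a unit of the interval. If you would rather have every score equally likely, the following is a minimal sketch of an alternative. It assumes a version of Stata that provides the runiformint() function, and the seed value is arbitrary, chosen only so that the results can be reproduced.

```stata
* Alternative mock data: each score 1-7 is equally likely
* (assumes runiformint() is available in your version of Stata)
clear
* Arbitrary seed, used only to make the random draws reproducible
set seed 12345
set obs 30
gen subj = _n
label variable subj "Subject #"
* Generate q1 through q4 in a loop instead of four separate commands
forvalues i = 1/4 {
    gen q`i' = runiformint(1, 7)
}
egen total = rowtotal(q1 q2 q3 q4)
list in 1/30
```

Either version produces the same set of variables (subj, q1 through q4, and total), so the keep example works the same way with both.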
Keep Variables of Interest
Let’s say that you only want the subject number and the total score. Try:
keep subj total
list in 1/30
You are now left with only the subject number and the total score. You might wonder why you should use the keep command instead of using the drop command to delete variables. Both commands can produce the same result, but keep this in mind: if you have a dataset with, say, 200 variables, and you only want to keep two of them, naming those two variables in a keep command is much quicker and less error-prone than listing the other 198 variables in a drop command.
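To see that keep and drop reach the same result, here is a short sketch that applies each one to the full mock dataset. It assumes that subj, q1 through q4, and total are all in memory, so re-run the data creation step first if you have already run keep. The preserve and restore commands are used only so that both approaches can start from the same full dataset.

```stata
* Approach 1: name the variables to retain
preserve
keep subj total
describe, short
restore

* Approach 2: name the variables to remove
preserve
drop q1 q2 q3 q4
describe, short
restore
```

In both cases, describe, short should report two variables. With only four variables to remove, the two approaches take about the same effort; the advantage of keep grows with the number of variables you want to discard.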