Introduction
Statistical analysis often requires you to generate descriptive and inferential statistics (and even graphics) separately for different groups. When you need to do so in Stata, the by command is indispensable. In this blog post, you’ll learn how to apply Stata’s by command for various analytical and graphical needs.
Load Data
Let’s use Stata’s built-in auto dataset. The following code loads and then describes the data:
sysuse auto
describe
You can see that the dataset has 12 variables.
Summary Statistics
Let’s say that you’d like a summary of price by car origin, which the auto dataset stores in the variable foreign. Try the following code:
by foreign, sort: sum price, detail
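Stata prints a separate detailed summary of price (percentiles, mean, standard deviation, skewness, kurtosis, and more) for each group, first for domestic cars and then for foreign cars.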
If you wanted summary statistics on more than one variable using the by command, you would simply list all of the variables after the sum command. For example, if you wanted both price and mpg summarized by car origin, you could use the following code:
by foreign, sort: sum price mpg, detail
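The sort option matters here because by expects the data to be sorted by the grouping variable. As a sketch of two equivalent ways to write the command above, you could either sort the data yourself first or use bysort, which is Stata’s shorthand for by with the sort option:
* Option 1: sort first, then use by without the sort option
sort foreign
by foreign: sum price mpg, detail
* Option 2: bysort sorts and groups in one step
bysort foreign: sum price mpg, detail
Both options should print the same output as the original command, so the choice is mostly a matter of taste.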
Inferential Statistics
Let’s say you wanted to test the correlation between mpg and weight, but you wanted to do so separately by car origin. Try the following code:
by foreign, sort: pwcorr mpg weight, sig
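Stata prints one correlation table per group, each showing the correlation coefficient with its significance level beneath it.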
As you can see, the correlation coefficients are different for foreign vs. domestic cars.
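The by command is not limited to pwcorr. As a sketch, assuming you also wanted a simple regression of mpg on weight within each group (this model is only an illustration and not part of the analysis above), you could write:
by foreign, sort: regress mpg weight
Stata would then report one regression table for domestic cars and another for foreign cars.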
Graphics
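For most graphics, adding the by() option at the end of the command, after the comma, generates a separate panel for each group. Note that this is an option of the graph command rather than the by prefix used in the earlier examples. For example, if you wanted separate histograms for price based on car origin, you could try: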
hist price, by(foreign)
And you would get a single graph with one histogram panel for each car origin.
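The by() option works the same way with other graph types. As a quick sketch (these extra graphs are illustrations, not part of the example above), the first line below draws a scatterplot of mpg against weight for each car origin, and the second adds the total suboption so that a panel for all cars combined appears alongside the group panels:
scatter mpg weight, by(foreign)
hist price, by(foreign, total)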