Introduction
Earlier, we showed you what correlation looks like on scatter plots and described how the Pearson correlation coefficient, r, can vary from -1 to 1. In this post, we’ll provide some details on how to enter data, get the correlation coefficient, and get the p (significance) value for the correlation coefficient in Stata.
What You’ll Need
Any flavor of Stata.
Entering Data
You can enter data into Stata by typing commands into the Command window, or you can open the Data Editor and enter data as you would in Microsoft Excel. The easiest approaches are to type the data into the Data Editor or to import it from Excel.
Here, we’ll show you how to enter data using the command prompt.
Let’s say you have data on the heights (in inches) and weights (in pounds) of 15 people. The heights, in sequential order of your 15 subjects, are as follows: 67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60. The weights, in the same sequential order of the 15 subjects, are as follows: 150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130. Try typing these values into the data editor. Alternatively, paste this code into the command prompt to load the data:
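* Start from an empty dataset (this discards any data currently in memory)
clear
set obs 15
* Create both variables as missing, then fill them in row by row
gen height = .
gen weight = .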
replace height = 67 in 1
replace height = 72 in 2
replace height = 75 in 3
replace height = 80 in 4
replace height = 60 in 5
replace height = 65 in 6
replace height = 68 in 7
replace height = 69 in 8
replace height = 69 in 9
replace height = 70 in 10
replace height = 70 in 11
replace height = 80 in 12
replace height = 76 in 13
replace height = 60 in 14
replace height = 60 in 15
replace weight = 150 in 1
replace weight = 240 in 2
replace weight = 270 in 3
replace weight = 300 in 4
replace weight = 160 in 5
replace weight = 180 in 6
replace weight = 170 in 7
replace weight = 175 in 8
replace weight = 175 in 9
replace weight = 190 in 10
replace weight = 190 in 11
replace weight = 260 in 12
replace weight = 240 in 13
replace weight = 140 in 14
replace weight = 130 in 15
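The same 15 observations can also be loaded more compactly. The following is a minimal sketch using Stata's input command, which reads one observation per line until it reaches end. It begins with clear, so it assumes you are willing to discard whatever data is currently in memory.

```stata
* Alternative to the replace statements above: one row per subject
clear
input height weight
67 150
72 240
75 270
80 300
60 160
65 180
68 170
69 175
69 175
70 190
70 190
80 260
76 240
60 140
60 130
end

* Check that all 15 rows were read in the intended order
list height weight
```

Because each height sits next to its weight, a mistyped or misaligned value is easier to spot than in a long run of replace statements.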
Getting the r Value
Having entered these variables into Stata, you can use the following code to generate your correlation coefficient and its accompanying p value:
pwcorr height weight, sig
The output shows r = 0.9156 with p < .0001 (Stata displays very small p values as 0.0000). These two variables are positively and significantly correlated. Remember, you can square your r value to get the coefficient of determination: (0.9156)^2 is approximately 0.8383. In other words, in your dataset, approximately 83.83% of the variation in weight is explained by variation in height.
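If you would like Stata to do the squaring for you, here is a brief sketch. It assumes the height and weight variables created above and uses correlate, which leaves the coefficient in r(rho) and the sample size in r(N). The scalar names rho and n are chosen only for illustration, and the p value line applies the standard t-test formula for a correlation coefficient.

```stata
* correlate stores the coefficient in r(rho) and the sample size in r(N)
correlate height weight

* Copy the stored results into scalars (names are illustrative)
scalar rho = r(rho)
scalar n = r(N)

* Coefficient of determination; should be about 0.8383 for this dataset
display "r-squared = " %6.4f rho^2

* Two-sided p value from the t distribution with n - 2 degrees of freedom
display "p value   = " 2*ttail(n - 2, abs(rho)*sqrt(n - 2)/sqrt(1 - rho^2))
```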
A Simple Scatterplot
To create a scatterplot with height on the x axis and weight on the y axis, enter the following code into Stata:
scatter weight height
The variable that comes second after you type scatter (in this case, height) will be on the x axis; the variable that comes first (weight) will be on the y axis. You can adjust many of the properties of the scatterplot by opening the Graph Editor within Stata.
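Some common adjustments can also be made directly in the command instead of through the graph editor. The sketch below assumes the same height and weight variables. The axis titles and the fitted line are illustrative additions, not part of the original example.

```stata
* Scatterplot with labeled axes
scatter weight height, xtitle("Height (inches)") ytitle("Weight (pounds)")

* Same points with a linear fit line overlaid
twoway (scatter weight height) (lfit weight height), xtitle("Height (inches)") ytitle("Weight (pounds)") legend(off)
```

Each command is kept on a single line so that it can be pasted into the command prompt as is.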
Conclusion
Entering data manually into the Stata command prompt is risky. You can mistype a number or lose your place in the sequence. For most practical purposes, therefore, you will want to enter data into Stata through Stata’s own data editor or by importing a Microsoft Excel spreadsheet.
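As an illustration of the import route, here is a minimal sketch. The file name heights_weights.xlsx is hypothetical, and the sketch assumes the first row of the sheet holds the column names height and weight. The import excel command is available in Stata 12 and later.

```stata
* Hypothetical file name; replace it with the path to your own workbook
import excel using "heights_weights.xlsx", firstrow clear

* Confirm the data arrived as expected before analyzing it
describe
list height weight in 1/5

* Then run the same correlation as before
pwcorr height weight, sig
```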