Introduction
Earlier, we showed you what correlation looks like on scatter plots and described how the Pearson correlation coefficient, r, can vary from -1 to 1. In this post, we’ll provide some details on how to manually enter data, get the correlation coefficient, and get the p (significance) value for the correlation coefficient in R.
What You’ll Need
Entering Data Manually
In the simplest scenario, you can enter data into R manually, using the console pane at the bottom left of your RStudio window.
You can start typing where the cursor is.
Let’s say you have data on the heights (in inches) and weights (in pounds) of 15 people. The heights, in sequential order of your 15 subjects, are as follows: 67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60. The weights, in the same sequential order of the 15 subjects, are as follows: 150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130. R treats each of these variables as a vector, and you can enter the following code into your RStudio console to load the data:
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)
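Because one skipped or doubled value would misalign every pair after it, it can be worth confirming that both vectors hold 15 values before going further. The following is a minimal sketch that re-enters the same vectors so it runs on its own; the data frame name `people` is only an illustrative choice, not something the rest of this post depends on.

```r
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)

# Each vector should contain one value per subject (15 here)
length(height)
length(weight)

# Stop with an error if the two vectors differ in length
stopifnot(length(height) == length(weight))

# Optional: view the pairs side by side to spot typos
# ('people' is a hypothetical name chosen for this example)
people <- data.frame(height, weight)
head(people)
```

If `stopifnot()` raises an error, recheck the vectors against your source data before running the correlation.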
Getting the R Value
Having entered these variables into R, you can use the following code to generate your correlation coefficient and its accompanying p value:
result <- cor.test(height, weight, method = "pearson")
result
The output shows that r = .9156, p < .0001. These two variables are positively and significantly correlated. Remember, you can square your r value to get the coefficient of determination: (0.9156)^2 = 0.8383. In other words, in your dataset, approximately 83.83% of the variation in weight is explained by variation in height.
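If you only want the numbers rather than the full printout, the object returned by cor.test stores them as named components. The sketch below re-enters the data so it can run by itself; the variable names `r`, `p`, and `r_squared` are illustrative choices.

```r
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)

result <- cor.test(height, weight, method = "pearson")

r <- unname(result$estimate)  # Pearson correlation coefficient
p <- result$p.value           # two-sided p value
r_squared <- r^2              # coefficient of determination

round(r, 4)          # 0.9156
round(r_squared, 4)  # 0.8383
p < 0.0001           # TRUE
```

Pulling the values out this way avoids copying them by hand from the console, which is where rounding and transcription slips tend to creep in.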
A Simple Scatterplot
You can use the code below to generate a scatterplot of these two variables in relation to each other.
plot(height, weight, main = "Correlation Example",
     xlab = "Height", ylab = "Weight", pch = 19)
The variable that comes first after you type plot (in this case, height) will be on the x axis. You can title your scatterplot and axes within the code. Note that pch = 19 is a solid circle, which is what shows up in the scatterplot (check your plots window in RStudio).
Here’s a handy list of other pch symbols in R.
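If you would like to see the symbols on your own screen, a short sketch like the following draws base R's plotting symbols 0 through 25, each labelled with its pch number. The grid layout (seven symbols per row) and the grey fill are arbitrary choices made for this example.

```r
# Draw plotting symbols 0 through 25 in a grid, each labelled with its pch number
codes <- 0:25
x <- codes %% 7        # column position (0 to 6)
y <- 3 - codes %/% 7   # row position, top row first

plot(x, y, pch = codes, cex = 2, bg = "grey",
     xlim = c(-0.5, 6.5), ylim = c(-0.5, 3.5),
     axes = FALSE, xlab = "", ylab = "",
     main = "pch symbols 0 to 25")

# Print each pch number just below its symbol
text(x, y - 0.35, labels = codes, cex = 0.8)
```

The `bg` fill only affects symbols 21 through 25, which is why the others appear unfilled or solid regardless of that setting.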
Conclusion
Entering data manually into R is risky. You can mistype a number or lose your place in the sequence, which misaligns every pair that follows. For most practical purposes, therefore, you will want to enter data into R through a Microsoft Excel spreadsheet, which has several advantages. Check out our blog entry on carrying out correlation in R using a spreadsheet to load your data.
BridgeText can help you with all of your statistical analysis needs.