Introduction
Ordinary least squares (OLS) regression, the most common method of fitting a linear regression, is a widely used statistical procedure. Many academic papers, research essays, and theses include OLS regression. In this blog entry, we’ll show you how to use R to generate, interpret, and illustrate basic OLS findings. In another blog entry, we’ve included more advanced concepts related to OLS regression in R, including residuals testing, leverage, multicollinearity, and other diagnostics.
What You’ll Need
Entering Data Manually
In the simplest scenario, you can enter data into R manually, using the console box at the bottom left of your RStudio window:
You can start typing where the cursor is.
Let’s say you have data on the heights (in inches) and weights (in pounds) of 15 people. The heights, in sequential order of your 15 subjects, are as follows: 67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60. The weights, in the same sequential order of the 15 subjects, are as follows: 150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130. R treats each of these variables as a vector, and you can enter the following code into your RStudio console to create the two vectors:
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)
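Because the data were typed by hand, it is worth a quick check that both vectors have one value per subject before going further. The following is a minimal sketch of such a check; the name n_subjects is our own illustrative choice.

```r
# Same data as above, repeated so this snippet runs on its own
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)

# Both vectors should have one entry per subject (15 each)
n_subjects <- length(height)
stopifnot(n_subjects == length(weight))

# Quick look at the averages to catch obvious typing mistakes
n_subjects      # 15
mean(height)    # 69.4
mean(weight)    # 198
```

If the two lengths differ, stopifnot() halts with an error, which tells you that a value was skipped or entered twice.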
Running the Regression
Having entered these variables into R, you can use the following code to generate your OLS regression model:
model <- lm(weight ~ height)
summary(model)
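If you would rather pull individual numbers out of the model than read them off the printed summary, the sketch below shows one way to do it. The names fit, f_stat, and p_value are our own illustrative choices.

```r
# Same data and model as above, repeated so this snippet runs on its own
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)
model <- lm(weight ~ height)
fit <- summary(model)

# Intercept and slope of the fitted line
coef(model)

# F statistic with its numerator and denominator degrees of freedom
f_stat <- fit$fstatistic
f_stat

# p value for the overall F test (upper tail of the F distribution)
p_value <- unname(pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                     lower.tail = FALSE))
p_value
```

These are the same values that summary(model) prints, so you can use them to fill in an APA-style sentence without retyping numbers from the console.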
The summary output shows that your OLS model is statistically significant. In APA format, you could write that there is a significant linear relationship between height and weight, F(1, 13) = 67.41, p < .0001. Looking at the Estimate column of the coefficients table, you would write your regression equation as follows:
Weight = 7.19(Height) – 301.09
Thus, each additional inch of height corresponds with 7.19 additional pounds of predicted bodyweight. You could use the equation above to predict weight given height. For example, a person 71 inches tall would be predicted to have the following weight:
Weight = 7.19(71) – 301.09, or 209.40 pounds
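You can also let R do this calculation for you with predict(). The sketch below assumes the same data and model as above; new_person and predicted_weight are illustrative names of our own. Because predict() uses the unrounded coefficients, its answer can differ slightly from a hand calculation based on coefficients rounded to two decimals.

```r
# Same data and model as above, repeated so this snippet runs on its own
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)
model <- lm(weight ~ height)

# The new data must use the same column name as the predictor in the model
new_person <- data.frame(height = 71)

# Predicted weight (in pounds) for a person 71 inches tall
predicted_weight <- predict(model, newdata = new_person)
predicted_weight
```

To predict for several people at once, put more than one value in the height column, for example data.frame(height = c(64, 71, 78)).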
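Don’t forget the coefficient of determination, which the summary output reports as Multiple R-squared and which happens to be 0.8383. In a regression with one predictor, this is the square of the correlation between height and weight, (0.9156)^2. In other words, in your dataset, approximately 83.83% of the variation in weight is explained by variation in height.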
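To confirm the link between r and the coefficient of determination yourself, you can compare the squared correlation with the R-squared value stored in the model summary. This is a minimal sketch; r and r_squared are illustrative names of our own.

```r
# Same data as above, repeated so this snippet runs on its own
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)

# Pearson correlation between height and weight
r <- cor(height, weight)
r

# Coefficient of determination as reported by the model summary
r_squared <- summary(lm(weight ~ height))$r.squared
r_squared

# With a single predictor, the two agree up to floating-point error
all.equal(r^2, r_squared)
```

Note that this shortcut holds only for a regression with one predictor; with several predictors, R-squared is no longer the square of any single pairwise correlation.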
Scatterplot
You can use R’s ggplot2 package to generate a regression scatterplot that contains (a) the OLS line of best fit and (b) the 95% confidence interval (CI) around that line. The line of best fit is the prediction line that demonstrates the linear trend relating your data, and the shaded 95% CI band illustrates how precisely that line is estimated. The code for creating this graph in R, using the data and variable names above, is as follows:
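# Install ggplot2 once; skip this line if the package is already installed
install.packages("ggplot2")
library(ggplot2)
olsdata <- data.frame(height, weight)
# Height is the predictor (x-axis) and weight is the outcome (y-axis)
ggplot(olsdata, aes(x = height, y = weight)) +
  geom_smooth(method = "lm", se = TRUE, col = "black") +
  geom_point(size = 3, col = "firebrick") +
  labs(x = "Height (inches)", y = "Weight (pounds)") +
  theme_classic()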
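If you want the numbers behind the confidence band, for example to report them in a table, base R can compute them without ggplot2. The sketch below is one way to do this under the same data and model; height_grid and band are illustrative names of our own, and the choice of 5-inch steps is arbitrary.

```r
# Same data and model as above, repeated so this snippet runs on its own
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)
model <- lm(weight ~ height)

# Heights at which to evaluate the fitted line (60, 65, 70, 75, 80 inches)
height_grid <- data.frame(height = seq(min(height), max(height), by = 5))

# Columns: fit = predicted weight, lwr and upr = 95% confidence limits
band <- predict(model, newdata = height_grid,
                interval = "confidence", level = 0.95)

# Show each height next to its prediction and confidence limits
cbind(height_grid, band)
```

The interval is narrowest near the average height in the sample and widens toward the shortest and tallest subjects, which matches the shape of the shaded band in the scatterplot.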
Conclusion
OLS regression is a common statistical procedure in many academic papers, research essays, and theses. In this blog entry, we demonstrated how to run a simple OLS regression in R, interpret its output, and plot the result. In another blog entry, we’ve demonstrated some of R’s more advanced regression features.
BridgeText can help you with all of your statistical analysis needs.