**Introduction**

Ordinary least squares (OLS) regression, often referred to simply as linear regression, is a common statistical procedure that appears in many academic papers, research essays, and theses. In this blog entry, we’ll show you how to use R to generate, interpret, and illustrate basic OLS findings. A separate blog entry covers more advanced topics related to OLS regression in R, including residuals testing, leverage, multicollinearity, and other diagnostics.

**What You’ll Need**

To follow along, you’ll need R and RStudio installed. The scatterplot section also uses the ggplot2 package.

**Entering Data Manually**

In the simplest scenario, you can enter data into R manually, using the console pane at the bottom left of your **RStudio** window. You can start typing at the prompt, where the cursor is.

Let’s say you have data on the heights (in inches) and weights (in pounds) of 15 people. The heights, in sequential order of your 15 subjects, are as follows: 67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60. The weights, in the same sequential order of the 15 subjects, are as follows: 150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130. R treats each of these variables as a vector, and you can enter the following code into your RStudio console to load the data:

height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)

weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)
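
Before modeling, it can help to confirm that both vectors were typed in completely and in the same order. The following is a minimal sketch of one way to check; the data frame name `people` is an illustrative choice and is not used elsewhere in this entry.

```r
# Data from the example above
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)

# Both vectors should hold one value per subject
stopifnot(length(height) == length(weight))

# Combine the vectors into a data frame (the name `people` is illustrative)
people <- data.frame(height, weight)

# Inspect the structure and basic descriptive statistics
str(people)
summary(people)
```

If `stopifnot()` raises an error, one of the vectors is missing a value or has an extra one, and the rows would no longer line up by subject.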

**Running the Regression**

Having entered these variables into R, you can use the following code to generate your OLS regression model:

model <- lm(weight ~ height)

summary(model)
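
To see what `lm()` is estimating, you can compute the two coefficients by hand. With a single predictor, the OLS slope is the covariance of the two variables divided by the variance of the predictor, and the intercept is the mean of the outcome minus the slope times the mean of the predictor. The sketch below, which uses the illustrative names `slope` and `intercept`, compares those values to the ones `lm()` reports.

```r
# Data from the example above
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)

# Closed-form OLS estimates for a model with one predictor
slope     <- cov(height, weight) / var(height)
intercept <- mean(weight) - slope * mean(height)

# Fit the same model with lm() for comparison
model <- lm(weight ~ height)

# The hand-computed values should agree with lm()'s coefficients
c(intercept = intercept, slope = slope)
coef(model)
```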

The output of summary(model) shows that your OLS model is statistically significant. In APA format, you could write that there is a significant linear relationship between weight and height, *F*(1, 13) = 67.41, *p* < .0001. Looking at the coefficients, you would write your regression equation as follows:

Weight = 7.19(Height) – 301.09

Thus, each additional inch of height is associated with a predicted increase of 7.19 pounds of body weight. You could use the equation above to predict weight given height. For example, a person 71 inches tall would be predicted to have the following weight:

Weight = 7.19(71) – 301.09, or 209.40 pounds
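
Rather than plugging numbers into the equation by hand, you can ask R for the prediction. The following is a minimal sketch using `predict()`; the name `new_person` is an illustrative choice. Because `predict()` works with the unrounded coefficients, its answer (roughly 209.5 pounds) differs slightly from the hand calculation based on coefficients rounded to two decimals.

```r
# Data from the example above
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)

# Fit the regression of weight on height
model <- lm(weight ~ height)

# New observation; the column name must match the predictor name in the model
new_person <- data.frame(height = 71)

# Predicted weight for a person 71 inches tall
predicted_weight <- predict(model, newdata = new_person)
predicted_weight
```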

Don’t forget that you can square your *r* value to get the coefficient of determination, which here is 0.8383 and is reported as “Multiple R-squared” in the summary output. In other words, in your dataset, (0.9156)^2, or approximately 83.83%, of the variation in weight is explained by variation in height.
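
If you would rather pull these statistics out of R than read them off the printed summary, the sketch below shows one way to do it. The names `fit`, `f`, and `p_value` are illustrative, and the p-value is computed here from the F statistic with `pf()`.

```r
# Data from the example above
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)

model <- lm(weight ~ height)
fit   <- summary(model)

# Pearson correlation and its square
r <- cor(height, weight)
r^2

# R-squared as stored in the model summary; should match r^2
fit$r.squared

# F statistic with its degrees of freedom (named: value, numdf, dendf)
f <- fit$fstatistic
f

# Upper-tail p-value for the overall F test
p_value <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
p_value
```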

**Scatterplot**

You should take advantage of R’s customizable graphing features, here through the ggplot2 package, to generate regression scatterplots that contain (a) the OLS line of best fit and (b) the 95% confidence interval (CI). The line of best fit is the prediction line that shows the linear trend in your data, and the shaded 95% CI band around it illustrates how precisely that line is estimated. The code for creating this graph in R, using the data and variable names above, is as follows:

olsdata <- data.frame(weight, height)

install.packages("ggplot2")

library(ggplot2)

ggplot(olsdata, aes(x = height, y = weight)) +

geom_smooth(method = "lm", se = TRUE, col = "black") +

geom_point(size = 3, col = "firebrick") +

labs(x = "Height", y = "Weight") +

theme_classic()
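
To see the numbers behind the shaded band, you can compute the 95% confidence interval for the fitted line directly. The sketch below uses base R’s `predict()` with `interval = "confidence"`, which should correspond to what `geom_smooth(method = "lm", se = TRUE)` draws at its default 95% level. The names `grid` and `band` and the 5-inch spacing are illustrative choices.

```r
# Data from the example above
height <- c(67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60)
weight <- c(150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130)

model <- lm(weight ~ height)

# Heights at which to evaluate the fitted line (every 5 inches, illustrative)
grid <- data.frame(height = seq(min(height), max(height), by = 5))

# Fitted values with lower and upper 95% confidence limits for the mean
band <- predict(model, newdata = grid, interval = "confidence", level = 0.95)

# One row per height: fit is the line, lwr and upr are the band edges
cbind(grid, band)
```

The band is narrowest near the mean height and widens toward the extremes, which is why the shaded region in the scatterplot flares out at both ends.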

**Conclusion**

OLS regression is a common statistical procedure in many academic papers, research essays, and theses. In this blog entry, we demonstrated how to run a simple OLS regression in R, interpret its output, and plot the results. In another blog entry, we’ve demonstrated some of R’s more advanced regression features.

BridgeText can help you with all of your **statistical analysis needs**.