**Introduction**

The purpose of a Chi squared test of independence is to determine whether two categorical variables are independent of each other. In this blog, we’ll show you how to use R to conduct one.

**Install Libraries**

You can begin by installing the ggplot2 package, which provides the dataset used below, if you haven’t already:

install.packages("ggplot2")

**Access Dataset**

Let’s load ggplot2 and open its built-in diamonds dataset:

library(ggplot2)

View(diamonds)

In RStudio, the dataset will now appear in a viewer tab at the top left of your screen.
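If you are working outside RStudio, or just want a quick check of the two variables used in this post, something like the following sketch should work. It assumes only that ggplot2 is installed and uses base R functions.

```r
library(ggplot2)  # provides the diamonds dataset

# Look at the structure of the two columns used in this post
str(diamonds[, c("cut", "color")])

# Both columns are factors; list their categories
levels(diamonds$cut)
levels(diamonds$color)

# Number of rows (one row per diamond)
nrow(diamonds)
```

Both variables being categorical is what makes them suitable for a Chi squared test.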

**Generate a Table and State Hypotheses**

A good beginning for Chi squared analysis is to generate a table of the values you’re comparing. Let’s say that we want to determine whether diamond color and diamond cut are independent of each other. State your hypotheses:

H0: Diamond color and diamond cut are independent of each other.

HA: Diamond color and diamond cut are not independent of each other.

Next, let’s generate a table using the following code:

table(diamonds$cut, diamonds$color)

Here’s what you get:

Thus, for example, 163 diamonds are color D and have a fair cut, 224 diamonds are color E and have a fair cut, etc. From here, you can get the Chi squared results as follows (note that we will capture the results using a variable name, res):

res <- chisq.test(table(diamonds$cut, diamonds$color))

print(res)

Printing res shows the test statistic (X-squared), the degrees of freedom (df), and the p-value.
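The individual numbers in the result can also be pulled out of res directly, which is handy when you want to report them in text. The sketch below rebuilds the table so that it runs on its own; the names tab and df_by_hand are illustrative and not part of the original walkthrough.

```r
library(ggplot2)  # provides the diamonds dataset

tab <- table(diamonds$cut, diamonds$color)
res <- chisq.test(tab)

res$statistic  # Chi squared test statistic
res$parameter  # degrees of freedom
res$p.value    # p-value of the test

# Degrees of freedom by hand: (rows - 1) * (columns - 1)
df_by_hand <- (nrow(tab) - 1) * (ncol(tab) - 1)
df_by_hand
```

With 5 cuts and 7 colors, the degrees of freedom come to (5 - 1) * (7 - 1) = 24.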

Therefore, we reject the null hypothesis, as *p* < .05. The variables of diamond color and diamond cut are not independent of each other. What does this mean? In order to get closer to a real-world interpretation of Chi squared findings, you can generate a table of the counts you would expect to see if the variables were truly independent of each other. Try the following code:

res$expected

Here is the table of expected values:
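For reference, each expected count is the row total times the column total, divided by the grand total. The sketch below reproduces the expected table by hand under that formula; the names tab and expected_by_hand are illustrative and not part of the original walkthrough.

```r
library(ggplot2)  # provides the diamonds dataset

tab <- table(diamonds$cut, diamonds$color)
res <- chisq.test(tab)

# Expected count for each cell = row total * column total / grand total
expected_by_hand <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Compare with the values stored in the test result (ignoring labels)
all.equal(expected_by_hand, res$expected, check.attributes = FALSE)
```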

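You can now compare the observed and expected values to make specific inferences based on colors and cuts. Cells where the observed count is well above or below the expected count show which combinations of color and cut contribute most to the association.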
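One way to make that comparison less manual is to look at the residuals that chisq.test stores alongside the expected counts. This is a sketch rather than the only approach; the names diff_counts and pearson_by_hand are illustrative.

```r
library(ggplot2)  # provides the diamonds dataset

res <- chisq.test(table(diamonds$cut, diamonds$color))

# Raw difference between observed and expected counts
diff_counts <- res$observed - res$expected
round(diff_counts, 1)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_by_hand <- (res$observed - res$expected) / sqrt(res$expected)
round(res$residuals, 2)

# Standardized residuals, which are on a roughly z-score scale
round(res$stdres, 2)
```

Positive residuals mark combinations that occur more often than independence would predict, and negative residuals mark combinations that occur less often. The squared Pearson residuals add up to the Chi squared statistic, so large residuals point to the cells that drive the result.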

BridgeText can help you with all of your **statistical analysis needs**.