How to Create & Interpret a Q-Q Plot in R (2024)

A Q-Q plot, short for “quantile-quantile” plot, is a type of plot that we can use to determine whether or not a set of data potentially came from some theoretical distribution.

Many statistical tests make the assumption that a set of data follows a normal distribution, and a Q-Q plot is often used to assess whether or not this assumption is met.

Although a Q-Q plot isn’t a formal statistical test, it does provide an easy way to visually check whether a dataset follows a normal distribution, and if not, how this assumption is violated and which data points potentially cause this violation.

We can create a Q-Q plot by plotting two sets of quantiles against one another. If both sets of quantiles came from the same distribution, then the points on the plot should roughly form a straight diagonal line.


Quantiles represent points in a dataset below which a certain portion of the data fall. For example, the 0.9 quantile is the point below which 90% of the data fall. The 0.5 quantile is the point below which 50% of the data fall, and so on.
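As a quick illustration of this definition, the following sketch computes the 0.5 and 0.9 quantiles of a simulated sample with the built-in quantile() function and then checks what share of the values falls at or below each one. The sample size of 1,000 and the variable names are assumptions chosen for the example:

```r
#make this example reproducible
set.seed(11)

#generate vector of 1000 values that follow a normal distribution
x <- rnorm(1000)

#compute the 0.5 and 0.9 sample quantiles
q <- quantile(x, probs = c(0.5, 0.9))
print(q)

#share of values at or below each quantile (should be close to 0.5 and 0.9)
share_below <- c(mean(x <= q[1]), mean(x <= q[2]))
print(share_below)
```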

Q-Q plots identify the quantiles in your sample data and plot them against the quantiles of a theoretical distribution. In most cases the normal distribution is used, but a Q-Q plot can actually be created for any theoretical distribution.
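For instance, a Q-Q plot against an exponential distribution could be sketched as follows. This is only one possible approach: it uses the built-in qqplot(), ppoints(), and qexp() functions, and the simulated data, the rate of 2, and the axis labels are assumptions made for the example:

```r
#make this example reproducible
set.seed(11)

#generate vector of 100 values that follow an exponential distribution
x <- rexp(100, rate = 2)

#theoretical exponential quantiles at evenly spread probability points
theoretical <- qexp(ppoints(length(x)))

#plot sorted sample values against the theoretical quantiles
qq <- qqplot(theoretical, x,
             main = 'Exponential Q-Q Plot',
             xlab = 'Theoretical Quantiles',
             ylab = 'Sample Quantiles')

#add reference line through the first and third quartiles
qqline(x, distribution = qexp)
```

Because the exponential distributions differ from one another only by scale, the points should still fall roughly along a straight line even though the sample rate (2) differs from the default rate (1) used by qexp().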

If the data points fall along a straight diagonal line in a Q-Q plot against the normal distribution, then the dataset likely follows a normal distribution.
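To make the idea concrete, a normal Q-Q plot can be built by hand: sort the sample, compute the matching normal quantiles, and plot one against the other. The sketch below does this with ppoints() and qnorm() and then compares the result with the coordinates that qqnorm() computes. The variable names are assumptions chosen for the example:

```r
#make this example reproducible
set.seed(11)

#generate vector of 100 values that follow a normal distribution
x <- rnorm(100)

#theoretical normal quantiles at evenly spread probability points
theoretical <- qnorm(ppoints(length(x)))

#sample quantiles are simply the sorted data
sample_q <- sort(x)

#plot the two sets of quantiles against one another
plot(theoretical, sample_q,
     main = 'Manual Normal Q-Q Plot',
     xlab = 'Theoretical Quantiles',
     ylab = 'Sample Quantiles')

#compare with the coordinates computed by qqnorm() (without drawing a plot)
built_in <- qqnorm(x, plot.it = FALSE)
print(all.equal(sort(built_in$x), theoretical))
```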

How to Create a Q-Q Plot in R

We can easily create a Q-Q plot to check if a dataset follows a normal distribution by using the built-in qqnorm() function.

For example, the following code generates a vector of 100 random values that follow a normal distribution and creates a Q-Q plot for this dataset to verify that it does indeed follow a normal distribution:

#make this example reproducible
set.seed(11)

#generate vector of 100 values that follows a normal distribution
data <- rnorm(100)

#create Q-Q plot to compare this dataset to a theoretical normal distribution
qqnorm(data)


To make it even easier to see if the data falls along a straight line, we can use the qqline() function:

#create Q-Q plot
qqnorm(data)

#add straight diagonal line to plot
qqline(data)


We can see that the data points near the tails don’t fall exactly along the straight line, but for the most part this sample data appears to be normally distributed (as it should be since we told R to generate the data from a normal distribution).

Consider instead the following code that generates a vector of 100 random values that follow a gamma distribution and creates a Q-Q plot for this data to check if it follows a normal distribution:

#make this example reproducible
set.seed(11)

#generate vector of 100 values that follows a gamma distribution
data <- rgamma(100, 1)

#create Q-Q plot to compare this dataset to a theoretical normal distribution
qqnorm(data)
qqline(data)


We can see the clear departure from the straight line in this Q-Q plot, indicating that this dataset likely does not follow a normal distribution.

Consider another chunk of code that generates a vector of 100 random values that follow a Chi-Square distribution with 5 degrees of freedom and creates a Q-Q plot for this data to check if it follows a normal distribution:

#make this example reproducible
set.seed(11)

#generate vector of 100 values that follows a Chi-Square distribution
data <- rchisq(100, 5)

#create Q-Q plot to compare this dataset to a theoretical normal distribution
qqnorm(data)
qqline(data)


Once again we can see that this dataset does not appear to follow a normal distribution, especially near the tails.
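To compare the three cases directly, the plots can be drawn side by side. The following is a minimal sketch that uses par(mfrow = ...) to split the plotting window into three panels. The list names are assumptions, and because all three samples are drawn after a single set.seed() call, the gamma and Chi-Square samples will differ from the ones generated in the separate examples:

```r
#make this example reproducible
set.seed(11)

#generate one sample of 100 values from each distribution
samples <- list(
  'Normal' = rnorm(100),
  'Gamma (shape = 1)' = rgamma(100, 1),
  'Chi-Square (df = 5)' = rchisq(100, 5)
)

#split the plotting window into 1 row and 3 columns
old_par <- par(mfrow = c(1, 3))

#create a Q-Q plot with a reference line for each sample
for (name in names(samples)) {
  qqnorm(samples[[name]], main = name)
  qqline(samples[[name]])
}

#restore the previous plotting layout
par(old_par)
```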

Modifying the Aesthetics of a Q-Q Plot in R

We can modify some of the aesthetics of the Q-Q plot in R including the title, axis labels, data point colors, line color, and line width.

The following code modifies the titles, axis labels, and color of the points in the plot:

#make this example reproducible
set.seed(11)

#generate vector of 100 values that follows a normal distribution
data <- rnorm(100)

#create Q-Q plot
qqnorm(data,
       main = 'Q-Q Plot for Normality',
       xlab = 'Theoretical Dist',
       ylab = 'Sample dist',
       col = 'steelblue')


Next, the following code adds a straight diagonal line to the plot with a color of red, a line width of 2 (lwd = 2, default is 1), and a dashed line (lty = 2, default is 1):

qqline(data, col = 'red', lwd = 2, lty = 2)


Technical Notes

Keep in mind that a Q-Q plot is simply a way to visually check if a dataset follows a theoretical distribution. To formally test whether or not a dataset follows a particular distribution, the following tests can be performed (assuming you’re comparing your dataset to a normal distribution):

Anderson-Darling Test
Shapiro-Wilk Test
Kolmogorov-Smirnov Test
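
As a rough sketch of how these tests might be run in R, the code below uses the built-in shapiro.test() and ks.test() functions on a simulated sample. The Anderson-Darling test is not part of base R, so the example assumes the nortest package as one possible source for it and only runs that step if the package is installed:

```r
#make this example reproducible
set.seed(11)

#generate vector of 100 values that follow a normal distribution
data <- rnorm(100)

#Shapiro-Wilk test (null hypothesis: the data are normally distributed)
sw <- shapiro.test(data)
print(sw)

#Kolmogorov-Smirnov test against a normal distribution with the sample mean and sd
ks <- ks.test(data, 'pnorm', mean = mean(data), sd = sd(data))
print(ks)

#Anderson-Darling test (assumption: only runs if the nortest package is installed)
if (requireNamespace('nortest', quietly = TRUE)) {
  print(nortest::ad.test(data))
}
```

Note that the p-value from the Kolmogorov-Smirnov test is only approximate when the mean and standard deviation are estimated from the same sample, so it is best treated as a rough guide here.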


Author: Carmelo Roob
