Comparison of Pearson vs Spearman Correlation Coefficients

Sereno 23 Apr, 2024 • 5 min read

Pearson and Spearman correlation coefficients are two widely used statistical measures of the relationship between variables. The Pearson correlation coefficient assesses the linear relationship between variables, while the Spearman correlation coefficient evaluates the monotonic relationship. In this article, we compare the two coefficients in detail, covering how they are calculated, how to interpret them, and their strengths and limitations. Understanding the differences between them is crucial for selecting the appropriate measure based on the nature of the data and the research objectives.

What is Correlation?
Pearson vs Spearman Correlation
What is Pearson Correlation Coefficient?
What is Spearman Correlation Coefficient?
Example of Spearman’s Rank Correlation
Practical Application of Correlation Using R
Pearson vs Spearman Correlation – Final Verdict
Frequently Asked Questions

What is Correlation?

Correlation is a statistical measure of the association between two variables. It describes how one variable tends to change when the other changes.

If the two variables increase or decrease together, they are positively correlated. If one increases while the other decreases, they are negatively correlated. If a change in one variable is not accompanied by any systematic change in the other, the correlation between them is zero.
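The three cases can be sketched in a few lines of R. The vectors below are made-up illustrative values, not data from this article, chosen so that the results come out exactly.

```r
# Illustrative vectors (hypothetical values, chosen so the results are exact)
x <- c(-2, -1, 0, 1, 2)

y_pos  <- 3 * x + 1    # rises as x rises
y_neg  <- -2 * x + 5   # falls as x rises
y_zero <- x^2          # symmetric around 0, so there is no linear trend

cor(x, y_pos)    # positive correlation
cor(x, y_neg)    # negative correlation
cor(x, y_zero)   # zero correlation
```

Note that in the last case y is fully determined by x, yet the coefficient is zero. A zero correlation rules out a linear trend, not every kind of relationship.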

Pearson vs Spearman Correlation

	Pearson Correlation Coefficient	Spearman Correlation Coefficient
Purpose	Measures linear relationships	Measures monotonic relationships
Assumptions	Linear relationship; approximately normal variables when testing significance	Monotonic relationship; no assumptions on distribution
Calculation Method	Based on covariance and standard deviations	Based on ranked data
Range of Values	-1 to 1	-1 to 1
Interpretation	Strength and direction of linear relationship	Strength and direction of monotonic relationship
Sensitivity to Outliers	Sensitive to outliers	Less sensitive to outliers
Data Types	Appropriate for interval and ratio data	Appropriate for ordinal and non-normally distributed data
Usage	Assessing linear associations, parametric tests	Assessing monotonic associations, non-parametric tests

What is Pearson Correlation Coefficient?

The Pearson correlation coefficient is a statistical measure that quantifies the strength and direction of a linear relationship between two variables. It ranges from -1 to 1, with values close to -1 indicating a strong negative linear relationship, values close to 1 indicating a strong positive linear relationship, and 0 indicating no linear relationship.
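As a minimal sketch of what the coefficient computes, Pearson's r can be built from the covariance and the two standard deviations and compared with R's built-in cor(). The vectors x and y are hypothetical values used only for illustration.

```r
# Hypothetical paired measurements, used only for illustration
x <- c(2, 4, 5, 7, 9)
y <- c(1, 3, 6, 8, 12)

# Pearson's r is the covariance scaled by both standard deviations
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y, method = "pearson")

# The two values should agree up to floating-point error
all.equal(r_manual, r_builtin)
```

Dividing by the standard deviations is what confines the result to the range -1 to 1, regardless of the units of x and y.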

What is Spearman Correlation Coefficient?

The Spearman correlation coefficient is a statistical measure that assesses the strength and direction of a monotonic relationship between two variables. It ranks the data rather than relying on their actual values, making it suitable for non-normally distributed or ordinal data. It ranges from -1 to 1, where values close to -1 or 1 indicate a strong monotonic relationship, and 0 indicates no monotonic relationship. Spearman correlation is valuable for detecting and quantifying associations when linear relationships are not assumed or when dealing with ranked or ordinal data.
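One way to see what "ranks the data" means in practice: Spearman's coefficient is Pearson's coefficient applied to the ranks. The sketch below uses hypothetical data in which y grows exponentially with x, so the relationship is monotonic but not linear.

```r
# Hypothetical data: y grows exponentially with x (monotonic, not linear)
x <- 1:8
y <- exp(x)

# Spearman's rho is Pearson's r computed on the ranks of the data
rho_manual  <- cor(rank(x), rank(y), method = "pearson")
rho_builtin <- cor(x, y, method = "spearman")

all.equal(rho_manual, rho_builtin)

rho_builtin                     # the ordering of x and y agrees perfectly
cor(x, y, method = "pearson")   # lower, because the trend is curved
```

Because only the ordering matters, any strictly increasing transformation of x or y leaves Spearman's coefficient unchanged, while Pearson's coefficient generally changes.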

Example of Spearman’s Rank Correlation

Spearman’s Rank Correlation:

Let’s say we want to determine the relationship between the study time (in hours) and the exam scores (out of 100) of a group of students. We have the following data for five students:

Student	Study Time (hours)	Exam Score
A	10	75
B	8	60
C	12	85
D	6	55
E	9	70

First, we rank the study time and exam scores separately, giving rank 1 to the largest value:

Student	Study Time (hours)	Rank (Study Time)	Exam Score	Rank (Exam Score)
A	10	2	75	2
B	8	4	60	4
C	12	1	85	1
D	6	5	55	5
E	9	3	70	3

Now, we calculate the difference between the two ranks for each student:

$$D_i = \text{Rank of Study Time}_i - \text{Rank of Exam Score}_i$$

Student	$D_i$
A	0
B	0
C	0
D	0
E	0

Next, we square each $D_i$ value:

Student	$D_i^2$
A	0
B	0
C	0
D	0
E	0

The sum of the squared differences is $\sum D_i^2 = 0 + 0 + 0 + 0 + 0 = 0$.

Finally, we use the Spearman’s Rank Correlation formula:

$$\rho = 1 - \frac{6 \sum D_i^2}{n(n^2 - 1)}$$

Where:

$n$ is the number of data points (in this case, 5)
$\sum D_i^2$ is the sum of the squared differences

Plugging in the values:

$$\rho = 1 - \frac{6 \times 0}{5(5^2 - 1)} = 1 - \frac{0}{5 \times 24} = 1 - \frac{0}{120} = 1$$

So, the Spearman’s Rank Correlation coefficient (ρ) between study time and exam scores is 1, indicating a perfect positive monotonic relationship: the student who studied longest scored highest, the student who studied second longest scored second highest, and so on.
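The hand calculation can be cross-checked in R. The sketch below re-enters the five-student data, applies the rank-difference formula, and compares the result with the built-in cor(); the variable names are my own.

```r
# Data from the five-student table above
study_time <- c(A = 10, B = 8, C = 12, D = 6, E = 9)
exam_score <- c(A = 75, B = 60, C = 85, D = 55, E = 70)

# rank() gives rank 1 to the smallest value; negate to rank the largest first
rank_study <- rank(-study_time)
rank_exam  <- rank(-exam_score)

d <- rank_study - rank_exam   # rank differences D_i
n <- length(d)

# Rank-difference formula (valid here because there are no tied values)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rho_builtin <- cor(study_time, exam_score, method = "spearman")

data.frame(rank_study, rank_exam, d)
c(formula = rho_formula, builtin = rho_builtin)
```

Ranking from largest to smallest or from smallest to largest gives the same coefficient, as long as the same direction is used for both variables.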

Practical Application of Correlation Using R

We will measure the association between the Girth and Height of black cherry trees using the “trees” dataset that ships with R. It can be accessed by typing the name of the dataset, and a list of all available datasets can be displayed with the command data().

Below is the code to compute the correlation:

1. Loading the Dataset

> data <- trees
> head(data, 3)
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2

2. Creating a Scatter Plot Using ggplot2 Library

> library(ggplot2)
> ggplot(data, aes(x = Girth, y = Height)) + geom_point() +
+   geom_smooth(method = "lm", se = TRUE, color = "red")

3. Test for Assumptions of Correlation

Before computing the correlation, we check its assumptions. Linearity can be judged from the scatter plot in the previous step. Normality is checked with the Shapiro-Wilk test, which tests whether a variable follows a normal distribution; we apply it to both Girth and Height.

> shapiro.test(data$Girth)

	Shapiro-Wilk normality test

data:  data$Girth
W = 0.94117, p-value = 0.08893

> shapiro.test(data$Height)

	Shapiro-Wilk normality test

data:  data$Height
W = 0.96545, p-value = 0.4034

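Both p-values are greater than 0.05, so we cannot reject the hypothesis that Girth and Height are normally distributed, and we proceed on the assumption of normality.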
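The normality check can be folded into a small helper that suggests which coefficient to use. This is only a rule-of-thumb sketch: the function suggest_cor_method() and its alpha argument are hypothetical names introduced here, not part of base R, and a formal test should not replace looking at the data.

```r
# Hypothetical helper (not part of base R): suggests a correlation method
# based on Shapiro-Wilk tests of both variables.
suggest_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  # If neither test rejects normality, Pearson is a reasonable default;
  # otherwise fall back to the rank-based Spearman coefficient
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

suggest_cor_method(trees$Girth, trees$Height)
```

The returned string can be passed straight to the method argument of cor() or cor.test().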

4. Correlation

> cor(data$Girth,data$Height, method = "pearson")
[1] 0.5192801
> cor(data$Girth,data$Height, method = "spearman")
[1] 0.4408387

5. Testing the Significance of the Correlation

For Pearson

> Pear <- cor.test(data$Girth, data$Height, method = 'pearson')
> Pear

	Pearson's product-moment correlation

data:  data$Girth and data$Height
t = 3.2722, df = 29, p-value = 0.002758
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.2021327 0.7378538
sample estimates:
      cor 
0.5192801

For Spearman

> Spear <- cor.test(data$Girth, data$Height, method = 'spearman')
> Spear

	Spearman's rank correlation rho

data:  data$Girth and data$Height
S = 2773.4, p-value = 0.01306
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.4408387

Since the p-value is less than 0.05 for both tests (0.002758 for Pearson and 0.01306 for Spearman), we can conclude that the Girth and Height of the trees are significantly correlated under both coefficients, with values of 0.5192801 (Pearson) and 0.4408387 (Spearman).
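Rather than reading the numbers off the printed output, the estimates and p-values can be pulled out of the objects that cor.test() returns. The sketch below re-runs both tests on the built-in trees data; the results data frame and the 0.05 threshold are illustrative choices, and exact = FALSE is set for Spearman on the assumption that tied values make an exact p-value unavailable.

```r
# Re-run both tests on the built-in trees data
pear  <- cor.test(trees$Girth, trees$Height, method = "pearson")
# exact = FALSE requests the asymptotic p-value (the data contain tied values)
spear <- cor.test(trees$Girth, trees$Height, method = "spearman", exact = FALSE)

# cor.test() returns an "htest" list; extract the pieces of interest
results <- data.frame(
  method   = c("pearson", "spearman"),
  estimate = c(unname(pear$estimate), unname(spear$estimate)),
  p_value  = c(pear$p.value, spear$p.value)
)
results$significant <- results$p_value < 0.05
results

pear$conf.int   # cor.test() reports a confidence interval for Pearson only
```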

Pearson vs Spearman Correlation – Final Verdict

As we can see, both correlation coefficients give a positive value for the Girth and Height of the trees, but the values differ slightly. This is because the Pearson coefficient measures the linear relationship between the variables, while the Spearman coefficient measures only the monotonic relationship: one in which the variables tend to move in the same or opposite direction, but not necessarily at a constant rate. In a linear relationship, the rate is constant.

Frequently Asked Questions

Q1. What is the purpose of Pearson and Spearman correlation?

A. Pearson and Spearman correlations both measure the strength and direction of the relationship between variables. Pearson correlation assesses linear relationships, while Spearman correlation evaluates monotonic relationships.

Q2. When should I use Spearman correlation?

A. Spearman correlation is useful when the relationship between variables is not strictly linear but can be described by a monotonic function. It is commonly used when dealing with ordinal or non-normally distributed data.

Q3. Are Spearman correlations more powerful than Pearson correlations?

A. It is inaccurate to say that Spearman correlations are inherently more powerful than Pearson correlations. The choice between the two depends on the specific characteristics and assumptions of the data and the research question being addressed.

Q4. Is Spearman always higher than Pearson?

A. Spearman correlation is not always higher than Pearson correlation. The magnitude and direction of the correlation can differ between the two measures, especially when the relationship between variables is nonlinear or influenced by outliers. The choice between the two should be based on the data and the research objectives.
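A short sketch with two made-up datasets shows that either coefficient can come out higher. All values below are hypothetical and chosen only to illustrate the two situations named in the answer: a nonlinear trend and an outlier.

```r
# Case 1 (hypothetical data): curved but perfectly monotonic trend
x1 <- 1:6
y1 <- exp(x1)
c(pearson  = cor(x1, y1),
  spearman = cor(x1, y1, method = "spearman"))

# Case 2 (hypothetical data): loosely ordered points plus one extreme outlier
x2 <- c(1, 2, 3, 4, 5, 20)
y2 <- c(3, 1, 4, 2, 5, 20)
c(pearson  = cor(x2, y2),
  spearman = cor(x2, y2, method = "spearman"))
```

In the first case Spearman is higher, because the ordering is perfect while the trend is curved. In the second case Pearson is higher, because the single point at (20, 20) dominates the covariance but counts as just one rank among six.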
