In this tutorial, we will learn how to calculate the correlation coefficient in Python using two common methods: Pearson’s correlation coefficient and Spearman’s rank correlation coefficient. Pearson’s coefficient measures the strength and direction of a linear relationship between two variables, while Spearman’s coefficient measures the strength and direction of a monotonic relationship, which need not be linear.
Step 1: Loading the Dataset
First, let’s create a sample dataset consisting of two variables X and Y.
    X = [10, 20, 30, 40, 50]
    Y = [15, 25, 35, 45, 55]
To calculate the correlation coefficient, we will use the scipy library. If you don’t have it installed, you can install it with the following command. The leading ! is for Jupyter notebooks; drop it when running the command in a terminal.

    !pip install scipy
Step 2: Calculating Pearson’s Correlation Coefficient
Pearson’s correlation coefficient is a measure of the linear relationship between two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
To calculate Pearson’s correlation coefficient in Python, we can use the pearsonr function from the scipy.stats module. It returns the coefficient together with a p-value for the test of no correlation.
    from scipy.stats import pearsonr

    correlation_coefficient, p_value = pearsonr(X, Y)
    print("Pearson's correlation coefficient:", correlation_coefficient)
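To make the definition concrete, the same coefficient can be computed by hand: sum the products of each variable's deviations from its mean, then divide by the square root of the product of the summed squared deviations. The following is only a minimal sketch using the standard library, not a description of how scipy computes it internally; the helper name pearson_manual is hypothetical and introduced purely for illustration.

```python
import math


def pearson_manual(x, y):
    """Pearson's r from the textbook formula (illustrative helper, not part of scipy)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least 2 values")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Deviations of each observation from its mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]

    # Numerator: sum of products of paired deviations
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: square root of the product of the summed squared deviations
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

    if denominator == 0:
        raise ValueError("r is undefined when either input is constant")
    return numerator / denominator


X = [10, 20, 30, 40, 50]
Y = [15, 25, 35, 45, 55]

r_manual = pearson_manual(X, Y)
print("Manual Pearson's r:", r_manual)
```

For this dataset the result should agree with pearsonr, since Y increases by exactly the same amount as X at every step.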
Step 3: Calculating Spearman’s Rank Correlation Coefficient
Spearman’s rank correlation coefficient is a non-parametric measure of the strength and direction of the monotonic association between two variables, computed from the ranks of their values rather than from the values themselves. It ranges from -1 (perfect inverse relationship) to 1 (perfect positive relationship), with 0 indicating no monotonic association.
To calculate Spearman’s rank correlation coefficient in Python, we can use the spearmanr function from the scipy.stats module.
    from scipy.stats import spearmanr

    correlation_coefficient, p_value = spearmanr(X, Y)
    print("Spearman's rank correlation coefficient:", correlation_coefficient)
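One way to see what "rank correlation" means is to rank the data yourself and apply Pearson's coefficient to the ranks. The sketch below does this with scipy.stats.rankdata, which by default assigns tied values the average of their ranks. The list Y_ties is made-up data, added here only to show how a tie is handled; it is not part of the tutorial's dataset.

```python
from scipy.stats import pearsonr, rankdata, spearmanr

X = [10, 20, 30, 40, 50]
Y_ties = [15, 25, 25, 45, 55]  # hypothetical data containing one tie

# Convert raw values to ranks; the two tied values share the average rank 2.5
rank_x = rankdata(X)
rank_y = rankdata(Y_ties)
print("Ranks of X:", rank_x)
print("Ranks of Y_ties:", rank_y)

# Pearson's coefficient applied to the ranks
rho_from_ranks, _ = pearsonr(rank_x, rank_y)

# Spearman's coefficient computed directly on the raw values
rho_scipy, _ = spearmanr(X, Y_ties)

print("Pearson on ranks:", rho_from_ranks)
print("spearmanr:", rho_scipy)
```

The two printed coefficients should match up to floating-point rounding, which is why Spearman's coefficient depends only on the ordering of the values and not on their spacing.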
Full Code
    X = [10, 20, 30, 40, 50]
    Y = [15, 25, 35, 45, 55]

    from scipy.stats import pearsonr
    correlation_coefficient, p_value = pearsonr(X, Y)
    print("Pearson's correlation coefficient:", correlation_coefficient)

    from scipy.stats import spearmanr
    correlation_coefficient, p_value = spearmanr(X, Y)
    print("Spearman's rank correlation coefficient:", correlation_coefficient)
Output
    Pearson's correlation coefficient: 1.0
    Spearman's rank correlation coefficient: 1.0
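Both coefficients equal 1.0 here because Y is an exact linear function of X, so the sample dataset cannot show how the two measures differ. As a rough illustration, the sketch below uses a made-up dataset (X_small and Y_cubic are assumptions, not part of the tutorial's data) in which Y always increases with X but not along a straight line.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: Y grows with X, but as a cube rather than linearly
X_small = [1, 2, 3, 4, 5]
Y_cubic = [x ** 3 for x in X_small]  # [1, 8, 27, 64, 125]

pearson_r, _ = pearsonr(X_small, Y_cubic)
spearman_rho, _ = spearmanr(X_small, Y_cubic)

# Pearson is below 1 because the points do not lie on a straight line
print("Pearson's correlation coefficient:", pearson_r)
# Spearman is 1 because the ordering of Y matches the ordering of X exactly
print("Spearman's rank correlation coefficient:", spearman_rho)
```

Pearson's coefficient comes out at roughly 0.94 for this data, while Spearman's stays at 1.0, since a perfectly monotonic relationship is enough for a perfect rank correlation.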
Conclusion
In this tutorial, we learned how to calculate the correlation coefficient in Python using two common methods: Pearson’s correlation coefficient and Spearman’s rank correlation coefficient. We used the scipy.stats module to calculate both coefficients and to analyze the relationship between two variables.