In this tutorial, we will learn how to draw a regression line on a scatter plot in Python. A regression line is a straight line that summarizes the relationship between two variables. We’ll plot the data points as a scatter plot and then draw the line that best fits them.
We’ll use matplotlib, Python’s popular data visualization library, and SciPy, a widely used statistical library. We’ll also use NumPy to generate the sample data; it is installed automatically as a dependency of both. If you don’t have them already, you can install them using pip:

    pip install matplotlib scipy

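To confirm that the installation worked, a quick check along these lines can help. It only imports each library and prints its installed version; the exact numbers will differ from machine to machine.

```python
import matplotlib
import numpy as np
import scipy

# Collect the version of each library to confirm it can be imported.
versions = {
    "numpy": np.__version__,
    "matplotlib": matplotlib.__version__,
    "scipy": scipy.__version__,
}
for name, version in versions.items():
    print(f"{name}: {version}")
```

If any of the imports fails, rerun the pip command above in the same environment you use to run Python.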
After installing the required libraries, let’s begin our tutorial.
Step 1: Import Libraries
First, let’s import the necessary libraries:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

Step 2: Create Sample Data
Next, we’ll create some sample data for our scatter plot and regression line. We’ll do this using NumPy, a powerful library for numerical and mathematical operations in Python. In this step, we’ll generate random data points for our two variables, x and y.

    np.random.seed(0)  # Fix the seed so the random data is the same on every run
    x = np.random.randint(1, 40, 20)  # 20 random integers from 1 to 39 (the upper bound, 40, is exclusive)
    y = 2 * x + np.random.normal(0, 10, 20)  # y is roughly 2 * x, plus Gaussian noise (mean 0, standard deviation 10)

Step 3: Calculate Regression Line Coefficients
Now that we have our x and y variables, we can calculate the slope and intercept of our regression line. We’ll use the stats.linregress() function from the SciPy library to do this. When unpacked, its result gives five values: the slope, the intercept, the correlation coefficient, the p-value, and the standard error of the slope. We only need the slope and intercept for drawing the regression line.

    slope, intercept, _, _, _ = stats.linregress(x, y)

With the calculated slope and intercept, our regression line will have the equation:
y = slope * x + intercept
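The result of stats.linregress() can also be kept as a single object with named fields, which may be easier to read than unpacking with underscores. The sketch below assumes the same sample data as Step 2. The predict() helper is hypothetical and not part of SciPy; it simply applies the equation above.

```python
import numpy as np
from scipy import stats

# Same sample data as in Step 2.
np.random.seed(0)
x = np.random.randint(1, 40, 20)
y = 2 * x + np.random.normal(0, 10, 20)

# Keep the whole result instead of unpacking it into five variables.
result = stats.linregress(x, y)


def predict(x_new):
    """Hypothetical helper: apply y = slope * x + intercept."""
    return result.slope * x_new + result.intercept


print(f"slope:     {result.slope:.3f}")
print(f"intercept: {result.intercept:.3f}")
# The squared correlation coefficient is the share of the variance in y
# that the fitted line explains.
print(f"r-squared: {result.rvalue ** 2:.3f}")
print(f"predicted y at x = 10: {predict(10):.3f}")
```

Since y was generated as 2 * x plus noise, the fitted slope should land reasonably close to 2, though not exactly on it.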
Step 4: Draw Scatter Plot and Regression Line
Finally, we’ll plot our data points and the regression line we’ve calculated. We’ll use the plt.scatter() function for the scatter plot and the plt.plot() function for the regression line.

    # Scatter plot
    plt.scatter(x, y, color='blue', label='Data Points')

    # Regression line
    plt.plot(x, slope * x + intercept, color='red', label='Regression Line')

    # Labels and legend
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend()

    # Display the plot
    plt.show()

This code will generate a visualization with blue data points representing our data and a red regression line. The labels and legend provide context for the plot.
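Because x is not sorted, plt.plot() draws the line by jumping back and forth between points. That looks fine for a straight line, but a common alternative is to evaluate the line on an evenly spaced grid. The sketch below does that and also shows the fitted equation in the legend. The names x_line, y_line, fig, and ax are illustrative choices, not part of the code above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Same sample data and fit as in Steps 2 and 3.
np.random.seed(0)
x = np.random.randint(1, 40, 20)
y = 2 * x + np.random.normal(0, 10, 20)
slope, intercept, _, _, _ = stats.linregress(x, y)

# Evaluate the line on sorted, evenly spaced x values spanning the data.
x_line = np.linspace(x.min(), x.max(), 100)
y_line = slope * x_line + intercept

fig, ax = plt.subplots()
ax.scatter(x, y, color='blue', label='Data Points')
# The "+" format flag prints the sign of the intercept explicitly.
ax.plot(x_line, y_line, color='red',
        label=f'y = {slope:.2f} * x {intercept:+.2f}')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()
# Call plt.show() here when running interactively.
```

This version uses matplotlib’s object-oriented interface (fig and ax) rather than the plt functions; either style produces the same kind of plot.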
Full Code:

    # Step 1: Import libraries
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    # Step 2: Create sample data
    np.random.seed(0)
    x = np.random.randint(1, 40, 20)
    y = 2 * x + np.random.normal(0, 10, 20)

    # Step 3: Calculate regression line coefficients
    slope, intercept, _, _, _ = stats.linregress(x, y)

    # Step 4: Draw scatter plot and regression line
    plt.scatter(x, y, color='blue', label='Data Points')
    plt.plot(x, slope * x + intercept, color='red', label='Regression Line')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend()
    plt.show()

Output:
The output will display a scatter plot with a red regression line representing the relationship between the x and y variables.
Conclusion
In this tutorial, we have learned how to draw a regression line on a scatter plot in Python using the matplotlib and SciPy libraries. This is a useful technique for visualizing the relationship between two variables, and it can be easily extended to more complex data and models.
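As one possible extension, a curved trend can be fitted with NumPy’s np.polyfit() instead of stats.linregress(). The sketch below is only an assumption about what a more complex model might look like here; the quadratic data is made up for illustration and is not part of the tutorial code.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data that follows a quadratic trend plus noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 0.5 * x**2 - 2 * x + 3 + rng.normal(0, 2, x.size)

# Fit a degree-2 polynomial; coefficients come back highest power first.
coeffs = np.polyfit(x, y, 2)
y_fit = np.polyval(coeffs, x)

fig, ax = plt.subplots()
ax.scatter(x, y, color='blue', label='Data Points')
ax.plot(x, y_fit, color='red', label='Quadratic Fit')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()
# Call plt.show() here when running interactively.
```

Changing the last argument of np.polyfit() to 1 gives a straight-line fit again, which should match the slope and intercept from stats.linregress() on the same data.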
Practice the concepts discussed in this tutorial using different datasets to improve your understanding of drawing regression lines on scatter plots.