R-squared (R²) is a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by an independent variable or variables in a regression model. In other words, it shows how well a linear regression model fits the data. In this tutorial, we will discuss how to calculate R-squared in linear regression using Python.
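To make the definition concrete, R-squared can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up observed and predicted values (not this tutorial's dataset), and the helper name r_squared_manual is purely illustrative.

```python
def r_squared_manual(y_true, y_pred):
    """Return R^2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: variation the model fails to explain
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: variation of y around its own mean
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted values, for illustration only
y_true = [3, 5, 7, 9]
y_pred = [2.8, 5.3, 6.9, 9.1]
print(r_squared_manual(y_true, y_pred))
```

A value of 1 means the predictions match the observations exactly, while a value of 0 means the model does no better than always predicting the mean.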
Step 1: Import Required Libraries
We’ll first need to import the required libraries which include:
- numpy: for mathematical calculations
- pandas: for reading the data file and handling dataframes
- matplotlib: for plotting the graphs
- sklearn: for the linear regression model and R-squared calculation
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
Step 2: Load and Preprocess the Data
For this tutorial, we’ll use a sample dataset of advertising expenses and sales data. The dataset has three columns:
* Advertising Expenses, X1 (Independent Variable)
* Advertising Expenses, X2 (Independent Variable)
* Sales, Y (Dependent Variable)
Create a CSV file named advertising_data.csv with the following data:
X1,X2,Y
23,44,64
18,32,61
34,35,83
24,26,69
36,40,95
18,34,45
24,17,63
19,39,68
18,17,64
12,41,70
Now, let’s read the data from the CSV file and preprocess it:
    data = pd.read_csv('advertising_data.csv')
    X1 = data.iloc[:, 0].values.reshape(-1, 1)
    X2 = data.iloc[:, 1].values.reshape(-1, 1)
    Y = data.iloc[:, 2].values.reshape(-1, 1)
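If you would rather not create a file on disk, the same ten rows can be read from an in-memory string. This is a sketch, assuming the column names X1, X2, and Y from the CSV above; the variable names csv_text, X, and y are illustrative. Selecting columns by name rather than by position also gives a ready-made two-column feature matrix.

```python
import io

import pandas as pd

# The same ten rows as advertising_data.csv, held in a string
csv_text = """X1,X2,Y
23,44,64
18,32,61
34,35,83
24,26,69
36,40,95
18,34,45
24,17,63
19,39,68
18,17,64
12,41,70
"""

data = pd.read_csv(io.StringIO(csv_text))

# Select columns by name: X is a (10, 2) feature matrix, y a length-10 vector
X = data[["X1", "X2"]].to_numpy()
y = data["Y"].to_numpy()

print(X.shape, y.shape)
```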
Step 3: Fit the Linear Regression Model
Next, create a linear regression model and fit it to the data:
    linear_regressor = LinearRegression()
    linear_regressor.fit(np.column_stack((X1, X2)), Y)
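Once the model is fitted, its learned parameters can be inspected through the coef_ and intercept_ attributes of LinearRegression. The sketch below types the tutorial's data inline so it runs without the CSV file, and it uses a 1-D target, which is an assumption that differs from the reshaped column vector used above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tutorial data typed inline so the example runs without the CSV file
X = np.array([[23, 44], [18, 32], [34, 35], [24, 26], [36, 40],
              [18, 34], [24, 17], [19, 39], [18, 17], [12, 41]])
y = np.array([64, 61, 83, 69, 95, 45, 63, 68, 64, 70])

model = LinearRegression().fit(X, y)

# One coefficient per predictor (X1, X2), plus the intercept
b1, b2 = model.coef_
print(f"Sales ~ {model.intercept_:.2f} + {b1:.2f} * X1 + {b2:.2f} * X2")
```

Each coefficient estimates how much predicted sales change for a one-unit increase in that advertising variable while the other is held fixed.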
Step 4: Predict the Sales Data
Predict the sales data using the fitted linear regression model:
    Y_pred = linear_regressor.predict(np.column_stack((X1, X2)))
Step 5: Calculate R-squared
Now, we will use the r2_score function from sklearn.metrics to calculate R-squared:
    r_squared = r2_score(Y, Y_pred)
    print("R-squared value: ", r_squared)
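This prints the R-squared value of the fitted model. For the ten rows listed above it is approximately 0.589, so the two advertising variables together explain about 59% of the variance in sales:
R-squared value:  0.5888...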
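As a sanity check, the value returned by r2_score can be compared with the textbook formula and with the estimator's own score method, which for LinearRegression also returns R-squared. A minimal sketch, with the tutorial's data typed inline and a 1-D target assumed for brevity:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Tutorial data typed inline so the example runs without the CSV file
X = np.array([[23, 44], [18, 32], [34, 35], [24, 26], [36, 40],
              [18, 34], [24, 17], [19, 39], [18, 17], [12, 41]])
y = np.array([64, 61, 83, 69, 95, 45, 63, 68, 64, 70])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# Textbook definition: 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

r2_sklearn = r2_score(y, y_pred)  # metric function: true values first
r2_model = model.score(X, y)      # estimator method

print(r2_manual, r2_sklearn, r2_model)
```

All three numbers should agree. Note that r2_score expects the true values first and the predictions second; swapping them generally changes the result.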
Step 6: Visualize the Actual and Predicted Values
Finally, let’s plot the actual and predicted sales against the first advertising variable, X1:
    plt.scatter(X1, Y, color='red', label='Actual')
    plt.scatter(X1, Y_pred, color='green', label='Predicted')
    plt.title('Linear Regression')
    plt.xlabel('Advertising Expenses')
    plt.ylabel('Sales')
    plt.legend()
    plt.show()
This displays a scatter plot of sales against X1, with the actual values in red and the model's predictions in green. Because the model uses two predictors, the fit is a plane rather than a single line, so the code above does not draw a regression line; the closer the green points lie to the red ones, the better the fit.
Full Code:
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    # Step 2: Load and preprocess the data
    data = pd.read_csv('advertising_data.csv')
    X1 = data.iloc[:, 0].values.reshape(-1, 1)
    X2 = data.iloc[:, 1].values.reshape(-1, 1)
    Y = data.iloc[:, 2].values.reshape(-1, 1)

    # Step 3: Fit the linear regression model
    linear_regressor = LinearRegression()
    linear_regressor.fit(np.column_stack((X1, X2)), Y)

    # Step 4: Predict the sales data
    Y_pred = linear_regressor.predict(np.column_stack((X1, X2)))

    # Step 5: Calculate R-squared
    r_squared = r2_score(Y, Y_pred)
    print("R-squared value: ", r_squared)

    # Step 6: Visualize actual and predicted sales against X1
    plt.scatter(X1, Y, color='red', label='Actual')
    plt.scatter(X1, Y_pred, color='green', label='Predicted')
    plt.title('Linear Regression')
    plt.xlabel('Advertising Expenses')
    plt.ylabel('Sales')
    plt.legend()
    plt.show()
Output
[Figure: scatter plot of actual sales (red) and predicted sales (green) against X1]
Conclusion
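In this tutorial, we learned how to calculate R-squared for a linear regression model in Python using the sklearn library. This measure tells us what share of the variance in the dependent variable the model explains, and it can be used to compare candidate models fitted to the same data. Keep in mind that R-squared computed on the training data never decreases when a predictor is added, so a higher value alone does not guarantee more accurate predictions on new data.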
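When comparing models that use different numbers of predictors, one common remedy is adjusted R-squared, which penalizes each extra predictor. The helper below is an illustrative sketch: the function name is hypothetical and not part of sklearn, and the input of 0.80 is a made-up value rather than a result from this tutorial.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical example: an R^2 of 0.80 from 10 samples and 2 predictors
print(adjusted_r_squared(0.80, n_samples=10, n_predictors=2))
```

For a fixed R-squared and sample size, the adjusted value drops as the number of predictors grows, which makes it a fairer basis for comparing models of different sizes.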