How to Calculate R-Squared in Linear Regression with Python

R-squared (R²) is a statistical measure of the proportion of the variance in a dependent variable that is explained by the independent variable or variables in a regression model. In other words, it indicates how well a linear regression model fits the data: a value near 1 means the model explains most of the variance, and a value near 0 means it explains very little. In this tutorial, we will discuss how to calculate R-squared in linear regression using Python.
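For reference, the standard definition can be written as below. The symbols are notation chosen here for illustration: $y_i$ are the observed values, $\hat{y}_i$ the model's predictions, and $\bar{y}$ the mean of the observed values.

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```

The numerator is the variation the model leaves unexplained, and the denominator is the total variation of the observed values around their mean.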

Step 1: Import Required Libraries

First, import the required libraries:

  • numpy: for mathematical calculations
  • pandas: for reading the data file and handling dataframes
  • matplotlib: for plotting the graphs
  • sklearn (scikit-learn): for the linear regression model and R-squared calculation
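A minimal sketch of the imports, assuming the four packages are already installed. The aliases np, pd, and plt are the usual community conventions rather than anything this tutorial requires.

```python
import numpy as np                  # numerical calculations
import pandas as pd                 # reading the CSV and handling dataframes
import matplotlib.pyplot as plt     # plotting

# scikit-learn is imported under the package name "sklearn"
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
```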

Step 2: Load and Preprocess the Data

For this tutorial, we’ll use a sample dataset of advertising expenses and sales data. The dataset has three columns:

  • X1: Advertising Expenses (Independent Variable)
  • X2: Advertising Expenses (Independent Variable)
  • Y: Sales (Dependent Variable)

Create a CSV file named advertising_data.csv with the following data:

X1,X2,Y
23,44,64
18,32,61
34,35,83
24,26,69
36,40,95
18,34,45
24,17,63
19,39,68
18,17,64
12,41,70

Now, let’s read the data from the CSV file and preprocess it:
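One way this step could look is sketched below. Two things here are assumptions: the snippet writes the sample rows itself when the file is missing, only so that it can run on its own, and "preprocessing" is taken to mean dropping incomplete rows and separating the predictors from the target.

```python
from pathlib import Path

import pandas as pd

CSV_PATH = Path("advertising_data.csv")

# Create the file from the sample rows above if it does not exist yet,
# so that this snippet can be run without any manual setup.
if not CSV_PATH.exists():
    CSV_PATH.write_text(
        "X1,X2,Y\n"
        "23,44,64\n18,32,61\n34,35,83\n24,26,69\n36,40,95\n"
        "18,34,45\n24,17,63\n19,39,68\n18,17,64\n12,41,70\n"
    )

data = pd.read_csv(CSV_PATH)

# Minimal preprocessing: drop rows with missing values
# (the sample data has none, so nothing is removed here).
data = data.dropna()

X = data[["X1", "X2"]]  # independent variables, kept as a DataFrame
y = data["Y"]           # dependent variable, a Series

print(data.shape)
print(data.head())
```

Keeping X as a two-column DataFrame rather than a NumPy array lets scikit-learn remember the column names, which helps when predicting on new rows later.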

Step 3: Fit the Linear Regression Model

Next, create a linear regression model and fit it to the data:
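A minimal sketch of the fitting step follows. It assumes the model is fitted on all ten rows with both X1 and X2 as predictors. The data is inlined so the snippet runs on its own, and it is equivalent to reading advertising_data.csv.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same rows as advertising_data.csv, inlined for a self-contained example
data = pd.DataFrame({
    "X1": [23, 18, 34, 24, 36, 18, 24, 19, 18, 12],
    "X2": [44, 32, 35, 26, 40, 34, 17, 39, 17, 41],
    "Y":  [64, 61, 83, 69, 95, 45, 63, 68, 64, 70],
})
X = data[["X1", "X2"]]
y = data["Y"]

# Ordinary least squares with an intercept (the scikit-learn default)
model = LinearRegression()
model.fit(X, y)

print("Coefficients:", model.coef_)   # one slope per predictor, in column order
print("Intercept:", model.intercept_)
```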

Step 4: Predict the Sales Data

Predict the sales data using the fitted linear regression model:
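The prediction step might be sketched as follows, again assuming the model was fitted on all ten rows. The `new_spend` row is a hypothetical input added only to show prediction on unseen values.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same rows as advertising_data.csv, inlined for a self-contained example
data = pd.DataFrame({
    "X1": [23, 18, 34, 24, 36, 18, 24, 19, 18, 12],
    "X2": [44, 32, 35, 26, 40, 34, 17, 39, 17, 41],
    "Y":  [64, 61, 83, 69, 95, 45, 63, 68, 64, 70],
})
X = data[["X1", "X2"]]
y = data["Y"]
model = LinearRegression().fit(X, y)

# Predicted sales for the rows the model was fitted on
y_pred = model.predict(X)
print(y_pred)

# Hypothetical new observation; column names must match those used in fit()
new_spend = pd.DataFrame({"X1": [25], "X2": [30]})
print(model.predict(new_spend))
```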

Step 5: Calculate R-squared

Now, we will use the r2_score function from sklearn.metrics to calculate R-squared:
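A sketch of the calculation is given below, under the assumption that the model is scored on the same ten rows it was fitted on. As a sanity check, it also recomputes the value directly from the residual and total sums of squares.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Same rows as advertising_data.csv, inlined for a self-contained example
data = pd.DataFrame({
    "X1": [23, 18, 34, 24, 36, 18, 24, 19, 18, 12],
    "X2": [44, 32, 35, 26, 40, 34, 17, 39, 17, 41],
    "Y":  [64, 61, 83, 69, 95, 45, 63, 68, 64, 70],
})
X = data[["X1", "X2"]]
y = data["Y"]
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# r2_score takes the true values first, then the predictions
r2 = r2_score(y, y_pred)
print("R-squared value:", r2)

# Cross-check against the definition: 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot
print("Manual R-squared:", r2_manual)
```

The argument order matters: swapping `y` and `y_pred` generally gives a different number. For a fitted `LinearRegression`, `model.score(X, y)` returns the same quantity.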

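This prints the R-squared value for the model. The exact figure depends on which rows are used for fitting and which are used for scoring. The run reported here produced:

R-squared value: 0.9681387240319829

For comparison, fitting on all ten rows above with both X1 and X2 and scoring on those same rows gives an R-squared of roughly 0.59.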

Step 6: Visualize the Data and Regression Line

Finally, let’s visualize the actual data points, the predicted data points, and the fitted linear regression line:

This will display a scatter plot of the actual data points in red and the predicted data points in green. The linear regression line will be displayed, showing the relationship between advertising expenses and sales.
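One possible version of this plot is sketched below. Because the model has two predictors, the fitted surface is a plane rather than a line, so this sketch makes an assumption: it plots against X1 and draws the line obtained by holding X2 at its mean. The output file name is also an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Same rows as advertising_data.csv, inlined for a self-contained example
data = pd.DataFrame({
    "X1": [23, 18, 34, 24, 36, 18, 24, 19, 18, 12],
    "X2": [44, 32, 35, 26, 40, 34, 17, 39, 17, 41],
    "Y":  [64, 61, 83, 69, 95, 45, 63, 68, 64, 70],
})
X = data[["X1", "X2"]]
y = data["Y"]
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# A 2-D slice of the fitted plane: vary X1 over its observed range
# while holding X2 fixed at its mean.
x1_line = np.linspace(X["X1"].min(), X["X1"].max(), 50)
line_inputs = pd.DataFrame({"X1": x1_line, "X2": X["X2"].mean()})
y_line = model.predict(line_inputs)

fig, ax = plt.subplots()
ax.scatter(X["X1"], y, color="red", label="Actual sales")
ax.scatter(X["X1"], y_pred, color="green", label="Predicted sales")
ax.plot(x1_line, y_line, color="blue", label="Fit with X2 at its mean")
ax.set_xlabel("Advertising expenses (X1)")
ax.set_ylabel("Sales (Y)")
ax.legend()

# Save to a file; in an interactive session plt.show() can be used instead
fig.savefig("regression_plot.png")
```

The green points do not fall exactly on the blue line because each prediction also depends on that row's X2 value.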

Full Code:
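The steps can be combined into a single script along the following lines. This is a sketch rather than the original listing: the function names are hypothetical, the model is fitted and scored on all rows, the CSV is created from the sample rows if it is missing, and the regression line is drawn with X2 held at its mean.

```python
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

CSV_PATH = Path("advertising_data.csv")
SAMPLE_CSV = (
    "X1,X2,Y\n23,44,64\n18,32,61\n34,35,83\n24,26,69\n36,40,95\n"
    "18,34,45\n24,17,63\n19,39,68\n18,17,64\n12,41,70\n"
)


def load_data(path=CSV_PATH):
    """Read the CSV, creating it from the sample rows if it is missing."""
    if not path.exists():
        path.write_text(SAMPLE_CSV)
    return pd.read_csv(path).dropna()


def fit_and_score(data):
    """Fit on all rows; return the model, predictions, and R-squared."""
    X, y = data[["X1", "X2"]], data["Y"]
    model = LinearRegression().fit(X, y)
    y_pred = model.predict(X)
    return model, y_pred, r2_score(y, y_pred)


def plot_results(data, model, y_pred, out_file="regression_plot.png"):
    """Plot actual (red) and predicted (green) sales against X1."""
    x1_line = np.linspace(data["X1"].min(), data["X1"].max(), 50)
    slice_inputs = pd.DataFrame({"X1": x1_line, "X2": data["X2"].mean()})
    fig, ax = plt.subplots()
    ax.scatter(data["X1"], data["Y"], color="red", label="Actual sales")
    ax.scatter(data["X1"], y_pred, color="green", label="Predicted sales")
    ax.plot(x1_line, model.predict(slice_inputs), color="blue",
            label="Fit with X2 at its mean")
    ax.set_xlabel("Advertising expenses (X1)")
    ax.set_ylabel("Sales (Y)")
    ax.legend()
    fig.savefig(out_file)
    return fig


data = load_data()
model, y_pred, r2 = fit_and_score(data)
print("R-squared value:", r2)
plot_results(data, model, y_pred)
```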

Output

Conclusion

In this tutorial, we learned how to calculate R-squared for a linear regression model in Python using the sklearn library. This measure helps us determine how well our model fits the data and can be used to compare different models for a more accurate prediction.