The residual sum of squares (RSS) measures how closely a regression model fits its data: it is the sum of the squared differences between the observed values and the model's predictions, so a smaller RSS indicates a closer fit. In this tutorial, we will walk through the steps to calculate RSS for a given dataset using Python.
By the end of this tutorial, you will be able to compute RSS for your own regression models and datasets.
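For reference, the quantity computed in this tutorial can be written out as below. The notation is an assumption introduced here rather than taken from the tutorial: y_i is an observed value, \hat{y}_i is the model's prediction for the same sample, and n is the number of samples. The second relation is why the mean squared error (MSE) can be used to recover the RSS later on.

```latex
\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\mathrm{MSE} = \frac{\mathrm{RSS}}{n}
\;\Longrightarrow\;
\mathrm{RSS} = n \cdot \mathrm{MSE}
```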
Step 1: Import Necessary Libraries
To begin, we need to import the necessary libraries for our calculations. This includes NumPy and scikit-learn. If you do not have these installed, you can install them using pip:
    pip install numpy scikit-learn
Next, import the required libraries in your Python script:
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
Step 2: Create a Dataset
For this tutorial, we will create a simple dataset using NumPy. However, you can also use your own dataset if desired.
    np.random.seed(0)
    X = np.random.rand(100, 1)
    y = 2 + 3 * X + np.random.randn(100, 1)
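This code creates an input array X of 100 values drawn uniformly from [0, 1), and a corresponding output array y that follows the line y = 2 + 3X with standard normal noise added. Both arrays have shape (100, 1), and the fixed seed makes the results reproducible.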
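If you bring your own data instead, a one-dimensional feature array usually needs reshaping first, because scikit-learn estimators expect X with shape (n_samples, n_features). The sketch below shows one way to do that; the names x_raw, y_raw, X_own, and y_own and the numbers themselves are made up for illustration.

```python
import numpy as np

# Hypothetical measurements standing in for "your own dataset"
x_raw = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_raw = np.array([5.1, 7.9, 11.2, 13.8, 17.1])

# Turn the 1-D feature array into a single-column 2-D array
X_own = x_raw.reshape(-1, 1)
y_own = y_raw

print(X_own.shape)  # (5, 1)
print(y_own.shape)  # (5,)
```

The reshaped X_own can then take the place of X in the steps that follow.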
Step 3: Fit a Linear Regression Model
Now that we have our dataset, let’s fit a linear regression model using scikit-learn’s LinearRegression class.
    model = LinearRegression()
    model.fit(X, y)
This code creates a LinearRegression object and fits it to our input and output arrays.
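As an optional sanity check, the fitted parameters can be inspected through the model's intercept_ and coef_ attributes. This sketch rebuilds the dataset from Step 2 so that it runs on its own; since the data was generated from y = 2 + 3X plus noise, the estimates should land in the neighbourhood of 2 and 3, though not exactly on them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same synthetic data as in Step 2
np.random.seed(0)
X = np.random.rand(100, 1)
y = 2 + 3 * X + np.random.randn(100, 1)

model = LinearRegression()
model.fit(X, y)

# Because y is 2-D with shape (100, 1), intercept_ has shape (1,)
# and coef_ has shape (1, 1), hence the indexing below
print("Intercept:", model.intercept_[0])
print("Slope:", model.coef_[0, 0])
```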
Step 4: Calculate RSS
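To calculate the RSS, we first need to compute the predicted values for our input array using the .predict() method of our trained model. Because the mean squared error (MSE) is the RSS divided by the number of samples, we can then recover the RSS by multiplying the value returned by scikit-learn's mean_squared_error function by the sample count.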
    y_pred = model.predict(X)
    mse = mean_squared_error(y, y_pred)
    rss = np.sum(mse * len(y))
This snippet calculates the mean squared error and multiplies it by the number of samples to obtain the residual sum of squares. Because that product is already a single number, the np.sum call leaves it unchanged.
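The RSS can also be computed directly from the residuals, without going through the MSE. The helper below is a minimal sketch, not part of scikit-learn or NumPy: residual_sum_of_squares is a hypothetical name, the numbers are made up so the result can be checked by hand, and flattening the inputs assumes a single-target regression. The flattening guards against a common pitfall, where subtracting an array of shape (n,) from one of shape (n, 1) silently broadcasts to an (n, n) matrix.

```python
import numpy as np

def residual_sum_of_squares(y_true, y_pred):
    """Sum of squared differences between observed and predicted values."""
    # Flatten both inputs so shapes (n, 1) and (n,) cannot broadcast to (n, n)
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred
    return float(np.sum(residuals ** 2))

# Tiny hand-checkable example (made-up numbers)
y_obs = np.array([[3.0], [5.0], [7.0]])  # shape (3, 1)
y_hat = np.array([2.5, 5.5, 7.0])        # shape (3,)

rss_direct = residual_sum_of_squares(y_obs, y_hat)
mse_manual = np.mean((y_obs.ravel() - y_hat) ** 2)

print(rss_direct)                                       # 0.5
print(np.isclose(rss_direct, mse_manual * len(y_hat)))  # True
```

Calling residual_sum_of_squares(y, y_pred) on the arrays from this tutorial should agree with the MSE-based value up to floating-point rounding.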
Output
Let’s display the computed value of RSS:
    print("RSS:", rss)
RSS: 83.26288780212968
Full Code
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    # Create dataset
    np.random.seed(0)
    X = np.random.rand(100, 1)
    y = 2 + 3 * X + np.random.randn(100, 1)

    # Fit a linear regression model
    model = LinearRegression()
    model.fit(X, y)

    # Calculate RSS
    y_pred = model.predict(X)
    mse = mean_squared_error(y, y_pred)
    rss = np.sum(mse * len(y))

    # Output
    print("RSS:", rss)
In conclusion, this tutorial has demonstrated how to calculate the residual sum of squares in Python using scikit-learn and NumPy. This is a valuable tool when evaluating the accuracy of regression models, and can be easily adapted to your own datasets and models. Happy coding!