Before calculating the training error in Python, it helps to be clear about what it measures: the training error is how far the model’s predictions are from the true values on the same data the model was trained on.
A low training error shows that the model fits the training data, but it does not by itself guarantee good performance on unseen “test” data. Here, we’ll use the machine-learning library sklearn (scikit-learn) together with numpy, highlighting key elements as we progress.
Step 1: Install Required Libraries
This tutorial uses two Python libraries. If you don’t have them installed, you can install them with pip. Note that the package is published as scikit-learn, even though it is imported as sklearn:
pip install scikit-learn numpy
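To confirm that the installation worked, a quick check like the one below can help. It is only a minimal sketch: it imports both libraries and prints whichever versions happen to be installed, so the printed values will differ between machines.

```python
# Sanity check: both imports should succeed after installation.
import numpy as np
import sklearn

# __version__ is a plain string on both packages.
print("numpy:", np.__version__)
print("scikit-learn:", sklearn.__version__)
```

If either import raises ModuleNotFoundError, the corresponding package is not installed in the Python environment you are running.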
Step 2: Import the Libraries
We’ll start by importing the libraries necessary for this task.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error
Step 3: Load and Split Dataset
We’re going to use the diabetes dataset available in sklearn for this tutorial. We’ll split it into training and testing data.
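# Load the diabetes dataset: X holds the features, y the target values
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
# Hold out 20% of the samples for testing; the rest is used for training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)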
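Because train_test_split shuffles the data randomly, each run produces a different split and therefore a slightly different training error. If reproducible numbers are wanted, one option is to pass a fixed random_state. The sketch below assumes a seed of 42, which is an arbitrary choice for illustration, and uses the return_X_y=True shortcut to load the features and target directly.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# return_X_y=True returns the feature matrix and target vector directly.
X, y = datasets.load_diabetes(return_X_y=True)

# random_state=42 is an arbitrary seed (an assumption for this sketch);
# any fixed integer makes the split repeatable across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Show how many samples ended up in each part of the split.
print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)
```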
Step 4: Fit the Model
We will use the linear regression model from sklearn, and fit it on the training data.
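# Create an ordinary least squares linear regression model
model = linear_model.LinearRegression()
# Learn the coefficients from the training data only
model.fit(X_train, y_train)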
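After fitting, it can be useful to confirm that the model actually learned something. A minimal sketch, assuming the same dataset and an arbitrary fixed seed (random_state=0) for repeatability: the fitted LinearRegression exposes one coefficient per input feature in coef_ and the bias term in intercept_.

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
# random_state=0 is an arbitrary seed chosen for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = linear_model.LinearRegression()
model.fit(X_train, y_train)

# One learned weight per feature, plus a single intercept.
print("Number of coefficients:", model.coef_.shape[0])
print("Intercept:", model.intercept_)
```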
Step 5: Calculate the Training Error
We will predict the output on the training data and then calculate the mean squared error (MSE) between those predictions and the true training targets. This value is our training error.
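# Predict on the same data the model was trained on
y_train_predict = model.predict(X_train)
# MSE between the true training targets and the predictions
training_error = mean_squared_error(y_train, y_train_predict)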
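To make the metric less of a black box, the MSE can also be computed by hand: it is the average of the squared differences between the true and predicted values. The sketch below does this with numpy and compares the result with mean_squared_error. The variable names manual_mse and sklearn_mse and the seed random_state=0 are assumptions made for illustration.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
# random_state=0 is an arbitrary seed chosen for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = linear_model.LinearRegression()
model.fit(X_train, y_train)
y_train_predict = model.predict(X_train)

# MSE by hand: mean of the squared residuals.
manual_mse = np.mean((y_train - y_train_predict) ** 2)

# The same quantity computed by scikit-learn.
sklearn_mse = mean_squared_error(y_train, y_train_predict)

print("Manual MSE: ", manual_mse)
print("sklearn MSE:", sklearn_mse)
print("Match:", np.isclose(manual_mse, sklearn_mse))
```

The two values should agree up to floating-point rounding, which is why the comparison uses np.isclose rather than ==.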
Full Code:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error

diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = linear_model.LinearRegression()
model.fit(X_train, y_train)

y_train_predict = model.predict(X_train)
training_error = mean_squared_error(y_train, y_train_predict)
print('Training Error(MSE): ', training_error)
Output:
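Training Error(MSE): 2844.73157308
The exact value will differ from run to run, because train_test_split shuffles the data randomly unless a random_state is set.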
Conclusion
With Python’s sklearn, calculating the training error is straightforward. The metric shows how closely the model fits the data it was trained on. Keep in mind that a low training error alone is not enough: a model can fit its training data very closely and still generalize poorly, so evaluating it on unseen data is still vital to check for overfitting.
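As a rough illustration of that last point, the held-out test set from Step 3 can be scored in the same way as the training set and the two errors compared side by side. This is a minimal sketch rather than a full evaluation procedure. The names train_mse and test_mse and the seed random_state=0 are assumptions made for illustration.

```python
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
# random_state=0 is an arbitrary seed chosen for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = linear_model.LinearRegression()
model.fit(X_train, y_train)

# Error on the data the model has seen during fitting.
train_mse = mean_squared_error(y_train, model.predict(X_train))
# Error on the data the model has never seen.
test_mse = mean_squared_error(y_test, model.predict(X_test))

print("Training Error (MSE):", train_mse)
print("Test Error (MSE):    ", test_mse)
```

A test error far above the training error is a common sign of overfitting, while similar values suggest the model generalizes about as well as it fits.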