Before we calculate the training error in Python, it helps to be clear about what it measures: the training error tells you how well your model fits the data it was trained on.

A lower training error often goes along with better performance on unseen “test” data, although it does not guarantee it. Here, we’ll use the machine-learning library `sklearn` (scikit-learn) together with `numpy`, highlighting key elements as we progress.

### Step 1: Install Required Libraries

This tutorial uses a couple of Python libraries. If you don’t have them installed, you can install them with pip. Note that the package is installed under the name `scikit-learn` but imported as `sklearn`:

```
pip install scikit-learn numpy
```
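To confirm that the installation worked, a quick check like the one below can help. It is only a minimal sketch: it imports both libraries and prints whatever versions happen to be installed on your machine.

```python
import numpy as np
import sklearn  # installed as "scikit-learn", imported as "sklearn"

# Print the installed versions; the exact numbers depend on your environment.
print("scikit-learn version:", sklearn.__version__)
print("NumPy version:", np.__version__)
```

If either import fails with `ModuleNotFoundError`, the corresponding package is not installed in the Python environment you are running.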

### Step 2: Import the Libraries

We’ll start by importing the libraries necessary for this task.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error
```

### Step 3: Load and Split Dataset

We’re going to use the diabetes dataset available in sklearn for this tutorial. We’ll split it into training and testing data, holding out 20% of the samples for testing.

```python
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```
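Because the split is random, each run produces a slightly different training set. If you want repeatable numbers, one option is to pass a `random_state` to `train_test_split`. The sketch below assumes a seed of `42`, which is an arbitrary choice for illustration and not part of the tutorial’s code, and prints the resulting shapes.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# random_state=42 is an arbitrary seed (an assumption for this sketch);
# any fixed integer makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X.shape)        # (442, 10): 442 samples, 10 features
print(X_train.shape)  # (353, 10)
print(X_test.shape)   # (89, 10)
```

With 442 samples and `test_size=0.2`, 89 samples go to the test set and the remaining 353 are used for training.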

### Step 4: Fit the Model

We will use the linear regression model from sklearn, and fit it on the training data.

```python
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
```
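Once the model is fitted, it can be useful to look at what it learned. The following sketch rebuilds the split (with an assumed, arbitrary `random_state=42` so it runs on its own) and inspects the fitted coefficients, the intercept, and the R² score on the training data.

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
# random_state=42 is an arbitrary seed used only to make this sketch repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = linear_model.LinearRegression()
model.fit(X_train, y_train)

# One coefficient per input feature, plus a single intercept term.
print(model.coef_.shape)  # (10,)
print(model.intercept_)

# score() returns the coefficient of determination (R^2) on the given data.
r2_train = model.score(X_train, y_train)
print(r2_train)
```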

### Step 5: Calculate the Training Error

We will predict the output on the training data and then calculate the mean squared error (MSE) between these predictions and the true targets. This MSE is our training error.

```python
y_train_predict = model.predict(X_train)
training_error = mean_squared_error(y_train, y_train_predict)
```
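To see what the MSE actually computes, here is a small sketch that works it out by hand with `numpy`: the average of the squared differences between true and predicted values. The helper name `manual_mse` and the three-element arrays are made up for illustration.

```python
import numpy as np

def manual_mse(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Toy values (hypothetical): the errors are 1, 0 and -2.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]

mse = manual_mse(y_true, y_pred)  # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)               # same units as the target variable

print(mse)
print(rmse)
```

Applied to `y_train` and `y_train_predict`, the same function should agree with `mean_squared_error` up to floating-point rounding. Taking the square root gives the RMSE, which is often easier to interpret because it is expressed in the units of the target.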

## Full Code:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error

# Load the dataset and split it into training and testing data
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit a linear regression model on the training data
model = linear_model.LinearRegression()
model.fit(X_train, y_train)

# Predict on the training data and compute the training error (MSE)
y_train_predict = model.predict(X_train)
training_error = mean_squared_error(y_train, y_train_predict)
print('Training Error(MSE): ', training_error)
```

## Output:

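Training Error(MSE): 2844.73157308

The exact value will differ from run to run, because `train_test_split` shuffles the data randomly unless a `random_state` is set.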

## Conclusion

With Python’s sklearn, calculating the training error becomes a straightforward task. Understanding these metrics is crucial as they provide insight into how well the model is learning from the training data. Keep in mind that although a low training error is a good indicator, testing the model on unseen data is still vital to ensure that the model is not overfitting.
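As a rough illustration of that last point, the sketch below computes the MSE on both the training data and the held-out test data so the two can be compared. The `random_state=42` seed and the variable name `test_error` are assumptions added for this example.

```python
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
# random_state=42 is an arbitrary seed so the comparison is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = linear_model.LinearRegression()
model.fit(X_train, y_train)

# Error on the data the model was fitted on
training_error = mean_squared_error(y_train, model.predict(X_train))
# Error on data the model has never seen
test_error = mean_squared_error(y_test, model.predict(X_test))

print('Training Error(MSE): ', training_error)
print('Test Error(MSE): ', test_error)
```

A test error that is much larger than the training error is a common sign of overfitting, while similar values suggest the model generalizes about as well as it fits.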