In machine learning, a trained model is only useful if you know how well it performs. Training accuracy, the fraction of training samples the model classifies correctly, is one of the simplest checks. In this tutorial, we will learn how to calculate training accuracy in Python with scikit-learn.
Step 1: Create the “dataset.csv” file
Create a file named dataset.csv in your working directory and put the following data inside:
```
feature_1, feature_2, feature_3, feature_4, target
1.23, 4.56, 2.87, 0.95, 1
2.11, 3.45, 1.98, 0.76, 0
3.01, 5.67, 2.11, 1.02, 1
4.12, 2.34, 1.45, 0.80, 0
2.67, 3.89, 1.23, 0.55, 0
```
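If you would rather create the file from Python than by hand, a minimal sketch follows. It assumes you want the file in the current working directory, and it writes the same five rows without spaces after the commas so the column names carry no leading whitespace.

```python
from pathlib import Path

# The five example rows from this step, without spaces after the commas.
CSV_TEXT = """\
feature_1,feature_2,feature_3,feature_4,target
1.23,4.56,2.87,0.95,1
2.11,3.45,1.98,0.76,0
3.01,5.67,2.11,1.02,1
4.12,2.34,1.45,0.80,0
2.67,3.89,1.23,0.55,0
"""

# Write the file into the current working directory.
path = Path("dataset.csv")
path.write_text(CSV_TEXT)

# Sanity check: one header line plus five data rows.
lines = path.read_text().splitlines()
print(len(lines))
```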
Step 2: Import Necessary Libraries
First, import the libraries used in this tutorial:
- numpy: used for numerical computation.
- pandas: used for loading and processing the data.
- sklearn (scikit-learn): provides the train/test split helper, the machine learning algorithms, and the evaluation metrics.
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
```
Step 3: Load the Dataset
We will load the dataset.csv file created in Step 1. If you are using your own dataset, substitute its file name. X holds every column except the last (the features), and y holds the last column (the target).
```python
data = pd.read_csv('dataset.csv')
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
```
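Before moving on, it can help to confirm that the features and target were sliced as intended. The sketch below is an illustration only: it reads the example rows from an in-memory string rather than from disk so that it runs on its own, and it passes `skipinitialspace=True` (an optional `read_csv` parameter, not used in the tutorial's code) so the spaces after the commas do not end up in the column names.

```python
import io

import pandas as pd

# The example rows from Step 1, kept in memory so this snippet is self-contained.
CSV_TEXT = """feature_1, feature_2, feature_3, feature_4, target
1.23, 4.56, 2.87, 0.95, 1
2.11, 3.45, 1.98, 0.76, 0
3.01, 5.67, 2.11, 1.02, 1
4.12, 2.34, 1.45, 0.80, 0
2.67, 3.89, 1.23, 0.55, 0
"""

# skipinitialspace=True drops the blank that follows each comma.
data = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

X = data.iloc[:, :-1].values  # every column except the last
y = data.iloc[:, -1].values   # the last column only

print(list(data.columns))
print(X.shape, y.shape)
```

Because the tutorial selects columns by position with `iloc`, the stray spaces in the header do not affect the rest of the code; they only matter if you later select columns by name.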
Step 4: Split the Dataset
Once the data is loaded, we need to split it into a training set and a test set. The function train_test_split() does this. Let’s put 80% of the rows in the training set and 20% in the test set; passing random_state makes the split reproducible.
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```
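To see what the split produces, the following sketch checks the sizes of the resulting arrays. It hard-codes the five example rows as NumPy arrays (an assumption made only so the snippet runs without the CSV file).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# The five example rows, hard-coded for a self-contained illustration.
X = np.array([
    [1.23, 4.56, 2.87, 0.95],
    [2.11, 3.45, 1.98, 0.76],
    [3.01, 5.67, 2.11, 1.02],
    [4.12, 2.34, 1.45, 0.80],
    [2.67, 3.89, 1.23, 0.55],
])
y = np.array([1, 0, 1, 0, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# With five rows, a 20% test share is a single row; the other four are for training.
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

A one-row test set is far too small to say anything about generalization; it is enough here only because the goal is to demonstrate the mechanics.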
Step 5: Create the Model
After splitting the data, let’s initialize a Logistic Regression model (or any other model of your choice) and fit it with our training data.
```python
model = LogisticRegression()
model.fit(X_train, y_train)
```
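Scikit-learn classifiers share the same fit() interface, so trying a different model is a one-line change. The sketch below fits a logistic regression and a decision tree on the same data; the choice of DecisionTreeClassifier and the four hard-coded training rows are assumptions for illustration, and the score() call it uses is the one explained in Step 6.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Four rows of the example data, standing in for X_train / y_train.
X_train = np.array([
    [1.23, 4.56, 2.87, 0.95],
    [2.11, 3.45, 1.98, 0.76],
    [3.01, 5.67, 2.11, 1.02],
    [4.12, 2.34, 1.45, 0.80],
])
y_train = np.array([1, 0, 1, 0])

# Two interchangeable classifiers; both expose fit() and score().
candidates = {
    "logistic_regression": LogisticRegression(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

train_scores = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    train_scores[name] = clf.score(X_train, y_train)
    print(name, train_scores[name])
```

An unrestricted decision tree can memorize distinct training rows, so it reaches a training accuracy of 1.0 here. That is a reminder that a high training accuracy does not by itself mean the model will do well on new data.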
Step 6: Calculate Train Accuracy
We can then calculate the training accuracy of our model with the method model.score(), which returns the mean accuracy on whatever data and labels it is given. Passing the training data therefore gives the training accuracy.
```python
train_accuracy = model.score(X_train, y_train)
print(train_accuracy)
```
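For a classifier, score() is a shortcut for comparing predictions with the true labels. The sketch below computes the same number in three ways: with model.score(), with metrics.accuracy_score(), and by hand with NumPy. The four hard-coded training rows are an assumption so the snippet runs without the CSV file.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# Four rows of the example data, standing in for X_train / y_train.
X_train = np.array([
    [1.23, 4.56, 2.87, 0.95],
    [2.11, 3.45, 1.98, 0.76],
    [3.01, 5.67, 2.11, 1.02],
    [4.12, 2.34, 1.45, 0.80],
])
y_train = np.array([1, 0, 1, 0])

model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions on the same rows the model was trained on.
y_pred = model.predict(X_train)

acc_score = model.score(X_train, y_train)               # built-in shortcut
acc_metrics = metrics.accuracy_score(y_train, y_pred)   # explicit metric call
acc_manual = np.mean(y_pred == y_train)                 # fraction of correct predictions

print(acc_score, acc_metrics, acc_manual)
```

All three values should agree, since accuracy is simply the number of correct predictions divided by the number of samples.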
The Full Code:
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# Load the data: all columns but the last are features, the last is the target
data = pd.read_csv('dataset.csv')
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the model on the training set
model = LogisticRegression()
model.fit(X_train, y_train)

# Accuracy on the same data the model was trained on
train_accuracy = model.score(X_train, y_train)
print(train_accuracy)
```
Conclusion
That’s it! You have successfully computed the training accuracy of a machine-learning model using Python. Note that this metric alone does not determine the model’s overall performance: it is measured on data the model has already seen, so it says little about how the model handles new data. Accuracy on the held-out test set, along with measures such as Precision, Recall, and F1 Score, gives a more comprehensive evaluation.
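As a starting point for that broader evaluation, here is a minimal sketch that reports training accuracy next to test accuracy, precision, recall, and F1. The five-row example file is too small for these metrics to be meaningful, so the sketch assumes a synthetic 200-row dataset generated with make_classification; the dataset and the `results` dictionary are illustrative choices, not part of the tutorial's code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption for illustration only).
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

# stratify=y keeps the class balance similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions on rows the model has not seen during training.
y_pred = model.predict(X_test)

results = {
    "train_accuracy": model.score(X_train, y_train),
    "test_accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}

for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

A training accuracy that is much higher than the test accuracy is a common sign of overfitting.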