Support Vector Machines (SVMs) are effective predictive models for classification and regression problems. By default, however, an SVM classifier outputs only the predicted class label (0 or 1 in the binary case) without an associated probability.
This tutorial will guide you on how to extract probability estimates from SVM in Python using the Scikit-learn library.
Step 1: Import necessary libraries
First, we need to import the necessary Python libraries into our workspace. We are primarily interested in ‘numpy’ for array manipulation and ‘sklearn’ for implementing the SVM.
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split
Step 2: Create and split the dataset
Next, we will create a random binary dataset to train and evaluate our SVM. We use a binary dataset because this tutorial focuses on binary classification (0 or 1). We will also split the dataset into training and test sets using the ‘train_test_split’ function from ‘sklearn.model_selection’.
# Create a dataset
np.random.seed(0)
X = np.random.rand(100, 10)  # 100 instances with 10 attributes
y = np.random.randint(0, 2, size=(100, 1))

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
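As an aside, if your real dataset has imbalanced classes, you may want the split to preserve the class proportions. A minimal variant using the ‘stratify’ argument of ‘train_test_split’ (not needed for this synthetic data, where the labels are random anyway):

# Optional: keep the class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y.ravel()
)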
Step 3: Implement SVM with probability output
This is the crucial step where we create our SVM model. We have to ensure that the ‘probability’ parameter is set to ‘True’ so that the model can output probabilities. Under the hood, scikit-learn then fits an extra calibration step (Platt scaling) using internal five-fold cross-validation, which slows training down but enables probability estimates. After creating the SVM model, we fit it to our training data.
# Create the SVM model with probability output enabled
clf = svm.SVC(probability=True)

# Fit the SVM model
clf.fit(X_train, np.ravel(y_train))
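As an aside, some SVM variants in scikit-learn, such as ‘LinearSVC’, have no ‘probability’ parameter at all. For those you can get probability estimates with the ‘CalibratedClassifierCV’ wrapper instead. A minimal sketch, assuming the X_train and y_train arrays from Step 2:

from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# Wrap a probability-free SVM in a sigmoid (Platt) calibrator
base_svm = LinearSVC()
calibrated_clf = CalibratedClassifierCV(base_svm, method='sigmoid', cv=5)
calibrated_clf.fit(X_train, np.ravel(y_train))
calibrated_probs = calibrated_clf.predict_proba(X_test)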
Step 4: Extract the probabilities
Now that our model is trained, we can extract probability estimates using the ‘predict_proba’ method. This returns a two-dimensional array with one row per test sample: the first column is the estimated probability that the sample belongs to class 0, and the second column is the probability that it belongs to class 1.
# Extract probabilities
predictions = clf.predict_proba(X_test)
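If you are ever unsure which column belongs to which class, the column order of ‘predict_proba’ matches the ‘classes_’ attribute of the fitted model, so you can check it directly:

# Column order of predict_proba follows clf.classes_
print(clf.classes_)               # [0 1] for this dataset
prob_class_1 = predictions[:, 1]  # probability of class 1 for each test sample
print(prob_class_1[:5])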
Full Code
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split

# Create a dataset
np.random.seed(0)
X = np.random.rand(100, 10)
y = np.random.randint(0, 2, size=(100, 1))

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create the SVM model with probability output enabled
clf = svm.SVC(probability=True)

# Fit the SVM model
clf.fit(X_train, np.ravel(y_train))

# Extract probabilities
predictions = clf.predict_proba(X_test)
print(predictions)
[[0.52995557 0.47004443]
 [0.55155201 0.44844799]
 [0.51856043 0.48143957]
 [0.51430923 0.48569077]
 [0.52408975 0.47591025]
 [0.49170712 0.50829288]
 [0.52216502 0.47783498]
 [0.5        0.5       ]
 [0.52163795 0.47836205]
 [0.52158963 0.47841037]
 [0.53134319 0.46865681]
 [0.54082436 0.45917564]
 [0.52005945 0.47994055]
 [0.51803652 0.48196348]
 [0.5193151  0.4806849 ]
 [0.54809178 0.45190822]
 [0.53539258 0.46460742]
 [0.53050686 0.46949314]
 [0.54508657 0.45491343]
 [0.53447186 0.46552814]
 [0.53908067 0.46091933]
 [0.5292297  0.4707703 ]
 [0.4862986  0.5137014 ]
 [0.53909105 0.46090895]
 [0.53088539 0.46911461]
 [0.5        0.5       ]
 [0.53154065 0.46845935]
 [0.50852081 0.49147919]
 [0.53778123 0.46221877]
 [0.5        0.5       ]]
Notice that all of the probabilities hover around 0.5. That is expected here: the labels were generated at random, so there is no real pattern for the model to learn. On a real dataset the estimates will be more decisive. Also, make sure that your Scikit-learn installation is up to date to avoid any compatibility issues. You can check the version of scikit-learn using the following command:
import sklearn
print('The scikit-learn version is {}.'.format(sklearn.__version__))
Conclusion
Now you can extract probability information from an SVM. This is especially useful when you are more interested in the certainty of the predictions (as probabilities) rather than only in the hard labels (0 or 1).
Keep in mind that the ‘probability’ parameter of the SVC class in Scikit-learn uses a built-in calibration method (Platt scaling, which fits a logistic function to the decision function values on held-out data). This is different from directly converting decision function values into probabilities yourself.
The latter would require additional steps and considerations, but the approach shown in this tutorial is the most straightforward and commonly used.
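For reference, the raw decision function values are available via ‘decision_function’. The sketch below squashes them with a generic logistic sigmoid purely for illustration: the slope A and intercept B are hypothetical placeholders, not the Platt parameters scikit-learn fits internally, so the result will not match ‘predict_proba’.

# Signed distances of the test samples from the separating hyperplane
scores = clf.decision_function(X_test)

# Illustrative only: map scores to (0, 1) with a logistic sigmoid.
# Platt scaling fits A and B on held-out data; these values are
# hypothetical placeholders, not the fitted parameters.
A, B = -1.0, 0.0
approx_prob_class_1 = 1.0 / (1.0 + np.exp(A * scores + B))
print(approx_prob_class_1[:5])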