How to Get Probability from SVM in Python

Support Vector Machines (SVMs) are effective models for classification and regression problems. However, Scikit-learn's SVM classifier (‘SVC’) returns only the predicted class label by default (0 or 1 for a binary problem), not the probability associated with that label.

This tutorial shows how to obtain probability estimates from an SVM in Python using the Scikit-learn library.

Step 1: Import necessary libraries

First, we import the libraries we need: ‘numpy’ for array manipulation and ‘sklearn’ (Scikit-learn) for the SVM implementation and for splitting the data.
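A minimal sketch of the imports used in the later steps, assuming the conventional ‘np’ alias for NumPy:

```python
# Libraries used throughout this tutorial
import numpy as np                                    # array creation and manipulation
from sklearn.model_selection import train_test_split  # splitting data into train/test sets
from sklearn.svm import SVC                           # Scikit-learn's support vector classifier
```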

Step 2: Create and split the dataset

Next, we create a random binary dataset to train and evaluate our SVM. A binary dataset keeps the example focused on two-class classification (labels 0 and 1). We then split the dataset into training and test sets with the ‘train_test_split’ function from ‘sklearn.model_selection’.
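The tutorial's original data-generation code is not shown, so the sketch below assumes 100 samples with five uniformly random features, random 0/1 labels, and a 30% test split. These sizes are illustrative choices. Because the labels are random, there is no real pattern to learn, so the probabilities produced later should hover around 0.5.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumed data: 100 samples, 5 random features, random 0/1 labels
rng = np.random.default_rng(42)
X = rng.random((100, 5))
y = rng.integers(0, 2, size=100)

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # (70, 5) (30, 5)
```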

Step 3: Implement SVM with probability output

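This is the crucial step: creating the SVM model. The ‘probability’ parameter of ‘SVC’ must be set to ‘True’ when the model is created; otherwise the model cannot output probabilities. With this option enabled, Scikit-learn calibrates the SVM's scores using Platt scaling, fitted with an internal five-fold cross-validation, so training is noticeably slower. Once the model is created, we fit it to the training data.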
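To illustrate why the ‘probability’ flag matters, this sketch fits two classifiers on small random data (an assumption made only for the demonstration): one with the default settings and one with ‘probability=True’. Only the second exposes ‘predict_proba’.

```python
import numpy as np
from sklearn.svm import SVC

# Assumed toy data: 50 samples, 4 random features, random 0/1 labels
rng = np.random.default_rng(0)
X = rng.random((50, 4))
y = rng.integers(0, 2, size=50)

plain = SVC().fit(X, y)                                      # probability defaults to False
calibrated = SVC(probability=True, random_state=0).fit(X, y)

print(hasattr(plain, "predict_proba"))       # False
print(hasattr(calibrated, "predict_proba"))  # True
```

The ‘random_state’ argument only controls the shuffling used during probability calibration, which makes the estimates reproducible across runs.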

Step 4: Extract the probabilities

Now that our model is trained, we can obtain the probabilities with the ‘predict_proba’ method. It returns a two-dimensional array with one row per sample and one column per class, and each row sums to 1. The columns follow the order of the fitted model's ‘classes_’ attribute, so for labels 0 and 1 the first column holds the probability that a sample belongs to class 0 and the second column the probability that it belongs to class 1.
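The column order is easiest to see with non-numeric labels. This sketch uses hypothetical "ham"/"spam" labels derived from random features (purely illustrative data). Because ‘classes_’ is sorted, "ham" comes first, and the "spam" column is looked up by position rather than assumed.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data with string labels, to make the column order visible
rng = np.random.default_rng(0)
X = rng.random((60, 3))
y = np.where(X[:, 0] > 0.5, "spam", "ham")

model = SVC(probability=True, random_state=0).fit(X, y)
proba = model.predict_proba(X[:5])  # one row per sample, one column per class

print(model.classes_)  # ['ham' 'spam']: the column order of proba
spam_col = list(model.classes_).index("spam")
print(proba[:, spam_col])  # P(label == "spam") for the first five samples
print(proba.sum(axis=1))   # each row sums to 1
```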

Full Code
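The original script is not reproduced here. The following is a minimal end-to-end sketch under the same assumptions as the steps above: random features, random binary labels, and a 30% test split, which gives 30 test rows. Because the data are random, your values will not match the sample output that follows exactly, although they should likewise stay close to 0.5.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumed data: random features and random binary labels
rng = np.random.default_rng(42)
X = rng.random((100, 5))
y = rng.integers(0, 2, size=100)

# Split into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# probability=True enables predict_proba (Platt scaling, internal 5-fold CV)
model = SVC(probability=True, random_state=42)
model.fit(X_train, y_train)

# Column j holds P(class == model.classes_[j]); here classes_ is [0, 1]
probabilities = model.predict_proba(X_test)
print(probabilities)
```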

[[0.52995557 0.47004443]
 [0.55155201 0.44844799]
 [0.51856043 0.48143957]
 [0.51430923 0.48569077]
 [0.52408975 0.47591025]
 [0.49170712 0.50829288]
 [0.52216502 0.47783498]
 [0.5        0.5       ]
 [0.52163795 0.47836205]
 [0.52158963 0.47841037]
 [0.53134319 0.46865681]
 [0.54082436 0.45917564]
 [0.52005945 0.47994055]
 [0.51803652 0.48196348]
 [0.5193151  0.4806849 ]
 [0.54809178 0.45190822]
 [0.53539258 0.46460742]
 [0.53050686 0.46949314]
 [0.54508657 0.45491343]
 [0.53447186 0.46552814]
 [0.53908067 0.46091933]
 [0.5292297  0.4707703 ]
 [0.4862986  0.5137014 ]
 [0.53909105 0.46090895]
 [0.53088539 0.46911461]
 [0.5        0.5       ]
 [0.53154065 0.46845935]
 [0.50852081 0.49147919]
 [0.53778123 0.46221877]
 [0.5        0.5       ]]

Keep your Scikit-learn installation reasonably up to date to avoid compatibility issues. You can check the installed version by printing ‘sklearn.__version__’ from Python, or by running ‘pip show scikit-learn’ in a terminal.
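A minimal check from inside Python, using the package's standard ‘__version__’ attribute:

```python
import sklearn

# Prints the installed Scikit-learn version string
print(sklearn.__version__)
```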

Conclusion

You can now extract probability estimates from an SVM. This is especially useful when you care about how confident the model is in each prediction, not just the hard label (0 or 1).

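Keep in mind that setting the ‘probability’ parameter of Scikit-learn's ‘SVC’ class to ‘True’ uses a built-in calibration (Platt scaling) to produce probability estimates. These estimates are not the raw values returned by the ‘decision_function’ method, and, because they are fitted with internal cross-validation, they can occasionally disagree with ‘predict’: the class with the highest probability is not always the predicted label.

Converting decision-function values into probabilities yourself requires additional steps, such as fitting a separate calibration model, but the approach in this tutorial is the most straightforward and commonly used.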
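As one possible version of those additional steps, and not the tutorial's own method, the sketch below reads the raw scores from ‘decision_function’ and then uses Scikit-learn's ‘CalibratedClassifierCV’ to fit a sigmoid on them. The synthetic data from ‘make_classification’ is an assumption for illustration. The wrapped SVC is passed positionally because the parameter's name changed across Scikit-learn versions (‘base_estimator’ became ‘estimator’).

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Assumed synthetic binary data
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Raw SVM scores: signed distances to the decision boundary, not probabilities
svm = SVC(random_state=0).fit(X, y)
scores = svm.decision_function(X[:5])

# Sigmoid (Platt-style) calibration fitted with 5-fold cross-validation
calibrated = CalibratedClassifierCV(SVC(random_state=0), method="sigmoid", cv=5)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X[:5])

print(scores)
print(proba)
```

This route gives you control over the calibration method and folds, at the cost of more code than simply passing ‘probability=True’.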