The Kolmogorov-Smirnov Test (K-S Test) is a statistical test that can be used to compare a sample with a reference probability distribution (one-sample K-S test) or to compare two samples (two-sample K-S test).
In this tutorial, we'll walk through how to perform the K-S test in Python using the SciPy library.
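In both forms, the K-S statistic D is the largest vertical distance between two cumulative distribution functions. In the one-sample case these are the sample's empirical CDF F_n and the reference CDF F, so D = sup_x |F_n(x) - F(x)|. The sketch below computes D by hand for a continuous reference distribution and checks it against SciPy. The helper name `ks_statistic_one_sample` is hypothetical, and the sample is made-up data drawn with a fixed seed purely for illustration.

```python
import numpy as np
from scipy.stats import kstest, norm


def ks_statistic_one_sample(sample, cdf):
    """Hypothetical helper: largest gap between the sample's ECDF and a reference CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    f = cdf(x)  # reference CDF evaluated at each sorted observation
    # The ECDF jumps at every observation, so the largest gap occurs either
    # just after a jump (ECDF = i/n) or just before it (ECDF = (i-1)/n).
    d_plus = np.max(np.arange(1, n + 1) / n - f)
    d_minus = np.max(f - np.arange(0, n) / n)
    return max(d_plus, d_minus)


rng = np.random.default_rng(0)  # illustrative data only
sample = rng.normal(size=100)

manual = ks_statistic_one_sample(sample, norm.cdf)
print('manual D:', manual)
print('SciPy D: ', kstest(sample, 'norm').statistic)
```

The two values should agree, because the hand computation follows the same ECDF-versus-CDF comparison that the one-sample test reports.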
Step 1: Installing the Required Libraries
For this guide, you'll mainly need the SciPy and NumPy libraries. If they aren't already installed in your Python environment, install them with the following command:
```shell
pip install scipy numpy
```
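To confirm that the installation worked, you can import both packages and print their versions. This is worth doing because the fields in the K-S result can vary between SciPy releases. For example, older versions may not report `statistic_location` and `statistic_sign`, which appear in the output later in this tutorial.

```python
# Quick sanity check: both libraries import, and which versions are installed.
import numpy
import scipy

print("NumPy version:", numpy.__version__)
print("SciPy version:", scipy.__version__)
```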
Step 2: Importing the Libraries
We’ll begin by importing the necessary libraries and modules:
```python
import numpy as np
from scipy.stats import kstest
```
Step 3: Generating Data
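Next, we generate two samples for the K-S tests using NumPy. The first sample, rvs1, contains 200 draws from a standard normal distribution (mean 0, standard deviation 1). The second sample, rvs2, contains 300 draws from a normal distribution with mean 0.5 and standard deviation 1.5: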
```python
np.random.seed(12345678)  # fix random seed to get same numbers
n1 = 200  # size of first sample
n2 = 300  # size of second sample
rvs1 = np.random.normal(size=n1, loc=0., scale=1)
rvs2 = np.random.normal(size=n2, loc=0.5, scale=1.5)
```
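The code above uses NumPy's legacy global seed. Newer NumPy code more commonly draws from a `Generator` object created with `np.random.default_rng`. The sketch below shows that alternative for the same two samples. It produces different random numbers than `np.random.seed(12345678)`, so the results will not match the Output section exactly.

```python
import numpy as np

# Seeded Generator object instead of the global np.random state.
rng = np.random.default_rng(12345678)
n1, n2 = 200, 300
rvs1 = rng.normal(loc=0.0, scale=1.0, size=n1)  # sample 1: mean 0, std 1
rvs2 = rng.normal(loc=0.5, scale=1.5, size=n2)  # sample 2: mean 0.5, std 1.5
print(rvs1.shape, rvs2.shape)  # prints (200,) (300,)
```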
Step 4: Performing the K-S Test
SciPy's kstest function performs a Kolmogorov-Smirnov goodness-of-fit test. Passing the string 'norm' as the reference distribution compares our first sample (rvs1) with the standard normal distribution:
```python
ks_result = kstest(rvs1, 'norm')
print('K-S test result is:', ks_result)
```
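This returns a test statistic and a p-value. The statistic is the largest distance between the sample's empirical CDF and the reference CDF. The p-value is the probability, assuming the null hypothesis is true, of obtaining a statistic at least as large as the one observed. If the p-value is below the chosen significance level (commonly 0.05), we reject the null hypothesis that the sample comes from the standard normal distribution. Otherwise, we fail to reject it.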
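Because 'norm' on its own means the standard normal N(0, 1), testing against a normal distribution with a different mean or spread requires passing its parameters through kstest's `args` argument. For 'norm', these are location (mean) first, then scale (standard deviation). The sketch below tests rvs2, which was drawn with mean 0.5 and standard deviation 1.5, against both distributions.

One caveat applies. If you estimate the mean and standard deviation from the same data you are testing, the usual K-S p-value is no longer accurate (it tends to be too large). The Lilliefors test is designed for that situation.

```python
import numpy as np
from scipy.stats import kstest

# Same data as in Step 3.
np.random.seed(12345678)
rvs1 = np.random.normal(size=200, loc=0., scale=1)
rvs2 = np.random.normal(size=300, loc=0.5, scale=1.5)

# Against the standard normal N(0, 1): the shifted, wider sample should fit poorly.
vs_standard = kstest(rvs2, 'norm')
# Against a normal with mean 0.5 and std 1.5: args=(loc, scale) go to scipy.stats.norm.
vs_matching = kstest(rvs2, 'norm', args=(0.5, 1.5))

print('vs N(0, 1):       D =', vs_standard.statistic, 'p =', vs_standard.pvalue)
print('vs N(0.5, 1.5^2): D =', vs_matching.statistic, 'p =', vs_matching.pvalue)
```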
Step 5: Comparing Two Samples
We can also pass two samples to kstest to run a two-sample K-S test, which checks whether two independent samples are drawn from the same continuous distribution:
```python
ks_result = kstest(rvs1, rvs2)
print('K-S test result for two samples is:', ks_result)
```
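When kstest receives two arrays, it runs the same test as SciPy's dedicated two-sample function, `scipy.stats.ks_2samp`. Calling `ks_2samp` directly can make the intent clearer. The sketch below runs both on the Step 3 data so you can compare the results.

```python
import numpy as np
from scipy.stats import kstest, ks_2samp

# Same data as in Step 3.
np.random.seed(12345678)
rvs1 = np.random.normal(size=200, loc=0., scale=1)
rvs2 = np.random.normal(size=300, loc=0.5, scale=1.5)

# kstest with two arrays performs the two-sample test...
via_kstest = kstest(rvs1, rvs2)
# ...which is what ks_2samp computes directly.
via_ks_2samp = ks_2samp(rvs1, rvs2)

print('kstest:  ', via_kstest.statistic, via_kstest.pvalue)
print('ks_2samp:', via_ks_2samp.statistic, via_ks_2samp.pvalue)
```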
Full Code
```python
import numpy as np
from scipy.stats import kstest

np.random.seed(12345678)
n1 = 200
n2 = 300
rvs1 = np.random.normal(size=n1, loc=0., scale=1)
rvs2 = np.random.normal(size=n2, loc=0.5, scale=1.5)

ks_result = kstest(rvs1, 'norm')
print('K-S test result is:', ks_result)

ks_result = kstest(rvs1, rvs2)
print('K-S test result for two samples is:', ks_result)
```
Output
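```
K-S test result is: KstestResult(statistic=0.0691059988535429, pvalue=0.2819866816254518, statistic_location=-0.24273331471460374, statistic_sign=-1)
K-S test result for two samples is: KstestResult(statistic=0.20833333333333334, pvalue=5.1292795977908046e-05, statistic_location=1.079426875125683, statistic_sign=1)
```

For rvs1 against the standard normal distribution, the p-value (about 0.28) is above 0.05, so we cannot reject the null hypothesis. For the two-sample test, the p-value (about 5.1e-05) is far below 0.05, so we reject the hypothesis that rvs1 and rvs2 come from the same distribution. This is consistent with how the two samples were generated.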
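To apply the 0.05 decision rule from Step 4 in code rather than by reading the p-values, you can wrap it in a small helper. The function name `ks_decision` and its message format are hypothetical and only for illustration. The default `alpha=0.05` is the conventional threshold used in this tutorial, not a universal rule.

```python
import numpy as np
from scipy.stats import kstest


def ks_decision(result, alpha=0.05):
    """Hypothetical helper: summarize a K-S result as reject / fail to reject H0."""
    if result.pvalue < alpha:
        return f"D={result.statistic:.3f}, p={result.pvalue:.3g} < {alpha}: reject H0"
    return f"D={result.statistic:.3f}, p={result.pvalue:.3g} >= {alpha}: fail to reject H0"


# Same data as in Step 3.
np.random.seed(12345678)
rvs1 = np.random.normal(size=200, loc=0., scale=1)
rvs2 = np.random.normal(size=300, loc=0.5, scale=1.5)

print('One sample vs N(0, 1):', ks_decision(kstest(rvs1, 'norm')))
print('Two samples:          ', ks_decision(kstest(rvs1, rvs2)))
```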
Conclusion
Python's statistical libraries make it possible to run tests such as the Kolmogorov-Smirnov test in just a few lines of code. Using SciPy's kstest function, we've shown how to set up and run both the one-sample and the two-sample K-S test and how to read their results. This should give you a solid foundation for exploring statistical testing in Python.