In this tutorial, we will learn how to perform the Anderson-Darling test in Python. The Anderson-Darling test is a statistical test used to assess whether a sample of data is consistent with a specific distribution, most commonly the normal distribution.
This test is useful in fields such as finance, engineering, and the social sciences, where it helps check assumptions made about the distribution of the data. We will use Python’s SciPy library to perform the Anderson-Darling test.
1. Install Required Libraries
First, let’s install SciPy using pip (NumPy is installed automatically as a dependency). Open your terminal and run the following command:
    pip install scipy
This will install the SciPy library, which contains various statistical testing functions, including the Anderson-Darling test.
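As a quick sanity check after installation, a short script like the one below can confirm that SciPy imports correctly and that the anderson function is available. This is a minimal sketch rather than a required step, and the version number it prints depends on your environment.

```python
import scipy
from scipy.stats import anderson

# Print the installed version; the exact number depends on your environment.
print("SciPy version:", scipy.__version__)

# Confirm that the function used in this tutorial can be imported and called.
print("anderson is callable:", callable(anderson))
```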
2. Import Libraries
Now that we have installed SciPy, let’s go ahead and import it, as well as the NumPy library.
    import numpy as np
    from scipy.stats import anderson
3. Generate Sample Data
For this tutorial, we will create a sample dataset that approximately follows a normal distribution. You may also use your own dataset as input for the Anderson-Darling test.
    np.random.seed(42)
    sample_data = np.random.normal(loc=0, scale=1, size=100)
Above, we create a sample dataset of size 100 using NumPy’s random.normal function, which draws random samples from a normal distribution with a mean of 0 and a standard deviation of 1. Seeding the generator with np.random.seed(42) makes the sample reproducible.
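Before running the test, it can help to confirm that the generated sample looks as expected. The sketch below reuses the same seed and parameters and prints a few summary statistics. The variable names sample_mean and sample_std are introduced here for illustration only. For a sample of 100 points, the mean and standard deviation should land near 0 and 1 but will not match them exactly.

```python
import numpy as np

# Recreate the tutorial's sample with the same seed.
np.random.seed(42)
sample_data = np.random.normal(loc=0, scale=1, size=100)

# Summary statistics; ddof=1 gives the sample (not population) standard deviation.
sample_mean = sample_data.mean()
sample_std = sample_data.std(ddof=1)

print('Sample size:', sample_data.shape[0])
print('Sample mean:', sample_mean)
print('Sample standard deviation:', sample_std)
```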
4. Perform Anderson-Darling Test
Finally, it’s time to perform the Anderson-Darling test using the anderson function from the SciPy library.
    result = anderson(sample_data, dist='norm')
Here we pass two arguments: the data and the name of the distribution to test against. In this case, we choose 'norm' for the normal distribution. The function returns a result object with the following attributes:
- statistic: the Anderson-Darling test statistic
- critical_values: an array of critical values for the chosen distribution, one per significance level
- significance_level: an array of the significance levels, in percent, that correspond to the critical values
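To make the statistic attribute less of a black box, the sketch below recomputes it by hand from the standard Anderson-Darling formula and compares it with SciPy's value. It assumes that, for dist='norm', the data are standardized with the sample mean and the sample standard deviation (ddof=1) before the formula is applied. The names manual_statistic, y, z, and i are introduced here for illustration.

```python
import numpy as np
from scipy.stats import anderson, norm

np.random.seed(42)
sample_data = np.random.normal(loc=0, scale=1, size=100)

# Sort the sample and standardize it with the estimated mean and
# standard deviation (ddof=1). This estimation choice is an assumption
# about how the 'norm' case is handled.
n = len(sample_data)
y = np.sort(sample_data)
z = (y - y.mean()) / y.std(ddof=1)

# A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [ln F(z_i) + ln(1 - F(z_{n+1-i}))]
# norm.logsf(z[::-1]) supplies the ln(1 - F(z_{n+1-i})) terms in order.
i = np.arange(1, n + 1)
manual_statistic = -n - np.sum((2 * i - 1) * (norm.logcdf(z) + norm.logsf(z[::-1]))) / n

result = anderson(sample_data, dist='norm')
print('Manual statistic:', manual_statistic)
print('SciPy statistic: ', result.statistic)
```

If the two printed values agree, the assumption about how the parameters are estimated holds for your SciPy version.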
5. Analyze the Output
Now let’s analyze the output of the Anderson-Darling test.
    print('Test Statistic:', result.statistic)
    print('Critical Values:', result.critical_values)
    print('Significance Levels:', result.significance_level)
Output:
    Test Statistic: 0.21985364574468682
    Critical Values: [0.555 0.632 0.759 0.885 1.053]
    Significance Levels: [15. 10. 5. 2.5 1. ]
We compare the test statistic with the critical value at each significance level. The null hypothesis is that the data come from the specified distribution. If the test statistic exceeds the critical value at a given significance level, we reject the null hypothesis at that level. If it is smaller, we fail to reject it, which means the data are consistent with the distribution but does not prove that they follow it.
In this example, the test statistic (0.22) is below even the smallest critical value (0.555, at the 15% significance level), so we cannot reject normality at any of the listed levels. The sample shows no evidence of departing from a normal distribution.
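The comparison can also be automated. The sketch below defines a hypothetical helper, summarize, that pairs each significance level with its critical value and a reject/fail-to-reject decision. For contrast, it also tests a clearly non-normal sample drawn from an exponential distribution; the skewed_data sample is an illustration added here, not part of the tutorial's dataset.

```python
import numpy as np
from scipy.stats import anderson

def summarize(result):
    """Hypothetical helper: return (level, critical value, reject?) per level."""
    decisions = []
    for level, critical in zip(result.significance_level, result.critical_values):
        # Reject the null hypothesis when the statistic exceeds the critical value.
        reject = result.statistic > critical
        decisions.append((float(level), float(critical), bool(reject)))
    return decisions

np.random.seed(42)
sample_data = np.random.normal(loc=0, scale=1, size=100)
skewed_data = np.random.exponential(scale=1, size=100)  # strongly right-skewed

for name, data in [('normal', sample_data), ('exponential', skewed_data)]:
    result = anderson(data, dist='norm')
    print(f'{name} sample, statistic = {result.statistic:.3f}')
    for level, critical, reject in summarize(result):
        verdict = 'reject normality' if reject else 'fail to reject normality'
        print(f'  {level:4.1f}%: critical value {critical:.3f} -> {verdict}')
```

The exponential sample should produce a much larger statistic than the normal one, which is what a rejection of normality looks like in practice.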
Full Code
    import numpy as np
    from scipy.stats import anderson

    # Generate Sample Data
    np.random.seed(42)
    sample_data = np.random.normal(loc=0, scale=1, size=100)

    # Perform Anderson-Darling Test
    result = anderson(sample_data, dist='norm')

    # Analyze the Output
    print('Test Statistic:', result.statistic)
    print('Critical Values:', result.critical_values)
    print('Significance Levels:', result.significance_level)
Conclusion
In this tutorial, we discussed the Anderson-Darling test and how it can be used to check whether data are consistent with a specific distribution, particularly the normal distribution. We demonstrated how to perform this test using Python’s SciPy library. By comparing the test statistic with the critical values, we can decide at each significance level whether to reject the hypothesis that the data follow a normal distribution.