In this tutorial, we will explore how to test for ‘Not a Number’ (NaN) values in Python. NaN values usually arise from missing data or from undefined floating-point operations such as 0/0 in NumPy (plain Python raises ZeroDivisionError for 0/0 instead).
They can cause a variety of problems if not properly handled when performing calculations or data analysis.
Python provides several methods for identifying these NaN values within a dataset, so let’s get started!
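As a quick illustration of what "several methods" can mean for a single value, the sketch below checks one NaN with the standard library, NumPy, and pandas. The variable name `value` is only an illustrative assumption.

```python
import math

import numpy as np
import pandas as pd

value = float("nan")  # illustrative NaN value

# NaN is the only float that is not equal to itself
print(value != value)      # True

# Standard-library, NumPy, and pandas checks for a single value
print(math.isnan(value))   # True
print(np.isnan(value))     # True
print(pd.isna(value))      # True

# Unlike math.isnan and np.isnan, pd.isna also accepts None
print(pd.isna(None))       # True
```

For whole columns or tables, the pandas approach covered in the steps below is usually more convenient than checking values one at a time.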
Step 1: Import Necessary Libraries
The primary library for this tutorial is pandas. It provides the isnull function (also available under the name isna), which returns True for each NaN or None value in a given Series or DataFrame.
```python
import pandas as pd
```
Step 2: Create a DataFrame with NaN Values
Let’s create an example DataFrame with NaN values to work on. We can create NaN values using NumPy's np.nan constant.
```python
import numpy as np

data = {'A': [1, 2, np.nan], 'B': [5, np.nan, 1], 'C': [1, 2, 3]}
df = pd.DataFrame(data)
```
Step 3: Use the pandas isnull Function
Now we will use the isnull function, which returns a Boolean value for each element in the DataFrame or Series. If the element is NaN or None, isnull returns True; otherwise, it returns False.
```python
print(df.isnull())
```
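The same check works on a single column, and it treats None the same way as NaN. Here is a minimal sketch, assuming a small illustrative Series named `s` that is not part of the tutorial's DataFrame:

```python
import numpy as np
import pandas as pd

# Illustrative Series mixing NaN, None, and ordinary values
s = pd.Series([1.0, np.nan, None, 4.0])

mask = s.isnull()                 # True where the value is missing
print(mask.tolist())              # [False, True, True, False]

# isna is an alias of isnull; notnull gives the inverse mask
print(s.isna().equals(mask))      # True
print(s.notnull().tolist())       # [True, False, False, True]
```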
Full Code
```python
import pandas as pd
import numpy as np

# Create a DataFrame
data = {'A': [1, 2, np.nan], 'B': [5, np.nan, 1], 'C': [1, 2, 3]}
df = pd.DataFrame(data)

# Use isnull function to check for NaN values
print(df.isnull())
```
Output
```
       A      B      C
0  False  False  False
1  False   True  False
2   True  False  False
```
From the output above, the True values represent NaN or None values found in the DataFrame.
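On larger DataFrames, reading a full table of True and False values quickly becomes impractical. One possible way to summarise the same mask is sketched below. The names `nan_per_column` and `has_any_nan` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Same data as the tutorial's DataFrame
data = {'A': [1, 2, np.nan], 'B': [5, np.nan, 1], 'C': [1, 2, 3]}
df = pd.DataFrame(data)

# Summing a Boolean mask counts the True values in each column
nan_per_column = df.isnull().sum()
print(nan_per_column.tolist())          # [1, 1, 0] for columns A, B, C

# True if at least one value anywhere in the DataFrame is missing
has_any_nan = df.isnull().values.any()
print(has_any_nan)                      # True
```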
Conclusion
This tutorial has shown you how to test for NaN values in Python using the pandas isnull function. Always handle these values in your datasets, as they can cause unexpected results during data analysis or when developing machine learning models.
Understanding and properly handling NaN values is necessary for developing more accurate models.