When preparing data for analysis or machine learning, it is common to come across CSV files or datasets that contain empty (null) records.
These empty rows can disrupt an analysis or a machine learning model, so they usually need to be removed during data cleaning. This tutorial shows how to delete them in Python with Pandas. NumPy is used only to supply the np.nan missing-value marker.
Step 1: Installing Required Libraries
We need both Pandas and NumPy for this task. To install these libraries, run these commands:
    pip install pandas numpy
Step 2: Importing Libraries
    import pandas as pd
    import numpy as np
Step 3: Creating a Dataframe
Let’s create a dataframe for our tutorial with some empty rows.
    data = {'Name': ['John', 'Anna', np.nan, 'Steve'],
            'Age': [27, np.nan, np.nan, 23],
            'Gender': ['Male', np.nan, np.nan, 'Male']}
    df = pd.DataFrame(data)
    print(df)
The output will be:
        Name   Age  Gender
    0   John  27.0    Male
    1   Anna   NaN     NaN
    2    NaN   NaN     NaN
    3  Steve  23.0    Male
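Before deleting anything, it can help to check which rows are actually empty. The following is a minimal sketch, assuming the same sample data as above. The variable names empty_mask, empty_count and empty_labels are illustrative and not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame({
    'Name': ['John', 'Anna', np.nan, 'Steve'],
    'Age': [27, np.nan, np.nan, 23],
    'Gender': ['Male', np.nan, np.nan, 'Male'],
})

# Boolean Series: True for rows in which every column is missing
empty_mask = df.isna().all(axis=1)

# Number of fully empty rows and their index labels
empty_count = int(empty_mask.sum())
empty_labels = df.index[empty_mask].tolist()

print(empty_count)   # 1
print(empty_labels)  # [2]
```

Only the row with index 2 is flagged. The row for Anna has two missing values but still carries a name, so it does not count as empty.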
Step 4: Deleting Empty Rows
Now, let’s remove the empty rows using the dropna method from pandas.
    df = df.dropna(how='all')
    print(df)
The output will be:
        Name   Age  Gender
    0   John  27.0    Male
    1   Anna   NaN     NaN
    3  Steve  23.0    Male
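As we can see, the third row (index 2), which was entirely empty, has been removed. The row for Anna is kept because how='all' drops a row only when every column is missing. The remaining rows keep their original index labels, which is why the index jumps from 1 to 3.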
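Depending on how strict the cleaning should be, other dropna options may fit better than how='all'. The sketch below, again assuming the tutorial's sample data, compares a few of them. The result names (any_missing, name_required, at_least_two, renumbered) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', np.nan, 'Steve'],
    'Age': [27, np.nan, np.nan, 23],
    'Gender': ['Male', np.nan, np.nan, 'Male'],
})

# Drop rows that have at least one missing value
any_missing = df.dropna(how='any')

# Drop rows only when the 'Name' column is missing
name_required = df.dropna(subset=['Name'])

# Keep rows that have at least two non-missing values
at_least_two = df.dropna(thresh=2)

# Drop fully empty rows, then renumber the index from 0
renumbered = df.dropna(how='all').reset_index(drop=True)

print(any_missing.index.tolist())    # [0, 3]
print(name_required.index.tolist())  # [0, 1, 3]
print(at_least_two.index.tolist())   # [0, 3]
print(renumbered.index.tolist())     # [0, 1, 2]
```

Note that how='any' and thresh=2 both remove the row for Anna, so how='all' is the conservative choice when partially filled rows should be kept.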
Full Python Code
    import pandas as pd
    import numpy as np

    # Create DataFrame
    data = {'Name': ['John', 'Anna', np.nan, 'Steve'],
            'Age': [27, np.nan, np.nan, 23],
            'Gender': ['Male', np.nan, np.nan, 'Male']}
    df = pd.DataFrame(data)

    print("Before:")
    print(df)

    # Delete empty rows
    df = df.dropna(how='all')

    print("\nAfter:")
    print(df)
The output will be:

    Before:
        Name   Age  Gender
    0   John  27.0    Male
    1   Anna   NaN     NaN
    2    NaN   NaN     NaN
    3  Steve  23.0    Male

    After:
        Name   Age  Gender
    0   John  27.0    Male
    1   Anna   NaN     NaN
    3  Steve  23.0    Male
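When the data comes from a CSV file, a row can look empty yet contain only whitespace, which dropna does not treat as missing. The following is a rough sketch under stated assumptions: the CSV text is invented for illustration and read from an in-memory string rather than a real file, and whitespace-only cells are converted to NaN with a regular expression before dropping.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content: one row of empty fields, one row with only spaces
csv_text = (
    "Name,Age,Gender\n"
    "John,27,Male\n"
    ",,\n"
    "Anna,,\n"
    "  ,,\n"
    "Steve,23,Male\n"
)

raw = pd.read_csv(io.StringIO(csv_text))

# Turn cells that are empty or whitespace-only strings into NaN
cleaned = raw.replace(r'^\s*$', np.nan, regex=True)

# Now every visually empty row is all-NaN and can be dropped
cleaned = cleaned.dropna(how='all').reset_index(drop=True)

print(cleaned['Name'].tolist())
```

Calling dropna(how='all') directly on raw would remove only the row of empty fields, because the whitespace-only name still counts as a value.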
Conclusion
Cleaning data is a critical step in data analysis projects, and it often requires deleting empty records or rows.
In Python, Pandas makes this straightforward, with NumPy supplying the np.nan marker for missing values.
A single call to dropna(how='all') removes every row that is entirely empty. Keeping the data clean and ready for analysis is good practice and helps ensure reliable results.