Counting rows in datasets is a fundamental step in data analysis. Often, you may need to know how many rows you are dealing with, whether to understand the size of your dataset or to pre-process your data.
Two common ways to do this in Python are the pandas library and the built-in csv module. This tutorial walks through both.
Example Dataset
Say you have a CSV file named ‘data.csv’ with the following content:
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11,12
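If you want to follow along, here is a minimal sketch that writes the example file to disk. It assumes you are happy to create data.csv in the current working directory; the variable names sample and csv_path are only illustrative.

```python
from pathlib import Path

# Sample content from the tutorial: one header row followed by three data rows.
sample = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11,12\n"

# Assumption: the file is written to the current working directory.
csv_path = Path("data.csv")
csv_path.write_text(sample, encoding="utf-8")
```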
Method 1: Using the Pandas Library
Pandas is a high-level data manipulation library. It is built on the NumPy package, and its key data structure is the DataFrame. Every column in a DataFrame has the same length, so the number of rows is simply the length of the DataFrame's index.
First, import the pandas library. The alias “pd” is the conventional short name for pandas.
    import pandas as pd
Then, you read the CSV file using pandas:
    data = pd.read_csv('data.csv')
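Finally, use the built-in Python function len() to count the number of rows. By default, read_csv treats the first line of the file as the header, so it is not counted; for the example file the result is 3:
    number_of_rows = len(data)
    print('Number of rows:', number_of_rows)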
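As a quick check, the sketch below shows that len(data), data.shape[0], and len(data.index) all report the same row count. To keep the example self-contained, it reads the sample content from an in-memory string through io.StringIO rather than from data.csv; that substitution is an assumption made for illustration only.

```python
import io

import pandas as pd

# In-memory copy of the example file (stands in for data.csv).
csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11,12\n"
data = pd.read_csv(io.StringIO(csv_text))

rows_via_len = len(data)          # length of the DataFrame
rows_via_shape = data.shape[0]    # first element of the (rows, columns) tuple
rows_via_index = len(data.index)  # length of the row index

print(rows_via_len, rows_via_shape, rows_via_index)  # prints: 3 3 3
```

Which of the three you use is mostly a matter of taste; len(data) is the shortest, while data.shape also gives you the column count.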
Method 2: Using the CSV Module
The CSV module in Python implements classes to read and write tabular data in CSV format. Here is how you can use it to count the number of rows.
First, import the csv module:
    import csv
Next, open your file and create a csv reader object:
    # newline='' is the setting the csv module documentation recommends for file objects
    with open('data.csv', 'r', newline='') as csv_file:
        csv_reader = csv.reader(csv_file)
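Finally, count the rows by iterating over the csv reader object. This line must stay inside the with block, because the file is closed once the block ends. Note that csv.reader treats the header as an ordinary row, so the example file gives 4; subtract 1 if you only want the data rows:
        number_of_rows = sum(1 for row in csv_reader)
    print('Number of rows:', number_of_rows)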
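Because csv.reader counts the header line like any other row, you may want to skip it before counting. The sketch below shows one way to do that; the helper name count_data_rows and its has_header parameter are hypothetical, and the sample content is read from an in-memory string instead of data.csv so the example runs on its own.

```python
import csv
import io


def count_data_rows(csv_file, has_header=True):
    """Count rows in an open CSV file object, optionally skipping the header."""
    reader = csv.reader(csv_file)
    if has_header:
        # Consume the first row; the default None avoids an error on an empty file.
        next(reader, None)
    return sum(1 for _ in reader)


# In-memory copy of the example file (stands in for an opened data.csv).
sample = io.StringIO("a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11,12\n")

data_rows = count_data_rows(sample)                    # header skipped
sample.seek(0)                                         # rewind before reading again
all_rows = count_data_rows(sample, has_header=False)   # header counted

print(data_rows, all_rows)  # prints: 3 4
```

With a real file, you would pass the object returned by open('data.csv', newline='') instead of the StringIO object.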
Full Code
Pandas Method:
    import pandas as pd

    data = pd.read_csv('data.csv')

    # len(data) counts data rows only; the header line is not included
    number_of_rows = len(data)
    print('Number of rows:', number_of_rows)
CSV Module Method:
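    import csv

    # newline='' is the setting the csv module documentation recommends for file objects
    with open('data.csv', 'r', newline='') as csv_file:
        csv_reader = csv.reader(csv_file)
        # Counts every row in the file, including the header
        number_of_rows = sum(1 for row in csv_reader)

    print('Number of rows:', number_of_rows)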
Conclusion
Counting the number of rows in a dataset is a common task when dealing with data in Python. You learned how to count rows using the pandas library and the csv module. You can give it a try with your own datasets too!