In this tutorial, we will learn how to count unique values in a Pandas DataFrame. Counting unique values is a common operation in data analysis, for example when checking how many distinct categories a column contains, cleaning data, or aggregating it.
To follow along with this tutorial, you should have a basic understanding of Python and the Pandas library. If you’re new to Pandas, consider checking out the official 10 Minutes to Pandas guide for a quick introduction.
Step 1: Import Libraries and Create Sample Data
First, let’s import Pandas and create a sample DataFrame to work with. The DataFrame contains information about employees and their departments:
    import pandas as pd

    data = {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Heidi', 'Ivan', 'Judy'],
        'Department': ['HR', 'IT', 'HR', 'IT', 'HR', 'IT', 'HR', 'IT', 'HR', 'IT']
    }

    df = pd.DataFrame(data)
    print(df)

Output:

          Name  Department
    0    Alice          HR
    1      Bob          IT
    2  Charlie          HR
    3    David          IT
    4      Eve          HR
    5    Frank          IT
    6    Grace          HR
    7    Heidi          IT
    8     Ivan          HR
    9     Judy          IT
Step 2: Count Unique Values with Pandas
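Now that we have our sample DataFrame, we can start counting unique values. To do this, we will use the Pandas nunique() method. Called on a Series, it returns the number of distinct values as an integer; called on a DataFrame, it returns a Series with one count per column. Missing values (NaN) are excluded from the count by default.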
Let’s count the unique values in the Department column:
    unique_departments = df['Department'].nunique()
    print(f"Number of unique departments: {unique_departments}")

Output:

    Number of unique departments: 2
As we can see, there are two unique values (HR and IT) in the Department column.
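Real data often contains missing entries, and it helps to know how nunique() treats them. The following is a minimal sketch, assuming a hypothetical version of the Department column with one missing value (this Series is not part of the tutorial's sample data). By default NaN is ignored; passing dropna=False counts it as a distinct value.

```python
import numpy as np
import pandas as pd

# Hypothetical Department column with one missing entry (illustration only).
departments = pd.Series(['HR', 'IT', 'HR', np.nan, 'IT'])

# Default behaviour (dropna=True): NaN is not counted.
count_without_nan = departments.nunique()

# dropna=False: NaN is counted as one additional distinct value.
count_with_nan = departments.nunique(dropna=False)

print(f"Ignoring NaN: {count_without_nan}")   # 2
print(f"Counting NaN: {count_with_nan}")      # 3
```

Which variant is appropriate depends on whether a missing department should be treated as a category of its own in your analysis.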
Step 3: Counting Unique Values for Each Column
What if we want to count unique values for each column in our DataFrame? We can achieve this by calling the nunique() method on the entire DataFrame, as shown below:
    unique_values = df.nunique()
    print("Unique values in each column:")
    print(unique_values)

Output:

    Unique values in each column:
    Name          10
    Department     2
    dtype: int64
This output tells us that there are 10 unique names and 2 unique departments in our DataFrame.
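A related question that comes up in aggregation tasks is how many unique values one column has within each group of another. The sketch below goes slightly beyond the steps above: it combines groupby() with nunique() to count distinct names per department, rebuilding the same sample data so that it runs on its own. The variable name names_per_department is only illustrative.

```python
import pandas as pd

# Same sample data as in Step 1, repeated so this snippet is self-contained.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve',
             'Frank', 'Grace', 'Heidi', 'Ivan', 'Judy'],
    'Department': ['HR', 'IT', 'HR', 'IT', 'HR', 'IT', 'HR', 'IT', 'HR', 'IT'],
})

# For each department, count how many distinct names appear in it.
names_per_department = df.groupby('Department')['Name'].nunique()
print(names_per_department)
```

With this data, each of the two departments contains five distinct names.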
Step 4: Getting a List of Unique Values
In some cases, we might want the unique values themselves rather than just the count. To get them, we can use the unique() method, which returns the distinct values as an array in order of first appearance:
    unique_departments_list = df['Department'].unique()
    print("Unique departments:")
    print(unique_departments_list)

Output:

    Unique departments:
    ['HR' 'IT']
With this, we have successfully extracted the unique values in the Department column.
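Because unique() returns an array rather than a Python list, a conversion step is needed when a plain list is required. The sketch below shows one way to do that with list(), and also uses value_counts(), a method not covered in the steps above, to show how often each unique value occurs. It assumes the same Department values as the sample data.

```python
import pandas as pd

# Department column from the sample data, rebuilt as a standalone Series.
departments = pd.Series(
    ['HR', 'IT', 'HR', 'IT', 'HR', 'IT', 'HR', 'IT', 'HR', 'IT'],
    name='Department',
)

# unique() returns an array; list() turns it into a plain Python list.
unique_list = list(departments.unique())
print(unique_list)

# value_counts() returns each distinct value together with its frequency.
counts = departments.value_counts()
print(counts)
```

Here the list holds 'HR' and 'IT' in order of first appearance, and each value occurs five times.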
Full Code
    import pandas as pd

    data = {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Heidi', 'Ivan', 'Judy'],
        'Department': ['HR', 'IT', 'HR', 'IT', 'HR', 'IT', 'HR', 'IT', 'HR', 'IT']
    }

    df = pd.DataFrame(data)
    print(df)

    unique_departments = df['Department'].nunique()
    print(f"Number of unique departments: {unique_departments}")

    unique_values = df.nunique()
    print("Unique values in each column:")
    print(unique_values)

    unique_departments_list = df['Department'].unique()
    print("Unique departments:")
    print(unique_departments_list)
Conclusion
In this tutorial, we have learned how to count unique values in a Pandas DataFrame. We covered the nunique() method, which counts unique values in a single column or in every column of a DataFrame, and the unique() method, which returns the unique values themselves. With these methods, you can quickly inspect and summarize the contents of your DataFrames. Happy coding!