In this tutorial, we will learn how to categorize ages into groups in Python. This is useful in data analysis, machine learning, and statistical modeling, where grouping individuals of similar age lets you compare and summarize them by group. We will use the pandas library to handle the data.
Step 1: Import pandas library
First, make sure you have the pandas library installed in your Python environment. If you don’t have it, you can install it using pip:
```bash
pip install pandas
```
Now, we will import the pandas library.
```python
import pandas as pd
```
Step 2: Create a sample dataset
We will create a sample dataset containing the ages of different individuals. You can replace this dataset with your own data if needed.
```python
data = {'Name': ['John', 'Paul', 'George', 'Ringo', 'Mick', 'Keith', 'Charlie', 'Ronnie'],
        'Age': [40, 77, 58, 80, 77, 78, 80, 74]}
df = pd.DataFrame(data)
```
Our sample dataset looks like this:
```
      Name  Age
0     John   40
1     Paul   77
2   George   58
3    Ringo   80
4     Mick   77
5    Keith   78
6  Charlie   80
7   Ronnie   74
```
Step 3: Define age categories and create a new column
Next, we need to define the age categories that we want to group our data into. In this example, we will use the following categories:
– Youth: 0 to 17 years
– Adult: 18 to 64 years
– Senior: 65 years and above
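We can use the pd.cut() function from pandas to sort the ages into these categories. By default, each bin excludes its left edge and includes its right edge, so the edges 0, 17, 64, and infinity produce the intervals (0, 17], (17, 64], and (64, inf). Passing include_lowest=True also closes the first interval on the left, so an age of 0 is counted as Youth instead of being left unassigned.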
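```python
# Bin edges: (0, 17], (17, 64], (64, inf); include_lowest=True also admits age 0
bins = [0, 17, 64, float('inf')]
labels = ['Youth', 'Adult', 'Senior']
df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, include_lowest=True)
```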
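The bin edges above work because the ages are whole numbers: a fractional age such as 17.5 is greater than 17, so it would fall into the Adult bin. As a minimal sketch of one alternative, assuming you want the boundaries to sit exactly at ages 18 and 65, you can pass right=False so that each bin includes its left edge and excludes its right one. The ages below are made up for illustration and are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical ages, including fractional and boundary values
ages = pd.Series([0, 17.5, 18, 64.9, 65])

# With right=False the bins are [0, 18), [18, 65), [65, inf)
bins = [0, 18, 65, float('inf')]
labels = ['Youth', 'Adult', 'Senior']
groups = pd.cut(ages, bins=bins, labels=labels, right=False)

print(groups.tolist())
```

With this variant an age of 0 belongs to the first bin without include_lowest, because the left edge of every bin is already included.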
Step 4: Display the updated dataset
Our dataset will now have an additional column showing the age group each individual belongs to. Let’s display the updated dataset.
```python
print(df)
```
The output will look like this:
```
      Name  Age AgeGroup
0     John   40    Adult
1     Paul   77   Senior
2   George   58    Adult
3    Ringo   80   Senior
4     Mick   77   Senior
5    Keith   78   Senior
6  Charlie   80   Senior
7   Ronnie   74   Senior
```
Each row is now labeled with the age group it belongs to.
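Once the AgeGroup column exists, a common next step is to summarize the data by group. The following is a small sketch, not part of the original walkthrough, that rebuilds the same DataFrame and then counts the rows and averages the ages in each group. The variable names counts and mean_age are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Paul', 'George', 'Ringo', 'Mick', 'Keith', 'Charlie', 'Ronnie'],
    'Age': [40, 77, 58, 80, 77, 78, 80, 74],
})
df['AgeGroup'] = pd.cut(df['Age'], bins=[0, 17, 64, float('inf')],
                        labels=['Youth', 'Adult', 'Senior'], include_lowest=True)

# Count rows per category, then order the result by category (Youth, Adult, Senior)
counts = df['AgeGroup'].value_counts().sort_index()
print(counts)

# Mean age per group; observed=False keeps categories that have no rows
mean_age = df.groupby('AgeGroup', observed=False)['Age'].mean()
print(mean_age)
```

Because AgeGroup is a categorical column, the Youth category still appears in the counts with a value of 0 even though no one in the sample falls into it.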
Full code
Below is the full code for this tutorial:
```python
import pandas as pd

# Sample data: names and ages
data = {'Name': ['John', 'Paul', 'George', 'Ringo', 'Mick', 'Keith', 'Charlie', 'Ronnie'],
        'Age': [40, 77, 58, 80, 77, 78, 80, 74]}
df = pd.DataFrame(data)

# Bin edges: (0, 17], (17, 64], (64, inf); include_lowest=True also admits age 0
bins = [0, 17, 64, float('inf')]
labels = ['Youth', 'Adult', 'Senior']
df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, include_lowest=True)

print(df)
```
Output
```
      Name  Age AgeGroup
0     John   40    Adult
1     Paul   77   Senior
2   George   58    Adult
3    Ringo   80   Senior
4     Mick   77   Senior
5    Keith   78   Senior
6  Charlie   80   Senior
7   Ronnie   74   Senior
```
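The full code above hard-codes the bins and labels. If you need to apply different groupings to several datasets, one option is to wrap the pd.cut() call in a small function. The sketch below assumes a hypothetical helper named add_age_group; its name, parameters, and the example bins are illustrative and do not come from pandas or from the tutorial.

```python
import pandas as pd

def add_age_group(df, bins, labels, age_col='Age', out_col='AgeGroup'):
    """Return a copy of df with an age-group column (hypothetical helper)."""
    if len(labels) != len(bins) - 1:
        raise ValueError('need exactly one label per bin')
    result = df.copy()  # leave the caller's DataFrame untouched
    result[out_col] = pd.cut(result[age_col], bins=bins, labels=labels,
                             include_lowest=True)
    return result

people = pd.DataFrame({'Name': ['John', 'Paul', 'George'], 'Age': [40, 77, 58]})

# Illustrative bins, different from the Youth/Adult/Senior split used above
custom = add_age_group(people, bins=[0, 49, 69, float('inf')],
                       labels=['Under 50', '50 to 69', '70 and over'])
print(custom)
```

Returning a copy keeps the input DataFrame unchanged, which makes it easy to try several binning schemes on the same data.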
Conclusion
In this tutorial, we learned how to categorize ages into groups in Python using pd.cut() from the pandas library. The same approach applies to any dataset with an age column, and you can adjust the bins and labels to suit your data. Once each row carries an age group, you can count, filter, or aggregate by that group in further analysis.