Age binning is the process of dividing a set of ages into discrete intervals. Binned ages are useful in data analysis and visualization, demographic studies, and as features for machine learning models. In this tutorial, you’ll learn how to bin ages in Python using the Pandas and NumPy libraries.
Step 1: Importing Necessary Libraries
First, we need to import the necessary libraries. We’ll use Pandas for data manipulation and NumPy to generate the sample data.
    import pandas as pd
    import numpy as np
Step 2: Creating Age Data
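Let’s create some sample age data using NumPy. The code below draws 100 random integer ages from 10 to 99 (the upper bound passed to randint is exclusive) and stores them in a DataFrame. Seeding the random generator makes the results reproducible.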
    np.random.seed(10)
    age_data = np.random.randint(10, 100, 100)
    df = pd.DataFrame(age_data, columns=['Age'])
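As a quick sanity check on the generated data, the following minimal sketch rebuilds the same DataFrame and prints its shape and value range. It only assumes the seed and randint arguments used in this step.

```python
import numpy as np
import pandas as pd

np.random.seed(10)  # same seed as in the tutorial
# randint's lower bound is inclusive and its upper bound is exclusive,
# so the ages fall between 10 and 99.
age_data = np.random.randint(10, 100, 100)
df = pd.DataFrame(age_data, columns=['Age'])

print(df.shape)                          # number of rows and columns
print(df['Age'].min(), df['Age'].max())  # smallest and largest age drawn
```

Checking the range up front helps confirm that every value will fall inside the bin edges chosen in the next step.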
Step 3: Defining Bins and Labels
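Next, we define the age bins we want to use: 0-20, 21-40, 41-60, 61-80, and 81-100. The bins are given as a single list of edges, and we also define one label per bin. By default, the cut function in Pandas treats each bin as open on the left and closed on the right, so the edges 0, 20, 40, ... produce the intervals (0, 20], (20, 40], and so on. An age of exactly 0 would therefore fall outside the first bin unless include_lowest=True is passed.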
    age_bins = [0, 20, 40, 60, 80, 100]
    labels = ['0-20', '21-40', '41-60', '61-80', '81-100']
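To see how the bin edges behave at the boundaries, here is a small sketch that applies the bins above to a few hand-picked ages. The edge_ages values are hypothetical and chosen only to probe the edges; include_lowest is a standard pd.cut parameter.

```python
import pandas as pd

age_bins = [0, 20, 40, 60, 80, 100]
labels = ['0-20', '21-40', '41-60', '61-80', '81-100']

# Hypothetical ages sitting exactly on or next to bin edges.
edge_ages = pd.Series([0, 20, 21, 100])

# Default behaviour: intervals are (0, 20], (20, 40], ...
# so 0 lies outside every bin and becomes NaN.
default_cut = pd.cut(edge_ages, bins=age_bins, labels=labels)

# include_lowest=True closes the first interval on the left,
# so 0 is assigned to the '0-20' bin.
inclusive_cut = pd.cut(edge_ages, bins=age_bins, labels=labels,
                       include_lowest=True)

print(default_cut.tolist())
print(inclusive_cut.tolist())
```

The sample data in this tutorial starts at 10, so the default is sufficient here, but include_lowest=True is worth considering if an age of 0 can occur in your data.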
Step 4: Using Pandas’ cut function
We’ll use the cut function in Pandas to assign each age to one of these bins. It takes the values to bin, the bin edges, and the labels, and returns a categorical column in which each age is replaced by the label of the bin it falls into.
    df['Age Range'] = pd.cut(df['Age'], bins=age_bins, labels=labels)
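Once the ages are binned, a common next step is to count how many fall into each range. The sketch below is one way to do that with value_counts; it rebuilds the tutorial’s data so that it runs on its own, and the counts variable name is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the tutorial's sample data and bins.
np.random.seed(10)
df = pd.DataFrame(np.random.randint(10, 100, 100), columns=['Age'])
age_bins = [0, 20, 40, 60, 80, 100]
labels = ['0-20', '21-40', '41-60', '61-80', '81-100']
df['Age Range'] = pd.cut(df['Age'], bins=age_bins, labels=labels)

# Count the ages in each bin; sort_index() orders the result by bin
# (the categories are ordered) rather than by frequency.
counts = df['Age Range'].value_counts().sort_index()
print(counts)
```

Because pd.cut returns an ordered categorical, sorting by index keeps the bins in their natural order from youngest to oldest, which is usually what you want for a bar chart or summary table.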
Step 5: Displaying the Binned Data
Finally, let’s print the first ten rows of our DataFrame to see the results.
    print(df.head(10))
Full Code
    import pandas as pd
    import numpy as np

    np.random.seed(10)
    age_data = np.random.randint(10, 100, 100)
    df = pd.DataFrame(age_data, columns=['Age'])

    age_bins = [0, 20, 40, 60, 80, 100]
    labels = ['0-20', '21-40', '41-60', '61-80', '81-100']

    df['Age Range'] = pd.cut(df['Age'], bins=age_bins, labels=labels)

    print(df.head(10))
Output:
       Age Age Range
    0   19      0-20
    1   25     21-40
    2   74     61-80
    3   38     21-40
    4   99    81-100
    5   39     21-40
    6   18      0-20
    7   83    81-100
    8   10      0-20
    9   50     41-60
Conclusion
Binning ages (or any numerical data) is a useful technique in data analysis when you need to group numerical values into ranges. The cut function in the pandas library makes this straightforward. Hope you found this tutorial helpful. Happy coding!