How To Create Class Intervals In Python

In this tutorial, we will learn how to create class intervals in Python. Class intervals are used in statistics to group data into specific ranges. They make a dataset easier to summarize, for example as a frequency table or a histogram. In Python, we can use libraries such as NumPy and Pandas for this purpose.

For this tutorial, we’ll use a small dataset containing the marks of five students in one subject.

Example:

Save the following data in a file named marks.csv. The first, unnamed column holds the row index:

,Marks
0,45
1,67
2,80
3,35
4,90
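If you prefer to create the file from Python, here is a minimal sketch. It relies on the fact that DataFrame.to_csv() writes the index as an unnamed first column by default, which reproduces the layout above.

```python
import pandas as pd

# Build the sample marks; the default RangeIndex supplies the 0..4 row labels.
df = pd.DataFrame({"Marks": [45, 67, 80, 35, 90]})

# to_csv() writes the index as the first column with an empty header,
# giving the ",Marks" header line shown above.
df.to_csv("marks.csv")

with open("marks.csv") as f:
    print(f.read())
```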

Step 1: Import the Required Libraries

Before we begin, make sure that both NumPy and Pandas are installed (for example, with pip install numpy pandas). Then import them at the top of your script.
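The imports usually look like this; the np and pd aliases are the conventional names used throughout the rest of this tutorial.

```python
# Install once from a shell if needed:  pip install numpy pandas
import numpy as np
import pandas as pd

# Printing the versions is a quick check that both libraries are available.
print("NumPy", np.__version__)
print("Pandas", pd.__version__)
```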

Step 2: Load the Data

Next, load the data from marks.csv into a Pandas DataFrame, a two-dimensional labeled data structure whose columns can hold different types. Pass index_col=0 to pd.read_csv() so that the unnamed first column becomes the row index instead of an extra "Unnamed: 0" column.
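A minimal sketch of the loading step. To keep it self-contained, it reads the same CSV text from an in-memory buffer (io.StringIO); with the real file on disk, pass "marks.csv" as the first argument instead.

```python
import io

import pandas as pd

# Same contents as marks.csv.
csv_text = """,Marks
0,45
1,67
2,80
3,35
4,90
"""

# index_col=0 turns the unnamed first column into the row index,
# so Marks is the only data column left.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
# With the file on disk: df = pd.read_csv("marks.csv", index_col=0)

print(df)
```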

Output:

   Marks
0     45
1     67
2     80
3     35
4     90

Step 3: Define Class Intervals

This step defines the class intervals (or bins) for your data as a sequence of edges: n intervals need n + 1 edges. You can choose the edges manually or compute them with the numpy.histogram_bin_edges() function. We demonstrate both methods below.

Manual Method:
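A minimal sketch of the manual method. The edges below (10-mark bands from 30 to 100) are an illustrative choice; pick whatever ranges are meaningful for your data. Any list of edges like this can be passed to pd.cut() in Step 4 in place of the NumPy edges.

```python
import numpy as np

marks = [45, 67, 80, 35, 90]

# Illustrative hand-picked edges: seven 10-mark classes need eight edges.
bins = [30, 40, 50, 60, 70, 80, 90, 100]

# The same evenly spaced edges via np.arange (the stop value 101 is exclusive).
bins_arange = np.arange(30, 101, 10)

# pd.cut() uses right-closed intervals by default, so every mark must be
# greater than the first edge and at most the last edge, or it becomes NaN.
assert bins[0] < min(marks) and max(marks) <= bins[-1]

print(bins)
print(bins_arange)
```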

Method Using NumPy:

The numpy.histogram_bin_edges() function computes equal-width bins between the dataset’s minimum and maximum values for a given bin count. In this example, we use five bins, so each class is (90 - 35) / 5 = 11 marks wide.
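A minimal sketch of the NumPy method. To stay self-contained, it builds the DataFrame inline rather than reading marks.csv.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Marks": [45, 67, 80, 35, 90]})

# Five equal-width bins spanning the minimum (35) to the maximum (90) mark.
bins = np.histogram_bin_edges(df["Marks"], bins=5)
print(bins)
```

The bins argument also accepts a string such as "auto" or "sturges", in which case NumPy chooses the number of bins itself instead of using a fixed count.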

Output:

[35. 46. 57. 68. 79. 90.]

Step 4: Categorize the Data into Classes

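Now that the class intervals are defined, assign each mark to its class with the Pandas cut() function. By default, cut() builds right-closed intervals of the form (a, b], so a value equal to the very first edge falls outside every class and becomes NaN. When the first edge equals the minimum mark, as with the NumPy edges, pass include_lowest=True so that value stays in the first class.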
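A minimal sketch of this step, assuming the five NumPy edges from Step 3. The include_lowest=True argument is what keeps the minimum mark (35) from becoming NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Marks": [45, 67, 80, 35, 90]})
bins = np.histogram_bin_edges(df["Marks"], bins=5)

# Right-closed intervals by default; include_lowest=True extends the first
# interval slightly to the left so that it also contains the minimum (35).
df["Class"] = pd.cut(df["Marks"], bins=bins, include_lowest=True)
print(df)
```

The resulting Class column is categorical, and its categories are all five intervals, including any that no mark falls into. That matters for counting in Step 5.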

Output:

   Marks           Class
0     45  (34.999, 46.0]
1     67    (57.0, 68.0]
2     80    (79.0, 90.0]
3     35  (34.999, 46.0]
4     90    (79.0, 90.0]

Step 5: Count the Data in Each Class

Finally, count the number of data points in each class with the DataFrame groupby() and size() methods. Because Class is a categorical column, pass observed=False so that empty classes appear in the result with a count of 0.
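A minimal sketch of the counting step, repeating the earlier steps inline so it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Marks": [45, 67, 80, 35, 90]})
bins = np.histogram_bin_edges(df["Marks"], bins=5)
df["Class"] = pd.cut(df["Marks"], bins=bins, include_lowest=True)

# observed=False keeps every category, so classes with no marks show 0.
counts = df.groupby("Class", observed=False).size()
print(counts)
```

The counts always add up to the number of rows, which is a quick sanity check that no mark was left outside the intervals.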

Output:

Class
(34.999, 46.0]    2
(46.0, 57.0]      0
(57.0, 68.0]      1
(68.0, 79.0]      0
(79.0, 90.0]      2
dtype: int64

Full Code:
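A sketch that puts the steps together, assuming the NumPy bins from Step 3. The first line writes the sample marks.csv so that the script runs on its own; skip it if you already have the file.

```python
import numpy as np
import pandas as pd

# Optional: write the sample data so the script is self-contained.
pd.DataFrame({"Marks": [45, 67, 80, 35, 90]}).to_csv("marks.csv")

# Step 2: load the data; index_col=0 uses the unnamed first column as the index.
df = pd.read_csv("marks.csv", index_col=0)

# Step 3: five equal-width class intervals from the minimum to the maximum mark.
bins = np.histogram_bin_edges(df["Marks"], bins=5)

# Step 4: assign each mark to a class; include_lowest keeps the minimum in the first class.
df["Class"] = pd.cut(df["Marks"], bins=bins, include_lowest=True)

# Step 5: count the marks in each class, keeping empty classes.
counts = df.groupby("Class", observed=False).size()

print(bins)
print(df)
print(counts)
```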

Conclusion:

In this tutorial, we have learned how to create class intervals in Python with the NumPy and Pandas libraries. We demonstrated how to define class intervals, categorize data into classes, and count the number of data points in each class. You can now apply this knowledge to your data analysis and statistical projects in Python.