When working with large datasets in Python, it is often useful to split the data into several smaller dataframes, and a loop is a convenient way to build them. This tutorial walks through creating multiple dataframes with a loop in Python and storing them in a dictionary.
Prerequisites: You should be familiar with Python programming, including loops, and with the basics of the Pandas library. If you need a refresher on these topics, you can refer to the official Python tutorial and the Pandas getting started tutorials.
Step 1: Import necessary libraries
First, we need to import the libraries. The code in this tutorial only calls Pandas; numpy is imported alongside it out of habit and is optional here.
    import pandas as pd
    import numpy as np
Step 2: Prepare the data
For this tutorial, we will create a sample dataset. It contains a list of product names and their respective prices. The values are made up for illustration, and our goal is to divide the dataset into multiple smaller dataframes based on the price range.
    # Sample data.
    data = {
        "Product": ["Product_1", "Product_2", "Product_3", "Product_4", "Product_5",
                    "Product_6", "Product_7", "Product_8", "Product_9", "Product_10"],
        "Price": [100, 250, 325, 150, 50, 290, 150, 10, 200, 10],
    }

    df = pd.DataFrame(data)
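Before picking price ranges, it can help to check how far the prices actually spread, so that the ranges cover every row. The snippet below is a minimal sketch of that check; it rebuilds the same sample `df` so it runs on its own, and the names `lowest` and `highest` are only illustrative.

```python
import pandas as pd

# Same sample data as above, repeated so this snippet runs on its own.
data = {
    "Product": [f"Product_{i}" for i in range(1, 11)],
    "Price": [100, 250, 325, 150, 50, 290, 150, 10, 200, 10],
}
df = pd.DataFrame(data)

# Smallest and largest price: the chosen ranges should cover this span.
lowest = df["Price"].min()
highest = df["Price"].max()
print(f"{len(df)} rows, prices from {lowest} to {highest}")
```

For this sample the prices run from 10 to 325, so ranges spanning 0 to 400 leave no row uncategorized.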
Step 3: Define the price range categories
Next, let’s define the price categories that we want to split our dataset into, and create a dictionary to store the dataframes for each price range.
    # Define price range categories.
    price_ranges = {
        "Low_Price": (0, 100),
        "Medium_Price": (100, 250),
        "High_Price": (250, 400)
    }

    # Initialize an empty dictionary to store dataframes for each price range.
    price_dataframes = {}
Step 4: Create multiple dataframes within the loop
Now, we can create a loop that iterates over our price_ranges dictionary and creates a separate dataframe for each price range category.
    for price_category, price_range in price_ranges.items():
        filtered_df = df[(df["Price"] >= price_range[0]) & (df["Price"] <= price_range[1])]
        price_dataframes[price_category] = filtered_df
In this loop, we are filtering our main dataframe (df) to include only the entries that fall within the specified price range. Then, we’re storing the filtered dataframe in the price_dataframes dictionary under the corresponding price category key.
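Both comparisons in the filter are inclusive, so a price that equals a shared boundary (100 or 250 in this data) matches two categories. If each product should land in exactly one dataframe, one option is to make the upper bound exclusive. The sketch below is a variation on the loop, not the tutorial's original code: the names `low`, `high`, and `exclusive_dataframes` are illustrative, and `.copy()` is added on the assumption that the pieces may be modified later without touching `df`.

```python
import pandas as pd

# Same sample data and ranges as in the tutorial, repeated so this runs on its own.
df = pd.DataFrame({
    "Product": [f"Product_{i}" for i in range(1, 11)],
    "Price": [100, 250, 325, 150, 50, 290, 150, 10, 200, 10],
})

price_ranges = {
    "Low_Price": (0, 100),
    "Medium_Price": (100, 250),
    "High_Price": (250, 400),
}

exclusive_dataframes = {}
for price_category, (low, high) in price_ranges.items():
    # low <= Price < high: the upper bound is excluded, so ranges do not overlap.
    in_range = (df["Price"] >= low) & (df["Price"] < high)
    exclusive_dataframes[price_category] = df[in_range].copy()

# Every row should now appear in exactly one dataframe.
total_rows = sum(len(frame) for frame in exclusive_dataframes.values())
print(total_rows == len(df))
```

With these bounds a price of exactly 400 would fall outside every range, so the last upper bound may need to be raised if such values can occur.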
Output
Our code has now created three different dataframes based on the price range categories, which can be accessed using the ‘price_dataframes’ dictionary. Let’s print the dataframes to see the result.
    for category, dataframe in price_dataframes.items():
        print(f"{category}:\n{dataframe}\n")
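    Low_Price:
          Product  Price
    0   Product_1    100
    4   Product_5     50
    7   Product_8     10
    9  Product_10     10

    Medium_Price:
         Product  Price
    0  Product_1    100
    1  Product_2    250
    3  Product_4    150
    6  Product_7    150
    8  Product_9    200

    High_Price:
         Product  Price
    1  Product_2    250
    2  Product_3    325
    5  Product_6    290

Because both bounds in the filter are inclusive, the products priced exactly at 100 and 250 (Product_1 and Product_2) each appear in two dataframes.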
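Each entry in the dictionary is an ordinary dataframe, so it can be looked up by its key and used like any other. The snippet below is a minimal sketch of that; it rebuilds `price_dataframes` with the same loop so it runs on its own, and the variable name `high` is only illustrative.

```python
import pandas as pd

# Rebuild the dictionary exactly as in the tutorial so this snippet is self-contained.
df = pd.DataFrame({
    "Product": [f"Product_{i}" for i in range(1, 11)],
    "Price": [100, 250, 325, 150, 50, 290, 150, 10, 200, 10],
})

price_ranges = {
    "Low_Price": (0, 100),
    "Medium_Price": (100, 250),
    "High_Price": (250, 400),
}

price_dataframes = {}
for price_category, price_range in price_ranges.items():
    mask = (df["Price"] >= price_range[0]) & (df["Price"] <= price_range[1])
    price_dataframes[price_category] = df[mask]

# Look up one dataframe by its category name.
high = price_dataframes["High_Price"]
print(high["Product"].tolist())

# Row counts per category, using the same dictionary.
for category, dataframe in price_dataframes.items():
    print(category, len(dataframe))
```

Keeping the dataframes in a dictionary, rather than in separately named variables, is what makes this kind of lookup and iteration possible.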
Full code
    import pandas as pd
    import numpy as np

    # Sample data.
    data = {
        "Product": ["Product_1", "Product_2", "Product_3", "Product_4", "Product_5",
                    "Product_6", "Product_7", "Product_8", "Product_9", "Product_10"],
        "Price": [100, 250, 325, 150, 50, 290, 150, 10, 200, 10],
    }

    df = pd.DataFrame(data)

    # Define price range categories.
    price_ranges = {
        "Low_Price": (0, 100),
        "Medium_Price": (100, 250),
        "High_Price": (250, 400)
    }

    # Initialize an empty dictionary to store dataframes for each price range.
    price_dataframes = {}

    for price_category, price_range in price_ranges.items():
        filtered_df = df[(df["Price"] >= price_range[0]) & (df["Price"] <= price_range[1])]
        price_dataframes[price_category] = filtered_df

    for category, dataframe in price_dataframes.items():
        print(f"{category}:\n{dataframe}\n")
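For comparison, the same kind of split can be sketched with `pd.cut` and `groupby`, which let Pandas assign each row to a bin instead of filtering once per range. This is a different technique from the loop used in this tutorial, shown only as an alternative; the names `bins`, `labels`, `categories`, and `grouped_dataframes` are illustrative, and `right=False` is chosen here so that each bin includes its lower edge and excludes its upper edge.

```python
import pandas as pd

# Same sample data as in the tutorial, repeated so this snippet runs on its own.
df = pd.DataFrame({
    "Product": [f"Product_{i}" for i in range(1, 11)],
    "Price": [100, 250, 325, 150, 50, 290, 150, 10, 200, 10],
})

# Bin edges and labels mirror the tutorial's price ranges.
bins = [0, 100, 250, 400]
labels = ["Low_Price", "Medium_Price", "High_Price"]

# Assign every row to one bin: [0, 100), [100, 250), [250, 400).
categories = pd.cut(df["Price"], bins=bins, labels=labels, right=False)

# Iterating over the groupby still builds one dataframe per category.
grouped_dataframes = {
    label: group.copy()
    for label, group in df.groupby(categories, observed=True)
}

for label, group in grouped_dataframes.items():
    print(label, len(group))
```

Unlike the inclusive filter in the loop, this assigns each row to exactly one category, so the row counts differ from the output shown earlier.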
Conclusion
In this tutorial, we’ve learned how to create multiple dataframes within a loop in Python. We used Pandas boolean filtering to select the rows that match each predefined condition and stored the results in a dictionary keyed by category. With this knowledge, you can break a large dataset down into smaller, more manageable dataframes.