Working with Excel spreadsheets by hand can be time-consuming, but Python’s libraries make it easier to manipulate, analyze, and modify Excel files. In this tutorial, we’ll learn how to split the rows of an Excel sheet into separate sheets using Python. To accomplish this, we will use two popular Python libraries: pandas and openpyxl.
Step 1: Install the required packages
To work with Excel files in Python, you’ll need to install two packages: `pandas` and `openpyxl`. You can install them with pip by running the following commands in your terminal:

```
pip install pandas
pip install openpyxl
```
These packages will give us the tools we need to work with Excel files in Python.
Step 2: Read the Excel file
After installing the necessary packages, we can start reading data from an Excel file using pandas.
Assume that we have a sample Excel file called `example.xlsx` with the following content:

| Name  | Age | City        |
|-------|-----|-------------|
| John  | 25  | New York    |
| Alice | 30  | Los Angeles |
| Bob   | 22  | Chicago     |
| Eve   | 28  | Boston      |
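If you don't already have such a file, the sample table can be generated with pandas itself. The following is a minimal sketch rather than part of the original workflow: `make_sample_workbook` is a hypothetical helper name, and the demo writes into a temporary directory so that no existing file is overwritten.

```python
import tempfile
from pathlib import Path

import pandas as pd


def make_sample_workbook(path, sheet_name="Sheet1"):
    """Write the tutorial's four sample rows to `path` (hypothetical helper)."""
    sample = pd.DataFrame(
        {
            "Name": ["John", "Alice", "Bob", "Eve"],
            "Age": [25, 30, 22, 28],
            "City": ["New York", "Los Angeles", "Chicago", "Boston"],
        }
    )
    # index=False keeps the DataFrame index out of the sheet
    sample.to_excel(path, sheet_name=sheet_name, index=False)
    return sample


# Demonstrated in a temporary directory; pass "example.xlsx" instead
# to create the file used in the rest of this tutorial.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "example.xlsx"
    written = make_sample_workbook(sample_path)
    read_back = pd.read_excel(sample_path, sheet_name="Sheet1")
```

Reading the file back right away is a quick way to confirm that the column names and values survived the round trip.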
First, let’s import the required libraries and read the Excel file:
```python
import pandas as pd
from openpyxl import load_workbook

file_path = "example.xlsx"
sheet_name = "Sheet1"

data = pd.read_excel(file_path, sheet_name=sheet_name)
```
Set `file_path` to the path of your Excel file and `sheet_name` to the sheet that contains the data you want to split.
Step 3: Split rows based on a condition or a value
Now let’s split the rows into two separate DataFrames based on age. In this case, we want to separate people who are 25 or younger from those who are older than 25:
```python
younger_than_25 = data[data['Age'] <= 25]
older_than_25 = data[data['Age'] > 25]
```
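Here, we’re using boolean indexing: each comparison on the `Age` column produces a True/False value per row, and the DataFrame keeps only the rows where it is True. Note that, despite its name, `younger_than_25` also includes people who are exactly 25, because the condition is `<= 25`.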
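The same idea extends to splitting on a value rather than a condition. The sketch below is illustrative only: it rebuilds the sample table in memory so it runs without a file, and the names `mask`, `at_most_25`, `over_25`, and `by_city` are made up for this example. It shows a condition split using a mask and its negation, then a value split using `groupby`.

```python
import pandas as pd

# In-memory copy of the sample table so the sketch runs without a file
data = pd.DataFrame(
    {
        "Name": ["John", "Alice", "Bob", "Eve"],
        "Age": [25, 30, 22, 28],
        "City": ["New York", "Los Angeles", "Chicago", "Boston"],
    }
)

# Split on a condition: build the mask once, then use it and its negation
mask = data["Age"] <= 25
at_most_25 = data[mask]
over_25 = data[~mask]

# Split on a value: one DataFrame per distinct entry in the City column
by_city = {city: group for city, group in data.groupby("City")}
```

Using `~mask` for the second half means every row lands in exactly one of the two DataFrames. With two independent comparisons, a row whose age is missing would satisfy neither condition and be dropped from both.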
Step 4: Write the separated DataFrames to the Excel file
After splitting the rows, we can write the separated DataFrames back to the Excel file. We will use the `ExcelWriter` class from pandas with the `openpyxl` engine in append mode, so that the new sheets are added to the existing workbook instead of replacing it:

```python
# Append to the existing workbook so the original sheet is kept.
# mode="a" requires that the file already exists; if_sheet_exists="replace"
# (pandas 1.3+) overwrites the target sheets when the script is re-run.
with pd.ExcelWriter(
    file_path, engine="openpyxl", mode="a", if_sheet_exists="replace"
) as writer:
    # Write DataFrames to separate sheets
    younger_than_25.to_excel(writer, index=False, sheet_name="Younger than 26")
    older_than_25.to_excel(writer, index=False, sheet_name="Older than 25")
# The file is saved automatically when the with block exits
```
This code will create two new sheets in the Excel file: ‘Younger than 26’ and ‘Older than 25’, with the respective DataFrames written to those sheets.
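To confirm that the sheets were written as expected, the workbook can be read back with `sheet_name=None`, which returns every sheet at once. The following is a self-contained sketch that assumes pandas 1.3 or later with openpyxl installed. It works on a temporary copy of the sample data rather than on your own file, and the variable names are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["John", "Alice", "Bob", "Eve"],
        "Age": [25, 30, 22, 28],
        "City": ["New York", "Los Angeles", "Chicago", "Boston"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.xlsx"
    data.to_excel(path, sheet_name="Sheet1", index=False)

    # Append the two splits to the workbook that now exists on disk
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="replace"
    ) as writer:
        data[data["Age"] <= 25].to_excel(
            writer, index=False, sheet_name="Younger than 26"
        )
        data[data["Age"] > 25].to_excel(
            writer, index=False, sheet_name="Older than 25"
        )

    # sheet_name=None returns a dict mapping each sheet name to a DataFrame
    sheets = pd.read_excel(path, sheet_name=None)

sheet_names = list(sheets)
```

The keys of `sheets` follow the order of the sheets in the workbook, so the original sheet comes first, followed by the two new ones.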
Full Code
Here’s the complete code to split rows in an Excel file using Python:
```python
import pandas as pd

file_path = "example.xlsx"
sheet_name = "Sheet1"

# Read the source sheet into a DataFrame
data = pd.read_excel(file_path, sheet_name=sheet_name)

# Split the rows on the Age column
younger_than_25 = data[data['Age'] <= 25]  # includes age 25
older_than_25 = data[data['Age'] > 25]

# Append the two DataFrames to the existing workbook as new sheets
with pd.ExcelWriter(
    file_path, engine="openpyxl", mode="a", if_sheet_exists="replace"
) as writer:
    younger_than_25.to_excel(writer, index=False, sheet_name="Younger than 26")
    older_than_25.to_excel(writer, index=False, sheet_name="Older than 25")
```
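Output
With the sample data above, the workbook gains two sheets. ‘Younger than 26’ contains John (25, New York) and Bob (22, Chicago), and ‘Older than 25’ contains Alice (30, Los Angeles) and Eve (28, Boston).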
Conclusion
In this tutorial, we learned how to split rows in an Excel sheet using Python. This technique can be useful when organizing data into separate sheets based on certain conditions, making it easier to analyze and manipulate data.
By utilizing the power of Python packages such as pandas and openpyxl, we can automate splitting rows and save time when working with Excel files.