In this tutorial, we will learn how to add a new column to a Microsoft Excel file using Python. By combining the capabilities of both, we can efficiently add columns to Excel files programmatically without manually opening and editing the files.
For this task, we will be using the pandas and openpyxl libraries. Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools. Openpyxl is a library to read/write Excel files. If you haven’t installed these libraries, use the following command to install:
1 |
pip install pandas openpyxl |
Now, let’s proceed with the tutorial. We will first load a sample Excel file, then add a new column, and finally save the modified file.
Step 1: Load an Excel file into a Pandas DataFrame
First, we will read an existing Excel file and load its content into a Pandas DataFrame. For this tutorial, let’s assume we already have a sample Excel file named “sample_data.xlsx” with the following content:
File: sample_data.xlsx:
ID, Name, Age 1, Alice, 25 2, Bob, 30 3, Carol, 21 4, Dan, 27
Now, let’s load the Excel file into a Pandas DataFrame:
1 2 3 4 5 |
import pandas as pd input_file = 'sample_data.xlsx' df = pd.read_excel(input_file) print(df) |
Output:
ID Name Age 0 1 Alice 25 1 2 Bob 30 2 3 Carol 21 3 4 Dan 27
Step 2: Add a new column to the DataFrame
Now, we will add a new column to the DataFrame. For this example, let’s add a column called “Salary” that contains random salary values for each employee:
1 2 3 4 |
import numpy as np df['Salary'] = np.random.randint(50000, 100000, df.shape[0]) print(df) |
Output:
ID Name Age Salary 0 1 Alice 25 80179 1 2 Bob 30 85736 2 3 Carol 21 91364 3 4 Dan 27 58718
Step 3: Save the modified DataFrame to a new Excel file
Finally, save the modified DataFrame (with the newly added column) to a new Excel file. We will use the to_excel function from pandas for this purpose:
1 2 |
output_file = 'modified_data.xlsx' df.to_excel(output_file, index=False) |
With these steps, you have successfully added a new column to the given Excel file using Python. Here is the full code for reference:
1 2 3 4 5 6 7 8 9 |
import pandas as pd import numpy as np input_file = 'sample_data.xlsx' output_file = 'modified_data.xlsx' df = pd.read_excel(input_file) df['Salary'] = np.random.randint(50000, 100000, df.shape[0]) df.to_excel(output_file, index=False) |
The newly created file, “modified_data.xlsx”, will look like this:
ID Name Age Salary 0 1 Alice 25 80179 1 2 Bob 30 85736 2 3 Carol 21 91364 3 4 Dan 27 58718
Conclusion
In this tutorial, we have learned how to add a new column to an Excel file using Python with the help of pandas and openpyxl libraries. This example demonstrates how easily Python can be used to manipulate Excel files programmatically, making it an excellent tool for automating data analysis and manipulation tasks.