Python has a wide range of libraries and modules that make it easy to import and analyze data in various formats. A particularly popular library for this purpose is Pandas, which provides high-performance, easy-to-use data structures and data analysis tools.
In this tutorial, we will use Pandas to load data from a CSV file and the openpyxl library to write that data to an Excel workbook.
Step 1: Installing Required Libraries
Before we begin, make sure that Python is installed on your machine. You also need two Python packages: Pandas for data manipulation and openpyxl for working with Excel files. If you do not have these packages, you can install both using pip:
pip install pandas openpyxl
Step 2: Loading Data into a Pandas DataFrame
Pandas is built for data manipulation and represents data in a tabular structure called a DataFrame. The first step is to load our external data into a Pandas DataFrame for easy processing. The data can come from different sources, such as CSV files, JSON files, or even databases. In this tutorial, we’ll be using a CSV file. Suppose we have a CSV file called ‘sample_data.csv’ with the following content:
Name,Age,Country
John,35,USA
Kim,28,Canada
Boris,30,UK
Alia,29,India
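If you do not have this file yet, a minimal sketch like the following can create it. It uses only the standard library and assumes you want the file in the current working directory; the variable names are illustrative.

```python
from pathlib import Path

# The same four records shown above, header row first.
csv_text = (
    "Name,Age,Country\n"
    "John,35,USA\n"
    "Kim,28,Canada\n"
    "Boris,30,UK\n"
    "Alia,29,India\n"
)

# Write the file into the current working directory (an assumption;
# adjust the path if your script runs somewhere else).
csv_path = Path("sample_data.csv")
csv_path.write_text(csv_text, encoding="utf-8")
```

Run it once before the loading step so that pd.read_csv can find the file.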
We’ll start by loading this data into a Pandas DataFrame. Here’s the code to do so:
import pandas as pd

# Read data from CSV file
data = pd.read_csv('sample_data.csv')

# Display the DataFrame
print(data)
The output should look like this:
    Name  Age Country
0   John   35     USA
1    Kim   28  Canada
2  Boris   30      UK
3   Alia   29   India
Step 3: Creating a New Excel Workbook
Next, we’ll create a new Excel workbook using openpyxl:
from openpyxl import Workbook

# Initialize a new Excel workbook
workbook = Workbook()
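A new openpyxl workbook already contains one worksheet, titled "Sheet" by default. The short sketch below shows how you might inspect and rename it; the title "People" is a hypothetical choice for this example and is not used elsewhere in the tutorial.

```python
from openpyxl import Workbook

workbook = Workbook()

# A freshly created workbook holds a single worksheet named "Sheet".
print(workbook.sheetnames)

# Rename that worksheet; "People" is a hypothetical title.
workbook.active.title = "People"
print(workbook.sheetnames)
```

Renaming is optional. If you skip it, the data simply ends up on the sheet called "Sheet".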
Step 4: Writing Pandas DataFrame to Excel Sheet
Now, we’ll transfer the data from our Pandas DataFrame to the Excel sheet:
from openpyxl.utils.dataframe import dataframe_to_rows

# Get the active sheet in the Excel workbook
sheet = workbook.active

# Write data from DataFrame to the sheet
for r in dataframe_to_rows(data, index=False, header=True):
    sheet.append(r)
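In this code snippet, we first obtain the active sheet in the workbook (usually the first sheet) and then use the dataframe_to_rows function from openpyxl.utils.dataframe to convert the DataFrame into rows that can be appended to the sheet. With header=True, the first row contains the column names, and index=False leaves out the DataFrame's row labels.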
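To see what the loop actually appends, you can print the rows that dataframe_to_rows produces. The sketch below uses a small two-row DataFrame built in memory as a stand-in for the tutorial's data, so it runs without the CSV file.

```python
import pandas as pd
from openpyxl.utils.dataframe import dataframe_to_rows

# A two-row stand-in for the tutorial's data, built in memory so the
# snippet does not depend on sample_data.csv.
data = pd.DataFrame({"Name": ["John", "Kim"], "Age": [35, 28]})

# With index=False and header=True, the first row holds the column
# names and each following row holds one record's values.
rows = list(dataframe_to_rows(data, index=False, header=True))
for row in rows:
    print(row)
```

Each row is a plain list, which is the shape that sheet.append expects.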
Step 5: Saving the Excel Workbook
Finally, we’ll save the Excel workbook to a file:
# Save the Excel workbook to a file
workbook.save('output.xlsx')
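One way to confirm that a save worked is to reopen the file and read the cell values back. The following is a minimal sketch, not part of the original workflow: it builds its own tiny workbook so it runs on its own, and "check_output.xlsx" is a hypothetical file name.

```python
from openpyxl import Workbook, load_workbook

# Build and save a tiny workbook so this check is self-contained.
# "check_output.xlsx" is a hypothetical file name.
workbook = Workbook()
sheet = workbook.active
sheet.append(["Name", "Age", "Country"])
sheet.append(["John", 35, "USA"])
workbook.save("check_output.xlsx")

# Reopen the file and collect each row's cell values as a tuple.
reloaded = load_workbook("check_output.xlsx")
saved_rows = list(reloaded.active.iter_rows(values_only=True))
print(saved_rows)
```

The same check should work on the tutorial's file if you pass 'output.xlsx' to load_workbook.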
The full code for importing data into Excel using Python:
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Read data from CSV file
data = pd.read_csv('sample_data.csv')

# Initialize a new Excel workbook
workbook = Workbook()

# Get the active sheet in the Excel workbook
sheet = workbook.active

# Write data from DataFrame to the sheet
for r in dataframe_to_rows(data, index=False, header=True):
    sheet.append(r)

# Save the Excel workbook to a file
workbook.save('output.xlsx')
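If you do not need direct access to the openpyxl workbook object, Pandas can also write the file itself through DataFrame.to_excel. This is a different technique from the row-by-row loop used in this tutorial, shown here only as a sketch. The DataFrame is a small in-memory stand-in, and "output_via_pandas.xlsx" is a hypothetical file name.

```python
import pandas as pd

# Stand-in for the DataFrame read from sample_data.csv.
data = pd.DataFrame(
    {"Name": ["John", "Kim"], "Age": [35, 28], "Country": ["USA", "Canada"]}
)

# index=False leaves out the row labels, matching index=False in
# dataframe_to_rows. "output_via_pandas.xlsx" is a hypothetical name.
data.to_excel("output_via_pandas.xlsx", index=False, engine="openpyxl")

# Read the file back to confirm the round trip.
round_trip = pd.read_excel("output_via_pandas.xlsx", engine="openpyxl")
print(round_trip)
```

The openpyxl loop is still the better fit when you want to style cells or control the layout of the sheet yourself.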
Conclusion
In this tutorial, we have demonstrated how to import data into Excel using Python. By combining Pandas and openpyxl, you can load data from various sources and save it in Excel format for further analysis. The same approach can be extended to more complex scenarios, such as working with multiple sheets or data sources, or applying data transformations and validations.
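As one illustration of the multiple-sheet case, the sketch below writes one worksheet per country. Grouping by the Country column and the file name "by_country.xlsx" are assumptions made for this example, and the DataFrame is an in-memory stand-in for the tutorial's data.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# In-memory stand-in for the tutorial's data.
data = pd.DataFrame(
    {"Name": ["John", "Kim", "Boris"], "Country": ["USA", "Canada", "UK"]}
)

workbook = Workbook()
# Drop the default empty sheet so only the per-country sheets remain.
workbook.remove(workbook.active)

# One sheet per country, titled with the country value.
for country, group in data.groupby("Country"):
    sheet = workbook.create_sheet(title=country)
    for row in dataframe_to_rows(group, index=False, header=True):
        sheet.append(row)

# "by_country.xlsx" is a hypothetical file name.
workbook.save("by_country.xlsx")
print(workbook.sheetnames)
```

groupby sorts its keys by default, so the sheets are created in alphabetical order by country.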