Extracting data from Excel spreadsheets is a common task in data analysis and web development.
This involves reading and parsing Excel files, converting them to data structures, and manipulating or analyzing that data. The Python ecosystem offers several libraries that perform these tasks efficiently.
In this tutorial, we will use the openpyxl and pandas libraries to extract data from Excel files.
Step 1: Install Necessary Libraries
Python has several libraries for reading and writing Excel files. For this tutorial, we will use openpyxl and pandas. These libraries can be installed using pip, a package manager for Python.
pip install openpyxl pandas
Step 2: Read an Excel File
Once the necessary libraries are installed, we can use them to read an Excel file.
For the purpose of this tutorial, suppose we have an Excel file named “sample.xlsx”.
import pandas as pd
df = pd.read_excel('sample.xlsx')
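Here, the pd.read_excel() function reads the Excel file and returns its contents as a pandas DataFrame. For .xlsx files, pandas relies on openpyxl to do the reading, which is why both libraries are installed.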
Replace ‘sample.xlsx’ with the path of your Excel file.
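pd.read_excel() also accepts optional arguments that control which sheet and which columns are loaded. The following is a minimal sketch rather than part of the tutorial's own code: the file name demo.xlsx, the sheet name "Sheet1", and the sample values are assumptions, chosen so the example can create its own workbook and run without an existing file.

```python
import os
import tempfile

import pandas as pd

# Build a small workbook so the example is self-contained.
# The file name, sheet name, and values are illustrative assumptions.
sample = pd.DataFrame({"A": [5, 15, 8], "B": [25, 30, 10], "C": [40, 8, 15]})
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
sample.to_excel(path, sheet_name="Sheet1", index=False, engine="openpyxl")

# Read one named sheet and keep only the columns we need.
# engine="openpyxl" makes the .xlsx reader explicit.
df = pd.read_excel(path, sheet_name="Sheet1", usecols=["A", "C"], engine="openpyxl")
print(df)
```

Limiting the columns with usecols can keep the DataFrame smaller when a sheet holds many columns that are not needed.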
Step 3: View the Data from Excel
After loading the data into a DataFrame, we can view it using the head() method. This method returns the first n rows of the DataFrame, where n defaults to 5.
df.head()
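Beyond head(), a few other attributes help confirm that a file loaded as expected. The sketch below uses an in-memory DataFrame with made-up values as a stand-in for data read from Excel, so it runs without a file.

```python
import pandas as pd

# Stand-in for a DataFrame loaded with pd.read_excel(); values are illustrative.
df = pd.DataFrame({
    "A": [5, 15, 8, 20, 12, 9],
    "B": [25, 30, 10, 5, 18, 4],
    "C": [40, 8, 15, 22, 7, 3],
})

print(df.head(3))           # first three rows instead of the default five
print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column names taken from the header row
print(df.dtypes)            # data type inferred for each column
```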
Step 4: Access Data in Specific Cells
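pandas does not address cells by Excel references such as A1. Instead, a value is looked up by its row label and column label. By default, pd.read_excel() treats the first row of the sheet as the header, so the first row of the DataFrame corresponds to row 2 of the spreadsheet. For example, if the header of the first column is 'A', the following command returns the value stored in Excel cell A2.
value = df.at[0, 'A']
Here, 0 is the row label, and 'A' is the column name taken from the header row. The default row labels start at zero, so 0 refers to the first data row.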
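When the goal is to address a cell by its Excel reference, openpyxl can be used directly. The sketch below is a minimal illustration under stated assumptions: it builds a small in-memory workbook with made-up values instead of opening sample.xlsx, so it runs on its own.

```python
from openpyxl import Workbook

# Build a tiny in-memory workbook; the values are illustrative assumptions.
wb = Workbook()
ws = wb.active
ws.append(["A", "B", "C"])  # spreadsheet row 1: header
ws.append([5, 25, 40])      # spreadsheet row 2: first data row
ws.append([15, 30, 8])      # spreadsheet row 3

header_cell = ws["A1"].value                 # lookup by Excel reference
first_value = ws["A2"].value                 # first data value in column A
same_value = ws.cell(row=2, column=1).value  # same cell, 1-based row and column

print(header_cell, first_value, same_value)
```

For a file on disk, openpyxl.load_workbook('sample.xlsx') returns a workbook that can be queried the same way. Note that openpyxl counts rows and columns from 1, unlike the zero-based row labels of a DataFrame.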
The Full Code:
import pandas as pd

# Read an Excel file
df = pd.read_excel('sample.xlsx')

# View the data from Excel
print(df.head())

# Access the first data row of column 'A' (Excel cell A2; row 1 holds the header)
value = df.at[0, 'A']
print("Value in cell A2:", value)
Output
    A   B   C
0   5  25  40
1  15  30   8
2   8  10  15
3  20   5  22
4  12  18   7
Value in cell A2: 5
Conclusion
Python, with the help of libraries such as openpyxl and pandas, makes it easy to extract data from an Excel file. pandas also provides methods to filter, manipulate, and plot the extracted data, which makes these libraries widely useful in data science and web development.