How to Extract Data from Excel Using Python

Extracting data from Excel spreadsheets is a common task in data analysis and web development.

This involves reading and parsing Excel files, converting them to data structures, and manipulating or analyzing that data. Python provides several libraries that handle these tasks efficiently.

In this tutorial, we will use the openpyxl and pandas libraries to extract data from Excel files. pandas does the reading, and it relies on openpyxl as the engine for .xlsx files.

Step 1: Install Necessary Libraries

Python has several libraries for reading and writing Excel files. For this tutorial, we need openpyxl and pandas. Both can be installed with pip, the package manager for Python, by running pip install pandas openpyxl in a terminal.

Step 2: Read an Excel File

Once the necessary libraries are installed, we can use them to read an Excel file.

For the purpose of this tutorial, suppose we have an Excel file named “sample.xlsx”.

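The pd.read_excel() function (pd is the conventional alias for pandas) reads an Excel file and returns its contents as a pandas DataFrame. By default, it loads the first worksheet and treats the first row as the column headers.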

Replace ‘sample.xlsx’ with the path of your Excel file.
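A minimal sketch of this step is shown below. The first few lines only create a stand-in sample.xlsx so that the example runs on its own; this is an assumption for illustration, and the values mirror the table in the Output section. With a real spreadsheet, only the read_excel() call is needed.

```python
import pandas as pd

# Stand-in for sample.xlsx (assumption: three numeric columns A, B, C).
sample = pd.DataFrame({
    "A": [5, 15, 8, 20, 12],
    "B": [25, 30, 10, 5, 18],
    "C": [40, 8, 15, 22, 7],
})
# Writing .xlsx files requires openpyxl to be installed.
sample.to_excel("sample.xlsx", index=False)

# Read the first worksheet; its first row becomes the column names.
df = pd.read_excel("sample.xlsx")
print(df.shape)  # (number of rows, number of columns)
```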

Step 3: View the Data from Excel

After successfully loading the data into a DataFrame, we can view it using the head() method. This method returns the first n rows of the DataFrame, where n defaults to 5.
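As a small sketch of head(), the example below uses an in-memory DataFrame in place of the one loaded from the file (an assumption, so that it runs without a spreadsheet on disk).

```python
import pandas as pd

# In-memory stand-in for the DataFrame loaded from sample.xlsx.
df = pd.DataFrame({
    "A": [5, 15, 8, 20, 12],
    "B": [25, 30, 10, 5, 18],
    "C": [40, 8, 15, 22, 7],
})

# With no argument, head() returns the first 5 rows.
print(df.head())

# Passing n limits the preview to the first n rows.
first_two = df.head(2)
print(first_two)
```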

Step 4: Access Data in Specific Cells

pandas does not use Excel cell references such as A2. Instead, a value is selected by its row label and column name. For example, to get the value stored in cell A2 of the spreadsheet (the first data row, because row 1 holds the column headers), use df.loc[0, 'A'].

Here, 0 is the row label, and ‘A’ is the column name. pandas numbers rows from zero by default, so the label 0 corresponds to the first row of data.
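The sketch below shows the lookup on an in-memory stand-in for the loaded data (an assumption for illustration). It also shows, for contrast, that openpyxl accepts spreadsheet-style references such as "A2" directly. The small workbook built here is hypothetical and exists only to make the example self-contained.

```python
import pandas as pd
from openpyxl import Workbook

# In-memory stand-in for the DataFrame loaded from sample.xlsx.
df = pd.DataFrame({
    "A": [5, 15, 8, 20, 12],
    "B": [25, 30, 10, 5, 18],
    "C": [40, 8, 15, 22, 7],
})

# Label-based lookup: row label 0, column named "A".
value_by_label = df.loc[0, "A"]

# Position-based lookup: first row, first column.
value_by_position = df.iloc[0, 0]

print("Value in cell A2: ", value_by_label)

# openpyxl works with cell references as they appear in Excel.
wb = Workbook()
ws = wb.active
ws.append(["A", "B", "C"])  # row 1: column headers
ws.append([5, 25, 40])      # row 2: first data row
print(ws["A2"].value)
```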

The Full Code:
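Putting the steps together, a minimal sketch might look like this. The first block only creates a stand-in sample.xlsx (an assumption, so that the script runs on its own); with a real file, those lines can be dropped.

```python
import pandas as pd

# Stand-in for sample.xlsx; skip this block when using a real file.
pd.DataFrame({
    "A": [5, 15, 8, 20, 12],
    "B": [25, 30, 10, 5, 18],
    "C": [40, 8, 15, 22, 7],
}).to_excel("sample.xlsx", index=False)

# Step 2: read the Excel file into a DataFrame.
df = pd.read_excel("sample.xlsx")

# Step 3: view the first rows.
print(df.head())

# Step 4: read one value by row label and column name.
print("Value in cell A2: ", df.loc[0, "A"])
```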

Output

    A   B   C
0   5  25  40
1  15  30   8
2   8  10  15
3  20   5  22
4  12  18   7
Value in cell A2:  5

Conclusion

Python, with the help of libraries such as openpyxl and pandas, makes it easy to extract data from an Excel file. pandas also provides methods to filter, transform, and summarize the extracted data, which makes this workflow useful across data science and web development.