How to Filter Data in Excel Using Python

Microsoft Excel is a powerful tool for storing, organizing, and processing data. However, it can be cumbersome and time-consuming if you need to filter or manipulate large datasets.

This is where Python comes in: it offers powerful and efficient tools for data processing tasks. In this tutorial, we will learn how to filter data from an Excel file using Python.

Step 1: Install Required Libraries

Python provides several libraries to work with Excel files. The libraries we are going to use are pandas for data manipulation and openpyxl for reading and writing Excel files.
Both can be installed with pip, for example by running pip install pandas openpyxl in a terminal.
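As an optional check after installing, the short sketch below reports whether both libraries can be found by the current Python interpreter. The helper name missing_packages is made up for this illustration and is not part of pandas or openpyxl.

```python
# Install from a terminal first (not from inside Python):
#   pip install pandas openpyxl
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["pandas", "openpyxl"])
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("pandas and openpyxl are available")
```

If anything is reported as missing, the usual cause is that pip installed into a different Python environment than the one running the script.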

Step 2: Load the Excel File

We first need to import the necessary libraries and load our Excel file. For the purposes of this tutorial, we assume that our Excel file is named ‘sample.xlsx’ and located in the same directory as our script.
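A minimal sketch of this step might look as follows. It assumes the data sits on the first worksheet of sample.xlsx. So that the snippet runs on its own, it writes a tiny stand-in workbook when no sample.xlsx exists yet; the columns 'A' and 'B' and their values are invented for illustration and can be ignored if you already have a real file.

```python
from pathlib import Path

import pandas as pd

excel_path = Path("sample.xlsx")

# Assumption for illustration only: if no sample.xlsx exists yet, write a tiny
# stand-in workbook so the snippet runs on its own. A real file is left untouched.
if not excel_path.exists():
    pd.DataFrame({"A": [5, 12, 20, 8], "B": ["w", "x", "y", "z"]}).to_excel(
        excel_path, index=False
    )

# read_excel reads the first worksheet by default; openpyxl handles .xlsx files.
data = pd.read_excel(excel_path, engine="openpyxl")

# Show the first few rows to confirm the file was read as expected.
print(data.head())
```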

Reading the file with pandas loads its data into a DataFrame, which is a 2-dimensional labeled data structure with columns of potentially different types.

Step 3: Filter the Data

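We can now apply filters to our data. Let’s say we want to keep only the rows in which the value of column ‘A’ is greater than 10. In pandas, this is done with boolean indexing: the comparison on column ‘A’ produces a True or False value for every row, and indexing the DataFrame with that result keeps only the rows marked True.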
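The filter described here can be sketched with boolean indexing. To keep the example self-contained, it uses a small in-memory DataFrame with invented values in place of the one loaded from the Excel file.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from sample.xlsx.
data = pd.DataFrame({"A": [5, 12, 20, 8], "B": ["w", "x", "y", "z"]})

# The comparison yields a boolean Series; indexing with it keeps the True rows.
mask = data["A"] > 10
filtered_data = data[mask]

print(filtered_data)
```

Several conditions can be combined with & (and) or | (or), with each condition wrapped in its own parentheses, for example data[(data["A"] > 10) & (data["A"] < 20)].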

This will give us a new DataFrame, filtered_data, containing only the rows where the condition is true.

Step 4: Write the Filtered Data to an Excel File

Once we have our filtered data, we can write it back to an Excel file using the to_excel method provided by pandas, passing the name of the output file as its first argument.
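As a sketch of this step, the snippet below writes a small stand-in for filtered_data (values invented for illustration) and reads the file back to confirm its contents. Passing index=False is a choice made here, not something the tutorial requires: it keeps the DataFrame's row labels out of the worksheet.

```python
import pandas as pd

# Hypothetical stand-in for the filtered_data produced in Step 3.
filtered_data = pd.DataFrame({"A": [12, 20], "B": ["x", "y"]}, index=[1, 2])

# index=False leaves the DataFrame's row labels out of the worksheet.
filtered_data.to_excel("filtered.xlsx", index=False)

# Read the file back to confirm what was written.
check = pd.read_excel("filtered.xlsx")
print(check)
```

Note that to_excel overwrites an existing file of the same name without asking.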

This will create a new Excel file named ‘filtered.xlsx’, containing only the filtered data.

Full Code:
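Putting the four steps together, a complete script could look like the sketch below. It follows the file names and the filter condition used in this tutorial; the stand-in workbook it creates when sample.xlsx is missing is an assumption added only so that the script runs on its own.

```python
from pathlib import Path

import pandas as pd

source = Path("sample.xlsx")

# Assumption for illustration only: create a tiny stand-in workbook when no
# sample.xlsx is present. A real file is left untouched.
if not source.exists():
    pd.DataFrame({"A": [5, 12, 20, 8], "B": ["w", "x", "y", "z"]}).to_excel(
        source, index=False
    )

# Step 2: load the first worksheet into a DataFrame.
data = pd.read_excel(source)

# Step 3: keep only the rows where column 'A' is greater than 10.
filtered_data = data[data["A"] > 10]

# Step 4: write the filtered rows to a new workbook, without the row labels.
filtered_data.to_excel("filtered.xlsx", index=False)

print(f"Kept {len(filtered_data)} of {len(data)} rows")
```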

Conclusion

By using Python’s powerful suite of data processing libraries, we can easily filter and manipulate data in Excel files, even if the datasets are large and complex. This makes Python a valuable tool for anyone who regularly works with Excel files and is looking to automate and improve their workflow.