How to Validate an Excel File in Python

In this tutorial, we are going to learn how to validate an Excel file in Python. As Python developers, you may come across tasks where you need to validate the data in the Excel files before processing it further.

Validations can include checking if required columns exist, if no null values are present in certain columns and more. This tutorial will guide you through easy steps to achieve this using Python.

Step 1: Install Important Libraries

To validate an Excel file in Python, we will use two important libraries – pandas and openpyxl.

Pandas is an open-source data analysis and manipulation tool, while openpyxl enables Python to read/write Excel 2010 xlsx/xlsm/xltx/xltm files.
You can install them using the pip package installer:

Step 2: Read the Excel File

Let’s say the Excel file (‘example.xlsx’) you wish to validate has the following content:

NameAgeEmail
John34[email protected]
Annanull[email protected]
Tomnull[email protected]

The first step is to read the Excel file and load it as a pandas DataFrame.

Step 3: Validate Data in Excel File

The next step is to validate the data in the Excel file. Let’s start by checking if there are null values in the ‘Age’ column and if the ‘Email’ column exists.

Step 4: Validate More Cases

You can add as many validations as required for your task. For example, you may wish to check if data in a specific column follows a certain pattern.

Full Code

Below is the full code combining the above steps together.

Null value found in 'Age' column

Conclusion

As shown in this tutorial, Python’s pandas and openpyxl libraries make it easy to validate an Excel file. Being able to validate Excel sheets can be incredibly handy when dealing with datasets where the quality and correctness of the data are crucial.

The steps shown above can be extended and modified as per your requirements to create a robust Excel file checker for your projects.