How to Read Data From an Excel File in Python

This tutorial covers a fundamental skill for data analysis and manipulation in Python: reading data from an Excel file. You will learn how to import an Excel spreadsheet and load it into a Python data structure, the pandas DataFrame.

This capability is critical in data science and machine learning, where datasets are often stored and shared in Excel files.

Don’t worry if you have no previous experience; we will guide you through this tutorial step-by-step.

Step 1: Install Required Libraries

Python offers several packages for reading Excel files. We’ll be using pandas, an open-source library that is the go-to tool for data manipulation and analysis in Python. It’s not part of the Python standard library, so we need to install it with pip. To read .xlsx files, pandas also relies on the openpyxl package, so install both by running pip install pandas openpyxl in a terminal.
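To confirm that the installation worked, a short check like the one below can help. It is only a sketch: it looks up pandas and openpyxl (the package pandas uses to read .xlsx files) without importing them, and prints the pip command for anything that is missing.

```python
import importlib.util

# Packages this tutorial relies on; openpyxl is the engine pandas uses for .xlsx files.
required = ["pandas", "openpyxl"]

# find_spec returns None when a top-level package cannot be found.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages. Install them with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```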

Step 2: Import the Pandas Library

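After installing the pandas library, we need to import it into our Python environment. We do this with an import statement, conventionally written as import pandas as pd so that the library can be referred to by the short alias pd.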
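A minimal version of this step is shown below. Printing the version number is optional; it is included here only as a quick way to confirm that the import succeeded.

```python
import pandas as pd

# Printing the version is a quick way to confirm the import worked.
print(pd.__version__)
```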

Step 3: Read the Excel File

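Reading an Excel file takes a single call to the read_excel function, for example data = pd.read_excel('filename.xlsx'). A relative file name is resolved against the current working directory, which is usually the directory you run the script from, so specify the file’s full path if it is stored anywhere else.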
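The sketch below shows one way to write this step. The existence check is an addition for illustration, not something read_excel requires; it prints the location Python is actually looking in, which makes a wrong path easier to spot. The file name filename.xlsx is the placeholder used throughout this tutorial.

```python
from pathlib import Path

import pandas as pd

# "filename.xlsx" is the placeholder name used throughout this tutorial.
# A relative path is resolved against the current working directory.
excel_path = Path("filename.xlsx")

if excel_path.exists():
    # Reads the first sheet into a DataFrame; .xlsx files need openpyxl installed.
    data = pd.read_excel(excel_path)
    print(f"Loaded {len(data)} rows from {excel_path.resolve()}")
else:
    print(f"No file at {excel_path.resolve()}; pass the file's full path instead.")
```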

Calling pd.read_excel('filename.xlsx') and assigning the result to data reads the first sheet of the Excel file and loads its content into a DataFrame named data. A DataFrame is a 2-dimensional labeled data structure with rows and columns. Each column can hold a different data type, which makes a DataFrame comparable to a spreadsheet or SQL table.
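To make the idea of a labeled, mixed-type table concrete, the sketch below builds a small DataFrame in memory instead of reading a file. The two rows are borrowed from the sample data shown later in this tutorial; the variable name data is simply reused for consistency.

```python
import pandas as pd

# A tiny DataFrame built in memory; the rows come from this tutorial's sample data.
data = pd.DataFrame({
    "Name": ["John", "Alice"],
    "Age": [34, 29],
    "Profession": ["Engineer", "Doctor"],
})

# shape is (number of rows, number of columns)
print(data.shape)

# dtypes lists the data type of each column; text and integer columns differ
print(data.dtypes)
```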

Step 4: Working with the Data

Once the data is loaded into a DataFrame, we can perform many operations on it. For instance, calling data.head() returns the first five records in the DataFrame, which is a quick way to check that the file was read as expected.
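The snippet below illustrates head() on its own. So that it runs without an Excel file, it uses an in-memory stand-in for the loaded DataFrame, filled with the rows from this tutorial's Expected Output section.

```python
import pandas as pd

# Stand-in for the DataFrame loaded in Step 3, built in memory so the snippet runs on its own.
data = pd.DataFrame({
    "Name": ["John", "Alice", "Tom", "Emma", "Harry"],
    "Age": [34, 29, 45, 39, 42],
    "Profession": ["Engineer", "Doctor", "Teacher", "Writer", "Scientist"],
})

# head() returns the first five rows by default; head(n) returns the first n rows.
first_rows = data.head()
print(first_rows)
```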

Sample Excel File Content

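Assume our Excel file, filename.xlsx, has the following content on its first sheet (the same rows that appear under Expected Output below):

Name   Age  Profession
John   34   Engineer
Alice  29   Doctor
Tom    45   Teacher
Emma   39   Writer
Harry  42   Scientist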
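If you do not have such a file on hand, a sketch like the following creates one with pandas, using the five rows shown in the Expected Output section. It assumes openpyxl is installed and that it is acceptable to write filename.xlsx into the current working directory, overwriting any file of that name.

```python
from pathlib import Path

import pandas as pd

# Rows taken from this tutorial's Expected Output section.
sample = pd.DataFrame({
    "Name": ["John", "Alice", "Tom", "Emma", "Harry"],
    "Age": [34, 29, 45, 39, 42],
    "Profession": ["Engineer", "Doctor", "Teacher", "Writer", "Scientist"],
})

# Writes to the current working directory and overwrites an existing file of the same name.
output_path = Path("filename.xlsx")

# index=False keeps the row numbers (0, 1, 2, ...) out of the spreadsheet.
sample.to_excel(output_path, index=False)
print(f"Wrote {len(sample)} rows to {output_path.resolve()}")
```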

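The complete Python script combines the previous steps: it imports pandas, reads filename.xlsx with read_excel, and prints the first five records with head().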
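Put together, a minimal version might look like the sketch below. Wrapping the steps in a function named main and catching FileNotFoundError are additions made here for illustration; the core of the script is still the read_excel call followed by head().

```python
import pandas as pd


def main(path="filename.xlsx"):
    """Read an Excel file and print its first five records."""
    try:
        # Loads the first sheet of the workbook into a DataFrame.
        data = pd.read_excel(path)
    except FileNotFoundError:
        print(f"Could not find {path}; check the file name or pass its full path.")
        return None

    # Display the first five records.
    print(data.head())
    return data


if __name__ == "__main__":
    main()
```

Run against the sample file described above, the script prints the table shown under Expected Output.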

Expected Output

    Name  Age Profession
0   John   34   Engineer
1  Alice   29     Doctor
2    Tom   45    Teacher
3   Emma   39     Writer
4  Harry   42  Scientist

Conclusion

That’s it for this tutorial! You have learned how to read data from an Excel file in Python using the pandas library. As you can see, Python makes it incredibly straightforward to import and manipulate Excel data.

Understanding how to do this is a foundational skill for any data science, machine learning, or analytics project.

As an exercise, you can try reading different Excel files and explore data manipulations like filtering, sorting, and grouping.
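As a starting point for the exercise, here is a small sketch of filtering and sorting. It uses an in-memory copy of the sample rows rather than an Excel file, and the age threshold of 35 is an arbitrary value chosen for illustration. Grouping is left out because every profession in the sample appears only once.

```python
import pandas as pd

# In-memory copy of the sample rows from this tutorial.
data = pd.DataFrame({
    "Name": ["John", "Alice", "Tom", "Emma", "Harry"],
    "Age": [34, 29, 45, 39, 42],
    "Profession": ["Engineer", "Doctor", "Teacher", "Writer", "Scientist"],
})

# Filtering: keep only the rows where Age is greater than 35.
older = data[data["Age"] > 35]

# Sorting: order the filtered rows by Age, youngest first.
older_sorted = older.sort_values("Age")
print(older_sorted)
```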