In this tutorial, we will learn how to read xlsb (Excel Binary Format) files in Python using the popular library pyxlsb.
Xlsb files are typically smaller and faster to read and write than standard xlsx files, but fewer third-party libraries support them. pyxlsb is a library for reading the xlsb format in Python. Let’s dive into the steps.
Step 1: Install the required library
First, you need to install the pyxlsb library. You can install the library using pip by running the following command:
```shell
pip install pyxlsb
```
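To confirm that the installation worked before moving on, a quick check along these lines can help. This is only a minimal sketch; the helper name `is_installed` is an illustrative choice and not part of pyxlsb.

```python
from importlib.util import find_spec


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return find_spec(package_name) is not None


if is_installed("pyxlsb"):
    print("pyxlsb is available")
else:
    print("pyxlsb is missing; run: pip install pyxlsb")
```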
Step 2: Read the xlsb file in Python
Once you have installed the required library, you can read an xlsb file by opening it with the help of the pyxlsb package. To read a file, follow these steps:
- Import the pyxlsb package.
- Open the xlsb file using the ‘with’ statement.
- Fetch and print the sheet names using the workbook’s ‘sheets’ attribute.
Here’s an example:
```python
import pyxlsb

with pyxlsb.open_workbook('example.xlsb') as wb:
    print(wb.sheets)  # List of sheet names
```
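pyxlsb addresses sheets either by name or by 1-based position, so it can help to see which position belongs to which sheet. The sketch below assumes the workbook exposes its sheet names as the list `wb.sheets`; the helpers `number_sheets` and `print_sheet_index` and the sample names are hypothetical, introduced only for illustration.

```python
def number_sheets(sheet_names):
    """Map each sheet name to its 1-based position in the workbook."""
    return {name: index for index, name in enumerate(sheet_names, start=1)}


def print_sheet_index(path):
    """Print every sheet in the workbook at `path` with its 1-based index."""
    import pyxlsb  # imported lazily so number_sheets() works without pyxlsb

    with pyxlsb.open_workbook(path) as wb:
        for name, index in number_sheets(wb.sheets).items():
            print(index, name)


# Hypothetical sheet names, used only to illustrate the mapping.
print(number_sheets(["Sheet1", "Summary"]))  # {'Sheet1': 1, 'Summary': 2}
```

With a real file, calling `print_sheet_index('example.xlsb')` would list the sheets of that workbook in the same way.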
Step 3: Read data from a specific sheet
To read data from a specific sheet, you can follow these steps:
- Get the sheet by its name or index using the ‘get_sheet()’ function.
- Iterate through the rows using the ‘rows()’ function.
- Access each cell’s value through its ‘v’ attribute.
Here’s an example:
```python
import pyxlsb

with pyxlsb.open_workbook('example.xlsb') as wb:
    with wb.get_sheet('Sheet1') as sheet:
        for row in sheet.rows():
            print([item.v for item in row])  # Prints the list of values in each row
```
You can replace ‘Sheet1’ with your sheet name or use the index number (e.g., 1 for the first sheet, 2 for the second sheet, etc.) to access the sheet.
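Sheets often contain blank rows. One way to drop them while extracting values is sketched below, assuming that empty cells are reported with a value of `None`. The `Cell` tuple is a stand-in defined for this example so it runs without a workbook, and `non_empty_rows` is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-in for the cell objects yielded by sheet.rows(); only the .v
# attribute (the cell value) is relied upon here.
Cell = namedtuple("Cell", ["r", "c", "v"])


def non_empty_rows(rows):
    """Yield each row as a list of values, skipping rows that are entirely empty."""
    for row in rows:
        values = [cell.v for cell in row]
        if any(value is not None for value in values):
            yield values


sample = [
    [Cell(0, 0, 1), Cell(0, 1, "John")],
    [Cell(1, 0, None), Cell(1, 1, None)],  # blank row
    [Cell(2, 0, 2), Cell(2, 1, "Jane")],
]
print(list(non_empty_rows(sample)))  # [[1, 'John'], [2, 'Jane']]
```

With a real workbook, `non_empty_rows(sheet.rows())` could be used inside the `with wb.get_sheet(...)` block in place of the plain loop.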
Step 4: Store the data in a DataFrame using pandas
Sometimes, it’s easier to manipulate the data if it’s stored in a pandas DataFrame. To store the data in a DataFrame, follow these steps:
- Install and import the pandas package.
- Create a list to store all rows from the sheet.
- Convert the list of rows into a pandas DataFrame.
Here’s the example code:
```python
import pyxlsb
import pandas as pd

data = []

with pyxlsb.open_workbook('example.xlsb') as wb:
    with wb.get_sheet('Sheet1') as sheet:
        for row in sheet.rows():
            data.append([item.v for item in row])

df = pd.DataFrame(data)
print(df.head())
```
Here’s an example of the content of the ‘example.xlsb’ file:
```
1, 'John', 'Doe', 30
2, 'Jane', 'Doe', 28
3, 'Alice', 'Smith', 25
4, 'Bob', 'Johnson', 22
```
The output of the code will be:
```
   0      1        2   3
0  1   John      Doe  30
1  2   Jane      Doe  28
2  3  Alice    Smith  25
3  4    Bob  Johnson  22
```
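The DataFrame above has numeric column labels because the sheet has no header row. If the columns should carry names, they can be attached to the rows before the DataFrame is built. The sketch below stays in plain Python so the idea is easy to see; the column names are invented for this example and `to_records` is a hypothetical helper.

```python
def to_records(rows, columns):
    """Pair each row's values with column names, producing one dict per row."""
    if any(len(row) != len(columns) for row in rows):
        raise ValueError("every row must have one value per column")
    return [dict(zip(columns, row)) for row in rows]


# Rows as they would be collected in `data` from the example sheet.
data = [
    [1, "John", "Doe", 30],
    [2, "Jane", "Doe", 28],
]
# Column names are assumptions made for illustration only.
records = to_records(data, ["id", "first_name", "last_name", "age"])
print(records[0])  # {'id': 1, 'first_name': 'John', 'last_name': 'Doe', 'age': 30}
```

Passing `records` to `pd.DataFrame(records)` yields a DataFrame with those column names; `pd.DataFrame(data, columns=[...])` achieves the same result directly.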
Full Code
```python
import pyxlsb
import pandas as pd

data = []

with pyxlsb.open_workbook('example.xlsb') as wb:
    with wb.get_sheet('Sheet1') as sheet:
        for row in sheet.rows():
            data.append([item.v for item in row])

df = pd.DataFrame(data)
print(df.head())
```
Remember to replace ‘example.xlsb’ in the code with the path to your xlsb file.
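As a shorter alternative to the manual loop, pandas (version 1.0 or later) can read xlsb files itself by using pyxlsb as its engine. The wrapper below is a minimal sketch; `read_xlsb` is a hypothetical name, and `header=None` is chosen on the assumption that the sheet has no header row, as in the example file.

```python
def read_xlsb(path, sheet_name=0):
    """Load one sheet of an xlsb file into a DataFrame via pandas' pyxlsb engine."""
    import pandas as pd  # imported lazily; needs pandas >= 1.0 and pyxlsb installed

    # header=None keeps the first row as data instead of treating it as column names.
    return pd.read_excel(path, sheet_name=sheet_name, engine="pyxlsb", header=None)
```

For example, `df = read_xlsb('example.xlsb', sheet_name='Sheet1')` would load the sheet used throughout this tutorial.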
Conclusion
In this tutorial, we have learned how to read xlsb files in Python using the pyxlsb library. We have also shown how to access a specific sheet in the xlsb file, iterate the rows, and then store the data in a pandas DataFrame for further processing.