When working with data in Python, one common scenario is to read data from multiple CSV (Comma-Separated Values) files located in multiple folders.
Reading and analyzing data from multiple CSV files becomes essential when you are working on projects that involve data analysis, data science, or machine learning.
This tutorial is a step-by-step guide to reading CSV files from multiple folders in Python using the pandas library and Python’s built-in os module.
Step 1: Install the pandas library
Before we start reading CSV files, let’s make sure that the pandas library is installed on our system. You can install pandas using pip with the following command:
    pip install pandas
Step 2: Import the necessary libraries
In this step, we will import the necessary Python libraries:
    import pandas as pd
    import os
Step 3: Identify the folders containing the CSV files
Now we need to find the folders where the CSV files are located. For this tutorial, let’s assume that we have a folder named data, and inside that folder we have two folders named folder1 and folder2, each containing multiple CSV files.
Example folder structure:
    data/
        folder1/    (several CSV files)
        folder2/    (several CSV files)
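To try the following steps without your own data, a small layout of this shape can be generated with a short script. This is only a sketch: the helper name make_sample_data, the file names (sales_1.csv, sales_2.csv), and the columns id and value are made up for illustration and are not part of the original example.

```python
import csv
import os


def make_sample_data(root="data"):
    """Create root/folder1 and root/folder2, each holding two small CSV files.

    The file names and columns are hypothetical placeholders.
    """
    created = []
    for folder in ("folder1", "folder2"):
        folder_path = os.path.join(root, folder)
        os.makedirs(folder_path, exist_ok=True)
        for i in (1, 2):
            file_path = os.path.join(folder_path, f"sales_{i}.csv")
            # newline="" lets the csv module control line endings itself
            with open(file_path, "w", newline="") as fh:
                writer = csv.writer(fh)
                writer.writerow(["id", "value"])  # header row
                writer.writerow([1, i * 10])
                writer.writerow([2, i * 20])
            created.append(file_path)
    return created
```

Calling make_sample_data("data") from your project directory creates the data folder with both subfolders and returns the paths of the four files it wrote.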
Step 4: Get a list of all folders
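In this step, we will get a list of all folders in the data folder using the os.listdir() function. Because os.listdir() returns files as well as folders, we keep only the entries for which os.path.isdir() returns True.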
    data_folder = "data/"
    folders = [f for f in os.listdir(data_folder) if os.path.isdir(os.path.join(data_folder, f))]
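Note that os.listdir() returns entries in arbitrary order, so the folder order can differ between machines. If a reproducible order matters (for example, so that rows always land in the same order in the combined DataFrame), one option is to sort the names. The sketch below uses os.scandir() from the standard library; the helper name list_subfolders is an assumption made for this example.

```python
import os


def list_subfolders(data_folder):
    """Return the names of the immediate subfolders of data_folder, sorted."""
    with os.scandir(data_folder) as entries:
        # entry.is_dir() is True for directories (and symlinks to directories)
        return sorted(entry.name for entry in entries if entry.is_dir())
```

The result can be used in place of the folders list, for example folders = list_subfolders("data/").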
Step 5: Iterate through the folders and read CSV files
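In this step, we will create a function that reads every CSV file in a single folder into one DataFrame. We then call it for each folder in the list and concatenate the results into a single pandas DataFrame. The function skips files that do not end in .csv, and ignore_index=True gives the combined DataFrame a fresh index instead of repeating each file’s row numbers.
    def read_csv_from_folder(folder_path):
        # Only read CSV files; ignore anything else in the folder
        files_list = [f for f in os.listdir(folder_path) if f.endswith(".csv")]
        frames = [pd.read_csv(os.path.join(folder_path, f)) for f in files_list]
        return pd.concat(frames, ignore_index=True)

    # Combine the per-folder DataFrames into a single DataFrame
    frames = [read_csv_from_folder(os.path.join(data_folder, folder)) for folder in folders]
    all_data = pd.concat(frames, ignore_index=True)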
With this, we have our final DataFrame all_data containing all the data from multiple CSV files located in multiple folders.
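Once the files are concatenated, the rows no longer show which file they came from. A possible variation, sketched here for the same folder layout, adds two columns that record the origin of each row. The function name read_all_csvs and the column names source_folder and source_file are assumptions chosen for this example, not part of the code above.

```python
import glob
import os

import pandas as pd


def read_all_csvs(data_folder):
    """Read every CSV one level below data_folder and record each row's origin."""
    # Matches data_folder/<any folder>/<any name>.csv; sorted for a stable order
    pattern = os.path.join(data_folder, "*", "*.csv")
    frames = []
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path)
        df["source_folder"] = os.path.basename(os.path.dirname(path))
        df["source_file"] = os.path.basename(path)
        frames.append(df)
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With these columns in place, something like all_data.groupby("source_folder").size() shows how many rows each folder contributed.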
Full Code:
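    import pandas as pd
    import os

    def read_csv_from_folder(folder_path):
        # Only read CSV files; ignore anything else in the folder
        files_list = [f for f in os.listdir(folder_path) if f.endswith(".csv")]
        frames = [pd.read_csv(os.path.join(folder_path, f)) for f in files_list]
        # ignore_index=True renumbers the rows instead of repeating each file's index
        return pd.concat(frames, ignore_index=True)

    data_folder = "data/"
    # Keep only the entries of data_folder that are directories
    folders = [f for f in os.listdir(data_folder) if os.path.isdir(os.path.join(data_folder, f))]

    # Combine the per-folder DataFrames into a single DataFrame
    frames = [read_csv_from_folder(os.path.join(data_folder, folder)) for folder in folders]
    all_data = pd.concat(frames, ignore_index=True)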
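The code above looks exactly one level below the data folder. If CSV files may also sit at deeper levels (an assumption that goes beyond the folder structure used in this tutorial), a recursive search with pathlib is one way to collect them. The helper name read_csvs_recursive is hypothetical.

```python
from pathlib import Path

import pandas as pd


def read_csvs_recursive(data_folder):
    """Read every .csv file found anywhere under data_folder."""
    # rglob walks data_folder and all of its subdirectories;
    # sorting keeps the row order stable between runs
    csv_paths = sorted(Path(data_folder).rglob("*.csv"))
    if not csv_paths:
        raise FileNotFoundError(f"No CSV files found under {data_folder}")
    return pd.concat((pd.read_csv(p) for p in csv_paths), ignore_index=True)
```

Unlike the folder-by-folder version, this also picks up CSV files stored directly in the data folder itself, which may or may not be what you want.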
Conclusion
In this tutorial, we’ve learned how to read CSV files from multiple folders in Python using the pandas library and the os module. This method is quite helpful when you are working with large datasets or when your data is spread across different folders in your project.