Reading multiple files in a loop is a common task in data science and programming. In this tutorial, we will learn how to read multiple files in a loop using Python.
Steps:
1. Put three files into the my_dir directory:
filename1.csv
    Date,Value
    2022-01-01,100
    2022-02-01,200
filename2.csv
    Name, Age, Location, Description
    Alice, 30, New York, "Software developer, likes hiking"
    Bob, 25, San Francisco, "Data scientist, enjoys cooking"
    Cathy, 28, Los Angeles, "UX designer, loves traveling"
    David, 32, Boston, "Product manager, passionate about photography"
filename3.csv
    Name	Age	Occupation
    John	Doe	aaa
    Jeo	Duck	bbb
2. Import the modules needed to handle files. We will use the os module to get a list of files in a directory, and the pandas module to read them.
    import os
    import pandas as pd
3. Next, use the os module to get a list of files in the directory. We will create a function that takes a directory path as an argument and returns a list of file paths. Because os.walk traverses the directory tree recursively, the list also includes files in subdirectories.
    def get_file_list(directory_path):
        file_list = []
        for root, directories, files in os.walk(directory_path):
            for filename in files:
                file_path = os.path.join(root, filename)
                file_list.append(file_path)
        return file_list
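Because os.walk returns every file it finds, a directory that also holds non-CSV files would cause read_csv to fail or produce nonsense later on. One possible refinement is to filter by extension and sort the result. This is a minimal sketch of that idea; the helper name get_csv_file_list and its extension parameter are illustrative and not part of the original tutorial.

```python
import os


def get_csv_file_list(directory_path, extension=".csv"):
    """Return a sorted list of paths under directory_path ending in extension.

    Hypothetical helper for illustration; not part of the tutorial's code.
    """
    file_list = []
    for root, _directories, files in os.walk(directory_path):
        for filename in files:
            # Compare case-insensitively so "DATA.CSV" is also matched.
            if filename.lower().endswith(extension):
                file_list.append(os.path.join(root, filename))
    # os.walk does not guarantee an order, so sort for reproducible runs.
    return sorted(file_list)
```

It can be used as a drop-in replacement for get_file_list when the directory may contain other kinds of files.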
4. Now, let’s use the pandas module to read each file in the directory. We will create a loop that iterates through the list of file paths and reads each file using the read_csv function from pandas. In this example, we assume that all files are in CSV format.
    directory_path = '/path/to/directory'
    file_list = get_file_list(directory_path)

    for file_path in file_list:
        data = pd.read_csv(file_path)
        # Do something with the data
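Inside this loop, data is overwritten on every iteration, so only the last file remains available once the loop ends. One way to keep every result is to store each DataFrame in a dictionary keyed by file name. The sketch below assumes that approach; read_all_csv is a hypothetical helper name, and skipping unreadable files is a design choice rather than something the tutorial prescribes.

```python
import os

import pandas as pd


def read_all_csv(file_list):
    """Read each path with pandas and return {file name: DataFrame}.

    Hypothetical helper for illustration; not part of the tutorial's code.
    """
    frames = {}
    for file_path in file_list:
        try:
            frames[os.path.basename(file_path)] = pd.read_csv(file_path)
        except (pd.errors.EmptyDataError, pd.errors.ParserError) as error:
            # Report the unreadable file and continue with the rest.
            print(f"Skipping {file_path}: {error}")
    return frames
```

Keying by base name keeps lookups short, for example frames["filename1.csv"], but files with the same name in different subdirectories would overwrite each other. Using the full path as the key avoids that.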
5. You can now do whatever you want with the data. In this example, we are just printing the first 5 rows of each file.
    for file_path in file_list:
        data = pd.read_csv(file_path)
        print(data.head())
Output:
             Date  Value
    0  2022-01-01    100
    1  2022-02-01    200
           Name  ...                     Description
    Alice    30  ...                   likes hiking"
    Bob      25  ...                 enjoys cooking"
    Cathy    28  ...                loves traveling"
    David    32  ...   passionate about photography"

    [4 rows x 4 columns]
      Name\tAge\tOccupation
    0        John\tDoe\taaa
    1        Jeo\tDuck\tbbb
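This output shows two parsing problems. In filename2.csv, the space after each comma appears to stop pandas from recognizing the quoted Description field, so the comma inside the quotes creates an extra column and the names end up in the index. filename3.csv is tab-separated, so each row is read as a single column. A possible fix, assuming these are the causes, is to pass different read_csv options per file. The READ_OPTIONS mapping below is a hypothetical name used only for illustration, and the sample data is held in memory so the sketch runs on its own.

```python
import io

import pandas as pd

# Hypothetical per-file options; the keys match the sample files from step 1.
READ_OPTIONS = {
    "filename2.csv": {"skipinitialspace": True},  # ignore spaces after commas
    "filename3.csv": {"sep": "\t"},               # tab-separated fields
}

# In-memory copies of two sample files (shortened) for demonstration.
samples = {
    "filename2.csv": (
        "Name, Age, Location, Description\n"
        'Alice, 30, New York, "Software developer, likes hiking"\n'
        'Bob, 25, San Francisco, "Data scientist, enjoys cooking"\n'
    ),
    "filename3.csv": "Name\tAge\tOccupation\nJohn\tDoe\taaa\nJeo\tDuck\tbbb\n",
}

frames = {}
for name, text in samples.items():
    # Files without an entry in READ_OPTIONS fall back to the defaults.
    frames[name] = pd.read_csv(io.StringIO(text), **READ_OPTIONS.get(name, {}))
    print(frames[name].columns.tolist())
```

With real files, io.StringIO(text) would be replaced by the file path, and the lookup key would be os.path.basename(file_path).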
Conclusion:
In this tutorial, we have learned how to read multiple files in a loop using Python. We used the os module to get a list of file paths, and the pandas module to read each file. This method can be used to preprocess or analyze data in bulk.
    import os
    import pandas as pd

    def get_file_list(directory_path):
        file_list = []
        for root, directories, files in os.walk(directory_path):
            for filename in files:
                file_path = os.path.join(root, filename)
                file_list.append(file_path)
        return file_list

    directory_path = '/path/to/directory'
    file_list = get_file_list(directory_path)

    for file_path in file_list:
        data = pd.read_csv(file_path)
        print(data.head())