Merging two CSV files is a common task that you might need to perform when working with data in Python. There are different ways to merge CSV files, one popular method is using the Pandas library. However, in this tutorial, we will showcase a simple approach to merging two CSV files in Python without using Pandas.
We will merge two CSV files using Python’s built-in csv
module, which provides functionality to read and write data to and from CSV files.
Step 1: Prepare Two CSV Files
First, let’s prepare two sample CSV files that we want to merge. For this tutorial, we will use the following two CSV files named file1.csv
and file2.csv
.
Contents of file1.csv
:
id, name, age 1, Alice, 28 2, Bob, 32 3, Charlie, 24
Contents of file2.csv
:
id, city, country 1, Paris, France 2, London, United Kingdom 3, New York, United States
We will merge these two CSV files based on the id column.
Step 2: Read CSV Files Using Python’s CSV Module
In this step, we will import the csv
module and read the contents of both CSV files using the csv.reader()
function. This function returns an iterable object which we will convert into a list for easier data manipulation.
1 2 3 4 5 6 7 8 9 |
import csv with open("file1.csv", "r") as file: csv_reader1 = csv.reader(file) file1_data = [row for row in csv_reader1] with open("file2.csv", "r") as file: csv_reader2 = csv.reader(file) file2_data = [row for row in csv_reader2] |
Step 3: Merge CSV Data
Now that we have the contents of both files, we can proceed to merge them. In this tutorial, we will first create a dictionary where the keys are the id
values of the first CSV file, and the values are the remaining data (name, age).
We will then iterate over the second file (city, country information) and add the corresponding data to the dictionary according to the matching id
.
1 2 3 4 5 6 7 8 9 10 |
merged_data = {} for row in file1_data[1:]: id = row[0] merged_data[id] = row[1:] for row in file2_data[1:]: id = row[0] if id in merged_data: merged_data[id].extend(row[1:]) |
Step 4: Write the Merged Data to a New CSV File
Finally, we will write the merged data to a new output CSV file (output.csv
) using the csv.writer()
function.
1 2 3 4 5 6 7 8 9 |
with open("output.csv", "w", newline="") as file: csv_writer = csv.writer(file) # Write header header = file1_data[0] + file2_data[0] csv_writer.writerow(header) # Write merged data for id, data in merged_data.items(): csv_writer.writerow([id] + data) |
Now, after running the code, you will have a new output.csv
file with the merged data.
Contents of output.csv
:
id, name, age, city, country 1, Alice, 28, Paris, France 2, Bob, 32, London, United Kingdom 3, Charlie, 24, New York, United States
Full Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
import csv with open("file1.csv", "r") as file: csv_reader1 = csv.reader(file) file1_data = [row for row in csv_reader1] with open("file2.csv", "r") as file: csv_reader2 = csv.reader(file) file2_data = [row for row in csv_reader2] merged_data = {} for row in file1_data[1:]: id = row[0] merged_data[id] = row[1:] for row in file2_data[1:]: id = row[0] if id in merged_data: merged_data[id].extend(row[1:]) with open("output.csv", "w", newline="") as file: csv_writer = csv.writer(file) # Write header header = file1_data[0] + file2_data[0] csv_writer.writerow(header) # Write merged data for id, data in merged_data.items(): csv_writer.writerow([id] + data) |
Conclusion
In this tutorial, we demonstrated how to merge two CSV files in Python without using the Pandas library. While this method is limited compared to more advanced libraries, it works for simple scenarios and can be useful when Pandas is not available or not required for a specific task.