Merging cells in a csv file can seem like a really complex task especially when dealing with large datasets. However, with Python and its powerful libraries, it can be a simple and straightforward process. This tutorial will guide you on how to merge cells in a csv file using Python.
Creating a file
For this example, we have to create a file called “file.csv”:
Column1,Column2,Column3 A,1,Apple A,1,Banana A,2,Orange B,1,Grape B,2,Apple C,1,Pineapple C,2,Watermelon
Installing Pandas
Before we start, ensure that you have Python installed on your machine. If not, you can download it from the official Python website. You will also need to install the ‘pandas’ library, which is a powerful tool for data manipulation in Python. You can install it by typing the following command in your terminal:
1 |
pip install pandas |
Importing Necessary Libraries
Once you have all the necessary tools installed, the first step is to import the pandas library.
Here is how:
1 |
import pandas as pd |
Loading the CSV file
After importing pandas, you need to load the csv file that you plan to merge its cells. To load the csv file, we use pd.read_csv(‘file.csv’).
1 |
df = pd.read_csv('file.csv') |
Merging the Cells
Now that we have loaded our CSV file into a pandas dataframe, we can now merge cells in the CSV file.
We do this using pandas groupby and agg functions. Below is an example:
1 |
df = df.groupby(['Column1', 'Column2']).agg(lambda x: ', '.join(set(x.dropna()))).reset_index() |
In the example above, cells in the same row under ‘Column1’ and ‘Column2’ are merged together with a comma separating the values.
Saving the Resulting CSV
After merging the cells, it is now time to save our new transformed CSV file.
1 |
df.to_csv('new_file.csv', index=False) |
The Full Code
1 2 3 4 5 6 7 8 9 10 |
import pandas as pd # Loading the CSV file df = pd.read_csv('file.csv') # Merging the Cells df = df.groupby(['Column1', 'Column2']).agg(lambda x: ', '.join(set(x.dropna()))).reset_index() # Saving the Resulting CSV df.to_csv('new_file.csv', index=False) |
Output:
“Banana” and “Apple” are merged into a single row.
Column1,Column2,Column3 A,1,"Banana, Apple" A,2,Orange B,1,Grape B,2,Apple C,1,Pineapple C,2,Watermelon