How To Remove Special Characters In Excel Using Python

In this tutorial, you will learn how to remove special characters in Excel using Python. Removal of such characters is important when working with data that has been collected from different sources and formats. By using Python, you can automate the cleaning process and make your data more reliable and easier to analyze.

To accomplish this task, we will use the open-source pandas library, which helps with data manipulation and analysis, and the openpyxl library, which specializes in working with Excel files.

Step 1: Installing Required Libraries

If you haven’t already installed pandas and openpyxl, you can install them using pip with the following commands:

pip install pandas

pip install openpyxl

Step 2: Reading the Excel File

Assume you have an Excel file named data.xlsx with the following content:

Name, Age, Email
        John Doe, 29, [email protected]
        Jane Smith# 22? [email protected]
        Alice;[email protected]

Let’s first read this Excel file into a pandas DataFrame. To do this, import the required libraries and use the pd.read_excel() function.

import pandas as pd

# Read the Excel file into a DataFrame

excel_file = 'data.xlsx'

df = pd.read_excel(excel_file)

Step 3: Defining a Function to Remove Special Characters

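Next, create a function to remove special characters from a given string. In our case, we will use the re (regular expressions) module from the Python standard library to remove all non-alphanumeric characters except for spaces and the ‘@’ symbol (for email addresses). Note that this pattern also strips periods, hyphens, and underscores, so add those characters to the allowed set if email addresses must stay fully intact.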

import re

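def remove_special_chars(input_str):
    # Leave non-string cells (numbers, empty cells) unchanged
    if not isinstance(input_str, str):
        return input_str
    # Replace all non-alphanumeric characters except for spaces and '@' with an empty string
    return re.sub(r'[^A-Za-z0-9@ ]+', '', input_str)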
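To make the effect of the pattern concrete, here is a small sketch that runs it on a few sample strings. The sample values and the second function, remove_special_chars_keep_email, are hypothetical illustrations and not part of the original tutorial; the variant simply adds '.', '_' and '-' to the allowed characters.

```python
import re

def remove_special_chars(input_str):
    # Same pattern as in the tutorial: keep letters, digits, '@' and spaces
    return re.sub(r'[^A-Za-z0-9@ ]+', '', input_str)

def remove_special_chars_keep_email(input_str):
    # Hypothetical variant: additionally keep '.', '_' and '-'
    return re.sub(r'[^A-Za-z0-9@._ -]+', '', input_str)

# Made-up sample values for illustration
samples = ['Jane Smith#', '22?', 'alice@example.com']

cleaned = [remove_special_chars(s) for s in samples]
cleaned_keep = [remove_special_chars_keep_email(s) for s in samples]

print(cleaned)       # the period in the email address is removed
print(cleaned_keep)  # the period in the email address is kept
```

Which variant fits depends on the data: the stricter pattern is fine for names, while columns holding email addresses or URLs usually need the wider allowed set.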

Step 4: Applying the Function to Each Cell in the DataFrame

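Now we need to apply our remove_special_chars() function to each cell in the DataFrame. We can accomplish this using the applymap() function provided by pandas, which calls the function once per cell. In pandas 2.1 and later, applymap() is deprecated in favor of the equivalent DataFrame.map().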

# Apply the function to each cell in the DataFrame
df_clean = df.applymap(remove_special_chars)
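Because the function is called for every cell, including numeric ones such as Age, it helps to skip values that are not strings. The following is a minimal sketch under two assumptions: the small in-memory DataFrame is a made-up stand-in for the contents of data.xlsx, and the hasattr check is just one possible way to support both newer and older pandas versions.

```python
import re
import pandas as pd

def remove_special_chars(value):
    # Leave non-string cells (numbers, NaN, dates) untouched
    if not isinstance(value, str):
        return value
    return re.sub(r'[^A-Za-z0-9@ ]+', '', value)

# Hypothetical in-memory data standing in for the contents of data.xlsx
df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith#'], 'Age': [29, 22]})

# DataFrame.map is available from pandas 2.1; older versions only have applymap
apply_elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
df_clean = apply_elementwise(remove_special_chars)

print(df_clean)
```

With this guard in place, numeric columns pass through unchanged instead of raising a TypeError inside re.sub().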

Step 5: Saving the Cleaned DataFrame as a New Excel File

Finally, we can save the cleaned DataFrame as a new Excel file, which will be free of special characters.

# Save the cleaned DataFrame to a new Excel file
df_clean.to_excel('cleaned_data.xlsx', index=False)

We should now have an Excel file named ‘cleaned_data.xlsx’ containing:

Name, Age, Email
        John Doe, 29, [email protected]
        Jane Smith, 22, [email protected]
        Alice, 25, [email protected]
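One way to confirm that the saved file is clean is to read it back and search the text cells for any character outside the allowed set. This is only a sketch: it writes a small made-up DataFrame to a temporary directory rather than to the tutorial's cleaned_data.xlsx in the working folder, and the names check and offending are illustrative.

```python
import os
import re
import tempfile
import pandas as pd

# Hypothetical cleaned data standing in for df_clean from the previous step
df_clean = pd.DataFrame({'Name': ['John Doe', 'Jane Smith'], 'Age': [29, 22]})

# Anything other than letters, digits, '@' and spaces counts as a special character
pattern = re.compile(r'[^A-Za-z0-9@ ]')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'cleaned_data.xlsx')
    df_clean.to_excel(path, index=False)   # requires openpyxl
    check = pd.read_excel(path)

# Collect every string cell that still contains a disallowed character
offending = [
    value
    for column in check.columns
    for value in check[column]
    if isinstance(value, str) and pattern.search(value)
]

print(offending)  # an empty list means no special characters remain
```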

Full Code