How To Remove Special Characters In Excel Using Python

In this tutorial, you will learn how to remove special characters from Excel files using Python. Stray symbols such as #, ? and ; often end up in cells when data is collected from different sources and formats, and they get in the way of sorting, filtering and analysis. Python lets you automate the cleanup and make your data more consistent and easier to analyze.

To accomplish this task, we will use the open-source pandas library for data manipulation and analysis, together with the openpyxl library, which pandas uses behind the scenes to read and write .xlsx files.

Step 1: Installing Required Libraries

If you haven’t already installed pandas and openpyxl, you can install both with pip: pip install pandas openpyxl
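To confirm that both packages are available, and to see which pandas version you have (pandas 2.1 added DataFrame.map(), the replacement for applymap() used in Step 4), you could run a small check like the one below. It uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for package in ("pandas", "openpyxl"):
    found = installed_version(package)
    if found is None:
        print(f"{package}: not installed (pip install {package})")
    else:
        print(f"{package}: {found}")
```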

Step 2: Reading the Excel File

Assuming you have an Excel file named data.xlsx with the following content:

Name, Age, Email
John Doe, 29, [email protected]
Jane Smith# 22? [email protected]
Alice;[email protected]

Let’s first read this Excel file into a pandas DataFrame. To do this, import the required libraries and use the pd.read_excel() function.
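A minimal sketch of this step follows. So that it runs even without your own workbook, it first writes a hypothetical data.xlsx modelled on the rows above if no file of that name exists; the email addresses in that sample are placeholders, not the tutorial's original values.

```python
from pathlib import Path

import pandas as pd

path = Path("data.xlsx")

if not path.exists():
    # Hypothetical sample rows modelled on the example above; the email
    # addresses are placeholders, not the tutorial's original values.
    sample = pd.DataFrame(
        {
            "Name": ["John Doe", "Jane Smith#", "Alice;"],
            "Age": [29, "22?", 25],
            "Email": ["john@example.com", "jane@example.com", "alice@example.com"],
        }
    )
    # Writing .xlsx needs openpyxl; index=False avoids an extra index column.
    sample.to_excel(path, index=False)

# By default read_excel reads the first sheet and uses its first row as column names.
df = pd.read_excel(path)
print(df)
```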

Step 3: Defining a Function to Remove Special Characters

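Next, create a function that removes special characters from a string. We will use Python’s built-in re (regular expressions) module to remove every character that is not a letter, digit or space, while keeping the ‘@’ symbol and the period so that email addresses stay intact. Because some cells (such as those in the Age column) hold numbers rather than text, the function should return non-string values unchanged.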
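A minimal sketch of such a function follows. It keeps the period as well as '@', since removing it would turn an address like jane@example.com into jane@examplecom; take the '.' out of the character class if you do not need it. Note that \w also matches the underscore and, for Python str patterns, accented letters such as é. The constant name SPECIAL_CHARS is our own choice.

```python
import re

# "Special" = anything that is not a word character (\w: letters, digits,
# underscore), whitespace (\s), '@' or '.'.
SPECIAL_CHARS = re.compile(r"[^\w\s@.]")


def remove_special_chars(value):
    """Remove special characters from a string; return other values as-is."""
    if not isinstance(value, str):
        # Numbers, dates and missing values (NaN) pass through unchanged,
        # so the function is safe to call on every cell of a DataFrame.
        return value
    # Delete the special characters, then trim leftover edge whitespace.
    return SPECIAL_CHARS.sub("", value).strip()


print(remove_special_chars("Jane Smith#"))       # Jane Smith
print(remove_special_chars("22?"))               # 22
print(remove_special_chars("jane@example.com"))  # jane@example.com
print(remove_special_chars(29))                  # 29
```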

Step 4: Applying the Function to Each Cell in the DataFrame

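Now we need to apply our remove_special_chars() function to each cell in the DataFrame. We can do this with the applymap() method provided by pandas. In pandas 2.1 and later, applymap() is deprecated in favor of the equivalent DataFrame.map() method, so use map() if your version supports it.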
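The sketch below applies a cleaning function to every cell of a small in-memory DataFrame. It assumes the same rule as Step 3 (keep letters, digits, whitespace, '@' and '.'), uses DataFrame.map() when it exists and falls back to applymap() otherwise, and uses placeholder email addresses. The final pd.to_numeric() call is an optional extra step that turns the cleaned Age strings back into numbers.

```python
import re

import pandas as pd

SPECIAL_CHARS = re.compile(r"[^\w\s@.]")


def remove_special_chars(value):
    # Same rule as Step 3: clean strings, leave numbers and NaN alone.
    return SPECIAL_CHARS.sub("", value).strip() if isinstance(value, str) else value


df = pd.DataFrame(
    {
        "Name": ["John Doe", "Jane Smith#", "Alice;"],
        "Age": [29, "22?", 25],
        "Email": ["john@example.com", "jane@example.com", "alice@example.com"],
    }
)

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    cleaned = df.map(remove_special_chars)
else:
    cleaned = df.applymap(remove_special_chars)

# "22?" became the string "22"; convert the column back to numbers.
cleaned["Age"] = pd.to_numeric(cleaned["Age"])
print(cleaned)
```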

Step 5: Saving the Cleaned DataFrame as a New Excel File

Finally, save the cleaned DataFrame as a new Excel file with the to_excel() method. Passing index=False keeps pandas from writing the row index as an extra first column.
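A minimal sketch of the save step, using a small stand-in for the cleaned DataFrame with hypothetical values. Note that it writes cleaned_data.xlsx in the current directory, overwriting any existing file of that name, and then reads it back to show what was stored.

```python
import pandas as pd

# A small stand-in for the DataFrame cleaned in Step 4 (hypothetical values).
cleaned = pd.DataFrame(
    {
        "Name": ["John Doe", "Jane Smith", "Alice"],
        "Age": [29, 22, 25],
        "Email": ["john@example.com", "jane@example.com", "alice@example.com"],
    }
)

# index=False stops pandas from writing the row index as an extra first column.
cleaned.to_excel("cleaned_data.xlsx", index=False)

# Read the file back to confirm what was written.
check = pd.read_excel("cleaned_data.xlsx")
print(check)
```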

We should now have an Excel file named ‘cleaned_data.xlsx’ containing:

Name, Age, Email
John Doe, 29, [email protected]
Jane Smith, 22, [email protected]
Alice, 25, [email protected]

Full Code
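Putting the steps together, a complete script might look like the sketch below. It assumes the cleaning rule used above (letters, digits, whitespace, '@' and '.' are kept); clean_excel() is a hypothetical helper name, and only the first sheet of data.xlsx is processed because that is what pd.read_excel() reads by default.

```python
import re
from pathlib import Path

import pandas as pd

# Keep letters, digits, underscore, whitespace, '@' and '.'; drop the rest.
SPECIAL_CHARS = re.compile(r"[^\w\s@.]")


def remove_special_chars(value):
    """Remove special characters from strings; return other values unchanged."""
    if not isinstance(value, str):
        return value
    return SPECIAL_CHARS.sub("", value).strip()


def clean_excel(source, destination):
    """Read the first sheet of an Excel file, clean every cell, save to a new file."""
    df = pd.read_excel(source)
    # DataFrame.map replaced applymap in pandas 2.1; fall back for older versions.
    if hasattr(pd.DataFrame, "map"):
        cleaned = df.map(remove_special_chars)
    else:
        cleaned = df.applymap(remove_special_chars)
    cleaned.to_excel(destination, index=False)
    return cleaned


if __name__ == "__main__":
    source = Path("data.xlsx")
    if source.exists():
        print(clean_excel(source, "cleaned_data.xlsx"))
    else:
        print(f"{source} not found; nothing to clean.")
```

Run it from the folder that contains data.xlsx. If you need numeric columns such as Age back as numbers rather than cleaned strings, convert them afterwards with pd.to_numeric().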

Conclusion

In this tutorial, you’ve learned how to remove special characters from Excel files using Python’s pandas and openpyxl libraries: read the workbook into a DataFrame, clean every text cell with a regular expression, and save the result to a new file. You can apply the same steps to your own Excel data to make it more consistent and easier to analyze.