How to Remove Stop Words From a Text File in Python Without NLTK

In this tutorial, we’ll learn how to remove stop words from a text file in Python without using the Natural Language Toolkit (NLTK) library. Stop words are very common words, such as “a”, “an”, and “the”, that carry little meaning on their own and are often removed during text preprocessing.

Step 1: Read the Text File

In this step, we’ll read the text file, store its contents in a variable, and convert it to lowercase so that words like “And” and “and” are treated the same. Make sure the file is in the current working directory (usually the folder you run the script from), or provide the full path.

This is the content of the file:

Two roads diverged in a wood, and I-
I took the one less traveled by,
And that has made all the difference.

Create a new Python file and add the following code:
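A minimal sketch of this step, assuming the file is UTF-8 encoded. The `if not file_path.exists()` part is only there so you can try the script immediately: it writes the sample poem to `your_file.txt` if that file is missing. Delete it when you use your own file.

```python
from pathlib import Path

# Placeholder file name; use your own file name or a full path.
file_path = Path("your_file.txt")

# For a quick test only: create the sample poem if the file does not exist yet.
if not file_path.exists():
    file_path.write_text(
        "Two roads diverged in a wood, and I-\n"
        "I took the one less traveled by,\n"
        "And that has made all the difference.\n",
        encoding="utf-8",
    )

# Read the whole file and convert it to lowercase.
with open(file_path, "r", encoding="utf-8") as file:
    text = file.read().lower()

print(text)
```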

Replace your_file.txt with the name of your text file.

Step 2: Tokenize the Text

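Now we’ll split the text into a list of words. This process is known as tokenization. We’ll use the split() method, which, when called without arguments, splits the string on any run of whitespace (spaces, tabs, and newlines). Having a list of individual words makes it easy to check each one and remove the stop words. Note that split() does not remove punctuation, so “wood,” and “wood” are treated as different words.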

Add the following line of code to your Python file:
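A minimal sketch of the tokenization line. The `text` value here is a stand-in (the sample poem, already lowercased) so the snippet runs on its own; in your script, `text` comes from Step 1. The second part is an optional variant, not part of the original step, that trims punctuation from both ends of each token.

```python
import string

# Stand-in for the lowercased `text` variable from Step 1.
text = (
    "two roads diverged in a wood, and i-\n"
    "i took the one less traveled by,\n"
    "and that has made all the difference.\n"
)

# split() with no arguments splits on any run of whitespace, including newlines.
words = text.split()
print(words[:6])  # ['two', 'roads', 'diverged', 'in', 'a', 'wood,']

# Optional variant: strip punctuation from both ends of each token
# and drop any token that consisted only of punctuation.
stripped = (word.strip(string.punctuation) for word in text.split())
clean_words = [word for word in stripped if word]
print(clean_words[:6])  # ['two', 'roads', 'diverged', 'in', 'a', 'wood']
```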

Step 3: Create a List of Stop Words

In this step, create a list of the stop words you want to remove from your text. Ready-made stop word lists for many languages are available online, for example from the Stopwords ISO project.
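If you download a ready-made list, you need to load it into Python. The sketch below assumes a plain-text file with one word per line; the helper `load_stop_words` and the file name `stopwords-en.txt` are hypothetical, so check the actual format of the list you download. An in-memory `io.StringIO` object stands in for the file so the example runs on its own.

```python
import io


def load_stop_words(lines):
    """Return a set of lowercase stop words from an iterable of lines.

    Assumes one word per line; blank lines and surrounding whitespace are ignored.
    """
    return {line.strip().lower() for line in lines if line.strip()}


# Usage with a downloaded list (hypothetical file name, one word per line):
# with open("stopwords-en.txt", encoding="utf-8") as f:
#     stop_words = load_stop_words(f)

# Self-contained demo using an in-memory "file".
sample = io.StringIO("a\nAn\n\nthe  \nand\n")
stop_words = load_stop_words(sample)
print(sorted(stop_words))  # ['a', 'an', 'and', 'the']
```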

For this tutorial, we’ll create a small list of common English stop words.

Add the following code to your Python file:
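A minimal sketch of such a list. The specific words below are an example choice, not a standard list; all entries are lowercase because the text was lowercased in Step 1.

```python
# An illustrative list of common English stop words (not exhaustive).
# Entries are lowercase to match the lowercased text from Step 1.
stop_words = [
    "a", "an", "the", "and", "or", "but",
    "in", "on", "at", "of", "to", "for", "by", "with",
    "is", "are", "was", "were", "has", "have",
    "i", "that",
]

print(len(stop_words))  # 22
```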

You can customize this list by adding or removing words to suit your text.

Step 4: Remove Stop Words

Now, use a list comprehension to build a new list that keeps only the words that are not in the stop word list.

Add the following line of code:
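A minimal sketch of the filtering step, using small stand-in lists so it runs by itself. Converting the stop word list to a set first is our own addition: membership checks on a set are constant time on average, while checks on a list scan every entry. The comprehension gives the same result with the plain list.

```python
# Stand-ins for `words` (Step 2) and `stop_words` (Step 3).
words = ["two", "roads", "diverged", "in", "a", "wood,", "and", "i-"]
stop_words = ["a", "an", "the", "and", "in", "i"]

# A set makes each `not in` check fast, which matters for long texts.
stop_word_set = set(stop_words)

# Keep only the words that are not stop words, preserving their order.
filtered_words = [word for word in words if word not in stop_word_set]
print(filtered_words)  # ['two', 'roads', 'diverged', 'wood,', 'i-']
```

Note that “i-” survives: the hyphen is still attached, so the token does not equal the stop word “i”.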

Step 5: Combine Words Back Into Text

Once the stop words have been removed, combine the remaining words back into a single string using the join() method with a space as the separator.

Add the following line of code:
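A minimal sketch of the joining step, with a stand-in `filtered_words` list:

```python
# Stand-in for `filtered_words` from Step 4.
filtered_words = ["two", "roads", "diverged", "wood,", "i-"]

# join() places a single space between consecutive words.
cleaned_text = " ".join(filtered_words)
print(cleaned_text)  # two roads diverged wood, i-
```

Because split() discarded the original newlines, the result is a single line of text.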

Step 6: Display the Cleaned Text

Finally, let’s display the cleaned text to make sure the stop words have been removed.

Add the following line to your Python file:
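A minimal sketch of the display step, with stand-in values. The leftover check after the print() call is an optional addition that lists any remaining tokens exactly matching a stop word, which is one way to confirm the filter worked.

```python
# Stand-ins for `stop_words` (Step 3) and `cleaned_text` (Step 5).
stop_words = ["a", "an", "the", "and", "in", "i"]
cleaned_text = "two roads diverged wood, i-"

# Display the cleaned text.
print(cleaned_text)

# Optional sanity check: tokens that still exactly match a stop word.
leftover = [word for word in cleaned_text.split() if word in stop_words]
print("Stop words left:", leftover)  # Stop words left: []
```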

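Now, run your code to see the cleaned text. The output is the lowercased text with every word from your stop word list removed and the remaining words separated by single spaces. Because split() leaves punctuation attached, a token such as “by,” will not match a stop word “by” and stays in the output.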

Full Code
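Putting the steps together, here is one possible complete script. It is a sketch assuming a UTF-8 file; `your_file.txt` and the stop word list are example choices, and the block that writes the sample poem exists only so the script runs as-is.

```python
from pathlib import Path

# Step 1: read the file and lowercase its contents.
# "your_file.txt" is a placeholder; replace it with your own file name or path.
file_path = Path("your_file.txt")

# For a quick test only: create the sample poem if the file does not exist yet.
if not file_path.exists():
    file_path.write_text(
        "Two roads diverged in a wood, and I-\n"
        "I took the one less traveled by,\n"
        "And that has made all the difference.\n",
        encoding="utf-8",
    )

with open(file_path, "r", encoding="utf-8") as file:
    text = file.read().lower()

# Step 2: tokenize on whitespace.
words = text.split()

# Step 3: an illustrative list of stop words (customize as needed).
stop_words = [
    "a", "an", "the", "and", "or", "but",
    "in", "on", "at", "of", "to", "for", "by", "with",
    "is", "are", "was", "were", "has", "have",
    "i", "that",
]

# Step 4: filter out stop words (a set makes lookups faster).
stop_word_set = set(stop_words)
filtered_words = [word for word in words if word not in stop_word_set]

# Step 5: join the remaining words with single spaces.
cleaned_text = " ".join(filtered_words)

# Step 6: display the result.
print(cleaned_text)
```

With the sample poem and this stop word list, the script prints `two roads diverged wood, i- took one less traveled by, made all difference.` Tokens such as “by,” and “i-” remain because punctuation is still attached; stripping punctuation from each token before filtering (for example with `word.strip(string.punctuation)`) would remove them as well.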


In this tutorial, we learned how to remove stop words from a text file in Python without using the NLTK library. This approach is handy for cleaning and preprocessing text data for a variety of natural language processing (NLP) applications.