In this tutorial, you will learn how to extract phone numbers from text in Python. This is a useful technique in data mining, where you need to pull specific pieces of information out of a large amount of text. We will walk through the process step by step, using Python’s built-in re (regular expression) module to find and extract phone numbers written in the xxx-xxx-xxxx format.
Step 1: Import the Required Libraries
To start, we need to import Python’s built-in re module, which provides regular expression support. A regular expression (regex) is a sequence of characters that defines a search pattern.
import re
Step 2: Define the Text Data
In this example, we will create a simple string, imaginatively named text_data, to act as our raw data. This string will contain some phone numbers.
text_data = "Call me at 555-123-4567 or 555-321-7654"
Step 3: Create the Search Pattern
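Next, we will craft our search pattern. This is where the regex magic comes in. In our case, we are looking for phone numbers with the format xxx-xxx-xxxx: \d{3} matches exactly three digits, \d{4} matches exactly four, and \b marks a word boundary so that the pattern does not match inside a longer run of digits or letters.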
pattern = r"\b\d{3}-\d{3}-\d{4}\b"
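To make the pieces of the pattern easier to read, here is a minimal sketch that writes the same expression with the re.VERBOSE flag, which lets each part carry its own comment. The name phone_pattern and the two sample strings are illustrative assumptions, not part of the tutorial's script.

```python
import re

# Same expression as the tutorial's pattern, spread out with re.VERBOSE.
# In verbose mode, whitespace in the pattern is ignored and "#" starts a comment.
phone_pattern = re.compile(
    r"""
    \b        # word boundary: do not start in the middle of a word or number
    \d{3}     # three digits
    -         # literal hyphen
    \d{3}     # three digits
    -         # literal hyphen
    \d{4}     # four digits
    \b        # word boundary: do not run into further digits or letters
    """,
    re.VERBOSE,
)

# A well-formed number is found.
print(phone_pattern.findall("Call 555-123-4567 today"))   # ['555-123-4567']

# Extra digits on either side mean the word boundaries reject the candidate.
print(phone_pattern.findall("Order ref 1555-123-45678"))  # []
```

The second call shows why the \b anchors matter: without them, the pattern would pick a phone-shaped fragment out of the middle of a longer number.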
Step 4: Find Matches
Time to run the search! We will apply our regex pattern to the string using the findall() function from the re module.
matches = re.findall(pattern, text_data)
print(matches)
Step 5: Extract the Phone Numbers
The findall() function will return a list of all matches in the text_data. In this case, it will output the two phone numbers we defined in the string.
['555-123-4567', '555-321-7654']
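If you also need to know where each number sits in the text, re.finditer() is one option: it yields match objects rather than plain strings. The sketch below assumes the same text_data and pattern as in the steps above; the name located is only illustrative.

```python
import re

text_data = "Call me at 555-123-4567 or 555-321-7654"
pattern = r"\b\d{3}-\d{3}-\d{4}\b"

# Each match object exposes the matched text plus its start and end offsets.
located = [
    (match.group(), match.start(), match.end())
    for match in re.finditer(pattern, text_data)
]

for number, start, end in located:
    print(f"{number} found at characters {start}-{end}")
```

The offsets follow Python's slicing convention, so text_data[start:end] gives back the matched number.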
Full Python Code:
Here’s the full code with all the steps:
import re
text_data = "Call me at 555-123-4567 or 555-321-7654"
pattern = r"\b\d{3}-\d{3}-\d{4}\b"
matches = re.findall(pattern, text_data)
print(matches)
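The script above only recognizes the xxx-xxx-xxxx layout. As a rough sketch of how the same approach might be extended, the variation below also accepts dots or spaces as separators and an area code in parentheses. The names flexible_pattern and sample_text are hypothetical, and the set of formats covered is an assumption for illustration, not a complete treatment of real-world phone numbers.

```python
import re

# Hypothetical sample text with three differently formatted numbers.
sample_text = "Office: (555) 123-4567, mobile: 555.321.7654, fax: 555 987 6543"

flexible_pattern = re.compile(
    r"(?<!\w)"                          # not preceded by a letter, digit, or underscore
    r"(?:\(\d{3}\)\s?|\d{3}[-.\s])"     # "(555) " or "555-" / "555." / "555 "
    r"\d{3}[-.\s]"                      # next three digits and a separator
    r"\d{4}\b"                          # last four digits, then a word boundary
)

found = flexible_pattern.findall(sample_text)
print(found)
# ['(555) 123-4567', '555.321.7654', '555 987 6543']

# Strip everything that is not a digit to get one consistent representation.
digits_only = [re.sub(r"\D", "", number) for number in found]
print(digits_only)
# ['5551234567', '5553217654', '5559876543']
```

Because the pattern uses only non-capturing groups, findall() still returns each full match as a string, just as in the original script.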
Conclusion
There you have it: a simple and effective way to extract phone numbers from text in Python using regular expressions. This is just the tip of the iceberg. Regex is an extremely powerful tool in Python for pattern searching and text manipulation. With it, you can extract emails, dates, URLs, and a whole lot more. Have fun experimenting!