Regular Expression in Python Programming

Regular Expression in Python

In this lesson, we will learn what a regular expression is in Python programming and how to use one, with the help of some examples.

What is Regular Expression in Python?

A regular expression is a sequence of characters that forms a search pattern. It is used to find text that matches the pattern in a string. A regular expression is also known as RegEx.

Let's first see the list of special characters used to create a search pattern in regular expression.

Special Characters in Regular Expressions

Character Meaning
\A Matches only at the beginning of a string.
\Z Matches only at the end of a string.
\b Matches the position at the beginning or at the end of a word (a word boundary).
\B Matches a position that is not at the beginning or end of a word.
\d Matches any decimal digit, such as 0 to 9.
\D Matches any non-digit character.
\s Matches any whitespace character, such as space, tab and newline.
\S Matches any non-whitespace character.
\w Matches any word character: a letter, a digit or the underscore (_).
\W Matches any character that is not a letter, a digit or the underscore.
\ Escapes a special character so that it is matched literally (for example, \. matches a dot). It also introduces special sequences like \d, \A, \B, etc.
^... Matches the specified characters at the start of a string.
...$ Matches the specified characters at the end of a string (or just before a newline at the very end).
. Matches any single character except the newline character.
? Matches zero or one occurrence of the preceding character.
* Matches zero or more occurrences of the preceding character.
+ Matches one or more occurrences of the preceding character.
M|N The | symbol represents logical OR. It matches either regex M or regex N.
[...] Matches any one character from the set of characters inside the square brackets.
[^...] Matches any one character except the characters mentioned inside the square brackets after the caret (^) symbol.
(...) Matches the regex pattern mentioned inside the parentheses and captures the matched result.
{n} Matches exactly n occurrences of the preceding character.
{from,to} Matches between from and to occurrences of the preceding character, where from is the minimum number of occurrences and to is the maximum. It must be written without a space after the comma.
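
The quantifiers ?, *, + and {n} from the table are easiest to compare side by side. The short sketch below applies each of them to the letter u in the same word; the sample string and variable names are made up for illustration.

```python
import re

# Three spellings that differ only in how many times 'u' appears.
st = 'color colour colouur'

zero_or_one = re.findall(r'colou?r', st)    # 'u' may appear 0 or 1 times
zero_or_more = re.findall(r'colou*r', st)   # 'u' may appear 0 or more times
one_or_more = re.findall(r'colou+r', st)    # 'u' must appear at least once
exactly_two = re.findall(r'colou{2}r', st)  # 'u' must appear exactly twice

print(zero_or_one)    # ['color', 'colour']
print(zero_or_more)   # ['color', 'colour', 'colouur']
print(one_or_more)    # ['colour', 'colouur']
print(exactly_two)    # ['colouur']
```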

Raw String

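We create regex search patterns using raw strings. A raw string is created by adding the character r before the opening quote of a string. In a raw string, Python leaves backslashes untouched, so sequences like \d and \b reach the re module exactly as written. See the example given below.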

Raw String Example

st = r'Hello World'
print(st)

Output

Hello World
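
The string above contains no backslash, so the r prefix makes no visible difference there. The following sketch, with made-up sample data, shows a case where it does matter: in a normal string '\b' is turned into a backspace character before the regex engine sees it, while the raw string keeps it as the word boundary sequence.

```python
import re

normal = '\bcat'   # Python converts '\b' into one backspace character
raw = r'\bcat'     # the backslash and the letter b are kept as they are

print(len(normal))   # 4
print(len(raw))      # 5

st = 'cat concat'
print(re.findall(raw, st))      # ['cat']  (cat at the start of a word)
print(re.findall(normal, st))   # []       (no backspace character in st)
```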

re Module

To use regular expressions in a Python program, we must import the re module. It contains the following functions that allow us to search for a regex pattern in a string.

re.match()

The re.match() function looks for a matching pattern only at the beginning of a string. If a match is found, it returns a match object; otherwise, it returns None. We use the group() method of the match object to get the matched string.

Example of re.match

import re

st = 'Hello I am learning python regular expression.'
regex = r'He\w+'
result = re.match(regex, st)
if result:
    print(result.group())
else:
    print(result)

Output

Hello

In the above program, the pattern r'He\w+' matches the letters He followed by one or more word characters, so it finds a word that starts with He.
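
As a small sketch of the "beginning only" rule, the code below searches the same sentence for a word that exists in it but not at the start, and then uses the span() method of the match object to see where a successful match was found. The variable names are chosen for illustration.

```python
import re

st = 'Hello I am learning python regular expression.'

# 'python' is in the string, but not at the beginning,
# so re.match() returns None.
no_match = re.match(r'python', st)
print(no_match)       # None

# A successful match also records its start and end positions.
hit = re.match(r'He\w+', st)
print(hit.group())    # Hello
print(hit.span())     # (0, 5)
```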

re.search()

The re.search() function looks for the first occurrence of a matching pattern anywhere in the string. If a match is found, it returns a match object; otherwise, it returns None.

Example of re.search

import re

st = 'We know Peter is a good boy. Peter loves to read story books.'
regex = r'Pete\w+'
result = re.search(regex, st)
if result:
    print(result.group())
else:
    print(result)

Output

Peter
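
The sketch below reuses the sentence from the example to show that re.search() reports only the first of the two occurrences, and that re.match() with the same pattern finds nothing because the string does not begin with Pete.

```python
import re

st = 'We know Peter is a good boy. Peter loves to read story books.'

result = re.search(r'Pete\w+', st)
print(result.span())                      # (8, 13)
print(st[result.start():result.end()])    # Peter

# The string starts with 'We', so re.match() returns None here.
at_start = re.match(r'Pete\w+', st)
print(at_start)                           # None
```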

re.findall()

The re.findall() function returns all the non-overlapping occurrences of a matching pattern in a string as a list. If no match is found, it returns an empty list.

Example of re.findall

import re

st = 'We know Peter is a good boy. Peter loves to read story books.'
regex = r'Pete\w+'
result = re.findall(regex, st)
if result:
    print(result)
else:
    print('Not Found')

Output

['Peter', 'Peter']
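
When the pattern contains capturing parentheses, re.findall() returns the captured parts instead of the whole match. The sketch below, using made-up sample data, shows the difference between a pattern without groups and the same pattern with two groups.

```python
import re

st = 'Peter=78 Thomas=82 William=97'

# No groups: each list item is the whole matched text.
whole = re.findall(r'\w+=\d+', st)
print(whole)   # ['Peter=78', 'Thomas=82', 'William=97']

# Two groups: each list item is a tuple with one entry per group.
pairs = re.findall(r'(\w+)=(\d+)', st)
print(pairs)   # [('Peter', '78'), ('Thomas', '82'), ('William', '97')]
```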

re.split()

The re.split() function splits a string according to a given regex pattern, and the split pieces are returned as a list.

Example of re.split

import re

st = '12-Aug-2022'
regex = r'-'
result = re.split(regex, st)
# re.split() always returns a list with at least one item,
# so no check for an empty result is needed.
print(result)

Output

['12', 'Aug', '2022']
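
A single hyphen could also be split with the str.split() method. re.split() becomes useful when there are several possible separators. The sketch below uses a made-up string and a character set as the separator, and also shows the optional maxsplit argument that limits the number of splits.

```python
import re

st = '12-Aug/2022 10:30'

# Split on a hyphen, slash, space or colon.
parts = re.split(r'[-/ :]', st)
print(parts)       # ['12', 'Aug', '2022', '10', '30']

# Split only once; the rest of the string is kept as the last item.
first_cut = re.split(r'[-/ :]', st, maxsplit=1)
print(first_cut)   # ['12', 'Aug/2022 10:30']
```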

re.sub()

The re.sub() function searches a given pattern in a string, replaces the string that matched the pattern with a replacement string, and returns the new string.

Example of re.sub

import re

st = 'We know Peter is a good boy. Peter loves to read story books.'
regex = r'Pete\w+'
result = re.sub(regex, 'Thomas', st)
print(result)

Output

We know Thomas is a good boy. Thomas loves to read story books.
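
Two further options of re.sub() are sketched below: the count argument, which limits how many matches are replaced, and backreferences such as \1 in the replacement string, which reuse the text captured by the parentheses. The date string is made-up sample data.

```python
import re

st = 'We know Peter is a good boy. Peter loves to read story books.'

# Replace only the first match.
once = re.sub(r'Pete\w+', 'Thomas', st, count=1)
print(once)
# We know Thomas is a good boy. Peter loves to read story books.

# Reorder a date from day-month-year to year-month-day.
date = '12-08-2022'
swapped = re.sub(r'(\d{2})-(\d{2})-(\d{4})', r'\3-\2-\1', date)
print(swapped)     # 2022-08-12
```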

re.IGNORECASE

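The re.IGNORECASE flag (short form re.I) makes the regex pattern match a string without regard to letter case. It is a constant passed through the flags argument, not a function.

Example of re.IGNORECASE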

import re

st = 'We know Peter is a good boy. Peter loves to read story books.'
regex = r'peter'
result = re.sub(regex, 'Thomas', st, flags=re.IGNORECASE)
print(result)

Output

We know Thomas is a good boy. Thomas loves to read story books.
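
The flag is not limited to re.sub(). As a sketch, the code below passes it to re.compile(), which builds a reusable pattern object, and then calls findall() on that object. The sample string is made up for illustration.

```python
import re

# Compile the pattern once with the flag, then reuse it.
pattern = re.compile(r'peter', re.IGNORECASE)

st = 'PETER met Peter and peter.'
found = pattern.findall(st)
print(found)                       # ['PETER', 'Peter', 'peter']

# re.I is a shorter name for the same flag.
print(re.I == re.IGNORECASE)       # True
```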

Create RegEx Search Pattern

Here we will learn to create various types of regex search patterns using the special characters with the help of multiple examples.

Example 1: Create a regex pattern to search if the word Hello exists at the beginning of a given string.

import re

st = 'Hello, we are learning regular expression in python'
regex = r'\AHello'
result = re.search(regex, st)
if result:
    print(result.group())
else:
    print(result)

Output

Hello

Example 2: Create a regex pattern to search if the word python exists at the end of a given string.

import re

st = 'Hello, we are learning regular expression in python'
regex = r'python\Z'
result = re.search(regex, st)
if result:
    print(result.group())
else:
    print(result)

Output

python

Example 3: Create a regex pattern to find those words that begin with the characters gra and those that end with the characters tion in a given string.

import re

st = 'grateful, horse, goat, grade, grammar, promotion, numeric, corporation'
regex1 = r'\bgra\w+'
regex2 = r'\w+tion\b'
result1 = re.findall(regex1, st)
result2 = re.findall(regex2, st)
print(result1)
print(result2)

Output

['grateful', 'grade', 'grammar']
['promotion', 'corporation']

Example 4: Create a regex pattern to find those words where the string ful is present but not at the beginning of a word.

import re

st = 'fulfill, careful, beautiful, grade, election, promotion, numeric'
regex = r'\w*\Bful'
result = re.findall(regex, st)
print(result)

Output

['careful', 'beautiful']

Example 5: Create a regex pattern to find those words where the string gra is present but not at the end of a word.

import re

st = 'grateful, program, promotion, grade, numeric, grammar'
regex = r'\w*gra\B\w*'
result = re.findall(regex, st)
print(result)

Output

['grateful', 'program', 'grade', 'grammar']

Example 6: Create a regex pattern to find only numbers from a given string.

import re

st = 'College registration number is 452159 and account number is 30156492'
regex = r'\d+'
result = re.findall(regex, st)
print(result)

Output

['452159', '30156492']

Example 7: Create a regex pattern to find all words (containing only letters) from a given string.

import re

st = 'Peter=78 Thomas=82 William=97 Alex=47'
regex = r'[a-zA-Z]+'
result = re.findall(regex, st)
print(result)

Output

['Peter', 'Thomas', 'William', 'Alex']

Example 8: Create a regex pattern to find all words that start with ra or ex from a given string.

import re

st = 'rain, example, range, car, bike, expert, joker'
regex = r'\b(ra\w*|ex\w*)'
result = re.findall(regex, st)
print(result)

Output

['rain', 'example', 'range', 'expert']

Example 9: Create a regex pattern to find all words that start with ra or re from a given string.

import re

st = 'rain, example, reason, rank, remark, expert'
regex = r'\br[ae]\w*'
result = re.findall(regex, st)
print(result)

Output

['rain', 'reason', 'rank', 'remark']
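
For "starts with" searches like this one, the word boundary \b decides whether the pattern may also match in the middle of a longer word. The sketch below uses a made-up word list to compare a pattern without \b and the same pattern with \b in front.

```python
import re

st = 'bread, rain, great'

# Without \b the pattern also matches inside 'bread' and 'great'.
loose = re.findall(r'r[ae]\w*', st)
print(loose)    # ['read', 'rain', 'reat']

# With \b the match must begin at the start of a word.
strict = re.findall(r'\br[ae]\w*', st)
print(strict)   # ['rain']
```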

Example 10: Create a regex pattern to find all dates from a given string.

import re

st = 'Robert: 5-12-2022, Sophia: 18-06-2004, Henry: 24-08-1998'
regex = r'\d{1,2}-\d{1,2}-\d{4}'
result = re.findall(regex, st)
print(result)

Output

['5-12-2022', '18-06-2004', '24-08-1998']
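
If the individual parts of each date are needed, capturing parentheses can be placed around them. The sketch below is one possible way to do it, using re.finditer(), which yields a match object for every match; the pattern with four groups is an illustration and not part of the example above.

```python
import re

st = 'Robert: 5-12-2022, Sophia: 18-06-2004, Henry: 24-08-1998'

# Groups: name, day, month, year.
regex = r'(\w+): (\d{1,2})-(\d{1,2})-(\d{4})'

records = [m.groups() for m in re.finditer(regex, st)]
for name, day, month, year in records:
    print(name, day, month, year)

# Robert 5 12 2022
# Sophia 18 06 2004
# Henry 24 08 1998
```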

Example 11: Create a regex pattern to find all words that start with a capital letter from a given string.

import re

st = 'Robert: 5-12-2022, Sophia: 18-06-2004, Henry: 24-08-1998'
regex = r'[A-Z][a-z]*'
result = re.findall(regex, st)
print(result)

Output

['Robert', 'Sophia', 'Henry']

Example 12: Create a regex pattern to find all words that contain exactly three letters in a given string.

import re

st = 'A Quick Brown Fox Jump Over The Lazy Dog'
regex = r'\b\w{3}\b'
result = re.findall(regex, st)
print(result)

Output

['Fox', 'The', 'Dog']

Example 13: Create a regex pattern to find if a string starts with specified characters.

import re

st = 'we are living in a modern age'
regex = r'^we'
result = re.search(regex, st)
if result:
    print(result.group())
else:
    print(result)

Output

we

Example 14: Create a regex pattern to find if a string ends with specified characters.

import re

st = 'we are living in a modern age'
regex = r'ge$'
result = re.search(regex, st)
if result:
    print(result.group())
else:
    print(result)

Output

ge
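
The $ symbol and \Z behave the same on the string above, but they differ when the string ends with a newline, as text read from a file often does. A small sketch with a newline added to the sample string:

```python
import re

st = 'we are living in a modern age\n'

# $ also matches just before a newline at the very end of the string.
dollar = re.search(r'ge$', st)
print(dollar.group())   # ge

# \Z matches only at the very end, which here comes after the newline.
end_only = re.search(r'ge\Z', st)
print(end_only)         # None
```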

Example 15: Create a regex pattern to find all the letters from a word except vowels.

import re

st = 'aeroplane'
regex = r'[^aeiou]'
result = re.findall(regex, st)
print(result)

Output

['r', 'p', 'l', 'n']

Example 16: Create a regex pattern to find all the characters in a string that are not letters or digits.

import re

st = 'Date: 22/08/2022 and Time: 10:30 am'
regex = r'\W'
result = re.findall(regex, st)
print(result)

Output

[':', ' ', '/', '/', ' ', ' ', ':', ' ', ':', ' ']

Example 17: Create a regex pattern to find all the symbols in a string, that is, all characters that are not letters, digits or whitespace.

import re

st = 'Date: 22/08/2022 and Time: 10:30 am'
regex = r'[^\s\w]'
result = re.findall(regex, st)
print(result)

Output

[':', '/', '/', ':', ':']

Example 18: Create a regex pattern to find all the words in double quotes in a string.

import re

st = '"Date": 22/08/2022 and "Time": 10:30 am'
regex = r'"\w*"'
result = re.findall(regex, st)
print(result)

Output

['"Date"', '"Time"']
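
To get the quoted words without the quote marks, a capturing group can be placed around the part between the quotes. A minimal sketch on the same string:

```python
import re

st = '"Date": 22/08/2022 and "Time": 10:30 am'

# The quotes must be present, but only the group inside them is returned.
words = re.findall(r'"(\w*)"', st)
print(words)   # ['Date', 'Time']
```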