A RegEx, or regular expression, is a sequence of characters that forms a search pattern.
RegEx can be used to check if a string contains the specified search pattern.
Python has a built-in package called re, which can be used to work with regular expressions.
Import the re module.
    # Search the string to see if it starts with "The" and ends with "India":
    import re

    txt = "The rain in India"
    x = bool(re.search(r"^The.*India$", txt))
    print(x)  # prints True
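As a minimal sketch of the difference between "contains" and "starts with", the snippet below runs the same text through an unanchored and an anchored pattern. The variable names are illustrative only.

```python
import re

txt = "The rain in India"

# Unanchored: True if "rain" appears anywhere in the string.
contains_rain = bool(re.search(r"rain", txt))

# Anchored: ^ ties the pattern to the start of the string, so this is
# True only if the string begins with "rain".
starts_with_rain = bool(re.search(r"^rain", txt))

print(contains_rain)     # prints True
print(starts_with_rain)  # prints False
```

Without ^ or $, re.search() scans the whole string, which is what makes it suitable for a plain "does it contain this pattern" check.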
RegEx Functions
The re module offers a set of functions that allow us to search a string for a match.
findall() function
The findall() function returns a list containing all matches.
Note: If no matches are found, an empty list is returned.
    # Return a list of every occurrence of "in":
    import re

    txt = "The rain in India"
    x = re.findall("in", txt)
    print(x)  # prints ['in', 'in']
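The note about empty results can be sketched as follows. The search terms "ai" and "Spain" are arbitrary choices made for this illustration.

```python
import re

txt = "The rain in India"

# "ai" occurs once, inside "rain".
one_match = re.findall(r"ai", txt)
print(one_match)   # prints ['ai']

# "Spain" does not occur, so the result is an empty list, not None.
no_matches = re.findall(r"Spain", txt)
print(no_matches)  # prints []

# An empty list is falsy, so it can be tested directly.
if not no_matches:
    print("No match")  # prints No match
```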
search() function
The search() function searches the string for a match, and returns a Match object if there is a match.
If there is more than one match, only the first occurrence of the match will be returned.
Note: If no matches are found, the value None is returned.
    # Search for the first white-space character in the string:
    import re

    txt = "The rain in India"
    # A raw string (r"...") keeps Python from treating \s as a string escape.
    x = re.search(r"\s", txt)
    print("The first white-space character is located in position:", x.start())
    # prints The first white-space character is located in position: 3
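Because search() may return None, calling start() on the result without a check raises an AttributeError when nothing matches. A minimal sketch of a guarded lookup follows; the search terms are arbitrary, and group() and span() are standard Match object methods that this page does not otherwise cover.

```python
import re

txt = "The rain in India"

missing = re.search(r"Portugal", txt)

# re.search() returns None when nothing matches, so check before
# calling methods such as start() on the result.
if missing is None:
    print("No match")  # prints No match
else:
    print("Found at position:", missing.start())

# With a pattern that does match, the Match object describes the
# first occurrence only.
first_in = re.search(r"in", txt)
if first_in is not None:
    print(first_in.group())  # prints in
    print(first_in.span())   # prints (6, 8)
```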
split() function
The split() function returns a list where the string has been split at each match.
    # Split the string at every white-space character:
    import re

    txt = "The rain in India"
    x = re.split(r"\s", txt)
    print(x)  # prints ['The', 'rain', 'in', 'India']
You can control the number of splits by specifying the maxsplit parameter.
    # Split the string only at the first occurrence:
    import re

    txt = "The rain in India"
    x = re.split(r"\s", txt, maxsplit=1)
    print(x)  # prints ['The', 'rain in India']
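A short sketch of two further cases: a different maxsplit value, and a pattern that never matches. The ";" pattern is an arbitrary choice for illustration.

```python
import re

txt = "The rain in India"

# maxsplit=2: split at the first two matches only; the rest of the
# string stays together as the last element.
two_splits = re.split(r"\s", txt, maxsplit=2)
print(two_splits)  # prints ['The', 'rain', 'in India']

# If the pattern never matches, the list holds the whole string.
unsplit = re.split(r";", txt)
print(unsplit)     # prints ['The rain in India']
```

Passing maxsplit by keyword makes the intent of the number explicit at the call site.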
sub() function
The sub() function replaces the matches with the text of your choice.
    # Replace every white-space character with the number 9:
    import re

    txt = "The rain in India"
    x = re.sub(r"\s", "9", txt)
    print(x)  # prints The9rain9in9India
You can control the number of replacements by specifying the count parameter.
    # Replace only the first 2 occurrences:
    import re

    txt = "The rain in India"
    x = re.sub(r"\s", "9", txt, count=2)
    print(x)  # prints The9rain9in India
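A short sketch of two further cases: a single replacement, and a pattern that never matches. The replacement characters "_" and "#" are arbitrary choices for illustration.

```python
import re

txt = "The rain in India"

# count=1: replace only the first white-space match.
one_replaced = re.sub(r"\s", "_", txt, count=1)
print(one_replaced)  # prints The_rain in India

# If the pattern never matches (there are no digits here), the
# string is returned unchanged.
unchanged = re.sub(r"\d", "#", txt)
print(unchanged)     # prints The rain in India
```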
You can learn more about RegEx from here