Regular Expressions in Python
A regular expression is a sequence of characters that helps you match or locate other strings, or sets of strings, using a specialized pattern syntax. Regular expressions are also referred to as regexp or regex. They are mainly used for matching patterns in text.
Although regular expressions are widely used, they can be tricky. With some practice, however, you will become proficient with them. And if you find regular expressions difficult, Python also offers alternatives, such as its built-in string methods.
Regular expressions use two types of characters: metacharacters and literals. The re module offers several functions to carry out queries on an input string. re.match(), re.search(), re.findall(), and re.sub() are among the most commonly used.
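To make the difference between these four functions concrete, here is a minimal sketch. The sample string and the variable names are made up for this illustration.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "cat bat rat"

# re.match() only looks for a match at the beginning of the string.
at_start = re.match(r"cat", text)      # a match object
not_at_start = re.match(r"bat", text)  # None, "bat" is not at the start

# re.search() scans the whole string and returns the first match.
first = re.search(r"[br]at", text)

# re.findall() returns every non-overlapping match as a list of strings.
all_words = re.findall(r"\w+", text)

# re.sub() returns a new string with each match replaced.
replaced = re.sub(r"at", "ow", text)

print(at_start.group())   # cat
print(not_at_start)       # None
print(first.group())      # bat
print(all_words)          # ['cat', 'bat', 'rat']
print(replaced)           # cow bow row
```

Note that re.match() and re.search() return None when nothing matches, so it is usually safest to test the result before calling group() on it.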
Syntax:
For instance, if you want to write a regular expression search, you can do it as follows:
match = re.search(pat, str)
Example Program:
Regular expressions (called REs, or regexes, or regex patterns) are essentially a tiny, highly specialized programming language embedded inside Python and made available through the re module.
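For a plain substring check you can use the in operator instead of a regular expression. For example, "easily" is a substring of the string "Regular expressions easily explained!":
>>> s = "Regular expressions easily explained!"
>>> "easily" in s
True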
import re

# re.match() checks for a match at the beginning of the string
pattern = "Cookie"
sequence = "Cookie"
if re.match(pattern, sequence):
    print("Match!")
else:
    print("Not a match!")
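The snippet above uses re.match(), which succeeds only if the pattern matches at the start of the string. The following sketch contrasts it with re.search() and re.fullmatch(); the longer sample string is an assumption added for this illustration.

```python
import re

# Hypothetical sample string for this illustration.
sequence = "I like Cookie"

# re.match() fails because "Cookie" is not at the start of the string.
print(re.match("Cookie", sequence))            # None

# re.search() finds "Cookie" anywhere in the string.
found = re.search("Cookie", sequence)
print(found.group(), found.start())            # Cookie 7

# re.fullmatch() requires the whole string to match the pattern.
print(re.fullmatch("Cookie", "Cookie") is not None)   # True
print(re.fullmatch("Cookie", "Cookies") is not None)  # False
```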
Example
import re

# Let's use a regular expression to match a date string.
regex = r"([a-zA-Z]+) (\d+)"
match = re.search(regex, "June 24")
if match:
    # The expression "([a-zA-Z]+) (\d+)" matches the date string.
    # We can use the match object's start() and end() methods to
    # retrieve where the pattern matches in the input string, and the
    # group() method to get the full match and the captured groups.

    # This will print "Match at index 0, 7", since the match covers the
    # whole string (the half-open range [0, 7)).
    print("Match at index %s, %s" % (match.start(), match.end()))

    # The groups contain the matched values. In particular:
    #   match.group(0) always returns the fully matched string
    #   match.group(1), match.group(2), ... return the capture
    #   groups in order from left to right in the pattern
    #   match.group() is equivalent to match.group(0)

    # This will print "Full match: June 24"
    print("Full match: %s" % (match.group(0)))
    # This will print "Month: June"
    print("Month: %s" % (match.group(1)))
    # This will print "Day: 24"
    print("Day: %s" % (match.group(2)))
else:
    # If re.search() does not match, then None is returned
    print("The regex pattern does not match. :(")
Example
#!/usr/bin/env python3
import re

line = "Cats are smarter than dogs"

# re.M enables multi-line mode and re.I makes the match case-insensitive
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)
if searchObj:
    print("searchObj.group() :", searchObj.group())
    print("searchObj.group(1) :", searchObj.group(1))
    print("searchObj.group(2) :", searchObj.group(2))
else:
    print("Nothing found!!")
When the above code is executed, it produces the following result:
searchObj.group() : Cats are smarter than dogs
searchObj.group(1) : Cats
searchObj.group(2) : smarter
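The second group in the pattern above is written as (.*?), the non-greedy form of (.*), which is why it captures only "smarter". Here is a minimal sketch of the difference, reusing the same sentence; the variable names greedy and lazy are illustrative.

```python
import re

line = "Cats are smarter than dogs"

# Greedy: (.*) takes as much as it can while still leaving a space
# and something for the trailing .* to match.
greedy = re.search(r"(.*) are (.*) .*", line)

# Non-greedy: (.*?) takes as little as it can, so it stops at the
# first space after "are ".
lazy = re.search(r"(.*) are (.*?) .*", line)

print(greedy.group(2))  # smarter than
print(lazy.group(2))    # smarter

# re.I makes the match case-insensitive, so "ARE" also matches "are".
print(re.search(r"cats ARE", line, re.I) is not None)  # True
```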
The re.sub() function replaces every occurrence of a pattern in a string:
import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num :", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num :", num)
When the above code is executed, it produces the following result:
Phone Num : 2004-959-559
Phone Num : 2004959559
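The first substitution above removes the comment but leaves the space that came before the # character. The sketch below assumes you also want to drop that trailing whitespace, and it shows the optional count argument of re.sub(); the name first_dash_only is made up for this illustration.

```python
import re

phone = "2004-959-559 # This is Phone Number"

# \s* before # also removes the whitespace that precedes the comment.
num = re.sub(r"\s*#.*$", "", phone)
print(repr(num))  # '2004-959-559'

# The optional count argument limits how many replacements are made.
first_dash_only = re.sub(r"-", "", num, count=1)
print(first_dash_only)  # 2004959-559
```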
Example of \w+ and ^ Expressions
"^": This expression matches the start of a string
"\w+": This expression matches one or more word characters (letters, digits, or underscores) in the string
>>> import re
>>> xx = "happy100 is fine"
>>> rl = re.findall(r"^\w+", xx)
>>> print(rl)
['happy100']
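To see what the ^ anchor contributes, here is a short sketch that runs the same \w+ pattern with and without it. The second input string, with a leading space, is an assumption added for this illustration.

```python
import re

xx = "happy100 is fine"

# With ^, only the word at the start of the string can match.
print(re.findall(r"^\w+", xx))   # ['happy100']

# Without ^, every run of word characters matches.
print(re.findall(r"\w+", xx))    # ['happy100', 'is', 'fine']

# \w does not match spaces, so a string that starts with a space
# gives no match for ^\w+.
print(re.findall(r"^\w+", " happy100"))  # []
```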
>>> text = "software testing is easy to learn"
>>> patterns = ['software testing', 'abcd']
>>> for pattern in patterns:
...     print('Looking for "%s" in "%s" ->' % (pattern, text), end=' ')
...     if re.search(pattern, text):
...         print('found a match!')
...     else:
...         print('no match')
...
Looking for "software testing" in "software testing is easy to learn" -> found a match!
Looking for "abcd" in "software testing is easy to learn" -> no match
>>> abc = 'guru99@google.com, careerguru99@hotmail.com, users@yahoomail.com'
>>> emails = re.findall(r'[\w\.-]+@[\w\.-]+', abc)
>>> for email in emails:
...     print(email)
...
guru99@google.com
careerguru99@hotmail.com
users@yahoomail.com