
Regular Expression in Python

A regular expression is a sequence of characters that defines a search pattern, which you can use to match or locate other strings or sets of strings. It does this through a special syntax. Regular expressions are also referred to as regexps or regexes, and they are mainly used for matching patterns in text.

Although regular expressions are widely used, they can be tricky to read and write. With some practice, however, you will become proficient. If you still find them difficult, Python offers simpler alternatives for many tasks, such as the in operator and string methods like str.startswith() and str.replace().
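To see when a regular expression is actually needed, it can help to compare both approaches on the same text. The sketch below (the sample sentence is reused from later in this article) performs a substring test, a prefix test and a replacement, first with ordinary str operations and then with the re module. For fixed text like this the results are identical, and the string methods are simpler to read.

```python
import re

s = "Regular expressions easily explained!"

# Plain string operations: no pattern syntax involved.
has_word = "easily" in s                    # substring test
starts = s.startswith("Regular")            # prefix test
replaced = s.replace("easily", "simply")    # literal replacement

# Regular-expression equivalents of the same three operations.
has_word_re = re.search("easily", s) is not None
starts_re = re.match("Regular", s) is not None
replaced_re = re.sub("easily", "simply", s)

print(has_word, starts, replaced)
print(has_word_re, starts_re, replaced_re)
```

Reach for re once the text you are looking for varies, for example "any word ending in ly" or "a month followed by a number".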

Regular expressions use two types of characters: metacharacters, which have a special meaning (. ^ $ * + ? { } [ ] \ | ( )), and literals, which simply match themselves. Python's re module offers several functions for running queries against an input string; re.match(), re.search(), re.findall() and re.sub() are among the most commonly used.
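The following sketch runs each of the four functions named above on a small made-up string. In the pattern [a-z]at, the square brackets are metacharacters that form a character class (any lowercase letter), while a and t are literals. The last two lines show that an unescaped "." is a metacharacter matching any character, whereas "\." matches only a literal dot.

```python
import re

text = "cat bat rat"

# re.match(): tries to match only at the very beginning of the string.
m = re.match(r"[a-z]at", text)
print(m.group())                          # cat

# re.search(): scans the whole string and returns the first match.
found = re.search(r"b\w+", text)
print(found.group())                      # bat

# re.findall(): returns every non-overlapping match as a list of strings.
print(re.findall(r"[a-z]at", text))       # ['cat', 'bat', 'rat']

# re.sub(): replaces every match with the replacement string.
print(re.sub(r"[a-z]at", "dog", text))    # dog dog dog

# "." is a metacharacter (any character); "\." matches a literal dot.
print(re.findall(r"3.14", "3.14 or 3614"))     # ['3.14', '3614']
print(re.findall(r"3\.14", "3.14 or 3614"))    # ['3.14']
```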

Syntax:

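For instance, a regular expression search is written as follows, where pat is the pattern and text is the string to search:

match = re.search(pat, text)

re.search() scans text for the first location where pat matches and returns a Match object, or None if there is no match.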
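Because re.search() may return None, calling group() on the result without checking first raises an AttributeError when nothing matches. Below is a minimal sketch of the usual check; first_number() is a hypothetical helper written only for this illustration.

```python
import re

def first_number(text):
    """Hypothetical helper: return the first run of digits in text, or None."""
    match = re.search(r"\d+", text)
    # re.search() returns None when nothing matches, so test the result
    # before calling methods such as group() on it.
    if match is None:
        return None
    return match.group()

print(first_number("Order 66 shipped"))   # 66
print(first_number("no digits here"))     # None
```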

Example Program:

Regular expressions (called REs, or regexes, or regex patterns) are essentially a tiny, highly specialized programming language embedded inside Python and made available through the re module.
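Besides the module-level functions, the re module can compile a pattern into a reusable Pattern object with re.compile(), whose methods (search(), match(), findall(), sub() and so on) mirror the module functions. The sketch below compiles a pattern for words ending in "ly"; the sample sentence is made up for illustration.

```python
import re

# Compile the pattern once into a Pattern object:
# \b marks a word boundary, \w+ one or more word characters.
word = re.compile(r"\b\w+ly\b")

sentence = "Regular expressions easily explained, quickly and clearly!"

# The compiled object's methods mirror the module-level functions.
print(word.findall(sentence))          # ['easily', 'quickly', 'clearly']
print(word.search(sentence).start())   # 20
```

Compiling is mostly a readability choice: the module caches recently used patterns internally, so re.findall(r"\b\w+ly\b", sentence) behaves the same way.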

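For a simple substring test you do not even need a regular expression: the in operator confirms that "easily" is a substring of the string "Regular expressions easily explained!":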

>>> s = "Regular expressions easily explained!"
>>> "easily" in s
True

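The same kind of check can be made with re.match(), which tests whether the pattern matches at the beginning of the string:

import re

pattern = "Cookie"
sequence = "Cookie"
if re.match(pattern, sequence):
    print("Match!")
else:
    print("Not a match!")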
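re.match() only looks at the start of the string, which is easy to overlook. The sketch below changes the input to a made-up string, "Chocolate Cookie", to contrast re.match(), re.search() and re.fullmatch() (the last one, available since Python 3.4, requires the pattern to cover the whole string).

```python
import re

sequence = "Chocolate Cookie"

# re.match() only succeeds if the pattern matches at position 0.
at_start = re.match("Cookie", sequence)
# re.search() looks for the first match anywhere in the string.
anywhere = re.search("Cookie", sequence)
# re.fullmatch() requires the pattern to cover the entire string.
whole = re.fullmatch("Cookie", sequence)

print(at_start)          # None
print(anywhere.span())   # (10, 16)
print(whole)             # None
print(re.fullmatch("Cookie", "Cookie") is not None)   # True
```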

Example

import re

# Use a regular expression to match a date string such as "June 24":
# one or more letters (the month), a space, then one or more digits (the day).
regex = r"([a-zA-Z]+) (\d+)"

match = re.search(regex, "June 24")
if match:
    # Indeed, the expression "([a-zA-Z]+) (\d+)" matches the date string.

    # The Match object's start() and end() methods return where the pattern
    # matches in the input string, and its group() method returns the full
    # match or an individual captured group.

    # This prints "Match at index 0, 7": the match covers the half-open
    # span [0, 7), i.e. the whole string.
    print("Match at index %s, %s" % (match.start(), match.end()))

    # The groups contain the matched values. In particular:
    #    match.group(0) always returns the fully matched string
    #    match.group(1), match.group(2), ... return the capture groups,
    #        numbered by their opening parentheses, left to right in the pattern
    #    match.group() is equivalent to match.group(0)

    # So this prints "Full match: June 24"
    print("Full match: %s" % match.group(0))
    # So this prints "Month: June"
    print("Month: %s" % match.group(1))
    # So this prints "Day: 24"
    print("Day: %s" % match.group(2))
else:
    # If re.search() does not match, it returns None
    print("The regex pattern does not match. :(")
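re.search() stops at the first match. When a text contains several dates, re.finditer() yields a Match object for each one, so the same start(), end() and group methods apply to every match. The sketch below reuses the date pattern on a made-up sentence and uses groups(), which returns all capture groups as a tuple.

```python
import re

regex = r"([a-zA-Z]+) (\d+)"
text = "Deadlines: June 24, July 4 and August 31"

dates = []
# re.finditer() yields one Match object per non-overlapping match, so
# start(), end() and groups() are available for every date in the text.
for match in re.finditer(regex, text):
    month, day = match.groups()   # same as (match.group(1), match.group(2))
    dates.append((month, int(day)))
    print(match.start(), match.end(), month, day)

print(dates)   # [('June', 24), ('July', 4), ('August', 31)]
```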

Example

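#!/usr/bin/env python3

import re

line = "Cats are smarter than dogs"

# (.*) greedily captures everything before " are "; (.*?) lazily captures
# as little as possible, stopping at the next space; .* consumes the rest.
# re.M (multiline) and re.I (ignore case) are optional flags.
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")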

When the above code is executed, it produces the following result:

searchObj.group() :  Cats are smarter than dogs

searchObj.group(1) :  Cats

searchObj.group(2) :  smarter
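In the example above neither flag changes the result, because the string has a single line and "are" is already lowercase. The sketch below uses a made-up two-line string to show what re.I (re.IGNORECASE) and re.M (re.MULTILINE) actually do, and how combining them with | applies both.

```python
import re

text = "Cats are smart\ncats are curious"

# Without flags, matching is case-sensitive and ^ anchors only at the
# start of the whole string.
no_flags = re.findall(r"^cats", text)
# re.I ignores case; ^ still anchors only at the start of the string.
ignore_case = re.findall(r"^cats", text, re.I)
# re.M makes ^ (and $) match at the start (and end) of every line.
multiline = re.findall(r"^cats", text, re.M)
# Both flags combined with the | operator.
both = re.findall(r"^cats", text, re.M | re.I)

print(no_flags)      # []
print(ignore_case)   # ['Cats']
print(multiline)     # ['cats']
print(both)          # ['Cats', 'cats']
```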

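The re.sub() function replaces every match of a pattern with a replacement string. The following example uses it to clean up a phone number:

import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)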

When the above code is executed, it produces the following result:

Phone Num :  2004-959-559

Phone Num :  2004959559
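The replacement string passed to re.sub() can also refer back to capture groups with \1, \2 and so on. The sketch below reuses the phone string and rearranges the digits into a made-up "(2004) 959 559" layout in one call; the target format is only an illustration.

```python
import re

phone = "2004-959-559 # This is Phone Number"

# Capture the three digit groups, let .* swallow the trailing comment, and
# refer back to the groups with \1, \2 and \3 in the replacement
# (a raw string keeps the backslashes intact).
num = re.sub(r"(\d{4})-(\d{3})-(\d{3}).*", r"(\1) \2 \3", phone)
print("Phone Num : ", num)   # Phone Num :  (2004) 959 559
```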

Example of \w+ and ^ Expressions

"^": This expression matches the start of a string.

"\w+": This expression matches one or more word characters (letters, digits and the underscore) in the string.

>>> import re
>>> xx = "happy100 is fine"
>>> rl = re.findall(r"^\w+", xx)
>>> print(rl)
['happy100']
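To see what the ^ anchor contributes, the sketch below runs the same \w+ pattern on the same string without ^, then with the $ anchor (end of string), and finally with \W, the uppercase complement of \w that matches non-word characters.

```python
import re

xx = "happy100 is fine"

# Without ^, findall() returns every run of word characters.
print(re.findall(r"\w+", xx))    # ['happy100', 'is', 'fine']

# $ anchors at the end of the string, so only the last word matches.
print(re.findall(r"\w+$", xx))   # ['fine']

# \W matches any single non-word character (here, the two spaces).
print(re.findall(r"\W", xx))     # [' ', ' ']
```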

re.search() can also check several patterns against the same text in a loop:

>>> text = "software testing is easy to learn"
>>> patterns = ['software testing', 'abcd']
>>> for pattern in patterns:
...     print('Looking for "%s" in "%s" ->' % (pattern, text), end=' ')
...     if re.search(pattern, text):
...         print('found a match!')
...     else:
...         print('no match')
...
Looking for "software testing" in "software testing is easy to learn" -> found a match!
Looking for "abcd" in "software testing is easy to learn" -> no match
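Every string in a patterns list like this is treated as a regular expression, not as plain text. If a pattern may contain metacharacters (for instance when it comes from user input), re.escape() turns it into a literal pattern. The sketch below uses a made-up string to show how the "?" in "fun?" changes the meaning of the search.

```python
import re

sample = "fun? fully"

# "?" is a metacharacter: in "fun?" it makes the "n" optional,
# so the pattern also matches "fu" at the start of "fully".
print(re.findall("fun?", sample))             # ['fun', 'fu']

# re.escape() backslash-escapes metacharacters so the pattern is literal.
literal = re.escape("fun?")
print(literal)                                # fun\?
print(re.findall(literal, sample))            # ['fun?']
```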

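re.findall() returns all non-overlapping matches as a list, which makes it handy for extracting items such as email addresses:

>>> abc = 'guru99@google.com, careerguru99@hotmail.com, users@yahoomail.com'
>>> emails = re.findall(r'[\w\.-]+@[\w\.-]+', abc)
>>> for email in emails:
...     print(email)
...
guru99@google.com
careerguru99@hotmail.com
users@yahoomail.com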
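When the pattern contains capture groups, re.findall() returns a list of tuples (one tuple per match, one item per group) instead of a list of strings. The sketch below adds two groups to the email pattern to split each address into user and domain. Note that this pattern is a rough filter for the sample string, not a complete email validator.

```python
import re

abc = 'guru99@google.com, careerguru99@hotmail.com, users@yahoomail.com'

# Two capture groups: the part before "@" and the part after it.
pairs = re.findall(r'([\w\.-]+)@([\w\.-]+)', abc)
for user, domain in pairs:
    print(user, domain)
```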