Regular Expressions with Python: Examples of common functions

Regular expressions are a tool for filtering and extracting information from strings. Python handles them through its "re" module. This is a quick tour of the functions in it that you are likely to use the most.

So launch your favourite IDE and type "import re". Let's get started.

re.search


For the match object r, where r = re.search(pattern, subject):

r.start() - Gives the index where the match begins.
r.end() - Gives the index just past the last matched character.
r.group(0) - Returns the full matched string
r.group(1) - Returns the matched string for the 1st group
r.group(2) - Returns the matched string for the 2nd group

And so on.

Note: r.group() is the same as r.group(0).

The number of groups available depends on the number of groups used in your pattern.

Example:


>>> regex = r"([a-zA-Z]+) (\d+)"
>>> re.search(regex, 'July 4')
<_sre.SRE_Match object at 0x0000000002E4D140>
>>> r = re.search(regex, 'July 4')
>>> r.end()
6
>>> r.start()
0
>>> 'July 4'[0:6]
'July 4'
>>> 'July 4'[0:5]
'July '
>>> r.group(0)
'July 4'
>>> r.group(1)
'July'
>>> r.group(2)
'4'
>>> r.group(3)

Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    r.group(3)
IndexError: no such group
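
One case the session above does not cover: when the pattern is not found anywhere in the subject, re.search returns None rather than a match object, so calling r.group() on the result fails. A minimal sketch of guarding against that, where describe() is a made-up helper name used only for illustration:

```python
import re

regex = r"([a-zA-Z]+) (\d+)"

def describe(text):
    # describe() is a hypothetical helper, not part of the re module.
    r = re.search(regex, text)
    if r is None:
        # re.search returns None when nothing in text matches.
        return None
    # groups() returns every captured group as one tuple.
    return r.groups()

found = describe('July 4')
missing = describe('no date here')
print(found)
print(missing)
```

Checking for None before touching the match object is the usual pattern whenever the input is not guaranteed to match.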

re.findall


>>> r = re.findall(regex, 'July 4 August 10 Sept')
>>> r
[('July', '4'), ('August', '10')]

The same pattern with findall. Observe how we got two results, since the pattern matches twice in the string ('Sept' has no number after it, so it is skipped).

Noteworthy: If your pattern has two or more groups, the result of findall is a list of tuples, as above.
But if the pattern has no groups at all, the result is a list of strings.

Example:


>>> re.findall(r'\w+', 'July 4 August 10')
['July', '4', 'August', '10']
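
To see how grouping changes the shape of the result, here is a small sketch that runs three variants of the date pattern over the same text. The variable names are chosen for this illustration only. Note the middle case: with exactly one group, findall returns a list of strings holding just that group's text.

```python
import re

text = 'July 4 August 10'

# No groups: each item is the whole matched string.
no_groups = re.findall(r'[a-zA-Z]+ \d+', text)

# One group: each item is the text of that single group.
one_group = re.findall(r'([a-zA-Z]+) \d+', text)

# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'([a-zA-Z]+) (\d+)', text)

print(no_groups)
print(one_group)
print(two_groups)
```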

re.finditer

finditer returns an iterator that yields one match object per match.


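>>> it = re.finditer(r'\w+', 'July 4 August 10')
>>> for i in it:
        print('July 4 August 10'[i.start():i.end()])

July
4
August
10
>>> # The loop above used up the iterator, so create a new one.
>>> it = re.finditer(r'\w+', 'July 4 August 10')
>>> for i in it:
        print(i.group())

July
4
August
10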
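
Because each item is a full match object, finditer gives you both the groups and the positions of every match, which findall does not. A short sketch reusing the month/day pattern from the re.search section (the list name rows is just for this example):

```python
import re

regex = r"([a-zA-Z]+) (\d+)"

rows = []
for m in re.finditer(regex, 'July 4 August 10 Sept'):
    # Record both groups plus the span of the whole match.
    rows.append((m.group(1), m.group(2), m.start(), m.end()))

for row in rows:
    print(row)
```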

re.compile

If you want to search repeatedly with the same pattern, it's better to create a separate object (known as a compiled regular expression object).

Python official help says:

Compile a regular expression pattern, returning a pattern object.


>>> patt = re.compile('abc')


Let's use the 'patt' object now.


>>> r = patt.search('abcXYZabc1234hjhjkabc')
>>> r.group()
'abc'
>>> r.start()
0
>>> r.end()
3

Things to note:
  • The search method of a compiled pattern only needs the target string as an argument, since the pattern is already stored in the object.
  • start() returns the index (counting from 0) of the first matched character, and end() returns the index just past the last matched character.

Now, what if you want to know whether there are more matches of our pattern in the same string?

>>> r = patt.search('abcXYZabc1234hjhjkabc', 1)
>>> r.group()
'abc'
>>> r.start()
6
>>> r.end()
9

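As you can see, the optional second argument tells search where in the string to start looking. By moving that start position forward, I can traverse the string to find all possible matches.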
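
The traversal idea can be sketched as a loop that restarts the search from the end of the previous match. This is only one way to do it, and the names subject, spans and pos are picked for this example. Note that the start position is accepted by the compiled pattern's search method, not by the module-level re.search function.

```python
import re

patt = re.compile('abc')
subject = 'abcXYZabc1234hjhjkabc'

spans = []
pos = 0
while True:
    # Start looking at index pos instead of the beginning of the string.
    r = patt.search(subject, pos)
    if r is None:
        break  # no more matches
    spans.append((r.start(), r.end()))
    # Resume just past this match. This is safe here because 'abc'
    # can never match an empty string, so pos always moves forward.
    pos = r.end()

print(spans)
```

In everyday code, patt.finditer(subject) gives the same matches with less bookkeeping. The loop is shown only to make the start position argument concrete.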

re.sub

If you want to replace text that matches a certain pattern, the re.sub function comes in handy.

Going by the syntax re.sub(pattern, repl, string), we need to provide a pattern, then the replacement text to use wherever a match is found, and a target string. Let's see a simple example.


>>> string = "Let's have a big party on my birthday"
>>> re.sub(r'(?<=\s)\w', 'X', string)
"Let's Xave X Xig Xarty Xn Xy Xirthday"

Explanation:

  • We have a sentence. I want to replace the first letter of every word that has a space before it.
  • The pattern (?<=\s)\w uses a lookbehind to find every word character that has a whitespace character immediately before it.
  • 'X' is the replacement text.
  • The variable "string" is the target.
  • The result shown is the return value of re.sub(). It needs to be stored if you want to use it. The variable "string" remains unchanged.
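
A small sketch of the last point: storing the return value and confirming the original is untouched. It also uses the optional count argument of re.sub, which is not covered above, to limit how many replacements are made. The names result and first_only are just for this example.

```python
import re

string = "Let's have a big party on my birthday"

# re.sub returns a new string, so keep the result in a variable.
result = re.sub(r'(?<=\s)\w', 'X', string)

# count=1 stops after the first replacement.
first_only = re.sub(r'(?<=\s)\w', 'X', string, count=1)

print(result)
print(first_only)
print(string)  # the original string is unchanged
```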

