Regular Expressions with Python: Examples of common functions

Regular expressions are a tool for filtering and extracting information from strings. Python ships with a module for working with them: "re". Here is a quick tour of the functions you are likely to use most.

So launch your favourite IDE and type "import re". Let's get started.

re.search


For the match object r, where r = re.search(pattern, subject):

r.start() - Returns the index where the match begins.
r.end() - Returns the index just past the last matched character.
r.group(0) - Returns the full matched string.
r.group(1) - Returns the string matched by the 1st group.
r.group(2) - Returns the string matched by the 2nd group.

And so on.

Note: r.group() is the same as r.group(0).

The number of groups available depends on the number of groups in your pattern.

Example:


>>> regex = r"([a-zA-Z]+) (\d+)"
>>> re.search(regex, 'July 4')
<_sre.SRE_Match object at 0x0000000002E4D140>
>>> r = re.search(regex, 'July 4')
>>> r.end()
6
>>> r.start()
0
>>> 'July 4'[0:6]
'July 4'
>>> 'July 4'[0:5]
'July '
>>> r.group(0)
'July 4'
>>> r.group(1)
'July'
>>> r.group(2)
'4'
>>> r.group(3)

Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    r.group(3)
IndexError: no such group
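
One thing the session above does not show: re.search returns None when the pattern is not found, so calling r.group() on the result would fail. Here is a minimal sketch of guarding against that, written for Python 3. The helper name describe is made up for this illustration.

```python
import re

# Same pattern as above: a word, a space, then a number.
regex = r"([a-zA-Z]+) (\d+)"

def describe(subject):
    """Hypothetical helper: return (word, number) if the pattern is found, else None."""
    r = re.search(regex, subject)
    if r is None:  # re.search returns None when nothing matches
        return None
    return r.group(1), r.group(2)

print(describe("July 4"))        # ('July', '4')
print(describe("no date here"))  # None
```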

re.findall


>>> r = re.findall(regex, 'July 4 August 10 Sept')
>>> r
[('July', '4'), ('August', '10')]

The same pattern with findall. Notice that we get two results, since the pattern matches twice in the string ('Sept' has no number after it, so it is skipped).

Noteworthy: if the pattern contains two or more groups, findall returns a list of tuples, as above.
If the pattern has no groups at all, the result is a list of strings.

Example:


>>> re.findall(r'\w+', 'July 4 August 10')
['July', '4', 'August', '10']
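
To see how the number of groups changes the shape of the result, here is a small sketch (Python 3) that runs three variants of the pattern over the same text. The variable names are only for illustration.

```python
import re

text = "July 4 August 10"

# No groups: each item is the whole match.
no_group = re.findall(r"[a-zA-Z]+ \d+", text)

# Exactly one group: each item is the text of that group.
one_group = re.findall(r"([a-zA-Z]+) \d+", text)

# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"([a-zA-Z]+) (\d+)", text)

print(no_group)    # ['July 4', 'August 10']
print(one_group)   # ['July', 'August']
print(two_groups)  # [('July', '4'), ('August', '10')]
```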

re.finditer

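finditer returns an iterator of match objects.


>>> it = re.finditer(r'\w+', 'July 4 August 10')
>>> for i in it:
        print('July 4 August 10'[i.start():i.end()])

 
July
4
August
10

An iterator can be consumed only once, so call finditer again before looping a second time. Using i.group() gives the same output:

>>> for i in re.finditer(r'\w+', 'July 4 August 10'):
        print(i.group())

 
July
4
August
10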
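
As a script rather than a shell session, the same idea might look like the sketch below (Python 3). It uses m.span(), which returns the (start, end) pair in one call. The list name found is only for illustration.

```python
import re

text = "July 4 August 10"

found = []
for m in re.finditer(r"\w+", text):
    # m.span() is the same as (m.start(), m.end())
    found.append((m.group(), m.span()))
    print(m.group(), m.span())

# found == [('July', (0, 4)), ('4', (5, 6)), ('August', (7, 13)), ('10', (14, 16))]
```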

re.compile

If you want to search repeatedly with the same pattern, it's better to create a separate object (known as a compiled regular expression object).

The official Python help says:

Compile a regular expression pattern, returning a pattern object.


>>> patt = re.compile('abc')


Let's use the 'patt' object now.


>>> r = patt.search('abcXYZabc1234hjhjkabc')
>>> r.group()
'abc'
>>> r.start()
0
>>> r.end()
3

Things to note:
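  • search only needs the target string as its argument, since the pattern is already stored in the compiled object.
  • start() returns the index of the first matched character (counting from 0), and end() returns the index of the last matched character plus 1.

Now, what if you want to know whether there are more matches in the same string for our pattern? The search method of a compiled pattern takes an optional second argument: the position at which to start searching.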

>>> r = patt.search('abcXYZabc1234hjhjkabc', 1)
>>> r.group()
'abc'
>>> r.start()
6
>>> r.end()
9

As you can see, by moving the start position I can traverse the string and find every match.
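
A rough sketch of that traversal as a loop (Python 3): each search resumes where the previous match ended. The names text, starts and pos are only for illustration, and resuming from r.end() is safe here only because 'abc' can never match an empty string.

```python
import re

patt = re.compile("abc")
text = "abcXYZabc1234hjhjkabc"

starts = []
pos = 0
while True:
    r = patt.search(text, pos)  # second argument: where to start searching
    if r is None:               # no more matches
        break
    starts.append(r.start())
    pos = r.end()               # resume just past this match

print(starts)  # [0, 6, 18]
```

In practice patt.finditer(text) does the same job with less code; the loop is just to show what the position argument is doing.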

re.sub

If you want to replace text that matches a certain pattern, the re.sub function comes in handy.

The syntax is re.sub(pattern, repl, string): we provide a pattern, the replacement text to use wherever a match is found, and a target string. Let's see a simple example.


>>> string = "Let's have a big party on my birthday"
>>> re.sub(r'(?<=\s)\w', 'X', string)
"Let's Xave X Xig Xarty Xn Xy Xirthday"

Explanation:

  • We have a sentence. I want to replace the first letter of every word, but only if that letter has a space before it.
  • The pattern (?<=\s)\w uses a lookbehind to find every word character (letter, digit or underscore) that has a whitespace character immediately before it. The first word, "Let's", has no space before it, so it is left alone.
  • 'X' is the replacement text.
  • The variable "string" is the target.
  • The result shown is the return value of re.sub(). Store it if you want to use it; the variable "string" remains unchanged.
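
To make the last point concrete, here is a short sketch (Python 3) that stores the result and checks that the original is untouched. It also shows the optional count argument of re.sub, which limits the number of replacements. The names result and first_two are only for illustration.

```python
import re

string = "Let's have a big party on my birthday"

# Store the return value; re.sub does not modify its input.
result = re.sub(r"(?<=\s)\w", "X", string)

# count=2 stops after the first two replacements.
first_two = re.sub(r"(?<=\s)\w", "X", string, count=2)

print(result)     # Let's Xave X Xig Xarty Xn Xy Xirthday
print(first_two)  # Let's Xave X big party on my birthday
print(string)     # Let's have a big party on my birthday
```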

