Regular Expressions

Searching for Digits

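\d+ is a simpler way to match one or more digits than [0-9]+. Note that [0-9][0-9] matches exactly two digits, not one or more. In Python 3 str patterns, \d also matches Unicode digits unless the re.ASCII flag is set.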
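A minimal sketch of the difference between the two patterns. The sample string is made up for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 7 shipped 42 items on day 2024"

# \d+ matches each run of one or more digits.
runs = re.findall(r'\d+', text)

# [0-9][0-9] matches exactly two digits at a time, so "7" is skipped
# and "2024" is split into two matches.
pairs = re.findall(r'[0-9][0-9]', text)

print(runs)   # ['7', '42', '2024']
print(pairs)  # ['42', '20', '24']
```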

Example: Find all hashtags and links in a tweet:

return re.findall(r'((?:#|http)\S+)', tweet)

Capturing Group

The outer parentheses form the capturing group: ((?:#|http)\S+)

The innermost group is non-capturing: (?:#|http)

The | is a boolean OR, so it means “#” or “http”.

In its entirety, the pattern matches “#” or “http” followed by one or more non-whitespace characters.
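The pattern can be tried out with a small runnable sketch. The function name extract_hashtags_and_links and the sample tweet are assumptions made for illustration; only the pattern itself comes from the notes.

```python
import re

def extract_hashtags_and_links(tweet):
    """Return every hashtag and link found in the tweet text (hypothetical helper)."""
    # (?:#|http) is non-capturing; the outer group captures the whole token,
    # so findall returns the full hashtag or link.
    return re.findall(r'((?:#|http)\S+)', tweet)

# Made-up tweet for illustration.
tweet = "Learning #python regex at https://example.com today #100DaysOfCode"
found = extract_hashtags_and_links(tweet)
print(found)  # ['#python', 'https://example.com', '#100DaysOfCode']
```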



import re


def match_first_paragraph():
    """Write a regular expression that returns 'pybites != greedy'."""
    html = ('<p>pybites != greedy</p>'
            '<p>not the same can be said REgarding ...</p>')
    # (.*?) is non-greedy, so it stops at the first </p>.
    return re.sub(r'^<p>(.*?)</p>.*$', r'\1', html)
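To see why the ? after .* matters here, the following sketch runs the same substitution with a non-greedy and a greedy group. The variable names lazy and greedy are chosen for illustration.

```python
import re

html = ('<p>pybites != greedy</p>'
        '<p>not the same can be said REgarding ...</p>')

# Non-greedy: (.*?) stops at the first </p>.
lazy = re.sub(r'^<p>(.*?)</p>.*$', r'\1', html)

# Greedy: (.*) runs to the last </p> it can still match.
greedy = re.sub(r'^<p>(.*)</p>.*$', r'\1', html)

print(lazy)    # pybites != greedy
print(greedy)  # pybites != greedy</p><p>not the same can be said REgarding ...
```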



re.compile(pattern, flags=0)

Compiles a regular expression pattern into a regular expression object, which can be used for matching using its match(), search(), and other methods.

The expression’s behaviour can be modified by specifying a flags value. Values can be any of the re module's flag constants (for example re.IGNORECASE or re.MULTILINE), combined using bitwise OR (the | operator).
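As a short illustration of combining flags, the sketch below uses re.IGNORECASE and re.MULTILINE together. The pattern and input text are made up for the example.

```python
import re

# Combine two flags with bitwise OR: case-insensitive matching,
# and ^ / $ anchored at every line instead of only the whole string.
pattern = re.compile(r'^python$', flags=re.IGNORECASE | re.MULTILINE)

# Made-up input with one word per line.
text = "Python\njava\nPYTHON"
matches = pattern.findall(text)
print(matches)  # ['Python', 'PYTHON']
```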

The sequence

prog = re.compile(pattern)
result = prog.match(string)

is equivalent to

result = re.match(pattern, string)

but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
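A minimal sketch of reusing one compiled pattern across several strings. The inputs are made up for illustration.

```python
import re

# Compile once, then reuse the pattern object for several inputs.
prog = re.compile(r'\d+')

# Made-up inputs for illustration.
lines = ["42 apples", "no digits here", "7 pears"]

# match() only succeeds when the pattern matches at the start of the string.
results = [prog.match(line) for line in lines]
leading_numbers = [m.group() if m else None for m in results]
print(leading_numbers)  # ['42', None, '7']

# The module-level function gives the same result for a single call.
assert re.match(r'\d+', "42 apples").group() == prog.match("42 apples").group()
```

The re module also caches recently compiled patterns, so the gain from compiling by hand is usually small in programs that use only a few expressions.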

Regular Expressions Reference

Character set

Match any character in the set. [ABC]


Negated set

Match any character not in the set. [^ABC]



Range

Matches a character having a character code between the two specified characters inclusive. [A-Z]
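The three bracket forms (set, negated set, range) can be compared side by side. This is a sketch with a made-up sample string.

```python
import re

# Made-up sample string.
text = "Grey gray gr8y"

# Character set: "a" or "e" between "gr" and "y".
in_set = re.findall(r'gr[ae]y', text)

# Negated set: any single character except "a" or "e" in that position.
not_in_set = re.findall(r'gr[^ae]y', text)

# Range: every uppercase letter from A to Z.
uppercase = re.findall(r'[A-Z]', text)

print(in_set)      # ['gray']
print(not_in_set)  # ['gr8y']
print(uppercase)   # ['G']
```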



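Dot

Matches any character except linebreaks. In Python's re module, . matches any character except \n unless the re.DOTALL flag is set, so it is equivalent to [^\n] rather than [^\n\r]. .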


Word

Matches any word character (alphanumeric & underscore). With the re.ASCII flag it matches only ASCII characters and is equivalent to [A-Za-z0-9_]; by default, Python 3 str patterns also match Unicode word characters such as accented letters. \w
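In Python's re module specifically, the behaviour of . and \w depends on flags. The sketch below shows the defaults next to re.DOTALL and re.ASCII, using made-up sample strings.

```python
import re

two_lines = "line one\nline two"

# . does not match "\n" by default, so the match stops at the line break.
first_line = re.match(r'.+', two_lines).group()

# With re.DOTALL, . matches newlines as well.
everything = re.match(r'.+', two_lines, flags=re.DOTALL).group()

# Sample text with accented letters, written with escapes: "café_1 naïve".
accented = "caf\u00e9_1 na\u00efve"

# \w on a str pattern matches Unicode word characters by default;
# re.ASCII restricts it to [A-Za-z0-9_].
unicode_words = re.findall(r'\w+', accented)
ascii_words = re.findall(r'\w+', accented, flags=re.ASCII)

print(repr(first_line))  # 'line one'
print(repr(everything))  # 'line one\nline two'
print(unicode_words)     # ['café_1', 'naïve']
print(ascii_words)       # ['caf', '_1', 'na', 've']
```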

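Not word

Matches any character that is not a word character. \W

Digit

Matches any digit character. \d

Not digit

Matches any character that is not a digit. \D

Whitespace

Matches any whitespace character (spaces, tabs, line breaks). \s

Not whitespace

Matches any character that is not a whitespace character. \S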
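Each shorthand class has an uppercase counterpart that matches the opposite set of characters. A small sketch, using a made-up sample string:

```python
import re

# Made-up sample string.
text = "room 101, floor_2!"

words = re.findall(r'\w+', text)      # runs of word characters
non_words = re.findall(r'\W+', text)  # runs of everything else
digits = re.findall(r'\d+', text)     # runs of digits
only_digits = re.sub(r'\D', '', text) # delete every non-digit
non_space = re.findall(r'\S+', text)  # runs of non-whitespace

print(words)        # ['room', '101', 'floor_2']
print(non_words)    # [' ', ', ', '!']
print(digits)       # ['101', '2']
print(only_digits)  # 1012
print(non_space)    # ['room', '101,', 'floor_2!']
```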






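Word boundary

Matches a position between a word character and a non-word character, or the start or end of the string when it is next to a word character. It matches a position, not a character. \b

Not word boundary

Matches any position that is not a word boundary. \B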
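Because \b and \B match positions rather than characters, it helps to look at where matches start. The sketch below uses a made-up string and reports start offsets.

```python
import re

text = "cat concat cats"

# \bcat\b needs a word boundary on both sides, so only the standalone word matches.
whole_word_starts = [m.start() for m in re.finditer(r'\bcat\b', text)]

# \Bcat needs a non-boundary before "cat", so only the one inside "concat" matches.
inside_word_starts = [m.start() for m in re.finditer(r'\Bcat', text)]

print(whole_word_starts)   # [0]
print(inside_word_starts)  # [7]
```

Use a raw string (r'...') for these patterns in Python; in a normal string literal, '\b' is a backspace character.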




Line feed

Matches a line feed (newline) character. \n




Capturing group

Groups multiple tokens together and creates a capture group for extracting a substring or using a backreference. (ABC)


Numeric reference

Matches the text previously matched by a capture group. For example, \1 matches the text of the first capture group and \3 matches the third. \1
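A capture group and a numeric reference can be combined to find doubled words. This is a sketch with a made-up sentence; the name doubled is chosen for illustration.

```python
import re

# (\w+) captures a word; \1 then requires the same text again.
doubled = re.compile(r'\b(\w+) \1\b')

text = "this is is a test test sentence"

# findall returns the captured group for each match.
repeated = doubled.findall(text)
print(repeated)  # ['is', 'test']

# In re.sub, \1 in the replacement refers to the captured text.
fixed = doubled.sub(r'\1', text)
print(fixed)  # this is a test sentence
```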


Plus

Matches 1 or more of the preceding token. b\w+


Star

Matches 0 or more of the preceding token. b\w*


Quantifier

Matches the specified quantity of the previous token. {1,3} will match 1 to 3. {3} will match exactly 3. {3,} will match 3 or more. b\w{2,3}
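The three example patterns b\w+, b\w*, and b\w{2,3} can be run against one made-up string to compare what each accepts:

```python
import re

text = "b be bee beer beers"

plus = re.findall(r'b\w+', text)         # "b" plus at least one word character
star = re.findall(r'b\w*', text)         # "b" plus zero or more word characters
bounded = re.findall(r'b\w{2,3}', text)  # "b" plus two to three word characters

print(plus)     # ['be', 'bee', 'beer', 'beers']
print(star)     # ['b', 'be', 'bee', 'beer', 'beers']
print(bounded)  # ['bee', 'beer', 'beer']
```

The last result shows that {2,3} stops after three characters: in "beers" only "beer" is matched.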


Alternation

Acts like a boolean OR. Matches the expression before or after the |.

It can operate within a group, or on a whole expression. The patterns will be tested in order. b(a|e|i)d
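A short sketch of alternation inside a group, across a whole expression, and of the left-to-right testing order. The sample strings are made up. The group is written as non-capturing, (?:...), because re.findall would otherwise return only the captured letter.

```python
import re

text = "bad bed bid bod bud"

# Alternation inside a non-capturing group: "b", then a, e, or i, then "d".
grouped = re.findall(r'b(?:a|e|i)d', text)

# Alternation over the whole expression: either "bod" or "bud".
whole = re.findall(r'bod|bud', text)

# Alternatives are tried left to right; the first one that matches wins.
first_wins = re.match(r'a|ab', 'ab').group()

print(grouped)     # ['bad', 'bed', 'bid']
print(whole)       # ['bod', 'bud']
print(first_wins)  # a
```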