Regular Expression in Python


Ravi Chaurasia Sept 18 2020 · 2 min read

Contents

What is Regular Expression?
Usage of Regular Expression (RE)
Regex in Python
Various Methods of Regular Expressions
      - Compile Method
      - Search Method
      - Match Method
      - Findall Method
      - Split Method
      - Sub Method

1) What is Regular Expression?

A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations. (Source: Wikipedia)

Regular expressions are a generalized way to match patterns in sequences of characters. They are supported in most programming languages, including C++, Java, and Python.

Note: We can use regular expressions in Python. The re module provides an interface to the regular expression engine, allowing you to compile regular expressions into objects and then perform matches with them.

Regular expressions use two types of characters:
  • Literal characters, such as a, b, 1, 2...
  • Meta characters, such as opening and closing square brackets ( [ and ] ), backslash ( \ ), caret ( ^ ), dollar sign ( $ ), etc.
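
To make the difference concrete, here is a minimal sketch in which the dot is first treated as a meta character and then as a literal. The sample string is made up for illustration; re.escape() is a standard-library helper that backslash-escapes meta characters.

```python
import re

text = 'abc a.c axc'  # hypothetical sample text

# "." is a meta character: it matches any single character except a newline.
any_char = re.findall('a.c', text)
print(any_char)      # ['abc', 'a.c', 'axc']

# Escaping it with a backslash turns it into a literal dot.
literal_dot = re.findall(r'a\.c', text)
print(literal_dot)   # ['a.c']

# re.escape() adds the backslashes for you.
escaped = re.escape('a.c')
print(escaped)       # a\.c
```

Writing patterns as raw strings (r'...') keeps Python from interpreting the backslashes before the regex engine sees them.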

2) Usage of Regular Expression (RE)

  • Regular expressions are used in Google Analytics for URL matching.
  • They support search and replace operations in most popular editors, such as Google Docs, Sublime, Notepad++, and Microsoft Word.
  • They also help with operations such as file renaming, database queries (MySQL), and web directives (Apache).
  • Many programming languages provide regex capabilities either built-in or via libraries.

3) Regex in Python

    Python has a built-in package called re, which can be used to work with Regular Expressions.
    Import the re module with:

    import re

    Note: Once you have imported the re module, you can start using regular expressions.

    4) Various Methods of Regular Expressions

    The built-in re package provides several functions for performing queries on an input string. We will discuss the most commonly used ones:
    - compile()
    - search()
    - match()
    - findall()
    - split()
    - sub()

    The match objects returned by search() and match() also have several methods and attributes; the most important ones are:

    Method/Attribute   Purpose
    group()            Return the string matched by the RE
    start()            Return the starting position of the match
    end()              Return the ending position of the match
    span()             Return a tuple containing the (start, end) positions of the match

    4.1 Compile Method

    Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.

    re.compile(pattern)

    Note: Using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.

    import re
    
    sent='iNeuron provides affordable AI courses and AI internship program'
    
    pattern=re.compile('AI')
    result=pattern.findall(sent)
    result

    Out: ['AI', 'AI']
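
As a rough sketch of the reuse mentioned in the note, the same compiled pattern object can be applied to several strings. The sentences list below is a hypothetical sample, not data from this article.

```python
import re

# Hypothetical sample sentences, used only for this illustration.
sentences = [
    'AI courses and AI internships',
    'Data science courses',
    'An AI internship program',
]

# Compile once, then reuse the same pattern object for every string.
pattern = re.compile('AI')
counts = [len(pattern.findall(s)) for s in sentences]
print(counts)  # [2, 0, 1]
```

The re module also keeps an internal cache of recently compiled patterns, so the gain from compiling by hand is usually modest; the main benefit is often readability when one pattern is used in many places.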

    4.2 Search Method

    Scan through string looking for a match to the pattern, returning a match object, or None if no match was found.

    re.search(pattern, string)

    import re
    
    sent='iNeuron provides courses of data science, data analytics and etc.'
    result = re.search('data',sent)
    print(result.group())

    Out: data

    result.group()

    Out: 'data' 

    result.start(), result.end()

    Out: (28, 32) 

    result.span()

    Out: (28, 32)
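
Because re.search() returns None when nothing matches, calling group() directly on the result can raise an AttributeError. A minimal sketch of the usual guard, using a search word ('python') that is assumed not to occur in the sample sentence:

```python
import re

sent = 'iNeuron provides courses of data science'

# 'python' does not occur in sent, so re.search() returns None.
result = re.search('python', sent)

# Check for None before calling group(), start(), end() or span().
if result is None:
    message = 'No match found.'
else:
    message = 'Found at ' + str(result.span())

print(message)  # No match found.
```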

    4.3 Match Method

    Determine whether the regular expression matches at the beginning of the string. It returns a match object on success, or None otherwise.

    re.match(pattern, string)

    import re
    
    string='iNeuron'
    
    ## ^ and $ match the start and end of the string respectively
    ## . matches any single character (except a newline)
    pattern = '^i.....n$'
    
    result = re.match(pattern, string)
    print(result)
    if result:
        print("Search successful.")
    else:
        print("Search unsuccessful.")

    Out: <re.Match object; span=(0, 7), match='iNeuron'>
         Search successful.
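
The difference between re.match() and re.search() shows up when the pattern occurs later in the string. A small sketch, reusing the string from the example above:

```python
import re

string = 'iNeuron'

# 'Neuron' occurs in the string, but not at position 0.
at_start = re.match('Neuron', string)    # match() only tries the beginning
anywhere = re.search('Neuron', string)   # search() scans the whole string

print(at_start)          # None
print(anywhere.span())   # (1, 7)
```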

    4.4 Findall Method

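    Find all substrings where the RE matches and return them as a list of strings. Unlike re.match(), it is not restricted to the beginning of the string, and unlike re.search(), it returns every non-overlapping match rather than only the first. Use re.findall() when you need all of the matched text; it returns strings rather than match objects, so positions such as start() and end() are not available.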

    re.findall(pattern, string)

    import re
    
    sent='iNeuron provides courses of data science, data analytics and etc.'
    result = re.findall('data',sent)
    
    print(result)

    Out: ['data', 'data']
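
A short sketch of what re.findall() gives back in the edge cases: a list of plain strings, and an empty list (not None) when nothing matches. If positions are needed, re.finditer() is a related standard-library function, not otherwise covered in this article, that yields match objects instead.

```python
import re

sent = 'iNeuron provides courses of data science, data analytics and etc.'

# Every non-overlapping occurrence is returned as a plain string.
found = re.findall('data', sent)
print(found)      # ['data', 'data']

# No occurrence gives an empty list rather than None.
missing = re.findall('python', sent)
print(missing)    # []

# re.finditer() yields match objects, so span() is available for each match.
positions = [m.span() for m in re.finditer('data', sent)]
print(positions)  # [(28, 32), (42, 46)]
```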

    4.5 Split Method

    Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.

    re.split(pattern, string, maxsplit=0)

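    By default maxsplit=0, which places no limit on the number of splits. Set it to a positive integer to allow at most that many splits; the remainder of the string is returned as the final element of the list.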

    import re
    
    result=re.split('e','iNeuron')
    result
    

    Out: ['iN', 'uron']

    sent='iNeuron provides courses of data science, data analytics and etc.'
    
    # Split the sentence at every occurrence of the pattern "e".
    result=re.split('e',sent)
    result

    Out: ['iN', 'uron provid', 's cours', 's of data sci', 'nc', ', data analytics and ', 'tc.']

    sent='iNeuron provides courses of data science, data analytics and etc.'
    
    # Split at the pattern "e" at most twice (maxsplit=2).
    result=re.split('e',sent,maxsplit=2)
    result

    Out: ['iN', 'uron provid', 's courses of data science, data analytics and etc.']
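
The examples above split on a single literal character. As an illustrative sketch, the pattern can also describe several delimiters at once. The sample text and the pattern [,;]\s* are assumptions chosen for this example.

```python
import re

# Hypothetical sample string with mixed delimiters.
text = 'data science, data analytics;machine learning'

# Split on a comma or semicolon, followed by optional whitespace.
parts = re.split(r'[,;]\s*', text)
print(parts)
# ['data science', 'data analytics', 'machine learning']

# With maxsplit=1 only the first delimiter is used.
first_split_only = re.split(r'[,;]\s*', text, maxsplit=1)
print(first_split_only)
# ['data science', 'data analytics;machine learning']
```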

    4.6 Sub Method

    Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string with the replacement repl. If the pattern is not found, the string is returned unchanged.

    re.sub(pattern, repl, string)

    import re
    
    sent='iNeuron provides the best affordable data science courses in India'
    
    # Search for the pattern 'India' and replace it with 'World'.
    result=re.sub('India','World',sent)
    result

    Out: 'iNeuron provides the best affordable data science courses in World'
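
re.sub() also accepts an optional count argument that limits how many occurrences are replaced (the default, count=0, replaces all of them). A minimal sketch with a made-up sentence, which also shows the "pattern not found" case:

```python
import re

sent = 'data science, data analytics'  # hypothetical sample text

# All occurrences are replaced by default (count=0).
replace_all = re.sub('data', 'big data', sent)
print(replace_all)    # big data science, big data analytics

# count=1 limits the replacement to the first occurrence.
replace_first = re.sub('data', 'big data', sent, count=1)
print(replace_first)  # big data science, data analytics

# When nothing matches, the string is returned unchanged.
unchanged = re.sub('python', 'Python', sent)
print(unchanged == sent)  # True
```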
