What is Regular Expression?
Usage of Regular Expression (RE)
Regex in Python
Various Methods of Regular Expressions
- Compile Method
- Search Method
- Match Method
- Findall Method
- Split Method
- Sub Method
1) What is Regular Expression?
A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations. (Source: Wikipedia)
Regular expressions are a generalized way to match patterns in sequences of characters. They are supported, natively or through standard libraries, in most programming languages, including C++, Java, and Python.
Note: We can use regular expressions in Python. The re module provides an interface to the regular expression engine, allowing you to compile regular expressions into objects and then perform matches with them.
Regular expressions use two types of characters:
Literal characters: such as a, b, 1, 2...
Metacharacters: such as the opening and closing square brackets ( [ and ] ), backslash ( \ ), caret ( ^ ), dollar sign ( $ ), and so on.
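To make the distinction concrete, here is a minimal sketch using Python's re module. The sample strings are made up for illustration; re.escape is a standard helper that turns metacharacters into literal characters.

```python
import re

# A literal pattern matches exactly the characters written.
literal_hit = re.search("cat", "concatenate")

# "." is a metacharacter: it matches any single character except a newline.
dot_hits = re.findall("c.t", "cat cot c-t")

# Escaping a metacharacter (here via re.escape) makes it match literally.
escaped_hits = re.findall(re.escape("1.5"), "1.5 and 1x5")

print(literal_hit.group())  # cat
print(dot_hits)             # ['cat', 'cot', 'c-t']
print(escaped_hits)         # ['1.5']
```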
2) Usage of Regular Expression (RE)
Python has a built-in package called re, which can be used to work with Regular Expressions.
Import the re module with:
import re
Note: Once you have imported the re module, you can start using regular expressions.
4) Various Methods of Regular Expressions
The built-in re package provides several methods for querying an input string. We will discuss the most commonly used ones:
Match object instances, which methods such as re.search() and re.match() return, also have several methods and attributes; the most important ones are:
| Method/Attribute | Purpose |
|---|---|
| group() | Return the string matched by the RE |
| start() | Return the starting position of the match |
| end() | Return the ending position of the match |
| span() | Return a tuple containing the (start, end) positions of the match |
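As a quick illustration of these four methods, the sketch below searches a made-up string (the text and the pattern are assumptions chosen only for the example) and prints what each method returns.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "order 66 shipped"
m = re.search(r"\d+", text)  # first run of digits

print(m.group())  # 66
print(m.start())  # 6
print(m.end())    # 8
print(m.span())   # (6, 8)
```

Note that end() is exclusive, so text[m.start():m.end()] gives back the matched string.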
4.1 Compile Method
Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.
Note: Using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
import re

sent = 'iNeuron provides affordable AI courses and AI internship program'
pattern = re.compile('AI')
result = pattern.findall(sent)
print(result)
Out: ['AI', 'AI']
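The reuse mentioned in the note could look roughly like the sketch below: one pattern object is compiled once and then applied to several strings. The input lines and the \b word-boundary pattern are assumptions added for illustration, not part of the example above.

```python
import re

# Compile once, then reuse the pattern object on several strings.
# \b restricts the match to "AI" as a whole word.
word_ai = re.compile(r"\bAI\b")

# Hypothetical sample inputs.
lines = ["AI courses", "an AI internship", "AIM is a different word"]

counts = [len(word_ai.findall(line)) for line in lines]
print(counts)  # [1, 1, 0]
```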
4.2 Search Method
Scan through the string looking for the first location where the pattern matches, and return a match object, or None if no match is found.
import re

sent = 'iNeuron provides courses of data science, data analytics and etc.'
result = re.search('data', sent)
print(result.group())
print(result.span())
Out: data
(28, 32)
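Because re.search() returns None when nothing matches, calling group() on the result directly can raise an AttributeError. A minimal sketch of a safe check follows; the second pattern, "python", is an assumption picked because it does not occur in the sentence.

```python
import re

sent = "iNeuron provides courses of data science"

found = re.search("data", sent)
missing = re.search("python", sent)

# Check for None before using the match object.
for label, m in (("data", found), ("python", missing)):
    if m is None:
        print(label, "-> no match")
    else:
        print(label, "->", m.group(), m.span())
```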
4.3 Match Method
Determine whether the regular expression matches at the beginning of the string.
import re

string = 'iNeuron'
# ^ and $ match the start and end of the string, respectively
# . matches any single character except a newline
pattern = '^i.....n$'
result = re.match(pattern, string)
print(result)
if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")
Out: <re.Match object; span=(0, 7), match='iNeuron'>
Search successful.
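The difference between matching at the beginning and searching anywhere is easiest to see side by side. The following sketch uses a made-up string in which the word does not appear at the start; re.fullmatch() is included as a standard alternative to anchoring a pattern with ^ and $.

```python
import re

# Hypothetical sample text: the word of interest is not at the start.
text = "learn at iNeuron"

at_start = re.match("iNeuron", text)        # None: text does not start with it
anywhere = re.search("iNeuron", text)       # found later in the string
whole = re.fullmatch("i.....n", "iNeuron")  # must match the entire string

print(at_start)           # None
print(anywhere.span())    # (9, 16)
print(whole is not None)  # True
```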
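4.4 Findall Method
Find all non-overlapping substrings where the RE matches, and return them as a list. Unlike re.match(), it is not restricted to the start of the string, and unlike re.search(), it does not stop at the first match. Use re.findall() when you need every occurrence; note that it returns plain strings (or tuples of groups) rather than match objects, so positions are not available.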
import re

sent = 'iNeuron provides courses of data science, data analytics and etc.'
result = re.findall('data', sent)
print(result)
Out: ['data', 'data']
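Two related behaviours are worth a short sketch: when the pattern contains one capturing group, re.findall() returns only the group text, and re.finditer() yields match objects when positions are needed. The pattern with the group is an assumption added for illustration.

```python
import re

sent = "iNeuron provides courses of data science, data analytics"

# With one capturing group, findall returns the group text only.
topics = re.findall(r"data (\w+)", sent)

# finditer yields match objects, so positions are available too.
spans = [m.span() for m in re.finditer("data", sent)]

print(topics)  # ['science', 'analytics']
print(spans)   # [(28, 32), (42, 46)]
```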
4.5 Split Method
Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.
re.split(pattern, string, maxsplit=0)
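By default maxsplit=0, which means there is no limit on the number of splits. If maxsplit is set to a positive number, at most that many splits are made and the remainder of the string is returned as the final element of the list.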
import re

result = re.split('e', 'iNeuron')
print(result)
Out: ['iN', 'uron']
sent = 'iNeuron provides courses of data science, data analytics and etc.'
# Split the sentence at every occurrence of the pattern "e".
result = re.split('e', sent)
print(result)
Out: ['iN', 'uron provid', 's cours', 's of data sci', 'nc', ', data analytics and ', 'tc.']
sent = 'iNeuron provides courses of data science, data analytics and etc.'
# Split at the pattern "e" at most twice (maxsplit=2); the rest of the string stays intact.
result = re.split('e', sent, maxsplit=2)
print(result)
Out: ['iN', 'uron provid', 's courses of data science, data analytics and etc.']
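The examples above split on a single literal character. As a sketch of why a regular expression is useful here, the pattern below splits on either of two separators at once; the input string and both patterns are assumptions made up for illustration.

```python
import re

# Hypothetical input with mixed separators.
raw = "data science, data analytics;AI courses"

# Split on a comma or semicolon, with optional surrounding whitespace.
parts = re.split(r"\s*[,;]\s*", raw)
print(parts)  # ['data science', 'data analytics', 'AI courses']

# A capturing group in the pattern keeps the delimiters in the result.
kept = re.split(r"([,;])", "a,b;c")
print(kept)   # ['a', ',', 'b', ';', 'c']
```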
4.6 Sub Method
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string with the replacement repl. If the pattern is not found, the string is returned unchanged.
re.sub(pattern, repl, string)
import re

sent = 'iNeuron provides the best affordable data science courses in India'
# Replace every occurrence of the pattern 'India' with 'World'.
result = re.sub('India', 'World', sent)
print(result)
Out: iNeuron provides the best affordable data science courses in World
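re.sub() also accepts an optional count argument and supports group references in the replacement, and re.subn() reports how many replacements were made. A minimal sketch, using made-up strings that are not part of the example above:

```python
import re

# Hypothetical sample text.
sent = "data science and data analytics"

# count limits how many occurrences are replaced (0, the default, means all).
first_only = re.sub("data", "Data", sent, count=1)

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "science data")

# re.subn returns the new string together with the number of replacements.
replaced, n = re.subn("data", "Data", sent)

print(first_only)   # Data science and data analytics
print(swapped)      # data science
print(replaced, n)  # Data science and Data analytics 2
```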