Pattern matching with regular expressions
• A common file-processing requirement is to match strings within a file against a standard form, e.g. an email address
• Regular expressions, or 'regexes', give us the power to do this kind of matching
• At its simplest, any word is a regex
  – regex: 'email'
  – test: 'email'
• The regex occurs in the test string, so it matches!
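The literal-word case above can be tried directly with Python's `re` module. This is a minimal sketch; the helper name `contains_pattern` and the longer test strings are made up for illustration.

```python
import re

# A plain word is already a valid regex: it matches itself.
pattern = re.compile('email')

def contains_pattern(text):
    """Return True if 'email' occurs anywhere in text (hypothetical helper)."""
    return pattern.search(text) is not None

print(contains_pattern('email'))                   # the slide's test string
print(contains_pattern('send me an email today'))  # word inside a longer string
print(contains_pattern('e-mail'))                  # different spelling
```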
Regular Expressions
• In practice, regexes are used to search for a string that "has the form" of the regular expression
• We need some syntax that lets us specify things such as
  – 'a number is in a range'
  – 'a letter is one of a set'
  – 'a certain number of characters', etc.
• This requires special characters
Regular Expressions
• Some special characters: *, [], {}
• For a complete reference see http://www.regular-expressions.info/reference.html
• An asterisk * specifies that the character preceding it can appear zero or more times, e.g.
  – regex: 'a*b'
  – test: 'b' # Matches, as zero 'a's are allowed
  – test: 'ab' # Matches
  – test: 'aaab' # Matches
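The three asterisk tests above can be checked with a short script. This is a sketch only: the helper `matches_whole` and the extra test string 'ac' are illustrative additions, and `fullmatch` is used here so that the whole string must have the form `a*b`.

```python
import re

# 'a*b': zero or more 'a' characters followed by a single 'b'.
pattern = re.compile(r'a*b')

def matches_whole(text):
    """Return True if the entire string has the form a*b (hypothetical helper)."""
    return pattern.fullmatch(text) is not None

for text in ['b', 'ab', 'aaab', 'ac']:
    print(text, matches_whole(text))
```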
Regular Expressions
• A set of characters, or "character class", is defined using square brackets []; a hyphen inside the brackets gives a range, e.g.
  – regex: '[a-z]'
  – test: 'm' # Matches as it is a lower-case letter
  – test: 'M' # Fails as it is an upper-case letter
• Multiple ranges: write them one after another with no separator (a comma inside the brackets would itself be matched as a literal character)
  – regex: '[a-zA-Z0-9]'
  – test: 'M' # Matches
  – test: '9' # Matches
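A small sketch of character classes in Python follows. It also shows that a comma placed inside the brackets becomes a literal member of the set, which is easy to overlook. The variable names are chosen here for illustration.

```python
import re

lower = re.compile(r'[a-z]')                 # one lower-case letter
alnum = re.compile(r'[a-zA-Z0-9]')           # ranges placed side by side
with_commas = re.compile(r'[a-z,A-Z,0-9]')   # same ranges, plus a literal ','

def is_single(pattern, text):
    """Return True if text is exactly one character from the class."""
    return pattern.fullmatch(text) is not None

print(is_single(lower, 'm'), is_single(lower, 'M'))
print(is_single(alnum, 'M'), is_single(alnum, '9'))
print(is_single(alnum, ','), is_single(with_commas, ','))
```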
Regular Expressions
• To specify an exact number of repetitions of the preceding character use braces {}, e.g.
  – regex: 'a{2}'
  – test: 'abab' # Fails as there are not two consecutive a's in the string
  – test: 'aaaab' # Matches
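The brace quantifier can be checked the same way. In this sketch `search` is used because the two consecutive a's may appear anywhere in the string; the helper name is an illustrative assumption.

```python
import re

# 'a{2}': exactly two consecutive 'a' characters.
double_a = re.compile(r'a{2}')

def contains_double_a(text):
    """Return True if text contains 'aa' somewhere (hypothetical helper)."""
    return double_a.search(text) is not None

print(contains_double_a('abab'))   # the a's are never adjacent
print(contains_double_a('aaaab'))  # 'aa' occurs at the start
```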
Regular Expressions in Python
• Python contains a regular expression module, called 're', that allows strings to be tested against regular expressions
    import re
    test = 'm'  # example string to check
    checker = re.compile('[a-z]')
    # search() scans the whole string; match() only tests at the start
    if checker.search(test) is not None:
        print('String matches!')
    else:
        print('String does not contain a match')
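One detail worth knowing when using the `re` module: a compiled pattern's `match` method only tests at the beginning of the string, whereas `search` scans the whole string. The sketch below illustrates the difference; the function names and the test string 'Hello' are illustrative choices.

```python
import re

checker = re.compile(r'[a-z]')

def starts_with_lowercase(text):
    """match() succeeds only if the pattern matches at position 0."""
    return checker.match(text) is not None

def contains_lowercase(text):
    """search() succeeds if the pattern matches anywhere in the string."""
    return checker.search(text) is not None

print(starts_with_lowercase('Hello'))  # 'H' is upper case, so no match at the start
print(contains_lowercase('Hello'))     # 'e' is found further along
```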
Practical example
    import re
    filetestsRun = 'testResults.log'
    f = open(filetestsRun, 'r')
    # Capture the number of tests run, e.g. "Running 13 tests"
    reTestCount = re.compile("Running\\s*(\\d+)\\s*test", re.IGNORECASE)
    reCrashCount = re.compile("OK!")
    # Capture the failed and total counts
    reFailCount = re.compile("Failed\\s*(\\d+)\\s*of\\s*(\\d+)\\s*tests", re.IGNORECASE)
• The patterns above are used to search through a file for lines such as
  – Running 13 tests.............OK!
• Used on Mantid to keep track of build server test passes/failures
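The slide shows only the compiled patterns, not the loop that applies them. The sketch below is one possible way to use them, not the Mantid script itself. It works on an in-memory list instead of a file. The function `summarise`, the variable names, and all sample lines except "Running 13 tests.............OK!" are assumptions made for illustration; in particular the "Failed ... of ... tests" line is inferred from the regex.

```python
import re

re_test_count = re.compile(r"Running\s*(\d+)\s*test", re.IGNORECASE)
re_ok = re.compile(r"OK!")
re_fail_count = re.compile(r"Failed\s*(\d+)\s*of\s*(\d+)\s*tests", re.IGNORECASE)

def summarise(lines):
    """Return (tests run, lines reporting OK!, tests failed) for the given lines."""
    tests_run = 0
    ok_lines = 0
    tests_failed = 0
    for line in lines:
        run = re_test_count.search(line)
        if run:
            tests_run += int(run.group(1))       # first capture group: test count
        if re_ok.search(line):
            ok_lines += 1
        failed = re_fail_count.search(line)
        if failed:
            tests_failed += int(failed.group(1))  # first capture group: failures
    return tests_run, ok_lines, tests_failed

# Hypothetical log content; only the first line is taken from the slide.
sample_log = [
    "Running 13 tests.............OK!",
    "Running 4 tests....",
    "Failed 1 of 4 tests",
]
print(summarise(sample_log))
```

To process a real log, the same function could be given an open file object, since iterating over a file yields its lines.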