Upload: pauline-lester

Post on 16-Jan-2016

Page 1: Pattern matching with regular expressions A common file processing requirement is to match strings within the file to a standard form, e.g. email address

Pattern matching with regular expressions

• A common file processing requirement is to match strings within the file to a standard form, e.g. email address

• Regular expressions, or 'regexes', give us the power to do this kind of matching

• At its simplest, any word is a regex
  – regex: 'email'
  – test: 'email'

• The regex occurs in the test string, so it matches!
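The literal-word case can be tried directly in Python. This is a minimal sketch using the standard `re` module; the longer test strings are illustrative assumptions, not taken from the slides.

```python
import re

regex = 'email'

# A plain word is a regex that matches any string containing that word.
found_exact = re.search(regex, 'email') is not None
found_inside = re.search(regex, 'my email address') is not None  # assumed example
found_missing = re.search(regex, 'e-mail') is not None           # assumed example

print(found_exact, found_inside, found_missing)  # True True False
```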

Page 2

Regular Expressions

• In reality, regexes are used to search for a string that "has the form" of the regular expression

• Need to define some syntax that lets us specify things such as
  – 'a number is in a range'
  – 'a letter is one of a set'
  – 'a certain number of characters', etc.

• Requires special characters

Page 3

Regular Expressions

• Some special characters: *, [], {}

• For a complete reference see http://www.regular-expressions.info/reference.html

• An asterisk * specifies that the character preceding it can appear zero or more times, e.g.
  – regex: 'a*b'
  – test: 'b' # Matches, as there is no 'a'
  – test: 'ab' # Matches
  – test: 'aaab' # Matches
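The asterisk examples above can be checked with a short sketch. It uses `fullmatch` so that the whole test string has to fit the pattern; the extra test string 'a' is an assumed counter-example, not from the slides.

```python
import re

pattern = re.compile('a*b')

# fullmatch() succeeds only if the entire string has the form 'a*b'.
results = {text: pattern.fullmatch(text) is not None
           for text in ['b', 'ab', 'aaab', 'a']}

print(results)  # {'b': True, 'ab': True, 'aaab': True, 'a': False}
```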

Page 4

Regular Expressions

• A range of characters, or a "character class", is defined using square brackets [], e.g.
  – regex: '[a-z]'
  – test: 'm' # Matches, as it is a lower case letter
  – test: 'M' # Fails, as it is an upper case letter

• Multiple ranges: write them one after another inside the brackets, with no separator (a comma inside the brackets is treated as a literal comma)
  – regex: '[a-zA-Z0-9]'
  – test: 'M' # Matches
  – test: '9' # Matches
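A small sketch of character classes in Python follows. The variable names are illustrative assumptions. It also shows that a comma placed inside the brackets becomes one of the accepted characters, which is why ranges are usually written back to back.

```python
import re

lower = re.compile('[a-z]')
alnum = re.compile('[a-zA-Z0-9]')          # ranges written back to back
with_commas = re.compile('[a-z,A-Z,0-9]')  # the commas are literal members of the class

lower_m = lower.fullmatch('m') is not None        # lower case letter
lower_M = lower.fullmatch('M') is not None        # upper case letter
alnum_M = alnum.fullmatch('M') is not None
alnum_9 = alnum.fullmatch('9') is not None
alnum_comma = alnum.fullmatch(',') is not None
commas_comma = with_commas.fullmatch(',') is not None

print(lower_m, lower_M)           # True False
print(alnum_M, alnum_9)           # True True
print(alnum_comma, commas_comma)  # False True
```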

Page 5

Regular Expressions

• To specify an exact number of characters use braces {}, e.g.
  – regex: 'a{2}'
  – test: 'abab' # Fails, as there are not two consecutive a's in the string
  – test: 'aaaab' # Matches
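The brace examples can be reproduced as below. This is a sketch with assumed variable names; `search` is used because the slide asks whether two consecutive a's occur anywhere in the string.

```python
import re

pair_of_a = re.compile('a{2}')

no_pair = pair_of_a.search('abab')    # the two a's are never adjacent
has_pair = pair_of_a.search('aaaab')  # finds the first two a's

print(no_pair is None)   # True
print(has_pair.group())  # aa
```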

Page 6

Regular Expressions in Python

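• Python contains a regular expression module, called 're', that allows strings to be tested against regular expressions

    import re

    test = 'm'  # example string to test
    checker = re.compile('[a-z]')
    # match() only tests at the start of the string;
    # use search() to look for the pattern anywhere in it
    if checker.match(test) is not None:
        print('String matches!')
    else:
        print('String does not contain a match')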
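The difference between `match` (tests only at the start of the string) and `search` (looks anywhere in the string) is easy to miss, so here is a brief sketch. The helper `check` and the test strings are assumptions made for illustration.

```python
import re

checker = re.compile('[a-z]')

def check(test):
    """Return (matches_at_start, matches_anywhere) for the string `test`."""
    at_start = checker.match(test) is not None
    anywhere = checker.search(test) is not None
    return at_start, anywhere

print(check('m'))   # (True, True)
print(check('9m'))  # (False, True)
print(check('99'))  # (False, False)
```

If the intent is "does the string contain the pattern", `search` is the closer fit; `match` suits cases where the pattern has to start at the first character.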

Page 7

Practical example

    filetestsRun = 'testResults.log'
    f = open(filetestsRun, 'r')
    reTestCount = re.compile("Running\\s*(\\d+)\\s*test", re.IGNORECASE)
    reCrashCount = re.compile("OK!")
    reFailCount = re.compile("Failed\\s*(\\d+)\\s*of\\s*(\\d+)\\s*tests", re.IGNORECASE)

• The patterns above are used to search through a file for lines such as
  – Running 13 tests.............OK!

• Used on Mantid to keep track of build server test passes/failures
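How such patterns might be applied line by line is sketched below. Only the first log line comes from the slides; the other two lines, the counters, and the loop are assumptions for illustration and not the actual Mantid script. A list of strings stands in for the open file so the sketch runs on its own.

```python
import re

reTestCount = re.compile(r"Running\s*(\d+)\s*test", re.IGNORECASE)
reCrashCount = re.compile(r"OK!")  # name kept from the slide; it matches the "OK!" marker
reFailCount = re.compile(r"Failed\s*(\d+)\s*of\s*(\d+)\s*tests", re.IGNORECASE)

# Stand-in for the lines of testResults.log (last two lines are hypothetical).
log_lines = [
    "Running 13 tests.............OK!",
    "Running 8 tests....",
    "Failed 2 of 8 tests",
]

tests_run = 0
tests_failed = 0
lines_ok = 0
for line in log_lines:
    m = reTestCount.search(line)
    if m:
        tests_run += int(m.group(1))     # first captured group: number of tests
    if reCrashCount.search(line):
        lines_ok += 1
    m = reFailCount.search(line)
    if m:
        tests_failed += int(m.group(1))  # first captured group: number failed

print(tests_run, tests_failed, lines_ok)  # 21 2 1
```

The parentheses in the patterns are capture groups, which is what lets the digits be pulled out with `group(1)` after a successful search.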