Regular Expressions in ColdFusion and Studio
Posted on 21-Dec-2015
Definitions
String - Any collection of 0 or more characters. Example: “This is a String”
SubString - A segment of a String. Example: “is a”
Case Sensitivity - Whether upper-case and lower-case versions of a character (e.g. “N” and “n”) are treated as different.
Simple Task
Find the word “Name” inside a string:
<CFSET String="My name is Michael Dinowitz"><CFOUTPUT>
Position=#Find('Name', String)#</CFOUTPUT>
Position=0
Simple Task
Find the word “Name” inside a string:
<CFSET String="My name is Michael Dinowitz"><CFOUTPUT>
Position=#Find('name', String)#</CFOUTPUT>
Position=4
Simple Task
Find the word “Name” inside a string:
<CFSET String="My name is Michael Dinowitz"><CFOUTPUT>
Position=#FindNoCase('Name', String)#</CFOUTPUT>
Position=4
Simple Task
Find the word “Name” inside a string using Regular Expressions:
<CFSET String="My name is Michael Dinowitz"><CFOUTPUT>
Position=#REFindNoCase('Name', String)#</CFOUTPUT>
Position=4
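For readers who want to experiment outside ColdFusion, here is a rough Python sketch of the four searches above using the standard `re` module. Python reports 0-based offsets, so the helper `cf_position` (a hypothetical name, not part of any library) converts them to ColdFusion's 1-based convention, with 0 meaning "not found". `Find()` is a literal search rather than a regex search, but because “Name” contains no special characters the result is the same.

```python
import re

def cf_position(pattern, string, ignore_case=False):
    """Hypothetical helper: mimic REFind()/REFindNoCase() by returning a
    1-based match position, or 0 when there is no match."""
    flags = re.IGNORECASE if ignore_case else 0
    match = re.search(pattern, string, flags)
    return match.start() + 1 if match else 0

s = "My name is Michael Dinowitz"
print(cf_position("Name", s))                    # 0, like Find('Name', String)
print(cf_position("name", s))                    # 4, like Find('name', String)
print(cf_position("Name", s, ignore_case=True))  # 4, like REFindNoCase('Name', String)
```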
Intro to Regular Expressions
• Referred to as RegEx
• Matches patterns of characters
• Used in many languages (ColdFusion, Perl, JavaScript, etc.)
• Uses a small set of special characters to do ‘dynamic’ matches
• Can be used for search and/or replace actions
• Slightly slower than the similar Find() and Replace() functions
• Has both a case sensitive and a non-case sensitive version of each function:
•REFind() •REFindNoCase() •REReplace() •REReplaceNoCase()
RegEx Basics
Rule 1: A character matches itself as long as it is not a control character. Example:
A = “A”
A = “a” (non-case sensitive)
<CFSET String="My name is Michael Dinowitz"><CFOUTPUT>
Position=#REFindNoCase('n', String)#</CFOUTPUT>
Position=4
RegEx Basics
Rule 1a: A search returns the first successful match. To get a later match, set the start position (the optional third argument of the function).
<CFSET String="My name is Michael Dinowitz"><CFOUTPUT>
Position1=#REFindNoCase('M', String)# Position2=#REFindNoCase('M', String, 2)#
</CFOUTPUT>
Position1=1 Position2=6
(Position2 is the “m” in “name”. The case sensitive REFind('M', String, 2) would return 12, the “M” in “Michael”.)
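The start-position argument can be sketched in Python as well. This assumes a hypothetical helper `cf_position` whose `start` parameter is 1-based like ColdFusion's third argument; internally it uses `re.Pattern.search(string, pos)`, which begins scanning at a 0-based offset.

```python
import re

def cf_position(pattern, string, start=1, ignore_case=False):
    """Hypothetical helper modeled on REFind()/REFindNoCase(): `start` is
    1-based; returns a 1-based position, or 0 when nothing matches."""
    flags = re.IGNORECASE if ignore_case else 0
    # Pattern.search(string, pos) starts scanning at 0-based offset `pos`.
    match = re.compile(pattern, flags).search(string, start - 1)
    return match.start() + 1 if match else 0

s = "My name is Michael Dinowitz"
print(cf_position("M", s, ignore_case=True))           # 1
print(cf_position("M", s, start=2, ignore_case=True))  # 6, the "m" in "name"
print(cf_position("M", s, start=2))                    # 12, case sensitive: "Michael"
```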
RegEx Basics
Rule 2: A collection of non-control characters matches another collection of non-control characters.
AA = “AA”
AA != “Aa” (case sensitive)
AA = “Aa” (non-case sensitive)
A A = “A A” (notice the space)
<CFSET String="My name is Michael Dinowitz"><CFOUTPUT>
Position=#REFindNoCase('y n', String)#</CFOUTPUT>
Position=2
RegEx Basics
Rule 3: A period (.) is a control character that matches any single character. Example:
. = “A”
A. = “Ac”
A.A = “A A”
<CFSET String="My name is Michael Dinowitz"><CFOUTPUT>
Position=#REFindNoCase('N.me', String)#</CFOUTPUT>
Position=4
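Rules 2 and 3 can be checked with a short Python sketch. It assumes Python's `re` engine behaves like ColdFusion's for these simple patterns; one known difference worth remembering is that Python's "." does not match a newline unless the `re.DOTALL` flag is set.

```python
import re

s = "My name is Michael Dinowitz"

# Rule 2: a run of ordinary characters (including the space) matches literally.
m2 = re.search("y n", s, re.IGNORECASE)
# Rule 3: "." stands for exactly one character of any kind.
m3 = re.search("N.me", s, re.IGNORECASE)

# Convert 0-based offsets to ColdFusion-style 1-based positions.
print(m2.start() + 1, m2.group())  # 2 y n
print(m3.start() + 1, m3.group())  # 4 name
```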
RegEx Basics
Rule 4: A control character can be ‘escaped’ by putting a backslash (\) before it. This makes the control character match a literal version of itself. Example:
. = “.” (but also any other character)
\. = “.” (and nothing else)
A\.A = “A.A”
<CFSET String="My name is Michael Dinowitz."><CFOUTPUT>
Position1=#REFindNoCase('tz\.', String)#</CFOUTPUT>
Position1=26
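A Python sketch of escaping, for comparison. Raw strings (`r"..."`) keep Python from interpreting the backslash itself, and `re.escape()` is a standard-library function that escapes every control character in a piece of text, which is handy when the text to find comes from user input.

```python
import re

s = "My name is Michael Dinowitz."

escaped = re.search(r"tz\.", s)        # "\." only matches a literal period
unescaped = re.search(r"tz.", "tzx")   # "." matches any character, here "x"

print(escaped.start() + 1)        # 26
print(unescaped.group())          # tzx
print(re.search(r"tz\.", "tzx"))  # None: no literal period after "tz"
print(re.escape("a.b"))           # a\.b
```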
RegEx Anchoring
Rule 5a: Using the caret (^) makes sure the text you're searching for is at the start of the string. Example:
^A = “A”
^M != “AM”
<CFSET String="My name is Michael Dinowitz."><CFOUTPUT>
Position1=#REFindNoCase('^My', String)# Position2=#REFindNoCase('^is', String)#
</CFOUTPUT>
Position1=1 Position2=0
RegEx Anchoring
Rule 5b: Using the dollar sign ($) makes sure the text you're searching for is at the end of the string. Example:
A$ = “A”
M$ = “MAM” (the second M, at position 3, is returned)
<CFSET String="My name is Michael Dinowitz."><CFOUTPUT>
Position1=#REFindNoCase('\.$', String)#</CFOUTPUT>
Position1=28
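The two anchors can be tried in Python with a small hypothetical helper `pos` that returns a 1-based position or 0, as REFindNoCase() does. Note one Python-specific detail (an engine difference, not ColdFusion behavior): `$` also matches just before a single trailing newline.

```python
import re

s = "My name is Michael Dinowitz."

def pos(pattern, string):
    # Hypothetical helper: case-insensitive search, 1-based result, 0 if no match.
    m = re.search(pattern, string, re.IGNORECASE)
    return m.start() + 1 if m else 0

print(pos(r"^My", s))     # 1
print(pos(r"^is", s))     # 0: "is" occurs, but not at the start
print(pos(r"\.$", s))     # 28
print(pos(r"M$", "MAM"))  # 3: only the final M is at the end
```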
RegEx Ranges
Rule 6: When looking for one of a group of characters, place them inside square brackets ([]). Example:
‘[abc]’ will match either a, b, or c.
‘[.+$^]’ will match either a period (.), a plus (+), a dollar sign ($) or a caret (^). Note that most special characters are treated as escaped (literal) within square brackets; the exceptions are covered in Rules 7a and 7b.
<CFSET String="My name is Michael Dinowitz."><CFOUTPUT>
Position1=#REFindNoCase('M[aeiou]', String)#</CFOUTPUT>
Position1=6
RegEx Ranges
Rule 7a: A caret (^) used within square brackets ([]) has the effect of saying ‘NOT these characters’. It must be the first character inside the brackets for this to work. Example:
‘[^abc]’ will match ANY character other than a, b, or c.
<CFSET String="My name is Michael Dinowitz."><CFOUTPUT>
Position1=#REFindNoCase('M[^aeiou]', String)#</CFOUTPUT>
Position1=1
RegEx Ranges
Rule 7b: A dash (-) used within square brackets ([]) has the effect of saying ‘all characters from the first character through the last’. Example:
‘[a-e]’ will match any character from a through e (a, b, c, d, or e).
<CFSET String="My name is Michael Dinowitz."><CFOUTPUT>
Position1=#REFindNoCase('M[a-m]', String)#</CFOUTPUT>
Position1=6
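Rules 6, 7a and 7b in one Python sketch. It prints the 1-based position and the two characters each bracket pattern matched, which makes it easy to see why “My” is skipped for `M[aeiou]` and `M[a-m]`. The last line assumes Python treats `.`, `+`, `$` and a non-leading `^` literally inside brackets, which it does.

```python
import re

s = "My name is Michael Dinowitz."

results = {}
for pattern in ("M[aeiou]", "M[^aeiou]", "M[a-m]"):
    m = re.search(pattern, s, re.IGNORECASE)
    # Record the 1-based position and the text that matched.
    results[pattern] = (m.start() + 1, m.group())
    print(pattern, *results[pattern])
# M[aeiou] 6 me
# M[^aeiou] 1 My
# M[a-m] 6 me

# Inside brackets, "." "+" "$" and a non-leading "^" are ordinary characters.
special = re.search(r"[.+$^]", "2+2").group()
print(special)  # +
```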
RegEx Ranges
Rule 8: ColdFusion has a series of pre-built character ranges (character classes). These are referenced as [[:class name:]]. Example:
[[:digit:]] - same as [0-9] (all digits)
[[:alpha:]] - same as [A-Za-z] (all letters of both cases)
<CFSET String="My name is Michael Dinowitz."><CFOUTPUT>
Position1=#REFindNoCase('[[:space:]]', String)#</CFOUTPUT>
Position1=3
RegEx Character Classes
Character Class - Matches
Alpha - Matches any letter. Same as [A-Za-z].
Upper - Matches any upper-case letter. Same as [A-Z].
Lower - Matches any lower-case letter. Same as [a-z].
Digit - Matches any digit. Same as [0-9].
Alnum - Matches any alphanumeric character. Same as [A-Za-z0-9].
Xdigit - Matches any hexadecimal digit. Same as [0-9A-Fa-f].
Space - Matches a tab, new line, vertical tab, form feed, carriage return, or space.
Print - Matches any printable character.
Punct - Matches any punctuation character, that is, one of ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
Graph - Matches any printable character except those in the space character class.
Cntrl - Matches any character not part of the character classes [:upper:], [:lower:], [:alpha:], [:digit:], [:punct:], [:graph:], [:print:], or [:xdigit:].
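Python's `re` module does not understand POSIX bracket classes such as [[:digit:]], so porting a ColdFusion pattern means spelling each class out. The sketch below is a hypothetical, ASCII-only translation table (`POSIX_CLASSES` and `posix_pattern` are illustrative names) built from the table above. It covers only a subset of the classes and only standalone [[:name:]] expressions, not combined ones like [[:alpha:][:digit:]].

```python
import re
import string

# Hypothetical ASCII-only equivalents for some POSIX classes.
POSIX_CLASSES = {
    "alpha": "[A-Za-z]",
    "upper": "[A-Z]",
    "lower": "[a-z]",
    "digit": "[0-9]",
    "alnum": "[A-Za-z0-9]",
    "xdigit": "[0-9A-Fa-f]",
    "space": r"[ \t\n\v\f\r]",
    "punct": "[" + re.escape(string.punctuation) + "]",
}

def posix_pattern(pattern):
    """Replace each standalone [[:name:]] in `pattern` with its Python set."""
    # A function replacement is inserted as-is, so backslashes survive intact.
    return re.sub(r"\[\[:(\w+):\]\]",
                  lambda m: POSIX_CLASSES[m.group(1)], pattern)

s = "My name is Michael Dinowitz."
m = re.search(posix_pattern("[[:space:]]"), s)
print(m.start() + 1)  # 3, like REFindNoCase('[[:space:]]', String)
```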
RegEx Multipliers
Any character or character class can be given a multiplier that defines how many times it may appear. A multiplier can say that a character must exist, is optional, must appear a certain minimum or maximum number of times, etc. Multiplier characters include:
Plus (+) - One or more
Asterisk (*) - 0 or more
Question Mark (?) - 0 or 1 (may or may not exist, but at most once)
Curly Brackets ({}) - A specific range of occurrences
RegEx Multipliers
The Plus (+) multiplier specifies that the character or character group must exist but can exist more than once. Example:
A+ - An A followed by any number of additional As
[[:digit:]]+ - A digit (0-9) followed by any number of additional digits
<CFSET String="Mississippi is a hard word."><CFOUTPUT>
Position1=#REFindNoCase('is+i', String)#</CFOUTPUT>
Position1=2
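A Python sketch of the Plus example that also prints the matched text, assuming Python's engine behaves like ColdFusion's here. The second search illustrates that "+" is greedy: it takes as many repetitions as it can.

```python
import re

s = "Mississippi is a hard word."

m = re.search("is+i", s, re.IGNORECASE)
print(m.start() + 1, m.group())  # 2 issi

# "+" is greedy: it consumes as many digits as it can.
digits = re.search("[0-9]+", "Room 2015, floor 3").group()
print(digits)  # 2015
```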
RegEx Multipliers
The Asterisk (*) multiplier specifies that the character or character group may or may not exist, and can exist more than once (i.e. 0 or more). Example:
A* - Either no A, or an A followed by any number of additional As
[[:digit:]]* - Either no digit, or a digit (0-9) followed by any number of additional digits
<CFSET String="Mississippi is a hard word."><CFOUTPUT>
Position1=#REFindNoCase('si*s', String)#</CFOUTPUT>
Position1=3
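In Python the Asterisk example matches “ss” at position 3, with zero i's between the two s's. Because "*" allows zero occurrences, a pattern made only of `x*` "succeeds" immediately with an empty match; the sketch shows this common pitfall. The empty-match result is Python's behavior and is assumed, not verified, to match ColdFusion's.

```python
import re

s = "Mississippi is a hard word."

m = re.search("si*s", s, re.IGNORECASE)
print(m.start() + 1, m.group())  # 3 ss

# "x*" can match nothing at all, so it succeeds at position 1 with an empty match.
empty = re.search("x*", s)
print(empty.start() + 1, repr(empty.group()))  # 1 ''
```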
RegEx Multipliers
The Question mark (?) multiplier specifies that the character or character group may or may not exist, but at most once. Example:
A? - Either one A or no A
[[:digit:]]? - Either one digit (0-9) or none
<CFSET String="Mississippi is a hard word."><CFOUTPUT>
Position1=#REFindNoCase('p?i', String)#</CFOUTPUT>
Position1=2
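A Python sketch of the Question mark: `p?i` is satisfied at position 2 by a lone “i” (no p needed), while `pp?i` must start with a p and finds “ppi” at position 9. The `colou?r` line is an illustrative extra, not from the original slides.

```python
import re

s = "Mississippi is a hard word."

m1 = re.search("p?i", s, re.IGNORECASE)
m2 = re.search("pp?i", s, re.IGNORECASE)
print(m1.start() + 1, m1.group())  # 2 i
print(m2.start() + 1, m2.group())  # 9 ppi

# "u?" makes the u optional, but allows at most one.
spellings = [w for w in ("color", "colour", "colouur") if re.fullmatch("colou?r", w)]
print(spellings)  # ['color', 'colour']
```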
RegEx Multipliers
Curly brackets ({}) can be used to specify a minimum and maximum number of times a character may appear. The format is {min,max}. Example:
A{2,4} - At least 2 As but no more than 4.
[[:digit:]]{1,6} - At least 1 digit (0-9) but no more than 6.
<CFSET String="Mississippi is a hard word."><CFOUTPUT>
Position1=#REFindNoCase('s{2,3}', String)#</CFOUTPUT>
Position1=3
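The same bounds in Python. The sketch shows the Mississippi example, a count that cannot be met (`s{3}`), and a maximum that cuts a longer run of digits short; the last two strings are illustrative inputs, not from the slides.

```python
import re

s = "Mississippi is a hard word."

m = re.search("s{2,3}", s, re.IGNORECASE)
print(m.start() + 1, m.group())  # 3 ss

print(re.search("s{3}", s))      # None: there is never a run of three s's
capped = re.search("[0-9]{1,6}", "Order 1234567").group()
print(capped)                    # 123456: stops after the maximum of 6
```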
RegEx SubExpressions
SubExpressions are a way of grouping characters together, which lets us refer to the entire group at once. To group characters together, place them within parentheses (()). Example:
(Name) = name
(Name)+ = name, namename, or basically one or more names.
<CFSET String="Mississippi is a hard word."><CFOUTPUT>
Position1=#REFindNoCase('(iss)+', String)#</CFOUTPUT>
Position1=2
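A Python sketch contrasting a grouped multiplier with an ungrouped one. With parentheses, "+" repeats the whole group “iss”; without them it applies only to the last “s”. Python's `group(1)` holds the text of the last repetition of the group.

```python
import re

s = "Mississippi is a hard word."

grouped = re.search("(iss)+", s, re.IGNORECASE)
print(grouped.start() + 1, grouped.group(), grouped.group(1))  # 2 ississ iss

# Without parentheses, "+" only repeats the final "s".
ungrouped = re.search("iss+", s, re.IGNORECASE)
print(ungrouped.group())  # iss
```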
RegEx SubExpressions
An additional special character that is usable within a subexpression is the pipe (|). It means either the first group of text or the second (or more). Example:
(Na|me) = na or me
(Name|Date) = name or date
<CFSET String="Mississippi is a hard word."><CFOUTPUT>
Position1=#REFindNoCase('(hard|word)', String)#</CFOUTPUT>
Position1=18
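In Python the alternation example finds “hard” first, at position 18. `re.findall()` returns every match of the group, which shows that “word” would also have matched further along.

```python
import re

s = "Mississippi is a hard word."

m = re.search("(hard|word)", s, re.IGNORECASE)
print(m.start() + 1, m.group(1))  # 18 hard

# findall() lists every non-overlapping match of the group.
print(re.findall("(hard|word)", s))  # ['hard', 'word']
```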
RegEx SubExpressions
SubExpressions allow us to do something else that's special: back referencing. This is the ability to refer back to the text a subexpression matched. It is done with a backslash (\) followed by a number that specifies which subexpression we want. Example:
(name)\1 = namename
(Name|Date)\1 = namename or datedate (but not namedate)
<CFSET String="Mississippi is is a hard word."><CFOUTPUT>
Position1=#REFindNoCase('(is )\1', String)#</CFOUTPUT>
Position1=13
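A Python sketch of back referencing. The doubled “is is ” is found at position 13, and the two `fullmatch` checks show that `\1` repeats the text the group actually captured, not the pattern, so “namedate” is rejected.

```python
import re

s = "Mississippi is is a hard word."

m = re.search(r"(is )\1", s, re.IGNORECASE)
print(m.start() + 1, repr(m.group()))  # 13 'is is '

# \1 must repeat the captured text exactly.
same = bool(re.fullmatch(r"(name|date)\1", "datedate", re.IGNORECASE))
mixed = bool(re.fullmatch(r"(name|date)\1", "namedate", re.IGNORECASE))
print(same, mixed)  # True False
```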
REReplace
The REReplace() and REReplaceNoCase() functions use everything you've learned about searching and let you ‘work’ with the search results, i.e. replace them with something. By default only the first match is replaced; passing ‘all’ as the optional fourth argument (the scope) replaces every match. Example:
<CFSET String="Mississippi is a hard word."><CFOUTPUT>
Result1=#REReplaceNoCase(String, 'iss', 'emm')# Result2=#REReplaceNoCase(String, 'iss', 'emm', 'all')#
</CFOUTPUT>
Result1=Memmissippi is a hard word. Result2=Memmemmippi is a hard word.
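The Python counterpart of REReplaceNoCase() is `re.sub()` with the `re.IGNORECASE` flag: `count=1` plays the role of the default single replacement, and leaving `count` out behaves like the ‘all’ scope. The last line is an illustrative extra showing that Python's replacement text can use `\1`, `\2` to insert subexpression matches.

```python
import re

s = "Mississippi is a hard word."

first = re.sub("iss", "emm", s, count=1, flags=re.IGNORECASE)  # first match only
every = re.sub("iss", "emm", s, flags=re.IGNORECASE)           # like scope 'all'
print(first)  # Memmissippi is a hard word.
print(every)  # Memmemmippi is a hard word.

# Back references in the replacement text swap the two captured words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hard word")
print(swapped)  # word hard
```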