Regular Expressions In ColdFusion and Studio

Post on 21-Dec-2015



Definitions

String - Any collection of 0 or more characters.
Example: "This is a String"

SubString - A segment of a String.
Example: "is a"

Case Sensitivity - Whether upper-case and lower-case characters are treated as different.

Simple Task

Find the word "Name" inside a string:

<CFSET String="My name is Michael Dinowitz">
<CFOUTPUT>
Position=#Find('Name', String)#
</CFOUTPUT>

Position=0

(Find() is case sensitive, so 'Name' does not match 'name'.)

Simple Task

Find the word "Name" inside a string:

<CFSET String="My name is Michael Dinowitz">
<CFOUTPUT>
Position=#Find('name', String)#
</CFOUTPUT>

Position=4

Simple Task

Find the word "Name" inside a string:

<CFSET String="My name is Michael Dinowitz">
<CFOUTPUT>
Position=#FindNoCase('Name', String)#
</CFOUTPUT>

Position=4

Simple Task

Find the word "Name" inside a string using Regular Expressions:

<CFSET String="My name is Michael Dinowitz">
<CFOUTPUT>
Position=#REFindNoCase('Name', String)#
</CFOUTPUT>

Position=4

Intro to Regular Expressions

• Referred to as RegEx
• Matches patterns of characters
• Used in many languages (ColdFusion, Perl, JavaScript, etc.)
• Uses a small syntax library to do 'dynamic' matches
• Can be used for Search and/or Replace actions
• Slightly slower than the similar Find() and Replace() functions
• Has both a case-sensitive and a case-insensitive version of each function:
  • REFind()
  • REFindNoCase()
  • REReplace()
  • REReplaceNoCase()

RegEx Basics

Rule 1: A character matches itself as long as it is not a control character.

Example:
A = "A"
A = "a" (not case sensitive)

<CFSET String="My name is Michael Dinowitz">
<CFOUTPUT>
Position=#REFindNoCase('n', String)#
</CFOUTPUT>

Position=4

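RegEx Basics

Rule 1a: A search returns the first successful match. To get a different match, set the start position (the optional third argument of the function).

<CFSET String="My name is Michael Dinowitz">
<CFOUTPUT>
Position1=#REFindNoCase('M', String)#
Position2=#REFindNoCase('M', String, 2)#
</CFOUTPUT>

Position1=1
Position2=6

(Because the search is not case sensitive, the second call finds the "m" in "name" at position 6. The case-sensitive REFind('M', String, 2) would return 12, the "M" in "Michael".)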
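
The start position from Rule 1a can also be moved forward repeatedly to collect every match rather than just the first. The following is a minimal sketch, not part of the original slides; the variable names Start, Found, and Positions are hypothetical and chosen only for illustration.

```cfm
<!--- Sketch: collect the position of every "m" (any case) in the string. --->
<!--- Start, Found, and Positions are illustrative names, not from the slides. --->
<CFSET String="My name is Michael Dinowitz">
<CFSET Start=1>
<CFSET Positions="">
<CFLOOP condition="Start LTE Len(String)">
    <!--- Search from the current start position onward. --->
    <CFSET Found=REFindNoCase('m', String, Start)>
    <!--- A result of 0 means there are no further matches. --->
    <CFIF Found EQ 0><CFBREAK></CFIF>
    <CFSET Positions=ListAppend(Positions, Found)>
    <!--- Resume the search one character past the last match. --->
    <CFSET Start=Found + 1>
</CFLOOP>
<CFOUTPUT>
Positions=#Positions#
</CFOUTPUT>
```

Positions=1,6,12

Advancing by one character past each match is the simplest choice here because the pattern is a single character; a longer pattern might instead skip past the whole match.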

RegEx Basics

Rule 2: A collection of non-control characters matches another collection of non-control characters.

Example:
AA = "AA"
AA != "Aa" (case sensitive)
AA = "Aa" (not case sensitive)
A A = "A A" (notice the space)

<CFSET String="My name is Michael Dinowitz">
<CFOUTPUT>
Position=#REFindNoCase('y n', String)#
</CFOUTPUT>

Position=2

RegEx Basics

Rule 3: A period (.) is a control character that matches ANY single character.

Example:
. = "A"
A. = "Ac"
A.A = "A A"

<CFSET String="My name is Michael Dinowitz">
<CFOUTPUT>
Position=#REFindNoCase('N.me', String)#
</CFOUTPUT>

Position=4

RegEx Basics

Rule 4: A control character can be 'escaped' by placing a backslash (\) before it. This causes the control character to match a literal text version of itself.

Example:
. = "." (and any other character)
\. = "." (only a period)
A\.A = "A.A"

<CFSET String="My name is Michael Dinowitz.">
<CFOUTPUT>
Position1=#REFindNoCase('tz\.', String)#
</CFOUTPUT>

Position1=26

RegEx Anchoring

Rule 5a: Using the caret (^) makes sure the text you're searching for is at the start of the string.

Example:
^A = "A"
^M != "AM"

<CFSET String="My name is Michael Dinowitz.">
<CFOUTPUT>
Position1=#REFindNoCase('^My', String)#
Position2=#REFindNoCase('^is', String)#
</CFOUTPUT>

Position1=1
Position2=0

RegEx Anchoring

Rule 5b: Using the dollar sign ($) makes sure the text you're searching for is at the end of the string.

Example:
A$ = "A"
M$ = "MAM" (the second M will be returned)

<CFSET String="My name is Michael Dinowitz.">
<CFOUTPUT>
Position1=#REFindNoCase('\.$', String)#
</CFOUTPUT>

Position1=28

RegEx Ranges

Rule 6: When looking for one of a group of characters, place them inside square brackets ([]).

Example:
'[abc]' will match either a, b, or c.
'[.+$^]' will match either a period (.), a plus (+), a dollar sign ($) or a caret (^). Note that these special characters are treated as literal text within square brackets, so they do not need a backslash. (The caret and the dash have their own meaning inside brackets; see Rules 7a and 7b.)

<CFSET String="My name is Michael Dinowitz.">
<CFOUTPUT>
Position1=#REFindNoCase('M[aeiou]', String)#
</CFOUTPUT>

Position1=6

RegEx Ranges

Rule 7a: A caret (^), when used within square brackets ([]), has the effect of saying 'NOT these characters'. It must be the first character inside the brackets for this to work.

Example:
'[^abc]' will match ANY character other than a, b, or c.

<CFSET String="My name is Michael Dinowitz.">
<CFOUTPUT>
Position1=#REFindNoCase('M[^aeiou]', String)#
</CFOUTPUT>

Position1=1

RegEx Ranges

Rule 7b: A dash (-), when used within square brackets ([]), has the effect of saying 'all characters from the first character through the last'.

Example:
'[a-e]' will match ANY character from a through e.

<CFSET String="My name is Michael Dinowitz.">
<CFOUTPUT>
Position1=#REFindNoCase('M[a-m]', String)#
</CFOUTPUT>

Position1=6

RegEx Ranges

Rule 8: ColdFusion has a series of pre-built character ranges. These are referenced as [[:range name:]].

Example:
[[:digit:]] - same as [0-9] (all digits)
[[:alpha:]] - same as [A-Za-z] (all letters of both cases)

<CFSET String="My name is Michael Dinowitz.">
<CFOUTPUT>
Position1=#REFindNoCase('[[:space:]]', String)#
</CFOUTPUT>

Position1=3

RegEx Character Classes

Character Class - Matches

Alpha - Matches any letter. Same as [A-Za-z].
Upper - Matches any upper-case letter. Same as [A-Z].
Lower - Matches any lower-case letter. Same as [a-z].
Digit - Matches any digit. Same as [0-9].
Alnum - Matches any alphanumeric character. Same as [A-Za-z0-9].
Xdigit - Matches any hexadecimal digit. Same as [0-9A-Fa-f].
Space - Matches a tab, new line, vertical tab, form feed, carriage return, or space.
Print - Matches any printable character.
Punct - Matches any punctuation character, that is, one of ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
Graph - Matches any of the characters defined as a printable character except those defined to be part of the space character class.
Cntrl - Matches any character not part of the character classes [:upper:], [:lower:], [:alpha:], [:digit:], [:punct:], [:graph:], [:print:], or [:xdigit:].
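
A short sketch of a few of these classes in use may help. It is not part of the original slides: the Sample string is hypothetical, and the positions shown assume the classes behave as described in the table above.

```cfm
<!--- Sketch: locate the first digit, space, and punctuation character. --->
<!--- Sample is a hypothetical string, not taken from the slides. --->
<CFSET Sample="Order 66 shipped.">
<CFOUTPUT>
Position1=#REFind('[[:digit:]]', Sample)#
Position2=#REFind('[[:space:]]', Sample)#
Position3=#REFind('[[:punct:]]', Sample)#
</CFOUTPUT>
```

Position1=7
Position2=6
Position3=17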

RegEx Multipliers

Any character or character class can be assigned a multiplier that defines how many times the character or class may appear. These multipliers can say that a character must exist, is optional, may exist for a certain minimum or maximum number of times, etc. Multiplier characters include:

Plus (+) - One or more
Asterisk (*) - 0 or more
Question Mark (?) - May or may not exist, once at most
Curly Brackets ({}) - A specific range of occurrences

RegEx Multipliers

The Plus (+) multiplier specifies that the character or character group must exist but can exist more than once.

Example:
A+ - An A followed by any number of additional A's
[[:digit:]]+ - A digit (0-9) followed by any number of additional digits

<CFSET String="Mississippi is a hard word.">
<CFOUTPUT>
Position1=#REFindNoCase('is+i', String)#
</CFOUTPUT>

Position1=2

RegEx Multipliers

The Asterisk (*) multiplier specifies that the character or character group may or may not exist, and can exist more than once (i.e. 0 or more).

Example:
A* - Either no A, or an A followed by any number of additional A's
[[:digit:]]* - Either no digit (0-9), or a digit followed by any number of additional digits

<CFSET String="Mississippi is a hard word.">
<CFOUTPUT>
Position1=#REFindNoCase('si*s', String)#
</CFOUTPUT>

Position1=3

RegEx Multipliers

The Question mark (?) multiplier specifies that the character or character group may or may not exist, but only once.

Example:
A? - Either one A or no A
[[:digit:]]? - One or no digit (0-9)

<CFSET String="Mississippi is a hard word.">
<CFOUTPUT>
Position1=#REFindNoCase('p?i', String)#
</CFOUTPUT>

Position1=2

RegEx Multipliers

Curly brackets ({}) can be used to specify a minimum and maximum number of times for a character to appear. The format is {min,max}.

Example:
A{2,4} - 2 A's or more, but no more than 4.
[[:digit:]]{1,6} - 1 digit (0-9) or more, but no more than 6.

<CFSET String="Mississippi is a hard word.">
<CFOUTPUT>
Position1=#REFindNoCase('s{2,3}', String)#
</CFOUTPUT>

Position1=3
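
The anchors, character classes, and multipliers covered so far can be combined to check a whole string rather than search inside it. The following is a minimal sketch, not part of the original slides; the test values are hypothetical. Anchoring the pattern at both ends means the entire string must consist of 1 to 6 digits.

```cfm
<!--- Sketch: does the whole string consist of 1 to 6 digits? --->
<!--- The three test values are hypothetical examples. --->
<CFOUTPUT>
Check1=#REFind('^[[:digit:]]{1,6}$', '12345')#
Check2=#REFind('^[[:digit:]]{1,6}$', '1234567')#
Check3=#REFind('^[[:digit:]]{1,6}$', '12a45')#
</CFOUTPUT>
```

Check1=1
Check2=0
Check3=0

A result of 1 means the string matched from its first character; 0 means the string was too long or contained a character that is not a digit.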

RegEx SubExpressions

SubExpressions are a way of grouping characters together. This allows us to reference the entire group at once. To group characters together, place them within parentheses ().

Example:
(Name) = name
(Name)+ = name, namename, or basically one or more names.

<CFSET String="Mississippi is a hard word.">
<CFOUTPUT>
Position1=#REFindNoCase('(iss)+', String)#
</CFOUTPUT>

Position1=2

RegEx SubExpressions

An additional special character that is usable within a SubExpression is the pipe (|). It means either the first group of text or the second (or any further alternative).

Example:
(Na|me) = na or me
(Name|Date) = name or date

<CFSET String="Mississippi is a hard word.">
<CFOUTPUT>
Position1=#REFindNoCase('(hard|word)', String)#
</CFOUTPUT>

Position1=18

RegEx SubExpressions

SubExpressions allow us to do something else that's special: back referencing. This is the ability to reference the text matched by an earlier group directly. It is done by using the backslash (\) followed by a number that specifies which SubExpression we want.

Example:
(name)\1 = namename
(Name|Date)\1 = namename or datedate

<CFSET String="Mississippi is is a hard word.">
<CFOUTPUT>
Position1=#REFindNoCase('(is )\1', String)#
</CFOUTPUT>

Position1=13

REReplace

The REReplace() and REReplaceNoCase() functions use everything you've learned about searching and allow you to 'work' with the search results, i.e. replace them with something. By default only the first match is replaced; pass 'all' as the fourth argument to replace every match.

Example:

<CFSET String="Mississippi is a hard word.">
<CFOUTPUT>
Result1=#REReplaceNoCase(String, 'iss', 'emm')#
Result2=#REReplaceNoCase(String, 'iss', 'emm', 'all')#
</CFOUTPUT>

Result1=Memmissippi is a hard word.
Result2=Memmemmippi is a hard word.
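
Replacement becomes more useful once the pattern uses the character classes and multipliers from the earlier slides. The following is a minimal sketch, not part of the original slides; the Sample string is hypothetical. It collapses every run of whitespace into a single space.

```cfm
<!--- Sketch: replace each run of one or more whitespace characters --->
<!--- with a single space. Sample is a hypothetical string. --->
<CFSET Sample="Too   many    spaces">
<CFOUTPUT>
Result=#REReplace(Sample, '[[:space:]]+', ' ', 'all')#
</CFOUTPUT>
```

Result=Too many spaces

Without the 'all' argument only the first run of spaces would be collapsed.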