regular expressions: the proper care and feeding zain naboulsi msdn developer evangelist microsoft
TRANSCRIPT
Regular Expressions: The Proper Care and Feeding
Zain Naboulsi
MSDN Developer Evangelist
Microsoft
Introduction to Regular Expressions
What Are Regular Expressions?
Why Would I Want To Use Them?
Common Misconceptions
Anatomy of a Regular Expression
Disclaimer
All opinions in this session are provided "AS IS" with no warranties, and confer no rights.
All opinions are mine and don't necessarily reflect the opinion of Microsoft.
What Are Regular Expressions?
Regular Expressions
“Regular expressions provide a powerful, flexible, and efficient method for processing text.
[They allow] you to quickly parse large amounts of text to find specific character patterns; to extract, edit, replace, or delete text substrings; or to add the extracted strings to a collection in order to generate a report.”
http://msdn2.microsoft.com/en-us/library/hs600312.aspx
Do What?
Simply put, regular expressions will help you find text patterns and do pretty much whatever you want with them.
It sounds simple, but regular expressions are one of the most difficult and least understood constructs in programming.
Warning
Regular expressions are part art and part science. There is a steep learning curve but the rewards are significant.
The Possibilities
Okay, So What Is A Pattern?
“a regular or repetitive form, order, or arrangement”
http://encarta.msn.com/dictionary_1861724272/pattern.html
PATTERNS ARE EVERYWHERE
Checker Board
Fibonacci Sequence
Text
The IP Address for the server is 192.169.1.3 but it should be 192.168.1.5, and I am not sure how we managed to get into the 192.169.1 subnet but we need to remove ourselves from it immediately unless we are moving to it then I want the new IP to be 192.169.1.3 I suppose.
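As a rough illustration of finding a pattern in that sample text, the sketch below pulls out everything shaped like a dotted IP address. The pattern and the name `find_ip_addresses` are assumptions made for this example, not part of the original slides, and the pattern only checks the shape (four groups of one to three digits), not whether each number is in the 0-255 range.

```python
import re

# Sample text taken from the slide above.
TEXT = (
    "The IP Address for the server is 192.169.1.3 but it should be "
    "192.168.1.5, and I am not sure how we managed to get into the "
    "192.169.1 subnet but we need to remove ourselves from it immediately "
    "unless we are moving to it then I want the new IP to be 192.169.1.3 "
    "I suppose."
)

# Four groups of 1-3 digits separated by literal dots.
# Assumption: shape check only; values above 255 would still match.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ip_addresses(text):
    """Return every dotted-quad substring found in text, in order."""
    return IP_PATTERN.findall(text)


print(find_ip_addresses(TEXT))
# ['192.169.1.3', '192.168.1.5', '192.169.1.3']
```

Note that the partial "192.169.1" subnet is skipped because it has only three groups of digits.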
YOU HAVE USED PATTERNS BEFORE
Wildcard Searches For Files
Wildcards = VERY simple pattern matching constructs and are NOT regular expressions
Examples:
*.txt
b*b*
?un.txt
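To see the three wildcard examples in action, here is a small sketch using Python's standard `fnmatch` module. The file names are invented for illustration, and the helper names `match_wildcard` and `match_translated` are assumptions of this example. The second helper uses `fnmatch.translate`, which converts a wildcard into an equivalent regular expression; that hints at how much smaller the wildcard language is.

```python
import fnmatch
import re

# Hypothetical file names, invented for this illustration.
FILES = ["notes.txt", "bob", "bun.txt", "run.txt", "spun.txt", "readme.md"]


def match_wildcard(names, pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def match_translated(names, pattern):
    """Same filtering, but via the regular expression fnmatch generates."""
    regex = re.compile(fnmatch.translate(pattern))
    return [n for n in names if regex.match(n)]


for wildcard in ["*.txt", "b*b*", "?un.txt"]:
    print(wildcard, match_wildcard(FILES, wildcard))
# *.txt ['notes.txt', 'bun.txt', 'run.txt', 'spun.txt']
# b*b* ['bob']
# ?un.txt ['bun.txt', 'run.txt']
```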
Why Use Regular Expressions?
Major Uses of Regular Expressions
Matching = find any text anywhere regardless of complexity
Substitution = once found, you can replace text
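The two uses can be sketched side by side. This is a minimal example with an invented sentence; the patterns are assumptions chosen for illustration rather than anything shown in the session.

```python
import re

# Invented sample sentence for this illustration.
text = "Server A is 192.169.1.3 and server B is 192.169.1.7."

# Matching: find the first substring shaped like a dotted IP address.
match = re.search(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text)
first_address = match.group() if match else None
print(first_address)
# 192.169.1.3

# Substitution: rewrite every "192.169." prefix to "192.168.".
fixed = re.sub(r"\b192\.169\.", "192.168.", text)
print(fixed)
# Server A is 192.168.1.3 and server B is 192.168.1.7.
```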
Features
Can literally turn 10 lines of code into 1
Extremely efficient pattern matching mechanism
Once learned, becomes one of the most indispensable techniques you can have
Languages That Support Regular Expressions
All .NET languages
JScript
XML: XPath & XQuery
T-SQL
PERL
Java
[insert language here]
ASP.NET Control
Common Misconceptions
Misconceptions
Regular Expressions can do complex programming logic
Regular Expressions can do math
Regular Expressions will give me winning lottery numbers
Anatomy of a Regular Expression
A Sample Expression
^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$
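To get a feel for what the sample expression accepts, the sketch below runs it against a few invented strings. The addresses and the helper name `looks_like_email` are assumptions for this example. The expression is deliberately simple, so it rejects many addresses that are valid in practice (dots in the user name, subdomains, longer top-level domains).

```python
import re

# The sample expression from the slide, unchanged.
SAMPLE = re.compile(r"^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$")


def looks_like_email(value):
    """Return True if value matches the sample expression."""
    return SAMPLE.search(value) is not None


# Invented test strings for this illustration.
candidates = [
    "zain@example.com",        # accepted
    "user_1@example.org",      # accepted: \w covers digits and underscore
    "first.last@example.com",  # rejected: \w does not match "."
    "user@mail.example.com",   # rejected: only one dot allowed after "@"
    "user@example.info",       # rejected: suffix longer than 3 letters
    "not an email",            # rejected: no "@" at all
]

for value in candidates:
    print(value, looks_like_email(value))
```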
Anatomy
Characters
Metacharacters
Subexpressions
Characters
A literal character stands for itself: any valid value represented by the current encoding method.
For example the “@” literal character is represented as the decimal value 64 in the ASCII encoding system.
^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$
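A quick way to check how literals behave is to try them directly. In this sketch (the sample strings are invented), "@" matches only itself, and its ASCII code is 64. The period is a useful contrast: unescaped it is not a literal, so the sample expression writes "\." to match an actual dot.

```python
import re

# The ASCII code of the "@" literal, and the round trip back.
at_code = ord("@")
print(at_code, chr(at_code))
# 64 @

# A literal matches only itself.
at_hits = re.findall(r"@", "zain@example.com")
print(at_hits)
# ['@']

# An escaped period is a literal dot; an unescaped one matches any character.
literal_dots = re.findall(r"\.", "a.b")
any_chars = re.findall(r".", "a.b")
print(literal_dots, any_chars)
# ['.'] ['a', '.', 'b']
```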
Metacharacters
Unlike literal characters, metacharacters are used as “place holders” for characters.
For example, the metacharacter “\t” in regular expressions represents the tab character, whereas the “\d” matches any digit 0 through 9.
^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$
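The two metacharacters mentioned above can be tried on a small tab-separated line. The sample data is invented for this sketch. One caveat worth hedging: in Python 3, as in .NET by default, "\d" also matches digits from other scripts unless an ASCII-only option is set.

```python
import re

# Invented tab-separated sample line.
line = "id\t42\tqty\t7"

# "\t" matches the tab character, so it can be used to split fields.
fields = re.split(r"\t", line)
print(fields)
# ['id', '42', 'qty', '7']

# "\d" matches one digit at a time.
digits = re.findall(r"\d", line)
print(digits)
# ['4', '2', '7']

# Adding "+" groups consecutive digits into whole numbers.
numbers = re.findall(r"\d+", line)
print(numbers)
# ['42', '7']
```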
Subexpressions
These are simply smaller expressions nested inside larger ones.
For example, the following expression has a subexpression inside it:
(john|jane)doe
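Here is a short sketch of that expression at work. The parenthesized subexpression matches either alternative and, in most engines including Python's, also captures which one was found. The helper name `first_name` and the test strings are assumptions for this example.

```python
import re

# The expression from the slide: a subexpression followed by a literal.
PATTERN = re.compile(r"(john|jane)doe")


def first_name(text):
    """Return the alternative the subexpression matched, or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


for candidate in ["johndoe", "janedoe", "jimdoe"]:
    print(candidate, first_name(candidate))
# johndoe john
# janedoe jane
# jimdoe None
```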
Must Have Resources
Tools
http://www.RegExLib.com
http://www.ultrapico.com/Expresso.htm
Book
Tools
Summary
Regular expressions can be used to manipulate and change text
While there is a steep learning curve, regular expressions are invaluable as a programming tool
Regular expressions are supported by virtually all major programming languages
Next Steps
Check out some of the patterns on the RegExLib site
Do a live search on regular expressions and see what others have to say about them
Prepare yourself mentally for a rewarding journey into the world of regular expressions
Have Fun!!!