
Post on 12-Apr-2017


Advanced Regular Expressions in .NET

Patrick Delancy

NOTICE!!!

This slide deck has been adapted from a presentation that was intended to be given live, in person… like with a real person in front of real people. You know… breathing the same air and all that.

The key points have been transcribed onto separate slides, so you still get some benefit from reading through it all, but you are still missing out on all of the great stories, witty banter, hilarious costumes, stunning arias… or something like that.

If you REALLY want to get the most out of this presentation, go to patrickdelancy.com and ask him to come give it to your group!

This presentation will help you understand what Regex is capable of.

Don’t bother trying to memorize the syntax; just remember the concepts.

Then you can make a more intelligent decision about when you should and should not use Regex.

Common Features

...but not ubiquitous

● Non-capturing groups

● Look ahead

● Look behind

● Free-spacing

Non-Capturing Groups

^(.*)(@)(.*)$

email@ddress.com

[0] = email@ddress.com
[1] = email
[2] = @
[3] = ddress.com

^(.*)(?:@)(.*)$

email@ddress.com

[0] = email@ddress.com
[1] = email
[2] = ddress.com
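Here is a small sketch of the same comparison in C#. It assumes a compiler that supports top-level statements (C# 9 or later); the variable names are made up for illustration and are not from the original talk.

```csharp
using System;
using System.Text.RegularExpressions;

const string input = "email@ddress.com";

// Every pair of plain parentheses creates a numbered group.
Match capturing = Regex.Match(input, @"^(.*)(@)(.*)$");

// (?:...) still groups the "@", but does not create a numbered group.
Match nonCapturing = Regex.Match(input, @"^(.*)(?:@)(.*)$");

// Groups[0] is always the whole match, so Count is one more than
// the number of capturing groups.
Console.WriteLine(capturing.Groups.Count);        // 4
Console.WriteLine(nonCapturing.Groups.Count);     // 3
Console.WriteLine(nonCapturing.Groups[2].Value);  // ddress.com
```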

Look Ahead

\b\w+(?=\.)   # match the word at the end of each sentence
              # but don’t include the period in the match

See Dick. See Jane. See Dick and Jane run.

Matches: Dick, Jane, run

Look Behind

(?<=\b19)\d{2}\b   # match all years in the 1900s,
                   # capturing only the 2-digit year

1842 1902 1776 1985 2003 1999

Matches: 02, 85, 99
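Both lookaround slides can be tried out with a few lines of C#. This is only a sketch, written as top-level statements (C# 9 or later); the array names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Look ahead: a period must follow the word, but it is not part of the match.
string[] lastWords = Regex
    .Matches("See Dick. See Jane. See Dick and Jane run.", @"\b\w+(?=\.)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Look behind: "19" must precede the two digits, but it is not part of the match.
string[] shortYears = Regex
    .Matches("1842 1902 1776 1985 2003 1999", @"(?<=\b19)\d{2}\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", lastWords));   // Dick, Jane, run
Console.WriteLine(string.Join(", ", shortYears));  // 02, 85, 99
```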

Free Spacing (Ignore Pattern Whitespace)

new Regex(@"\b[^@]+   # pattern can now span multiple lines
            @\S+\b    # and include white space for readability
", RegexOptions.IgnorePatternWhitespace);

Less-Common Features

...in more advanced engines

● Named Captures

● Comments

● Inline Directives

● Conditional Alternation

● Atomic Groups

● Compiled Patterns

● Unicode Categories and Named Character Blocks

Named Captures

^(?<name>.*)(?:@)(?<domain>.*)$

email@ddress.com

[0] = email@ddress.com
[name] = email
[domain] = ddress.com
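In C#, named groups are read back by name through the Groups indexer. A minimal sketch, assuming top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

Match match = Regex.Match("email@ddress.com", @"^(?<name>.*)(?:@)(?<domain>.*)$");

// Look the groups up by name instead of counting parentheses.
string name = match.Groups["name"].Value;
string domain = match.Groups["domain"].Value;

Console.WriteLine(name);    // email
Console.WriteLine(domain);  // ddress.com
```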

Comments

^.*@.*$ # comment to the end of the line (free-spacing mode only)

^.*@(?# this is an inline comment).*$
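One detail worth trying for yourself: the end-of-line style only works when free-spacing is switched on, while the inline (?# ...) style works anywhere. A short sketch (top-level statements, C# 9 or later; variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

const string input = "email@ddress.com";
const string endOfLineComment = @"^.*@.*$ # comment to the end of the line";
const string inlineComment = @"^.*@(?# this is an inline comment).*$";

// Without the option, " # comment..." is treated as literal text to match.
bool withoutOption = Regex.IsMatch(input, endOfLineComment);

// With the option, white space and #-comments are ignored.
bool withOption = Regex.IsMatch(
    input, endOfLineComment, RegexOptions.IgnorePatternWhitespace);

// Inline comments need no option at all.
bool inline = Regex.IsMatch(input, inlineComment);

Console.WriteLine(withoutOption);  // False
Console.WriteLine(withOption);     // True
Console.WriteLine(inline);         // True
```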

Inline Directives

John the (?ix) (?: wiser | better and greater | privy )

John the Wiser, John the BetterAndGreater, john the privy, John the Better and Greater

Matches: John the Wiser, John the BetterAndGreater
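The directive only applies from the point where it appears, which is why "john the privy" is rejected. A quick way to check each candidate, sketched with top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// (?ix) turns on ignore-case and free-spacing from that point forward only,
// so "John the " stays case-sensitive and its spaces stay literal.
var pattern = new Regex(@"John the (?ix) (?: wiser | better and greater | privy )");

string[] candidates =
{
    "John the Wiser",
    "John the BetterAndGreater",
    "john the privy",
    "John the Better and Greater",
};

foreach (string candidate in candidates)
{
    Console.WriteLine($"{candidate}: {pattern.IsMatch(candidate)}");
}
// John the Wiser: True
// John the BetterAndGreater: True
// john the privy: False
// John the Better and Greater: False
```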

Conditional Alternation

^Type:(?:(?<ssn>SSN)|(?<eid>EID)), ID:(?(ssn)\d{3}\-\d{2}\-\d{4}|[-\d]+)$

Type:SSN, ID:352-23-4567   (match)
Type:EID, ID:35-2234567    (match)
Type:SSN, ID:35-2234567    (no match)
Type:EID, ID:???           (no match)
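The conditional reads as "if the ssn group captured something, require the SSN layout, otherwise accept any run of digits and dashes". A sketch that runs the four sample lines (top-level statements, C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = new Regex(
    @"^Type:(?:(?<ssn>SSN)|(?<eid>EID)), ID:(?(ssn)\d{3}\-\d{2}\-\d{4}|[-\d]+)$");

// (?(ssn)yes|no) picks the "yes" branch only when the ssn group has captured.
bool ssnGood = pattern.IsMatch("Type:SSN, ID:352-23-4567");
bool eidGood = pattern.IsMatch("Type:EID, ID:35-2234567");
bool ssnBadLayout = pattern.IsMatch("Type:SSN, ID:35-2234567");
bool eidBadChars = pattern.IsMatch("Type:EID, ID:???");

Console.WriteLine(ssnGood);       // True
Console.WriteLine(eidGood);       // True
Console.WriteLine(ssnBadLayout);  // False
Console.WriteLine(eidBadChars);   // False
```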

Atomic Grouping / Possessive Quantifiers

\b(in|integer|insert)\b

integer    (match)
integers   (no match)
in         (match)
insert     (match)

\b(?>in|integer|insert)\b

integer    (no match)
integers   (no match)
in         (match)
insert     (no match)
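Once an atomic group has matched "in", the engine will not go back inside it to try "integer" or "insert", so the order of the alternatives starts to matter. The sketch below compares the two patterns from the slide plus a reordered variant; the reordered pattern and the Find helper are my own additions for illustration (top-level statements, C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string input = "integer integers in insert";

// Hypothetical helper: returns every match of the pattern in the input.
string[] Find(string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// Ordinary group: the engine backtracks and tries the next alternative.
string[] backtracking = Find(@"\b(in|integer|insert)\b");

// Atomic group: after "in" matches, the other alternatives are never tried.
string[] atomic = Find(@"\b(?>in|integer|insert)\b");

// Atomic group with the longer alternatives first.
string[] reordered = Find(@"\b(?>integer|insert|in)\b");

Console.WriteLine(string.Join(", ", backtracking));  // integer, in, insert
Console.WriteLine(string.Join(", ", atomic));        // in
Console.WriteLine(string.Join(", ", reordered));     // integer, in, insert
```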

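Compiled Patterns

// Instance: the pattern is parsed once, when the Regex is constructed
var pattern = new Regex(@"a+h+!+");
return pattern.IsMatch(value);

// Static: the input comes first, then the pattern
var pattern = @"a+h+!+";
return Regex.IsMatch(value, pattern);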
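Neither snippet on the slide actually asks for compilation. A sketch of what that could look like with RegexOptions.Compiled follows; the variable names and sample inputs are made up (top-level statements, C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// RegexOptions.Compiled trades slower construction for faster matching,
// so build the instance once and reuse it for many inputs.
var excited = new Regex(@"a+h+!+", RegexOptions.Compiled);

bool first = excited.IsMatch("aaahhh!!!");
bool second = excited.IsMatch("ah");  // no exclamation mark

Console.WriteLine(first);   // True
Console.WriteLine(second);  // False
```

Compilation tends to pay off for patterns that are reused heavily; for a pattern that runs once, the extra start-up cost may outweigh the gain.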

Named Character Blocks & Unicode Groups

\b(?:\p{IsGreek}+\s?)+\p{Pd}\s(?>\p{IsBasicLatin}+\s?)+

Κατα Μαθθαίον - The Gospel of Matthew
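To see the named blocks in action, the pattern can be run against the sample title. This is a sketch only (top-level statements, C# 9 or later); the accented iota is written as an escape so the result does not depend on how the source file is normalized.

```csharp
using System;
using System.Text.RegularExpressions;

// \p{IsGreek} and \p{IsBasicLatin} are named blocks; \p{Pd} is the
// "dash punctuation" Unicode category.
var pattern = new Regex(
    @"\b(?:\p{IsGreek}+\s?)+\p{Pd}\s(?>\p{IsBasicLatin}+\s?)+");

// \u03AF is the precomposed Greek small letter iota with tonos.
string title = "Κατα Μαθθα\u03AFον - The Gospel of Matthew";

Match match = pattern.Match(title);

Console.WriteLine(match.Success);         // True
Console.WriteLine(match.Value == title);  // True: the whole title is matched
```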

Unique Features

...in the .NET RegEx engine

● Balancing Groups

● Character Class Subtraction

● Explicit Capture Only

Balancing Groups

^(?:[^{}]|(?<open>{)|(?<-open>}))*(?(open)(?!))$

{ if (true) { return "A"; } else { return "B"; } }   (match)
{ if (true) { return "A"; } else { return "B"; }     (no match)
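The "open" group behaves like a stack: each { pushes a capture, each } pops one, and the final conditional fails the match if anything is left over. A sketch in C# (top-level statements, C# 9 or later); the braces are escaped here purely for readability, and the third test string is my own addition.

```csharp
using System;
using System.Text.RegularExpressions;

var balanced = new Regex(
    @"^(?:[^{}]|(?<open>\{)|(?<-open>\}))*(?(open)(?!))$");

// Every opening brace has a partner.
bool complete = balanced.IsMatch(
    "{ if (true) { return \"A\"; } else { return \"B\"; } }");

// The outer brace is never closed, so "open" is not empty at the end.
bool missingClose = balanced.IsMatch(
    "{ if (true) { return \"A\"; } else { return \"B\"; }");

// A closing brace with nothing to pop cannot be matched at all.
bool extraClose = balanced.IsMatch("{ } }");

Console.WriteLine(complete);      // True
Console.WriteLine(missingClose);  // False
Console.WriteLine(extraClose);    // False
```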

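Character Class Subtraction

[0-9-[1-8]]

0123456789

Matches: 0, 9

[0-9-[1-8-[2-7]]]

0123456789

Matches: 0, 2, 3, 4, 5, 6, 7, 9

[\w-[aeiou]]

Lazy dog, quick fox, blah,blah, blah.

Matches: each word character that is not a lowercase vowel (L, z, y, d, g, q, c, k, f, x, ...)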
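A quick way to see what each subtraction leaves behind is to glue the matched characters back together. The Keep helper below is hypothetical, and the last input is shortened to "Lazy dog" to keep the output small (top-level statements, C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: concatenates every character the pattern matches.
string Keep(string input, string pattern) =>
    string.Concat(Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

// Digits, minus 1 through 8.
string ends = Keep("0123456789", @"[0-9-[1-8]]");

// Subtractions nest: digits, minus (1 through 8, minus 2 through 7).
string nested = Keep("0123456789", @"[0-9-[1-8-[2-7]]]");

// Word characters, minus the lowercase vowels.
string noVowels = Keep("Lazy dog", @"[\w-[aeiou]]");

Console.WriteLine(ends);      // 09
Console.WriteLine(nested);    // 02345679
Console.WriteLine(noVowels);  // Lzydg
```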

Explicit Capture Only

^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$

e+mail@ddress.com

[0] = e+mail@ddress.com
[1] = +mail
[2] = ddress
[3] = com
[name] = e+mail
[domain] = ddress.com

(?n)^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$

e+mail@ddress.com

[0] = e+mail@ddress.com
[name] = e+mail
[domain] = ddress.com
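The (?n) directive has an equivalent option, RegexOptions.ExplicitCapture. The sketch below compares the group lists with and without it; note that .NET numbers the unnamed groups first and places named groups after them (top-level statements, C# 9 or later; variable names are illustrative).

```csharp
using System;
using System.Text.RegularExpressions;

const string pattern =
    @"^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$";

var everything = new Regex(pattern);
var namedOnly = new Regex(pattern, RegexOptions.ExplicitCapture);

// Group 0 is the whole match; unnamed groups come before named ones.
string allGroups = string.Join(", ", everything.GetGroupNames());
string explicitGroups = string.Join(", ", namedOnly.GetGroupNames());

Match match = namedOnly.Match("e+mail@ddress.com");

Console.WriteLine(allGroups);                     // 0, 1, 2, 3, name, domain
Console.WriteLine(explicitGroups);                // 0, name, domain
Console.WriteLine(match.Groups["name"].Value);    // e+mail
Console.WriteLine(match.Groups["domain"].Value);  // ddress.com
```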

Some Additional Resources

• https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines - This is a little outdated, but still a good overview of how Regex implementations vary.

• https://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx#SupportedNamedBlocks - Here is a reference of all of the named Unicode blocks that .NET supports in Regex. Linked here because I told you I would : )

• http://www.regular-expressions.info/refflavors.html - This is a very comprehensive reference for many common Regex engines. Some content may be out of date as new versions of each platform are released.

• http://www.regexplanet.com/ - An online pattern tester. Not the best interface, but very capable and has some nice features.
