CPTG286K Programming - Perl

Chapter 7: Regular Expressions

Regular Expressions (aka regex)

• Regular expressions are patterns used to match against a string

• Regular expressions are contained between slashes

• The outcome is either a successful match or a failure to match

• Substitution and split operations are driven by regex matches; join, the inverse of split, takes a plain string
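As a rough illustration of the operations named above, the sketch below works on a made-up colon-separated string (the data and variable names are assumptions, not taken from the slides). split and s/// take a regex, while join takes a plain separator string.

```perl
use strict;
use warnings;

my $line = "alice:bob::carol";             # hypothetical sample data

# split uses a regex to decide where to cut the string
my @fields = split /:+/, $line;            # ("alice", "bob", "carol")

# s/// replaces each part of the string that the regex matched
(my $spaced = $line) =~ s/:+/ /g;          # "alice bob carol"

# join glues the pieces back together with a plain string
my $joined = join ", ", @fields;           # "alice, bob, carol"

print "$spaced\n$joined\n";
```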

Simple Uses of regex

while (<>) {                # similar to: grep "abc" filename
    if (/abc/) {            # match regex /abc/ against $_
        print;              # print $_ if it contains abc
    }
}

• Replacing regex /abc/ with:

– /ab*c/ matches an a, followed by 0 or more b’s, followed by a c; same as /ab{0,}c/

– /ab+c/ matches an a, followed by 1 or more b’s, followed by a c; same as /ab{1,}c/

– /ab?c/ matches an a, followed by 0 or 1 b’s, followed by a c; same as /ab{0,1}c/

Quantifiers

Symbol Meaning

+ Match 1 or more times

* Match 0 or more times

? Match 0 or 1 time

{n} Match exactly n times

{n,} Match at least n times

{n,m} Match at least n but not more than m times
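To make the quantifier table concrete, here is a small sketch that tries the three patterns from the earlier slide against a few sample strings. The sample strings and the %matches hash are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# For each sample string, record which of the three patterns match.
my %matches;
for my $s ("ac", "abc", "abbbc") {
    my @hits;
    push @hits, "ab*c" if $s =~ /ab*c/;    # zero or more b's
    push @hits, "ab+c" if $s =~ /ab+c/;    # one or more b's
    push @hits, "ab?c" if $s =~ /ab?c/;    # zero or one b
    $matches{$s} = "@hits";
    print "$s: @hits\n";
}
```

Swapping in the brace forms (/ab{0,}c/, /ab{1,}c/, /ab{0,1}c/) should give the same results, since they are alternative spellings of the same quantifiers.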

Patterns

• Single-character patterns

– Character class

– Negated character class

• Grouping patterns

– Parenthesis

– Multipliers

– Sequence and anchoring

– Alternation

Single-Character Patterns

• Specific single-character match: /a/

• Any non-newline character: /./

• Character class: /[valid_list]/

– /[0-9]/ # or \d, any single digit

– /[a-zA-Z0-9_]/ # or \w, any single word character

– /[ \r\t\n\f]/ # or \s, any single whitespace character

• Negated class: /[^valid_list]/

– /[^0-9]/ # or \D, any single non-digit

– /[^a-zA-Z0-9_]/ # or \W, any single non-word character

– /[^ \r\t\n\f]/ # or \S, any single non-whitespace character
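The shortcut classes can be tried out with a short sketch like the one below. The sample text is an assumption; for plain ASCII input such as this, \d, \w and \s behave like the bracketed classes listed above.

```perl
use strict;
use warnings;

my $text = "Room 42, floor_3!";            # hypothetical sample text

my @digits    = $text =~ /\d/g;            # single digits: 4, 2, 3
my @non_word  = $text =~ /\W/g;            # single non-word characters
my @word_runs = $text =~ /\w+/g;           # runs of word characters
my $spaces    = () = $text =~ /\s/g;       # number of whitespace characters

print "digits: @digits\n";
print "word runs: @word_runs\n";
print "whitespace count: $spaces\n";
```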

Parenthesis grouping

• This grouping is used to “memorize” a pattern, so it can be referenced later

• A memorized pattern is referenced using a backslash and parenthesis grouping number

Examples:

/(a)(b)c\2d\1/;   # matches abcbda
/a(.*)b\1c/;      # matches aFREDbFREDc but does not match aXXbXXXc
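The backreference examples above can be checked with a sketch along these lines. The variable names are assumptions; the patterns and strings are the ones from the slide.

```perl
use strict;
use warnings;

# \2 must repeat what group 2 matched (b), \1 what group 1 matched (a)
my $r1 = "abcbda" =~ /(a)(b)c\2d\1/ ? 1 : 0;          # 1

# \1 must repeat exactly what (.*) captured
my $memo = "aFREDbFREDc" =~ /a(.*)b\1c/ ? $1 : "";    # "FRED"

# XX before the b, but XXX after it, so \1 cannot line up with the c
my $r3 = "aXXbXXXc" =~ /a(.*)b\1c/ ? 1 : 0;           # 0

print "$r1 $memo $r3\n";
```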

Multiplier grouping

/x{5}/ # matches exactly 5 x’s

/x{5,10}/ # matches 5 to 10 x’s

/fo+ba?r*/ # matches f followed by one or more o’s, a b,
           # an optional a, and zero or more r’s

/fo{1,}ba{0,1}r{0,}/ # same as /fo+ba?r*/ using a general multiplier

• By default, quantifiers such as * and + are greedy; appending a ? makes them non-greedy:

$_ = "Nuts sold here. Come here!";
/N.*here/    # greedy: matches "Nuts sold here. Come here"
/N.*?here/   # non-greedy: matches "Nuts sold here"
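One way to see the difference is to wrap each pattern in parentheses and capture what it matched. This is a minimal sketch; the capturing parentheses and variable names are additions for illustration.

```perl
use strict;
use warnings;

$_ = "Nuts sold here. Come here!";

# Greedy: .* grabs as much as possible, so the match ends at the last "here"
my ($greedy) = /(N.*here)/;

# Non-greedy: .*? grabs as little as possible, stopping at the first "here"
my ($lazy) = /(N.*?here)/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```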

Anchor grouping

• \b requires a word boundary for a match

• \B requires NO word boundary for a match

• ^ matches the beginning of the string

• $ matches the end of the string

Examples:

/\bFred\b/;   # matches Fred, not Frederick or alFred
/\bFred\B/;   # matches Frederick, not Fred Flintstone
/^a/;         # matches strings beginning with a
/c$/;         # matches strings ending in c (optionally followed by a final \n)
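The anchor examples can be exercised with grep over a small word list, as sketched below. The list and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my @names = ("Fred", "Frederick", "alFred", "Fred Flintstone");

# Word boundary required on both sides of Fred
my @whole = grep { /\bFred\b/ } @names;

# Boundary before Fred, but NOT after it: Fred must continue into a longer word
my @prefix = grep { /\bFred\B/ } @names;

# $ still matches when the string ends with a newline
my $ends_in_c = "abc\n" =~ /c$/ ? 1 : 0;

print join("|", @whole), "\n";
print join("|", @prefix), "\n";
print "$ends_in_c\n";
```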

Alternatives grouping

/al|bert|c/; # matches al or bert or c

/^x|y/;    # x at beginning of line, or y anywhere

/^(x|y)/;  # either x or y at beginning of line

/songbird|bluebird/;# songbird or bluebird

/(song|blue)bird/; # same, using parentheses

/(a|b)(c|d)/; # ac, ad, bc, or bd
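The following sketch shows how the placement of parentheses changes what an alternation means. The test strings are assumptions, and the last pattern adds ^ and $ (not on the slide) so that only exact two-letter strings are accepted.

```perl
use strict;
use warnings;

# ^ applies only to the x alternative, so a y anywhere is enough
my $loose = "say" =~ /^x|y/ ? 1 : 0;                   # 1

# With parentheses, ^ applies to both alternatives
my $anchored = "say" =~ /^(x|y)/ ? 1 : 0;              # 0

# The group also remembers which alternative matched
my ($kind) = "bluebird" =~ /(song|blue)bird/;          # "blue"

# Anchored version of /(a|b)(c|d)/ to accept exactly ac, ad, bc, bd
my @pairs = grep { /^(a|b)(c|d)$/ } qw(ac ad bc bd ab cd);

print "$loose $anchored $kind @pairs\n";
```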

Regex Grouping Precedence

• Arranged from highest to lowest precedence:

Name Representation

Parenthesis ( ) (?: )

Multipliers ? + * {m,n} ?? +? *? {m,n}?

Sequence and Anchoring abc ^ $ \A \Z (?= ) (?! )

Alternation |

Example:

/a|b*/;       # interpreted as /a|(b*)/, not /(a|b)*/
/a|(?:b*)/;   # same, but does not trigger memory to store into \1
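A short sketch can confirm the precedence rule. The anchors and (?: ) wrappers in the first two tests are additions, so that the whole string must match and the two groupings can be told apart.

```perl
use strict;
use warnings;

# a|b* means "a" OR "zero or more b's": the whole of "abab" is neither
my $alt_of_star = "abab" =~ /^(?:a|b*)$/ ? 1 : 0;      # 0

# (a|b)* means any run of a's and b's: "abab" qualifies
my $star_of_alt = "abab" =~ /^(?:a|b)*$/ ? 1 : 0;      # 1

# Capturing parentheses store what they matched in $1 ...
my $s = "bbb";
my $captured = $s =~ /a|(b*)/ ? $1 : undef;            # "bbb"

# ... while (?: ) groups without storing anything
my $has_group1 = ($s =~ /a|(?:b*)/ && defined $1) ? 1 : 0;   # 0

print "$alt_of_star $star_of_alt $captured $has_group1\n";
```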

The pattern binding =~ operator

• Use the =~ to bind pattern to a scalar variable other than the default $_ variable

• To match the regex to $name from keyboard:

print "Proceed (y/Y)? ";       # produce prompt
chomp ($name = <STDIN>);       # chomp input
if ($name =~ /^[yY]/) {        # test both cases
    print "Proceeding.";       # display decision
}

Ignoring case & other delimiters

• Append an i to the regex to ignore case:

print "Proceed (y/Y)? ";       # produce prompt
chomp ($name = <STDIN>);       # chomp input
if ($name =~ /^y/i) {          # use either case
    print "Proceeding.";
}

• To use a different delimiter:

– Place an m followed by a new character in place of slashes (e.g. a #)

print "Proceed (y/Y)? ";       # produce prompt
chomp ($name = <STDIN>);       # chomp input
if ($name =~ m#^y#i) {         # new # delimiter
    print "Proceeding.";
}
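The three spellings of the yes/no test from the last two slides should agree. The sketch below checks this with fixed sample answers instead of keyboard input; the answers, the %result hash and the path example are assumptions.

```perl
use strict;
use warnings;

my %result;
for my $answer ("yes", "Yep", "no", "maybe") {
    my $class = $answer =~ /^[yY]/ ? 1 : 0;   # character class
    my $flag  = $answer =~ /^y/i   ? 1 : 0;   # ignore-case flag
    my $delim = $answer =~ m#^y#i  ? 1 : 0;   # alternate delimiter
    $result{$answer} = "$class $flag $delim";
    print "$answer: $result{$answer}\n";
}

# Alternate delimiters help most when the pattern itself contains slashes
my $is_usr_bin = "/usr/bin/perl" =~ m{^/usr/bin/} ? 1 : 0;
print "$is_usr_bin\n";
```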

Variable Interpolation

• A regex can be constructed from computed strings rather than literals:

$sentence = "Every good bird does fly.";
print "What should I look for? ";      # prompt
$what = <STDIN>;                       # read keyboard
chomp($what);                          # chomp input
if ($sentence =~ /$what/) {            # e.g. input [bw]ird matches "bird"
    print "I saw $what in $sentence.\n";
} else {
    print "Nope... didn't find it.\n";
}
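Because the interpolated string is treated as a regex, any metacharacters in it keep their special meaning. The sketch below uses a fixed value for $what in place of keyboard input, and contrasts it with the \Q...\E escape, which is not covered on the slide and is shown here only as one possible way to match the input literally.

```perl
use strict;
use warnings;

my $sentence = "Every good bird does fly.";
my $what     = "[bw]ird";              # stands in for the keyboard input

# Interpolated as a regex: the character class matches "bird"
my $as_regex = $sentence =~ /$what/ ? 1 : 0;

# \Q...\E quotes metacharacters: looks for the literal text "[bw]ird"
my $as_literal = $sentence =~ /\Q$what\E/ ? 1 : 0;

print "$as_regex $as_literal\n";
```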

Special Read-only Variables

• Upon a successful pattern match, $1, $2, $3… are set to values in \1, \2, \3…

• These read-only variables can be used in later parts of the program:

$_ = "This is a test";
/(\w+)\W+(\w+)/;        # match first two words
                        # $1 is now "This" and
                        # $2 is now "is"
($first,$second) = /(\w+)\W+(\w+)/;
# $first is now "This" and $second is now "is"
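A runnable version of the same idea is sketched below. It also checks that the match succeeded before reading $1 and $2, since those variables only reflect the last successful match; the $pair and @none variables are assumptions added for illustration.

```perl
use strict;
use warnings;

$_ = "This is a test";

# List context: the match returns the captured groups directly
my ($first, $second) = /(\w+)\W+(\w+)/;

# Scalar context: test for success, then read $1 and $2
my $pair = "";
if (/(\w+)\W+(\w+)/) {
    $pair = "$1,$2";
}

# A failed match in list context returns an empty list
my @none = "!!!" =~ /(\w+)\W+(\w+)/;

print "$first $second $pair ", scalar(@none), "\n";
```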
