Posted on 05-Jan-2016
CPTG286K Programming - Perl
Chapter 7: Regular Expressions
Regular Expressions (aka regex)
• Regular expressions are patterns used to match against a string
• Regular expressions are usually written between slashes, as in /abc/
• The outcome is either a successful match or a failure to match
• Regular expressions also drive substitution (s///) and split; join is the related operation that glues split fields back together with a plain string
Simple Uses of regex
while (<>) {            # similar to: grep "abc" filename
    if (/abc/) {        # regex /abc/ looks for abc anywhere in $_
        print;          # prints $_ if it contains abc
    }
}
• Replacing regex /abc/ with:
  – /ab*c/ matches an a, followed by 0 or more b's, followed by a c; same as /ab{0,}c/
  – /ab+c/ matches an a, followed by 1 or more b's, followed by a c; same as /ab{1,}c/
  – /ab?c/ matches an a, followed by 0 or 1 b, followed by a c; same as /ab{0,1}c/
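One way to see the difference between the three patterns is to run each against a few sample strings. The strings below are made up for illustration; this is a minimal sketch, not part of the original slides.

```perl
use strict;
use warnings;

# Made-up sample strings for illustration.
my @samples = ('ac', 'abc', 'abbbc', 'axc');
my %hit;    # $hit{string}{pattern} is 1 if the pattern matched, else 0

for my $s (@samples) {
    $hit{$s}{'ab*c'} = $s =~ /ab*c/ ? 1 : 0;   # a, 0 or more b's, c
    $hit{$s}{'ab+c'} = $s =~ /ab+c/ ? 1 : 0;   # a, 1 or more b's, c
    $hit{$s}{'ab?c'} = $s =~ /ab?c/ ? 1 : 0;   # a, 0 or 1 b, c
    printf "%-6s ab*c:%d ab+c:%d ab?c:%d\n",
        $s, $hit{$s}{'ab*c'}, $hit{$s}{'ab+c'}, $hit{$s}{'ab?c'};
}
```

Note that /ab?c/ fails on abbbc: it allows at most one b between the a and the c.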
Quantifiers
Symbol Meaning
+ Match 1 or more times
* Match 0 or more times
? Match 0 or 1 time
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
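The counted quantifiers can be checked the same way. This sketch wraps each pattern in the ^ and $ anchors (from the anchoring patterns in this chapter) so the whole string must be x's; the run lengths 4, 5, 10, and 11 are arbitrary choices for illustration.

```perl
use strict;
use warnings;

# ^ and $ pin the pattern to the whole string, so the number of x's is
# checked exactly. Unanchored, /x{5}/ only needs 5 x's in a row somewhere.
my %ok;
for my $n (4, 5, 10, 11) {
    my $s = 'x' x $n;                                # a run of $n x's
    $ok{$n}{exact}    = $s =~ /^x{5}$/    ? 1 : 0;   # {n}
    $ok{$n}{at_least} = $s =~ /^x{5,}$/   ? 1 : 0;   # {n,}
    $ok{$n}{range}    = $s =~ /^x{5,10}$/ ? 1 : 0;   # {n,m}
    print "$n: {5}=$ok{$n}{exact} {5,}=$ok{$n}{at_least} {5,10}=$ok{$n}{range}\n";
}

my $unanchored = ('x' x 7) =~ /x{5}/ ? 1 : 0;        # 1: contains 5 x's in a row
```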
Patterns
• Single-character patterns
  – Character class
  – Negated character class
• Grouping patterns
  – Parentheses
  – Multipliers
  – Sequence and anchoring
  – Alternation
Single-Character Patterns
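• Specific single-character match: /a/
• Any single character except newline: /./
• Character class: /[valid_list]/
  – /[0-9]/          # or \d, any single digit
  – /[a-zA-Z0-9_]/   # or \w, any single word character
  – /[ \r\t\n\f]/    # or \s, any single whitespace character
• Negated class: /[^valid_list]/
  – /[^0-9]/         # or \D, any single non-digit
  – /[^a-zA-Z0-9_]/  # or \W, any single non-word character
  – /[^ \r\t\n\f]/   # or \S, any single non-whitespace character
• The shortcuts match roughly these classes for plain ASCII text; on Unicode strings, \d, \w, and \s also match non-ASCII digits, letters, and whitespace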
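The sketch below applies a few of these classes to one made-up string, using a match with the /g modifier in list context to collect every hit. Note that \w treats the underscore as a word character.

```perl
use strict;
use warnings;

my $text = "Room 42, ext_7!";            # made-up sample text

my @digits     = $text =~ /(\d)/g;       # each single digit
my @words      = $text =~ /(\w+)/g;      # runs of word characters (_ included)
my @non_word   = $text =~ /(\W)/g;       # each single non-word character
my @non_digits = $text =~ /([^0-9])/g;   # negated class: anything but a digit

print "digits:     @digits\n";
print "words:      @words\n";
print "non-digits: ", join('', @non_digits), "\n";
```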
Parentheses grouping
• This grouping is used to "memorize" (capture) the text a subpattern matched, so it can be referenced later
• Inside the same regex, a memorized group is referenced as a backslash followed by the group number: \1, \2, ...
Examples:
/(a)(b)c\2d\1/;     # matches abcbda
/a(.*)b\1c/;        # matches aFREDbFREDc but
                    # does not match aXXbXXXc
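The slide's two patterns can be confirmed directly, and the same \1 idea gives a common practical check: spotting an accidentally doubled word. The doubled-word pattern and sample lines are illustrative assumptions, not from the original slides.

```perl
use strict;
use warnings;

# The two patterns from the slide.
my $ok1 = "abcbda"      =~ /(a)(b)c\2d\1/ ? 1 : 0;   # 1
my $ok2 = "aFREDbFREDc" =~ /a(.*)b\1c/    ? 1 : 0;   # 1
my $ok3 = "aXXbXXXc"    =~ /a(.*)b\1c/    ? 1 : 0;   # 0

# \1 must repeat exactly what group 1 matched: a word, whitespace, same word.
my @doubled;
for my $line ("this is is a typo", "no repeats here") {
    push @doubled, $1 if $line =~ /\b(\w+)\s+\1\b/;
}
print "doubled: @doubled\n";
```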
Multiplier grouping
/x{5}/                 # matches exactly 5 x's in a row
/x{5,10}/              # matches 5 to 10 x's in a row
/fo+ba?r*/             # matches f followed by one or more o's, a b,
                       # an optional a, and zero or more r's
/fo{1,}ba{0,1}r{0,}/   # same as /fo+ba?r*/ using general multipliers
• By default, multipliers are greedy and match as much as they can; a trailing ? (as in *? or +?) makes them non-greedy:
$_ = "Nuts sold here. Come here!";
/N.*here/              # matches "Nuts sold here. Come here"
/N.*?here/             # matches "Nuts sold here" (non-greedy)
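Capturing the whole match makes the greedy/non-greedy difference visible. This sketch also checks a few made-up strings against both spellings of the f-o-b-a-r pattern to show that the shorthand and general multipliers agree.

```perl
use strict;
use warnings;

my $str = "Nuts sold here. Come here!";
my ($greedy)     = $str =~ /(N.*here)/;    # .* takes as much as it can
my ($non_greedy) = $str =~ /(N.*?here)/;   # .*? stops at the first "here"

print "greedy:     $greedy\n";
print "non-greedy: $non_greedy\n";

# Shorthand and general multipliers give the same answers.
my %forms;
for my $s ("foobrr", "fbar", "fooobar") {
    $forms{$s} = [ ($s =~ /fo+ba?r*/           ? 1 : 0),
                   ($s =~ /fo{1,}ba{0,1}r{0,}/ ? 1 : 0) ];
}
```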
Anchor grouping
• \b requires a word boundary for a match
• \B requires NO word boundary for a match
• ^ matches the beginning of the string
• $ matches the end of the string (or just before a final newline)
Examples:
/\bFred\b/;   # matches Fred, not Frederick or alFred
/\bFred\B/;   # matches Frederick, not Fred Flintstone
/^a/;         # matches strings beginning with a
/c$/;         # matches strings ending in c (before \n)
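Filtering a list with grep is a compact way to confirm each anchor's behavior. The names come from the slide's comments; the short strings for ^ and $ are made up.

```perl
use strict;
use warnings;

my @names  = ("Fred", "Frederick", "alFred", "Fred Flintstone");
my @whole  = grep { /\bFred\b/ } @names;   # Fred as a whole word
my @prefix = grep { /\bFred\B/ } @names;   # Fred at the start of a longer word

my @starts_a = grep { /^a/ } ("abc", "bac");    # begins with a
my @ends_c   = grep { /c$/ } ("abc\n", "cab");  # ends in c, even before a final \n

print "whole: @whole | prefix: @prefix\n";
```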
Alternatives grouping
/al|bert|c/;          # matches al or bert or c
/^x|y/;               # x at beginning of string,
                      # or y anywhere
/^(x|y)/;             # either x or y at
                      # beginning of string
/songbird|bluebird/;  # songbird or bluebird
/(song|blue)bird/;    # same, using parentheses
/(a|b)(c|d)/;         # ac, ad, bc, or bd
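The difference between /^x|y/ and /^(x|y)/ shows up on a string that contains y but does not start with it. The sample words are made up; "boy" is the case where the two patterns disagree.

```perl
use strict;
use warnings;

my (%loose, %tight);
for my $s ("xylophone", "boy", "ox") {
    $loose{$s} = $s =~ /^x|y/   ? 1 : 0;   # (x at start) OR (y anywhere)
    $tight{$s} = $s =~ /^(x|y)/ ? 1 : 0;   # ^ applies to both x and y
}

# Parentheses also capture which alternative matched.
my ($kind) = "bluebird" =~ /(song|blue)bird/;
print "kind: $kind\n";
```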
Regex Grouping Precedence
• Arranged from highest to lowest precedence:
Name                     Representation
Parentheses              ( ) (?: )
Multipliers              ? + * {m,n} ?? +? *? {m,n}?
Sequence and Anchoring   abc ^ $ \A \Z (?= ) (?! )
Alternation              |
Example:
/a|b*/;       # interpreted as /a|(b*)/, not /(a|b)*/
/a|(?:b*)/;   # same, but (?: ) does not trigger memory,
              # so nothing is stored into \1
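Because multipliers bind tighter than alternation, the b* branch of /a|b*/ can match the empty string, so the pattern succeeds on any input at all. This sketch shows that, and how grouping the alternation first changes the meaning; the sample strings are made up.

```perl
use strict;
use warnings;

# /a|b*/ is /a|(?:b*)/: the b* branch can match zero b's.
my $matches_xyz = "xyz" =~ /a|b*/ ? 1 : 0;   # 1, even with no a or b present
my ($what)      = "xyz" =~ /(a|b*)/;         # "" (empty match at position 0)

# To repeat the whole alternation, group it before applying *.
my $all_ab = "abba" =~ /^(?:a|b)*$/ ? 1 : 0;   # 1: only a's and b's
my $not_ab = "abca" =~ /^(?:a|b)*$/ ? 1 : 0;   # 0: c is neither
```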
The pattern binding =~ operator
• Use the =~ operator to bind a pattern to a scalar variable other than the default $_ variable
• To match a regex against $name, read from the keyboard:
print "Proceed (y/Y)? ";        # produce prompt
chomp($name = <STDIN>);         # chomp input
if ($name =~ /^[yY]/) {         # test both cases
    print "Proceeding.\n";      # display decision
}
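To try the binding without typing at a prompt, the same test can be wrapped in a small function that takes the answer as an argument. The name wants_to_proceed is hypothetical, introduced only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: the slide's test applied to a string argument
# instead of a line read from STDIN.
sub wants_to_proceed {
    my ($answer) = @_;
    chomp $answer;                       # drop the newline, as chomp(<STDIN>) would
    return $answer =~ /^[yY]/ ? 1 : 0;   # =~ binds the match to $answer, not $_
}

my @replies = map { wants_to_proceed($_) } ("y\n", "Yes\n", "nope\n");
print "@replies\n";
```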
Ignoring case & other delimiters
• Append an i to the regex to ignore case:
print "Proceed (y/Y)? ";        # produce prompt
chomp($name = <STDIN>);         # chomp input
if ($name =~ /^y/i) {           # accept either case
    print "Proceeding.\n";      # display decision
}
• To use a different delimiter:
  – Write an m followed by the new delimiter character in place of the slashes (e.g. #)
print "Proceed (y/Y)? ";        # produce prompt
chomp($name = <STDIN>);         # chomp input
if ($name =~ m#^y#i) {          # # is the new delimiter
    print "Proceeding.\n";      # display decision
}
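A custom delimiter pays off most when the pattern itself contains slashes, such as a file path. This sketch compares an escaped pattern with the m# # form from the slide and with m{ }, another delimiter choice Perl accepts; the path is an arbitrary example.

```perl
use strict;
use warnings;

my $path = "/usr/bin/perl";
my $escaped = $path =~ /^\/usr\/bin\// ? 1 : 0;   # slashes must be escaped
my $hashes  = $path =~ m#^/usr/bin/#   ? 1 : 0;   # m# # as on the slide
my $braces  = $path =~ m{^/usr/bin/}   ? 1 : 0;   # m{ } also works

my $yes_any_case = "YES" =~ /^y/i ? 1 : 0;        # 1: /i ignores case
my $yes_exact    = "YES" =~ /^y/  ? 1 : 0;        # 0: case matters without /i
```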
Variable Interpolation
• A regex can be built from computed strings rather than literals:
$sentence = "Every good bird does fly.";
print "What should I look for? ";      # prompt
$what = <STDIN>;                        # read keyboard
chomp($what);                           # chomp input
if ($sentence =~ /$what/) {             # e.g. input [bw]ird matches bird
    print "I saw $what in $sentence\n";
} else {
    print "Nope... didn't find it.\n";
}
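Because the interpolated text is treated as a regex, characters like [ ] and . keep their special meaning. When the input should be matched literally, \Q...\E (or the quotemeta function) escapes those metacharacters. In this sketch a fixed string stands in for the keyboard input.

```perl
use strict;
use warnings;

my $sentence = "Every good bird does fly.";
my $what     = "[bw]ird";      # stands in for the keyboard input

my $as_pattern = $sentence =~ /$what/     ? 1 : 0;   # 1: [bw] is a class, so "bird" matches
my $as_literal = $sentence =~ /\Q$what\E/ ? 1 : 0;   # 0: looks for the text "[bw]ird"
my $quoted     = quotemeta $what;                    # the escaped form \Q produces

print "pattern: $as_pattern, literal: $as_literal, quoted: $quoted\n";
```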
Special Read-only Variables
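• Upon a successful pattern match, $1, $2, $3… are set to the text captured by the first, second, third… parenthesized groups (the same text \1, \2, \3… refer to inside the pattern)
• These read-only variables keep their values until the next successful match (or the end of the enclosing block), so they can be used in later parts of the program:
$_ = "This is a test";
/(\w+)\W+(\w+)/;      # match first two words
                      # $1 is now "This" and
                      # $2 is now "is"
($first, $second) = /(\w+)\W+(\w+)/;
                      # $first is now "This" and $second is now "is"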
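One consequence worth testing: a failed match does not reset $1, so it still holds text from the previous successful match. This sketch (sample strings made up) shows why the match result should be checked before $1 is trusted.

```perl
use strict;
use warnings;

my ($first, $second) = "This is a test" =~ /(\w+)\W+(\w+)/;   # "This", "is"

"Hello world" =~ /(\w+)/;   # succeeds: $1 is "Hello"
"!!!"         =~ /(\w+)/;   # fails: $1 is left untouched
my $leftover = $1;          # still "Hello", not undef
```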
More Read-only Variables
• Use the $& variable to examine the part of the string that matched the regex
• $` is the part of the string before the matching part
• $' is the part of the string after the matching part
$_ = "This is a sample string";
/sa.*le/;      # matches "sample"
               # $` is now "This is a "
               # $& is now "sample"
               # $' is now " string"
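The sketch below reads the three variables right after a successful match. It also shows the /p modifier (Perl 5.10 and later) with ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH}; on Perls older than 5.20, mentioning $&, $` or $' anywhere slowed down every regex in the program, and /p was the usual way around that.

```perl
use strict;
use warnings;

$_ = "This is a sample string";

my ($before, $matched, $after);
if (/sa.*le/) {
    ($before, $matched, $after) = ($`, $&, $');
}

# Same three pieces via the /p modifier and its named variables.
my ($pre, $mat, $post);
if (/sa.*le/p) {
    ($pre, $mat, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}
```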
Substitutions
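• Use the substitution operator: s/regex/new-string/
• The replacement string is variable interpolated
• The regex can use any of the pattern features above, and the replacement can use the special read-only variables ($1, $2, …) set by the match
• Can use ignore case (s///i) and custom delimiters (e.g. s#regex#new-string#)
• Can use the pattern binding =~ operator to substitute in a variable other than $_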
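Each point above can be shown in one line. The (my $copy = ...) =~ s/// idiom copies a string and then substitutes in the copy. The /g modifier (replace every match, not just the first) is a standard Perl modifier that the slide does not mention; the sample strings are made up.

```perl
use strict;
use warnings;

my $name = "Randal";

(my $t1 = "Hello, World")    =~ s/World/Perl/;          # basic replacement
(my $t2 = "Hello, World")    =~ s/World/$name/;         # interpolated replacement
(my $t3 = "fred flintstone") =~ s/(\w+) (\w+)/$2, $1/;  # uses captured groups
(my $t4 = "HELLO hello")     =~ s/hello/bye/gi;         # /i ignore case, /g every match
(my $t5 = "/usr/bin/perl")   =~ s#/usr/bin#/opt/bin#;   # custom delimiter

$_ = "one two";
s/two/three/;                                           # no =~, so it works on $_
```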
Split Function
• The split function splits a string into fields delimited by a regex
$line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
@fields = split(/:/, $line);   # split $line using
                               # : as delimiter
# @fields is now
# ("merlyn", "", "118", "10", "Randal", "/home/merlyn",
#  "/usr/bin/perl")
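Since the delimiter is a regex rather than a fixed character, it can absorb variable spacing. The comma-separated sample below is made up; the second part confirms the field count from the slide, including the empty second field.

```perl
use strict;
use warnings;

# Optional whitespace on either side of each comma is part of the delimiter.
my @fruits = split(/\s*,\s*/, "apple ,banana,  cherry");

my $line   = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
my @fields = split(/:/, $line);
print scalar(@fields), " fields; second is '", $fields[1], "'\n";
```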
Splitting in list context
$line = "merlyn::118:10:Randal:/home/merlyn:";
($name, $password, $uid, $gid, $gcos, $home, $shell) =
    split(/:/, $line);   # split $line using : as delimiter
# $name is now "merlyn",
# $password is now "",
# $uid is now "118",
# $gid is now "10",
# $gcos is now "Randal",
# $home is now "/home/merlyn",
# $shell is now undef (split discards trailing empty fields)
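The undef in $shell comes from split dropping trailing empty fields, while empty fields earlier in the string (like the password) are kept. Passing a negative LIMIT as the third argument tells split to keep the trailing ones too, as this sketch shows.

```perl
use strict;
use warnings;

my $line = "merlyn::118:10:Randal:/home/merlyn:";
my @dropped = split(/:/, $line);       # trailing empty field discarded
my @kept    = split(/:/, $line, -1);   # negative LIMIT keeps it
print scalar(@dropped), " vs ", scalar(@kept), " fields\n";
```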
The “Default” Split
$_ = "some string";
@words = split;
# same as @words = split(' ', $_);
# which splits on \s+ (one or more whitespace characters)
# and also ignores any leading whitespace
# @words is now ("some", "string")
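The default split and split(/\s+/, $_) only differ when the string starts with whitespace: the default skips it, while the explicit regex produces a leading empty field. The padded sample string is made up for this sketch.

```perl
use strict;
use warnings;

$_ = "  some string  ";            # leading and trailing spaces
my @default = split;               # split ' ': leading whitespace ignored
my @regex   = split(/\s+/, $_);    # leading empty field is kept
print scalar(@default), " vs ", scalar(@regex), "\n";
```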
Join Function
• The join function joins a list of values with a glue string between list elements
• The $line can be reconstructed from @fields using:
$line = join(":", @fields);   # glue string ":"
                              # is not a regex
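A split followed by a join with the same glue rebuilds the original line, as long as no trailing empty fields were dropped. This sketch checks the round trip, shows that "." as glue is just a literal dot, and uses a made-up short string to show where the round trip loses a trailing delimiter.

```perl
use strict;
use warnings;

my $line    = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
my @fields  = split(/:/, $line);
my $rebuilt = join(":", @fields);                   # round trip

my $dotted = join(".", "a", "b", "c");              # "." is literal glue, not "any char"

my $short    = "a:b:";
my $lossy    = join(":", split(/:/, $short));       # trailing empty field lost
my $lossless = join(":", split(/:/, $short, -1));   # kept with a negative LIMIT
```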
Glue Ahead & Trailing Glue
$_ = "some string";                 # initialize default string
@words = split;                     # perform default split
print "@words\n";                   # show split result
$result = join("+", "", @words);    # glue ahead
print "$result\n";                  # $result is "+some+string"
$output = join("\n", @words, "");   # trailing glue
print $output;                      # $output is "some\nstring\n"