perl training regex
PERL Regular Expressions
Regular Expressions (0)
• A regular expression is a template that either matches or doesn’t match a given string.
• One of the most important features of Perl is its strong regular expression support.
/PATTERN/
Regular Expressions (1)
the “Dirty Dozen” – Metacharacters
These characters have special meaning in regular expressions.
A backslash in front of any meta-character makes it non special.
\ . * + ? ( ) | [ { ^ $
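A minimal sketch of what escaping does in practice. The sample strings and variable names are made up for illustration; quotemeta() is the Perl built-in that adds the backslashes automatically.

```perl
use strict;
use warnings;

my $price = 'Total: $5.00 (approx)';

# Unescaped, "." matches any character, so "5x00" is accepted too.
my $loose  = ('5x00' =~ /5.00/)  ? 1 : 0;
# Escaped, "\." matches only a literal dot.
my $strict = ('5x00' =~ /5\.00/) ? 1 : 0;

# "\$", "\(" and "\)" match the literal characters in $price.
my $literal = ($price =~ /\$5\.00 \(approx\)/) ? 1 : 0;

# quotemeta() puts a backslash in front of every non-word character.
my $escaped = quotemeta('1+1=2');
my $auto    = ('1+1=2' =~ /^$escaped$/) ? 1 : 0;

print "$loose $strict $literal $auto\n";   # prints "1 0 1 1"
```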
Regular Expressions (2)
/to.*ols/ matches ‘to’, followed by any string, followed by ‘ols’.
/hello.you/ matches any string that has ‘hello’, followed by any one (exactly one) character, followed by ‘you’.
/to*ols/ the character before ‘*’ may be repeated zero or more times. Matches ‘tools’, ‘tooooools’, ‘tols’ (but not ‘toxols’!)
/to+ols/ the character before ‘+’ may be repeated one or more times. Matches ‘tools’, ‘tooooools’ (but not ‘tols’).
“.” matches any character except a newline “\n”.
Quantifiers decide how many times the preceding item has to be repeated.
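The patterns on this slide can be checked with a few lines of Perl. This is only a sketch; hit() is a hypothetical helper written for this example, not a built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: 'yes' if $string matches $pattern, else 'no'.
sub hit {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? 'yes' : 'no';
}

my @results = (
    hit('to*ols',    'tols'),         # yes: zero "o" before "ols"
    hit('to*ols',    'tooooools'),    # yes: many "o"
    hit('to*ols',    'toxols'),       # no:  "x" is not "o"
    hit('to+ols',    'tols'),         # no:  "+" needs at least one "o"
    hit('to.*ols',   'toxols'),       # yes: ".*" accepts any string
    hit('hello.you', 'hello you'),    # yes: exactly one character
    hit('hello.you', 'helloyou'),     # no:  the character is missing
    hit('hello.you', "hello\nyou"),   # no:  "." does not match "\n"
);

print "@results\n";   # prints "yes yes no no yes yes no no"
```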
Regular Expressions(3)
/to?ols/ the character before ‘?’ is optional. Thus, there are only two matching strings – ‘tools’ and ‘tols’.
/to{2}ls/ the number in ‘{}’ gives the number of repetitions. Matches ‘tools’.
{count} - Match exactly count times
{min,max} - Match at least min but not more than max times
{min,} - Match at least min times
Exercise: write the ‘{}’ quantifiers that are equivalent to ‘*’, ‘+’ and ‘?’.
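A short sketch of ‘?’ and the three ‘{}’ forms applied to a made-up word list. The patterns are anchored with ^ and $ so that each word must match as a whole.

```perl
use strict;
use warnings;

my @words = qw(tls tols tools toools);

my @optional = grep { /^to?ols$/    } @words;   # "o" before "ols" is optional
my @exactly2 = grep { /^to{2}ls$/   } @words;   # exactly two "o"
my @one_or_2 = grep { /^to{1,2}ls$/ } @words;   # one or two "o"
my @atleast2 = grep { /^to{2,}ls$/  } @words;   # two or more "o"

print "@optional\n";   # prints "tols tools"
print "@exactly2\n";   # prints "tools"
print "@one_or_2\n";   # prints "tols tools"
print "@atleast2\n";   # prints "tools toools"
```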
Regular Expressions (4)
Grouping – parentheses ‘( )’ are used for grouping one or more characters.
/(tools)+/ matches “toolstoolstoolstools”.
Alternatives:
/hello (world|Perl)/ - matches “hello world”, “hello Perl”.
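A small sketch of grouping and alternatives, using made-up test strings. It also shows why the parentheses matter: without them, “|” splits the whole pattern into two alternatives.

```perl
use strict;
use warnings;

# (tools)+ repeats the whole group, not just the last character.
my @repeats = map { /^(tools)+$/ ? 1 : 0 } ('tools', 'toolstoolstools', 'toolss');

# The group also remembers which alternative matched (in $1).
my @picked;
for my $s ('hello world', 'hello Perl', 'hello there') {
    push @picked, $1 if $s =~ /hello (world|Perl)/;
}

# Without parentheses the alternatives are "hello world" and "Perl".
my $ungrouped = ('Perl' =~ /hello world|Perl/)   ? 1 : 0;
my $grouped   = ('Perl' =~ /hello (world|Perl)/) ? 1 : 0;

print "@repeats\n";             # prints "1 1 0"
print "@picked\n";              # prints "world Perl"
print "$ungrouped $grouped\n";  # prints "1 0"
```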
Regular Expressions (5)
Character Class - A list of all possible characters
/Hello [abcde]/ matches “Hello a” or “Hello b” …
/Hello [a-e]/ the same as above
Negating:
[^abc] any char except a,b,c
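A minimal sketch of a character class and its negation. The test strings are invented for the example.

```perl
use strict;
use warnings;

my @tests = ('Hello a', 'Hello e', 'Hello f');

# [a-e] accepts one character from a to e.
my @in_class = grep { /^Hello [a-e]$/ } @tests;

# [^abc] accepts one character that is NOT a, b or c.
my @negated = grep { /^Hello [^abc]$/ } @tests;

print join(', ', @in_class), "\n";   # prints "Hello a, Hello e"
print join(', ', @negated),  "\n";   # prints "Hello e, Hello f"
```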
Regular Expressions (6)
Shortcuts
• \d digit [0-9]
• \w word character [A-Za-z0-9_]
• \s white space [ \t\n\r\f]
Negative ^ – [^\d] matches non digit
\S anything not \s
\D anything not \d
\W anything not \w
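A quick sketch of the shortcuts at work on a made-up line. A match with /g in list context returns every match, which makes it easy to see what each shortcut picks up.

```perl
use strict;
use warnings;

my $line = "Order 66:\tship_now!";

my @digits   = $line =~ /\d+/g;   # runs of digits
my @wordlike = $line =~ /\w+/g;   # runs of word characters ("_" included)
my @spaces   = $line =~ /\s/g;    # the space and the tab
my @nonword  = $line =~ /\W/g;    # space, ":", tab, "!"

print "@digits\n";              # prints "66"
print join('|', @wordlike), "\n";   # prints "Order|66|ship_now"
print scalar(@spaces), "\n";    # prints "2"
print scalar(@nonword), "\n";   # prints "4"
```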
Exercise: write the character classes for
1. Matching vowels
2. Matching consonants
3. Matching anything other than non-digits
What is the difference between \D and [^\d]?
Regular Expressions (7)
Anchors
^ - marks the beginning of the string
$ - marks the end of the string
/^Hello Perl/ - matches “Hello Perl, goodbye Perl”, but not “Perl Hello Perl”
What pattern will match blank lines ?
/^\s*$/ - matches all blank lines
/^abc/ - “^” beginning of a string
/a\^bc/ - “\^” matches a literal “^”, so this matches “a^bc”
/[^abc]/ - negating
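A sketch that exercises the anchors and the three different roles of “^”. All sample strings are invented for the example.

```perl
use strict;
use warnings;

# ^ ties the match to the beginning of the string.
my @starts = map { /^Hello Perl/ ? 1 : 0 }
             ('Hello Perl, goodbye', 'Perl Hello Perl');

# ^\s*$ accepts empty lines and lines made only of whitespace.
my @lines = ('first', '', '   ', "\t", 'last');
my $blank = grep { /^\s*$/ } @lines;   # grep in scalar context counts

# Three roles of "^": anchor, literal (escaped), negation (inside []).
my $anchor  = ('abc'  =~ /^abc/)    ? 1 : 0;
my $literal = ('a^bc' =~ /a\^bc/)   ? 1 : 0;
my $negated = ('abc'  =~ /[^abc]/)  ? 1 : 0;   # no other character present

print "@starts\n";                     # prints "1 0"
print "$blank\n";                      # prints "3"
print "$anchor $literal $negated\n";   # prints "1 1 0"
```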
Regular Expressions (8)
\b - matches at either end of a word (matches the start or the end of a group of \w characters)
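/\bPerl\b/ - matches “Hello Perl”, “Perl”
but not “Perl5” (“Perl++” still matches, because “+” is not a \w character)
\B - negation of \b (matches wherever \b does not)
Exercise: which part of “ That’s my house” does /^\w+\b/ match?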
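A minimal sketch of \b and \B on a few made-up strings. A boundary exists only where a \w character meets a non-\w character or the edge of the string.

```perl
use strict;
use warnings;

my @tests = ('Hello Perl', 'Perl', 'Perl5', 'Perlish');

# \bPerl\b needs a word boundary on both sides of "Perl".
my @whole_word = map { /\bPerl\b/ ? 1 : 0 } @tests;

# Perl\B needs a word character right after "Perl".
my @glued = map { /Perl\B/ ? 1 : 0 } @tests;

print "@whole_word\n";   # prints "1 1 0 0"
print "@glued\n";        # prints "0 0 1 1"
```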
Regular Expressions (9)
Back references:
/(World|Perl) \1/ - matches “World World”, “Perl Perl”.
/((hello|hi) (world|Perl))/
• \1 refers to ((hello|hi) (world|Perl)), the whole outer group
• \2 refers to (hello|hi)
• \3 refers to (world|Perl)
$1, $2, $3 store the values of \1, \2, \3 after a regular expression is applied.
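A short sketch of both uses of a group: \1 inside the pattern and $1, $2, $3 after the match. Groups are numbered by their opening parenthesis, left to right. The sample strings are invented.

```perl
use strict;
use warnings;

# \1 must repeat exactly what the first group matched.
my @doubled = grep { /(World|Perl) \1/ }
              ('World World', 'Perl Perl', 'World Perl');

# After a successful match the groups are available as $1, $2, $3.
my ($whole, $greeting, $target);
if ('say hello Perl now' =~ /((hello|hi) (world|Perl))/) {
    ($whole, $greeting, $target) = ($1, $2, $3);
}

print join(', ', @doubled), "\n";        # prints "World World, Perl Perl"
print "$whole|$greeting|$target\n";      # prints "hello Perl|hello|Perl"
```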
Regular Expressions (10)
Option modifiers
/i : Case insensitive
/s : “.” will match “\n”
/m : Let “^” and “$” match next to embedded “\n”
/x : Ignore white space in the pattern (and allow # comments)
/o : Compile the pattern only once, even if it contains interpolated variables
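A sketch that compares matches with and without each modifier. The two-line sample text and the date string are assumptions made for the example.

```perl
use strict;
use warnings;

my $text = "first line\nSecond line";

my $case    = ('HELLO' =~ /hello/i)        ? 1 : 0;   # /i ignores case
my $dot     = ($text   =~ /line.Second/)   ? 1 : 0;   # "." stops at "\n"
my $dot_s   = ($text   =~ /line.Second/s)  ? 1 : 0;   # /s lets "." match "\n"
my $caret   = ($text   =~ /^Second/)       ? 1 : 0;   # ^ is start of string
my $caret_m = ($text   =~ /^Second/m)      ? 1 : 0;   # /m: ^ after each "\n"

# /x allows white space and comments inside the pattern.
my $ym = '2024-01' =~ /
    (\d{4})   # year
    -
    (\d{2})   # month
/x ? "$1:$2" : '';

print "$case $dot $dot_s $caret $caret_m\n";   # prints "1 0 1 0 1"
print "$ym\n";                                 # prints "2024:01"
```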
Regular Expressions (11)
Bind operator “=~”: tells Perl to match the pattern on the right against the string on the left.
Pattern match operator “m//”:
$str =~ /pattern/;
$str =~ m/pattern/;
if( $str =~ /hello/ ){
    …
}
while( <STDIN> ){
    if( /hello/ ){
        …
    }
}
@words = split /\s+/, $str;
When no variable is mentioned, the pattern is matched against the default variable “$_”.
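A runnable sketch of the same ideas without reading from STDIN. The list of lines stands in for the input and is an assumption of this example.

```perl
use strict;
use warnings;

my $str = 'say hello to Perl';

# Explicit binding: both spellings are the same operator.
my $short = ($str =~ /hello/)  ? 'match' : 'no match';
my $long  = ($str =~ m/hello/) ? 'match' : 'no match';

# With "m" any delimiter can be used, handy when the pattern contains "/".
my $path_ok = ('/usr/bin/perl' =~ m{^/usr/bin/}) ? 1 : 0;

# No binding: the pattern is tried against $_, set here by the loop.
my @seen;
for ('hello there', 'goodbye', 'oh hello') {
    push @seen, $_ if /hello/;
}

my @words = split /\s+/, $str;

print "$short $long $path_ok\n";     # prints "match match 1"
print join(', ', @seen), "\n";       # prints "hello there, oh hello"
print scalar(@words), "\n";          # prints "4"
```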
Regular Expressions (12)
Examples
$date = "12 10 10";
if ($date =~ /(\d+)/) { print $1.":".$2.":".$3.":\n"; }
#output ($2 and $3 are empty):
#12:::
if ($date =~ /(\d+)(\s+\1)+/) { print $1.":".$2.":".$3.":\n"; }
#output (notice $3 is empty):
#10: 10::
$str = "Hello World";
if ($str =~ /((Hello|Hi) (World|Perl))/) { print $1.":".$2.":".$3.":\n"; }
#output:
#Hello World:Hello:World:
$str = "Hello Perl Hi";
if ($str =~ /((Hello|Hi) (World|Perl)) \1/) { print $1.":".$2.":".$3.":\n"; }
#output: none (\1 requires “Hello Perl” to be repeated)
$str = "Hi Perl Hi Perl";
if ($str =~ /((Hello|Hi) (World|Perl)) \1/) { print $1.":".$2.":".$3.":\n"; }
#output:
#Hi Perl:Hi:Perl:
Examples
1. What is it?
/^0x[0-9a-fA-F]+$/
2. Date format: Month-Day-Year -> Year:Day:Month
$date = "12-31-1901";
$date =~ s/(\d+)-(\d+)-(\d+)/$3:$2:$1/;
Examples
3. Make a pattern that matches any line of input that has the same word repeated two (or more) times in a row. Whitespace between words may differ.
4. Which part of “ That’s my house” does /^\w+\b/ match?
Example
1. /\w+/ #matches a word
2. /(\w+)/ #captures it to remember later
3. /(\w+)\1/ #the same word two times
4. /(\w+)\s+\1/ #allows whitespace between the words
5. “This is a test” matches by mistake (“is is”) -> /\b(\w+)\s+\1/
6. “This is the theory” matches by mistake (“the the”) -> /\b(\w+)\s+\1\b/
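The refinement steps above can be replayed in Perl. This is a sketch; the sentence with a real repeated word is made up for the example, and 'none' is just a marker for “no match”.

```perl
use strict;
use warnings;

# Step 4: no boundaries, so "is is" inside "This is" counts as a repeat.
my $naive = ('This is a test'     =~ /(\w+)\s+\1/)     ? $1 : 'none';

# Step 5: \b in front removes that false match ...
my $left  = ('This is a test'     =~ /\b(\w+)\s+\1/)   ? $1 : 'none';
# ... but "the" followed by "theory" still slips through.
my $left2 = ('This is the theory' =~ /\b(\w+)\s+\1/)   ? $1 : 'none';

# Step 6: \b on both sides accepts only whole repeated words.
my $both  = ('This is the theory' =~ /\b(\w+)\s+\1\b/) ? $1 : 'none';
my $real  = ("Paris in the   the spring" =~ /\b(\w+)\s+\1\b/) ? $1 : 'none';

print "$naive $left $left2 $both $real\n";   # prints "is none the none the"
```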
Let’s try
1) Write a regular expression that identifies a 24-hour clock time. For example: 0:01, 00:20, 15:00, 23:59
2) Write a regular expression that identifies a floating-point number. For example: 10, 10.0001, -0.1, +001.3456789
For both, write a single program that identifies these patterns in the input lines and prints out only the matched patterns.
Negated Match
if( $str =~ /hello/){
…
}
if( $str !~ /hello/){
…
}
“!~” negates the match: the condition is true when the pattern does not match.
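A small sketch of the two operators side by side, on an invented list of lines.

```perl
use strict;
use warnings;

my @lines = ('hello world', 'goodbye', 'say hello');

my @with    = grep { $_ =~ /hello/ } @lines;   # lines that match
my @without = grep { $_ !~ /hello/ } @lines;   # lines that do not match

# "!~" gives the same answer as negating "=~".
my $same = ( !('goodbye' =~ /hello/) && ('goodbye' !~ /hello/) ) ? 1 : 0;

print join(', ', @with), "\n";      # prints "hello world, say hello"
print join(', ', @without), "\n";   # prints "goodbye"
print "$same\n";                    # prints "1"
```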
Regular Expressions (13)
$& - what really was matched
$` - what was before
$' - the rest of the string after the matched pattern
$` . $& . $' - original string
Caution: avoid these variables unless you really need them; using them anywhere can slow down every pattern match in the program.
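A sketch that shows the three variables on a made-up string. As one possible alternative, it also recovers the matched text from the built-in offset arrays @- and @+ (start and end of the last match).

```perl
use strict;
use warnings;

my $str = 'say hello to Perl';
my ($before, $matched, $after, $rebuilt, $via_offsets);

if ($str =~ /hello/) {
    ($before, $matched, $after) = ($`, $&, $');
    $rebuilt = $before . $matched . $after;

    # Alternative: $-[0] and $+[0] are the start and end offsets of the match.
    $via_offsets = substr($str, $-[0], $+[0] - $-[0]);
}

print "[$before][$matched][$after]\n";   # prints "[say ][hello][ to Perl]"
print "$rebuilt\n";                      # prints "say hello to Perl"
print "$via_offsets\n";                  # prints "hello"
```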
Regular Expressions (14)
Substitutions:
s/T/U/; #substitutes T with U (only once)
s/T/U/g; #global substitution
s/\s+/ /g; #collapses each run of whitespace into one space
s/(\w+) (\w+)/$2 $1/g; #swaps the words of each pair
s/T/U/; #applied on $_ variable
$str =~ s/T/U/;
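A sketch of the substitutions above applied to made-up strings. Each result is kept in its own copy, using the common “(my $copy = $original) =~ s///” idiom, so the original stays unchanged.

```perl
use strict;
use warnings;

my $dna = 'ATTGTCAT';

(my $once = $dna) =~ s/T/U/;     # only the first T
(my $all  = $dna) =~ s/T/U/g;    # every T

# s/// returns the number of substitutions it made.
my $count = (my $tmp = $dna) =~ s/T/U/g;

(my $tidy = "a   b \t c") =~ s/\s+/ /g;
(my $swap = 'hello world foo bar') =~ s/(\w+) (\w+)/$2 $1/g;

print "$once $all $count\n";   # prints "AUTGTCAT AUUGUCAU 4"
print "$tidy\n";               # prints "a b c"
print "$swap\n";               # prints "world hello bar foo"
```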
Regular Expressions (15)
File Extension Renaming:
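my ($from, $to) = @ARGV;
my @files = glob("*.$from");
foreach my $file (@files) {
    my $newfile = $file;
    $newfile =~ s/\.$from/\.$to/g;
    rename($file, $newfile);
}
Better: anchor the pattern to the end of the file name, so that only the extension is changed:
$newfile =~ s/\.$from$/\.$to/g;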
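A sketch of why the “$” anchor matters, tried on a plain string so that no files are touched. The file name and the two extensions are assumptions made for the example.

```perl
use strict;
use warnings;

my ($from, $to) = ('txt', 'bak');
my $name = 'notes.txt.old.txt';

# Unanchored with /g: every ".txt" in the name is rewritten.
(my $unanchored = $name) =~ s/\.$from/.$to/g;

# Anchored with "$": only the ".txt" at the end of the name is rewritten.
(my $anchored = $name) =~ s/\.$from$/.$to/;

print "$unanchored\n";   # prints "notes.bak.old.bak"
print "$anchored\n";     # prints "notes.txt.old.bak"
```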
Split and Join
$str = "aaa bbb ccc dddd";
@words = split /\s+/, $str;
$str = join ':', @words; #result is "aaa:bbb:ccc:dddd"
@words = split /\s+/, $_; #" aaa b" -> "", "aaa", "b"
@words = split; #" aaa b" -> "aaa", "b"
@words = split ' ', $_; #" aaa b" -> "aaa", "b"
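A runnable sketch of the same calls. It makes the difference between the pattern /\s+/ and the special string ' ' visible by counting the fields; the variable names are chosen for this example.

```perl
use strict;
use warnings;

my $str    = 'aaa bbb ccc dddd';
my @words  = split /\s+/, $str;
my $joined = join ':', @words;

my $padded = ' aaa b';
my @by_regex = split /\s+/, $padded;   # leading empty field is kept
my @by_space = split ' ',   $padded;   # leading white space is skipped

print "$joined\n";                       # prints "aaa:bbb:ccc:dddd"
print scalar(@by_regex), "\n";           # prints "3"
print scalar(@by_space), "\n";           # prints "2"
print join('|', @by_regex), "\n";        # prints "|aaa|b"
```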
Grep
grep EXPR, LIST;
@results = grep /^>/, @array;
@results = grep /^>/, <FILE>;
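A sketch of grep with a pattern on an in-memory list. The FASTA-like sample lines are an assumption; any list of strings would do.

```perl
use strict;
use warnings;

my @array = ('>seq1', 'ACGT', '>seq2', 'TTGA');

my @headers   = grep /^>/, @array;        # EXPR form: lines starting with ">"
my @sequences = grep { !/^>/ } @array;    # BLOCK form: all other lines
my $count     = grep /^>/, @array;        # scalar context: number of hits

print "@headers\n";     # prints ">seq1 >seq2"
print "@sequences\n";   # prints "ACGT TTGA"
print "$count\n";       # prints "2"
```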
Thank You !!!