Perl Regular Expressions
Regular Expressions
A regular expression (or pattern) in Perl is a template that either matches or does not match a given string.
if ($str =~ /hello/) { … }    # match against $str
while (<STDIN>) {
    if (/hello/) { … }        # no =~ : match against $_, the current input line
}
Regular Expressions in Perl:
@words = split /\s+/, $str;
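A minimal runnable sketch of the three uses above. It loops over a hard-coded list instead of reading STDIN so it runs without input; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $str = "well, hello there";

# Explicit binding: =~ applies the pattern to $str
if ($str =~ /hello/) {
    print "found hello in: $str\n";
}

# Implicit binding: foreach sets $_ to each element in turn, and a bare
# /hello/ is matched against $_ (which is also what while (<STDIN>) sets)
my @lines = ("hello world", "goodbye", "say hello");
my @hits;
foreach (@lines) {
    push @hits, $_ if /hello/;
}
print "lines with hello: @hits\n";

# split takes a regular expression as the separator
my @words = split /\s+/, $str;
print scalar(@words), " words\n";
```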
Regular Expressions (3)
/hello.you/ matches any string that contains ‘hello’, followed by exactly one character, followed by ‘you’.
“.” matches any character except a newline (\n).
/to.*ols/ matches ‘to’, followed by any string (possibly empty), followed by ‘ols’.
/to?ols/ the character before ‘?’ is optional, so the pattern matches only where the string contains ‘tools’ or ‘tols’.
/to*ols/ the character before ‘*’ may be repeated zero or more times. Matches ‘tols’, ‘tools’, ‘tooooools’ (but not ‘toxols’!).
/to+ols/ the character before ‘+’ may be repeated one or more times. Matches ‘tools’, ‘tooooools’ (but not ‘tols’).
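The following sketch runs the slide's patterns against a few sample strings (chosen for illustration) so the difference between ‘?’, ‘*’ and ‘+’ is visible, and checks that “.” does not match a newline.

```perl
use strict;
use warnings;

my @samples = qw(tols tools tooooools toxols);

my %patterns = (
    'to?ols' => qr/to?ols/,   # zero or one 'o' before "ols"
    'to*ols' => qr/to*ols/,   # zero or more
    'to+ols' => qr/to+ols/,   # one or more
);

my %matched;
for my $name (sort keys %patterns) {
    my $re = $patterns{$name};
    $matched{$name} = [ grep { /$re/ } @samples ];
    print "/$name/ matches: @{ $matched{$name} }\n";
}

# "." matches any single character except "\n"
my $dot_ok      = "hello_you"  =~ /hello.you/;   # true
my $dot_newline = "hello\nyou" =~ /hello.you/;   # false
```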
Regular Expressions (4)
Grouping – parentheses ‘( )’ are used for grouping one or more characters.
/(tools)+/ matches “toolstoolstoolstools”.
Alternatives: /hello (world|Perl)/ - matches “hello world” and “hello Perl”.
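A small sketch of grouping and alternation. It adds the ^ and $ anchors (covered on a later slide) so the whole string has to match; the test strings are made up.

```perl
use strict;
use warnings;

# (tools)+ : the + applies to the whole group
my $group    = "toolstoolstools" =~ /^(tools)+$/;   # true
# tools+ without parentheses: the + applies only to the final "s"
my $no_group = "toolstoolstools" =~ /^tools+$/;     # false

# Alternation inside a group: "hello " followed by either "world" or "Perl"
my @greetings = ("hello world", "hello Perl", "hello Python");
my @ok = grep { /^hello (world|Perl)$/ } @greetings;
print "@ok\n";
```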
Regular Expressions (5)
Character class: /Hello [abcde]/ matches “Hello a”, “Hello b”, and so on up to “Hello e”.
/Hello [a-e]/ the same as above, written as a range.
Negating: [^abc] matches any character except a, b, or c.
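A sketch comparing the explicit list, the range, and the negated class on a few made-up inputs. Note that classes are case-sensitive, so “Hello A” is not in [a-e].

```perl
use strict;
use warnings;

my @inputs = ("Hello a", "Hello e", "Hello f", "Hello A");

my @list_class  = grep { /^Hello [abcde]$/ } @inputs;   # explicit list
my @range_class = grep { /^Hello [a-e]$/ } @inputs;     # same set as a range
my @negated     = grep { /^Hello [^abc]$/ } @inputs;    # any char except a, b, c

print "list:    @list_class\n";
print "range:   @range_class\n";
print "negated: @negated\n";
```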
Regular Expressions (6)
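Shortcuts (the character lists shown apply to ASCII text):
• \d digit [0-9]
• \w word character [A-Za-z0-9_]
• \s whitespace (space, tab, newline, and similar)
Negation: [^\d] matches a non-digit. The uppercase shortcuts are the negated forms:
• \D anything not \d
• \S anything not \s
• \W anything not \w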
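A sketch applying the shortcuts with the /g modifier to one made-up string. In list context, m//g returns every match; the `() =` assignment is a common idiom for counting matches.

```perl
use strict;
use warnings;

my $text = "room 101, floor_2";

my @digits     = $text =~ /\d/g;       # each digit character
my @words      = $text =~ /\w+/g;      # runs of [A-Za-z0-9_]
my $non_digits = () = $text =~ /\D/g;  # count of non-digit characters
my $spaces     = () = $text =~ /\s/g;  # count of whitespace characters

print "digits: @digits\n";
print "words: @words\n";
print "non-digits: $non_digits, spaces: $spaces\n";
```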
Regular Expressions (8)
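Anchors:
^ - marks the beginning of the string
$ - marks the end of the string (or the position just before a final newline)
/^Hello Perl/ - matches “Hello Perl, goodbye Perl”, but not “Perl Hello Perl”
/^\s*$/ - matches blank lines (empty, or containing only whitespace)
The meaning of ‘^’ depends on where it appears:
/^abc/ - at the start of a pattern, ‘^’ anchors it to the beginning of the string
/a\^bc/ - ‘\^’ is a literal ‘^’, so this matches “a^bc”
/[^abc]/ - at the start of a character class, ‘^’ negates the class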
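A sketch that tags a few made-up strings with the anchored patterns above, then shows that dropping the ^ lets the pattern match anywhere in the string.

```perl
use strict;
use warnings;

my @tests = ("Hello Perl, goodbye Perl", "Perl Hello Perl", "   ", "", "a^bc");

my %props;
for my $s (@tests) {
    my @p;
    push @p, 'start' if $s =~ /^Hello Perl/;   # anchored at the beginning
    push @p, 'blank' if $s =~ /^\s*$/;         # only whitespace, or nothing
    push @p, 'caret' if $s =~ /a\^bc/;         # literal "^"
    $props{$s} = join ',', @p;
    print "'$s' => '$props{$s}'\n";
}

my $anywhere = "Perl Hello Perl" =~ /Hello Perl/;    # true: no anchor
my $at_start = "Perl Hello Perl" =~ /^Hello Perl/;   # false
```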
Regular Expressions (9)
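\b - matches at either end of a word (the position between a \w character and a non-\w character or the edge of the string)
/\bPerl\b/ - matches “Hello Perl” and “Perl”, but not “Perlish” or “SuperPerl”. It does match “Perl++”, because ‘+’ is not a word character.
\B - the negation of \b: matches at any position that is not a word boundary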
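A sketch contrasting \b and \B on a few made-up strings, including “Perl++”, which /\bPerl\b/ does match.

```perl
use strict;
use warnings;

my @candidates = ("Hello Perl", "Perl", "Perl++", "Perlish", "SuperPerl");

# \b sits between a \w character and a non-\w character (or a string edge)
my @whole_word = grep { /\bPerl\b/ } @candidates;

# \B is the opposite: here "Perl" must be preceded by a word character
my @inside_word = grep { /\BPerl/ } @candidates;

print "whole word: @whole_word\n";
print "Perl preceded by a word char: @inside_word\n";
```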
Regular Expressions (10)
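Backreferences: /(World|Perl) \1/ - matches “World World” and “Perl Perl”. \1 must repeat the text that group 1 actually captured, so “World Perl” does not match.
Groups are numbered by their opening parentheses, from left to right. In /((hello|hi) (world|Perl))/:
• \1 refers to ((hello|hi) (world|Perl))
• \2 refers to (hello|hi)
• \3 refers to (world|Perl)
After a successful match, $1, $2, $3 hold the text captured by groups 1, 2, 3 (the same text that \1, \2, \3 refer to inside the pattern).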
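A sketch of both ideas: \1 used inside the pattern, and $1, $2, $3 read after a successful match. The input strings are made up.

```perl
use strict;
use warnings;

# \1 inside the pattern must repeat exactly the text captured by group 1
my @pairs = ("World World", "Perl Perl", "World Perl");
my @repeated = grep { /^(World|Perl) \1$/ } @pairs;
print "repeated: @repeated\n";

# Groups are numbered by the position of their opening parenthesis
my ($all, $greeting, $target);
if ("hi Perl" =~ /((hello|hi) (world|Perl))/) {
    ($all, $greeting, $target) = ($1, $2, $3);
    print "\$1='$1' \$2='$2' \$3='$3'\n";
}
```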
Examples:
$date = "12 10 10";
if ($date =~ /(\d+)/) {
    print $1.":".$2.":".$3.":\n";   # output: 12:::  ($2 and $3 are undefined, printed as empty)
}
if ($date =~ /(\d+)(\s+\1)+/) {
    print $1.":".$2.":".$3.":\n";   # output: 10: 10::  (notice $3 is undefined)
}
$str = "Hello World";
if ($str =~ /((Hello|Hi) (World|Perl))/) {
    print $1.":".$2.":".$3.":\n";   # output: Hello World:Hello:World:
}
$str = "Hello Perl Hi";
if ($str =~ /((Hello|Hi) (World|Perl)) \1/) {
    print $1.":".$2.":".$3.":\n";   # no output: the match fails, \1 would need "Hello Perl" a second time
}
$str = "Hi Perl Hi Perl";
if ($str =~ /((Hi|Hello) (World|Perl)) \1/) {
    print $1.":".$2.":".$3.":\n";   # output: Hi Perl:Hi:Perl:
}
Examples
1. What does this pattern match? /^0x[0-9a-fA-F]+$/
2. Date format: Month-Day-Year -> Year:Day:Month
$date = "12-31-1901";
$date =~ s/(\d+)-(\d+)-(\d+)/$3:$2:$1/;   # $date is now "1901:31:12"
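A quick way to check an answer to both examples is to run them. The candidate strings for the first pattern are made up; the date comes from the slide.

```perl
use strict;
use warnings;

# Example 1: try the pattern on a few sample strings
my @candidates = ("0x1F", "0xdeadBEEF", "0x", "1F", "0x1G");
my @hex = grep { /^0x[0-9a-fA-F]+$/ } @candidates;
print "matches: @hex\n";

# Example 2: reorder Month-Day-Year into Year:Day:Month
my $date = "12-31-1901";
$date =~ s/(\d+)-(\d+)-(\d+)/$3:$2:$1/;
print "$date\n";
```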
Examples
3. Make a pattern that matches any line of input that has the same word repeated two (or more) times in a row. Whitespace between words may differ.
Example
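1. /\w+/ # matches a word
2. /(\w+)/ # capture it, to refer to it later
3. /(\w+)\1/ # the word two times, with nothing in between
4. /(\w+)\s+\1/ # whitespace between the words
5. “This is a test” still matches step 4 (“is is”: the end of “This” followed by “is”), so require a word start: /\b(\w+)\s+\1/
6. “This is the theory” still matches step 5 (“the the”, from “the theory”), so require a word end as well: /\b(\w+)\s+\1\b/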
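A sketch comparing the step 4 pattern with the final step 6 pattern on the slide's two sentences plus two made-up lines that really do repeat a word (one with a tab between the words).

```perl
use strict;
use warnings;

my @lines = (
    "This is a test",        # no repeated word
    "This is the theory",    # no repeated word
    "this is is a test",     # "is is"
    "see   the\tthe end",    # "the the", different whitespace
);

my @naive = grep { /(\w+)\s+\1/ } @lines;       # step 4: fires on all four
my @fixed = grep { /\b(\w+)\s+\1\b/ } @lines;   # step 6: word boundaries

print "naive: ", scalar(@naive), " lines\n";
print "fixed: $_\n" for @fixed;
```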
HomeWork
1) Write a regular expression that identifies a time in 24-hour clock format. For example: 0:01, 00:20, 15:00, 23:59
2) Write a regular expression that identifies a floating-point number. For example: 10, 10.0001, -0.1, +001.3456789
For both assignments write a single program that identifies these patterns in the input lines and prints out only the matched patterns.
HomeWork
3) Write a CGI Perl script that extracts all http links from a given WWW page.
Input: an http address, received from an HTML text box.
Output: a list of all http links found in <a href="link"> fields.
Input Examples:
http://www.tau.ac.il
http://www.cs.tau.ac.il
http://www.cnn.com
HomeWork (3)
Remarks:
1) You need to create two pages: (1) an HTML page with a text box, and (2) a CGI script that receives the input and produces the output HTML page.
2) The Unix command ‘wget’ downloads HTML files.
3) Use regular expressions. The code for parsing should be small, 3-10 lines.
Regular Expressions
Quantifiers:
/a{3,6}/ - matches “a” repeated 3, 4, 5, or 6 times
/(abc){3,}/ - matches three or more repetitions of “abc”
/a{3}/ - matches exactly three repetitions of “a” (without anchors it also matches inside a longer run such as “aaaa”)
* = {0,}    + = {1,}    ? = {0,1}
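A sketch running the counted quantifiers over runs of 1 to 7 “a” characters. The anchored versions require the whole string to be the repetition; the unanchored /a{3}/ matches any run that contains three a's.

```perl
use strict;
use warnings;

my @runs;
push @runs, 'a' x $_ for 1 .. 7;   # "a", "aa", ..., "aaaaaaa"

my @three_to_six  = grep { /^a{3,6}$/ } @runs;
my @exactly_three = grep { /^a{3}$/ } @runs;
my @unanchored    = grep { /a{3}/ } @runs;   # any run containing "aaa"

my $abc = "abcabcabc" =~ /^(abc){3,}$/;      # three or more "abc"

printf "{3,6}: %s\n", join ' ', map { length($_) } @three_to_six;
printf "{3}: %s\n", join ' ', map { length($_) } @exactly_three;
printf "{3} unanchored: %s\n", join ' ', map { length($_) } @unanchored;
```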
Negated Match
if ($str =~ /hello/) { … }    # true when $str matches
if ($str !~ /hello/) { … }    # negation: true when $str does NOT match
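A sketch splitting made-up lines into those that match and those that do not, using =~ and !~ with the same pattern.

```perl
use strict;
use warnings;

my @lines = ("hello there", "good morning", "say hello");

my @with    = grep { $_ =~ /hello/ } @lines;
my @without = grep { $_ !~ /hello/ } @lines;   # same as !($_ =~ /hello/)

print "with hello:    @with\n";
print "without hello: @without\n";
```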
Regular Expressions (11)
$& - the text that was actually matched
$` - the part of the string before the match
$' - the rest of the string after the match
$` . $& . $' - the original string
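A sketch reading the three variables after one match on a made-up sentence and checking that they concatenate back to the original string.

```perl
use strict;
use warnings;

my $str = "The quick brown fox";
my ($before, $match, $after);
if ($str =~ /quick \w+/) {
    ($before, $match, $after) = ($`, $&, $');
    print "before: '$before'\n";
    print "match:  '$match'\n";
    print "after:  '$after'\n";
}
```

On Perl versions before 5.20, using these variables anywhere in a program slowed down every regular expression match; newer versions no longer have that penalty.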
Regular Expressions (12)
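Substitutions:
s/T/U/;                  # substitutes T with U (only the first occurrence)
s/T/U/g;                 # global substitution: every occurrence
s/\s+/ /g;               # collapses each run of whitespace into one space
s/(\w+) (\w+)/$2 $1/g;   # swaps each pair of words
s/T/U/;                  # with no =~, applied to the $_ variable
$str =~ s/T/U/;          # applied to $str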
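A sketch running each substitution from the slide on made-up strings, including the default $_ form.

```perl
use strict;
use warnings;

my $once = "TATTOO";
(my $all = $once) =~ s/T/U/g;   # copy first, then replace every T in the copy
$once =~ s/T/U/;                # replaces only the first T

my $spaced = "too   many \t spaces";
$spaced =~ s/\s+/ /g;           # each whitespace run becomes one space

my $swapped = "one two three four";
$swapped =~ s/(\w+) (\w+)/$2 $1/g;   # swaps each non-overlapping pair

$_ = "Tea";
s/T/U/;                         # no =~ : operates on $_
my $default = $_;

print "$once\n$all\n$spaced\n$swapped\n$default\n";
```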
Regular Expressions (13)
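File Extension Renaming:
use strict; use warnings;
my ($from, $to) = @ARGV;
my @files = glob("*.$from");
foreach my $file (@files) {
    my $newfile = $file;
    $newfile =~ s/\.$from$/.$to/;   # $ anchors the match to the end of the name
    rename($file, $newfile) or warn "Cannot rename $file: $!\n";
}
Without the $ anchor, s/\.$from/\.$to/g would also rewrite “.$from” in the middle of a name (for example, “a.txt.txt” would become “a.bak.bak” instead of “a.txt.bak”).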
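A sketch that computes new names without touching the disk, so the anchored and unanchored substitutions can be compared safely. The helper names `new_name` and `new_name_unanchored` are hypothetical, and the `\Q...\E` quoting (which treats any metacharacters in the extension literally) is an addition not used on the slide.

```perl
use strict;
use warnings;

# Hypothetical helper: only a trailing ".$from" is replaced
sub new_name {
    my ($file, $from, $to) = @_;
    (my $new = $file) =~ s/\.\Q$from\E$/.$to/;
    return $new;
}

# Hypothetical helper: the unanchored, global version, for comparison
sub new_name_unanchored {
    my ($file, $from, $to) = @_;
    (my $new = $file) =~ s/\.$from/.$to/g;
    return $new;
}

for my $f ("notes.txt", "a.txt.txt", "my.txtfile.txt") {
    printf "%-16s anchored: %-16s unanchored: %s\n",
        $f, new_name($f, "txt", "bak"), new_name_unanchored($f, "txt", "bak");
}
```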
Split and Join
$str = "aaa bbb ccc dddd";
@words = split /\s+/, $str;
$str = join ':', @words;     # result is "aaa:bbb:ccc:dddd"
@words = split /\s+/, $_;    # " aaa b" -> "", "aaa", "b"
@words = split;              # " aaa b" -> "aaa", "b"
@words = split ' ', $_;      # " aaa b" -> "aaa", "b"
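A sketch confirming the results listed above: split /\s+/ keeps an empty leading field when the string starts with whitespace, while the special ' ' separator skips it.

```perl
use strict;
use warnings;

my $str = "aaa bbb ccc dddd";
my @words  = split /\s+/, $str;
my $joined = join ':', @words;           # "aaa:bbb:ccc:dddd"

my $padded = " aaa b";
my @with_empty = split /\s+/, $padded;   # leading whitespace gives a leading ""
my @trimmed    = split ' ', $padded;     # ' ' skips leading whitespace

print "$joined\n";
print scalar(@with_empty), " fields vs ", scalar(@trimmed), " fields\n";
```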
Grep
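grep EXPR, LIST;   # returns the elements of LIST for which EXPR is true ($_ is set to each element in turn)
@results = grep /^>/, @array;   # elements of @array that start with ">"
@results = grep /^>/, <FILE>;   # lines of FILE that start with ">"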
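A sketch of the first form on a made-up @array. It also shows that grep in scalar context returns the number of matching elements.

```perl
use strict;
use warnings;

my @array = (">seq1 description", "ACGT", ">seq2", "TTGA");

my @results = grep /^>/, @array;   # elements starting with ">"
my $count   = grep /^>/, @array;   # scalar context: how many matched

print "$_\n" for @results;
print "$count lines start with >\n";
```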
Regular Expressions (2)
Regular expressions in Unix: grep "include .*h" *.h
Here "include .*h" is a regular expression, while *.h is a shell glob (a wildcard pattern that expands to file names).
Defined/Undef
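my $i;
if (defined $i) { … }   # false: $i has not been assigned a value
$i = 0;
if (defined $i) { … }   # true: 0 is false, but it is a defined value
my %hash;               # or %hash = ();
if (%hash) { … }        # false: the hash is empty (defined(%hash) is a fatal error since Perl 5.22)
$hash{"1"} = "one";     # exists($hash{"1"}) is true,  defined($hash{"1"}) is true
undef $hash{"1"};       # exists($hash{"1"}) is true,  defined($hash{"1"}) is false
delete $hash{"1"};      # exists($hash{"1"}) is false, defined($hash{"1"}) is false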
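A runnable sketch of the same sequence. The helper `state_of` is hypothetical: it returns two digits, one for exists and one for defined, so each step can be checked.

```perl
use strict;
use warnings;

my $i;
my $before = defined $i ? 1 : 0;   # 0: declared but never assigned
$i = 0;
my $after  = defined $i ? 1 : 0;   # 1: 0 is false, but it is defined

my %hash;
my $empty = %hash ? 0 : 1;         # test emptiness with %hash itself

# Hypothetical helper: "11" = exists and defined, "10" = exists only, "00" = neither
sub state_of {
    my ($h, $k) = @_;
    return (exists $h->{$k} ? 1 : 0) . (defined $h->{$k} ? 1 : 0);
}

my @log;
$hash{"1"} = "one";  push @log, state_of(\%hash, "1");
undef $hash{"1"};    push @log, state_of(\%hash, "1");
delete $hash{"1"};   push @log, state_of(\%hash, "1");

print "defined before/after: $before/$after, empty: $empty\n";
print "exists+defined after assign/undef/delete: @log\n";
```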