Perl Regular Expressions

Regular Expressions

A regular expression (or pattern) in Perl is a template that either matches or does not match a given string.

if( $str =~ /hello/){…

while( <STDIN> ){ if( /hello/ ){…} } #without =~ the pattern is matched against $_

Regular Expressions in Perl:

@words = split /\s+/, $str;
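The three uses above (explicit binding with =~, matching against $_, and split) can be tried out with a small script. This is only a sketch; the sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sample input (an assumption, not from the slides).
my $str = "say hello to   Perl";

# Explicit binding: test $str against the pattern.
my $found = ($str =~ /hello/) ? 1 : 0;

# Without =~ the pattern is matched against the default variable $_.
my $found_default = 0;
for ($str) {                      # aliases $_ to $str
    $found_default = 1 if /hello/;
}

# split uses a pattern as the field separator.
my @words = split /\s+/, $str;

print "found=$found default=$found_default words=", scalar(@words), "\n";
# prints: found=1 default=1 words=4
```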

Regular Expressions (3)

/hello.you/ - matches any string that has ‘hello’, followed by exactly one character, followed by ‘you’.
/to.*ols/ - matches ‘to’, followed by any string, followed by ‘ols’.
/to?ols/ - the character before ‘?’ is optional. Thus there are only two matching strings: ‘tools’ and ‘tols’.
/to*ols/ - the character before ‘*’ may be repeated zero or more times. Matches ‘tools’, ‘tooooools’, ‘tols’ (but not ‘toxols’).
/to+ols/ - the character before ‘+’ may be repeated one or more times. Matches ‘tools’ and ‘tooooools’ (but not ‘tols’).

“.” matches any character except a newline (\n).
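A quick way to check these rules is to run each pattern over a few candidate strings. In this sketch the patterns are anchored with ^ and $ so that the whole string must match; the candidate list and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my @candidates = ('tools', 'tols', 'tooooools', 'toxols', 'tls');

# Each entry: a label and the pattern, anchored to the whole string.
my @patterns = (
    [ 'to?ols'  => qr/^to?ols$/  ],
    [ 'to*ols'  => qr/^to*ols$/  ],
    [ 'to+ols'  => qr/^to+ols$/  ],
    [ 'to.*ols' => qr/^to.*ols$/ ],
);

my %matched;
for my $entry (@patterns) {
    my ($label, $re) = @$entry;
    $matched{$label} = join ',', grep { $_ =~ $re } @candidates;
    print "$label: $matched{$label}\n";
}
# prints:
# to?ols: tools,tols
# to*ols: tools,tols,tooooools
# to+ols: tools,tooooools
# to.*ols: tools,tooooools,toxols

# "." does not match a newline, so this is 0.
my $dot_matches_newline = ("hello\nyou" =~ /hello.you/) ? 1 : 0;
print "dot matches newline: $dot_matches_newline\n";
```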

Grouping – parentheses ‘( )’ are used for grouping one or more characters.

/(tools)+/ matches “toolstoolstoolstools”.

Alternatives:
/hello (world|Perl)/ - matches “hello world”, “hello Perl”.

Character class:
/Hello [abcde]/ - matches “Hello a” or “Hello b” …
/Hello [a-e]/ - the same as above

Negating:
[^abc] - any character except a, b, c

Shortcuts:
• \d - digit
• \w - word character [A-Za-z0-9_]
• \s - whitespace

Negated shortcuts:
• [^\d] - matches a non-digit
• \D - anything not matched by \d
• \S - anything not matched by \s

Anchors:
^ - marks the beginning of the string
$ - marks the end of the string
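The character classes and shortcuts can be compared side by side on a few test strings. This is a minimal sketch; the test strings and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my @tests = ("Hello a", "Hello z", "Hello 7");

# A range class, its negation, and the \d / \D shortcuts.
my $in_class  = join ',', grep { /Hello [a-e]/  } @tests;
my $not_class = join ',', grep { /Hello [^a-e]/ } @tests;
my $digit     = join ',', grep { /Hello \d/     } @tests;
my $non_digit = join ',', grep { /Hello \D/     } @tests;

# Alternation inside a group.
my $alt = join ',', grep { /hello (world|Perl)/ } ("hello world", "hello Perl", "hello you");

print "[a-e]:  $in_class\n";    # Hello a
print "[^a-e]: $not_class\n";   # Hello z,Hello 7
print "\\d:     $digit\n";      # Hello 7
print "\\D:     $non_digit\n";  # Hello a,Hello z
print "alt:    $alt\n";         # hello world,hello Perl
```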

/^Hello Perl/ - matches “Hello Perl, goodbye Perl”, but not “Perl Hello Perl”

/^\s*$/ - matches all blank lines

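/^abc/ - “^” marks the beginning of a string
/a\^bc/ - “\^” matches a literal “^” character (the string “a^bc”)
/[^abc]/ - “^” as the first character inside brackets negates the class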

\b - matches at either end of a word (matches the start or the end of a group of \w characters)

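/\bPerl\b/ - matches “Hello Perl” and “Perl”, but not “Perls” or “Perl5”. Note that it also matches inside “Perl++”, because “+” is not a \w character and so there is a word boundary after the “l”.

\B - the opposite of \b: matches at any position that is not a word boundary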
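The anchors and word-boundary assertions can be checked with a few one-line tests. This is a sketch only; the helper sub hit and the hash keys are hypothetical names introduced for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $string matches $pattern, otherwise 0.
sub hit {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? 1 : 0;
}

my %result = (
    start_ok   => hit("Hello Perl, goodbye Perl", qr/^Hello Perl/),  # 1
    start_fail => hit("Perl Hello Perl",          qr/^Hello Perl/),  # 0
    blank      => hit("  \t ",                    qr/^\s*$/),        # 1
    not_blank  => hit(" x ",                      qr/^\s*$/),        # 0
    word       => hit("Hello Perl",               qr/\bPerl\b/),     # 1
    plural     => hit("Perls",                    qr/\bPerl\b/),     # 0
    non_bound  => hit("Perls",                    qr/Perl\B/),       # 1
    at_end     => hit("Perl",                     qr/Perl\B/),       # 0
);

print "$_ = $result{$_}\n" for sort keys %result;
```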

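Backreferences:
/(World|Perl) \1/ - matches “World World”, “Perl Perl”.

After a regular expression has matched, $1, $2, $3 hold the text captured by the first, second, and third groups (the same text that \1, \2, \3 refer to inside the pattern).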

Examples:

$date = "12 10 10";
if ($date =~ /(\d+)/) {
    print $1.":".$2.":".$3.":\n";
    # output ($2 and $3 are empty):
    # 12:::
}

if ($date =~ /(\d+)(\s+\1)+/) {
    print $1.":".$2.":".$3.":\n";
    # output (notice $3 is empty):
    # 10: 10::
}

$str = "Hello World";
if ($str =~ /((Hello|Hi) (World|Perl))/) {
    print $1.":".$2.":".$3.":\n";
    # output:
    # Hello World:Hello:World:
}

$str = "Hello Perl Hi";
if ($str =~ /((Hello|Hi) (World|Perl)) \1/) {
    print $1.":".$2.":".$3.":\n";
    # no output: the pattern does not match
}

$str = "Hi Perl Hi Perl";
if ($str =~ /((Hi|Hello) (World|Perl)) \1/) {
    print $1.":".$2.":".$3.":\n";
    # output:
    # Hi Perl:Hi:Perl:
}

Examples

1. What is it?
/^0x[0-9a-fA-F]+$/

2. Date format: Month-Day-Year -> Year:Day:Month
$date = "12-31-1901";
$date =~ s/(\d+)-(\d+)-(\d+)/$3:$2:$1/;
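The date substitution from example 2 can be run as is. The sketch below works on a copy so the original string is kept; the variable name $converted is an assumption for illustration.

```perl
use strict;
use warnings;

my $date = "12-31-1901";

# Copy $date, then reorder the three captured groups in the copy.
(my $converted = $date) =~ s/(\d+)-(\d+)-(\d+)/$3:$2:$1/;

print "$date -> $converted\n";
# prints: 12-31-1901 -> 1901:31:12
```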

Examples

3. Make a pattern that matches any line of input that has the same word repeated two (or more) times in a row. Whitespace between words may differ.

Example

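1. /\w+/ #matches a word
2. /(\w+)/ #captures it to remember later
3. /(\w+)\1/ #the same word two times
4. /(\w+)\s+\1/ #whitespace between the words
5. “This is a test” would match step 4 (“is is”), so anchor the start at a word boundary: /\b(\w+)\s+\1/
6. “This is the theory” would still match (“the the”), so anchor the end as well: /\b(\w+)\s+\1\b/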
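To see why the word boundaries matter, the pattern from step 4 and the final pattern can be run over the same lines. This is a sketch; the last two sample lines and the variable names are assumptions added for illustration.

```perl
use strict;
use warnings;

my @lines = (
    "This is a test",            # false positive for step 4: "is is"
    "This is the theory",        # false positive without the final \b: "the the"
    "Paris in the  the spring",  # a real repeated word, two spaces apart
    "that that is",              # a real repeated word
);

# Step 4: no word boundaries.
my @naive  = grep { /(\w+)\s+\1/ } @lines;

# Step 6: word boundaries on both sides.
my @strict = grep { /\b(\w+)\s+\1\b/ } @lines;

print "naive:  ", scalar(@naive),  " lines\n";   # 4
print "strict: ", scalar(@strict), " lines\n";   # 2
print "  $_\n" for @strict;
```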

HomeWork

1) Write a regular expression that identifies a 24-hour clock. For example: 0:01, 00:20, 15:00, 23:59

2) Write a regular expression that identifies a floating point. For example: 10, 10.0001, -0.1, +001.3456789

For both assignments write a single program that identifies these patterns in the input lines and prints out only the matched patterns.

HomeWork

3) Write a CGI Perl script that extracts all http links from a given WWW page.

Input: http address. It is received from an HTML text box.
Output: list of all http links found in <a href="link"> fields.

Input Examples:

http://www.tau.ac.il
http://www.cs.tau.ac.il
http://www.cnn.com

HomeWork (3)

Remarks:
1) You need to create two pages: (1) an html page with a text box, (2) a cgi script that receives the input and formats the output html file.

2) Unix command ‘wget’ downloads html files.

3) Use regular expressions. The code for parsing should be small, 3-10 lines.

Regular Expressions

Quantifiers:
/a{3,6}/ - matches “a” repeated 3, 4, 5, or 6 times
/(abc){3,}/ - matches three or more repetitions of “abc”
/a{3}/ - matches exactly three repetitions of “a”

* = {0,}
+ = {1,}
? = {0,1}
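The counted quantifiers can be checked on strings of known length. In this sketch the patterns are anchored with ^ and $ so that the count applies to the whole string (unanchored, /a{3,6}/ would also match inside a longer run of “a”); the hash keys are made up for illustration.

```perl
use strict;
use warnings;

my %result = (
    two_a      => ("aa"        =~ /^a{3,6}$/    ? 1 : 0),  # 0: too few
    four_a     => ("aaaa"      =~ /^a{3,6}$/    ? 1 : 0),  # 1
    seven_a    => ("aaaaaaa"   =~ /^a{3,6}$/    ? 1 : 0),  # 0: too many
    abc_twice  => ("abcabc"    =~ /^(abc){3,}$/ ? 1 : 0),  # 0
    abc_thrice => ("abcabcabc" =~ /^(abc){3,}$/ ? 1 : 0),  # 1
    exactly_3  => ("aaa"       =~ /^a{3}$/      ? 1 : 0),  # 1
    # "*" behaves like {0,}: zero occurrences are allowed.
    star_zero  => ("tols"      =~ /^to{0,}ols$/ ? 1 : 0),  # 1
);

print "$_ = $result{$_}\n" for sort keys %result;
```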

Negated Match

if( $str =~ /hello/){…

if( $str !~ /hello/){…

Negation

$& - the text that was actually matched
$` - the text before the match
$' - the rest of the string after the match

$` . $& . $' - the original string
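A short script shows how the three variables split a string around the match. This is a sketch with a made-up sample string; the values are copied into ordinary variables right after the match because a later successful match overwrites them.

```perl
use strict;
use warnings;

my $str = "Hello Perl world";   # sample input (an assumption)
my ($before, $matched, $after) = ('', '', '');

if ($str =~ /Perl/) {
    # Copy the special variables before another match overwrites them.
    ($before, $matched, $after) = ($`, $&, $');
}

print "before:  [$before]\n";   # [Hello ]
print "matched: [$matched]\n";  # [Perl]
print "after:   [$after]\n";    # [ world]
print "rebuilt equals original\n" if $before . $matched . $after eq $str;
```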

Substitutions:
s/T/U/; #substitutes T with U (only once)
s/T/U/g; #global substitution
s/\s+/ /g; #collapses whitespace
s/(\w+) (\w+)/$2 $1/g; #swaps each pair of words

s/T/U/; #applied on the $_ variable
$str =~ s/T/U/; #applied on $str
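The effect of each substitution is easiest to see on concrete input. This sketch applies them to copies of some made-up strings; all variable names and sample values are assumptions for illustration.

```perl
use strict;
use warnings;

my $dna = "ATTGT";
(my $once = $dna) =~ s/T/U/;     # first T only
(my $all  = $dna) =~ s/T/U/g;    # every T

my $text = "a   b \t c";
(my $collapsed = $text) =~ s/\s+/ /g;   # each whitespace run becomes one space

my $pairs = "one two three four";
(my $swapped = $pairs) =~ s/(\w+) (\w+)/$2 $1/g;   # swap the words in each pair

# Without =~ the substitution works on $_ (here aliased to $default).
my $default = "TT";
for ($default) { s/T/U/ }

print "$once $all [$collapsed] [$swapped] $default\n";
# prints: AUTGT AUUGU [a b c] [two one four three] UT
```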

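File Extension Renaming:
my ($from, $to) = @ARGV;
@files = glob("*.$from");
foreach $file (@files) {
    $newfile = $file;
    $newfile =~ s/\.$from/\.$to/g;
    rename($file, $newfile);
}

The substitution above replaces “.$from” anywhere in the file name. Anchoring the pattern to the end of the name changes only the extension:
$newfile =~ s/\.$from$/\.$to/;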
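The difference between the unanchored and the anchored substitution shows up on names that contain the extension text more than once. The sketch below only computes the new names and does not call rename; the file names, the fixed values standing in for @ARGV, and the use of \Q...\E to quote the extension are assumptions added for illustration.

```perl
use strict;
use warnings;

# Stand-ins for the command-line arguments (assumed values).
my ($from, $to) = ('txt', 'bak');

# Made-up file names; a real script would get these from glob("*.$from").
my @files = ('notes.txt', 'a.txt.txt', 'my.txtfile.txt');

my %new_name;
for my $file (@files) {
    # Anchored at the end: only the final extension is replaced.
    # \Q...\E quotes any regex metacharacters in $from.
    (my $newfile = $file) =~ s/\.\Q$from\E$/.$to/;
    $new_name{$file} = $newfile;
    print "$file -> $newfile\n";
}

# For comparison, the unanchored global version changes too much.
(my $loose = 'my.txtfile.txt') =~ s/\.\Q$from\E/.$to/g;
print "unanchored: my.txtfile.txt -> $loose\n";
# prints: unanchored: my.txtfile.txt -> my.bakfile.bak
```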

Split and Join

$str = "aaa bbb ccc dddd";
@words = split /\s+/, $str;
$str = join ':', @words; #result is "aaa:bbb:ccc:dddd"

@words = split /\s+/, $_; #" aaa b" -> "", "aaa", "b"
@words = split;           #" aaa b" -> "aaa", "b"
@words = split ' ', $_;   #" aaa b" -> "aaa", "b"
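The leading empty field is the one surprise here, and it is easy to confirm. A minimal sketch, with variable names chosen for illustration:

```perl
use strict;
use warnings;

my $str    = "aaa bbb ccc dddd";
my @words  = split /\s+/, $str;
my $joined = join ':', @words;
print "$joined\n";                       # aaa:bbb:ccc:dddd

my $padded = " aaa b";                   # note the leading space

# A pattern separator keeps the empty field before the first separator.
my @with_empty = split /\s+/, $padded;   # ("", "aaa", "b")

# The special separator ' ' skips leading whitespace first.
my @no_empty = split ' ', $padded;       # ("aaa", "b")

print scalar(@with_empty), " fields vs ", scalar(@no_empty), " fields\n";
# prints: 3 fields vs 2 fields
```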

grep EXPR, LIST;

@results = grep /^>/, @array; #elements of @array that start with ">"
@results = grep /^>/, <FILE>; #lines of FILE that start with ">"
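As a sketch of grep with a pattern, the following filters a small in-memory list. The FASTA-like sample data (header lines starting with ">") is an assumption; the slides do not say what the array holds.

```perl
use strict;
use warnings;

# Made-up sample data: header lines start with ">".
my @array = (">seq1", "ACGT", ">seq2", "TTGA");

# List context: the matching elements.
my @headers = grep /^>/, @array;

# Block form with a negated match: everything else.
my @others = grep { !/^>/ } @array;

# Scalar context: the number of matching elements.
my $count = grep /^>/, @array;

print "headers: @headers\n";   # headers: >seq1 >seq2
print "others:  @others\n";    # others:  ACGT TTGA
print "count:   $count\n";     # count:   2
```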

Regular Expressions in Unix:
grep "include .*h" *.h

Here "include .*h" is a regular expression, while *.h is a shell glob.

Defined/Undef

my $i;

if( defined $i ) #false: $i has not been assigned a value

if( defined $i ) #true once $i has been assigned a value

my %hash; #or %hash=();
if( %hash ) #false, hash is empty (defined %hash is deprecated and is a fatal error since Perl 5.22)

$hash{"1"} = "one";
#exists($hash{"1"}) is true, defined($hash{"1"}) is true

undef $hash{"1"};
#exists($hash{"1"}) is true, defined($hash{"1"}) is false

delete $hash{"1"};
#exists($hash{"1"}) is false, defined($hash{"1"}) is false
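The three hash states (value set, value undefined, key deleted) can be traced step by step. This is a sketch; the helper sub state and the result variable names are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

my $i;
my $before_assign = defined($i) ? 1 : 0;   # 0
$i = 0;                                    # 0 is false, but it is defined
my $after_assign  = defined($i) ? 1 : 0;   # 1

my %hash;
my $is_empty = %hash ? 0 : 1;              # 1: an empty hash is false

# Hypothetical helper: reports "exists,defined" for key "1".
sub state {
    my ($h) = @_;
    return join ',', (exists $h->{"1"} ? 1 : 0), (defined $h->{"1"} ? 1 : 0);
}

$hash{"1"} = "one";
my $after_set    = state(\%hash);   # "1,1"

undef $hash{"1"};
my $after_undef  = state(\%hash);   # "1,0": key is still there, value is undef

delete $hash{"1"};
my $after_delete = state(\%hash);   # "0,0": key is gone

print "$before_assign $after_assign $is_empty $after_set $after_undef $after_delete\n";
# prints: 0 1 1 1,1 1,0 0,0
```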
