perl training regex

PERL Regular Expressions

Regular Expressions (0)

• A regular expression is a template that either matches or doesn’t match a given string.

• Strong regular expression support is one of the most important features of Perl.

/PATTERN/


the “Dirty Dozen” – Metacharacters

These characters have special meaning in regular expressions.

A backslash in front of a metacharacter removes its special meaning, so the character matches itself literally.

\ . * + ? ( ) | [ { ^ $


/to.*ols/ matches ‘to’, followed by any string, followed by ‘ols’.

/hello.you/ matches any string that has ‘hello’, followed by any one (exactly one) character, followed by ‘you’.

/to*ols/ the character before ‘*’ may be repeated zero or more times. Matches ‘tols’, ‘tools’, ‘tooooools’ (but not ‘toxols’!)

/to+ols/ the character before ‘+’ may be repeated one or more times. Matches ‘tools’, ‘tooooools’ (but not ‘tols’).

“.” matches any character except a newline “\n”.

Quantifiers decide how many times the preceding item has to be repeated.
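The behaviour of ‘*’, ‘+’ and ‘.’ described above can be checked with a small sketch. The word list is made up for illustration; grep is used only to collect the words that match each pattern.

```perl
use strict;
use warnings;

# Hypothetical sample words, chosen to exercise '*', '+' and '.'.
my @words = ('tols', 'tools', 'toxols');

my @star = grep { /to*ols/ } @words;   # zero or more 'o' after the 't'
my @plus = grep { /to+ols/ } @words;   # one or more 'o' after the 't'
my @dot  = grep { /to.ols/ } @words;   # 'to', exactly one character, 'ols'

print "to*ols: @star\n";   # to*ols: tols tools
print "to+ols: @plus\n";   # to+ols: tools
print "to.ols: @dot\n";    # to.ols: toxols
```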


/to?ols/ the character before ‘?’ is optional. Thus, the pattern can match only two substrings: ‘tools’ and ‘tols’.

/to{2}ls/ the number in ‘{}’ gives the number of repetitions of the preceding character. Matches ‘tools’.

{count} - Match exactly count times

{min,max} - Match at least min but not more than max times

{min,} - Match at least min times
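A minimal sketch of the three ‘{}’ forms, using made-up words with one to four ‘o’ characters. The patterns are anchored with ‘^’ and ‘$’ (see Anchors) so that the whole word has to fit the count.

```perl
use strict;
use warnings;

# Hypothetical inputs: 1, 2, 3 and 4 'o' characters.
my @words = ('tols', 'tools', 'toools', 'tooools');

my @exactly_two  = grep { /^to{2}ls$/   } @words;   # {count}
my @two_to_three = grep { /^to{2,3}ls$/ } @words;   # {min,max}
my @at_least_two = grep { /^to{2,}ls$/  } @words;   # {min,}

print "{2}:   @exactly_two\n";    # {2}:   tools
print "{2,3}: @two_to_three\n";   # {2,3}: tools toools
print "{2,}:  @at_least_two\n";   # {2,}:  tools toools tooools
```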

Exercise: write the equivalent {} quantifier for each of ‘*’, ‘+’ and ‘?’.


Grouping – parentheses ‘( )’ are used for grouping one or more characters.

/(tools)+/ matches “toolstoolstoolstools”.

Alternatives:

/hello (world|Perl)/ - matches “hello world”, “hello Perl”.
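Grouping and alternatives can be tried out as follows. This is only a sketch; the sample strings are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical greetings; only the first two fit the alternatives.
my @greetings = ('hello world', 'hello Perl', 'hello there');
my @hits = grep { /hello (world|Perl)/ } @greetings;

# '+' applies to the whole group, so 'tools' may repeat any number of times.
my $repeated    = 'toolstoolstools';
my $is_repeated = ($repeated =~ /^(tools)+$/) ? 1 : 0;

print scalar(@hits), " greetings matched\n";   # 2 greetings matched
print "repeated group: $is_repeated\n";        # repeated group: 1
```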


Character Class - A list of all possible characters

/Hello [abcde]/ matches “Hello a” or “Hello b” …

/Hello [a-e]/ the same as above

Negating:

[^abc] any char except a,b,c


Shortcuts

• \d digit [0-9]

• \w word character [A-Za-z0-9_]

• \s white space [ \t\n\r\f]

Negation with ^: [^\d] matches any non-digit

\S anything not \s

\D anything not \d

\W anything not \w
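A short sketch of the shortcuts in use. It relies on the fact that a match with /g in list context returns every match; the sample text is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical sample text with digits, words, a tab and a newline.
my $text = "Room 42,\tfloor 3\n";

my @digits = $text =~ /\d/g;     # every single digit
my @words  = $text =~ /\w+/g;    # every run of word characters
my @gaps   = $text =~ /\s+/g;    # every run of whitespace

# \D is the opposite of \d: delete everything that is not a digit.
(my $only_digits = $text) =~ s/\D//g;

print "digits: @digits\n";               # digits: 4 2 3
print "words:  @words\n";                # words:  Room 42 floor 3
print "gaps:   ", scalar(@gaps), "\n";   # gaps:   4
print "only digits: $only_digits\n";     # only digits: 423
```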

Exercise: write the character classes for:

1. Matching of vowels

2. Matching of consonants

3. Anything other than non-numbers

What is the difference between \D and [^\d]?


Anchors

^ - marks the beginning of the string

$ - marks the end of the string

/^Hello Perl/ - matches “Hello Perl, goodbye Perl”, but not “Perl Hello Perl”

What pattern will match blank lines ?

/^\s*$/ - matches all blank lines

/^abc/ - “^” beginning of a string

/a\^bc/ - the backslash makes “^” literal: matches “a^bc”

/[^abc]/ - negating
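The three roles of “^” and the blank-line pattern can be compared in one sketch. The input lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input lines, including an empty one and one with only whitespace.
my @lines = ('Hello Perl, goodbye Perl', 'Perl Hello Perl', '', "   \t", 'a^bc');

my @starts = grep { /^Hello Perl/ } @lines;   # '^' as anchor
my @blank  = grep { /^\s*$/       } @lines;   # nothing but optional whitespace
my @caret  = grep { /a\^bc/       } @lines;   # '\^' as a literal character
my @other  = grep { /^[^abc]+$/   } @lines;   # '[^...]' as negation: no a, b or c anywhere

print scalar(@starts), " line(s) start with 'Hello Perl'\n";   # 1
print scalar(@blank),  " blank line(s)\n";                     # 2
print scalar(@caret),  " line(s) contain a literal a^bc\n";    # 1
```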


\b - matches at either end of a word (matches the start or the end of a group of \w characters)

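/\bPerl\b/ - matches “Hello Perl”, “Perl” and also “Perl++” (there is a word boundary between “l” and “+”)

but not “Perls” or “Perl5”

\B - negative of \b (matches wherever \b does not)

Exercise: which part of “ That’s my house” does /^\w+\b/ match?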
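A sketch of \b and \B on a few invented samples. Note that “Perl++” is matched by /\bPerl\b/, because “+” is not a word character and so a boundary exists right after the “l”.

```perl
use strict;
use warnings;

# Hypothetical samples: 'Perl' as a whole word and as part of a longer word.
my @samples = ('Hello Perl', 'Perl', 'Perl++', 'Perls', 'superPerl');

my @whole_word = grep { /\bPerl\b/ } @samples;   # boundary on both sides
my @inside     = grep { /\BPerl/   } @samples;   # NOT at the start of a word

print "whole word: @whole_word\n";   # whole word: Hello Perl Perl Perl++
print "inside:     @inside\n";       # inside:     superPerl
```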


Option modifiers

/i : Case insensitive

/s : “.” will match “\n”

/m : Let “^” & “$” match next to embedded “\n”

/x : Ignore whitespace (and allow # comments) inside the pattern

/o : Compile the pattern once
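The effect of /i, /s, /m and /x can be seen by running the same text with and without each modifier. This is a sketch; the two-line sample text is an assumption.

```perl
use strict;
use warnings;

# Hypothetical two-line text with an embedded newline.
my $text = "first line\nSecond LINE\n";

my $i_flag      = ($text =~ /second line/i)  ? 1 : 0;   # case ignored
my $dot_plain   = ($text =~ /line.Second/)   ? 1 : 0;   # '.' stops at "\n"
my $dot_s       = ($text =~ /line.Second/s)  ? 1 : 0;   # '.' may match "\n"
my $caret_plain = ($text =~ /^Second/)       ? 1 : 0;   # '^' only at string start
my $caret_m     = ($text =~ /^Second/m)      ? 1 : 0;   # '^' also after "\n"
my $x_flag      = ($text =~ / first \s line /x) ? 1 : 0; # spaces in pattern ignored

print "i=$i_flag dot=$dot_plain dot_s=$dot_s ",
      "caret=$caret_plain caret_m=$caret_m x=$x_flag\n";
# i=1 dot=0 dot_s=1 caret=0 caret_m=1 x=1
```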


Bind operator “=~” tells Perl to match the pattern on the right against the string on the left.

Pattern match operator “m//”:

$str =~ /pattern/;

$str =~ m/pattern/;

if( $str =~ /hello/){

…

}

while( <STDIN> ){

if( /hello/ ){

…

}

}

When no variable is mentioned, the pattern is matched against the default variable “$_”.
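The two styles, explicit binding and the implicit “$_”, can be put side by side. In this sketch an in-memory list (made up for the example) stands in for <STDIN>.

```perl
use strict;
use warnings;

# Hypothetical input lines instead of reading from STDIN.
my @input = ("hello there\n", "goodbye\n", "say hello\n");

my $count = 0;
for (@input) {             # each element is aliased to $_
    $count++ if /hello/;   # same as: $_ =~ /hello/
}

my $str   = 'hello world';
my $bound = ($str =~ m/hello/) ? 'yes' : 'no';   # explicit binding with =~

print "lines with hello: $count\n";   # lines with hello: 2
print "bound match: $bound\n";        # bound match: yes
```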


Examples

$date="12 10 10";

if($date=~ /(\d+)/){ print $1.":".$2.":".$3.":\n"; }

#output ($2 and $3 are empty): 12:::

if($date=~ /(\d+)(\s+\1)+/){ print $1.":".$2.":".$3.":\n"; }

#output (notice $3 is empty): 10: 10::

$str="Hello World";

if($str=~ /((Hello|Hi) (World|Perl))/){ print $1.":".$2.":".$3.":\n"; }

#output: Hello World:Hello:World:

$str="Hello Perl Hi";

if($str=~ /((Hello|Hi) (World|Perl)) \1/){ print $1.":".$2.":".$3.":\n"; }

#output: none (\1 requires “Hello Perl” to appear a second time)

$str="Hi Perl Hi Perl";

if($str=~ /((Hello|Hi) (World|Perl)) \1/){ print $1.":".$2.":".$3.":\n"; }

#output: Hi Perl:Hi:Perl:

Examples

1. What is it?

/^0x[0-9a-fA-F]+$/

2. Date format: Month-Day-Year -> Year:Day:Month

$date = "12-31-1901";

$date =~ s/(\d+)-(\d+)-(\d+)/$3:$2:$1/;


3. Make a pattern that matches any line of input that has the same word repeated two (or more) times in a row. Whitespace between words may differ.

Example

1. /\w+/ #matches a word

2. /(\w+)/ #to remember later

3. /(\w+)\1/ #two times

4. /(\w+)\s+\1/ #whitespace between words

5. /\b(\w+)\s+\1/ #leading \b: otherwise “This is a test” matches as “is is”

6. /\b(\w+)\s+\1\b/ #trailing \b: otherwise “This is the theory” matches as “the the”
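The difference between step 4 and step 6 shows up when both patterns run over the same lines. A sketch with the two problem sentences from above plus one invented line that really contains a repeated word:

```perl
use strict;
use warnings;

# The last line is a made-up example with a genuine repeated word.
my @lines = ('This is a test', 'This is the theory', 'Paris in the  the spring');

my @naive  = grep { /(\w+)\s+\1/     } @lines;   # step 4: no word boundaries
my @strict = grep { /\b(\w+)\s+\1\b/ } @lines;   # step 6: whole words only

print "naive:  ", scalar(@naive),  " line(s)\n";   # naive:  3 line(s)
print "strict: ", scalar(@strict), " line(s)\n";   # strict: 1 line(s)
```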

Let’s try

1) Write a regular expression that identifies a time on a 24-hour clock. For example: 0:01, 00:20, 15:00, 23:59

2) Write a regular expression that identifies a floating-point number. For example: 10, 10.0001, -0.1, +001.3456789

For both, write a single program that finds these patterns in the input lines and prints only the matched text.

Negated Match

if( $str =~ /hello/){ #true if $str matches

…

}

if( $str !~ /hello/){ #negation: true if $str does not match

…

}


$& - what really was matched

$` - what was before

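$' - the rest of the string after the matched pattern

$` . $& . $' - original string

Caution: Never use these variables in your script unless you really need them. In older versions of Perl (before 5.20), using any of them slows down every pattern match in the program.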
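One alternative, available since Perl 5.10, is the /p modifier together with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}, which hold the same three pieces for that match only. A minimal sketch with an invented string:

```perl
use strict;
use warnings;

# Hypothetical input string.
my $str = 'Hello Perl world';

my ($pre, $match, $post);
if ($str =~ /Perl/p) {     # /p: preserve the pieces of this match
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "[$pre][$match][$post]\n";   # [Hello ][Perl][ world]
```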


Substitutions:

s/T/U/; #substitutes T with U (only once)

s/T/U/g; #global substitution

s/\s+/ /g; #collapses each run of whitespace into a single space

s/(\w+) (\w+)/$2 $1/g;

s/T/U/; #applied on $_ variable

$str =~ s/T/U/;
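The substitutions above, applied to made-up strings. The idiom (my $copy = $orig) =~ s/…/…/ is used here so that the original string stays unchanged; it is a choice made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical inputs.
my $dna   = 'ATTGT';
my $text  = "a   b \t c";
my $pairs = 'one two three four';

(my $once     = $dna)   =~ s/T/U/;                  # first T only
(my $all      = $dna)   =~ s/T/U/g;                 # every T
(my $squeezed = $text)  =~ s/\s+/ /g;               # runs of whitespace -> one space
(my $swapped  = $pairs) =~ s/(\w+) (\w+)/$2 $1/g;   # swap each pair of words

print "$once\n";       # AUTGT
print "$all\n";        # AUUGU
print "$squeezed\n";   # a b c
print "$swapped\n";    # two one four three
```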


File Extension Renaming:

my ($from, $to) = @ARGV;

@files = glob ("*.$from");

foreach $file (@files){

$newfile = $file;

$newfile =~ s/\.$from/\.$to/g;

rename($file, $newfile);

}

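Better: anchor the pattern with “$” so that only the extension at the end of the name is replaced (“/g” is then unnecessary):

$newfile =~ s/\.$from$/\.$to/;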
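The name computation can be tested without touching any files by moving it into a helper. In this sketch new_name is a hypothetical sub, and \Q…\E is an addition not used above: it quotes any metacharacters inside $from (for example the dot in “tar.gz”).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the new file name, does not rename anything.
sub new_name {
    my ($file, $from, $to) = @_;
    # \Q...\E makes $from literal; '$' ties the match to the end of the name.
    (my $new = $file) =~ s/\.\Q$from\E$/.$to/;
    return $new;
}

print new_name('notes.txt',     'txt', 'bak'), "\n";   # notes.bak
print new_name('notes.txt.txt', 'txt', 'bak'), "\n";   # notes.txt.bak
print new_name('txt.file.doc',  'txt', 'bak'), "\n";   # txt.file.doc
```

The result could then be passed to rename($file, new_name($file, $from, $to)) inside the loop.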

Split and Join

$str="aaa bbb ccc dddd";

@words = split /\s+/, $str;

$str = join ':', @words; #result is "aaa:bbb:ccc:dddd"

@words = split /\s+/, $_; #" aaa b" -> "", "aaa", "b"

@words = split; #" aaa b" -> "aaa", "b"

@words = split ' ', $_; #" aaa b" -> "aaa", "b"
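The leading empty field is easy to miss, so here is a sketch that compares the two forms of split on one invented line with leading whitespace.

```perl
use strict;
use warnings;

# Hypothetical line with leading whitespace.
my $line = '  aaa   b';

my @with_empty = split /\s+/, $line;   # pattern: leading empty field is kept
my @awk_style  = split ' ',   $line;   # special ' ': leading whitespace skipped

my $joined = join ':', @awk_style;

print scalar(@with_empty), " fields with /\\s+/\n";   # 3 fields with /\s+/
print scalar(@awk_style),  " fields with ' '\n";      # 2 fields with ' '
print "$joined\n";                                    # aaa:b
```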

Grep

grep EXPR, LIST;

@results = grep /^>/, @array;

@results = grep /^>/, <FILE>;
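A sketch of grep with a pattern, using a made-up array instead of a file. In list context grep returns the matching elements; in scalar context it returns how many matched.

```perl
use strict;
use warnings;

# Hypothetical data: header lines start with '>'.
my @array = ('>seq1', 'ACGT', '>seq2', 'TTGA');

my @headers  = grep /^>/, @array;       # list context: the matching elements
my $n_others = grep { !/^>/ } @array;   # scalar context: number of matches

print "headers: @headers\n";      # headers: >seq1 >seq2
print "other lines: $n_others\n"; # other lines: 2
```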

Thank You !!!
