
Perl Regular expression: string manipulation

Upload: leona-wright

Post on 04-Jan-2016



substr function

• string = substr(string2, start, length)
  - returns the substring of string2 that begins at position start (counting from 0) and is length characters long
  - string2 is not changed
  - $str2 = "Hi There";
  - $str = substr($str2, 3, 2);
    – $str = "Th"; # the 4th and 5th characters
• substr(string, start, length) = string2
  - replaces the length characters beginning at start with string2; the string grows or shrinks as needed
  - $str2 = "Hi There"; $str = "hi";
  - substr($str2, 3, 3) = $str; # insert and replace
    – $str2 = "Hi hire";
  - substr($str2, 3, 0) = $str; # insert only
    – $str2 = "Hi hihire";
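The substr forms on this slide can be tried side by side. The sketch below reuses the slide's "Hi There" string; the variable names are illustrative only, and the four-argument form of substr is shown as an alternative to the assignment form, not as something the slide itself uses.

```perl
use strict;
use warnings;

my $text = "Hi There";

# Read-only extraction: 2 characters starting at position 3.
my $piece = substr($text, 3, 2);

# Assignment (lvalue) form: overwrite the 3 characters "The" with "hi".
my $replaced = $text;
substr($replaced, 3, 3) = "hi";

# Four-argument form: same replacement, and the removed text is returned.
my $four_arg = $text;
my $old = substr($four_arg, 3, 3, "hi");

# A length of 0 inserts without removing anything.
my $inserted = $replaced;
substr($inserted, 3, 0) = "hi";

print "$piece|$replaced|$old|$four_arg|$inserted\n";
```

The four-argument form is handy when the removed text is needed afterwards.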


index and rindex

• index string, substring [, position]
  - returns the position (counting from 0) of the first occurrence of substring in string, else -1
  - with position, the search starts at that position; returns -1 if substring does not occur at or after it
• rindex string, substring [, position]
  - returns the position of the last occurrence of substring, else -1
  - with position, that is the rightmost position that may be returned
• $pos = index $str, $str2;
  - returns the position where $str2 is found in $str
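As a small sketch of the optional position argument, the loop below collects every occurrence of a substring by restarting index just past the previous hit. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $str    = "There there Jim, there";
my $needle = "there";

# Walk forward: each search starts one character past the last hit.
my @positions;
my $pos = index($str, $needle);
while ($pos != -1) {
    push @positions, $pos;
    $pos = index($str, $needle, $pos + 1);
}

# Walk backward: the last occurrence, then the one before it.
my $last   = rindex($str, $needle);
my $before = rindex($str, $needle, $last - 1);

# A substring that is absent gives -1.
my $missing = index($str, "Fred");

print "@positions $last $before $missing\n";
```

The search is case sensitive, so the leading "There" is not counted.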


example of substr and index

• $str = "There there Jim";
• $sstr = "Jim";
• $replace = "Fred";
• substr($str, (index $str, $sstr), 3) = $replace;
  - replaces Jim with Fred in $str
  - $str = "There there Fred";

• The substitution operator is an easier way to do this.
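One caveat with the substr and index combination: when the substring is absent, index returns -1, and substr treats -1 as "the last character". A hedged sketch of a guarded version follows; it also uses length $sstr instead of the hard-coded 3, and the second string is an invented counter-example.

```perl
use strict;
use warnings;

my $str     = "There there Jim";
my $sstr    = "Jim";
my $replace = "Fred";

# Replace only when index actually found the substring.
my $pos = index($str, $sstr);
if ($pos != -1) {
    substr($str, $pos, length $sstr) = $replace;
}

# No "Jim" here, so the guard leaves the string alone.
my $untouched = "There there Bob";
my $miss = index($untouched, $sstr);
substr($untouched, $miss, length $sstr) = $replace if $miss != -1;

print "$str\n$untouched\n";
```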


grep

• LIST = grep EXPR, LIST
• LIST = grep BLOCK LIST
• like map, each element is assigned to $_ in turn and BLOCK or EXPR is evaluated; the elements for which it is true are put into the result list.
  @new = grep /[a-zA-Z]/, @lines;
• NOTE: altering $_ will alter the original list
  @list = qw(barney fred dino wilma);
  @greplist = grep {s/^[bfd]//} @list;
  - @greplist = ("arney", "red", "ino")
  - @list = ("arney", "red", "ino", "wilma")
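If the original list should stay intact, one option is to filter with a plain match and transform copies afterwards. This is a sketch, not the slide's method; it assumes Perl 5.14 or later for the /r flag, which returns the modified string instead of changing $_.

```perl
use strict;
use warnings;

my @list = qw(barney fred dino wilma);

# Filter only: a plain pattern match does not modify $_.
my @matched = grep { /^[bfd]/ } @list;

# Transform copies: s///r returns the changed string and leaves
# the element itself untouched (needs Perl 5.14 or later).
my @stripped = map { s/^[bfd]//r } @matched;

print "@list\n@matched\n@stripped\n";
```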


s/// Operator (Substitution)

• $str =~ s/pattern to match/replacement/;
  - find the first match and replace it
• $str =~ s/pattern to match/replacement/g;
  - find all matches and replace each of them
• Simple substitution
• $str = "3 dogs bit 1 dog";
• $str =~ s/dog/cat/;
  - $str = "3 cats bit 1 dog";
• $str =~ s/dog/cat/g;
  - $str = "3 cats bit 1 cat";


s/// Operator (Substitution) (2)

• $str =~ s/pattern//;
  - remove the pattern found
• $str = "abad";
• $str =~ s/a//g;
  - $str = "bd";
• From the substr and index slide
  $str =~ s/$sstr/$replace/;
  OR
  $str =~ s/Jim/Fred/;


case insensitive substitution

• /i ignore case
• $str = "Dog, dog, dOg";
• $str =~ s/DOG/cat/ig;
  - $str = "cat, cat, cat";
• $str = "Dog, dog, dOg";
• $str =~ s/DOG/cAt/ig;
  - $str = "cAt, cAt, cAt";
  - The replacement string is inserted exactly as written; /i affects only the match.


examples

• $str = "fred xxx barney"; # each substitution below starts from this value
  - $str =~ s/x/boom/;
    – $str = "fred boomxx barney";
  - $str =~ s/x/boom/g;
    – $str = "fred boomboomboom barney";
  - $str =~ s/x+/boom/;
    – $str = "fred boom barney";


alternation and group matching

• | allows an or'd matching
• $str = "Wilma Flintstone";
• $str =~ s/Fred|Wilma|Pebbles/Dino/g;
  - $str = "Dino Flintstone";
  - Replace all instances of Fred or Wilma or Pebbles with Dino.
• $str = "1st time winner";
• $str =~ s/(1st|2nd|3rd) time/Last place/;
  - $1 is the text matched by the group, "1st"; the entire match is "1st time"
  - $str = "Last place winner";


single character substitution

• Using []
• $str =~ s/[abc]/d/; # replace the first a, b, or c with d
• $str =~ s/[Fred]/x/g;
  - If $str was "Fred", afterwards it would be "xxxx"
• $str =~ s/[^aeiouAEIOU]/_/g;
  - replace any non-vowel with an _
• Common mistake:
• $str =~ s/[a-z]/[A-Z]/g;
  - Intended to replace each lower case letter with its upper case letter, but the replacement side is a literal string (not a pattern)
  - if $str = "hi", then it would be "[A-Z][A-Z]";
  - NOTE: $str = uc $str; # upper cases a string.
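For comparison, here is the mistaken substitution next to three ways that do upper-case a string. uc, tr/// and \U are all covered elsewhere in these slides; the variable names are only for this sketch.

```perl
use strict;
use warnings;

# The mistake: the replacement side is inserted literally.
(my $mistake = "hi") =~ s/[a-z]/[A-Z]/g;

# Three working alternatives.
my $with_uc = uc "hi";                     # built-in function
(my $with_tr = "hi") =~ tr/a-z/A-Z/;       # character-by-character mapping
(my $with_s  = "hi") =~ s/([a-z])/\U$1/g;  # \U upper-cases the captured letter

print "$mistake $with_uc $with_tr $with_s\n";
```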


matching quantifiers

• $str =~ s/a{3}/b/;
  - the first instance of aaa is replaced with b
• $str = "aaaaa"; # use this for the rest of the slide
• $str =~ s/a{3,}/b/; # max matching
  - $str = "b";
• $str =~ s/a{3,}?/b/; # min matching
  - $str = "baa"; # only sub 3 to make a min match
• $str =~ s/(a{3,}?)(a*)/b/;
  - $str = "b"; $1 = "aaa"; $2 = "aa";
• $str =~ s/(a{3,})(a*)/b/;
  - $str = "b"; $1 = "aaaaa"; $2 = "";
• $str =~ s/(a{3,}?)(a*?)/b/; # min match on both
  - $str = "baa"; $1 = "aaa"; $2 = "";


matching quantifiers (2)

• $str = "aaaaab"; # use this for the rest of the slide
• $str =~ s/a{3,}?b/c/;
  - $str = "c". Why? The match starts at the first a, and to succeed it has to take all the a's to reach the b.
• + is 1 or more and ? is 0 or 1 time (max match)
• $str =~ s/(a+)(b?)/c/;
  - $str = "c"; $1 = "aaaaa"; $2 = "b";
• $str =~ s/(a+?)(b??)/c/; # min match
  - $str = "caaaab"; $1 = "a"; $2 = "";
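The greedy and minimal results above are easy to check with a small script. try_sub below is a hypothetical helper written for this sketch: it applies a pattern to a copy of the input and reports the new string together with the two captures.

```perl
use strict;
use warnings;

# Hypothetical helper: substitute "c" for the first match in a copy of
# $input and return "new string|capture 1|capture 2".
sub try_sub {
    my ($input, $pattern) = @_;
    my $copy = $input;
    $copy =~ s/$pattern/c/ or return "$copy (no match)";
    my $first  = defined $1 ? $1 : "";
    my $second = defined $2 ? $2 : "";
    return "$copy|$first|$second";
}

my $greedy = try_sub("aaaaab", qr/(a+)(b?)/);
my $lazy   = try_sub("aaaaab", qr/(a+?)(b??)/);

print "$greedy\n$lazy\n";
```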


matching quantifiers (3)

• Example: perl doesn't always do what you think.
• $str = "ddogg";
• $str =~ s/d.*g/cat/;
  - $str = "cat"; # max match, makes sense
• $str = "ddogg";
• $str =~ s/d.*?g/cat/;
  - $str = "catg"; # min match from the leftmost d ("ddog"), not the shortest match possible ("dog")
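The reason is that the engine takes the leftmost starting point first and only then applies the minimal quantifier. One way to get the shorter "dog" match, offered here as a sketch and not as the slide's recommendation, is to forbid another d inside the match with a negated character class.

```perl
use strict;
use warnings;

(my $greedy = "ddogg") =~ s/d.*g/cat/;      # takes everything up to the last g
(my $lazy   = "ddogg") =~ s/d.*?g/cat/;     # leftmost d, then the nearest g
(my $tight  = "ddogg") =~ s/d[^d]*?g/cat/;  # no second d allowed inside the match

print "$greedy $lazy $tight\n";
```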


matching quantifiers (4)

• More examples (with the $_ variable); each substitution starts from this value
  $_ = "a xxx c xxxxx c xxx d";
• s/x{1,}/d/g; produces "a d c d c d d"
• s/x{1,}?/d/g; produces "a ddd c ddddd c ddd d"
• s/x{1,2}/d/g; produces "a dd c ddd c dd d"
• s/x{1,3}/d/g; produces "a d c dd c d d"
• s/x{2,2}/d/g; produces "a dx c ddx c dx d"
  - or s/x{2}/d/g;


Anchoring

• $str = "Fred Flintstone Fred"; # each substitution below starts from this value
• $str =~ s/Fred/Wilma/g;
  - Replaces all instances of Fred with Wilma
• $str =~ s/Fred$/Wilma/g;
  - Only the last instance, "Fred Flintstone Wilma", even with the /g flag
• $str =~ s/^Fred/Wilma/g;
  - Only the first instance, "Wilma Flintstone Fred", even with the /g flag
• $str = "abcd";
• $str =~ s/^[abc]+/d/;
  - $str = "dd";


Parentheses as memory

• s/a(.)b(.)c\2d\1/a mess/;
  - "adbecedd" is converted to "a mess"
  - "adbecdde" is not converted.
• s/a(.*)b\1c/a mess/;
  - "addbddc" changes to "a mess"
  - "adddbddc" is not changed
• To keep the text a group matched, use \1 .. \9 in the replacement ($1 .. $9 is the preferred spelling there; \1 still works but draws a warning under "use warnings")
• s/a(.*)b\1c/What is this: \1/;
  - "addbddc" converted to "What is this: dd"
  - again $1 = "dd"
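A short check of the backreference examples, written with $1 in the replacement so that it runs cleanly under "use warnings". The variable names are illustrative.

```perl
use strict;
use warnings;

# \1 inside the pattern must repeat whatever group 1 matched.
my $str = "addbddc";
$str =~ s/a(.*)b\1c/What is this: $1/;

# Here group 1 would have to be "ddd", but only "dd" follows the b,
# so nothing matches and s/// returns a false value.
my $no_match = "adddbddc";
my $changed  = ($no_match =~ s/a(.*)b\1c/a mess/);

print "$str\n$no_match\n";
```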


metasymbols

• a very common substitution
  - s/\s+/ /g; # replace each run of whitespace with a single space.
    – " a b\t c" changes to " a b c"
• remove word character duplicates
  - $str = "11aabbdccaa";
  - $str =~ s/(\w)\1/\1/g;
    – $str = "1abdca";
• Remove any duplicates
  - $str = "11 ,,aa";
  - $str =~ s/(.)\1/\1/g;
    – $str = "1 ,a";
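Note that (.)\1 collapses pairs, so a run of three or more repeated characters is only partly reduced. The sketch below contrasts it with (.)\1+, which collapses a whole run; the sample string is made up for illustration.

```perl
use strict;
use warnings;

my $input = "aaabbbb";

# Pair by pair: "aa" becomes "a", the third a is left alone,
# and each "bb" becomes "b".
(my $pairwise = $input) =~ s/(.)\1/$1/g;

# Whole runs: one character followed by one or more copies of itself.
(my $runs = $input) =~ s/(.)\1+/$1/g;

print "$pairwise $runs\n";
```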


Metasymbols (2)

• \U upper case until \E and \L lower case until \E
• Example
• s/a(.*)b\1c/What is this: \U\1\E/;
  - "addbddc" converted to "What is this: DD"
• s/a(.*)b\1c/What is this: \L\1\E/;
  - "addbddc" converted to "What is this: dd"
• \Q … \E quotes the regex metacharacters in between, so they match literally
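\Q matters most when a pattern comes from a variable. In this sketch (the strings are invented), the unquoted dot matches any character, while the quoted version matches only a literal dot.

```perl
use strict;
use warnings;

my $dot  = "3.50";
my $text = "3x50 and 3.50";

# Unquoted: "." is a metacharacter, so "3x50" is the first match.
(my $unquoted = $text) =~ s/$dot/PRICE/;

# Quoted: \Q ... \E makes the dot literal, so only "3.50" matches.
(my $quoted = $text) =~ s/\Q$dot\E/PRICE/;

print "$unquoted\n$quoted\n";
```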


Exercise 10

• What is the outcome of the following substitutions? Use $_ = "ad dog cd"

1. s/dog//;

2. while (/  /) { s/  / /g; }

3. s/(\w+)\s+(\w+)/$2 $1/g;

4. s/(.+)d/Dd/g;

5. s/(.+?)d/Dd/g;

6. s/(\S+)/=\1=/g;

7. Write a substitution to change each vowel to an X.


s/// flags

• like the match operator
• /m let ^ and $ match next to an embedded \n
• /s let . match newline
• /x ignore whitespace and permit comments
• s/// flags
• /g replace globally, i.e. all occurrences (the match operator also has a /g flag, where it finds all matches)
• /e evaluate the right side as an expression
  - in other words, perl interprets the right side as perl code, and its return value is used as the replacement
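A minimal sketch of the /m, /s and /x flags on one multi-line string. The string and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $text = "fred\nbarney\nfred";

# Without /m, ^ matches only at the very start of the string.
(my $plain = $text) =~ s/^fred/Wilma/g;

# With /m, ^ also matches just after each embedded newline.
(my $multi = $text) =~ s/^fred/Wilma/mg;

# With /s, . can match a newline, so .* runs across lines.
(my $single = $text) =~ s/barney.*/dino/s;

# With /x, whitespace and comments inside the pattern are ignored.
(my $spaced = $text) =~ s/
    ^ fred      # fred at the start of a line
    \n          # followed by a newline
/Wilma /xm;

print join("|", map { s{\n}{\\n}gr } $plain, $multi, $single, $spaced), "\n";
```

The /x flag applies to the pattern only; the replacement "Wilma " keeps its trailing space.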


/e flag

• s/(\d+)/sprintf("%#x",$1)/ge;
  - convert all numbers to hex
  - "2581" would be converted to "0xa15"
• return to the leap year example with the ternary (conditional) operator
  s/(\d+)/$1 % 4   ? "$1 (not a leap year)" :
          $1 % 100 ? "$1 (a leap year)" :
          $1 % 400 ? "$1 (not a leap year)" :
                     "$1 (a leap year)"/gxe;
• "2000" changed to "2000 (a leap year)"

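Both /e examples can be run on a few sample values to confirm the results. The inputs below are chosen for this sketch and are not from the slide.

```perl
use strict;
use warnings;

# Hex conversion: the replacement is the return value of sprintf.
(my $hex = "2581 255") =~ s/(\d+)/sprintf("%#x", $1)/ge;

# Leap year rule: divisible by 4, except centuries not divisible by 400.
my $years = "1900 1996 1999 2000";
$years =~ s/(\d+)/$1 % 4   ? "$1 (not a leap year)" :
                  $1 % 100 ? "$1 (a leap year)" :
                  $1 % 400 ? "$1 (not a leap year)" :
                             "$1 (a leap year)"/ge;

print "$hex\n$years\n";
```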

tr/// Operator (Transliteration)

• same as in sed; y/// can also be used instead of tr///
• DOES NOT use pattern matching; instead it scans character by character and replaces each occurrence of a character in the search list with the corresponding character in the replacement list
• tr/SEARCHLIST/REPLACEMENTLIST/cds;
• Example:
  - $str = "AABBCCDDEE";
  - $str =~ tr/ABC/XYZ/;
    – $str = "XXYYZZDDEE";
  - $str =~ tr/DE/!/; # if the replacement list is too short, its last character is used as many times as needed.
    – $str = "XXYYZZ!!!!";


tr/// Operator (Transliteration) (2)

• Duplicates in the search list are ignored (only the first occurrence is used)
  - $str = "AABBCCDDEE";
  - $str =~ tr/AAB/xyz/;
    – $str = "xxzzCCDDEE";
• /c means the characters not in the search list (its complement)
  - $str = "AABBCCDDEE";
  - $str =~ tr/ABC/x/c;
    – $str = "AABBCCxxxx";


tr/// Operator (Transliteration) (3)

• /d delete found, but non-replaced characters
  - Changes how tr handles a short replacement list: search characters with no counterpart are removed instead of reusing the last replacement character
  - $str = "AABBCCDDEE";
  - $str =~ tr/ABC/xy/d;
    – $str = "xxyyDDEE";
  - $str =~ tr/DE//d;
    – $str = "xxyy";


tr/// Operator (Transliteration) (4)

• /s squeezes each run of the same replaced character down to one
  - $str = "AABBCCDDEE";
  - $str =~ tr/ABC/xyz/s;
    – $str = "xyzDDEE";
• tr/// returns the number of characters found/replaced.
• $count = ($str =~ tr/ABC/xyz/);
  - $count = 6; $str = "xxyyzzDDEE";
• $count = ($str =~ tr/ABC//);
  - $count = 6; $str = "AABBCCDDEE";
    – No replacement list, so it just counted them and made no replacements. Note that s/// with an empty replacement would have removed them.
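The counting idiom and the /c, /s and /d flags can be compared on the same input. This is a sketch with illustrative variable names; each line works on its own copy so the original string stays unchanged.

```perl
use strict;
use warnings;

my $str = "AABBCCDDEE";

# Counting only: an empty replacement list changes nothing.
my $abc_count   = ($str =~ tr/ABC//);
my $other_count = ($str =~ tr/ABC//c);   # characters NOT in ABC

# Working on copies keeps $str intact.
(my $squeezed = $str) =~ tr/ABC/xyz/s;   # runs of x, y, z squeezed
(my $deleted  = $str) =~ tr/ABC/xy/d;    # C has no counterpart, so it is deleted

print "$abc_count $other_count $squeezed $deleted $str\n";
```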


More tr/// Examples

• $str = "AABBCCDDEE";
• $str =~ tr/D//d; # delete found characters
  - $str = "AABBCCEE";
• $str = "AABBCCDDEE";
• $str =~ tr/ABD/xy/ds; # delete D, sub x for A and y for B, and squeeze duplicate replacements
  - $str = "xyCCEE";
• $str =~ tr/a-zA-Z//dc;
  - remove any non-letters from $str.
• $str =~ tr/A-Za-z/N-ZA-Mn-za-m/;
  - rotate the characters by 13 letters for simple encryption.
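The last two examples can be checked with a round trip: applying the rotate-by-13 mapping twice gives back the original text. The message string is made up for this sketch.

```perl
use strict;
use warnings;

my $message = "Hello, World 42!";

# Keep letters only: delete (/d) everything in the complement (/c) of a-zA-Z.
(my $letters = $message) =~ tr/a-zA-Z//dc;

# Rotate by 13; digits and punctuation are not in the search list, so they stay.
(my $rot13 = $message) =~ tr/A-Za-z/N-ZA-Mn-za-m/;

# Rotating a second time undoes the first.
(my $back = $rot13) =~ tr/A-Za-z/N-ZA-Mn-za-m/;

print "$letters\n$rot13\n$back\n";
```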


Exercise 11

• What is the outcome of the following transliterations? Use $_ = "fred and barney"

1. tr/abcde/ABCDE/;

2. tr/a-z/ABCDE/d;

3. $count = tr/a-z/A-Z/;

4. tr/a-z/_/c;

5. tr/a-m/X/s;

6. tr/aeiou/X/cs;

7. $count = tr/aeiou//c;

• Change the letters bdr to X and count the number of changes.


Q&A