![Page 1: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/1.jpg)
/Regular Expressions/
In Java
![Page 2: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/2.jpg)
Credits
• The Java Tutorials: Regular Expressions
• docs.oracle.com/javase/tutorial/essential/regex/
![Page 3: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/3.jpg)
Regex
• Regular expressions are a way to describe a set of strings based on common characteristics shared by each string in the set.
• They can be used to search, edit, or manipulate text and data.
• They are created with a specific syntax.
![Page 4: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/4.jpg)
Regex in Java
• Regex in Java is similar to Perl.
• The java.util.regex package primarily consists of three classes: Pattern, Matcher, and PatternSyntaxException.
![Page 5: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/5.jpg)
Pattern & PatternSyntaxException
• You can think of a Pattern as the regular expression wrapper object.
• You get a Pattern by calling:
– Pattern.compile("RegularExpressionString");
• If your "RegularExpressionString" is invalid, compile throws a PatternSyntaxException.
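As a minimal sketch of the two bullets above: the helper below compiles a string and reports whether it is a valid regex. The class name CompileDemo and the method isValidRegex are made up for illustration; only Pattern.compile and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CompileDemo {
    // Hypothetical helper: true if the string compiles as a regex.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // compile rejects malformed expressions with this unchecked exception
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[bcr]at")); // well-formed character class
        System.out.println(isValidRegex("[bcr"));    // character class is never closed
    }
}
```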
![Page 6: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/6.jpg)
Matcher
• You can think of a Matcher as the search result object.
• You get a Matcher by calling:
– myPattern.matcher("StringToBeSearched");
• You use it by calling:
– myMatcher.find(), which returns true each time it locates the next match
• Then call any number of methods on myMatcher to see attributes of the result.
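The compile, matcher, find sequence might look like the sketch below. FindDemo and describeMatches are illustrative names, and the "text@start-end" format is just a convenient way to show what group(), start(), and end() return for each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Hypothetical helper: one "text@start-end" entry per match, in order.
    static List<String> describeMatches(String regex, String input) {
        Matcher myMatcher = Pattern.compile(regex).matcher(input);
        List<String> results = new ArrayList<>();
        while (myMatcher.find()) { // advances to the next match each call
            results.add(myMatcher.group() + "@" + myMatcher.start() + "-" + myMatcher.end());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("foo", "foofoo"));
    }
}
```

Note that end() is exclusive: it is the index just past the last matched character.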
![Page 7: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/7.jpg)
Regex Test Harness
• The tutorials give a test harness that uses the Console class. System.console() returns null when a program is launched from most IDEs, so the harness doesn’t work there.
• So I rewrote it to use basic I/O.
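The rewritten harness itself is not reproduced on the slides. The following is one possible sketch of such a harness, assuming a BufferedReader over System.in in place of Console; the class name RegexHarness and the method search are invented here, and the messages imitate the output shown on the following slides.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexHarness {
    // Hypothetical helper: builds the report for one regex/input pair.
    static String search(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuilder report = new StringBuilder();
        while (matcher.find()) {
            report.append(String.format("Found \"%s\" at index %d, ending at index %d.\n",
                    matcher.group(), matcher.start(), matcher.end()));
        }
        if (report.length() == 0) {
            report.append("No match found.\n");
        }
        return report.toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        while (true) {
            System.out.print("Enter your regex: ");
            String regex = in.readLine();
            if (regex == null) break; // end of input
            System.out.print("Enter input string to search: ");
            String input = in.readLine();
            if (input == null) break;
            System.out.print(search(regex, input));
        }
    }
}
```

An invalid regex makes Pattern.compile throw, which ends this sketch; a fuller harness would catch PatternSyntaxException and prompt again.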
![Page 8: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/8.jpg)
It’s time for…
CODE DEMO
![Page 9: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/9.jpg)
Regex
• Test harness output example.
• Input is the text typed after each prompt (shown in bold on the original slide).
Enter your regex: foo
Enter input string to search: foofoo
Found ‘foo’ at index 0, ending at index 3.
Found ‘foo’ at index 3, ending at index 6.
![Page 10: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/10.jpg)
Indexing
![Page 11: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/11.jpg)
Metacharacters
• <([{\^-=$!|]})?*+.>
• Precede a metacharacter with a ‘\’ to treat it as an ordinary character.
• Or use \Q and \E to begin and end a literal quote.
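A small sketch of both escaping styles follows. EscapeDemo and found are illustrative names. In Java source each regex backslash must itself be doubled inside a string literal, so the regex cat\. is written "cat\\.". Pattern.quote is the library method that wraps a string in a literal quote for you.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("cat.", "cats"));               // '.' is a metacharacter: any character
        System.out.println(found("cat\\.", "cats"));             // escaped '.': needs a literal dot
        System.out.println(found("cat\\.", "cat."));             // literal dot present
        System.out.println(found("\\Qcat.\\E", "cats"));         // \Q...\E quotes everything in between
        System.out.println(found(Pattern.quote("cat."), "cat.")); // same idea via the API
    }
}
```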
![Page 12: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/12.jpg)
Metacharacters
Enter your regex: cat.
Enter input string to search: cats
Found ‘cats’ at index 0, ending at index 4.
![Page 13: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/13.jpg)
Character Classes
Construct        Description
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z, or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)
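The rows of the table can be checked one character at a time. This sketch uses Pattern.matches, which requires the whole input to match; CharClassDemo and matchesChar are names made up for the example.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: does the single-character class accept this string?
    static boolean matchesChar(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesChar("[^abc]", "d"));       // negation: d is not a, b, or c
        System.out.println(matchesChar("[a-d[m-p]]", "n"));   // union: n is in m-p
        System.out.println(matchesChar("[a-d[m-p]]", "e"));   // e is in neither range
        System.out.println(matchesChar("[a-z&&[def]]", "e")); // intersection keeps only d, e, f
        System.out.println(matchesChar("[a-z&&[^bc]]", "b")); // subtraction removes b and c
    }
}
```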
![Page 14: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/14.jpg)
Character Class
Enter your regex: [bcr]at
Enter input string to search: rat
I found the text "rat" starting at index 0 and ending at index 3.
Enter input string to search: cat
Found "cat" at index 0, ending at index 3.
![Page 15: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/15.jpg)
Character Class: Negation
Enter your regex: [^bcr]at
Enter input string to search: rat
No match found.
Enter input string to search: hat
Found "hat" at index 0, ending at index 3.
![Page 16: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/16.jpg)
Character Class: Range
Enter your regex: foo[1-5]
Enter input string to search: foo5
Found "foo5" at index 0, ending at index 4.
Enter input string to search: foo6
No match found.
![Page 17: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/17.jpg)
Character Class: Union
Enter your regex: [0-4[6-8]]
Enter input string to search: 0
Found "0" at index 0, ending at index 1.
Enter input string to search: 5
No match found.
Enter input string to search: 6
Found "6" starting at index 0, ending at index 1.
![Page 18: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/18.jpg)
Character Class: Intersection
Enter your regex: [0-9&&[345]]
Enter input string to search: 5
Found "5" at index 0, ending at index 1.
Enter input string to search: 2
No match found.
![Page 19: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/19.jpg)
Character Class: Subtraction
Enter your regex: [0-9&&[^345]]
Enter input string to search: 5
No match found.
![Page 20: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/20.jpg)
Predefined Character Classes
Construct   Description
.           Any character (may or may not match line terminators)
\d          A digit: [0-9]
\D          A non-digit: [^0-9]
\s          A whitespace character: [ \t\n\x0B\f\r]
\S          A non-whitespace character: [^\s]
\w          A word character: [a-zA-Z_0-9]
\W          A non-word character: [^\w]
![Page 21: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/21.jpg)
Predefined Character Classes (cont.)
• To summarize:
– \d matches digits
– \s matches whitespace characters
– \w matches word characters
• The capital letter matches the opposite:
– \D matches non-digits
– \S matches non-whitespace characters
– \W matches non-word characters
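A quick way to try the predefined classes is sketched below, again with Pattern.matches so the whole input must match. PredefinedDemo and matchesAll are illustrative names; note the doubled backslashes required by Java string literals.

```java
import java.util.regex.Pattern;

class PredefinedDemo {
    // Hypothetical helper: true if the entire input matches the regex.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("\\d\\d", "42")); // two digits
        System.out.println(matchesAll("\\D", "7"));     // 7 is a digit, so \D rejects it
        System.out.println(matchesAll("\\s", "\t"));    // a tab is whitespace
        System.out.println(matchesAll("\\w+", "foo_1")); // letters, digits, underscore
        System.out.println(matchesAll("\\W", "!"));     // '!' is not a word character
    }
}
```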
![Page 22: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/22.jpg)
Quantifiers
Greedy    Reluctant   Possessive   Meaning
X?        X??         X?+          X, once or not at all
X*        X*?         X*+          X, zero or more times
X+        X+?         X++          X, one or more times
X{n}      X{n}?       X{n}+        X, exactly n times
X{n,}     X{n,}?      X{n,}+       X, at least n times
X{n,m}    X{n,m}?     X{n,m}+      X, at least n but not more than m times
![Page 23: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/23.jpg)
Ignore Greedy, Reluctant, and Possessive
For now.
![Page 24: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/24.jpg)
Zero Length Match
• The regexes ‘a?’ and ‘a*’ each allow for zero occurrences of the letter a, so they can match the empty string.
Enter your regex: a*
Enter input string to search: aa
Found "aa" at index 0, ending at index 2.
Found "" at index 2, ending at index 2.
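The zero-length match can be reproduced with a find loop, as in this sketch. ZeroLengthDemo and listMatches are made-up names; each entry is "text@start-end", so an empty match shows up with nothing before the @.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthDemo {
    // Hypothetical helper: one "text@start-end" entry per match.
    static List<String> listMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(listMatches("a*", "aa")); // the run of a's, then an empty match at the end
        System.out.println(listMatches("a?", "aa")); // each a separately, then an empty match at the end
    }
}
```

A zero-length match has equal start and end indexes, which is how it can be told apart from a real one.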
![Page 25: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/25.jpg)
Quantifiers: Exact
Enter your regex: a{3}
Enter input string to search: aa
No match found.
Enter input string to search: aaaa
Found "aaa" at index 0, ending at index 3.
![Page 26: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/26.jpg)
Quantifiers: At Least, No Greater
Enter your regex: a{3,}
Enter input string to search: aaaaaaaaa
Found "aaaaaaaaa" at index 0, ending at index 9.
Enter your regex: a{3,6}
Enter input string to search: aaaaaaaaa
Found "aaaaaa" at index 0, ending at index 6.
Found "aaa" at index 6, ending at index 9.
![Page 27: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/27.jpg)
Quantifiers
• "abc+"
– Means "a, followed by b, followed by (c one or more times)".
– "abcc" = match!, "abbc" = no match
• "[abc]+"
– Means "(a, b, or c) one or more times".
– "bba" = match!
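The difference between quantifying one character and quantifying a character class can be checked directly. This is only a sketch; QuantifierDemo and found are illustrative names.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("abc+", "abcc"));  // + applies only to the c
        System.out.println(found("abc+", "abbc"));  // the second b breaks "ab" followed by c
        System.out.println(found("[abc]+", "bba")); // + applies to the whole class
        System.out.println(found("a{3}", "aa"));    // fewer than three a's
    }
}
```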
![Page 28: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/28.jpg)
Greedy, Reluctant, and Possessive
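• Greedy
– The matcher first consumes as much input as it can, then gives back one character at a time from the end until the rest of the pattern matches.
• Reluctant
– The matcher first consumes nothing, then takes one more character at a time from the beginning until the rest of the pattern matches.
• Possessive
– The matcher consumes as much input as it can and never gives any back, so no retries are made.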
![Page 29: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/29.jpg)
Greedy
Enter your regex: .*foo
Enter input string to search: xfooxxxxxxfoo
Found "xfooxxxxxxfoo" at index 0, ending at index 13.
![Page 30: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/30.jpg)
Reluctant
Enter your regex: .*?foo
Enter input string to search: xfooxxxxxxfoo
Found "xfoo" at index 0, ending at index 4.
Found "xxxxxxfoo" at index 4, ending at index 13.
![Page 31: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/31.jpg)
Possessive
Enter your regex: .*+foo
Enter input string to search: xfooxxxxxxfoo
No match found.
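The three harness runs for greedy, reluctant, and possessive quantifiers can be compared side by side with a small sketch like this one. GreedinessDemo and matchedTexts are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedinessDemo {
    // Hypothetical helper: the text of every match, in order.
    static List<String> matchedTexts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(matchedTexts(".*foo", input));  // greedy: one match covering everything
        System.out.println(matchedTexts(".*?foo", input)); // reluctant: stops at each foo
        System.out.println(matchedTexts(".*+foo", input)); // possessive: .*+ keeps the final foo, so nothing matches
    }
}
```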
![Page 32: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/32.jpg)
Capturing Group
• Capturing groups are a way to treat multiple characters as a single unit.
• They are created by placing the characters to be grouped inside a set of parentheses.
• "(dog)"
– Means a single group containing the letters "d", "o", and "g".
![Page 33: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/33.jpg)
Capturing Group w/ Quantifiers
• (abc)+
– Means "abc" one or more times
![Page 34: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/34.jpg)
Capturing Groups: Numbering
• ( ( A ) ( B ( C ) ) )
1. ( ( A ) ( B ( C ) ) )
2. ( A )
3. ( B ( C ) )
4. ( C )
• Groups are numbered by counting their opening parentheses from left to right.
![Page 35: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/35.jpg)
Capturing Groups: Numbering Usage
• Some Matcher methods accept a group number as a parameter:
– int start(int group)
– int end(int group)
– String group(int group)
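The numbering and the three methods can be seen together in a sketch like the one below, which runs ((A)(B(C))) against "ABC". GroupNumberDemo and describeGroups are illustrative names. Group 0 is always the whole match, and groupCount() does not include it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberDemo {
    // Hypothetical helper: "number:text[start,end)" for each group of the first match.
    static List<String> describeGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            for (int g = 0; g <= m.groupCount(); g++) {
                out.add(g + ":" + m.group(g) + "[" + m.start(g) + "," + m.end(g) + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeGroups("((A)(B(C)))", "ABC")) {
            System.out.println(line);
        }
    }
}
```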
![Page 36: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/36.jpg)
Capturing Groups: Backreferences
• The section of input matching the capturing group is saved for recall via backreference.
• Specify a backreference with ‘\’ followed by the group number.
• ‘(\d\d)’
– Can be recalled with the expression ‘\1’.
![Page 37: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/37.jpg)
Capturing Groups: Backreferences
Enter your regex: (\d\d)\1
Enter input string to search: 1212
Found "1212" at index 0, ending at index 4.
Enter input string to search: 1234
No match found.
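In Java source the same backreference needs doubled backslashes. A minimal sketch, with BackrefDemo and repeatsPair as made-up names:

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // The regex (\d\d)\1 written as a Java string literal.
    private static final Pattern REPEATED_PAIR = Pattern.compile("(\\d\\d)\\1");

    // Hypothetical helper: true if the input contains a two-digit pair repeated immediately.
    static boolean repeatsPair(String input) {
        return REPEATED_PAIR.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(repeatsPair("1212")); // \1 must equal the captured "12"
        System.out.println(repeatsPair("1234")); // "34" differs from the captured "12"
    }
}
```

The backreference matches the text that the group captured, not the group's pattern, which is why "1234" fails even though "34" is two digits.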
![Page 38: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/38.jpg)
Boundary Matchers
Boundary Construct   Description
^                    The beginning of a line
$                    The end of a line
\b                   A word boundary
\B                   A non-word boundary
\A                   The beginning of the input
\G                   The end of the previous match
\Z                   The end of the input but for the final terminator, if any
\z                   The end of the input
![Page 39: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/39.jpg)
Boundary Matchers
Enter your regex: ^dog$
Enter input string to search: dog
Found "dog" at index 0, ending at index 3.
Enter your regex: ^dog\w*
Enter input string to search: dogblahblah
Found "dogblahblah" at index 0, ending at index 11.
![Page 40: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/40.jpg)
Boundary Matchers (cont.)
Enter your regex: \bdog\b
Enter input string to search: The doggie plays in the yard.
No match found.
Enter your regex: \Gdog
Enter input string to search: dog dog
Found "dog" at index 0, ending at index 3.
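The boundary examples can be reproduced by counting matches, as in this sketch. BoundaryDemo and countMatches are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: how many times find() succeeds on the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("^dog$", "dog"));                              // whole line is "dog"
        System.out.println(countMatches("\\bdog\\b", "The doggie plays in the yard.")); // "dog" runs into "gie"
        System.out.println(countMatches("\\bdog\\b", "The dog plays in the yard."));    // standalone word
        System.out.println(countMatches("dog", "dog dog"));                            // both occurrences
        System.out.println(countMatches("\\Gdog", "dog dog"));                         // the space breaks the chain
    }
}
```

With \G the second "dog" is skipped because it does not begin exactly where the previous match ended.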
![Page 41: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/41.jpg)
Pattern Class (cont.)
• There are a number of flags that can be passed to the ‘compile’ method.
• Embedded flag expressions, such as (?i), switch on the same options from inside the regex itself.
• Check out ‘matches’, ‘split’, and ‘quote’ methods as well.
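A brief sketch of a compile flag, its embedded equivalent, and the three methods mentioned above. FlagsDemo and matchesIgnoringCase are made-up names; Pattern.CASE_INSENSITIVE, Pattern.matches, split, and Pattern.quote are part of java.util.regex.Pattern.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class FlagsDemo {
    // Hypothetical helper: whole-input match with the CASE_INSENSITIVE compile flag.
    static boolean matchesIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesIgnoringCase("dog", "DOG"));    // flag passed to compile
        System.out.println(Pattern.matches("(?i)dog", "DOG"));    // same option embedded in the regex
        System.out.println(Pattern.matches("dog", "DOG"));        // case-sensitive by default
        System.out.println(Arrays.toString(Pattern.compile(",").split("a,b,c"))); // split around matches
        System.out.println(Pattern.matches(Pattern.quote("a.b"), "axb"));         // quote makes '.' literal
    }
}
```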
![Page 42: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/42.jpg)
Matcher Class (cont.)
• The Matcher class can slice input a multitude of ways:
– Index methods give the position of matches
– Study methods give boolean results to queries
– Replacement methods let you edit input
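One method from each family is sketched below: matches() and lookingAt() as study methods, start() as an index method, and replaceAll() as a replacement method. MatcherMethodsDemo and summarize are illustrative names, and the summary string format is arbitrary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Hypothetical helper: exercises one method per family and reports the results.
    static String summarize(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean whole = m.matches();    // study: does the entire input match?
        boolean prefix = m.lookingAt(); // study: does a match start at index 0?
        m.reset();                      // start searching from the beginning again
        int firstStart = m.find() ? m.start() : -1;  // index: where the first match begins
        String replaced = m.replaceAll(replacement); // replacement: edit every match
        return "matches=" + whole + " lookingAt=" + prefix
                + " firstStart=" + firstStart + " replaced=" + replaced;
    }

    public static void main(String[] args) {
        System.out.println(summarize("dog", "a dog, a dog", "cat"));
    }
}
```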
![Page 43: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/43.jpg)
PatternSyntaxException (cont.)
• You get a little more than just an error message from the PatternSyntaxException.
• Check out the following methods:
– public String getDescription()
– public int getIndex()
– public String getPattern()
– public String getMessage()
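A sketch of reading those details out of the exception follows. SyntaxErrorDemo and explain are names made up here; the exact description text and index depend on the JDK, so the example prints them rather than assuming specific values.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Hypothetical helper: "OK" for a valid regex, otherwise the exception's details.
    static String explain(String regex) {
        try {
            Pattern.compile(regex);
            return "OK";
        } catch (PatternSyntaxException e) {
            return "description=" + e.getDescription()
                    + ", index=" + e.getIndex()      // approximate error position, or -1 if unknown
                    + ", pattern=" + e.getPattern(); // the offending regex
        }
    }

    public static void main(String[] args) {
        System.out.println(explain("a+"));
        System.out.println(explain("[abc"));
    }
}
```

getMessage() combines the description, the index, and the pattern into one multi-line string, which is handy for logging.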
![Page 44: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/44.jpg)
![Page 45: Regular expressions](https://reader035.vdocuments.us/reader035/viewer/2022081403/556ccc14d8b42aba548b5216/html5/thumbnails/45.jpg)
The End$