2003 Jeremy D. Frens. All Rights Reserved. Calvin College Dept of Computer Science (1/8) Regular Expressions in Java Joel Adams and Jeremy Frens Calvin College



Regular Expressions in Java

Joel Adams and Jeremy Frens

Calvin College


Regular Expression Library

•Java 2 SDK 1.4 introduced the java.util.regex library.

•Regular expression matching added to String class.

•Based on the regular expressions as written in Perl.
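As a minimal sketch of the String-level support, the following uses String#matches, String#replaceAll, and String#split, each of which accepts a regular expression. The class name, method names, and sample strings are made up for illustration.

```java
import java.util.Arrays;

class StringRegexDemo {
    // Collapses every run of whitespace into a single space.
    static String normalizeSpaces(String text) {
        return text.replaceAll("\\s+", " ");
    }

    // Splits a line on commas, discarding whitespace around each comma.
    static String[] splitOnCommas(String line) {
        return line.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println("2003".matches("\\d+"));
        System.out.println(normalizeSpaces("Calvin   College"));
        System.out.println(Arrays.toString(splitOnCommas("a , b,c")));
    }
}
```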


Regular Expressions

•Regular expressions are a way to express simple textual patterns in the data.

•Some patterns cannot be recognized by a regular expression (e.g., balanced parentheses).

•Great for verifying input and parameter values.

•Great for extracting data from complicated input.
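To illustrate input verification, here is a small sketch that checks whether a string looks like a U.S. ZIP code. The ZIP format (five digits with an optional four-digit extension) and the class and method names are assumptions chosen for the example, not part of the slides.

```java
class ZipCodeValidator {
    // Five digits, optionally followed by a dash and four more digits.
    private static final String ZIP_REGEX = "\\d{5}(-\\d{4})?";

    // Returns true only when the whole input has the assumed ZIP shape.
    static boolean isValidZip(String input) {
        return input != null && input.matches(ZIP_REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isValidZip("49546"));
        System.out.println(isValidZip("49546-4301"));
        System.out.println(isValidZip("4954x"));
    }
}
```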


Building a Regular Expression

• Text matches exactly (e.g., abc).

• Character classes (backslashes are doubled because the pattern is written as a Java string literal):

    \\s for a whitespace character.

    \\d for a digit.

    \\w for a word character (letter, digit, underscore).

    [a-z] for any character between a and z.

    . for any character.

• Position:

    ^ the beginning of the string.

    $ the end of the string.
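The building blocks above can be tried out with a sketch like the following. The helper names are hypothetical, and each helper uses Matcher#find so that the ^ and $ anchors, rather than the method itself, decide where the match must occur.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // ^ anchors the digit to the beginning of the input.
    static boolean startsWithDigit(String text) {
        return Pattern.compile("^\\d").matcher(text).find();
    }

    // $ anchors the lowercase letter to the end of the input.
    static boolean endsWithLowercaseLetter(String text) {
        return Pattern.compile("[a-z]$").matcher(text).find();
    }

    // \s matches a space, tab, or other whitespace character anywhere.
    static boolean hasWhitespace(String text) {
        return Pattern.compile("\\s").matcher(text).find();
    }

    // "." stands for any single character (by default, not a line terminator).
    static boolean isAThenAnythingThenC(String text) {
        return text.matches("a.c");
    }

    public static void main(String[] args) {
        System.out.println(startsWithDigit("2003 Frens"));
        System.out.println(endsWithLowercaseLetter("Calvin"));
        System.out.println(hasWhitespace("Calvin College"));
        System.out.println(isAThenAnythingThenC("a-c"));
    }
}
```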


More Building

a and b are regular expressions; n and m are integers…

a|b matches a or b.

a? matches zero or one occurrence of a.

a* matches zero or more occurrences of a.

a+ matches one or more occurrences of a.

a{n,m} matches at least n and no more than m occurrences of a.

(a) treats a as a group.
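A short sketch of these operators in action follows. The fullMatch helper and the sample patterns are invented for illustration; the helper simply delegates to String#matches, which tests the whole input.

```java
class OperatorDemo {
    // Returns true when the entire input matches the regular expression.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("cat|dog", "dog"));    // alternation
        System.out.println(fullMatch("colou?r", "color"));  // optional u
        System.out.println(fullMatch("ab*", "a"));          // zero or more b
        System.out.println(fullMatch("ab+", "a"));          // needs at least one b
        System.out.println(fullMatch("a{2,3}", "aaa"));     // two or three a
        System.out.println(fullMatch("(ab)+", "abab"));     // group repeated
    }
}
```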


Simple Matching

•String#matches(String) returns a boolean indicating whether the entire string matches the pattern.

String name = "Jeremy";
assertTrue(name.matches("\\w+"));
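One detail worth seeing in code: String#matches succeeds only when the whole string matches, whereas Matcher#find looks for a match anywhere in the text. The sketch below contrasts the two; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    // True only if the regex accounts for the entire input.
    static boolean wholeStringMatches(String input, String regex) {
        return input.matches(regex);
    }

    // True if any substring of the input matches the regex.
    static boolean containsMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The space is not a word character, so the whole-string test fails
        // even though the text contains words.
        System.out.println(wholeStringMatches("Jeremy Frens", "\\w+"));
        System.out.println(containsMatch("Jeremy Frens", "\\w+"));
    }
}
```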


More Powerful Matching

•The Pattern class creates a pattern. A pattern must be compiled. Then it can be matched against real text.

•The Matcher class returns the matches.

Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher("Jeremy");
assertTrue(matcher.find());
assertEquals("Jeremy", matcher.group());
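Because each call to Matcher#find advances to the next match, a loop can collect every match in a piece of text. The following is a minimal sketch of that idea; the WordExtractor class and its words method are illustrative names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // Compile once and reuse; Pattern objects are immutable.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Returns every run of word characters, in order of appearance.
    static List<String> words(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(words("Joel Adams and Jeremy Frens"));
    }
}
```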


Capturing Groups

•Parenthesized groups are also used to capture portions of the match.

•Each captured group is given a number based on the order of the left parentheses:

    Index 0 is for the whole match.

    Index 1 is for the first left parenthesis.

    Index 2 is for the second left parenthesis.

    Etc…
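The numbering rule is easiest to see with nested parentheses. In this sketch the date-like pattern and the groups helper are assumptions made for the example; the helper returns the text of every group from index 0 through Matcher#groupCount.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns group 0 (the whole match) followed by each numbered group.
    static String[] groups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("no match: " + input);
        }
        String[] result = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            result[i] = matcher.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Left parentheses, in order: the outer (year-month) group is 1,
        // the year is 2, the month is 3, and the day is 4.
        String regex = "((\\d{4})-(\\d{2}))-(\\d{2})";
        System.out.println(Arrays.toString(groups(regex, "2003-01-05")));
    }
}
```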


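Capture Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SSN {
    private String myFirst, myMiddle, myLast;

    public SSN(String ssn) {
        // Groups 1-3 capture the three dash-separated digit fields.
        Pattern pattern = Pattern.compile("^(\\d{3})-(\\d{2})-(\\d{4})$");
        Matcher matcher = pattern.matcher(ssn);
        // Reject any input that does not have the expected shape.
        if (!matcher.find())
            throw new IllegalArgumentException();
        myFirst = matcher.group(1);
        myMiddle = matcher.group(2);
        myLast = matcher.group(3);
    }
}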