cse 303 concepts and tools for software development

17
CSE 303 Concepts and Tools for Software Development Richard C. Davis UW CSE – 10/6/2006 Lecture 5 – Regular Expressions

Upload: aletha

Post on 06-Jan-2016

40 views

Category:

Documents


0 download

DESCRIPTION

CSE 303 Concepts and Tools for Software Development. Richard C. Davis UW CSE – 10/6/2006 Lecture 5 – Regular Expressions. Administravia. Homework 1 is due!. Tip of the Day. Viewing Multiple buffers in Emacs C-x 2 : Make two buffers visible C-x o : Switch cursor to “other” buffer - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CSE 303 Concepts and Tools for  Software Development

CSE 303Concepts and Tools for Software Development

Richard C. DavisUW CSE – 10/6/2006

Lecture 5 – Regular Expressions

Page 2: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 2

Administravia

• Homework 1 is due!

Page 3: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 3

Tip of the Day

• Viewing Multiple buffers in Emacs– C-x 2 : Make two buffers visible– C-x o : Switch cursor to “other” buffer

• Careful! This also switches from mini-buffer!

– C-x 1 : Make current buffer the only buffer

Page 4: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 4

Where are we?

• Tons of info on the shell– Pseudo-programming language– Files, users, and processes– You know just enough to get around

• Coming up– More powerful ideas

Page 5: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 5

Today

• Regular Expressions– Find strings matching a pattern– Recurring theme in CS theory

• Crisp, elegant, and useful idea

– Powerful tool for computer science• Less educated hackers don’t tend to know

• Other Utilities– find, diff, wc

Page 6: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 6

Types of Regular Expressions

• “Globbing” = filename expansion

• “Regular Expressions” = pattern matching– In CSE 322 (finite automata)– In grep– In egrep– Others…

Page 7: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 7

What are Regular Expressions?

• Recursive definition: p = …– a, b, c : matches a single character

– p1p2 : matches if p1 comes just before p2

– (p1|p2) : matches either p1 or p2

– p* : matches 0 or more instances of p

• Example– Match abcbdgef against bd(e|g)*

Page 8: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 8

Why This Language?

• Amazing Facts– Such patterns are very easy to match

• Space/time scale nicely with length of input and regexp

• Know space needed before matching!

– Can tell if two regexps match same strings

• For more of this, see CSE 322

Page 9: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 9

More Regular Expressions

• Syntactic Conveniences– p+ = pp* (one or more p’s)– p? = (|p) (zero or one p)– [zd-h] = z | d | e | f | g | h– [^A-Z] = … (any not in set)– . = … (any character)– p{n} = … (n instances of p)– p{n,} = … (n or more p’s)– p{n,m} = … (from n to n+m instances of p)

Page 10: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 10

More Regular Expressions

• Other Conveniences– [:alnum:] = Letters and Numbers– [:alpha:] = Letters– [:digit:] = Numbers– [:lower:] = Lowercase letters– [:print:] = Printable characters– [:punct:] = Punctuation Marks– [:space:] = Whitespace

Page 11: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 11

grep finds patterns in files

• grep p [file]– Reads each line from stdin (or file) – If match, print line on stdout– Technically, matches against .*p.*

• Get rid of 1st .* with ^• Get rid of 2nd .* with $• Example: ^p$ matches whole line exactly

Page 12: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 12

Regular Expression Gotchas

• Check syntax of each program– grep uses \| \+ \? \{ \}– egrep uses | + ? { }– Use quotes to avoid shell substitutions– Escape special characters with \

•\. and . are very different!• Rules different inside [ ] (\ is literal)

Page 13: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 13

Reusing Previous Matches

• Can refer to matches of previous ( ) – \n refers to match of nth ( ) – Example: Double-words ^([a-zA-Z]+)\1$– Very useful when doing substitution

• We’ll see this next time

Page 14: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 14

Other Utilities

• Put these in your arsenal– find : search a directory tree for anything

• Linux Pocket Guide gives nice summary•find . –type f – name “foo.*” – exec grep “regexp” “{}” \; - print

– diff : compare two files or directories– wc : count the bytes, words, lines in a file

Page 15: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 15

Summary

• Regular Expressions– Powerful and useful idea– Powerful tool for computer science

• Less educated hackers don’t tend to know

• Other Utilities– find– diff– wc

Page 16: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 16

Reading

• Linux Pocket Guide– p71-74 (grep)– p66-68 (find)– p86-88 (diff)– p58 (wc)

Page 17: CSE 303 Concepts and Tools for  Software Development

10/6/2006 CSE 303 Lecture 5 17

Next Time

Other scripting tools– sed : For changing text files– awk : More powerful text editing– General-purpose scripting languages

•perl•python•ruby