cse 303 concepts and tools for software development
DESCRIPTION
CSE 303 Concepts and Tools for Software Development. Richard C. Davis UW CSE – 10/6/2006 Lecture 5 – Regular Expressions. Administravia. Homework 1 is due!. Tip of the Day. Viewing Multiple buffers in Emacs C-x 2 : Make two buffers visible C-x o : Switch cursor to “other” buffer - PowerPoint PPT PresentationTRANSCRIPT
CSE 303Concepts and Tools for Software Development
Richard C. DavisUW CSE – 10/6/2006
Lecture 5 – Regular Expressions
10/6/2006 CSE 303 Lecture 5 2
Administravia
• Homework 1 is due!
10/6/2006 CSE 303 Lecture 5 3
Tip of the Day
• Viewing Multiple buffers in Emacs– C-x 2 : Make two buffers visible– C-x o : Switch cursor to “other” buffer
• Careful! This also switches from mini-buffer!
– C-x 1 : Make current buffer the only buffer
10/6/2006 CSE 303 Lecture 5 4
Where are we?
• Tons of info on the shell– Pseudo-programming language– Files, users, and processes– You know just enough to get around
• Coming up– More powerful ideas
10/6/2006 CSE 303 Lecture 5 5
Today
• Regular Expressions– Find strings matching a pattern– Recurring theme in CS theory
• Crisp, elegant, and useful idea
– Powerful tool for computer science• Less educated hackers don’t tend to know
• Other Utilities– find, diff, wc
10/6/2006 CSE 303 Lecture 5 6
Types of Regular Expressions
• “Globbing” = filename expansion
• “Regular Expressions” = pattern matching– In CSE 322 (finite automata)– In grep– In egrep– Others…
10/6/2006 CSE 303 Lecture 5 7
What are Regular Expressions?
• Recursive definition: p = …– a, b, c : matches a single character
– p1p2 : matches if p1 comes just before p2
– (p1|p2) : matches either p1 or p2
– p* : matches 0 or more instances of p
• Example– Match abcbdgef against bd(e|g)*
10/6/2006 CSE 303 Lecture 5 8
Why This Language?
• Amazing Facts– Such patterns are very easy to match
• Space/time scale nicely with length of input and regexp
• Know space needed before matching!
– Can tell if two regexps match same strings
• For more of this, see CSE 322
10/6/2006 CSE 303 Lecture 5 9
More Regular Expressions
• Syntactic Conveniences– p+ = pp* (one or more p’s)– p? = (|p) (zero or one p)– [zd-h] = z | d | e | f | g | h– [^A-Z] = … (any not in set)– . = … (any character)– p{n} = … (n instances of p)– p{n,} = … (n or more p’s)– p{n,m} = … (from n to n+m instances of p)
10/6/2006 CSE 303 Lecture 5 10
More Regular Expressions
• Other Conveniences– [:alnum:] = Letters and Numbers– [:alpha:] = Letters– [:digit:] = Numbers– [:lower:] = Lowercase letters– [:print:] = Printable characters– [:punct:] = Punctuation Marks– [:space:] = Whitespace
10/6/2006 CSE 303 Lecture 5 11
grep finds patterns in files
• grep p [file]– Reads each line from stdin (or file) – If match, print line on stdout– Technically, matches against .*p.*
• Get rid of 1st .* with ^• Get rid of 2nd .* with $• Example: ^p$ matches whole line exactly
10/6/2006 CSE 303 Lecture 5 12
Regular Expression Gotchas
• Check syntax of each program– grep uses \| \+ \? \{ \}– egrep uses | + ? { }– Use quotes to avoid shell substitutions– Escape special characters with \
•\. and . are very different!• Rules different inside [ ] (\ is literal)
10/6/2006 CSE 303 Lecture 5 13
Reusing Previous Matches
• Can refer to matches of previous ( ) – \n refers to match of nth ( ) – Example: Double-words ^([a-zA-Z]+)\1$– Very useful when doing substitution
• We’ll see this next time
10/6/2006 CSE 303 Lecture 5 14
Other Utilities
• Put these in your arsenal– find : search a directory tree for anything
• Linux Pocket Guide gives nice summary•find . –type f – name “foo.*” – exec grep “regexp” “{}” \; - print
– diff : compare two files or directories– wc : count the bytes, words, lines in a file
10/6/2006 CSE 303 Lecture 5 15
Summary
• Regular Expressions– Powerful and useful idea– Powerful tool for computer science
• Less educated hackers don’t tend to know
• Other Utilities– find– diff– wc
10/6/2006 CSE 303 Lecture 5 16
Reading
• Linux Pocket Guide– p71-74 (grep)– p66-68 (find)– p86-88 (diff)– p58 (wc)
10/6/2006 CSE 303 Lecture 5 17
Next Time
Other scripting tools– sed : For changing text files– awk : More powerful text editing– General-purpose scripting languages
•perl•python•ruby