1 data structures: trees and grammars readings: sections 6.1, 7.1-7.4 (more from ch. 6 later)
Post on 30-Mar-2015
213 Views
Preview:
TRANSCRIPT
1
Data Structures:Trees and Grammars
Readings:Sections 6.1, 7.1-7.4(more from Ch. 6 later)
2
Goals for this Unit
• Continue focus on data structures and algorithms
• Understand concepts of reference-based data structures (e.g. linked lists, binary trees)– Some implementation for binary trees
• Understand usefulness of trees and hierarchies as useful data models– Recursion used to define data
organization
3
Taxonomy of Data Structures
• From the text:– Data type: collection of values and
operations• Compare to Abstract Data Type!
– Simple data types vs. composite data types
• Book: Data structures are composite data types– Definition: a collection of elements that are
some combination of primitive and other composite data types
4
Book’s Classification of Data Structures
• Four groupings:– Linear Data Structures– Hierarchical– Graph– Sets and Tables
• When defining these, note an element has:– one or more information fields– relationships with other elements
5
Note on Our Book and Our Course
• Our book’s strategy– In Ch. 6, discuss principles of Lists
• Give an interface, then implement from scratch
– In Ch. 7, discuss principles of Trees– Later, in Ch. 9, see what Java gives us
• Our course’s strategy– We did Ch. 9 first. Saw List interfaces and
operations– Then, Ch. 8 on maps and sets – Now, trees with some implementation too
6
Trees Represent…
• Concept of a tree verycommon and important.
• Tree terminology:– Nodes have one parent– A node’s children
• Leaf nodes: no children• Root node: top or start; no parent
• Data structures that store trees• Execution or processing that can
be expressed as a tree– E.g. method calls as a program runs– Searching a maze or puzzle
7
Trees are Important
• Trees are important for cognition and computation– computer science– language processing (human or
computer)• parse trees
– knowledge representation (or modeling of the “real world”)• E.g. family trees; the Linnaean taxonomy
(kingdom, phylum, …, species); etc.
8
Another Tree Example: File System
• What about file links (Unix) or shortcuts (Windows)?
C:\
lab2
CS216
lab1 lab3
MyMail
school
CS120
pers
list.h calc.cpplist.cpp
9
Another Tree Example: XML and HTML documents
<HTML><HEAD>…</HEAD><BODY><H1>My Page</H1><P> Blah<PRE>blah
blah</PRE>End</P></BODY></HTML>
How is this a tree? What are the leaves?
10
Tree Data Structures
• Why this now?– Very useful in coding
• TreeMap in Java Collections Framework
– Example of recursive data structures– Methods are recursive algorithms
11
Tree Definitions and Terms
• First, general trees vs. binary trees– Each node in a binary tree has at most two
children
• General tree definition:– Set of nodes T (possibly empty?) with a
distinguished node, the root– All other nodes form a set of disjoint subtrees
Ti
• each a tree in its own right• each connected to the root with an edge
• Note the recursive definition– Each node is the root of a subtree
12
Picture of Tree Definition
• And all subtrees are recursively defined as:– a node with…– subtrees attached to it
r
T1
T2 T3
13
Tree Terminology
• A node’s parent• A node’s children
– Binary tree: left child and right child– Sibling nodes– Descendants, ancestors
• A node’s degree (how many children)
• Leaf nodes or terminal nodes• Internal or non-terminal nodes
14
Recursive Data StructureRecursive Data Structure: a data
structure that contains a pointer or reference to an instance of itself:
public class TreeNode<T> {T nodeItem;TreeNode<T> left, right;TreeNode<T> parent;…
}• Recursion is a natural way to express
many algorithms.• For recursive data-structures, recursive
algorithms are a natural choice
15
General Trees
• Representing general trees is a bit harder– Each node has a list of child nodes
• Turns out that:– Binary trees are simpler and still quite
useful
• From now on, let’s focus on binary-trees only
16
ADT Tree
• Remember definition on an ADT?– Model of information: we just covered that– Operations? See pages 405-406 in textbook
• Many are similar to ADT List or any data structure– The “CRUD” operations: create, replace,
update, delete• Important about this list of operations
– some are in terms of one specified node, e.g. hasParent()
– others are “tree-wide”, e.g. size(), traversal
17
Classes for Binary Trees (pp. 416-431)
• class LinkedBinaryTree (p. 425, bottom)– reference to root BinaryTreeNode– methods: tree-level operations
• class BinaryTreeNode (p. 416)– data: an object (of some type)– left: references root of left-subtree (or null)– right: references root of right-subtree (or
null)– parent: references this node’s parent node
• Could this be null? When should it be?
– methods: node-level operations
18
Two-class Strategy for Recursive Data Structures
• Common design: use two classes for a Tree or List
• “Top” class– has reference to “first” node– other things that apply to the whole data-structure
object (e.g. the tree-object)• both methods and fields
• Node class– Recursive definitions are here as references to other
node objects– Also data (of course)– Methods defined in this class are recursive
19
Binary Tree and Node Class
• LinkedBinaryTree class has:– reference to root node– reference to a current node, a cursor– non-recursive methods like:
boolean find(tgt) // see if tgt is in the whole tree• Node class has:
– data, references to left and right subtrees– recursive versions of methods like find:
boolean find(tgt) // is tgt here or in my subtrees?• Note: BinaryTree.find() just calls Node.find()
on the root node!– Other methods work this way too
20
Why Does This Matter Now?
• This illustrates (again) important design ideas
• The tree itself is what we’re interested in– There are tree-level operations on it (“ADT
level” operations)• The implementation is a recursive data
structure– There are recursive methods inside the
lower-level classes that are closely related (same name!) to the ADT-level operation
• Principles? abstraction (hiding details), delegation (helper classes, methods)
21
ADT Tree Operations: “Navigation”
• Positioning:– toRoot(), toParent(), toLeftChild(),
toRightChild(), find(Object o)
• Checking:– hasParent(), hasLeftChild(), etc.– equals(Object tree2)
• Book calls this a “deep compare”• Do two distinct objects have the same
structure and contents?
22
ADT Tree Operations: Mutators
• Mutators:– insertRight(Object o),
insertLeft(Object o)• create a new node containing new data• make this new node be the child of the
current node• Important: We use these to build trees!
– prune()• delete the subtree rooted by the current
node
23
Next: Implementation
• Next (in the book)– How to implement Java classes for binary trees– Class for node, another class for BinTree– Interface for both, then two implementations
(array and reference)
• But for us:– We’ll skip some of this, particularly the array
version– We’ll only look at reference-base
implementation– After that: concept of a binary search tree
Understanding Implementations
• Let’s review some of the methods on pp. 416-431– (Done in class, looking at code in book.)
• Some topics discussed:– Node class. Parent reference or not?– Are two trees equal?– Traversal strategies: Section 7.3.2 in
book– visit() method and callback (also 7.3.2)
24
25
Binary Search Trees
• We often need collections that store items– Maybe a long series of inserts or deletions
• We want fast lookup, and often we want to access in sorted order– Lists: O(n) lookup– Could sort them for O(lg n) lookup
• Cost to sort is O(n lg n) and we might need to re-sort often as we insert, remove items
• Solution: search tree
26
Binary Search Trees
• Associated with each node is a key value that can be compared.
• Binary search tree property:– every node in the left subtree has
key whose value is less than the value of the root’s key value, and
– every node in the right subtree has key whose value is greater than the value of the root’s key value.
27
Example
3
1171
84
5
BINARY SEARCH TREE
28
Counterexample
4
181062
115
8
20
21NOT ABINARY SEARCH TREE
7
15
29
Find and Insert in BST• Find: look for where it should be• If not there, that’s where you insert
30
Recursion and Tree Operations
• Recursive code for tree operations is simple, natural, elegant
• Example: pseudo-code for Node.find()boolean find(Comparable tgt) { Node next = null; if (this.data matches tgt) return true else if (tgt’s data < this.data) next = this.leftChild else // tgt’s data > this.data next = this.rightChild // next points to left or right subtree if (next == null ) return false // no subtree else return next.find(tgt) // search on }
Order in BSTs
• How could we traverse a BST so that the nodes are visited in sorted order?– Does one of our traversal strategies work?
• A very useful property about BSTs• Consider Java’s TreeSet and TreeMap
– A search tree (not a BST, but be one of its better “cousins”)• In CS2150: AVL trees, Red-Black trees
– Guarantee: search times are O(lg n)
31
Deleting from a BST
• Removing a node requires– Moving its left and right subtrees– Not hard if one not there– But if both there?
• Answer: not too tough, but wait for CS2150 to see!
• In CS2110, we’ll not worry about this
32
Next: Grammars, Trees, Recursion
• Languages are governed by a set of rules called a grammar– Is a statement legal ?– Generate or derive a new legal statement
• Natural language grammars– Language processing by computers
• But, grammars used a lot in computing– Grammar for a programming language– Grammar for valid inputs, messages, data,
etc.33
34
Backus-Naur Form• http://en.wikipedia.org/wiki/Backus-Naur_form
• BNF is a widely-used notation for describing the grammar or formal syntax of programming languages or data
• BNF specifics a grammar as a set of derivation rules of this form: <symbol> ::= <expression with symbols>
• Look at website and example there (also on next slide)– How are trees involved here? Is it recursive?
35
BNF for Postal Address
1. <postal-address> ::= <name-part> <street-address> <zip-part>
2. <personal-part> ::= <first-name> | <initial> "." 3. <name-part> ::= <personal-part> <last-name> [<jr-
part>] <EOL> | <personal-part> <name-part>4. <street-address> ::= [<apt>] <house-num> <street-
name> <EOL> 5. <zip-part> ::= <town-name> "," <state-code> <ZIP-
code> <EOL> Example:Ann Marie G. Jones
123 Main St.Hooville, VA 22901
Where’s the recursion?
36
Grammars in Language
• Rule-based grammars describe– how legal statements can be produced– how to tell if a statement is legal
• Study textbook, pp. 389-391, to see rule-based grammar for simple Java-like arithmetic expressions– four rules for expressions, terms, factors,
and letter– Study how a (possibly) legal statement is
parsed to generate a parse tree
37
Computing Parse-Tree Example
• Expression: a * b + c
38
Grammar Terms and Concepts
• First, this is what’s called a context-free grammar– For CS2110, let’s not worry about what
this means! (But in CS2102, you learn this.)
• A CFG has– a set of variables (AKA non-terminals)– a set of terminal symbols– a set of productions– a starting symbol
39
Previous Parse Tree
• Terminal symbols:– <operator> could be: + *– <letter> could be: a b c
• Production: <factor> <letter> | <number>
40
Natural Language Parse Tree
• Statement: The man bit the dog
41
How Can We Use Grammars?
• Parsing– Is a given statement a valid statement in the
language? (Is the statement recognized by the grammar?)
– Note this is what the Java compiler does as a first step toward creating an executable form of your program. (Find errors, or build executable.)
• Production– Generate a legal statement for this grammar– Demo: generate random statements!
• See link on website next to slides
42
Demo’s Poem-grammar data file
{<start>The <object> <verb>
tonight}
{<object>waves big yellow flowers
slugs}
{<verb>sigh <adverb>portend like
<object>die <adverb> }
{<adverb>warilygrumpily}
• Note: no recursive productions in this example!
top related