21: advanced data structures

Advanced Data Structures

Upload: ledien

Post on 01-Jan-2017


TRANSCRIPT

Page 1: 21: Advanced Data Structures

Advanced Data Structures

Page 2: 21: Advanced Data Structures

[Figure, pages 2-10: animated binary search tree over the values -2, -1, 0, 2, 3, 4, 6.]

Page 11: 21: Advanced Data Structures

Insertion Order Matters

● Suppose we create a BST of numbers in this order:

4, 2, 1, 3, 6, 5, 7

[Figure: the resulting BST is balanced, with root 4, children 2 and 6, and leaves 1, 3, 5, 7.]

Page 12: 21: Advanced Data Structures

Insertion Order Matters

● Suppose we create a BST of numbers in this order:

1, 2, 3, 4, 5, 6, 7

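[Figure: the resulting BST degenerates into a chain 1, 2, 3, 4, 5, 6, 7, each node the right child of the previous one.]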

Page 13: 21: Advanced Data Structures

Tree Terminology

● The height of a tree is the number of nodes in the longest path from the root to a leaf.

[Figure: the balanced BST with root 4, children 2 and 6, and leaves 1, 3, 5, 7; its height is 3.]

Page 14: 21: Advanced Data Structures

Tree Terminology

● The height of a tree is the number of nodes in the longest path from the root to a leaf.

[Figure: the chain-shaped BST 1, 2, 3, 4, 5, 6, 7; its height is 7.]
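The effect of insertion order on height can be checked with a short sketch. This is a plain, unbalanced BST written for illustration; the names `Node`, `insert`, `height`, and `build` are not from the slides. Height is counted in nodes, matching the definition above.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Standard unbalanced BST insertion; returns the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicates are ignored


def height(root):
    """Number of nodes on the longest root-to-leaf path (0 if empty)."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys):
    """Insert the keys one at a time, in the given order."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


balanced = build([4, 2, 1, 3, 6, 5, 7])
chain = build([1, 2, 3, 4, 5, 6, 7])
print(height(balanced), height(chain))  # prints "3 7"
```

The same seven keys give a tree of height 3 in one order and height 7 in the other, which is why the O(h) bounds that follow depend so heavily on shape.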

Page 15: 21: Advanced Data Structures

Keeping the Height Low

● Almost all BST operations have time complexity based on height:
  ● Insertion: O(h)
  ● Search: O(h)
  ● Deletion: O(h)

● Keeping the height low will make these operations much more efficient.

● How do we do this?

Page 16: 21: Advanced Data Structures

Tree Rotations

● One common way of keeping tree heights low is to reshape the BST when it gets too high.

● One way to accomplish this is a tree rotation, which locally rearranges nodes.

Page 17: 21: Advanced Data Structures

Tree Rotations

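[Figure: tree rotations. In one tree, B is the root with left child A and right subtree ">B"; A has left subtree "<A" and right subtree ">A, <B". In the other, A is the root with left subtree "<A" and right child B; B has left subtree ">A, <B" and right subtree ">B". Rotate Right turns the first tree into the second; Rotate Left turns the second back into the first.]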

Page 18: 21: Advanced Data Structures

[Figure, pages 18-29: animation frames of a BST holding the values 1 through 6.]

Page 30: 21: Advanced Data Structures

Let's Code it Up!
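A minimal sketch of the two rotations, assuming a bare-bones `Node` class with `left` and `right` links (the class and the function names are illustrative, not the lecture's own code). Each function returns the new root of the rotated subtree, so the caller is responsible for re-attaching it to the parent.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_right(b):
    """Turn (B with left child A) into (A with right child B)."""
    a = b.left
    b.left = a.right   # the subtree between A and B moves under B
    a.right = b
    return a           # A is the new subtree root


def rotate_left(a):
    """Turn (A with right child B) into (B with left child A)."""
    b = a.right
    a.right = b.left   # the subtree between A and B moves under A
    b.left = a
    return b           # B is the new subtree root


def inorder(node):
    """Keys in sorted order; rotations must never change this list."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# A right-leaning chain 1 -> 2 -> 3 becomes balanced after one left rotation.
root = Node(1, right=Node(2, right=Node(3)))
root = rotate_left(root)
print(root.key, root.left.key, root.right.key)  # prints "2 1 3"
```

Both rotations touch a constant number of pointers and preserve the in-order sequence, which is what makes them safe to apply to a BST.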

Page 31: 21: Advanced Data Structures

When to Rotate?

● The actual code for rotations is not too complex.

● Deciding when and where to rotate the tree, on the other hand, is a bit involved.

● There are many schemes we can use to determine this:
  ● AVL trees maintain balance information in each node, then rotate when the balance is off.
  ● Red/Black trees assign each node a color, then rotate when certain color combinations occur.

Page 32: 21: Advanced Data Structures

An Interesting Observation

Page 33: 21: Advanced Data Structures

Random Binary Search Trees

● If we build a binary search tree with totally random values, the resulting tree is (with high probability) within a constant factor of balanced.
  ● Its height is approximately 4.3 ln n.

● Moreover, the average depth of a given node is often very low.
  ● Approximately 2 ln n.

● If we structure the BST as if it were a random tree, we get (with high probability) a very good data structure!
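These estimates can be compared against a small experiment. The sketch below is an assumption-laden illustration, not a proof: it inserts the keys 0..n-1 in a shuffled order (seeded for repeatability) into an unbalanced BST stored in dictionaries, and reports the height and average depth, both counted in nodes so that the root has depth 1. The function name `random_bst_stats` is made up for this example.

```python
import math
import random


def random_bst_stats(n, seed=0):
    """Build a BST from a random permutation of 0..n-1.

    Returns (height, average node depth), counting the root as depth 1.
    """
    rng = random.Random(seed)
    keys = list(range(n))
    rng.shuffle(keys)

    left, right, depth = {}, {}, {}   # child links and depths, keyed by node key
    root = keys[0]
    depth[root] = 1
    for key in keys[1:]:
        cur = root
        while True:
            children = left if key < cur else right
            if cur in children:
                cur = children[cur]          # keep descending
            else:
                children[cur] = key          # attach as a new leaf
                depth[key] = depth[cur] + 1
                break
    return max(depth.values()), sum(depth.values()) / n


n = 10_000
h, avg = random_bst_stats(n)
print("height:", h, "vs 4.3 ln n =", round(4.3 * math.log(n), 1))
print("average depth:", round(avg, 1), "vs 2 ln n =", round(2 * math.log(n), 1))
```

The asymptotic figures ignore lower-order terms, so at this size the measured values will generally come out somewhat below them.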

Page 34: 21: Advanced Data Structures

Treaps

● A treap is a data structure that combines a binary search tree and a binary heap.

● Each node stores two pieces of information:
  ● The piece of information that we actually want to store, and
  ● A random real number.

● The tree is stored such that
  ● The nodes are a binary search tree when looking up the information, and
  ● The nodes are a binary heap with respect to the random real number.

Page 35: 21: Advanced Data Structures

[Figure: example treap whose nodes hold the keys Cat, Dikdik, Dog, Ibex, Kitty, Puppy, Sloth and the random numbers 4, 42, 106, 137, 178, 271, 314.]

Page 36: 21: Advanced Data Structures

Treaps are Wonderful

● With very high probability, the height of an n-node treap is O(log n).

● Insertion is surprisingly simple once we have code for tree rotations.

● Deletion is straightforward once we have code for tree rotations.

Page 37: 21: Advanced Data Structures

Inserting into a Treap

● Insertion into a treap is a combination of normal BST insertion and heap insertion.

● First, insert the node doing a normal BST insertion. This places the value into the right place.

● Next, bubble the node upward in the tree by rotating it with its parent until its random number is smaller than its parent's or it reaches the root.

Page 38: 21: Advanced Data Structures

[Figure, pages 38-42: animation of inserting the key Kitty with random number 571 into a treap holding Cat, Dikdik, Dog, Ibex, Puppy, Sloth (random numbers 4, 42, 106, 178, 271, 314), then rotating Kitty upward.]

Page 43: 21: Advanced Data Structures

Let's Code it Up!
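One possible way to code treap insertion is sketched below, assuming a max-heap on the random number (consistent with bubbling up while the new node's number is larger than its parent's). All names are illustrative. The example reuses the keys and numbers from the slides, but the pairing of each key with a number is a guess, since only the lists of labels are known.

```python
import random


class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority   # the node's random number
        self.left = None
        self.right = None


def rotate_right(b):
    a = b.left
    b.left, a.right = a.right, b
    return a


def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b


def treap_insert(root, key, priority=None):
    """BST-insert `key`, then rotate it up while it outranks its parent."""
    if priority is None:
        priority = random.random()
    if root is None:
        return Node(key, priority)
    if key < root.key:
        root.left = treap_insert(root.left, key, priority)
        if root.left.priority > root.priority:
            root = rotate_right(root)     # left child bubbles up
    elif key > root.key:
        root.right = treap_insert(root.right, key, priority)
        if root.right.priority > root.priority:
            root = rotate_left(root)      # right child bubbles up
    return root                           # duplicate keys are ignored


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# Key/number pairing below is assumed for illustration.
root = None
for key, prio in [("Ibex", 314), ("Dikdik", 106), ("Cat", 4),
                  ("Dog", 42), ("Sloth", 271), ("Puppy", 178)]:
    root = treap_insert(root, key, prio)
root = treap_insert(root, "Kitty", 571)
print(root.key)  # prints "Kitty"
```

Because 571 is larger than every other number in the tree, Kitty is rotated all the way to the root, while the in-order sequence of keys stays sorted.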

Page 44: 21: Advanced Data Structures

Removing from a Treap

● In general, removing a node from a BST is quite difficult because we have to make sure not to lose any nodes.

● For example, how do you remove the root of this tree?

● However, removing leaves is very easy, since they have no children.

[Figure: BST with root 4, children 2 and 6, and leaves 1, 3, 5, 7.]

Page 45: 21: Advanced Data Structures

Removing from a Treap

● It would seem that, since a treap has extra structure on top of that of a BST, removing from a treap would be extremely hard.

● However, it's actually quite simple:
  ● Keep rotating the node to delete with its larger child (the child with the larger random number) until it becomes a leaf.
  ● Once the node is a leaf, delete it.

Page 46: 21: Advanced Data Structures

[Figure, pages 46-50: animation of removing Kitty (random number 571) from the treap by rotating it downward until it is a leaf, then deleting it. The remaining treap holds Cat, Dikdik, Dog, Ibex, Puppy, Sloth.]
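The removal procedure can be sketched as follows, again assuming a max-heap on the random number. The names are illustrative, and the example treap is built by hand with an assumed pairing of keys and numbers rather than one taken from the slides.

```python
class Node:
    def __init__(self, key, priority, left=None, right=None):
        self.key = key
        self.priority = priority
        self.left = left
        self.right = right


def rotate_right(b):
    a = b.left
    b.left, a.right = a.right, b
    return a


def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b


def treap_remove(root, key):
    """Rotate the node holding `key` down to a leaf, then drop it."""
    if root is None:
        return None                       # key not present
    if key < root.key:
        root.left = treap_remove(root.left, key)
    elif key > root.key:
        root.right = treap_remove(root.right, key)
    elif root.left is None and root.right is None:
        return None                       # it is a leaf: delete it
    elif root.right is None or (
            root.left is not None
            and root.left.priority > root.right.priority):
        root = rotate_right(root)         # left child has the larger number
        root.right = treap_remove(root.right, key)
    else:
        root = rotate_left(root)          # right child has the larger number
        root.left = treap_remove(root.left, key)
    return root


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# Hand-built treap with Kitty at the root (pairing of numbers is assumed).
root = Node("Kitty", 571,
            left=Node("Ibex", 314,
                      left=Node("Dikdik", 106,
                                left=Node("Cat", 4),
                                right=Node("Dog", 42))),
            right=Node("Sloth", 271, left=Node("Puppy", 178)))
root = treap_remove(root, "Kitty")
print(root.key, inorder(root))
```

Always rotating with the child that has the larger number keeps the heap property intact among the remaining nodes as the doomed node sinks.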

Page 51: 21: Advanced Data Structures

Summary of Treaps

● Treaps give a (reasonably) straightforward way to keep the height of a BST low with high probability.

● Insertion into a treap is similar to insertion into a BST followed by insertion into a binary heap.

● Deletion from a treap is similar to the bubble-down step from a heap.

● All operations run in expected O(log n) time.

Page 52: 21: Advanced Data Structures

A Survey of Other Data Structures

Page 53: 21: Advanced Data Structures

Data Structures so Far

● We have seen many data structures over the past few weeks:
  ● Dynamic arrays.
  ● Linked lists.
  ● Hash tables.
  ● Tries.
  ● Binary search trees (and treaps).
  ● Binary heaps.

● These are the most-commonly-used data structures for general data storage.

Page 54: 21: Advanced Data Structures

Specialized Data Structures

● For applications that manipulate specific types of data, other data structures exist that make certain operations surprisingly fast and efficient.

● Many critical applications of computers would be impossible without these data structures.

Page 55: 21: Advanced Data Structures

k-d Trees

Page 56: 21: Advanced Data Structures

Suppose that you want to efficiently store points in k-dimensional space.

How might you organize the data to efficiently query for points within a region?

Page 57: 21: Advanced Data Structures

[Figure, pages 57-62: a k-d tree over the three-dimensional points (3, 1, 4), (2, 3, 7), (4, 3, 4), (2, 1, 3), (2, 4, 5), (6, 1, 4), (5, 2, 5), (1, 4, 4), (0, 5, 7), (4, 0, 6), (7, 1, 6), followed by an animation of a search for (1, 4, 4).]

Page 63: 21: Advanced Data Structures

The Intuition

Page 64: 21: Advanced Data Structures

[Figure, pages 64-69: a BST with root 2, children -1 and 4, and leaves -2, 0, 3, 6, animated through a lookup of the value 0. The root splits the tree into values less than two and values greater than two.]

Page 70: 21: Advanced Data Structures

Key Idea: Split Space in Half

Page 71: 21: Advanced Data Structures

[Figure, pages 71-83: space is split in half. One frame labels the two sides of a splitting plane as the upper half-space and the lower half-space; another labels the two sides of a split at x = x0 as x < x0 and x ≥ x0.]

Page 84: 21: Advanced Data Structures

Nearest-Neighbor Lookup

Page 85: 21: Advanced Data Structures

[Figure, pages 85-93: nearest-neighbor lookup diagrams showing a splitting line y = y0, two points (x1, y1) and (x2, y2) with radii r1 and r2, and the distances |y1 - y0| and |y2 - y0| from those points to the line.]

Page 94: 21: Advanced Data Structures

k-d Trees

● Assuming the points are nicely distributed, nearest-neighbor searches in k-d trees can run faster than O(n) time.

● Applications in computational geometry (collision detection), machine learning (nearest-neighbor classification), and many other places.
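A compact sketch of a k-d tree with nearest-neighbor lookup is shown below. It makes several assumptions that are not from the slides: the tree is built by splitting on the median and cycling through the axes by depth, distance is Euclidean, and the names `KDNode`, `build`, and `nearest` are invented for the example. The pruning test compares the distance to the splitting plane against the best distance found so far, in the spirit of the |y1 - y0| versus r1 comparison in the figures.

```python
import math


class KDNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point = point
        self.axis = axis     # coordinate index this node splits on
        self.left = left     # points with a smaller (or equal) coordinate
        self.right = right   # points with a larger (or equal) coordinate


def build(points, depth=0):
    """Build a k-d tree by median split, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build(pts[:mid], depth + 1),
                  build(pts[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return the stored point closest to `target` (Euclidean distance)."""
    if node is None:
        return best
    if best is None or math.dist(node.point, target) < math.dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only cross the splitting plane if the ball around the target reaches it.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, best)
    return best


POINTS = [(3, 1, 4), (2, 3, 7), (4, 3, 4), (2, 1, 3), (2, 4, 5), (6, 1, 4),
          (5, 2, 5), (1, 4, 4), (0, 5, 7), (4, 0, 6), (7, 1, 6)]
tree = build(POINTS)
print(nearest(tree, (1, 4, 4)))  # prints "(1, 4, 4)"
```

When the points are well spread out, the pruning test skips most subtrees, which is where the better-than-O(n) behavior comes from. In the worst case the search can still visit every node.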

Page 95: 21: Advanced Data Structures

Suffix Trees

Page 96: 21: Advanced Data Structures

String Processing

● In computational biology, strings are enormously useful for storing DNA and RNA.

● Many important questions in biology can be addressed through string processing:
  ● What is the most plausible evolutionary history of the following genomes?
  ● Are there particular gene sequences that appear with high frequency within a genome?

Page 97: 21: Advanced Data Structures

Suffix Trees

● A suffix tree is a (slightly modified) trie that stores all suffixes of a string S.

● Here is the suffix tree for “dikdik;” the $ is a marker for “end-of-string.”

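[Figure: suffix tree for "dikdik". The root has edges labeled "dik", "ik", and "k"; each leads to a node with two child edges labeled "$" and "dik$".]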

Page 98: 21: Advanced Data Structures

Suffix Trees

● Important, nontrivial, nonobvious fact: A suffix tree for a string of n characters can be built in time O(n).

● Given a string of length m, we can determine whether it is a substring of the original string in time O(m).

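[Figure: suffix tree for "dikdik". The root has edges labeled "dik", "ik", and "k"; each leads to a node with two child edges labeled "$" and "dik$".]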
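The O(m) substring query can be illustrated with a much simpler stand-in: an uncompressed trie of all suffixes, stored as nested dictionaries. This sketch deliberately does not implement the O(n) construction mentioned above; building it this way takes O(n^2) time and space. The function names are made up for the example.

```python
def build_suffix_trie(s):
    """Naive trie of every suffix of s + '$' (quadratic, for illustration only)."""
    text = s + "$"          # '$' marks end-of-string, as in the slides
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(trie, pattern):
    """Walk one edge per character of `pattern`: O(m) dictionary lookups."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("dikdik")
print(is_substring(trie, "kdi"), is_substring(trie, "dd"))  # prints "True False"
```

Every substring of the text is a prefix of some suffix, so a pattern occurs in the text exactly when it can be traced from the root.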

Page 99: 21: Advanced Data Structures

Suffix Trees

● Other applications of suffix trees:
  ● Searching for one genome within another allowing for errors, insertions, and deletions in time O(n + m).
  ● Finding the longest common substring of two sequences in time O(n + m).
  ● Improving the performance of data compression routines by finding long repeated strings efficiently.

Page 100: 21: Advanced Data Structures

Bloom Filters

Page 101: 21: Advanced Data Structures

Distributing Data

● Websites like Google and Facebook deal with enormous amounts of data.

● Probably measured in hundreds of millions of gigabytes (hundreds of petabytes).

● There is absolutely no way to store this on one computer.

● Instead, data must be stored on multiple computers networked together.

Page 102: 21: Advanced Data Structures

Looking up Data

● Suppose you are at Google implementing search.

● When you get a search query, you have to be able to know which computer knows what pages to display for that query.

● Network latency is, say, 2ms between you and each computer.

● If you have one thousand computers to search, you can't just query each one and ask.

Page 103: 21: Advanced Data Structures

Bloom Filters

● A Bloom filter is a data structure similar to a set backed by a hash table.

● Stores a set of values in a way that may lead to false positives:
  ● If the Bloom filter says that an object is not present, it is definitely not present.
  ● If the Bloom filter says that an object is present, it may actually not be present.

Page 104: 21: Advanced Data Structures

Bloom Filters

[Figure, pages 104-123: animation on a 14-slot array, initially all N. After inserting ValueOne the array reads N N N Y N N Y N Y N N Y N Y; after inserting ValueTwo it reads N Y N Y N Y Y N Y N N Y N Y. Lookups of ValueOne, ValueTwo, ValueThree, and ValueFour follow against that final array.]
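A minimal Bloom filter sketch follows. The details are assumptions chosen for illustration rather than anything prescribed by the slides: the bit array has 1024 slots, each item is hashed three times, and the hash positions are derived from SHA-256 with a different prefix per hash function.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits    # all "N" to start

    def _positions(self, item):
        """Yield one slot index per hash function for this item."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True         # flip the slot to "Y"

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("ValueOne")
bf.add("ValueTwo")
print(bf.might_contain("ValueOne"), bf.might_contain("ValueThree"))
```

An inserted item always reports as possibly present, because its slots were all set. An item that was never inserted usually hits at least one unset slot, but it can be a false positive if other insertions happen to have set all of its slots.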

Page 124: 21: Advanced Data Structures

Bloom Filters and Networks

● Bloom filters can be used to mitigate the networking problem from earlier.

● Have each computer store a Bloom filter of what's stored on each other computer.

● To determine which computer has some data:
  ● Look up that value in each Bloom filter.
  ● Call up just the computers that might have it.

● Since Bloom filter lookup is substantially faster than a network query (probably 1000-10,000x), this solution is used extensively in practice.

Page 125: 21: Advanced Data Structures

Data structures make it possible to solve important problems at scale.

You get to decide which problems we'll be using them for.