
FEN 2011-02-05 UCN T&B: IT Technology 1

Session 11: Data Structures and Collections

• Lists (Array based, linked)
• Sorting and Searching
• Hashing
• Trees
• System.Collections.Generic

Lists
• A data structure where elements are organised by position (index).
• ArrayList (List) and LinkedList
• Sometimes lists are called sequences.

[Figure: numList shown as an array-based list and as a linked list]
• Array-based: one fixed-size segment in memory.
• Linked: each element has a reference to the next element. Hence elements may be allocated at different memory locations.

ArrayList
• Array-based:
  – Fixed size (statically allocated).
  – Always occupies maximum memory.
  – May grow or shrink dynamically, but that requires allocating a new array and copying the elements into it.
• Direct access to elements by position (index), otherwise searching is required.
• Inserting and deleting in the middle of the list requires moving (many) elements.
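The costs listed for an array-based list can be made concrete with a minimal Java sketch. The class name IntArrayList, the initial capacity of 4 and the doubling growth policy are assumptions made for illustration; they are not taken from the slides or from any library.

```java
import java.util.Arrays;

// Hypothetical array-based list of ints (illustration only).
class IntArrayList {
    private int[] data = new int[4]; // one fixed-size segment in memory
    private int count = 0;           // number of elements in use

    int size() { return count; }

    // Direct access by position: O(1).
    int get(int index) {
        if (index < 0 || index >= count) throw new IndexOutOfBoundsException();
        return data[index];
    }

    // Insert at a position: elements to the right must be moved, O(n).
    void insertAt(int index, int value) {
        if (index < 0 || index > count) throw new IndexOutOfBoundsException();
        if (count == data.length) {
            // Growing means allocating a new array and copying everything.
            data = Arrays.copyOf(data, data.length * 2);
        }
        for (int j = count; j > index; j--) {
            data[j] = data[j - 1];
        }
        data[index] = value;
        count++;
    }

    // Remove at a position: elements to the right move one step left, O(n).
    int removeAt(int index) {
        int removed = get(index);
        for (int j = index; j < count - 1; j++) {
            data[j] = data[j + 1];
        }
        count--;
        return removed;
    }
}
```

Inserting at the end moves no elements, while inserting at index 0 moves all of them, which is why insertion in the middle is O(n) on average.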

Linked Lists (LinkedList)
• A linked list consists of nodes representing elements.
• Each node contains a value (or value reference) and a reference (pointer) to the next element.
• The list itself is represented by a reference to the first element, often called head.
• The next-reference of the last element is usually null.
• The linked list is dynamic in size: it grows and shrinks as needed.
• Access by position is slow (may require traversing the whole list).
• See this Java Example.

Figure 4.1: a) A linked list of integers; b) insertion; c) deletion

Implementation: Class Node

private class Node
{
    private object val;
    private Node next;

    public Node(object v, Node n) { val = v; next = n; }

    public object Val { get { return val; } set { val = value; } }
    public Node Next { get { return next; } set { next = value; } }
}

Linked Implementation of ADT list

class LinkedList
{
    private class Node //…

    Node head, tail;
    int n; // number of elements

    public LinkedList()
    {
        head = null;
        tail = null;
        n = 0;
    }

    public int Count { get { return n; } }

    public void AddFront(object o)
    {
        Node tmp = new Node(o, null);
        if (Count == 0) // list is empty
            tail = tmp;
        else
            tmp.Next = head;
        head = tmp;
        n++;
    }

Traversing a Linked List

    public void Print()
    { // for debugging...
        Node p = head; // start of list
        while (p != null) // while not end of list
        {
            Console.WriteLine(p.Val); // print current value
            p = p.Next; // set p to next element of the list
        }
    }

[Figure: the reference p moves from head towards tail]

Finding a Position in a Linked List

    public int FindPos(object o)
    {
        // Returns the position of o in the list (counting from 0).
        // If o is not contained, -1 is returned.
        bool found = false;
        int i = 0;
        Node p = head;
        while (!found && p != null)
        {
            if (p.Val.Equals(o))
                found = true;
            else
            {
                p = p.Next;
                i++;
            }
        }
        if (found) return i;
        else return -1;
    }

Dynamic vs. Static Data Structures
• Array-Based Lists:
  – Fixed (static) size (waste of memory).
  – May be able to grow and shrink (ArrayList), but this is very expensive in running time (O(n)).
  – Provides direct access to elements from index (O(1)).
  – May be sorted. Hence binary search gives fast access (O(log n)).
• Linked List Implementations:
  – Uses only the necessary space (grows and shrinks as needed).
  – Overhead for references and memory allocation.
  – Only sequential access: access by index requires searching (expensive: O(n)).

Linked List - Variants
• Using a tail-reference
• Using a dummy head node
• Circular

Doubly Linked List
• …operations become more complicated…

The Full Monty… (LinkedList)
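To show how the variants combine, here is a minimal Java sketch of a doubly linked, circular list with a dummy head node. The class and method names are hypothetical and chosen for illustration; the sketch does not claim to match the figures on the slides or the internals of any library class.

```java
// Hypothetical doubly linked, circular list with a dummy head node.
class DoublyLinkedList<T> {
    private static final class Node<T> {
        T val;
        Node<T> prev, next;
        Node(T val) { this.val = val; }
    }

    // The dummy node holds no value; in an empty list it points to itself.
    private final Node<T> dummy = new Node<>(null);
    private int n = 0;

    DoublyLinkedList() {
        dummy.prev = dummy;
        dummy.next = dummy;
    }

    int count() { return n; }

    void addFront(T value) { insertAfter(dummy, value); }

    // dummy.prev is the last node, so adding at the end is O(1).
    void addLast(T value) { insertAfter(dummy.prev, value); }

    // Four references must be updated, but no special cases are needed.
    private void insertAfter(Node<T> p, T value) {
        Node<T> node = new Node<>(value);
        node.prev = p;
        node.next = p.next;
        p.next.prev = node;
        p.next = node;
        n++;
    }

    // Removes the first occurrence of value; returns false if absent.
    boolean remove(T value) {
        for (Node<T> p = dummy.next; p != dummy; p = p.next) {
            if (java.util.Objects.equals(p.val, value)) {
                p.prev.next = p.next;
                p.next.prev = p.prev;
                n--;
                return true;
            }
        }
        return false;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (Node<T> p = dummy.next; p != dummy; p = p.next) {
            if (p != dummy.next) sb.append(", ");
            sb.append(p.val);
        }
        return sb.append("]").toString();
    }
}
```

Each operation touches more references than in the singly linked version, but the dummy node removes the empty-list and end-of-list special cases that AddFront has to handle.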

Search Trees: Dynamic Data Structures with Fast Search
• Binary Trees
• Binary Search Trees
• General Trees (Composite Pattern)
• Balanced Search Trees (2-3 Trees etc.)
• B-Trees (external, database index)

Terminology
• General trees:
  – leaf/external node/terminal
  – root
  – internal node
  – siblings, children, parents, ancestors, descendants
  – sub trees
  – the depth or height of a node = number of ancestors
  – the depth or height of a tree = max depth/height for any leaf

Binary Trees
• A binary tree can be defined recursively by
  – Either the tree is empty
  – Or the tree is composed of a root with left and right sub trees, which are binary trees themselves
• Note: contrary to general trees, binary trees
  – have ordered sub trees (left and right)
  – may be empty

Reference Based Implementation

Figure 10.9: Traversals of a binary tree: a) preorder; b) inorder; c) postorder
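The three traversals can be sketched in Java on a reference-based node, where null stands for the empty tree. The class names TreeNode and Traversals are assumptions for illustration; only the visiting order differs between the three methods.

```java
import java.util.List;

// Hypothetical reference-based binary tree node (illustration only).
class TreeNode {
    final String val;
    final TreeNode left, right; // null represents an empty sub tree

    TreeNode(String val, TreeNode left, TreeNode right) {
        this.val = val;
        this.left = left;
        this.right = right;
    }
}

class Traversals {
    // Preorder: root, then left sub tree, then right sub tree.
    static void preorder(TreeNode t, List<String> out) {
        if (t == null) return;
        out.add(t.val);
        preorder(t.left, out);
        preorder(t.right, out);
    }

    // Inorder: left sub tree, then root, then right sub tree.
    static void inorder(TreeNode t, List<String> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.val);
        inorder(t.right, out);
    }

    // Postorder: left sub tree, then right sub tree, then root.
    static void postorder(TreeNode t, List<String> out) {
        if (t == null) return;
        postorder(t.left, out);
        postorder(t.right, out);
        out.add(t.val);
    }
}
```

For a root B with left child A and right child C, the three methods collect B A C, A B C and A C B respectively.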

Binary Search Trees
• Value based container:
  – The search tree property:
    • For any internal node: the value in the node is greater than every value in its left sub tree
    • For any internal node: the value in the node is less than every value in its right sub tree
  – Note the recursive nature of this definition:
    • It implies that all sub trees themselves are search trees
    • Every operation must ensure that the search tree property is maintained (invariant)
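A minimal Java sketch of an unbalanced binary search tree shows how insertion maintains the invariant. The class name IntSearchTree and the choice to ignore duplicates are assumptions for illustration.

```java
// Hypothetical binary search tree of ints without balancing.
class IntSearchTree {
    private static final class Node {
        final int val;
        Node left, right;
        Node(int val) { this.val = val; }
    }

    private Node root;

    // Inserts value and keeps the search tree property; duplicates are ignored.
    void insert(int value) { root = insert(root, value); }

    private Node insert(Node t, int value) {
        if (t == null) return new Node(value);
        if (value < t.val) t.left = insert(t.left, value);
        else if (value > t.val) t.right = insert(t.right, value);
        return t;
    }

    // Follows one path from the root, so the cost depends on the height.
    boolean contains(int value) {
        Node t = root;
        while (t != null) {
            if (value == t.val) return true;
            t = value < t.val ? t.left : t.right;
        }
        return false;
    }

    // Number of nodes on the longest path from the root to a leaf.
    int height() { return height(root); }

    private int height(Node t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }
}
```

Inserting 4, 2, 6, 1, 3, 5, 7 gives a tree of height 3, while inserting 1 to 7 in sorted order gives height 7, which is in effect a linked list.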

Example: A Binary Search Tree Holding Names

Balance Problems (skewed tree):
• Values are inserted in sorted order

InOrder: Traversal Visits Nodes in Sorted Order

Efficiency (balanced tree / skewed tree)
• insert: O(log n) / O(n)
• retrieve: O(log n) / O(n)
• delete: O(log n) / O(n)
  – All depends on the depth of the tree
  – If insertions and deletions are uniformly distributed, then the tree will eventually grow skewed

Solution: Balanced Search Trees
• Trading space for time:
  – In worst case additional space in O(n) is required; but:
  – retrieve, insert and delete in O(log n), also in the worst case.
• Principle:
  – A node may hold several keys (n) and has several children (n+1)
  – A node must be at least half filled (n/2 keys)
  – Insert and delete can be performed in O(log n), so that the tree is kept balanced
• 2-3-tree: n = 2

2-3-Trees (n=2)

Retrieve
• Search using the same principle as in binary search trees:
  – Search the root
  – If not found, then search recursively in the appropriate sub tree
  – Performance is proportional to the height of the tree
  – Since the tree is balanced: O(log n)

Insertion
• The insert algorithm must ensure that the 2-3-tree properties are conserved. It goes like this:
  – Search down through the tree to the appropriate leaf node and insert
  – If there is room in the leaf, then we are done
  – Otherwise split the leaf node into two new leaves and move the middle value up into the parent node
  – If there is no room in the parent, then continue recursively until a node with room is reached, or
  – Eventually the root is reached. If there is no room in the root, then a new root is created, and the height of the tree is increased
  – Performance depends on the height of the tree (searching down through the tree + in worst case a trip from the leaf to the root rebalancing on the way up)
  – That is: O(log n)

Inserting 39 (there is room)

Inserting 38 (there is no room in the leaf)
• Insert anyway,
• Split leaf and
• Move middle value up

Inserting 37 (there is room)

Inserting 36 (there is no room)
• Split and move up
• Split and move up again

Inserting 35, 34 and 33 (there is room)

Deletion
• Like insertion – just the other way around :-)
  – Find the node with the value to be deleted
  – If this is not a leaf, then swap with its inorder successor (which is always in a leaf - why?), and remove the value
  – If there are now too few values (< n/2) in the leaf, then merge the node with a sibling and pull down a value from the parent node
  – If there are now too few values in the parent, then continue recursively until there are enough values or the root is reached
  – If the root becomes empty, then remove it and the height of the tree is decreased
  – Performance: once again: down and up through the tree: O(log n)

Balanced Search Trees
• Variants:
  – 2-3-trees
  – 2-3-4-trees
  – Red-Black-trees
  – AVL-trees
  – Splay-trees…
• Are, among other things, used for realisation of the map/dictionary/table ADT
• In the Java Collections Framework (java.util): TreeMap and TreeSet
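A short Java sketch of what a balanced search tree offers through java.util.TreeMap and java.util.TreeSet. The names and ages are made-up sample data, and the class SortedCollectionsDemo is hypothetical.

```java
import java.util.TreeMap;
import java.util.TreeSet;

class SortedCollectionsDemo {
    // Made-up sample data, inserted in no particular order.
    static TreeMap<String, Integer> sampleAges() {
        TreeMap<String, Integer> ages = new TreeMap<>();
        ages.put("Nancy", 41);
        ages.put("Alan", 29);
        ages.put("Tom", 35);
        ages.put("Jane", 52);
        return ages;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> ages = sampleAges();
        System.out.println(ages.keySet());         // keys come out in sorted order
        System.out.println(ages.firstKey());       // smallest key
        System.out.println(ages.subMap("B", "O")); // range search: "B" <= key < "O"

        TreeSet<Integer> distinctAges = new TreeSet<>(ages.values());
        System.out.println(distinctAges.last());   // largest value
    }
}
```

The same map interface backed by a hash table would give faster lookups on average, but no sorted traversal and no range search.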

An Alternative to Sorting and Searching: Hashing
• Keys are converted to indices in an array.
• A hash function, h, maps a key to an integer, the hash code.
• The hash code is divided by the array size and the remainder is used as index.
• If two or more keys give the same index, we have a collision.

Collision Handling
• Avoiding collisions:
  – Use a prime as the size of the array:
    • Trying to store keys with hash codes 200, 205, 210, 215, 220, …, 595 in an array of size 100 yields three collisions for each key.
    • But an array with size 101 results in no collision.
  – Choose a good hash function:
    • this is a (mathematical) discipline of its own
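The prime-size claim can be checked with a few lines of Java. The class TableSizeDemo and its method name are hypothetical; the hash codes are the ones from the slide.

```java
class TableSizeDemo {
    // Largest number of hash codes that land on the same index
    // when the codes 200, 205, ..., 595 are reduced modulo tableSize.
    static int maxKeysPerIndex(int tableSize) {
        int[] hits = new int[tableSize];
        int max = 0;
        for (int code = 200; code <= 595; code += 5) {
            int index = code % tableSize; // remainder is used as index
            hits[index]++;
            max = Math.max(max, hits[index]);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxKeysPerIndex(100)); // 4: each key shares its index with three others
        System.out.println(maxKeysPerIndex(101)); // 1: no collisions
    }
}
```

With size 100, the codes 200, 300, 400 and 500 all map to index 0, and likewise for every other used index. With the prime 101, the step 5 shares no factor with the table size, so all 80 codes get distinct indices.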

Collision Handling
• Probing is searching for a nearby free slot in the array.
• Probing may be:
  – Linear (h(x)+1, +2, +3, +4, …)
  – Quadratic (h(x)+1, +4, +9, +16, …)
  – Double hashing
  – …
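Linear probing can be sketched in Java as follows. The class LinearProbingSet is hypothetical, it only handles non-negative int keys, and it leaves out deletion, which needs extra care in an open-addressing table.

```java
// Hypothetical open-addressing table of non-negative int keys with linear probing.
class LinearProbingSet {
    private final Integer[] slots; // null marks a free slot

    LinearProbingSet(int tableSize) { slots = new Integer[tableSize]; }

    private int h(int key) { return key % slots.length; }

    // Stores key and returns how many slots past h(key) had to be probed.
    int add(int key) {
        int i = h(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null) { slots[i] = key; return probes; }
            if (slots[i] == key) return probes; // already present
            i = (i + 1) % slots.length;         // h(x)+1, +2, +3, ... with wrap-around
        }
        throw new IllegalStateException("table is full");
    }

    // Follows the same probe sequence until the key or a free slot is found.
    boolean contains(int key) {
        int i = h(key);
        for (int step = 0; step < slots.length; step++) {
            if (slots[i] == null) return false;
            if (slots[i] == key) return true;
            i = (i + 1) % slots.length;
        }
        return false;
    }
}
```

In a table of size 10, the keys 12, 22 and 32 all hash to index 2 and end up in slots 2, 3 and 4; a later key 3 then finds its own slot taken as well, which illustrates how collisions cluster under linear probing.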

Chaining
• The array doesn’t hold the element itself, but a reference to a collection (a linked list for instance) of all colliding elements.
• On search that list must be traversed.
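A minimal Java sketch of chaining, using java.util.LinkedList for the chains. The class ChainedHashSet is hypothetical and stores strings only.

```java
import java.util.LinkedList;

// Hypothetical hash table of strings with chaining.
class ChainedHashSet {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int tableSize) {
        buckets = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) buckets[i] = new LinkedList<>();
    }

    private int indexOf(String key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Returns false if the key was already stored.
    boolean add(String key) {
        LinkedList<String> chain = buckets[indexOf(key)];
        if (chain.contains(key)) return false; // traverses the chain
        chain.add(key);
        return true;
    }

    boolean contains(String key) {
        return buckets[indexOf(key)].contains(key);
    }

    boolean remove(String key) {
        return buckets[indexOf(key)].remove(key);
    }
}
```

With a table of size 1 every key collides and the structure degenerates to a single linked list, which is the O(n) worst case.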

Efficiency of Hashing
• Worst case (maximum collisions):
  – retrieve, insert, delete all O(n)
• Average number of collisions depends on the load factor, λ, not on table size:
  λ = (number of used entries)/(table size)
  – But not on n.
• Typically (linear probing):
  numberOfCollisions_avg = 1/(1 - λ)
• Example: 75% of the table entries in use:
  – λ = 0.75: 1/(1 - 0.75) = 4 collisions on average (independent of the table size).

When Hashing Is Inefficient
• Traversing in key order.
• Find smallest/largest key.
• Range-search (find all keys between high and low).
• Searching on something other than the designated primary key.
• See this Java Example

.NET 2: System.Collections.Generic
• ICollection<T>
  – IList<T>: List<T> (indexable, array-based)
  – LinkedList<T>
  – IDictionary<TKey, TValue> ((key, value)-pairs):
    • Dictionary<TKey, TValue> (hash table)
    • SortedDictionary<TKey, TValue> (balanced search tree)

Learning Goals
• ADT (specification; interface, i.e. IDictionary):
  – Read and write (use) specifications
  – Select and use ADT, i.e. Dictionary
• Data Structures and Algorithms (class: Dictionary, SortedDictionary):
  – Knowledge of
  – Select and use data structure, i.e. SortedDictionary
• Application:

    class Appl {
        // ...
        IDictionary d;
        // ...
        d = new XXXDictionary();
    }
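The same separation between specification and data structure can be sketched in Java, where java.util.Map plays roughly the role of IDictionary, HashMap that of Dictionary and TreeMap that of SortedDictionary. This mapping between the two libraries is an approximation for illustration, and the method countWords is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class Appl {
    // The application code depends only on the Map specification (the ADT).
    static int countWords(Map<String, Integer> counts, String text) {
        for (String word : text.split(" ")) {
            counts.merge(word, 1, Integer::sum); // add 1 to the count of word
        }
        return counts.size(); // number of distinct words
    }

    public static void main(String[] args) {
        Map<String, Integer> hashed = new HashMap<>(); // hash table
        Map<String, Integer> sorted = new TreeMap<>(); // balanced search tree
        countWords(hashed, "to be or not to be");
        countWords(sorted, "to be or not to be");
        System.out.println(hashed.equals(sorted)); // same contents either way
        System.out.println(sorted);                // keys in sorted order
    }
}
```

Only the line that creates the map names a concrete data structure, so swapping the implementation is a one-line change.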

Exercises
• Consider some of our programs (Banking, Forest, AndersenAndAsp, for instance).
• Would it be better to use some other collection instead of List?
• Try to change the implementation in one or more of your programs, so that, for instance, a hash table is used.
• Implement InsertAt(int index, object element) and RemoveAt(int index) on the linked list.

Time Complexity – Big-“O”
• Investigation of the use of time and/or space of an algorithm
• Normally one looks at
  – Worst-case (easier to determine)
  – Only growth rates – not exact measures
  – Counts the number of some “basic operations” (a computation, a comparison of two elements etc.).

Big-O notation:
• The complexity of an algorithm is notated with “Big-O”
  – O(f(n)), n is the size of the problem (number of input elements, for instance), f is a function that indicates the efficiency of the algorithm, for instance n (the running time is linear in problem size)
  – Big-O: is asymptotic (only holds for large values of n)
  – Big-O: only regards most significant term
  – Big-O: ignores constants
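A short worked example of the last two points. The operation count f(n) below is invented for illustration; the derivation shows why only the most significant term matters and why the constant factor can be dropped.

```latex
\begin{align*}
f(n) &= 3n^2 + 5n + 2 \\
     &\le 3n^2 + 5n^2 + 2n^2 = 10n^2 \quad \text{for all } n \ge 1 \\
\Rightarrow\; f(n) &\in O(n^2)
\end{align*}
```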

Examples

public int sum(int a, int b) {
    int sum;
    sum = a + b;
    return sum;
}

What is the basic operation? O(1)

public int sum(int[] a) {
    int sum = 0;
    for (int i = 0; i < a.length; i++)
        sum = sum + a[i];
    return sum;
}

What is the basic operation? O(n)

Searching
• Linear search in a sequence with n elements: O(n) (why?)
• Binary search in a sorted sequence with n elements: O(log n) (why?)
• What about sweep algorithms?
  – Complexity O(n)
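The two "why?" questions can be explored with a minimal Java sketch of both searches. The class SearchDemo is hypothetical; each method returns the index of the key, or -1 if it is absent.

```java
class SearchDemo {
    // Linear search: in the worst case every element is compared, O(n).
    static int linearSearch(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) return i;
        }
        return -1;
    }

    // Binary search in a sorted array: each comparison halves the
    // remaining interval, so at most about log2(n) + 1 rounds are needed.
    static int binarySearch(int[] sorted, int key) {
        int low = 0, high = sorted.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2; // avoids int overflow
            if (sorted[mid] == key) return mid;
            if (sorted[mid] < key) low = mid + 1;
            else high = mid - 1;
        }
        return -1;
    }
}
```

Binary search relies on the array being sorted; on unsorted data only the linear search gives correct answers.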

Constant and Linear complexity
• Consider an algorithm working on a sequence of length n:
  – If running time is independent of n, then the time complexity is constant or O(1)
  – If we (in worst case) have to do something to every element, then the time complexity is linear or O(n)
  – There are other possibilities:
    • Quadratic O(n^2) (some sorting algorithms), O(n log n) (better sorting algorithms), logarithmic O(log n) (binary search), exponential O(2^n) (“difficult” problems like the Towers of Hanoi – more in the 3rd semester)

Does it matter…?

[Figure: running times for different complexities; not reproduced]

NOTE
• Assuming one basic operation in 1 ns (one billion operations per second, GHz)
• “år” means “year”
• “døgn” means “day”

A Rule of Thumb
• For each nested loop the complexity must be multiplied by a factor n:

for (int i = 0; i < n; i++)          // O(n)
{…}

for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++)      // O(n^2)
    {…}
}

O(1)

public void add(int n) {
    lastIndex++;
    data[lastIndex] = n;
}

Both statements are basic and their performance is independent of the size of the array.

O(n)

public void insert(int i, int newInt) {
    // make room for newInt: move elements i..lastIndex one step to the right
    // (assumes the array has room for one more element)
    for (int j = lastIndex + 1; j > i; j--)
        data[j] = data[j - 1];
    data[i] = newInt; // insert newInt
    lastIndex++;
}

The for-loop indicates a time complexity of O(n)

O(n^2)

public void sort() {
    for (int j = 0; j < numbers.size(); j++) {
        for (int i = 0; i < numbers.size() - 1; i++) {
            if (numbers.get(i) > numbers.get(i + 1))
                swap(i, i + 1); // swaps elements i and i+1
        } // end for
    } // end for
} // end sort

Nested for-loops suggest O(n^2)