CS135601 Introduction to Information Engineering: Data Structure (Che-Rung Lee, 12/8/2015)


Data abstraction

• Main memory is organized as a sequence of addressable cells, but the data we want to model is usually not.

• Solution: build a “model” of the data and “simulate” it on top of the memory cells


Pointers

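• What is a pointer?
  – A special kind of data that records a memory address

• Example in C

  int a = 3;       // a holds the value 3
  int *p = NULL;   // p is a pointer to int that points nowhere yet
  p = &a;          // p now records the address of a
  *p = 5;          // store 5 in the cell p points to, so a becomes 5

  variable   address   value
  a          0x03      3, then 5 after *p = 5
  p          0x04      0 (NULL), then 0x03 after p = &a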


Outline

• Customized data type

• Array and list

• Stack and queue

• Trees

• Hash table


Customized Data Type


How to model a warrior?

• Class

• Skills

• Equipment

• Life points

• Magic points

• Money

• …

But computers only have primitive data types: integer, real, character, and Boolean.

Diablo III


User-defined data types

• Conglomerate of primitive data types collected under a single name

• Example in C: struct

  typedef struct {
      char class[10];   // Barbarian, Witch, Wizard or Monk
      int lifePoint;    // min is 0, max is 100
      int level;        // min is 1, max is 72
      …
  } Warrior;            // the user-defined data type

  Warrior player1;      // an instance of type Warrior
  player1.lifePoint = 100;


Abstract data type

• A full model of an abstract data type should include the operations on the model
  – Like +, -, *, /, input, and output for primitive data types

• Example in C++: class

  class Warrior {
      char className[10];   // Barbarian, Witch, Wizard or Monk
      …
      void fight(…);        // function that defines the action “fight”
  };

  – An instance of a class is called an object, which we will discuss further in the programming language lesson.


Heterogeneous array

• Storage that contains different types of data is called a heterogeneous array
  – struct and class are heterogeneous arrays
  – The items are called components
  – Storage that contains a single type of data is called a homogeneous array

• Example

  struct {
      char Name[25];
      int Age;
      float SkillRating;   // a real number, such as 6.2
  } Employee;


Storage of heterogeneous array

• Static method:
  – components are stored one after the other in a contiguous block

• Dynamic method:
  – components are stored in separate locations identified by pointers

[Figure: the record (Meredith W Linsmeyer, 23, 6.2) stored as one contiguous block, and stored as separate components reached through pointers]


Array and List


When to use arrays?

• Stock prices, student names, temperature readings
  – One-dimensional array

• Matrices, images, class grades, train schedules
  – Two-dimensional array

• Computed tomography (CT scans)
  – Three-dimensional array


Storing arrays

• Use a variable to denote the address of the first element
  – Ex: int Readings[24];
  – The relative address of an element is called its “index”
  – In C, the index starts from 0

[Figure: the cells of Readings labeled with the indices 0, 1, 2, 3]


Two dimensional array

• A two-dimensional array is stored in a one-dimensional sequence of memory cells.

• Two ways to order the data

  The array (4 rows, 3 columns):

    a11 a12 a13
    a21 a22 a23
    a31 a32 a33
    a41 a42 a43

  Row major order:    a11 a12 a13 a21 a22 a23 a31 a32 a33 a41 a42 a43
  Column major order: a11 a21 a31 a41 a12 a22 a32 a42 a13 a23 a33 a43

  – What is the memory location of A[2][3] in the row (column) major order?
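The two orderings on this slide can be turned into offset formulas. The sketch below is an illustration, not part of the original slides: the function names are made up, indices are 0-based as in C, and offsets are counted in elements from the start of the block.

```c
#include <stddef.h>

/* Offset of element (i, j) in a rows x cols array, 0-based indices. */

/* Row major: whole rows are laid out one after another. */
size_t row_major_offset(size_t i, size_t j, size_t rows, size_t cols) {
    (void)rows;               /* the row count is not needed for this order */
    return i * cols + j;
}

/* Column major: whole columns are laid out one after another. */
size_t col_major_offset(size_t i, size_t j, size_t rows, size_t cols) {
    (void)cols;               /* the column count is not needed for this order */
    return j * rows + i;
}
```

For the 4 by 3 array above, the entry a23 is (i, j) = (1, 2) in 0-based terms. It sits at offset 5 in row major order (the sixth cell) and at offset 9 in column major order (the tenth cell), which agrees with the two listings.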


High dimensional array

• Consider the three-dimensional array A[m][n][k]
  – What is the size of the array?
  – What is the memory location of A[1][2][3] in the row major order?
    • In the row major order, the last index changes first
  – What is the memory location of A[1][2][3] in the column major order?
    • In the column major order, the first index changes first
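One way to answer these questions is to write the offsets down as functions. This is a sketch under stated assumptions: the names are hypothetical, indices are 0-based, and offsets are counted in elements rather than bytes.

```c
#include <stddef.h>

/* Number of elements in A[m][n][k]. */
size_t array_size3(size_t m, size_t n, size_t k) {
    return m * n * k;
}

/* Row major: the last index changes first. */
size_t row_major_offset3(size_t i, size_t j, size_t l,
                         size_t m, size_t n, size_t k) {
    (void)m;                  /* the first dimension is not needed */
    return (i * n + j) * k + l;
}

/* Column major: the first index changes first. */
size_t col_major_offset3(size_t i, size_t j, size_t l,
                         size_t m, size_t n, size_t k) {
    (void)k;                  /* the last dimension is not needed */
    return i + m * (j + n * l);
}
```

As an example with assumed dimensions m = 4, n = 5, k = 6, the array holds 120 elements, and A[1][2][3] is at offset 45 in row major order and at offset 69 in column major order.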


When to use list?

• A list is a collection of data arranged sequentially.
  – A one-dimensional array is a list of elements
  – A two-dimensional array can be viewed as a list of rows/columns
  – A string is a list of characters
  – Music is a list of sounds
  – Stacks and queues can be implemented using lists
    • We will talk about those later


Contiguous list

• The list is stored in a contiguous block of memory cells (an array)
  – Ex: a list of names. Each name occupies 8 bytes.


Linked list

• A list in which the entries are linked by pointers
  – Head pointer: pointer to the first entry in the list
  – NIL pointer: a “non-pointer” value used to indicate the end of the list
  – Use a customized data type to define each entry


Static vs. dynamic data structures

• Static data structures:
  – Size and shape do not change
  – Ex: contiguous list
  – Easy to locate elements; no need to store addresses

• Dynamic data structures:
  – Size and shape can change
  – Ex: linked list
  – Easy to delete/insert elements


Linked list: delete/insert element

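• Delete: make the pointer of the preceding entry skip over the deleted entry

• Insert: point the new entry at its successor, then point the preceding entry at the new entry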
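A minimal sketch of a linked list entry with insertion and deletion, written in C. The type and function names are assumptions made for illustration; NULL plays the role of the NIL pointer, and both operations work on the entry that follows a given one.

```c
#include <stdlib.h>

/* One entry of the list: a value plus a pointer to the next entry. */
typedef struct Node {
    int value;
    struct Node *next;        /* NULL marks the end of the list */
} Node;

/* Insert a new entry right after prev.
   Returns the new entry, or NULL if memory could not be allocated. */
Node *insert_after(Node *prev, int value) {
    Node *node = malloc(sizeof *node);
    if (node == NULL) {
        return NULL;
    }
    node->value = value;
    node->next = prev->next;  /* new entry points at its successor */
    prev->next = node;        /* predecessor points at the new entry */
    return node;
}

/* Delete the entry right after prev, if there is one. */
void delete_after(Node *prev) {
    Node *victim = prev->next;
    if (victim == NULL) {
        return;
    }
    prev->next = victim->next;  /* skip over the deleted entry */
    free(victim);
}
```

Using a dummy entry as the head (for example `Node head = {0, NULL};`) lets the same two functions handle insertion and deletion at the front of the list. Neither operation moves any other entry, which is why a linked list makes these changes cheap.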


Stack and Queue


What is a stack?

• A list in which entries are removed and inserted only at the head
  – Top: the head of the stack
  – Bottom or base: the tail of the stack
  – Push: to insert an entry at the top
  – Pop: to remove the entry at the top
  – LIFO: last-in-first-out

[Figure: a stack with its top and bottom labeled]


When to use stacks?

• When the algorithm needs data in LIFO order
  – EX1: reverse a word, ABC → CBA
    • Push A, push B, push C
    • Pop C, pop B, pop A
  – EX2: check matching parentheses in (3*[(1+1)*2]
    • Push “(”, push “[”, push “(”
    • Find “)”, pop “(”, matched
    • Find “]”, pop “[”, matched
    • No more “)”, but still one “(” in the stack: not matched
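The parentheses check in EX2 can be sketched in C as follows. The function name and the fixed nesting limit are assumptions for this illustration; the stack is a plain character array with a counter for the top.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 256   /* assumed nesting limit for this sketch */

/* Returns true if every "(" and "[" in text is closed by the matching
   ")" or "]" in the right order; all other characters are ignored. */
bool brackets_match(const char *text) {
    char stack[MAX_DEPTH];
    size_t top = 0;                      /* number of entries on the stack */

    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '(' || *p == '[') {
            if (top == MAX_DEPTH) {
                return false;            /* nested too deeply for this sketch */
            }
            stack[top++] = *p;           /* push the opening symbol */
        } else if (*p == ')' || *p == ']') {
            if (top == 0) {
                return false;            /* nothing left to match */
            }
            char open = stack[--top];    /* pop the latest opening symbol */
            if ((*p == ')' && open != '(') || (*p == ']' && open != '[')) {
                return false;            /* wrong kind of bracket */
            }
        }
    }
    return top == 0;                     /* leftovers mean unmatched symbols */
}
```

For the slide's input `(3*[(1+1)*2]`, one “(” is still on the stack when the text ends, so the function returns false. Appending a final “)” makes it return true.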


Stack implementation

• Using a list + a pointer (head)
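A minimal sketch of this idea, assuming the list is a fixed-size array and the head is tracked by a counter rather than an address. The names and the capacity are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 100   /* assumed fixed size of the underlying list */

typedef struct {
    int items[STACK_CAPACITY];
    size_t top;              /* number of entries; items[top - 1] is the top */
} Stack;

void stack_init(Stack *s) {
    s->top = 0;
}

/* Push: insert an entry at the top. Returns false if the stack is full. */
bool stack_push(Stack *s, int value) {
    if (s->top == STACK_CAPACITY) {
        return false;
    }
    s->items[s->top++] = value;
    return true;
}

/* Pop: remove the entry at the top into *out. Returns false if empty. */
bool stack_pop(Stack *s, int *out) {
    if (s->top == 0) {
        return false;
    }
    *out = s->items[--s->top];
    return true;
}
```

Pushing 'A', 'B', 'C' and then popping three times yields 'C', 'B', 'A', which is the word-reversal example in LIFO order.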


Queue

• A list in which entries are removed at the head and inserted at the tail.
  – Enqueue: insert an entry at the tail
  – Dequeue: remove an entry at the head
  – FIFO: first-in-first-out

[Figure: a queue with its head and tail labeled]


Examples of using queues

• Ex1: the job queues in an operating system

• Ex2: simulation of the Josephus problem
  – Dequeue 1, enqueue 1
  – Dequeue 2
  – Dequeue 3, enqueue 3
  – Operation count: 2n

[Figure: a queue holding 1 to 6, with 1 at the head]
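The dequeue/enqueue pattern of Ex2 can be sketched in C. This assumes the variant suggested by the steps on the slide, in which every second person is removed, and it uses a simple array queue whose head and tail only move forward. The function name is hypothetical.

```c
#include <stdlib.h>

/* Returns the last remaining person (numbered 1..n) when every second
   person is removed. Returns 0 for n < 1 or if allocation fails. */
int josephus_survivor(int n) {
    if (n < 1) {
        return 0;
    }
    /* n initial entries plus at most n - 1 re-enqueued entries fit in 2n cells */
    int *queue = malloc(2 * (size_t)n * sizeof *queue);
    if (queue == NULL) {
        return 0;
    }

    size_t head = 0;
    size_t tail = 0;
    for (int i = 1; i <= n; i++) {
        queue[tail++] = i;               /* enqueue 1, 2, ..., n */
    }

    while (tail - head > 1) {
        int skipped = queue[head++];     /* dequeue the person who is skipped */
        queue[tail++] = skipped;         /* enqueue that person again */
        head++;                          /* dequeue the person who is removed */
    }

    int survivor = queue[head];
    free(queue);
    return survivor;
}
```

With n = 6 the people leave in the order 2, 4, 6, 3, 1, so person 5 remains. The head and tail indices only grow, which is the "crawling" behavior that a circular queue avoids.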


Queue implementation

• A list + 2 pointers (head + tail)
  – Enqueue A, B, C
  – Dequeue A, enqueue D
  – Dequeue B, enqueue E

• If using a static list, the queue crawls through memory as entries are inserted and removed.

[Figure: cells holding A, B, C, D, E, with the head pointer and the tail pointer both moving forward through memory]


Circular queue

• A technique that uses a fixed region of memory space to implement a queue.
  – Enqueue A, B, C
  – Dequeue A, enqueue D
  – Dequeue B, enqueue E

[Figure: a ring of cells holding A, B, C, D, E, with the head and tail pointers wrapping around]
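A minimal circular queue sketch in C. The names and the capacity of 4 are assumptions chosen so that the slide's sequence wraps around; instead of a separate tail pointer, the tail position is derived from the head index and the entry count.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4     /* assumed size of the fixed memory region */

typedef struct {
    char items[QUEUE_CAPACITY];
    size_t head;             /* index of the next entry to dequeue */
    size_t count;            /* number of stored entries */
} CircularQueue;

void queue_init(CircularQueue *q) {
    q->head = 0;
    q->count = 0;
}

/* Enqueue: insert at the tail. Returns false if the queue is full. */
bool enqueue(CircularQueue *q, char value) {
    if (q->count == QUEUE_CAPACITY) {
        return false;
    }
    size_t tail = (q->head + q->count) % QUEUE_CAPACITY;  /* wraps around */
    q->items[tail] = value;
    q->count++;
    return true;
}

/* Dequeue: remove from the head into *out. Returns false if empty. */
bool dequeue(CircularQueue *q, char *out) {
    if (q->count == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;             /* wraps around */
    q->count--;
    return true;
}
```

Running the slide's sequence (enqueue A, B, C; dequeue A, enqueue D; dequeue B, enqueue E) places E in cell 0, the cell that A vacated, so the queue stays inside its four cells.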


Trees


What is a tree?

• A collection of nodes linked in a hierarchical structure, in which every node except the root has exactly one parent.
  – Node: an entry in a tree
  – Parent: the node immediately above a specified node
  – Root: the node at the top
  – Terminal or leaf node: a node at the bottom


Hierarchical relations

• Parent: the node immediately above a node
  – The parent of F is B.

• Child: a node immediately below a node
  – The children of C are G and H.

• Ancestor: parent, parent of parent, etc.
  – The ancestors of K are F, B, and A.

• Descendant: child, child of child, etc.
  – The descendants of B are E, F, K, and L.

• Siblings: nodes sharing a common parent
  – The siblings of C are B and D.

A

B C D

E F G H I J

K L


Depth and height

• Textbook’s definition
  – The depth of a tree is the length of the longest path from the root to a leaf node
    • The length of a path is the number of nodes on the path
    • Ex: the depth of the tree is 4

• Conventional definition
  – Use the word “height” instead of depth
  – The length of a path is the number of links on the path
  – Ex: the height of the tree is 3 (= 4 - 1)

A

B C D

E F G H I J

K L


What are trees used for?

• Representing hierarchical data
  – Organization chart

• Searching data
  – Game tree


Binary tree

• A tree in which each parent has at most two children

[Figure: a binary tree with the left child, right child, left subtree, and right subtree labeled]


Storing a binary tree in a list

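• The root is stored at index 1, and the remaining nodes follow level by level, from left to right.

• This is called a heap in some applications.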


Advantages of using heap

• Easy to find the index of the parent and the children
  – Ex: node B is stored at index 2
  – Parent(B) = [index of B] / 2 = 1
  – LeftChild(B) = [index of B] * 2 = 4
  – RightChild(B) = [index of B] * 2 + 1 = 5
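The index arithmetic can be written as three small C functions. This is a sketch that assumes, as the numbers on this slide imply, that the root is stored at index 1; the function names are made up for illustration.

```c
#include <stddef.h>

/* Index arithmetic for a binary tree stored in a list, root at index 1. */

size_t heap_parent(size_t i) {
    return i / 2;            /* integer division drops the remainder */
}

size_t heap_left_child(size_t i) {
    return 2 * i;
}

size_t heap_right_child(size_t i) {
    return 2 * i + 1;
}
```

For index 2 these return 1, 4, and 5, matching the values given for B. No pointers are stored at all: the position of a node in the list determines where its relatives are.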


Problems for heap

• A heap is inefficient for storing a binary tree that is sparse or unbalanced
  – Sparse: most nodes have one child or none
  – Unbalanced: the right subtree is much larger than the left subtree, or vice versa


Storing a binary tree using pointers

• Each node stores its data, a pointer to its left child, and a pointer to its right child
  – Use a customized data type to define the node


Recursive structure

• A tree is a recursive structure
  – The subtrees of a tree are trees

• A recursive algorithm for a binary tree may look like this

  procedure some_operation(root)
      if (root is not NULL) then (
          call some_operation(root.left_child)
          do some operations on root
          call some_operation(root.right_child)
      )

  – It is a depth-first, in-order algorithm for the tree
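The pseudocode above can be sketched in C with a pointer-based node. The type and function names are assumptions, and "do some operations on root" is taken here to mean copying the node's value into an output array.

```c
#include <stddef.h>

/* A binary tree node: data plus pointers to the two children. */
typedef struct TreeNode {
    int value;
    struct TreeNode *left_child;
    struct TreeNode *right_child;
} TreeNode;

/* Depth-first, in-order traversal. Writes the visited values to out,
   starting at position count, and returns the new count.
   The caller must make out large enough for every node. */
size_t in_order(const TreeNode *root, int *out, size_t count) {
    if (root != NULL) {
        count = in_order(root->left_child, out, count);   /* left subtree */
        out[count++] = root->value;                       /* the node itself */
        count = in_order(root->right_child, out, count);  /* right subtree */
    }
    return count;
}
```

For a root holding 2 with a left child holding 1 and a right child holding 3, calling `in_order(&root, out, 0)` fills `out` with 1, 2, 3 and returns 3.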


Hash Table


Search

• Search is a common task in daily life
  – Phone book: given a name, find the phone number
  – Dictionary: given a word, find its definition
  – Map: given an address, find the location or route
  – DNS: given a domain name, find its IP address

• Trees can be used to speed up searches.
  – How? And what is the operation count?


Constant time search

• Some things can be found in constant time
  – EX: What is the fifth element of the array A? A[4]

• An array is like a lookup table. One can use the index to query and get the value.

• Can we use this idea to organize data so that searches can be done in constant time?
  – Hash table (or hash map)


Hash table

• Each record of data has a key field
  – The key is like the index of an array.
  – It is (ideally) a unique identification of the data.

• The storage space is divided into buckets
  – Each bucket is like an array cell.
  – Each record is stored in the bucket corresponding to its key, so it can be retrieved in constant time.


How to define the mapping?

• The unique identification of a record is usually too large to be used as the index for storage
  – For example, the ASCII code of a string

• We do not want to create such a large array!


Hash function

• A hash function computes a bucket number for each key value
  – EX: suppose there are only 41 buckets.
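One possible hash function for 41 buckets is sketched below. The slide's own example did not survive in this transcript, so this is an assumption: the key's characters are read as one large base-256 number, and the bucket number is its remainder after division by 41. The function name is hypothetical.

```c
#include <stddef.h>

#define BUCKET_COUNT 41

/* Maps a string key to a bucket number in the range 0..40. */
size_t hash_key(const char *key) {
    size_t h = 0;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        /* Append one more base-256 digit, reducing modulo 41 at every
           step so that the running value never grows large. */
        h = (h * 256 + *p) % BUCKET_COUNT;
    }
    return h;
}
```

For example, the key "A" has character code 65, so it lands in bucket 65 mod 41 = 24. The key "j" has code 106 = 65 + 41 and lands in bucket 24 as well, so two different keys can share one bucket.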


Problem

• Collision: the case of two or more keys hashing to the same bucket
  – A major problem when the table is over 75% full


Solutions

• Use linked lists to store collided data
  – The search time becomes linear in the number of collided records

• Increase the number of buckets and rehash all data
  – Time/space tradeoff

• Design a better hash function/algorithm
  – It’s a research problem
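The first solution, one linked list per bucket, can be sketched in C as follows. All names are hypothetical, the table has an assumed 41 buckets, the hash is a simple remainder of the key read as a base-256 number, and keys are stored by pointer, so the caller has to keep the strings alive.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_BUCKETS 41

typedef struct Entry {
    const char *key;          /* not copied: the caller keeps it alive */
    int value;
    struct Entry *next;       /* next record that collided in this bucket */
} Entry;

typedef struct {
    Entry *buckets[TABLE_BUCKETS];   /* every bucket starts as NULL */
} HashTable;

static size_t bucket_of(const char *key) {
    size_t h = 0;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        h = (h * 256 + *p) % TABLE_BUCKETS;
    }
    return h;
}

/* Stores value under key. Returns 1 on success, 0 if allocation fails. */
int table_put(HashTable *t, const char *key, int value) {
    size_t b = bucket_of(key);
    for (Entry *e = t->buckets[b]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;            /* key already present: overwrite */
            return 1;
        }
    }
    Entry *e = malloc(sizeof *e);
    if (e == NULL) {
        return 0;
    }
    e->key = key;
    e->value = value;
    e->next = t->buckets[b];             /* link in front of the chain */
    t->buckets[b] = e;
    return 1;
}

/* Returns a pointer to the stored value, or NULL if key is absent. */
const int *table_get(const HashTable *t, const char *key) {
    for (const Entry *e = t->buckets[bucket_of(key)]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            return &e->value;
        }
    }
    return NULL;
}

/* Frees every entry and leaves the table empty. */
void table_clear(HashTable *t) {
    for (size_t b = 0; b < TABLE_BUCKETS; b++) {
        while (t->buckets[b] != NULL) {
            Entry *next = t->buckets[b]->next;
            free(t->buckets[b]);
            t->buckets[b] = next;
        }
    }
}
```

A table declared as `HashTable t = {{NULL}};` starts empty. With this hash the keys "A" and "j" collide in the same bucket, so a lookup walks a chain of two entries; that walk is the linear search cost mentioned in the first bullet.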


References

• Textbook 8.1, 8.2, 8.3, 8.5, 9.5

• Wikipedia

• Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, “Introduction to Algorithms”

• Related courses: Data Structures, Algorithms, Programming Languages