data structures and manipulation

Data Structures and Manipulation By Dan Jones


Post on 23-Feb-2016


Page 1: Data Structures and Manipulation

Data Structures and Manipulation
By Dan Jones

Page 2: Data Structures and Manipulation

OCR Specification Points

Topics

• Implementation of data structures, including stacks, queues and trees.

• Searching, merging and sorting.

Candidates should be able to:

• explain how static data structures may be used to implement dynamic data structures;

• describe algorithms for the insertion, retrieval and deletion of data items stored in stack, queue and tree structures;

• explain the difference between binary searching and serial searching, highlighting the advantages and disadvantages of each;

• explain how to merge data files;

• explain the differences between the insertion and quick sort methods, highlighting the characteristics, advantages and disadvantages of each.

3.3.5 Data structures and data manipulation

Page 3: Data Structures and Manipulation

Data Structures
• A data structure is a way of storing data so that each item's position has meaning.
- e.g. Listing names in alphabetical order would give their position meaning.
• There are a number of structures that can be used.
- For example:
  Arrays and Records
  Serial, sequential, indexed sequential and direct access files
- Which of these is used will depend on the circumstances: the most appropriate structure for the data should be used.
• Data structures can be grouped into two main categories:
1. Dynamic
2. Static

Page 4: Data Structures and Manipulation

IMPLEMENTATION OF STATIC DATA STRUCTURES
Arrays and Lists

Page 5: Data Structures and Manipulation

Static Data Structures
• Data structure whose size is fixed when it is created in memory.
- e.g. Array (or List)

Page 6: Data Structures and Manipulation

Static Example: Array/List

This shows an example of an array stored in memory in alphabetical order. Here decisions about the array must be made before it is used:
• Name
• Data type
• Size
• Shape (number of dimensions)

This example is a list as it is one-dimensional.
• You can have arrays with many dimensions.

Arrays are accessed by providing the array name and its index (from 0).
• e.g. People[2] would output “Janet”

Address  Contents
2        ….
3        Janet
4        Louise
5        Ben
6        Harry
7        Terry
8        ….
9        ….

Length = 5

Array:
• Name = People
• Data type = String
• Size = 5
• Shape = 1D

Adding an extra item would require redefining the array.

Address  Contents
2        ….
3        Janet
4        Louise
5        Ben
6        Harry
7        Terry
8        0xFFEE
9        ….

What if memory location 8 was already assigned to something else?

The whole of the array must be moved.

Address  Contents  People array index
2        ….
3        Ben       0
4        Harry     1
5        Janet     2
6        Harry     3
7        Louise    4
8        0xFFEE    N/A
9        ….        N/A

This is very inefficient, especially when dealing with large amounts of data.
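The cost of growing a static array can be sketched in Python as below. This is only an illustration: Python lists are not truly fixed-size, so the fixed block is simulated, and the helper name `append_to_fixed_array` is an assumption made for this example rather than anything from the slides.

```python
def append_to_fixed_array(array, used, value):
    """Store value in the next free slot of a fixed-size array.

    `array` stands in for a fixed block of memory and `used` is the
    number of slots already filled. If the block is full, a bigger
    block is allocated and every existing item is copied across,
    which is the expensive step described on the slide.
    """
    if used == len(array):
        bigger = [None] * (len(array) + 1)   # redefine the array with a new size
        for index in range(used):            # move the whole array, item by item
            bigger[index] = array[index]
        array = bigger
    array[used] = value
    return array, used + 1


# The People array from the slide is full (size 5), so adding a sixth
# name forces all five existing names to be copied.
people = ["Ben", "Harry", "Janet", "Harry", "Louise"]
people, count = append_to_fixed_array(people, 5, "Terry")
```

After the call, `people[2]` is still "Janet", but all five original items had to be copied to make room for one new one.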

Page 7: Data Structures and Manipulation

Static Data Structures
• Data structure whose size is fixed when it is created in memory.
- e.g. Array (or List)
- Advantages:
  Little to no risk of overflow or underflow errors, as it will always take up the same space in memory.
  • This will most likely be reserved so no other program can access it.
  The program/memory management system can allocate a fixed amount of memory.
- Disadvantages:
  Requires knowledge of the size of the array before it has been created. This can result in:
  • Waste of resources: once the space has been reserved it can no longer be used by other processes/data.
  • Running out of space when the predicted size is too small.

Page 8: Data Structures and Manipulation

IMPLEMENTATION OF DYNAMIC DATA STRUCTURES
Linked Lists, Queues, Stacks, Binary Trees and their implementation.

Page 9: Data Structures and Manipulation

Examples of Dynamic Data Structures
• There are four types of dynamic data structure we need to know:
1. Linked List
2. Queue
3. Stack
4. Binary Tree

Page 10: Data Structures and Manipulation

Linked List
• A linked list is similar to an array in that it stores a list of values; however, it is dynamic and can be extended or shortened at will to fit the data inside it.
• To do this pointers are used:
- Each data item now holds two pieces of information:
1. Data = the original data the item held.
2. Pointer = the address of the next item in the list.
- The Start Pointer stores the address of the first item in the list. This is what actually allows the list to be accessed by the program.
- The pointer of the last item in the list will be Null (blank). This indicates the end of the list and is called a Null Pointer.

Page 11: Data Structures and Manipulation

Dynamic Example: Linked Lists

Address  Data           Pointer
2        ….
3        Ben            4
4        Harry          7
5        .....
6        “Misc. Data”
7        Janet          8
8        Louise         10
9        “Misc. Data”
10       Terry          Null
11       .....

Start Pointer stores the address of the first item in the list.

Null Pointer is stored in the last item to indicate the last item in the list.

• Inserting an item simply requires:
  • Changing the pointer of the previous item to the new item's location.
  • Making the new item's pointer equal to the previous item's old pointer value.
  • e.g. Inserting “Kerry” between “Janet” and “Louise”: “Kerry” is stored at free address 5 with pointer 8, and the pointer of “Janet” changes from 8 to 5.
• Deleting an item simply requires:
  • Taking its pointer value, and replacing the previous item's pointer value with this.
  • e.g. Deleting “Janet”.
  • Now “Janet” will be skipped, and its space can be used for other data.
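The insert and delete rules for a linked list can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the class and method names (`Node`, `LinkedList`, `insert_after`, `delete_after`, `find`, `to_list`) are assumptions, and Python object references stand in for memory addresses.

```python
class Node:
    """One list item: the data plus a pointer to the next item."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node   # None plays the role of the Null Pointer


class LinkedList:
    def __init__(self):
        self.start = None       # the Start Pointer

    def insert_after(self, previous, data):
        """Insert data after `previous` (or at the front if previous is None)."""
        if previous is None:
            self.start = Node(data, self.start)
            return self.start
        new_node = Node(data, previous.next)  # new item takes the previous item's old pointer
        previous.next = new_node              # previous item now points at the new item
        return new_node

    def delete_after(self, previous):
        """Delete the item after `previous` (or the first item if previous is None)."""
        if previous is None:
            if self.start is not None:
                self.start = self.start.next
        elif previous.next is not None:
            previous.next = previous.next.next  # skip over the deleted item

    def find(self, data):
        node = self.start
        while node is not None and node.data != data:
            node = node.next
        return node

    def to_list(self):
        items, node = [], self.start
        while node is not None:
            items.append(node.data)
            node = node.next
        return items


names = LinkedList()
last = None
for name in ["Ben", "Harry", "Janet", "Louise", "Terry"]:
    last = names.insert_after(last, name)

names.insert_after(names.find("Janet"), "Kerry")   # insert between Janet and Louise
after_insert = names.to_list()

names.delete_after(names.find("Harry"))            # delete Janet, the item after Harry
after_delete = names.to_list()
```

In both operations only one or two pointers change; no other item is moved.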

Page 12: Data Structures and Manipulation

Static and Dynamic Data Structures Comparison

Static
• Data structure whose size is fixed when it is created in memory.
- e.g. Array
- Advantages:
  The program/memory management system can allocate a fixed amount of memory.
  No risk of overflow or underflow errors.
  • As it will always take up the same space in memory.

Dynamic
• Data structure which will extend and change its size to fit the data.
- e.g. Linked List
- Advantages:
  Can extend as far as physically possible: more flexible.
  Allows the program to be written more easily: less must be determined at compilation time.
  Inserting, merging and deleting items is very easy and requires little processing power.

Page 13: Data Structures and Manipulation

Static and Dynamic Data Structures Comparison

Static
- Disadvantages:
  Requires knowledge of the size of the array before it has been created. This can result in:
  • Waste of resources: once the space has been reserved it can no longer be used by other processes/data.
  • Running out of space when the predicted size is too small.
  Any manipulation other than adding to or taking from the end requires moving large amounts of the data.
  • Inefficient use of memory and CPU time.

Dynamic
- Disadvantages:
  Unnecessary and inefficient for small amounts of data.
  • In this case the size of the data may be even smaller than the extra data needed to make it dynamic.
  Data can become highly fragmented over extended use. This may cause a physical bottleneck when the hardware needs to access this data.

Page 14: Data Structures and Manipulation

Queue
• A queue is a data structure, similar in implementation to a list/array.
- However it implements a “First In First Out” (FIFO, or equivalently LILO) order, hence the name “queue”. It is therefore a “serial structure where the position is related to the chronological appearance of the data”.
- It can grow and shrink in size. For example, if items are being processed faster than they are being added to the queue, the queue will get smaller.
- It has two pointers:
1. Head Pointer holds the address of the oldest item in the queue (next to be read).
2. Tail Pointer holds the free address just after the most recently added item in the queue (last to be read).
• There are two operations you can do to a queue:
1. Enqueue = putting something on the end of the queue. The value is added at the address pointed to by the tail pointer. The tail pointer is then incremented.
2. Dequeue = reading and removing the item at the front of the queue. The value at the head pointer is read. The head pointer is then incremented.

Page 15: Data Structures and Manipulation

Dynamic Example: Queue

[Figure: a queue occupying addresses 0 to 7, labelled “LILO!”.]

• Enqueue = putting something on the end of the queue.
  • Value added to address pointed to by tail pointer.
  • The tail pointer is then incremented.
  • Tail Pointer = 7 => 8
• Dequeue = reading and removing the item at the front of the queue.
  • Value at head pointer read.
  • Head pointer incremented.
  • The previous head value is now ignored as if it were free space.
  • Head Pointer = 0 => 1
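The head and tail pointer behaviour of a queue can be sketched on top of a fixed-size array as below. The class name `ArrayQueue` and the error handling are assumptions for illustration. It follows the simple scheme on the slides, in which both pointers only ever move forwards; a real implementation would often wrap the pointers round (a circular queue) so that freed slots can be reused.

```python
class ArrayQueue:
    """A FIFO queue stored in a fixed-size array with head and tail pointers."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # the underlying static array
        self.head = 0                   # address of the oldest item (next to be read)
        self.tail = 0                   # next free address after the newest item

    def enqueue(self, value):
        if self.tail == len(self.slots):
            raise OverflowError("queue is full")
        self.slots[self.tail] = value   # value added at the tail pointer
        self.tail += 1                  # tail pointer incremented

    def dequeue(self):
        if self.head == self.tail:
            raise IndexError("queue is empty")
        value = self.slots[self.head]   # value at the head pointer read
        self.head += 1                  # old head slot is now treated as free space
        return value


queue = ArrayQueue(8)
for name in ["Ben", "Harry", "Janet"]:
    queue.enqueue(name)
first_out = queue.dequeue()
```

Here `first_out` is the first name that was added, which is the FIFO behaviour described above.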

Page 16: Data Structures and Manipulation

Dynamic Example: Stack
• A stack is a method of storing data following the first in last out (FILO) principle.
• A stack pointer is used to store the location of the most recently added item of the stack.
  • Used to read the data.
  • Note: Only the value pointed to by the stack pointer can be read at any time.
• Push = the action of adding something to the stack.
  • Stack Pointer is incremented (7 => 8).
  • Value is then stored in the address represented by the stack pointer.

[Figure: a stack occupying addresses 0 to 13. Stack Pointer: a variable which stores the address of the topmost value of the stack (= 7 => 8).]

Page 17: Data Structures and Manipulation

Dynamic Example: Stack
• Pop = the action of removing an item from the top of the stack.
  • Stack Pointer is decremented (8 => 7).
  • Note: The data does not need to be deleted as there is no longer any reference to it.

[Figure: a stack occupying addresses 0 to 13. Stack Pointer: a variable which stores the address of the topmost value of the stack (= 7 => 8 => 7).]
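Push and pop with a stack pointer can be sketched over a fixed-size array as follows. The class name `ArrayStack`, the starting value of -1 for an empty stack, and the error handling are assumptions made for this example.

```python
class ArrayStack:
    """A FILO stack stored in a fixed-size array with a stack pointer."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # the underlying static array
        self.stack_pointer = -1         # index of the topmost value; -1 means empty

    def push(self, value):
        if self.stack_pointer + 1 == len(self.slots):
            raise OverflowError("stack overflow")
        self.stack_pointer += 1                 # stack pointer incremented
        self.slots[self.stack_pointer] = value  # value stored at the stack pointer

    def pop(self):
        if self.stack_pointer == -1:
            raise IndexError("stack underflow")
        value = self.slots[self.stack_pointer]
        self.stack_pointer -= 1  # old value is not erased, just no longer referenced
        return value


stack = ArrayStack(14)
for name in ["Ben", "Harry", "Janet"]:
    stack.push(name)
top = stack.pop()
```

Note that after the pop the old value is still physically in the array; it is simply out of reach until the next push overwrites it.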

Page 18: Data Structures and Manipulation

Implementing Stacks and Queues
• Static data structures can be used to implement dynamic data structures.
- This is because most computers do not implement dynamic data structures natively, so these must be developed in code.
- Often higher level languages will have implementations built in, e.g. VB.NET.
• A stack can be implemented by using an array.
- Methods can be added to the array class to allow pushing to and popping from an array, and it will be stored identically in memory.
- However a linked list can be used as well. This provides a more flexible implementation, meaning the stack does not need to take up a contiguous section of memory. It can also remove the source of some stack overflow errors.
  • When the available contiguous memory runs out, the list can simply point to more free space (as opposed to writing over other data).
• A similar method can be applied to create a queue.

Address  Contents
2        ….
3        Ben
4        Harry
5        Janet
6        Harry
7        Louise
8        ….

Array stored in memory as a stack by restricting access to the top item only (highest index -> “Louise”).
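For comparison with the array version, a stack built on a linked list might look like the sketch below. The names `StackNode` and `LinkedStack` are assumptions for illustration. Each push allocates a new node wherever memory is free, so the stack needs no contiguous block and no size fixed in advance.

```python
class StackNode:
    """One stack item: the data plus a pointer to the item beneath it."""
    def __init__(self, data, below):
        self.data = data
        self.below = below


class LinkedStack:
    def __init__(self):
        self.top = None  # plays the role of the stack pointer

    def push(self, value):
        # The new node points at the old top, then becomes the top.
        self.top = StackNode(value, self.top)

    def pop(self):
        if self.top is None:
            raise IndexError("stack underflow")
        value = self.top.data
        self.top = self.top.below  # the item beneath becomes the new top
        return value


linked_stack = LinkedStack()
for name in ["Ben", "Harry", "Janet", "Harry", "Louise"]:
    linked_stack.push(name)
popped = [linked_stack.pop(), linked_stack.pop()]
```

The items come back in the reverse of the order they were pushed, exactly as with the array-based stack.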

Page 19: Data Structures and Manipulation

Binary Trees
• A binary tree is a data structure which stores items of data.
- Each item of data points to up to two others (binary!).
- The direction in which they are pointed gives their position meaning. It can be used to sort alphabetically (as the example has been), if the traversing algorithm is known.
- The first node is called the root node.
- Each pointer (arrow) is a possible path from the node.
- Each new set of items created is called a new layer.
• The syllabus specifies one way of traversing trees:
1. If there is a left branch that has yet to be traversed, then follow it and repeat.
2. Read the node if it hasn't already been read.
3. If there is a right branch, traverse it and go back to 1.
4. Go back up one layer.
• However other algorithms can be used, such as the method used to traverse binary trees for Reverse Polish.

[Figure: example of a binary tree with nodes Alex, Barry, Terence, Chloe, Ben, Becky and Bex. Caption: Follow this shape, but recursively.]
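The traversal steps (left branch, read the node, right branch, go back up) map naturally onto a recursive function. The sketch below is an illustration only: the `TreeNode` class and `traverse` function are assumed names, and the tree shape is taken from the pointer values given in the later implementation slides (Chloe as root, Barry and Terence as its children, and so on).

```python
class TreeNode:
    """A binary tree node: data plus left and right child pointers."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def traverse(node, visit):
    """Traverse the left branch, read the node, then traverse the right branch."""
    if node is None:
        return                     # nothing here: go back up one layer
    traverse(node.left, visit)     # 1. follow the left branch first
    visit(node.data)               # 2. read the node
    traverse(node.right, visit)    # 3. then follow the right branch


tree = TreeNode(
    "Chloe",
    TreeNode("Barry",
             TreeNode("Alex"),
             TreeNode("Ben", TreeNode("Becky"), TreeNode("Bex"))),
    TreeNode("Terence"),
)

names_in_order = []
traverse(tree, names_in_order.append)
```

Returning from each recursive call is what "go back up one layer" means in practice, and the names come out in alphabetical order.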

Page 20: Data Structures and Manipulation

Implementing Binary Trees
• Binary trees are implemented using a “linked list of arrays”.
- Each node is represented by an array containing:
  The data the node represents
  Pointer to the left-hand child node (null if at an end of the tree)
  Pointer to the right-hand child node (null if at an end of the tree)
- The pointers store the addresses in memory of the node's children. In this way it acts as a linked list.

Data  Left Pointer  Right Pointer
Ben   0xD023F       0xF11A2

Example node with associated data.

Page 21: Data Structures and Manipulation

Implementing Binary Trees
• Example binary tree: Root Node = 0

Data     Left Pointer  Right Pointer
Chloe    7             10
Barry    3             13
Terence  Null          Null
Alex     Null          Null
Ben      27            18
Becky    Null          Null
Bex      Null          Null

[Figure labels: second layer; a path; a node.]

Page 22: Data Structures and Manipulation

Implementing Binary Trees
• Example binary tree represented as arrays in memory:

Addresses  Data       Left  Right
0-2        “Chloe”    7     10
3-5        “Alex”     Null  Null
6          ...
7-9        “Barry”    3     13
10-12      “Terence”  Null  Null
13-15      “Ben”      27    18
16-17      ...
18-20      “Bex”      Null  Null
21-26      ...
27-29      “Becky”    Null  Null
30-31      ...

Each node is represented by an array of length 3 (data, left pointer, right pointer).

Root Node = 0
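The memory layout above can be modelled directly with one flat Python list, which shows how a static structure carries a dynamic one. The helper names `store` and `traverse_from` are assumptions for this sketch, and `None` stands in for the Null pointer.

```python
memory = [None] * 32  # a fixed block of 32 addresses, as in the table


def store(address, data, left, right):
    """Write one node (data, left pointer, right pointer) at three consecutive addresses."""
    memory[address:address + 3] = [data, left, right]


store(0, "Chloe", 7, 10)
store(3, "Alex", None, None)
store(7, "Barry", 3, 13)
store(10, "Terence", None, None)
store(13, "Ben", 27, 18)
store(18, "Bex", None, None)
store(27, "Becky", None, None)


def traverse_from(address, output):
    """Left branch, then the node itself, then the right branch, following addresses."""
    if address is None:
        return
    data, left, right = memory[address:address + 3]
    traverse_from(left, output)
    output.append(data)
    traverse_from(right, output)


sorted_names = []
traverse_from(0, sorted_names)  # start at the root node, address 0
```

The nodes are scattered through the block in no particular order, yet following the pointers from address 0 still visits them alphabetically.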

Page 23: Data Structures and Manipulation

DATA MANIPULATION: SEARCHING
Methods of searching lists. (Recap from F452)

Page 24: Data Structures and Manipulation

Linear Searches
• A serial search is where a list is searched in order from its first to its last item.
- This list is not necessarily ordered (but can be).
- Can be slow, especially as the dataset grows.
- As there is no order of items, it will have to check each item on the list before it can determine the item does not exist. Inefficient/waste of CPU time.
• A sequential search is a linear search performed on an ordered dataset.
- Main advantage over serial searching is that if the item does not exist, this can be determined more quickly. When it passes the point where the item should be, it will stop.
- e.g. When looking for “Ben” in Fig.1, it will stop when it gets to “Beth”, as “Beth” is after “Ben” in the alphabet.

Names
Andrew
Beth
Chad
Dave
Fred

Fig.1 Sample ordered list.
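The difference between the two linear searches can be seen in a short Python sketch. The function names and the choice to return a pair (index, number of items checked) are assumptions made to make the early stop visible; an index of -1 means the item was not found.

```python
def serial_search(items, target):
    """Check every item in turn; the list need not be ordered."""
    checked = 0
    for index, item in enumerate(items):
        checked += 1
        if item == target:
            return index, checked
    return -1, checked  # had to look at everything to be sure


def sequential_search(items, target):
    """Linear search on an ordered list; stops once the target has been passed."""
    checked = 0
    for index, item in enumerate(items):
        checked += 1
        if item == target:
            return index, checked
        if item > target:
            break  # gone past where the target would be, so it cannot exist
    return -1, checked


fig1 = ["Andrew", "Beth", "Chad", "Dave", "Fred"]
serial_result = serial_search(fig1, "Ben")
sequential_result = sequential_search(fig1, "Ben")
```

Searching Fig.1 for “Ben”, the serial search checks all five names, while the sequential search stops at “Beth” after two.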

Page 25: Data Structures and Manipulation

Binary Search
• A binary search is a method of searching data which has been pre-sorted.
- Works by splitting the list in two each time, and taking the section which contains the data item, hence “binary” (two).
- Very efficient: much faster than a serial search on large lists. It takes a maximum of about log2(N) iterations to find a specific value in a list of N items, as opposed to up to N for a serial/linear search.
• The algorithm can be summarised as (in a LIST of length N, counting positions from 1):
1. Find the midpoint value of the list: LIST[N/2]. If N is odd, round up, e.g. 13/2 = 6.5, which rounds up to 7 -> LIST[7].
2. If target = midpoint, the item is found at the index of the midpoint.
3. If the target is greater than the midpoint, discard the midpoint and all values above it in the list (those that come before it). If the target is smaller than the midpoint, discard the midpoint and all values below it in the list (those that come after it).
4. Go back to 1.

Page 26: Data Structures and Manipulation

Searching Example: Binary
For example, finding “Dave” in this alphabetical list.

In a LIST of length N:
1. Find the midpoint value of the list: LIST[N/2].
   • If N is odd, round up, e.g. 13/2 = 6.5, which rounds up to 7 -> LIST[7].
2. If target = midpoint, item found at index of midpoint.
3. If target is greater than midpoint, delete the midpoint and all values above it in the list. If target is smaller than midpoint, delete the midpoint and all values below it in the list.
4. Go back to 1.

Names (N = 12): Andrew, Beth, Chad, Dave, Fred, Gareth, Harry, Matt, Steve, Terry, Vanessa, Zeffery
1. Midpoint, 12/2 = 6, is “Gareth”.
2. “Gareth” does not equal “Dave”.
3. “Dave” is less than “Gareth” (earlier in the alphabet), so remove all values below and including “Gareth”.
4. Go back to 1.

Names (N = 5): Andrew, Beth, Chad, Dave, Fred
1. Midpoint, 5/2 = 2.5 => 3, is “Chad”.
2. “Chad” does not equal “Dave”.
3. “Dave” is more than “Chad” (later in the alphabet), so remove all values above and including “Chad”.
4. Go back to 1.

Names (N = 2): Dave, Fred
1. Midpoint, 2/2 = 1, is “Dave”.
2. “Dave” equals “Dave” => “Dave” found at position 4 of the original list.

Names (N = 1): Dave
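A binary search is usually coded without physically deleting anything: two index variables mark the part of the list still under consideration. The sketch below takes that approach, which is a common variation rather than a literal transcription of the slide. It uses 0-based indexes and rounds the midpoint down, so the midpoints it inspects may differ from the worked example, but the idea of halving the list each time is the same. The function name is an assumption.

```python
def binary_search(items, target):
    """Return the 0-based index of target in the sorted list items, or -1."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2        # midpoint of the remaining section
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1              # discard the midpoint and everything before it
        else:
            high = mid - 1             # discard the midpoint and everything after it
    return -1                          # section is empty: target is not in the list


names = ["Andrew", "Beth", "Chad", "Dave", "Fred", "Gareth",
         "Harry", "Matt", "Steve", "Terry", "Vanessa", "Zeffery"]
dave_index = binary_search(names, "Dave")
```

`dave_index` is 3 here, which is position 4 when counting from 1 as the slides do.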

Page 27: Data Structures and Manipulation

DATA MANIPULATION: SORTING
Methods of sorting lists.

Page 28: Data Structures and Manipulation

Insertion Sort
• A method of sorting in which each item is copied from the file into a new file, in the correct position.
- Simple, but has some disadvantages:
  Inefficient use of time: very slow.
  Requires a lot of space in memory.
• Algorithm:
1. Read each value, storing the address of the smallest.
2. When all have been read:
a. Copy the smallest to the first free place in the new file.
b. Remove the smallest from the old file.
3. Go back to 1, until the old file is empty.
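Two short Python sketches may help here, both using lists to stand in for files and both with assumed function names. The first follows the numbered steps exactly as written, repeatedly finding the smallest remaining value and moving it across (a pattern that textbooks often call a selection sort). The second follows the opening description instead, taking each item in turn and inserting it into its correct position in the new file, which is the usual meaning of insertion sort.

```python
def sort_by_smallest(old_file):
    """Follow the numbered steps: repeatedly move the smallest value to the new file."""
    old_file = list(old_file)          # work on a copy of the old file
    new_file = []
    while old_file:                    # 3. repeat until the old file is empty
        smallest = 0
        for address in range(1, len(old_file)):      # 1. read each value,
            if old_file[address] < old_file[smallest]:
                smallest = address                   #    remembering the smallest
        new_file.append(old_file.pop(smallest))      # 2. copy it across and remove it
    return new_file


def insertion_sort(old_file):
    """Follow the description: insert each item at its correct position in the new file."""
    new_file = []
    for item in old_file:
        position = len(new_file)
        # Step back past every item in the new file that should come after this one.
        while position > 0 and new_file[position - 1] > item:
            position -= 1
        new_file.insert(position, item)
    return new_file


unsorted_names = ["Vanessa", "Gareth", "Chad", "Beth", "Matt"]
by_smallest = sort_by_smallest(unsorted_names)
by_insertion = insertion_sort(unsorted_names)
```

Both produce the same sorted list, and both need a second list as large as the first, which is the memory cost noted above.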

Page 29: Data Structures and Manipulation

Quick Sort
• A quick sort is an alternative method of sorting.
- A complicated and cumbersome method, but:
  Becomes increasingly efficient as the number of items increases.
  Relatively easy to program.
• Algorithm:
1. Display the list in a row, with a fixed arrow on the first value and a movable arrow on the last (which way round does not actually matter).
2. If the two pointed-to values are in the right order: move the movable arrow one place towards the fixed arrow.
3. Else: swap the two values, and swap the arrows (the fixed arrow becomes the movable one and vice versa).
4. Repeat 2-3 until the arrows meet: the item they point to is now in the correct place.
5. Repeat with the sub-lists on either side of the correctly placed item.

This is a good exemplar use of recursion.
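The two-arrow method can be sketched in Python as below. This is one reading of the steps, under the assumptions that "swap" means exchanging the two values as well as the roles of the arrows, and that the loop runs until both arrows point at the same item. The function and variable names are chosen for this example.

```python
def quick_sort(items, low=0, high=None):
    """Sort items in place using a fixed arrow and a movable arrow."""
    if high is None:
        high = len(items) - 1
    if low >= high:
        return                              # zero or one item: already sorted
    fixed, movable = low, high              # fixed arrow on first, movable on last
    while fixed != movable:
        left, right = min(fixed, movable), max(fixed, movable)
        if items[left] <= items[right]:
            # Right order: move the movable arrow one place towards the fixed arrow.
            movable += 1 if movable < fixed else -1
        else:
            # Wrong order: swap the values and swap the roles of the arrows.
            items[left], items[right] = items[right], items[left]
            fixed, movable = movable, fixed
    # The arrows have met: that item is in its final place.
    quick_sort(items, low, fixed - 1)       # sort the sub-list before it
    quick_sort(items, fixed + 1, high)      # sort the sub-list after it


quick_names = ["Vanessa", "Zeffery", "Fred", "Harry", "Steve", "Andrew", "Dave"]
quick_sort(quick_names)
```

The value that starts under the fixed arrow always stays under the fixed arrow, so when the arrows meet everything before it is smaller or equal and everything after it is larger or equal.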

Page 30: Data Structures and Manipulation

Sorts Question
• Perform a quick sort on:
  Names: Vanessa, Zeffery, Fred, Harry, Steve, Andrew, Dave
  Answer: Andrew, Dave, Fred, Harry, Steve, Terry, Zeffery
• Perform an insertion sort on:
  Names: Vanessa, Gareth, Chad, Beth, Matt
  Answer: Beth, Chad, Gareth, Matt, Vanessa

Page 31: Data Structures and Manipulation

DATA MANIPULATION: MERGING
Methods of merging lists.

Page 32: Data Structures and Manipulation

Merge Sort
• A merge sort is a method of merging two already sorted (sequential) files into one sorted file.
• Outline:
1. Read the first value from each file.
2. Compare.
3. Write the smallest value to the new file.
4. Read the next value from the file used.
5. Go back to 2 until one file has no more items left.
6. Write the remainder of the other file to the new file.

Page 33: Data Structures and Manipulation

Merge Sort
• For example:
  File 1: Andrew, Dave, Fred, Harry, Steve, Terry, Zeffery
  File 2: Beth, Chad, Gareth, Matt, Vanessa
  Answer (merged file): Andrew, Beth, Chad, Dave, Fred, Gareth, Harry, Matt, Steve, Terry, Vanessa, Zeffery
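The merge outline can be sketched in Python with two lists standing in for the two sorted files. The function name `merge_files` and the use of index variables in place of file reads are assumptions for this illustration.

```python
def merge_files(first, second):
    """Merge two already sorted lists into one new sorted list."""
    merged = []
    i = j = 0                                   # position of the current value in each file
    while i < len(first) and j < len(second):   # until one file runs out
        if first[i] <= second[j]:               # compare the two current values
            merged.append(first[i])             # write the smaller to the new file
            i += 1                              # read the next value from the file used
        else:
            merged.append(second[j])
            j += 1
    merged.extend(first[i:])                    # write whatever remains of either file
    merged.extend(second[j:])
    return merged


file_1 = ["Andrew", "Dave", "Fred", "Harry", "Steve", "Terry", "Zeffery"]
file_2 = ["Beth", "Chad", "Gareth", "Matt", "Vanessa"]
merged_names = merge_files(file_1, file_2)
```

Each value is read and written exactly once, so the merge takes a single pass over both files.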