1431227-3 File Organization and Processing “Advanced Data Structures” “Algorithms” Lecturer: Dr. Ali Al Najjar, Umm Al Qura University, College of Computer, Al-Lith




Books and Materials

• REQUIRED TEXTBOOK: Introduction to Algorithms by T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Second Edition, MIT Press, 2001. ISBN 0-262-03293-7

• RECOMMENDED MATERIALS:
– Data Structures and Algorithms in Java, Robert Lafore, Sams Publishing. Online copy (I will provide).
– Data Structures and Algorithms in Java, 2nd Edition, Michael T. Goodrich and Roberto Tamassia, John Wiley and Sons Inc., 2007. ISBN: 81-265-1226-1

I will try to provide everything in the lecture slides


Course Outline

• Trees
• Binary Search Trees
– AVL trees
– B-Trees
– Red-Black trees
– Heaps
• Secondary Storage (Hard Disk)
• Graphs
– Depth-first search
– Breadth-first search


Grading Scheme

• Attendance 10%
• Homework 10%
• Midterm 20%
• Quizzes 10%
• Final exam 50%
• Total 100%


OVERVIEW OF ADS AND FILE STRUCTURE AND PROCESSING


Topics Already Covered in ADS Course:

• Asymptotic complexity
– Big O, small o, big omega, small omega
• Linear data structures
– Arrays, linked lists, stack ADT, queue ADT, a notion of dynamic arrays
• Sorting
– Insertion sort, merge sort, (randomized) quick sort
• Sorted sequence
– Dictionary ADT and important operations, trees, binary search trees, AVL trees, in-order traversal
• Hash tables
– Hashing concepts, open hashing, closed hashing, probing, rehashing, implementing dictionary operations using hash tables
• Priority queue
– Binary heaps, implementation, dictionary operations in a priority queue, heapsort


Refreshing ADS Basic Concepts

• Data Structure?
• Algorithms?
• File Structure?


Basic Concepts- Data Structures

• A data structure is the organization of data in a computer’s memory or in a disk file.
– Examples: arrays, stacks, linked lists
• Algorithms are the procedures a software program uses to manipulate the data in these structures.
• Example: printing address labels
– Array to store the addresses: data structure
– For loop giving sequential access to the array: algorithm
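The address-label example can be sketched in Java as follows. This is only an illustration: the class name, the method name, and the sample addresses are made up and do not come from the slides. The array plays the role of the data structure, and the for loop that walks it is the algorithm.

```java
// Hypothetical example: names and addresses are invented for illustration.
class AddressLabels {
    // Data structure: a fixed-size array holding one address per slot.
    static final String[] ADDRESSES = {
        "12 Palm Street",
        "7 Harbor Road",
        "3 University Avenue"
    };

    // Algorithm: visit the array sequentially and build one label per entry.
    static String[] makeLabels(String[] addresses) {
        String[] labels = new String[addresses.length];
        for (int i = 0; i < addresses.length; i++) {
            labels[i] = "Label " + (i + 1) + ": " + addresses[i];
        }
        return labels;
    }

    public static void main(String[] args) {
        for (String label : makeLabels(ADDRESSES)) {
            System.out.println(label);
        }
    }
}
```

Running `main` prints one numbered label per stored address, in array order.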


Characteristics of Data Structures

• Array
– Advantages: Quick insertion, very fast access if index is known
– Disadvantages: Slow search, slow deletion, fixed size
• Ordered Array
– Advantages: Quicker search than unsorted array
– Disadvantages: Slow insertion and deletion, fixed size
• Stack
– Advantages: Provides LIFO access
– Disadvantages: Slow access to other items


Characteristics of Data Structures

• Queue
– Advantages: Provides FIFO access
– Disadvantages: Slow access to other items
• Linked List
– Advantages: Quick insertion, quick deletion
– Disadvantages: Slow search
• Binary Tree
– Advantages: Quick search, insertion, deletion (if tree remains balanced)
– Disadvantages: Deletion algorithm is complex
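The LIFO access of a stack and the FIFO access of a queue can be contrasted in a few lines. The sketch below uses `java.util.ArrayDeque` from the standard library for both roles; the class and method names are hypothetical, and the slides do not prescribe any particular implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo class, not part of the course material.
class LifoFifoDemo {
    // Stack: the last item pushed is the first one popped (LIFO).
    static int[] drainAsStack(int[] items) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int x : items) {
            stack.push(x);
        }
        int[] out = new int[items.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = stack.pop();
        }
        return out;
    }

    // Queue: the first item added is the first one removed (FIFO).
    static int[] drainAsQueue(int[] items) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int x : items) {
            queue.addLast(x);
        }
        int[] out = new int[items.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = queue.removeFirst();
        }
        return out;
    }

    public static void main(String[] args) {
        int[] items = {1, 2, 3};
        System.out.println(java.util.Arrays.toString(drainAsStack(items)));
        System.out.println(java.util.Arrays.toString(drainAsQueue(items)));
    }
}
```

Feeding the same items into both shows the difference: the stack returns them in reverse order, the queue in arrival order.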


Overview of Algorithms

• Basic operations:
– Insert a new data item
– Search for a specified item
– Delete a specified item
• Definitions:
– Database
  • All the data that will be dealt with in a particular situation
  • Stored on a disk as a file
– Records
  • Units into which a database is divided
  • Provide a format for storing information
– Fields
  • Records are usually divided into several fields
  • A field holds a particular kind of data
• In Java, records are usually represented by objects of an appropriate class
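As a rough illustration of a record represented by a Java object, the sketch below defines a hypothetical `StudentRecord` class with three fields. The field names and the `|`-separated text layout are assumptions made for this example; the slides do not specify a record format.

```java
// Hypothetical record type: field names and file layout are illustrative only.
class StudentRecord {
    // Each field holds one particular kind of data.
    final int id;
    final String name;
    final double grade;

    StudentRecord(int id, String name, double grade) {
        this.id = id;
        this.name = name;
        this.grade = grade;
    }

    // Serialize the record as one text line, fields separated by '|'.
    String toLine() {
        return id + "|" + name + "|" + grade;
    }

    // Rebuild a record from a line produced by toLine().
    // Assumes the name itself contains no '|' character.
    static StudentRecord fromLine(String line) {
        String[] f = line.split("\\|");
        return new StudentRecord(Integer.parseInt(f[0]), f[1], Double.parseDouble(f[2]));
    }

    public static void main(String[] args) {
        StudentRecord r = new StudentRecord(7, "Sara", 91.5);
        System.out.println(r.toLine());
    }
}
```

A file of such lines would then be the "database" in the sense used above: a collection of records, each divided into fields.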


Data Structure vs. File Structure

• Both involve:
– Representation of data, plus
– Operations for accessing data
• Difference:
– Data structures deal with data in main memory
– File structures deal with data in a secondary storage device (file)


Computer Architecture

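[Figure: storage hierarchy from the CPU downward: Registers, Cache, Main Memory (RAM), Secondary Storage (Disk, Tape, DVD-R). The upper levels are fast, small, expensive, and volatile; the lower levels are slow, large, cheap, and stable.]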


Memory Hierarchy

► On systems with 32-bit addressing, only 2^32 bytes can be directly referenced in main memory.

► The number of data objects may exceed this number!

► Data must be maintained across program executions. This requires storage devices that retain information when the computer is restarted.
– We call such storage nonvolatile.
– Primary storage is usually volatile, whereas secondary and tertiary storage are nonvolatile.
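To make the 32-bit limit concrete, the short sketch below computes how many bytes a given address width can reference. The class and method names are hypothetical; the arithmetic is just a power of two.

```java
// Hypothetical helper for illustrating address-space size.
class AddressSpace {
    // Number of distinct byte addresses for the given address width in bits.
    // Uses a long shift, so it is only valid for widths below 63 bits.
    static long addressableBytes(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        long bytes = addressableBytes(32);
        long gib = bytes / (1L << 30); // 2^30 bytes per GiB
        System.out.println(bytes + " bytes = " + gib + " GiB");
    }
}
```

For 32 bits this gives 4,294,967,296 bytes, that is 4 GiB, which is why larger data sets cannot all be addressed directly in main memory on such systems.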


How Fast?

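• Typical times for getting info
– Main memory: ~120 nanoseconds = 120 × 10^-9 seconds
– Magnetic disks: ~30 milliseconds = 30 × 10^-3 seconds
– The disk is therefore about 250,000 times slower
• An analogy keeping the same time proportion as above
– Looking at the index of a book: 20 seconds, versus
– Going to the library: about 58 days (20 seconds × 250,000; a 1-hour trip would be a ratio of only 180)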
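The proportion between the two access times can be checked with a few lines of arithmetic. The sketch below uses the 120 ns and 30 ms figures quoted above; the class and method names are hypothetical. Scaling a 20-second lookup by the same factor gives roughly 58 days.

```java
// Hypothetical helper: the two timings are the figures quoted on the slide.
class AccessTimeRatio {
    static final double MEMORY_SECONDS = 120e-9; // ~120 nanoseconds
    static final double DISK_SECONDS = 30e-3;    // ~30 milliseconds

    // How many times slower a disk access is than a memory access.
    static double ratio() {
        return DISK_SECONDS / MEMORY_SECONDS;
    }

    // Scale an everyday duration by the same ratio and express it in days.
    static double scaledDays(double baseSeconds) {
        return baseSeconds * ratio() / 86_400.0; // 86,400 seconds per day
    }

    public static void main(String[] args) {
        System.out.printf("disk/memory ratio: %.0f%n", ratio());
        System.out.printf("20 s scaled by that ratio: %.1f days%n", scaledDays(20));
    }
}
```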


Comparison

• Main Memory
– Fast (since electronic)
– Small (since expensive)
– Volatile (information is lost when a power failure occurs)
• Secondary Storage
– Slow (since electronic and mechanical)
– Large (since cheap)
– Stable, persistent (information is preserved longer)


Goals of this course

• An advanced data structures and algorithms course that builds the student’s knowledge in the areas of
– search tree structures (Red-Black Trees, B-Trees, Splay Trees),
– advanced heap structures (Fibonacci Heaps),
– graphs and graph algorithms (Depth-first, Breadth-first, Minimum Spanning Trees, Shortest path, Maximum flow, Matching), and
– geometric algorithms (Intersection of line segments, convex hull).


Goals of this course (Cont’d)

• The successful student, having learned these concepts, will be able to analyze algorithms for different data structures and file structures.

• The objective of the Data Structures and Algorithms course was to teach ways of efficiently organizing and manipulating data in main memory. In this course you will learn equivalent techniques for organizing and manipulating data in secondary storage.


Algorithm Analysis- Big O Notation

• Example: automobiles
– Size classes such as large, medium, economy (subcompact, compact, midsize)
• These provide a quick idea of the size without needing the actual dimensions.
• It is useful to have a similar shorthand to say how efficient a computer algorithm is.
• In CS, this rough measure is called Big O notation.
• “Algorithm A is twice as fast as algorithm B” is not meaningful.
• Why?
– The proportion can change radically as the number of items changes
– We need a measure that is related to the number of items

BIG O IS THE SOLUTION!!!


Big O Notation

• Insertion in an unsorted array
– Does not depend on how many items are in the array, no matter how big N is
– The item is placed in the next available position, a[nElems], and then nElems++;
– Constant time: T = K, which is O(1)
– Real situation: the time depends on the speed of the microprocessor, how efficiently the compiler generated the program code, and other factors
– The constant K accounts for all such factors
• Linear search
– The cost is the number of comparisons that must be made to find the item
– Average time: half of the total number of items, T = K*N/2
– How to calculate K?
– Let k = K/2; then T = k*N
– Proportional to the size of the array: O(N)
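The two cases above can be put side by side in a small Java class. This is a minimal sketch: the class name and the `lastComparisons` counter are hypothetical additions, while `a` and `nElems` follow the names used on the slide. Insertion does the same amount of work whatever `nElems` is, whereas the search loop may touch every stored item.

```java
// Hypothetical unordered-array wrapper; 'a' and 'nElems' follow the slide's naming.
class UnorderedArray {
    private final int[] a;
    private int nElems = 0;
    int lastComparisons = 0; // comparisons made by the most recent find()

    UnorderedArray(int capacity) {
        a = new int[capacity];
    }

    // O(1): one store and one increment, independent of nElems.
    // Throws ArrayIndexOutOfBoundsException if the array is already full.
    void insert(int value) {
        a[nElems] = value;
        nElems++;
    }

    // O(N): scans from the left; returns the index of key, or -1 if absent.
    int find(int key) {
        lastComparisons = 0;
        for (int i = 0; i < nElems; i++) {
            lastComparisons++;
            if (a[i] == key) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        UnorderedArray arr = new UnorderedArray(10);
        for (int v : new int[] {10, 20, 30, 40}) {
            arr.insert(v);
        }
        System.out.println("index of 30: " + arr.find(30)
                + ", comparisons: " + arr.lastComparisons);
    }
}
```

An unsuccessful search is the worst case: it makes exactly `nElems` comparisons, which is where the O(N) bound comes from.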


Running Times in Big O Notation

Algorithm                          Running time (Big O)   Comment
Linear search                      O(N)                   Fair
Binary search                      O(log N)               Good
Insertion in an unordered array    O(1)                   Best
Insertion in an ordered array      O(N)
Deletion in an unordered array     O(N)
Deletion in an ordered array       O(N)
Sorting (simple sorts)             O(N^2)                 Very bad

Why Not Use Arrays for Everything?


Sorting

• Examples:
– students by grade, customers by zip code, home sales by price, cities in order of increasing population, countries by GNP, stars by magnitude, and so on.

• Sorting data may also be a preliminary step to searching it. Binary search, which can be applied only to sorted data, is much faster than a linear search.

• Because sorting is so important and potentially so time-consuming, it has been the subject of extensive research in computer science, and some very sophisticated methods have been developed.
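To illustrate why sorted data pays off for searching, here is a minimal binary search sketch in Java. The class and method names are hypothetical. The `probes` method counts how many array positions are examined, which stays near log2(N) rather than N.

```java
// Hypothetical demo of binary search on a sorted int array.
class BinarySearchDemo {
    // Returns the index of key in the sorted array, or -1 if it is absent.
    static int binarySearch(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids overflow of (lo + hi)
            if (sorted[mid] == key) {
                return mid;
            }
            if (sorted[mid] < key) {
                lo = mid + 1; // key can only be in the right half
            } else {
                hi = mid - 1; // key can only be in the left half
            }
        }
        return -1;
    }

    // Same search, but returns how many positions were examined.
    static int probes(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length - 1;
        int count = 0;
        while (lo <= hi) {
            count++;
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] == key) {
                return count;
            }
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] data = {1, 3, 5, 7, 9};
        System.out.println("index of 7: " + binarySearch(data, 7));
        System.out.println("probes for 7: " + probes(data, 7));
    }
}
```

With 1,024 sorted items, no search needs more than 11 probes, whereas a linear search may need up to 1,024 comparisons.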


How would you do Sorting?

• Basic steps, repeated until the data is sorted:
– Compare two items.
– Swap two items or copy one item.
– Move one position right.
• Problem:
– Imagine that a kids-league baseball team is lined up on the field. The regulation nine players, plus an extra, have shown up for practice. You want to arrange the players in order of increasing height (with the shortest player on the left) for the team picture. How would you go about this sorting process?


How would you do Sorting?

[Figure: two panels, “Unsorted” and “Sorted”]


Bubble Sort

• Notoriously slow, but conceptually the simplest

• Solution:
– You start at the left end of the line and compare the two kids in positions 0 and 1. If the one on the left (in 0) is taller, you swap them.
– If the one on the right is taller, you don't do anything. Then you move over one position and compare the kids in positions 1 and 2.
– Again, if the one on the left is taller, you swap them.

[Figure: beginning of the 1st pass]


Bubble Sort

• After the 1st pass, the tallest kid is on the right. The biggest items bubble up to the top end of the array as the algorithm progresses.

[Figure: end of the 1st pass]


Bubble Sort- Observation

• After this first pass through all the data, you've made N–1 comparisons and somewhere between 0 and N–1 swaps, depending on the initial arrangement of the players. The item at the end of the array is sorted and won't be moved again.

• Now you go back and start another pass from the left end of the line. Again you go toward the right, comparing and swapping when appropriate. However, this time you can stop one player short of the end of the line, at position N–2, because you know the last position, at N–1, already contains the tallest player. This rule could be stated as:
– When you reach the first sorted player, start over at the left end of the line.

• You continue this process until all the players are in order.
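The passes described above can be written as two nested loops. This is one possible sketch in Java, not necessarily the form expected in the course; the class name, the loop variable names `out` and `in`, and the sample heights are hypothetical.

```java
import java.util.Arrays;

// Hypothetical bubble sort sketch following the pass-by-pass description.
class BubbleSort {
    // Sorts the array in place into increasing order.
    static void bubbleSort(int[] a) {
        // 'out' is the last position still unsorted; everything to its right is final.
        for (int out = a.length - 1; out > 0; out--) {
            // One pass: compare neighbors from the left end up to 'out'.
            for (int in = 0; in < out; in++) {
                if (a[in] > a[in + 1]) { // left one is taller: swap them
                    int tmp = a[in];
                    a[in] = a[in + 1];
                    a[in + 1] = tmp;
                }
            }
        }
    }

    public static void main(String[] args) {
        // Ten made-up player heights in centimeters.
        int[] heights = {150, 142, 160, 138, 155, 147, 152, 140, 158, 145};
        bubbleSort(heights);
        System.out.println(Arrays.toString(heights));
    }
}
```

Each pass of the inner loop moves the largest remaining value to position `out`, so the outer loop can shrink the unsorted region by one each time.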


Efficiency of the Bubble Sort

• Example: 10 items
– Comparisons: 9 + 8 + 7 + ... + 1 = 45
– General formula:
  • (N-1) + (N-2) + (N-3) + ... + 1 = N*(N-1)/2 ≈ N^2/2, which is O(N^2)
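The comparison count can be confirmed by running the bubble sort loop structure without any data and simply counting iterations. The sketch below is an illustration with hypothetical names; it compares the counted value against N*(N-1)/2.

```java
// Hypothetical helper that counts bubble sort comparisons for N items.
class BubbleSortCost {
    // Counts inner-loop iterations of a bubble sort over n items:
    // (n-1) + (n-2) + ... + 1.
    static long countComparisons(int n) {
        long comparisons = 0;
        for (int out = n - 1; out > 0; out--) {
            for (int in = 0; in < out; in++) {
                comparisons++;
            }
        }
        return comparisons;
    }

    // Closed form of the same sum.
    static long formula(int n) {
        return (long) n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        int n = 10;
        System.out.println("counted: " + countComparisons(n)
                + ", formula: " + formula(n));
    }
}
```

For 10 items both values are 45, matching the sum 9 + 8 + ... + 1.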

• Practice Problems
– Define the following terms: data structure, algorithm, file structure, Big O notation
– Why are arrays not used for everything?
– What are the major steps in sorting?
– Compare the running times, in Big O notation, of the different algorithms
– Implement bubble sort in Java