
Ceng-112 Data Structures I Serap Atay, Ph.D. 2007 1

Chapter 2

Searching

Outline

• Linear List Searches
  – Sequential Search
    » The sentinel search,
    » The probability search,
    » The ordered search.
  – Binary Search

• Hashed List Searches
  – Collision Resolution


Linear List Searches

• We study searches that work with arrays.

Figure 2-1


Linear List Searches

There are two basic searches for arrays

1. The sequential search.

– It can be used to locate an item in any array.

2. The binary search.

– It requires an ordered list.

Linear List Searches: Sequential Search

• The list is not ordered!

• We will use this technique only for small arrays.

We start searching at the beginning of the list and continue until:

a. either we find the target entity,

b. or we reach the end of the list!

Figure 2-2 Locating data in an unordered list.

Linear List Searches: Sequential Search Algorithm

• RETURN: The algorithm must tell the calling algorithm two things:

1. Did it find the data?

2. If it did, what is the index (address)?

Linear List Searches: Sequential Search Algorithm

• The searching algorithm exchanges five items with the calling algorithm:

1. The list.

2. An index to the last element in the list.

3. The target.

4. The address where the found element’s index location is to be stored.

5. The found or not found boolean, which SeqSearch hands back as its return value.

Sequential Search Algorithm

algorithm SeqSearch (val list <array>, val last <index>,
                     val target <keyType>, ref locn <index>)
Locate the target in an unordered list of size elements.
   PRE    list must contain at least one element.
          last is index to last element in the list.
          target contains the data to be located.
          locn is address of index in calling algorithm.
   POST   if found: matching index stored in locn & found TRUE
          if not found: last stored in locn & found FALSE
   RETURN found <boolean>

1. looker = 1
2. loop (looker < last AND target not equal list[looker])
   1. looker = looker + 1
3. locn = looker
4. if (target equal list[looker])
   1. found = true
5. else
   1. found = false
6. return found
end SeqSearch

Big-O(n)
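The SeqSearch steps can be sketched in Python. This is a minimal sketch under a few assumptions: indexes are 0-based (the pseudocode counts from 1), found and locn come back as a tuple instead of through a reference parameter, and the name `seq_search` and the sample data are illustrative rather than taken from the slides.

```python
def seq_search(items, target):
    """Return (found, locn) after scanning an unordered list left to right."""
    if not items:
        return False, -1  # the slide's PRE rules this case out; handled here for safety
    last = len(items) - 1
    looker = 0
    # Advance until we sit on the last element or on the target.
    while looker < last and items[looker] != target:
        looker += 1
    return items[looker] == target, looker


print(seq_search([4, 21, 36, 14, 62, 91], 62))  # (True, 4)
print(seq_search([4, 21, 36], 99))              # (False, 2): locn is the last index
```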

Variations On Sequential Search

There are three variations of the sequential search algorithm:

1. The sentinel search,

2. The probability search,

3. The ordered search.

Sequential Search Algorithm: The Sentinel Search

• If we guarantee that the target will be found in the list, we can eliminate the test for the end of the list. We do this by copying the target into an extra element after the last one (the sentinel).

algorithm SentinelSearch (val list <array>, val last <index>,
                          val target <keyType>, ref locn <index>)
Locate the target in an unordered list of size elements.
   PRE    list must contain an extra element at the end for the sentinel.
          last is index to last element in the list.
          target contains the data to be located.
          locn is address of index in calling algorithm.
   POST   if found: matching index stored in locn & found TRUE
          if not found: last stored in locn & found FALSE
   RETURN found <boolean>

1. list[last+1] = target
2. looker = 1
3. loop (target not equal list[looker])
   1. looker = looker + 1
4. if (looker <= last)
   1. found = true
   2. locn = looker
5. else
   1. found = false
   2. locn = last
6. return found
end SentinelSearch

Big-O(n)
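A possible Python rendering of the sentinel search follows. It is a sketch, not the course's reference code: it assumes 0-based indexing, appends the sentinel to a Python list to play the role of list[last+1], and removes it again before returning. The name `sentinel_search` is illustrative.

```python
def sentinel_search(items, target):
    """Return (found, locn); the loop needs no end-of-list test."""
    last = len(items) - 1
    items.append(target)             # sentinel: guarantees the loop terminates
    looker = 0
    while items[looker] != target:   # single comparison per element
        looker += 1
    items.pop()                      # restore the caller's list
    if looker <= last:
        return True, looker          # stopped on a real element
    return False, last               # stopped on the sentinel


data = [4, 21, 36, 14]
print(sentinel_search(data, 36))  # (True, 2)
print(sentinel_search(data, 99))  # (False, 3)
```

Dropping the `looker < last` test saves one comparison per iteration, which is the whole point of the sentinel; the running time is still proportional to the list size.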

Sequential Search Algorithm: The Probability Search

algorithm ProbabilitySearch (val list <array>, val last <index>,
                             val target <keyType>, ref locn <index>)
Locate the target in a list ordered by the probability of each element being the target: most probable first, least probable last.
   PRE    list must contain at least one element.
          last is index to last element in the list.
          target contains the data to be located.
          locn is address of index in calling algorithm.
   POST   if found: matching index stored in locn & found TRUE
                    and element moved up in priority.
          if not found: last stored in locn & found FALSE
   RETURN found <boolean>

1. looker = 1
2. loop (looker < last AND target not equal list[looker])
   1. looker = looker + 1
3. if (target equal list[looker])
   1. found = true
   2. if (looker > 1)
      1. temp = list[looker-1]
      2. list[looker-1] = list[looker]
      3. list[looker] = temp
      4. looker = looker - 1
4. else
   1. found = false
5. locn = looker
6. return found
end ProbabilitySearch

Big-O(n)
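The move-up-on-hit behaviour is easier to see in running code. The sketch below assumes 0-based indexing and a tuple return value; `probability_search` and the sample list are illustrative names and data.

```python
def probability_search(items, target):
    """Return (found, locn); a found element is swapped one place toward the front."""
    if not items:
        return False, -1
    last = len(items) - 1
    looker = 0
    while looker < last and items[looker] != target:
        looker += 1
    if items[looker] != target:
        return False, looker
    if looker > 0:
        # Exchange with the predecessor so frequent targets drift to the front.
        items[looker - 1], items[looker] = items[looker], items[looker - 1]
        looker -= 1
    return True, looker


data = [10, 20, 30]
print(probability_search(data, 30), data)  # (True, 1) [10, 30, 20]
print(probability_search(data, 30), data)  # (True, 0) [30, 10, 20]
```

Each successful search costs one extra swap, and repeated requests for the same key make later searches for it shorter.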

Sequential Search Algorithm: The Ordered List Search

• If an ordered list is small, it can be more efficient to use a sequential search.

• We can stop the search loop as soon as the target is less than or equal to the element being tested.

algorithm OrderedListSearch (val list <array>, val last <index>,
                             val target <keyType>, ref locn <index>)
Locate the target in a list ordered on target.
   PRE    list must contain at least one element.
          last is index to last element in the list.
          target contains the data to be located.
          locn is address of index in calling algorithm.
   POST   if found: matching index stored in locn & found TRUE
          if not found: last stored in locn & found FALSE
   RETURN found <boolean>

1. if (target <= list[last])
   1. looker = 1
   2. loop (target > list[looker])
      1. looker = looker + 1
2. else
   1. looker = last
3. if (target equal list[looker])
   1. found = true
4. else
   1. found = false
5. locn = looker
6. return found
end OrderedListSearch

Big-O(n)
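As a hedged illustration of the early exit, here is a Python sketch of the ordered list search. It assumes an ascending list, 0-based indexing, and a tuple return value; `ordered_list_search` is an illustrative name.

```python
def ordered_list_search(items, target):
    """Return (found, locn) for an ascending list, stopping once target <= element."""
    if not items:
        return False, -1
    last = len(items) - 1
    if target <= items[last]:
        looker = 0
        # The last element is >= target, so it acts as a built-in sentinel.
        while target > items[looker]:
            looker += 1
    else:
        looker = last  # target is larger than everything: skip the scan
    return items[looker] == target, looker


data = [4, 8, 15, 16, 23, 42]
print(ordered_list_search(data, 16))  # (True, 3)
print(ordered_list_search(data, 10))  # (False, 2): stopped at 15
print(ordered_list_search(data, 99))  # (False, 5)
```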

Sequential Search

• The sequential search algorithm is very slow for big lists.

• Big-O(n)

• If the list is ordered, we can use a more efficient algorithm called the binary search.

Binary Search

Test the data in the element at the middle of the array.

• If the target is in the first half, test the element at the middle of the first half.

• If the target is in the second half, test the element at the middle of the second half.

Repeat the same test on the remaining half (first half or second half) until the target is found or no elements remain.

Figure 2-4

mid = (first + last) / 2

If target > list[mid]: first = mid + 1

If target < list[mid]: last = mid - 1


Figure 2-5

first becomes larger than last!

Binary Search Algorithm

algorithm BinarySearch (val list <array>, val end <index>,
                        val target <keyType>, ref locn <index>)
Search an ordered list using binary search.
   PRE    list is ordered; it must contain at least one element.
          end is index to the largest element in the list.
          target is the value of element being sought.
          locn is address of index in calling algorithm.
   POST   Found: locn assigned index to target element.
                 found set true.
          Not found: locn = element below or above target.
                     found set false.
   RETURN found <boolean>

1. first = 1
2. last = end
3. loop (first <= last)
   1. mid = (first + last) / 2
   2. if (target > list[mid])
      1. first = mid + 1   (Look in upper half)
   3. else if (target < list[mid])
      1. last = mid - 1    (Look in lower half)
   4. else
      1. first = last + 1  (Found equal: force exit)
4. locn = mid
5. if (target equal list[mid])
   1. found = true
6. else
   1. found = false
7. return found
end BinarySearch

Big-O(log2 n)
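A minimal Python sketch of the binary search follows, assuming 0-based indexing, integer division for mid, and a tuple return value. It uses `break` where the pseudocode forces the exit with first = last + 1; the name `binary_search` and the sample data are illustrative.

```python
def binary_search(items, target):
    """Return (found, locn) for an ascending list by repeated halving."""
    if not items:
        return False, -1
    first, last = 0, len(items) - 1
    mid = 0
    while first <= last:
        mid = (first + last) // 2
        if target > items[mid]:
            first = mid + 1      # look in upper half
        elif target < items[mid]:
            last = mid - 1       # look in lower half
        else:
            break                # found equal: leave the loop
    return items[mid] == target, mid


data = [4, 7, 8, 10, 14, 21, 22, 36, 62, 77, 81, 91]
print(binary_search(data, 22))  # (True, 6)
print(binary_search(data, 11))  # (False, 4): locn is a neighbour of the target
```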

Comparison of binary and sequential searches

Size        Binary   Sequential (Average)   Sequential (Worst case)
16          4        8                      16
50          6        25                     50
256         8        128                    256
1,000       10       500                    1,000
10,000      14       5,000                  10,000
100,000     17       50,000                 100,000
1,000,000   20       500,000                1,000,000
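The figures in this comparison can be reproduced with a few lines of Python. The reading used here is an assumption inferred from the numbers, not something the slide states: the binary column looks like log2 of the size rounded up, the sequential average like half the size, and the worst case like the size itself. The helper `probe_counts` is hypothetical.

```python
def probe_counts(size):
    """Return (binary, sequential average, sequential worst case) for a list size."""
    binary = (size - 1).bit_length()  # smallest k with 2**k >= size, i.e. ceil(log2(size))
    return binary, size // 2, size


for size in (16, 50, 256, 1_000, 10_000, 100_000, 1_000_000):
    print(size, *probe_counts(size))
```

Using `int.bit_length` keeps the computation in integers and avoids floating-point rounding in the logarithm.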

Hashed List Searches

• In an ideal search, we would know exactly where the data are and go directly there.

• We use a hashing algorithm to transform the key into the index of the array element that contains the data we need to locate.

Figure 2-6

• Hashing is a key-to-address transformation!

Figure 2-7

• We call the set of keys that hash to the same location in our list synonyms.

• A collision is the event that occurs when a hashing algorithm produces an address for an insertion key and that address is already occupied.

• Each calculation of an address and test for success is known as a probe.


Figure 2-8

Hashing Methods

Direct Hashing Method

• The key is the address without any algorithmic manipulation.

• The data structure must contain an element for every possible key.

• It guarantees that there are no synonyms.

• Direct hashing can be used only in very limited cases!


Figure 2-9

Direct Hashing Method

Direct hashing of employee numbers.

Subtraction Hashing Method

• The keys are consecutive and do not start from one.

Example:

• A company has 100 employees,

• Employee numbers run from 1001 to 1100.

Address = x - 1000

Key (x)   Address   Employee
1001      1         Ali Esin
1002      2         Sema Metin
1100      100       Filiz Yılmaz
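The subtraction method is a one-line function. The sketch below assumes the offset of 1000 from the example; `subtraction_hash` is an illustrative name.

```python
def subtraction_hash(key, offset=1000):
    """Map consecutive keys onto addresses 1..n by subtracting a fixed offset."""
    return key - offset


print(subtraction_hash(1001))  # 1
print(subtraction_hash(1002))  # 2
print(subtraction_hash(1100))  # 100
```

Like direct hashing, this produces no synonyms, because distinct keys always map to distinct addresses.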

Modulo Division Hashing Method

• The modulo-division method divides the key by the array size and uses the remainder plus one for the address.

address = key mod (listSize) + 1

• A list size that is a prime number produces fewer collisions than other list sizes.

Figure 2-10

Modulo Division Hashing Method

We have 300 employees, and the first prime greater than 300 is 307!

121267 / 307 = 395 with remainder 2

hash(121267) = 2 + 1 = 3
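The arithmetic for key 121267 can be checked with a short Python sketch. The function name `modulo_hash` is illustrative; the list size of 307 comes from the example.

```python
LIST_SIZE = 307  # first prime greater than the 300 employees


def modulo_hash(key, list_size=LIST_SIZE):
    """address = key mod listSize + 1, giving addresses 1..list_size."""
    return key % list_size + 1


print(divmod(121267, LIST_SIZE))  # (395, 2): quotient and remainder
print(modulo_hash(121267))        # 3
```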

Digit Extraction Method

• Selected digits are extracted from the key and used as the address.

Example (extracting the first, third, and fourth digits):

379452 → 394

121267 → 112

378845 → 388

526842 → 568
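The four examples are consistent with taking the first, third, and fourth digits of each six-digit key. The Python sketch below assumes exactly that selection; the function name and the `positions` parameter are illustrative.

```python
def digit_extraction_hash(key, positions=(1, 3, 4)):
    """Build the address from the digits at the given 1-based positions of the key."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))


for key in (379452, 121267, 378845, 526842):
    print(key, digit_extraction_hash(key))
# 379452 394
# 121267 112
# 378845 388
# 526842 568
```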

Midsquare Hashing Method

• The key is squared and the address is selected from the middle of the squared number.

• The most obvious limitation of this method is the size of the key.

Example:

9452 * 9452 = 89340304, so 3403 is the address.

Or, using only the first three digits of the key 379452:

379 * 379 = 143641, so 364 is the address.
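Both midsquare examples can be reproduced with the sketch below. The way the middle window is chosen is an assumption: when the square has no exact center for the requested width, the window leans one digit to the right, which is what yields 364 rather than 436 for 143641. The name `midsquare_hash` is illustrative.

```python
def midsquare_hash(key, address_digits):
    """Square the key and return address_digits digits from the middle of the square."""
    squared = str(key * key)
    # Round the start position up so an off-center window leans right.
    start = (len(squared) - address_digits + 1) // 2
    return int(squared[start:start + address_digits])


print(midsquare_hash(9452, 4))  # 3403, from 89340304
print(midsquare_hash(379, 3))   # 364, from 143641
```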


Figure 2-11

Folding Hashing Method

Pseudorandom Hashing Method

• The key is used as the seed in a pseudorandom number generator, and the resulting random number is then scaled into a possible address range using modulo division.

Use a function such as: y = ((ax + b) mod m) + 1

1. x is the key value,

2. a is a coefficient,

3. b is a constant,

4. m is the count of the elements in the list,

5. y is the address.

Pseudorandom Hashing Method

y = ((ax + b) mod m) + 1

y = ((17x + 7) mod 307) + 1

• x = 121267 is the key value,

• a = 17

• b = 7

• m = 307

1. y = (((17 * 121267) + 7) mod 307) + 1

2. y = ((2061539 + 7) mod 307) + 1

3. y = (2061546 mod 307) + 1

4. y = 41 + 1

5. y = 42
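The worked example translates directly into Python. This sketch uses the coefficient, constant, and list size from the example as defaults; the function name `pseudorandom_hash` is illustrative.

```python
def pseudorandom_hash(x, a=17, b=7, m=307):
    """y = ((a*x + b) mod m) + 1, giving addresses 1..m."""
    return (a * x + b) % m + 1


print(17 * 121267 + 7)            # 2061546
print(2061546 % 307)              # 41
print(pseudorandom_hash(121267))  # 42
```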

Figure 2-12

Rotation Hashing Method

Rotation is often used in combination with folding and pseudorandom hashing.

Figure 2-13

Collision Resolution Methods

All of the above methods of handling collisions are independent of the hashing algorithm.

Collision Resolution Concepts: “Load Factor”

• We define a full list as a list in which all elements except one contain data.

• Rule: A hashed list should not be allowed to become more than 75% full!

Load Factor = (the number of filled elements in the list / total number of elements in the list) x 100

α = (k / n) x 100, where k is the number of filled elements and n is the total number of elements in the list.
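As a small worked example of the load factor rule, the sketch below computes α for a list of 307 elements. The helper names `load_factor` and `over_limit`, and the sample counts, are hypothetical; only the formula and the 75% rule come from the slide.

```python
def load_factor(filled, total):
    """Load factor as a percentage: (k / n) x 100."""
    return filled / total * 100


def over_limit(filled, total, limit=75.0):
    """True when the list is fuller than the recommended limit."""
    return load_factor(filled, total) > limit


print(round(load_factor(230, 307), 1))  # 74.9
print(over_limit(230, 307))             # False
print(over_limit(231, 307))             # True
```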

Collision Resolution Concepts: “Clustering”

• Some hashing algorithms tend to cause data to group within the list. This is known as clustering.

• Clustering is created by collisions.

• If the list contains a high degree of clustering, then the number of probes to locate an element grows and the processing efficiency of the list is reduced.


Collision Resolution Concepts “Clustering”

• Clustering types are:

– Primary clustering; clustering around a home address in

our list.

– Secondary clustering; the data are widely distributed across

the whole list so that the list appears to be well distributed,

however, the time to locate a requested element of data

can become large.

Collision Resolution Methods: Open Addressing

• When a collision occurs, the home area addresses are searched for an open or unoccupied element where the new data can be placed.

• We have four different methods:

– Linear probe,

– Quadratic probe,

– Double hashing,

– Key offset.

Open Addressing: “Linear Probe”

• When data cannot be stored in the home address, we resolve the collision by adding one to the current address.

Advantages:

• Simple implementation!

• Data tend to remain near their home address.

Disadvantages:

• It tends to produce primary clustering.

• The search algorithm may become more complex, especially after data have been deleted!

Open Addressing: “Linear Probe”

15352 / 307 = 50 with remainder 2

hash(15352) = 2 + 1 = 3

New address = 3 + 1 = 4
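Linear probing can be sketched as an insertion routine. The code below is an illustration under stated assumptions: the table is a Python dict from address to key, addresses run from 1 to 307 and wrap around, and the home address uses modulo division plus one. The three sample keys all have home address 2; the function names are hypothetical.

```python
LIST_SIZE = 307


def home_address(key):
    """Modulo-division hash: addresses 1..LIST_SIZE."""
    return key % LIST_SIZE + 1


def insert_linear(table, key):
    """Store key at its home address, or at the next free address after it."""
    address = home_address(key)
    for _ in range(LIST_SIZE):
        if address not in table:
            table[address] = key
            return address
        address = address % LIST_SIZE + 1  # add one, wrapping from 307 back to 1
    raise OverflowError("hashed list is full")


table = {}
print([insert_linear(table, k) for k in (166702, 572556, 67234)])  # [2, 3, 4]
```

The output shows primary clustering in miniature: three synonyms end up in three adjacent addresses next to their shared home address.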

Figure 2-14

Open Addressing: “Linear Probe”

Open Addressing: “Quadratic Probe”

• Primary clustering can be eliminated by adding a value other than one to the current address.

• The increment is the collision probe number squared.

– For the first probe 1²

– For the second probe 2²

– For the third collision probe 3² ...

– Until we either find an empty element or we exhaust the possible elements.

– We use the modulo of the quadratic sum for the new address.

Open Addressing: “Quadratic Probe”

Probe    Collision   Probe * Probe   New       Increment   Next
Number   Location    (Increment)     Address   Factor      Increment
1        1           1 * 1 = 1       2         1           1
2        2           2 * 2 = 4       6         3           4
3        6           3 * 3 = 9       15        5           9
4        15          4 * 4 = 16      31        7           16
5        31          5 * 5 = 25      56        9           25
6        56          6 * 6 = 36      92        11          36
7        92          7 * 7 = 49      41        13          49

New Address = Collision Location + Increment; in the last row the sum wraps around (92 + 49 = 141 → 41).

The increment factor increases by two for each probe!
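The probe path in the quadratic probe table can be regenerated with a short loop. The list size of 100 is an assumption inferred from the last row, where 92 + 49 = 141 becomes 41; the name `quadratic_probe_path` is illustrative.

```python
def quadratic_probe_path(start, probes, list_size=100):
    """Return the addresses visited when each probe adds probe_number**2."""
    path = []
    address = start
    for probe in range(1, probes + 1):
        address = (address + probe * probe) % list_size  # modulo of the quadratic sum
        path.append(address)
    return path


print(quadratic_probe_path(1, 7))  # [2, 6, 15, 31, 56, 92, 41]
```

Consecutive squares differ by 3, 5, 7, and so on, which is why the increment factor in the table grows by two on every probe.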

Figure 2-15

Open Addressing – Double Hashing: “Pseudorandom Collision Resolution”

In this method, rather than using an arithmetic probe function, the address is rehashed.

y = ((ax + c) mod listSize) + 1

y = ((3 * 2 + (-1)) mod 307) + 1

y = 6
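The rehash step can be checked numerically. In this sketch the collision address x = 2, the coefficient a = 3, and the constant c = -1 are the values from the example; the function name `rehash` is illustrative.

```python
def rehash(address, a=3, c=-1, list_size=307):
    """New address y = ((a * address + c) mod listSize) + 1."""
    return (a * address + c) % list_size + 1


print(rehash(2))  # 6
```

Python's `%` always returns a non-negative result for a positive modulus, so a negative constant c cannot produce an address below 1.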

Open Addressing – Double Hashing: “Key Offset Collision Resolution”

• Key offset is another double hashing method; it produces different collision paths for different keys.

• Key offset calculates the new address as a function of the old address and the key.

Open Addressing – Double Hashing: “Key Offset Collision Resolution”

offSet = [key / listSize]   (integer quotient)

address = ((offSet + old address) mod listSize) + 1

offSet = [166702 / 307] = 543

Probe 1: address = ((543 + 2) mod 307) + 1 = 239

Probe 2: address = ((543 + 239) mod 307) + 1 = 169

Key      Home Address   Key Offset   Probe 1   Probe 2
166702   2              543          239       169
572556   2              1865         26        50
67234    2              219          222       135
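The key offset figures for all three keys can be verified with the sketch below. It assumes the list size of 307 and a home address of 2 as in the example; the function name `key_offset_probe` is illustrative.

```python
def key_offset_probe(key, old_address, list_size=307):
    """address = ((key // listSize + old address) mod listSize) + 1."""
    offset = key // list_size  # integer quotient of key and list size
    return (offset + old_address) % list_size + 1


for key in (166702, 572556, 67234):
    first = key_offset_probe(key, 2)
    second = key_offset_probe(key, first)
    print(key, key // 307, first, second)
# 166702 543 239 169
# 572556 1865 26 50
# 67234 219 222 135
```

Although the three keys share home address 2, each follows a different collision path, because the offset depends on the key itself.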


Collision Resolution Open Addressing Resolution

• A major disadvantage to open addressing is that each

collision resolution increases the probability of future

collisions!

Figure 2-16

Collision Resolution: Linked List Resolution

Link head pointer.

A linked list is an ordered collection of data in which each element contains the location of the next element.
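One way to picture linked list resolution is a table of head pointers, each starting a chain of synonyms. The sketch below is a simplification and not the layout of Figure 2-16: it assumes every address holds only a head pointer, inserts new keys at the front of the chain, and uses modulo-division hashing with a list size of 307. The class and method names are hypothetical.

```python
class Node:
    """One element of a chain: a key plus the location of the next element."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedTable:
    def __init__(self, size=307):
        self.size = size
        self.heads = [None] * (size + 1)  # index 0 unused; addresses are 1..size

    def _address(self, key):
        return key % self.size + 1

    def insert(self, key):
        address = self._address(key)
        # Front insertion (an assumption): the new node points at the old head.
        self.heads[address] = Node(key, self.heads[address])

    def contains(self, key):
        node = self.heads[self._address(key)]
        while node is not None:  # follow the links until the chain ends
            if node.key == key:
                return True
            node = node.next
        return False


table = ChainedTable()
for key in (166702, 572556, 67234):  # three synonyms with home address 2
    table.insert(key)
print(table.contains(572556), table.contains(121267))  # True False
```

Because synonyms live in their own chain, a collision at one address never occupies another key's home address.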

Figure 2-17

Collision Resolution: Bucket Hashing Resolution

Hw #2

1. Create an array that contains 100 random integers between 0 and 150.

2. This should be an unordered list.

   1. Use the linear sentinel search algorithm to find the target value in the array.

   2. Use the probability search algorithm to find the target value in the array.

3. Create an ordered list that contains 100 numbers between 0 and 150.

   1. Use the ordered list search algorithm to find the target value in the array.

   2. Use the binary search algorithm to find the target value in the array.

Upload your HW-2 to the FTP site by 15 Mar. 06 at 17:00.

– Run each search algorithm 10 times and report these performance values for each of them.

– Write your comments about the result table.

                                     SentinelSearch   ProbabilitySearch   OrderedSearch   BinarySearch
Number of Completed Searches
Number of Successful Searches
Average number of tests per search