CPSC 335 Computer Science University of Calgary Canada

Upload: jeb

Post on 19-Mar-2016

TRANSCRIPT

Page 1: CPSC 335

CPSC 335

Computer Science, University of Calgary

Canada

Page 2: CPSC 335

Outline

Coalesced Hashing

Variants

Brent’s Method

Binary Tree

Comparison of various methods

Page 3: CPSC 335

Coalesced Hashing

Coalesced hashing is a collision resolution method that uses pointers to connect the elements of a synonym chain.

• A hybrid of separate chaining and open addressing.

• Linked lists within the hash table handle collisions.

• This strategy is effective, efficient and very easy to implement.

Page 4: CPSC 335

Coalesced Hashing

Coalesced hashing takes its name from what happens when we try to insert a record whose home address is already occupied by a record from a chain with a different home address.

This would occur, for example, if we inserted a record with home address s into the hash table. The two chains, whose records have different home addresses, then coalesce (grow together).

Page 5: CPSC 335

Coalesced Hashing

In the figure, the records with keys X, D, and Y were inserted in the given order into the hash table. A, B, C, and D form one set of synonyms, and X and Y form another set.

When X is inserted into the table with coalescing, it must be inserted at the end of the chain that it is coalescing with. Instead of needing only one probe to retrieve X, three are needed. The greater the coalescing, the longer the probe chain will be, and as a result, retrieval performance will be degraded.

When record D is then added, it must be inserted at the end of the coalesced chains; to locate D we must now pass over record X, which belongs to the other chain.

Figure: Synonym chain with coalescing. (The shaded portion indicates the part of the chain in which coalescing has occurred. The thin line represents the insertions on the synonym chain with r as its home address; the thick line represents the insertions on the chain with s as its home address.)

Page 6: CPSC 335

Coalesced Hashing

Coalesced hashing originated with Williams [1] and is also referred to as direct chaining.

Algorithm for Coalesced Hashing
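The algorithm figure is not reproduced in this transcript. As a rough illustration only, the following Python sketch shows one way late-insertion coalesced hashing without a cellar might look. The class name, the `key % size` hash function, and the choice of the highest-indexed empty slot as the "bottom" of the table are assumptions made for this example, not details taken from the slides.

```python
class CoalescedTable:
    """Coalesced hashing, no cellar, new records appended to the chain end."""

    def __init__(self, size):
        self.size = size
        self.keys = [None] * size   # stored keys; None marks an empty slot
        self.link = [None] * size   # index of the next record in the chain
        self.free = size - 1        # scan pointer for empty slots, moving up

    def home(self, key):
        # Assumed hash function for the sketch: key mod table size.
        return key % self.size

    def insert(self, key):
        i = self.home(key)
        if self.keys[i] is None:    # home address is empty: store there
            self.keys[i] = key
            return i
        # Walk to the end of the chain, rejecting duplicates on the way.
        while True:
            if self.keys[i] == key:
                raise KeyError(f"duplicate key {key}")
            if self.link[i] is None:
                break
            i = self.link[i]
        # Take the bottom-most empty slot (no deletions are supported here).
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1
        if self.free < 0:
            raise OverflowError("table is full")
        self.keys[self.free] = key
        self.link[i] = self.free    # late insertion: link at the chain end
        return self.free

    def probes(self, key):
        """Number of probes needed to retrieve key, or None if absent."""
        i, count = self.home(key), 1
        while self.keys[i] != key:
            if self.keys[i] is None or self.link[i] is None:
                return None
            i = self.link[i]
            count += 1
        return count


table = CoalescedTable(11)
for key in (28, 39, 21, 50):
    table.insert(key)
print(table.probes(21))  # 2: home address 10 is already held by 39
print(table.probes(50))  # 4: the chain 6 -> 10 -> 9 -> 8 has coalesced
```

Here 21 is the only record with home address 10, yet it needs two probes because the overflow record 39 (home address 6) already occupies that slot; the two chains have grown together.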

Page 7: CPSC 335

Variants

Many suggestions have been made for reducing the coalescing of probe chains, thereby lowering the number of retrieval probes and improving performance.

The variants may be classified in three ways:

• The table organization (whether or not a separate overflow area is used).

• The manner of linking a colliding item into a chain.

• The manner of choosing unoccupied locations.

Page 8: CPSC 335

Variants

Coalescing may be reduced by modifying the table organization.

Instead of allocating the entire table space for both overflow records and home address records, the table is divided into a primary area and an overflow area (the cellar).

• The primary area is the address space that the hash function maps into.

• The overflow or cellar area contains only overflow records.

• The address factor is the ratio of the primary area to the total table size:
Address Factor = primary area / total table size

Page 9: CPSC 335

Variants

For a fixed amount of storage, as the address factor decreases, the cellar size increases. This reduces coalescing, but because the primary area becomes smaller, the number of collisions increases.

More collisions mean more items requiring multiple retrieval probes.

Vitter [2] determined that an address factor of 0.86 yields nearly optimal retrieval performance for most load factors.
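To make the primary/cellar split concrete, here is a minimal Python sketch of a late-insertion table with a cellar. The dictionary layout, the function names, the `key % primary` hash function, and the placement of the cellar at the high-index end of the table are all assumptions for illustration; duplicate checks are omitted for brevity.

```python
def make_table(total_slots, address_factor=0.86):
    # primary area = address factor * total table size
    primary = round(total_slots * address_factor)
    return {
        "primary": primary,                # hash function maps into [0, primary)
        "keys": [None] * total_slots,
        "link": [None] * total_slots,
        "free": total_slots - 1,           # empty slots are taken from the bottom
    }


def insert(table, key):
    keys, link = table["keys"], table["link"]
    i = key % table["primary"]             # assumed hash: only primary addresses
    if keys[i] is None:
        keys[i] = key
        return i
    while link[i] is not None:             # walk to the end of the chain
        i = link[i]
    while table["free"] >= 0 and keys[table["free"]] is not None:
        table["free"] -= 1
    if table["free"] < 0:
        raise OverflowError("table is full")
    slot = table["free"]
    keys[slot] = key
    link[i] = slot
    return slot


t = make_table(10, address_factor=0.8)     # primary area 0..7, cellar 8..9
print([insert(t, k) for k in (8, 16, 24, 32)])  # [0, 9, 8, 7]
```

In this run the first two overflow records land in the cellar (slots 9 and 8), where no key can hash, so they cannot cause coalescing. Only once the cellar is full does an overflow record (32, in slot 7) spill into the primary area.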

Page 10: CPSC 335

Variants

LISCH

The algorithm given in slide 6 is called Late Insertion Standard Coalesced Hashing (LISCH), since new records are inserted at the end of a probe chain.

The ‘Standard’ in the name refers to the lack of a cellar.

The variant of that algorithm that uses a cellar is called LICH, Late Insertion Coalesced Hashing.

Page 11: CPSC 335

Variants

Another way of varying the insertion algorithm is to change the way in which we choose an unoccupied location. In the algorithms so far, the unoccupied locations are always chosen from the bottom of the storage area, but this increases the number of collisions.

Hsiao [3] suggested REISCH (‘R’ stands for ‘Random’), in which a random unoccupied location is chosen for the new insertion.

REISCH gives only a 1% improvement over EISCH (Early Insertion Standard Coalesced Hashing).

BLISCH (‘B’ signifies ‘Bidirectional’) is another method of choosing the overflow location for a collision insertion: it alternates the selection between the top and the bottom of the table.

In DCWC (Direct Chaining Without Coalescing), a record that is not stored at its home address is moved when another record hashes to the location it occupies.
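The strategies for choosing an unoccupied location described above can be sketched as three small slot pickers. This is an illustrative Python sketch, not code from the slides: the function names are made up, "bottom" is assumed to mean the highest index, and a linear scan is used for clarity rather than speed.

```python
import random


def pick_bottom(keys):
    """Bottom-most (highest-indexed) empty slot, or None if the table is full."""
    for i in range(len(keys) - 1, -1, -1):
        if keys[i] is None:
            return i
    return None


def pick_random(keys, rng=random):
    """A random empty slot, in the spirit of the 'R' variants."""
    empty = [i for i, k in enumerate(keys) if k is None]
    return rng.choice(empty) if empty else None


def make_bidirectional_picker():
    """Alternate between the bottom and the top of the table on each call."""
    state = {"from_bottom": True}

    def pick(keys):
        if state["from_bottom"]:
            order = range(len(keys) - 1, -1, -1)
        else:
            order = range(len(keys))
        state["from_bottom"] = not state["from_bottom"]
        for i in order:
            if keys[i] is None:
                return i
        return None

    return pick


slots = [None, "a", None, None, "b"]
pick = make_bidirectional_picker()
print(pick_bottom(slots))        # 3
print(pick(slots), pick(slots))  # 3 0
```

Any of these pickers could replace the bottom-up scan in a coalesced-hashing insert routine; only the location of overflow records changes, not the chaining logic.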

Page 12: CPSC 335

Variants

Table 1: Mean number of probes for successful lookup (n = 997) for variants of Coalesced Hashing

Page 13: CPSC 335

Brent’s Method

Dynamic collision resolution methods are methods in which an item, once stored, may be moved.

With these methods, any item may be moved, not only those records which are not stored at their home addresses.

These methods require additional processing when inserting a record into the table but reduce the number of probes needed for retrieval.

The justification for this additional processing is that we usually insert an item into a table only once but retrieve it many times.

Page 14: CPSC 335

Brent’s Method

The Primary Probe Chain of a record is the sequence of locations visited during the insertion or retrieval of the record.

The sequence of positions visited when attempting to move a record from the primary probe chain is called the Secondary Probe Chain.

We want to minimize the total number of probes for both the item being inserted and the items already in the table. This strategy assumes an equal likelihood of any of the items being retrieved.

Page 15: CPSC 335

Brent’s Method

Brent’s method is the first of several dynamic collision resolution methods. In each of them, moving a previously stored item to achieve a reduction in the retrieval probes is considered.

Figure: Brent’s method, probe chains, and their order of processing

The solid vertical line represents the primary probe chain.

The horizontal lines represent the secondary probe chains.

The q value along the primary probe chain is the increment for the item being inserted, whereas the q_i values along the secondary probe chains represent the increments associated with the items being moved.

Page 16: CPSC 335

Brent’s Method

The subscript i gives the number of probes needed to retrieve the item being inserted along its primary probe chain.

The subscript j gives the number of additional probes needed to retrieve the item being moved along its secondary probe chain.

To minimize the number of retrieval probes, (i+j) is minimized. When two combinations give the same value of (i+j), we arbitrarily choose the one with the smaller i.

When we can no longer achieve a reduction in the number of retrieval probes, we should terminate the process of attempting to move an item.

Page 17: CPSC 335

Brent’s Method

Let s be the number of probes required to retrieve an item if nothing is moved.

We then try the combinations of i and j with (i+j) < s, in order of increasing (i+j), so that (i+j) is minimized.

On equality, that is when (i+j) = s, no movement occurs, since there would be no reduction in the number of probes.

Page 18: CPSC 335

Coalesced Hashing

Algorithm for insertion into a file using Brent’s method
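The algorithm figure is not reproduced in this transcript. The following Python sketch illustrates the idea under stated assumptions: open addressing on a table of prime size, home address `key % n`, and increment `(key // n) % n` (replaced by 1 when it is 0). The hash functions and all names are choices made for this example and may differ from the algorithm on the slide. Duplicate checks are omitted.

```python
def increment(key, n):
    # Assumed increment function: quotient of key / n, reduced mod n, never 0.
    q = (key // n) % n
    return q if q else 1


def brent_insert(table, key):
    n = len(table)
    q = increment(key, n)
    pos = key % n
    chain = []                      # occupied addresses on the primary chain
    while table[pos] is not None:
        chain.append(pos)
        if len(chain) == n:
            raise OverflowError("table is full")
        pos = (pos + q) % n
    s = len(chain) + 1              # probes needed if nothing is moved
    # Try combinations with i + j < s in order of increasing i + j;
    # for equal sums the smaller i is tried first.
    for total in range(2, s):
        for i in range(1, total):
            j = total - i
            src = chain[i - 1]      # i-th address on the primary chain
            moved = table[src]
            dst = (src + j * increment(moved, n)) % n
            if table[dst] is None:
                table[dst] = moved  # occupant moves j steps along its chain
                table[src] = key    # new item now needs only i probes
                return
    table[pos] = key                # no improvement possible: end of chain


def brent_probes(table, key):
    """Number of probes needed to retrieve key, or None if absent."""
    n = len(table)
    pos, q = key % n, increment(key, n)
    for count in range(1, n + 1):
        if table[pos] == key:
            return count
        if table[pos] is None:
            return None
        pos = (pos + q) % n
    return None


table = [None] * 11
for key in (27, 18, 29, 28, 39, 13, 16):
    brent_insert(table, key)
print(table.index(27), table.index(16))                # 0 5
print(brent_probes(table, 27), brent_probes(table, 16))  # 4 1
```

In this run, inserting 16 without moving anything would take s = 6 probes. Moving 27 three steps along its own chain (i = 1, j = 3) lets 16 sit at its home address, so the two records together need 5 probes instead of 7.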

Page 19: CPSC 335

Binary Tree

A question that is often asked when considering Brent’s collision resolution method is, “If it is a good idea to move an item on a primary probe chain, why not carry this concept one step further and move items from secondary and subsequent probe chains?”

Two features of the binary tree collision resolution method make it worth considering:

• It needs fewer retrieval probes than Brent’s method.

• Perhaps more importantly, it illustrates the importance of choosing an appropriate data structure in order to be able to solve a problem effectively.

Page 20: CPSC 335

Binary Tree

The binary tree collision resolution method uses a binary tree structure to determine when to move an item and where to move it.

A binary tree is appropriate since there are essentially two choices at each probable storage address: continue to the next address along the probe chain of the item being inserted, or move the item stored at that address to the next position on its probe chain.

A left branch in the binary tree signifies the continue option and a right branch the move option.

Page 21: CPSC 335

Binary Tree

The binary decision tree is generated in a breadth-first fashion, from the top down and left to right, as shown:

Figure: Binary decision tree

The binary tree is used only as a control mechanism in deciding where to store an item and is not used for storing records.

A different binary tree is constructed for each insertion of a record.

By moving items from secondary and subsequent probe chains, a placement of records is achieved that further reduces the average number of retrieval probes when compared with Brent’s method.
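A compact way to see how the decision tree drives an insertion is the sketch below. It is only an illustration under assumptions: open addressing on a table of prime size, home address `key % n`, increment `(key // n) % n` (1 when that is 0), and a node limit to stop the search on a full table. None of the names come from the slides.

```python
def increment(key, n):
    # Assumed increment function: quotient of key / n, reduced mod n, never 0.
    q = (key // n) % n
    return q if q else 1


def binary_tree_insert(table, key, max_nodes=10_000):
    n = len(table)
    # Each node is (address, key probing that address, parent index, is_right).
    nodes = [(key % n, key, None, False)]
    idx = 0
    while True:                         # breadth first: nodes are a FIFO list
        addr, k, _, _ = nodes[idx]
        if table[addr] is None:
            break                       # first empty address in level order
        if len(nodes) + 2 > max_nodes:
            raise OverflowError("no empty slot found")
        occupant = table[addr]
        # Left branch: the probing key continues along its own chain.
        nodes.append(((addr + increment(k, n)) % n, k, idx, False))
        # Right branch: the occupant moves to the next address on its chain.
        nodes.append(((addr + increment(occupant, n)) % n, occupant, idx, True))
        idx += 1
    # Walk back to the root, performing the moves recorded on the path.
    cur = idx
    while cur is not None:
        addr, k, parent, is_right = nodes[cur]
        table[addr] = k
        # Climb through left branches to the node where k started probing.
        while parent is not None and not is_right:
            cur = parent
            _, _, parent, is_right = nodes[cur]
        cur = parent                    # that parent's address is now vacated


table = [None] * 11
for key in (27, 18, 29, 28, 39, 13, 16):
    binary_tree_insert(table, key)
print(table.index(16), table.index(39))  # 6 1
```

In this run the tree for key 16 first continues from address 5 to address 6 (a left branch) and then moves the occupant 39 from 6 to the empty address 1 (a right branch followed by a left branch). The tree itself is discarded after each insertion; only the table is kept.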

Page 22: CPSC 335

Comparison

Table 2 provides the average number of retrieval probes for successful searches on a table of 997 records with a uniform distribution of keys.

Table 2: Comparison of mean number of probes for successful lookup at various packing factors (n = 997)

Page 23: CPSC 335

Comparison

Performance of collision resolution methods

Figure 5 graphically displays the performance data for all methods except for computed chaining with a 2-bit link field.

Page 24: CPSC 335

Comparison

Note the wide variance in performance at packing factors of 90 percent and above.

The result of computed chaining with a 20 percent packing factor is less than that for DCWC (Direct Chaining Without Coalescing).

Performance of collision resolution methods

Page 25: CPSC 335

Comparison

Table 3 offers additional useful comparison criteria. The successful search criteria give the minimum and maximum number of probes necessary to retrieve an item.

Table 3: Search, relocation and storage comparisons

Page 26: CPSC 335

Comparison

Table 3: Search, relocation and storage comparisons

• The range for worst case performance varies from ln n to n.

• Although the worst case performance for locating a record with both LISCH and computed chaining is n, their typical performances would be better, because only records of one chain need to be searched.

Page 27: CPSC 335

Comparison

What is the best method?

There is no single method that is the best for all purposes.

The method that provides the lowest average number of probes, and thus the best performance in general, is DCWC.

The method with the second lowest average number of retrieval probes is computed chaining.

Without coalescing, LISCH is equivalent to DCWC and performs better than computed chaining.

If storage is somewhat scarce, computed chaining will then have an advantage over DCWC.

Page 28: CPSC 335

Comparison

Table 4: Advantages, disadvantages, and when to use various collision resolution methods