db-5

8/12/2019 db-5

1/47

3: Database Systems

Part V: Physical Database

Design

8/12/2019 db-5

2/47

2

Physical Database Design

The process of mapping the logicaldata model into an internal set of

physical database structures

Major consideration: Can the user get the desired information,

in the appropriate format, and in a timely

(i.e. acceptable response time) fashion?

8/12/2019 db-5

3/47

3

Objectives of Physical DatabaseDesign

Implement the database as a set ofstored records, files, indexes, etc.

Provide adequate performance

Ensure database integrity, security,

and recoverability

8/12/2019 db-5

4/47

8/12/2019 db-5

5/47

5

Components of Physical Design

Data volume and usage analysis

Data distribution strategy

File organization

Indexes

Integrity constraints

8/12/2019 db-5

6/47

6

Data Volume and Usage Analysis

Database size

Used to select physical storage devices

and estimate cost of storage

Usage paths Used to select file organization and

access methods

Plan for use of indexes Strategy for data distribution

8/12/2019 db-5

7/47

7

Composite Usage Map

8/12/2019 db-5

8/47

8

Composite Usage Map

Data volumes

8/12/2019 db-5

9/47

9

Composite Usage Map

Access

Frequencies (per

hour)

8/12/2019 db-5

10/47

10

Composite Usage Map

Usage analysis:200 purchased parts

accessed per hour

80 quotations accessed

from these 200 purchased

part accesses

70 suppliers accessed from

these 80 quotation

accesses

8/12/2019 db-5

11/47

11

Composite Usage Map

Usage analysis:75 suppliers accessed per

hour

40 quotations accessed

from these 75 supplier

accesses 40 purchased parts

accessed from these 40

quotation accesses

8/12/2019 db-5

12/47

12

Data Distribution Strategies

Different approaches to determine atwhich nodes or sites to physicallylocate the data in a distributed

computing network Four strategies

Centralized

Partitioned

Replicated

Hybrid

8/12/2019 db-5

13/47

13

Centralized

All data are located at a single site

Advantage

Simple implementation

Disadvantages

Data not readily accessible to remote

users

Expensive data communication costs

When central system crashes, entire

database system fails

8/12/2019 db-5

14/47

14

Partitioned

Database is divided into non-overlapping partitions or fragments

which are assigned to particular sites

Advantage Data is more accessible to local user

Disadvantage

More complex implementation

8/12/2019 db-5

15/47

15

Replicated

Duplicate copies of the entire databaseare assigned to more than one site in

the network

Advantage Maximizes local access to data

Disadvantage:

Update problems (synchronization)

8/12/2019 db-5

16/47

16

Hybrid

Database is partitioned into critical andnon-critical fragments

Critical fragments are stored at

multiple sites, while non-criticalfragments are only in one site

What are the advantages and

disadvantages of this approach?

8/12/2019 db-5

17/47

17

File Organization

How records are physically arrangedor stored on secondary storage

devices

Example Storage on hard disks, tapes, CD-ROMs,

etc.

8/12/2019 db-5

18/47

18

Basic File Organizations

Sequential

Indexed

Indexed sequential

Indexed non-sequential

Hashed

8/12/2019 db-5

19/47

19

Sequential File Organization

Records in the fileare stored in

sequence

according to a

primary key value

If notsortedAverage time to

find desired

record = n/2.

1

2

n

If sorted every insert or

delete requires

resort

8/12/2019 db-5

20/47

20

Indexed File Organization

An index is created that allows user tolocate individual records faster

Index

A table or other data structure used todetermine the location of rows in the

main table that satisfy some condition

8/12/2019 db-5

21/47

21

Indexed Sequential

Records are stored sequentially byprimary key value

Uses block index

Example:

White pages phone directory

8/12/2019 db-5

22/47

22

Indexed Non-Sequential

Records are stored non-sequentially

Full index is required

Example

Books in a library

8/12/2019 db-5

23/47

23

Hashed File Organization

A hashing algorithm is used todetermine the address of each record

Hashing algorithm

Converts a primary key value into arelative record number or file address

Example: Divide primary key value by a

prime number and use the remainder asthe storage location

8/12/2019 db-5

24/47

24

Selecting File Organization

Select a file organization that provides areasonable balance among the following

criteria:

Fast access for retrieval

High throughput for processing transactions

Efficient use of storage devices

Protection from failures or data loss

Minimal need for reorganization Accommodation for file growth

Security from unauthorized use

8/12/2019 db-5

25/47

25

Constraints in Selecting FileOrganization

Physical characteristics of secondarystorage devices

Available operating system

File management software

User needs for storing and accessing

data

8/12/2019 db-5

26/47

26

Indexes

Stored in main memory for fastersearching of required values

Types of index

Primary key

Non-key

Clustering

8/12/2019 db-5

27/47

27

Types of Indexes

Primary key Index created based on the primary key

Non-key

Index created for each desired non-keyattribute

Clustering

Speeds up retrievals by physicallyordering the file or table based on a non-

key attribute

8/12/2019 db-5

28/47

28

Clustering Indexes

Clustering attribute Any non-key attribute used to group

together rows that have a common value

for the attribute Clustering index

Index defined on the clustering attribute

of a table

8/12/2019 db-5

29/47

29

Clustering Index: An Example

DESCRIPTION RECORD NO.

Bookcase 1

Chair 3,5

Dresser 2,6,7

Stand 4

RECORD NO. PRODUCT NO. DESCRIPTION FINISH PRICE

1 0100 Bookcase Oak 75

2 0350 Dresser Maple 625

3 0975 Chair Cherry 100

4 1000 Stand Pine 750

5 1250 Chair Maple 125

6 1425 Dresser Oak 800

7 1775 Dresser Pine 1200

PRODUCTTABLE

DESCRIPTION INDEX

(Non-clustered)

8/12/2019 db-5

30/47

30

Clustering Index: An Example

DESCRIPTION RECORD NO.

Bookcase 1

Chair 2

Dresser 4

Stand 7

RECORD NO. PRODUCT NO. DESCRIPTION FINISH PRICE

1 0100 Bookcase Oak 75

2 0975 Chair Cherry 100

3 1250 Chair Maple 125

4 0350 Dresser Maple 625

5 1425 Dresser Oak 800

6 1775 Dresser Pine 1200

7 1000 Stand Pine 750

PRODUCTTABLE

DESCRIPTION INDEX

(Clustered)

8/12/2019 db-5

31/47

31

Trees

Most common data structure forimplementing indexes

Branching factor

Degree of a tree

Maximum number of children allowed per

parent

Depth Number of levels between the root node

and a leaf node in a tree

8/12/2019 db-5

32/47

32

Balanced Trees

Also called B-Trees

A tree in which all leaves are of the

same distance from the root

Index files are most commonlyorganized using B-trees, which have

predictable efficiency

Also support sequential retrieval ofrecords

8/12/2019 db-5

33/47

33

Using B-Trees in Indexes

uses a tree searchAverage time to find desired

record = depth of the tree

8/12/2019 db-5

34/47

34

Main Trade-Off of Using an Index

Improved performance for retrievalsversus degraded performance for

inserting, deleting, and updating

records in a table Examples

Decision Support Systems (DSS)

Transaction Processing Systems (TPS)

8/12/2019 db-5

35/47

35

When to Use Indexes

Specify a unique index for the primarykey attribute of each table

In most situations, it is also advisable

to specify an index for foreign keys Specify an index for non-key attributes

that are referred to in qualification,

sorting, and grouping commands

8/12/2019 db-5

36/47

36

When to Use Indexes

Index search fields

Index only large tables (when there are

>100 values but not when there are

8/12/2019 db-5

37/47

8/12/2019 db-5

38/47

38

Referential Integrity

Considers the validity of referencesbetween objects in a database

The value of a foreign key in one table

(referencing table) must be an actualvalue of a primary key in some other

table (referenced table), or else it must

be null, if allowed

8/12/2019 db-5

39/47

39

Referential Integrity Rules

Insertion Rule A row cannot be inserted in the

referencing table unless a matching entry

already exists in the referenced table If insertion is allowed even without a

matching entry in the referenced table, a

null value is used for the foreign key in

the referencing table

8/12/2019 db-5

40/47

40

Referential Integrity Rules

Deletion Rule A row cannot be deleted from the

referenced table if there are matching

rows in the referencing table Three applicable rules

Restrict

Nullify Cascade

8/12/2019 db-5

41/47

41

Delete Rules

Restrict Deletion is not allowed

Nullify

Foreign key values changed to null in thereferencing table before correspondingrow in the referenced table is deleted

Cascade

Affected rows in the referencing table aredeleted first before matching row in thereferenced table is deleted

8/12/2019 db-5

42/47

42

Enforcing Referential Integrity

Enforcing referential integrity inapplication programs

Unreliable -- may be handled differently

in separate programs and cause conflicts Enforcing referential integrity

constraints within the DBMS

Consistent enforcement of rules Makes programming and maintenance

easier

8/12/2019 db-5

43/47

43

Denormalization

Database may not always beimplemented in normalized form

Used to speed up data access

Reduces number of tables that mustbe accessed to retrieve data

No hard and fast rules

8/12/2019 db-5

44/47

44

Denormalization

Situations to consider denormalization One-to-one relationship between two

entities

Many-to-many relationship with non-keyattributes

Reference data

8/12/2019 db-5

45/47

45

Denormalization of One-to-One

STUDENT

Student_ID

NameAddress

SCHOLARSHIP

APPLICATIONhas

Application_ID

Application_Date Status

Student_ID

Denormalized relation:

STUDENT (Student_ID, Name, Address, Application_Date, Status)

8/12/2019 db-5

46/47

46

Denormalization to Many-to-Many

VENDOR

Vendor_ID

Vendor_Name Address

submits PRICE

QUOTE

Vendor_ID

given for ITEM

Item_ID

Description

Item_ID

Price

Denormalized relations:

VENDOR (Vendor_ID, Vendor_Name, Address)

ITEM_QUOTE (Vendor_ID, Item_ID, Description, Price)

8/12/2019 db-5

47/47

47

Denormalization of Reference Data

STORAGE

Storage_ID

Container_No Cabinet_No

stores ITEM

Item_ID

Description

Denormalized relation:

STORAGE (Item_ID, Description, Container_No, Cabinet_No)

Storage_ID

db-5

Documents