db-5
TRANSCRIPT
-
8/12/2019 db-5
1/47
3: Database Systems
Part V: Physical Database
Design
-
8/12/2019 db-5
2/47
2
Physical Database Design
The process of mapping the logicaldata model into an internal set of
physical database structures
Major consideration: Can the user get the desired information,
in the appropriate format, and in a timely
(i.e. acceptable response time) fashion?
-
8/12/2019 db-5
3/47
3
Objectives of Physical DatabaseDesign
Implement the database as a set ofstored records, files, indexes, etc.
Provide adequate performance
Ensure database integrity, security,
and recoverability
-
8/12/2019 db-5
4/47
-
8/12/2019 db-5
5/47
5
Components of Physical Design
Data volume and usage analysis
Data distribution strategy
File organization
Indexes
Integrity constraints
-
8/12/2019 db-5
6/47
6
Data Volume and Usage Analysis
Database size
Used to select physical storage devices
and estimate cost of storage
Usage paths Used to select file organization and
access methods
Plan for use of indexes Strategy for data distribution
-
8/12/2019 db-5
7/47
7
Composite Usage Map
-
8/12/2019 db-5
8/47
8
Composite Usage Map
Data volumes
-
8/12/2019 db-5
9/47
9
Composite Usage Map
Access
Frequencies (per
hour)
-
8/12/2019 db-5
10/47
10
Composite Usage Map
Usage analysis:200 purchased parts
accessed per hour
80 quotations accessed
from these 200 purchased
part accesses
70 suppliers accessed from
these 80 quotation
accesses
-
8/12/2019 db-5
11/47
11
Composite Usage Map
Usage analysis:75 suppliers accessed per
hour
40 quotations accessed
from these 75 supplier
accesses 40 purchased parts
accessed from these 40
quotation accesses
-
8/12/2019 db-5
12/47
12
Data Distribution Strategies
Different approaches to determine atwhich nodes or sites to physicallylocate the data in a distributed
computing network Four strategies
Centralized
Partitioned
Replicated
Hybrid
-
8/12/2019 db-5
13/47
13
Centralized
All data are located at a single site
Advantage
Simple implementation
Disadvantages
Data not readily accessible to remote
users
Expensive data communication costs
When central system crashes, entire
database system fails
-
8/12/2019 db-5
14/47
14
Partitioned
Database is divided into non-overlapping partitions or fragments
which are assigned to particular sites
Advantage Data is more accessible to local user
Disadvantage
More complex implementation
-
8/12/2019 db-5
15/47
15
Replicated
Duplicate copies of the entire databaseare assigned to more than one site in
the network
Advantage Maximizes local access to data
Disadvantage:
Update problems (synchronization)
-
8/12/2019 db-5
16/47
16
Hybrid
Database is partitioned into critical andnon-critical fragments
Critical fragments are stored at
multiple sites, while non-criticalfragments are only in one site
What are the advantages and
disadvantages of this approach?
-
8/12/2019 db-5
17/47
17
File Organization
How records are physically arrangedor stored on secondary storage
devices
Example Storage on hard disks, tapes, CD-ROMs,
etc.
-
8/12/2019 db-5
18/47
18
Basic File Organizations
Sequential
Indexed
Indexed sequential
Indexed non-sequential
Hashed
-
8/12/2019 db-5
19/47
19
Sequential File Organization
Records in the fileare stored in
sequence
according to a
primary key value
If notsortedAverage time to
find desired
record = n/2.
1
2
n
If sorted every insert or
delete requires
resort
-
8/12/2019 db-5
20/47
20
Indexed File Organization
An index is created that allows user tolocate individual records faster
Index
A table or other data structure used todetermine the location of rows in the
main table that satisfy some condition
-
8/12/2019 db-5
21/47
21
Indexed Sequential
Records are stored sequentially byprimary key value
Uses block index
Example:
White pages phone directory
-
8/12/2019 db-5
22/47
22
Indexed Non-Sequential
Records are stored non-sequentially
Full index is required
Example
Books in a library
-
8/12/2019 db-5
23/47
23
Hashed File Organization
A hashing algorithm is used todetermine the address of each record
Hashing algorithm
Converts a primary key value into arelative record number or file address
Example: Divide primary key value by a
prime number and use the remainder asthe storage location
-
8/12/2019 db-5
24/47
24
Selecting File Organization
Select a file organization that provides areasonable balance among the following
criteria:
Fast access for retrieval
High throughput for processing transactions
Efficient use of storage devices
Protection from failures or data loss
Minimal need for reorganization Accommodation for file growth
Security from unauthorized use
-
8/12/2019 db-5
25/47
25
Constraints in Selecting FileOrganization
Physical characteristics of secondarystorage devices
Available operating system
File management software
User needs for storing and accessing
data
-
8/12/2019 db-5
26/47
26
Indexes
Stored in main memory for fastersearching of required values
Types of index
Primary key
Non-key
Clustering
-
8/12/2019 db-5
27/47
27
Types of Indexes
Primary key Index created based on the primary key
Non-key
Index created for each desired non-keyattribute
Clustering
Speeds up retrievals by physicallyordering the file or table based on a non-
key attribute
-
8/12/2019 db-5
28/47
28
Clustering Indexes
Clustering attribute Any non-key attribute used to group
together rows that have a common value
for the attribute Clustering index
Index defined on the clustering attribute
of a table
-
8/12/2019 db-5
29/47
29
Clustering Index: An Example
DESCRIPTION RECORD NO.
Bookcase 1
Chair 3,5
Dresser 2,6,7
Stand 4
RECORD NO. PRODUCT NO. DESCRIPTION FINISH PRICE
1 0100 Bookcase Oak 75
2 0350 Dresser Maple 625
3 0975 Chair Cherry 100
4 1000 Stand Pine 750
5 1250 Chair Maple 125
6 1425 Dresser Oak 800
7 1775 Dresser Pine 1200
PRODUCTTABLE
DESCRIPTION INDEX
(Non-clustered)
-
8/12/2019 db-5
30/47
30
Clustering Index: An Example
DESCRIPTION RECORD NO.
Bookcase 1
Chair 2
Dresser 4
Stand 7
RECORD NO. PRODUCT NO. DESCRIPTION FINISH PRICE
1 0100 Bookcase Oak 75
2 0975 Chair Cherry 100
3 1250 Chair Maple 125
4 0350 Dresser Maple 625
5 1425 Dresser Oak 800
6 1775 Dresser Pine 1200
7 1000 Stand Pine 750
PRODUCTTABLE
DESCRIPTION INDEX
(Clustered)
-
8/12/2019 db-5
31/47
31
Trees
Most common data structure forimplementing indexes
Branching factor
Degree of a tree
Maximum number of children allowed per
parent
Depth Number of levels between the root node
and a leaf node in a tree
-
8/12/2019 db-5
32/47
32
Balanced Trees
Also called B-Trees
A tree in which all leaves are of the
same distance from the root
Index files are most commonlyorganized using B-trees, which have
predictable efficiency
Also support sequential retrieval ofrecords
-
8/12/2019 db-5
33/47
33
Using B-Trees in Indexes
uses a tree searchAverage time to find desired
record = depth of the tree
-
8/12/2019 db-5
34/47
34
Main Trade-Off of Using an Index
Improved performance for retrievalsversus degraded performance for
inserting, deleting, and updating
records in a table Examples
Decision Support Systems (DSS)
Transaction Processing Systems (TPS)
-
8/12/2019 db-5
35/47
35
When to Use Indexes
Specify a unique index for the primarykey attribute of each table
In most situations, it is also advisable
to specify an index for foreign keys Specify an index for non-key attributes
that are referred to in qualification,
sorting, and grouping commands
-
8/12/2019 db-5
36/47
36
When to Use Indexes
Index search fields
Index only large tables (when there are
>100 values but not when there are
-
8/12/2019 db-5
37/47
-
8/12/2019 db-5
38/47
38
Referential Integrity
Considers the validity of referencesbetween objects in a database
The value of a foreign key in one table
(referencing table) must be an actualvalue of a primary key in some other
table (referenced table), or else it must
be null, if allowed
-
8/12/2019 db-5
39/47
39
Referential Integrity Rules
Insertion Rule A row cannot be inserted in the
referencing table unless a matching entry
already exists in the referenced table If insertion is allowed even without a
matching entry in the referenced table, a
null value is used for the foreign key in
the referencing table
-
8/12/2019 db-5
40/47
40
Referential Integrity Rules
Deletion Rule A row cannot be deleted from the
referenced table if there are matching
rows in the referencing table Three applicable rules
Restrict
Nullify Cascade
-
8/12/2019 db-5
41/47
41
Delete Rules
Restrict Deletion is not allowed
Nullify
Foreign key values changed to null in thereferencing table before correspondingrow in the referenced table is deleted
Cascade
Affected rows in the referencing table aredeleted first before matching row in thereferenced table is deleted
-
8/12/2019 db-5
42/47
42
Enforcing Referential Integrity
Enforcing referential integrity inapplication programs
Unreliable -- may be handled differently
in separate programs and cause conflicts Enforcing referential integrity
constraints within the DBMS
Consistent enforcement of rules Makes programming and maintenance
easier
-
8/12/2019 db-5
43/47
43
Denormalization
Database may not always beimplemented in normalized form
Used to speed up data access
Reduces number of tables that mustbe accessed to retrieve data
No hard and fast rules
-
8/12/2019 db-5
44/47
44
Denormalization
Situations to consider denormalization One-to-one relationship between two
entities
Many-to-many relationship with non-keyattributes
Reference data
-
8/12/2019 db-5
45/47
45
Denormalization of One-to-One
STUDENT
Student_ID
NameAddress
SCHOLARSHIP
APPLICATIONhas
Application_ID
Application_Date Status
Student_ID
Denormalized relation:
STUDENT (Student_ID, Name, Address, Application_Date, Status)
-
8/12/2019 db-5
46/47
46
Denormalization to Many-to-Many
VENDOR
Vendor_ID
Vendor_Name Address
submits PRICE
QUOTE
Vendor_ID
given for ITEM
Item_ID
Description
Item_ID
Price
Denormalized relations:
VENDOR (Vendor_ID, Vendor_Name, Address)
ITEM_QUOTE (Vendor_ID, Item_ID, Description, Price)
-
8/12/2019 db-5
47/47
47
Denormalization of Reference Data
STORAGE
Storage_ID
Container_No Cabinet_No
stores ITEM
Item_ID
Description
Denormalized relation:
STORAGE (Item_ID, Description, Container_No, Cabinet_No)
Storage_ID