db-5

Upload: denny-england

Post on 03-Jun-2018

222 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/12/2019 db-5

    1/47

    3: Database Systems

    Part V: Physical Database

    Design

  • 8/12/2019 db-5

    2/47

    2

    Physical Database Design

    The process of mapping the logicaldata model into an internal set of

    physical database structures

    Major consideration: Can the user get the desired information,

    in the appropriate format, and in a timely

    (i.e. acceptable response time) fashion?

  • 8/12/2019 db-5

    3/47

    3

    Objectives of Physical DatabaseDesign

    Implement the database as a set ofstored records, files, indexes, etc.

    Provide adequate performance

    Ensure database integrity, security,

    and recoverability

  • 8/12/2019 db-5

    4/47

  • 8/12/2019 db-5

    5/47

    5

    Components of Physical Design

    Data volume and usage analysis

    Data distribution strategy

    File organization

    Indexes

    Integrity constraints

  • 8/12/2019 db-5

    6/47

    6

    Data Volume and Usage Analysis

    Database size

    Used to select physical storage devices

    and estimate cost of storage

    Usage paths Used to select file organization and

    access methods

    Plan for use of indexes Strategy for data distribution

  • 8/12/2019 db-5

    7/47

    7

    Composite Usage Map

  • 8/12/2019 db-5

    8/47

    8

    Composite Usage Map

    Data volumes

  • 8/12/2019 db-5

    9/47

    9

    Composite Usage Map

    Access

    Frequencies (per

    hour)

  • 8/12/2019 db-5

    10/47

    10

    Composite Usage Map

    Usage analysis:200 purchased parts

    accessed per hour

    80 quotations accessed

    from these 200 purchased

    part accesses

    70 suppliers accessed from

    these 80 quotation

    accesses

  • 8/12/2019 db-5

    11/47

    11

    Composite Usage Map

    Usage analysis:75 suppliers accessed per

    hour

    40 quotations accessed

    from these 75 supplier

    accesses 40 purchased parts

    accessed from these 40

    quotation accesses

  • 8/12/2019 db-5

    12/47

    12

    Data Distribution Strategies

    Different approaches to determine atwhich nodes or sites to physicallylocate the data in a distributed

    computing network Four strategies

    Centralized

    Partitioned

    Replicated

    Hybrid

  • 8/12/2019 db-5

    13/47

    13

    Centralized

    All data are located at a single site

    Advantage

    Simple implementation

    Disadvantages

    Data not readily accessible to remote

    users

    Expensive data communication costs

    When central system crashes, entire

    database system fails

  • 8/12/2019 db-5

    14/47

    14

    Partitioned

    Database is divided into non-overlapping partitions or fragments

    which are assigned to particular sites

    Advantage Data is more accessible to local user

    Disadvantage

    More complex implementation

  • 8/12/2019 db-5

    15/47

    15

    Replicated

    Duplicate copies of the entire databaseare assigned to more than one site in

    the network

    Advantage Maximizes local access to data

    Disadvantage:

    Update problems (synchronization)

  • 8/12/2019 db-5

    16/47

    16

    Hybrid

    Database is partitioned into critical andnon-critical fragments

    Critical fragments are stored at

    multiple sites, while non-criticalfragments are only in one site

    What are the advantages and

    disadvantages of this approach?

  • 8/12/2019 db-5

    17/47

    17

    File Organization

    How records are physically arrangedor stored on secondary storage

    devices

    Example Storage on hard disks, tapes, CD-ROMs,

    etc.

  • 8/12/2019 db-5

    18/47

    18

    Basic File Organizations

    Sequential

    Indexed

    Indexed sequential

    Indexed non-sequential

    Hashed

  • 8/12/2019 db-5

    19/47

    19

    Sequential File Organization

    Records in the fileare stored in

    sequence

    according to a

    primary key value

    If notsortedAverage time to

    find desired

    record = n/2.

    1

    2

    n

    If sorted every insert or

    delete requires

    resort

  • 8/12/2019 db-5

    20/47

    20

    Indexed File Organization

    An index is created that allows user tolocate individual records faster

    Index

    A table or other data structure used todetermine the location of rows in the

    main table that satisfy some condition

  • 8/12/2019 db-5

    21/47

    21

    Indexed Sequential

    Records are stored sequentially byprimary key value

    Uses block index

    Example:

    White pages phone directory

  • 8/12/2019 db-5

    22/47

    22

    Indexed Non-Sequential

    Records are stored non-sequentially

    Full index is required

    Example

    Books in a library

  • 8/12/2019 db-5

    23/47

    23

    Hashed File Organization

    A hashing algorithm is used todetermine the address of each record

    Hashing algorithm

    Converts a primary key value into arelative record number or file address

    Example: Divide primary key value by a

    prime number and use the remainder asthe storage location

  • 8/12/2019 db-5

    24/47

    24

    Selecting File Organization

    Select a file organization that provides areasonable balance among the following

    criteria:

    Fast access for retrieval

    High throughput for processing transactions

    Efficient use of storage devices

    Protection from failures or data loss

    Minimal need for reorganization Accommodation for file growth

    Security from unauthorized use

  • 8/12/2019 db-5

    25/47

    25

    Constraints in Selecting FileOrganization

    Physical characteristics of secondarystorage devices

    Available operating system

    File management software

    User needs for storing and accessing

    data

  • 8/12/2019 db-5

    26/47

    26

    Indexes

    Stored in main memory for fastersearching of required values

    Types of index

    Primary key

    Non-key

    Clustering

  • 8/12/2019 db-5

    27/47

    27

    Types of Indexes

    Primary key Index created based on the primary key

    Non-key

    Index created for each desired non-keyattribute

    Clustering

    Speeds up retrievals by physicallyordering the file or table based on a non-

    key attribute

  • 8/12/2019 db-5

    28/47

    28

    Clustering Indexes

    Clustering attribute Any non-key attribute used to group

    together rows that have a common value

    for the attribute Clustering index

    Index defined on the clustering attribute

    of a table

  • 8/12/2019 db-5

    29/47

    29

    Clustering Index: An Example

    DESCRIPTION RECORD NO.

    Bookcase 1

    Chair 3,5

    Dresser 2,6,7

    Stand 4

    RECORD NO. PRODUCT NO. DESCRIPTION FINISH PRICE

    1 0100 Bookcase Oak 75

    2 0350 Dresser Maple 625

    3 0975 Chair Cherry 100

    4 1000 Stand Pine 750

    5 1250 Chair Maple 125

    6 1425 Dresser Oak 800

    7 1775 Dresser Pine 1200

    PRODUCTTABLE

    DESCRIPTION INDEX

    (Non-clustered)

  • 8/12/2019 db-5

    30/47

    30

    Clustering Index: An Example

    DESCRIPTION RECORD NO.

    Bookcase 1

    Chair 2

    Dresser 4

    Stand 7

    RECORD NO. PRODUCT NO. DESCRIPTION FINISH PRICE

    1 0100 Bookcase Oak 75

    2 0975 Chair Cherry 100

    3 1250 Chair Maple 125

    4 0350 Dresser Maple 625

    5 1425 Dresser Oak 800

    6 1775 Dresser Pine 1200

    7 1000 Stand Pine 750

    PRODUCTTABLE

    DESCRIPTION INDEX

    (Clustered)

  • 8/12/2019 db-5

    31/47

    31

    Trees

    Most common data structure forimplementing indexes

    Branching factor

    Degree of a tree

    Maximum number of children allowed per

    parent

    Depth Number of levels between the root node

    and a leaf node in a tree

  • 8/12/2019 db-5

    32/47

    32

    Balanced Trees

    Also called B-Trees

    A tree in which all leaves are of the

    same distance from the root

    Index files are most commonlyorganized using B-trees, which have

    predictable efficiency

    Also support sequential retrieval ofrecords

  • 8/12/2019 db-5

    33/47

    33

    Using B-Trees in Indexes

    uses a tree searchAverage time to find desired

    record = depth of the tree

  • 8/12/2019 db-5

    34/47

    34

    Main Trade-Off of Using an Index

    Improved performance for retrievalsversus degraded performance for

    inserting, deleting, and updating

    records in a table Examples

    Decision Support Systems (DSS)

    Transaction Processing Systems (TPS)

  • 8/12/2019 db-5

    35/47

    35

    When to Use Indexes

    Specify a unique index for the primarykey attribute of each table

    In most situations, it is also advisable

    to specify an index for foreign keys Specify an index for non-key attributes

    that are referred to in qualification,

    sorting, and grouping commands

  • 8/12/2019 db-5

    36/47

    36

    When to Use Indexes

    Index search fields

    Index only large tables (when there are

    >100 values but not when there are

  • 8/12/2019 db-5

    37/47

  • 8/12/2019 db-5

    38/47

    38

    Referential Integrity

    Considers the validity of referencesbetween objects in a database

    The value of a foreign key in one table

    (referencing table) must be an actualvalue of a primary key in some other

    table (referenced table), or else it must

    be null, if allowed

  • 8/12/2019 db-5

    39/47

    39

    Referential Integrity Rules

    Insertion Rule A row cannot be inserted in the

    referencing table unless a matching entry

    already exists in the referenced table If insertion is allowed even without a

    matching entry in the referenced table, a

    null value is used for the foreign key in

    the referencing table

  • 8/12/2019 db-5

    40/47

    40

    Referential Integrity Rules

    Deletion Rule A row cannot be deleted from the

    referenced table if there are matching

    rows in the referencing table Three applicable rules

    Restrict

    Nullify Cascade

  • 8/12/2019 db-5

    41/47

    41

    Delete Rules

    Restrict Deletion is not allowed

    Nullify

    Foreign key values changed to null in thereferencing table before correspondingrow in the referenced table is deleted

    Cascade

    Affected rows in the referencing table aredeleted first before matching row in thereferenced table is deleted

  • 8/12/2019 db-5

    42/47

    42

    Enforcing Referential Integrity

    Enforcing referential integrity inapplication programs

    Unreliable -- may be handled differently

    in separate programs and cause conflicts Enforcing referential integrity

    constraints within the DBMS

    Consistent enforcement of rules Makes programming and maintenance

    easier

  • 8/12/2019 db-5

    43/47

    43

    Denormalization

    Database may not always beimplemented in normalized form

    Used to speed up data access

    Reduces number of tables that mustbe accessed to retrieve data

    No hard and fast rules

  • 8/12/2019 db-5

    44/47

    44

    Denormalization

    Situations to consider denormalization One-to-one relationship between two

    entities

    Many-to-many relationship with non-keyattributes

    Reference data

  • 8/12/2019 db-5

    45/47

    45

    Denormalization of One-to-One

    STUDENT

    Student_ID

    NameAddress

    SCHOLARSHIP

    APPLICATIONhas

    Application_ID

    Application_Date Status

    Student_ID

    Denormalized relation:

    STUDENT (Student_ID, Name, Address, Application_Date, Status)

  • 8/12/2019 db-5

    46/47

    46

    Denormalization to Many-to-Many

    VENDOR

    Vendor_ID

    Vendor_Name Address

    submits PRICE

    QUOTE

    Vendor_ID

    given for ITEM

    Item_ID

    Description

    Item_ID

    Price

    Denormalized relations:

    VENDOR (Vendor_ID, Vendor_Name, Address)

    ITEM_QUOTE (Vendor_ID, Item_ID, Description, Price)

  • 8/12/2019 db-5

    47/47

    47

    Denormalization of Reference Data

    STORAGE

    Storage_ID

    Container_No Cabinet_No

    stores ITEM

    Item_ID

    Description

    Denormalized relation:

    STORAGE (Item_ID, Description, Container_No, Cabinet_No)

    Storage_ID