01~lecture notes

Upload: amit-kumar

Post on 14-Apr-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/27/2019 01~Lecture Notes

    1/66

    Database Modeling and Design: Logical Design

    4th Edition

    Toby Teorey, Sam Lightstone, Tom Nadeau

    Lecture Notes

    Contents

    I. Introduction ...................................................................2Relational database life cycle 3Characteristics of a good database design process 6

    II. The Entity-Relationship (ER) Model ....7

    Basic ER concepts 7Ternary relationships 11

    III. The Unified Modeling Language (UML)....13Class diagrams 13Activity diagrams 19Rules of thumb for UML 21

    IV. Requirements Analysis and Conceptual Data Modeling...22Requirements analysis 22

    Conceptual data modeling 24View integration methods 25Entity Clustering 30

    V. Transforming the Conceptual Model to SQL...32

    VI. Normalization and normal forms 38First normal form to third normal form (3NF) and BCNF 383NF synthesis algorithm (Bernstein) 43

    VII. An Example of Logical Database Design48

    VIII. Business Intelligence.......52Data warehousing 52On-line analytical processing (OLAP) 58

    IX. CASE Tools for Logical Database Design.60

    1

  • 7/27/2019 01~Lecture Notes

    2/66

    I. Introduction

    Introductory Concepts

    dataa fact, something upon which an inference is based (information or knowledge has

    value, data has cost)

    data itemsmallest named unit of data that has meaning in the real world (examples:last name, address, ssn, political party)

    data aggregate (or group) -- a collection of related data items that form awhole concept; a simple group is a fixed collection, e.g. date (month, day, year); arepeating group is a variable length collection, e.g. a set of aliases.

    recordgroup of related data items treated as a unit by an application program(examples: presidents, elections, congresses)

    filecollection of records of a single type (examples: president, election)

    databasecollection of interrelated stored data that serves the needs of multiple userswithin one or more organizations; a collection of tables in the relational model.

    database management system (DBMS) -- a generalized software system for storingand manipulating databases. Includes logical view (schema, sub-schema), physical view(access methods, clustering), data manipulation language, data definition language,utilities - security, recovery, integrity, etc.

    database administrator (DBA) -- person or group responsible for the effective use ofdatabase technology in an organization or enterprise.

    Objectives of Database Management

    1. Data availabilitymake an integrated collection of data available to a wide variety ofusers

    * at reasonable costperformance in query update, eliminate or control dataredundancy* in meaningful formatdata definition language, data dictionary* easy accessquery language (4GL, SQL, forms, windows, menus);

    embedded SQL, etc.; utilities for editing, report generation, sorting

    2. Data integrityinsure correctness and validity* checkpoint/restart/recovery* concurrency control and multi-user updates* accounting, audit trail (financial, legal)

    3. Privacy (the goal) and security (the means)* schema/sub-schema, passwords

    2

  • 7/27/2019 01~Lecture Notes

    3/66

    4. Management controlDBA: lifecycle control, training, maintenance

    5. Data independence (a relative term) -- avoids reprogramming of applications, allowseasier conversion and reorganization

    * physical data independenceprogram unaffected by changes in the storage structure oraccess methods

    * logical data independenceprogram unaffected by changes in the schema

    * Social Security Administration example- changed benefit checks from $999.99 to $9999.99 format- had to change 600 application programs- 20,000 work hours needed to make the changes (10 work years)

    *Y2K (year 2000) problemmany systems store 2-digit years (e.g. 02-OCT-98) in

    their programs and databases, that give incorrect results when used in date arithmetic(especially subtraction), so that 00 is still interpreted as 1900 rather than 2000. Fixingthis problem requires many hours of reprogramming and database alterations for manycompanies and government agencies.

    Relational Database Lifecycle

    1. Requirements analysis* natural data relationships (process-independent)* usage requirements (process-dependent)* hardware/software platform (OS, DBMS)* performance and integrity constraints* result: requirements specification document, data dictionary entries

    2. Logical database design2.1 Conceptual data modeling2.2 View integration2.3 Transformation of the conceptual model to SQL tables2.4 Normalization of SQL tables*result: global database schema, transformed to table definitions

    3. Physical database design

    * index selection* materialized views, clustering, partitioning, denormalization*data distribution over the network

    4. Database implementation, monitoring, and modification

    3

  • 7/27/2019 01~Lecture Notes

    4/66

    4

  • 7/27/2019 01~Lecture Notes

    5/66

    5

  • 7/27/2019 01~Lecture Notes

    6/66

    6

    Figure 1.2

  • 7/27/2019 01~Lecture Notes

    7/66

    Characteristics of a Good Database Design Process

    * Iterative requirements analysis- interview top-down- use simple models for data flow and data relationships

    - verify model

    * Stepwise refinement and iterative re-design

    * Well-defined design review team and process-database designers-DBMS software group-end users in the application areas when to review

    * When to do a design review- after requirements analysis & conceptual design- after physical design

    - after implementation (tuning) meeting format

    *Design review rules- short documentation in advance- formal presentation- criticize product, not person- goal is to locate problems, do solutions off line

    7

  • 7/27/2019 01~Lecture Notes

    8/66

    II. Entity-Relationship (ER) Model

    Basic ER Modeling Concepts

    Entity - a class of real world objects having common characteristics and properties aboutwhich we wish to record information.

    Relationship - an association among two or more entities

    * occurrence - instance of a relationship is the collective instances of the related entities* degree - number of entities associated in the relationship (binary, ternary, other n-ary)* connectivity - one-to-one, one-to-many, many-to-many* existence dependency (constraint) - optional/mandatory

    Attribute - a characteristic of an entity or relationship

    * Identifier - uniquely determines an instance of an entity* Identity dependence - when a portion of an identifier is inherited from another entity* Multi-valued - same attribute having many values for one entity* Surrogate - system created and controlled unique key (e.g. Oracles create sequence)

    8

  • 7/27/2019 01~Lecture Notes

    9/66

    9

    Figure 2.1

  • 7/27/2019 01~Lecture Notes

    10/66

    Figure 2.2

    10

  • 7/27/2019 01~Lecture Notes

    11/66

    Figure 2.3

    11

  • 7/27/2019 01~Lecture Notes

    12/66

    12

    Super-class (super-type)/subclass (subtype) relationship

    Generalization

    * similarities are generalized to a super-class entity, differences are specialized to a subclass entity,

    called an ISA relationship (specialization is the inverse relationship)

    * disjointness constraint - there is no overlap among subclasses

    * completeness constraint - constrains subclasses to be all-inclusive of the super-class or not (i.e.total or partial coverage of the superclass)

    * special property: hierarchical in nature

    * special property: inheritance - subclass inherits the primary key of the super-class, super-class hascommon nonkey attributes, each subclass has specialized non-key attributes

    Aggregation

    * part-of relationship among entities to a higher type aggregate entity (contains is the inverserelationship)

    * attributes within an entity, data aggregate (mo-day-year)

    * entity clustering variation: membership or is-member-of relationship

  • 7/27/2019 01~Lecture Notes

    13/66

    Figure 2.4

    Constraints in ER modeling

    * exclusion constraint - restricts an entity to be related to only of several other

    * entities at a given point in time

    - mandatory/optional- specifies lower bound of connectivity of entity instances- participating in a relationship as 1 or 0

    * uniqueness constraint one-to-one functional dependency among key attributesin a relationship: binary, ternary, or higher n-ary

    13

  • 7/27/2019 01~Lecture Notes

    14/66

    Ternary Relationships

    Figure 2.6

    III. The Unified Modeling Language (UML)

    14

  • 7/27/2019 01~Lecture Notes

    15/66

    Class Diagrams

    Figure 3.1 Basic UML class diagram constructs

    Figure 3.2 Selected UML relationship types (parallel to Figure 2.2)

    15

    Employe

    emanaged

    manager

    1

    *

    Department Division1*

    Skill Projectskillused

    *Employee

    assignment* *

    Department Employe

    e

    manager

    11

    *

    Department Employe

    e1

    Department Employe

    e

    manager

    10..1

    Office Employe

    e

    occupant

    0..*1

    Employe

    e

    Project**

    WorkAssignment

    task-assignmentstart-date

    reflexiveassociation

    binaryassociation

    ternaryassociation

    Degree

    Multiplicitiesone-to-one

    one-to-

    many

    many-to-many

    optional

    mandatory

    Existence

  • 7/27/2019 01~Lecture Notes

    16/66

    Figure 3.3 UML generalization constructs (parallel to Figure 2.4)

    Figure 3.4 UML aggregation constructs (parallel to Figure 2.5)

    16

    Manager

    Secretary

    Employee

    Engineer

    Technician

    Individual

    Employee Customer

    EmpCust

    Completeenumeration of

    sub-classes

    Program

    ElectronicDocumentation

    SoftwareProduct

    Teacher TextBook

    Course

  • 7/27/2019 01~Lecture Notes

    17/66

    Figure 3.5 UML n-ary relationship (parallel to Figure 2.7)

    Figure 3.6 UML constructs illustrating primary keys

    Example from the Music Industry

    17

    Car

    pk vinmileage

    color

    Primary key as astereotype

    Compositionexample withprimary keys

    Invoice

    pkinv_numcustomer_idinv_date

    LineItem

    pkinv_numpkline_numdescriptionamount

    1 .. *

  • 7/27/2019 01~Lecture Notes

    18/66

    Figure 3.7 Example of related packages

    Figure 3.8 Example illustrating classes grouped into packages

    18

    Music Media Distributio

    n

    Distribution

    StudioPublisherRetailStore

    Medi

    aMusicMediaAlbumCDTrack

    Musi

    cGroupArtistComposerLyricistMusicianInstrumentSongRendition

  • 7/27/2019 01~Lecture Notes

    19/66

    Figure 3.9 Relationships between classes in the music package

    19

    StudioMusic

    Media

    Rendition

    Producer

    Album CD

    Track

    Publishe

    rGrou

    p

    Artist

  • 7/27/2019 01~Lecture Notes

    20/66

    Figure 3.10 Classes of the media package, and related classes

    20

  • 7/27/2019 01~Lecture Notes

    21/66

    Activity Diagrams

    Figure 3.11 UML activity diagram constructs

    21

  • 7/27/2019 01~Lecture Notes

    22/66

    22

    Customer Manufacturer

    Generatequote

    Requestquote

    [acceptable]

    Review quote

    [unacceptable]

    Placeorder

    Enterorder

    Produce

    order

    Ship order

    Receiveorder

    Generateinvoice

    Receiveinvoice

    Pay Recordpayment

  • 7/27/2019 01~Lecture Notes

    23/66

    Figure 3.12 UML activity diagram, manufacturing example

    23

  • 7/27/2019 01~Lecture Notes

    24/66

    Rules of Thumb for UML Usage

    1. Decide what you wish to communicate first, and then focus yourdescription. Illustrate the details that further your purpose, and elide the rest..Be concise.

    2. Keep each UML diagram to one page. Diagrams are easier to understand ifthey can be seen in one shot.

    3. Use the UML when it is useful. Don't feel compelled to write a UMLdocument just because you feel you need a UML document.

    4. Accompany your diagrams with textual descriptions..

    5. Take care to clearly organize each diagram. Avoid crossing associations. Groupelements together if there is a connection in your mind.

    24

  • 7/27/2019 01~Lecture Notes

    25/66

    IV. Requirements Analysis and Conceptual Data Modeling

    Requirements Analysis

    Purpose - identify the real-world situation in enough detailto be able to define database components. Collect two types of data: natural data (input tothe database) and processing data (output from the database).

    Natural data requirements (what goes into the database)

    1. Organizational objectives- sell more cars this year- move into to recreational vehicle market

    2. Information system objectives

    - keep track of competitors products and prices- improve quality and timing of data to management regarding production schedule delays,etc.- keep track of vital resources needed to produce and market a product

    3. Organizational structure/chart

    4. Administrative and operational policies- annual review of employees- weekly progress reports- monthly inventory check- trip expense submission

    5. Data elements, relationships, constraints, computing environment

    Processing requirements (what comes out of the database)

    1. Existing applications - manual, computerized

    2. Perceived new applications

    * quantifies how data is used by applications

    * should be a subset of data identified in the natural relationships(but may not be due to unforeseen applications)

    * problem - many future applications may be unknown

    25

  • 7/27/2019 01~Lecture Notes

    26/66

    Data and Process Dictionary Entries for Requirements Analysis

    in the Database Design Lifecycle

    Entity Description (possibly in a data dictionary)Name customerReference-no 4201

    Cardinality 10,000Growth rate 100 per monthSynonyms user, buyerRole (or description) someone who purchases or rents a

    product made by the company.Security level 0 (customer list is public)Subtypes adults, minorsKey attribute(s) cust-noNon-key attribute(s) cust-name, addr, phone, payment-status Relationship toother entities salesperson, order, productUsed in which applications billing, advertising

    Attribute description (data elements in a data dictionary)Name cust-noReference-no 4202Range of legal values 1 to 999,999Synonyms cno, customer-numberData type integerDescription customer id number set by the company.Key or nonkey keySource of data table of allowable id numbersUsed in applications billingAttribute trigger /*describes actions that occur when a

    data element is queried or updated*/

    Relationship descriptionName purchaseReference-no 511037Degree binaryEntities and connectivity customer(0,n), product(1,n)Synonyms buyAttributes (of the relationship) quantity, order-noAssertions a customer must have purchased at

    least one product, but some productsmay not have been purchased as yet by

    any customers.

    Process (application) descriptionName payrollReference-no 163Frequency bi-weeklyPriority 10Deadline noon FridaysData elements used emp-name, emp-salary

    26

  • 7/27/2019 01~Lecture Notes

    27/66

    Entities used employeeData volume (how many entities) implicit from entity cardinality

    Interviews at different levels

    Top management - business definition, plan/objectives, future plans

    Middle management - functions in operational areas, technical areas, job-titles, job functionsEmployees - individual tasks, data needed, data outSpecific end-users of a DBMS - applications and data of interest

    Basic rules in interviewing

    1. Investigate the business first2. Agree with the interviewee on format for documentation (ERD, DFD, etc.)3. Define human tasks and known computer applications4. Develop and verify the flow diagram(s) and ER diagram(s)5. Relate applications to data (this helps your programmers)

    Example: order entry clerk

    Function: Take customer orders and either fill them or make adjustments.Frequency: daily

    Task Def Volume Data Elements

    1. Create order 2000 A, B, E, H2. Validate order 2000 A, B, G, H, J3. Fill out error form 25 A, C4. Reserve item/price 6000 A, D, H

    5. Request alternate items 75 A, E, I, K,M6. Enter unit price 5925 A, F, J, N

    Conceptual Data Modeling

    1. Classify entities and attributes* entities should contain descriptive information* multivalued attributes should be classified as entities* attributes should be attached to the entities they most directly describe

    2. Identify the generalization hierarchies

    3. Define relationships binary, unary, ternary

    27

  • 7/27/2019 01~Lecture Notes

    28/66

    View Integration Methods

    Goal in schema integration- to create a non-redundant unified (global) conceptual schema

    (1) completeness - all components must appear in the global schema

    (2) minimality - remove redundant concepts in the global schema

    (3) understandability - does global schema make sense?

    1. Comparing of schemas

    * look for correspondence (identity) among entities

    * detect possible conflicts- naming conflictshomonyms - same name for different conceptssynonyms - different names for the same concept- structural conflictstype conflicts - different modeling construct for the same concept (e. g. order as an entity,attribute, relationship)- dependency conflicts - connectivity is different for different views (e.g. job-title vs. job-title-history)- key conflicts - same concept but different keys are assigned (e.g. ID-no vs. SSN)- behavioral conflicts - different integrity constraints (e.g. null rules for optional/mandatory:insert/delete rules)

    * determine inter-schema properties- possible new relationships to combine schemas- possible abstractions on existing entities or create new super-classes (super-types)

    2. Conforming of schemas

    * resolve conflicts (often user interaction is required)

    * conform or align schemas to make compatible for integration

    * transform the schema via

    - renaming (homonyms, synonyms, key conflicts)- type transformations (type or dependency conflicts)- modify assertions (behavioral conflicts)

    28

  • 7/27/2019 01~Lecture Notes

    29/66

    3. Merging and restructuring

    * superimpose entities

    * restructure result of superimposition

    29

  • 7/27/2019 01~Lecture Notes

    30/66

    Figure 4.5

    30

  • 7/27/2019 01~Lecture Notes

    31/66

    Figure 4.6

    31

  • 7/27/2019 01~Lecture Notes

    32/66

    Figure 4.7

    32

  • 7/27/2019 01~Lecture Notes

    33/66

    Entity Clustering for ER Models

    Motivation

    * conceptual (ER) models are difficult to read and understand for large and complex databases, e.g.

    10,000 or more data elements

    * there is a need for a tool to abstract the conceptual database schema (e. g. clustering of the ERdiagram)

    * potential applications

    - end user communication

    - application design team communication

    - documentation of the database conceptual schema (in coordination with the data dictionary)

    Clustering Methodology

    Given an extended ER diagram for a database.....

    Step 1. Define points of grouping within functional areas.

    Step 2. Form entity clusters* group entities within the same functional area* resolve conflicts by combining at a higher functional grouping

    Step 3. Form higher entity clusters.

    Step 4. Validate the cluster diagram.* check for consistency of interfaces.* end-users must concur with each level.

    33

  • 7/27/2019 01~Lecture Notes

    34/66

    Figure 4.8

    34

  • 7/27/2019 01~Lecture Notes

    35/66

    V. Transforming the Conceptual Data Model to SQL

    Tables

    * Entity directly to a SQL table

    * Many-to-many binary relationship directly to a SQL table, taking the 2 primarykeys in the 2 entities associated with this relationship as foreign keys in the new table

    * One-to-many binary relationship primary key on one side entity copied as aforeign key in the many side entitys table

    * Recursive binary relationship same rules as other binary relationships

    * Ternary relationship directly to a SQL table, taking the 3 primary keys of the 3entities associated with this relationship as foreign keys in the new table

    *Attribute of an entity directly to be an attribute of the table transformed from thisentity

    * Generalization super-class (super-type) entity directly to a SQL table

    * Generalization subclass (subtype) entity directly to a SQL table, but with theprimary key of its super-class (super-type) propagated down as a foreign key into its table

    35

    Figure 4.9

  • 7/27/2019 01~Lecture Notes

    36/66

    * Mandatory constraint (1 lower bound) on the one side of a one-to-many

    relationship the foreign key in the many side table associated with the primary keyin the one side table should be set as not null (when the lower bound is 0, nulls areallowed as the default in SQL)

    36

  • 7/27/2019 01~Lecture Notes

    37/66

    37

  • 7/27/2019 01~Lecture Notes

    38/66

    Figure 5.1

    38

  • 7/27/2019 01~Lecture Notes

    39/66

    Figure 5.3

    39

  • 7/27/2019 01~Lecture Notes

    40/66

    Figure 5.5

    40

  • 7/27/2019 01~Lecture Notes

    41/66

    Figure 5.7

    41

  • 7/27/2019 01~Lecture Notes

    42/66

    VI.Normalization

    First normal form (1NF) to third normal form (3NF) and BCNF

    Goals of normalization1. Integrity2. Maintainability

    Side effects of normalization* Reduced storage space required (usually, but it could increase)* Simpler queries (sometimes, but some could be more complex)* Simpler updates (sometimes, but some could be more complex)

    First normal form (1NF) -- a table R is in 1NF iff all underlyingdomains contain only atomic values, i.e. there are no repeating groups in

    a row.

    functional dependencygiven a table R, a set of attributes B is functionally dependent onanother set of attributes A if at each instant of time each A value is associated with only one Bvalue. This is denoted by A -> B. A trivial FD is of the form XY --> X (subset).

    super-key -- a set of one or more attributes, which, when taken collectively, allows us to identifyuniquely an entity or table.

    candidate keyany subset of the attributes of a super-key that is also a super-key, but notreducible.

    primary key -- arbitrarily selected from the set of candidate keys, as needed for indexing.

    Third normal form (3NF)A table is in 3NF if, for every nontrivial FD X --> A, either:

    (1) attribute X is a super-key, or(2) attribute A is a member of a candidate key (prime attribute)

    Boyce-Codd normal form (BCNF)A table is in BCNF if, for every nontrivial FD X --> A,

    (1) attribute X is a super-key.

    42

  • 7/27/2019 01~Lecture Notes

    43/66

    First Normal Form (single table)

    TABLE SUPPLIER_PART (100k rows, 73 bytes/row => 7.3 MB)SNUM SNAME STATUS CITY PNUM PNAME WT QTY SHIPDATES1 SMITH 20 LONDON P1 NUT 12 3 1-4-90S1 SMITH 20 LONDON P2 BOLT 22 2 2-17-90S1 SMITH 20 LONDON P3 WRENCH 27 6 11-5-89S1 SMITH 20 LONDON P4 WRENCH 24 2 6-30-91S1 SMITH 20 LONDON P5 CLAMP 22 1 8-12-91S1 SMITH 20 LONDON P6 LEVEL 19 5 4-21-91S2 JONES 10 PARIS P1 NUT 12 3 5-3-90S2 JONES 10 PARIS P2 BOLT 22 4 12-31-90S3 BLAKE 10 PARIS P3 WRENCH 27 4 3-25-91S3 BLAKE 10 PARIS P5 CLAMP 22 2 3-27-91S4 CLARK 20 LONDON P2 BOLT 22 2 10-31-89

    S4 CLARK 20 LONDON P4 WRENCH 24 3 7-14-90S4 CLARK 20 LONDON P5 CLAMP 22 7 8-20-90S5 ADAMS 30 ATHENS P5 CLAMP 22 5 8-11-91

    Functional dependenciesSNUM --> SNAME, STATUS,CITYCITY --> STATUSPNUM --> PNAME, WTSNUM,PNUM,SHIPDATE --> QTY

    Attribute sizes (bytes)

    SNUM 5 PNAME 10SNAME 20 WT 5STATUS 2 QTY 5CITY 10 SHIPDATE 8PNUM 8 Total size 73

    Third Normal Form

    TABLE PART (100 rows, 23 bytes/row => 2.3 KB)PNUM PNAME WT Functional dependenciesP1 NUT 12 PNUM --> PNAME, WTP2 BOLT 17

    P3 WRENCH 17P4 WRENCH 24P5 CLAMP 12P6 LEVEL 19

    TABLE SHIPMENT (100k rows, 26 bytes/row => 2.6 MB)SNUM PNUM QTY SHIPDATE Functional dependency

    43

  • 7/27/2019 01~Lecture Notes

    44/66

    S1 P1 3 1-4-90 SNUM, PNUM, SHIPDATE--> QTYS1 P2 2 2-17-90S1 P3 6 11-5-89S1 P4 2 6-30-90S1 P5 1 8-12-91S1 P6 5 4-21-91S2 P1 3 5-3-90S2 P2 4 12-31-90S3 P3 4 3-25-91S3 P5 2 3-27-91S4 P2 2 10-31-89S4 P4 3 7-14-90S4 P5 7 8-20-90S5 P5 5 8-11-91

    NOT Third Normal Form

    TABLE SUPPLIER (200 rows, 37 bytes/row => 7.4 KB)SNUM SNAME STATUS CITY Functional dependenciesS1 SMITH 20 LONDON SNUM --> SNAME, STATUS, CITYS2 JONES 10 PARIS CITY --> STATUSS3 BLAKE 10 PARISS4 CLARK 20 LONDONS5 ADAMS 30 ATHENSDecomposition of Table Supplier into two Third Normal Form (3NF) Tables

    44

  • 7/27/2019 01~Lecture Notes

    45/66

    Third Normal Form

    TABLE SUPPLIER_W/O_STATUS (200 rows, 35 bytes/row => 7 KB)SNUM SNAME CITY Functional dependencyS1 SMITH LONDON SNUM --> SNAME, CITY

    S2 JONES PARISS3 BLAKE PARISS4 CLARK LONDONS5 ADAMS ATHENS

    TABLE CITY_AND_STATUS (100 rows, 12 bytes/row => 1.2 KB)CITY STATUS Functional dependencyLONDON 20 CITY --> STATUSPARIS 10ATHENS 30

    In general, the FDs can be derived from1. Explicit assertions given2. ER diagram (implied by ER constructs)3. Intuition (your experience with the problem data)

    45

  • 7/27/2019 01~Lecture Notes

    46/66

    Functional Dependency Inference rules

    (Armstrongs Axioms)

    1. ReflexivityIf Y is a subset of the attributes of X, then X->Y.

    X = ABCD, Y = ABC => X->YX->X trivial case

    2. AugmentationIf X->Y and Z is a subset of table R (i.e. Z is any set of attributes in R), then XZ -> YZ .

    3. TransitivityIf X->Y and Y->Z then X->Z.

    4. Pseudo-transitivityIf X->Y and YW->Z then XW->Z.(transitivity is a special case of pseudo-transitivity when W is null)

    5. UnionIf X->Y and X->Z then X->YZ.

    6. DecompositionIf X->YZ then X->Y and X->Z.

    Superkey Rule 1.Any FD involving all attributes of a table definesa super-key on the LHS of the FD.

    Given: any FD containing all attributes in the table R(W,X,Y,Z), i.e. XY -> WZ.Proof:(1) XY -> WZ given(2) XY -> XY by the reflexivity axiom(3) XY -> XYWZ by the union axiom(4) XY uniquely determines every attribute in table R, as shown in (3)(5) XY uniquely defines table R, by the definition of a table as having no duplicate rows(6) XY is therefore a super-key, by the definition of a super-key.

    Super-key Rule 2.Any attribute that functionally determines aSuper-key of a table, is also a super-key for that table.

    Given: Attribute A is a super-key for table R(A,B,C,D,E), and E -> A.Proof:(1) Attribute A uniquely defines each row in table R, by the def. of a super-key(2) A -> ABCDE by the definition of a super-key and a relational table(3) E -> A given(4) E -> ABCDE by the transitivity axiom(5) E is a super-key for table R, by the definition of a super-key.

    46

  • 7/27/2019 01~Lecture Notes

    47/66

    3NF Synthesis Algorithm (Bernstein)

    Basic definitions

    g e H set of FDs

    H+ closure of H - set of all FDs derivable from H using all the FD inference rules

    H cover of H - any set of FDs from which every FD in H+ canbe derived

    H(non-redundant) non-redundant cover of H, i.e. a cover which contains no proper subset

    which is also a cover. Can be determined with quadratic complexity O(n2).

    ExampleGiven a set of FDs H, determine a minimal set of tables in 3NF,while preserving all FDs and maintaining only lossless decomposition/joins.

    H: AB->C DM->NP D->KLA->DEFG D->ME->G L->DF->DJ PR->SG->DI PQR->ST

    Step 1: Eliminate any extraneous attributes in the left handsides of the FDs. We want to reduce the left hand sides of as many FDs as possible.In general: XY->Z and X->Z => Y is extraneous (Reduction Rule 1)

    XYZ->W and X->Y => Y is extraneous (Reduction Rule 2)For this example we mix left side reduction with the union and decomposition axioms:

    DM->NP => D->NP => D -> MNPD->M D->M

    PQR->ST => PQR->S, PQR->T => PQR->.TPR->S PR->S PR->S

    Step 2: Find a non-redundant cover H of H, i.e. eliminate any FDderivable from others in H using the inference rules (most frequently the transitivity axiom).

    A->E->G => eliminate A->G from the coverA->F->D => eliminate A->D from the cover

    Step 3: Partition H into tables such that all FDs with the

    same left side are in one table, thus eliminating any non-fully functional FDs. (Note: creatingtables at this point would be a feasible solution for 3NF, but not necessarily minimal.)

    R1: AB->C R4: G->DI R7: L->DR2: A->EF R5: F->DJ R8: PQR->TR3: E->G R6: D->KLMNPR9: PR->S

    Step 4: Merge equivalent keys, i.e. merge tables where all FDs satisfy 3NF.

    47

  • 7/27/2019 01~Lecture Notes

    48/66

    4.1 Write out the closure of all LHS attributes resulting from Step 3, based on transitivities.

    4.2 Using the closures, find tables that are subsets of other groups and try to merge them. Use Rule1 and Rule 2 to establish if the merge will result in FDs with super-keys on the LHS. If not, tryusing the axioms to modify the FDs to fit the definition of super-keys.

    4.3 After the subsets are exhausted, look for any overlaps among tables and apply Rules 1 and 2(and the axioms) again.

    In this example, note that R7 (L->D) has a subset of the attributes of R6 (D->KLMNP). Thereforewe merge to a single table with FDs D->KLMNP, L->D because it satisfies 3NF: D is a super-key by Rule 1 and L is a super-key by Rule 2.

    Final 3NF (and BCNF) table attributes, FDs, and candidate keys:R1: ABC (AB->C with key AB) R5: DFJ (F->DJ with key F)R2: AEF (A->EF with key A) R6: DKLMNP (D->KLMNP, L->D, w/keys D, L)

    R3: EG (E->G with key E) R7: PQRT (PQR->T with key PQR)R4: DGI (G->DI with key G) R8: PRS (PR->S with key PR)

    Step 4a.Check to see whether all tables are also BCNF. For any table that is not BCNF, add theappropriate partially redundant table to eliminate the delete anomaly.

    48

  • 7/27/2019 01~Lecture Notes

    49/66

    Maiers Example using 3NF Synthesis

    [Maier,D. The Theory of Relational Databases, Computer Science Press, 1983]

    R = {A,B,C,D,E,F,G,H,I,J,K }

    Functional dependencies (FDs):(1) E --> A B C D F G H I J K (7) H I --> J(2) A B C --> E D F G H I J K (8) I J --> H(3) A B D --> E C F G H I J K (9) H J --> I(4) G --> H I J(5) C F --> K(6) D F --> K

    Step 1 - No reduction of determinants necessary.Step 2 - Find non-redundant cover.

    (4) G->HIJ => eliminate HIJ from (1), (2), and (3)(7) HI->J => reduce (4) to G->HI, eliminating J from (4)(5) CF -> K => eliminate K from (1) and (3)(6) DF->K => eliminate K from (2)(1) E->DFG => eliminate DFG from (2)(1) E->CFG => eliminate CFG from (3)

    Step 3 - Partition into groups with the same left side.G1: E->ABCDFG G6: DF->KG2: ABC->E G7: HI->JG3: ABD->E G8: IJ->HG4: G->HI G9: HJ->IG5: CF->K

    Step 4 - Merge equivalent keys, forming new groups. Construct final set of tables, attributes, FDs, ancandidate keys.

    R1: ABCDEFG ( E->ABCDFG, ABC->E, ABD->E with keys E, ABC, ABD)R2: GHI (G->HI with key G)R3: CFK (CF->K with key CF)R4: DFK (DF->K with key DFR5: HIJ (HI->J, IJ->H, HJ->I with keys HI, IJ, HJ)

    49

  • 7/27/2019 01~Lecture Notes

    50/66

    Example of a 3NF table that is not BCNF,

    i.e. it has further anomalies:

    S = student, C = course, I = instructor

    SC -> I For each course, each student is taught by only one instructor. A course may be taught by mthan one instructor.

    I -> C Each instructor teaches only one course.

    This table is 3NF with a candidate key SC:SCI student course instructor

    Sutton Math Von NeumannSutton Journalism MurrowNiven Math Von NeumannNiven Physics Fermi

    Wilson Physics Einstein

    Delete anomaly: If Sutton drops Journalism, then we have no record of Murrow teaching JournalismHow can we decompose this table into BCNF?

    Decomposition 1 (bad)........eliminates the delete anomalySC (no FDs) and I -> C (two tables)Problems - 1. lossy join

    2. dependency SC -> I is not preserved

    SC student course IC instructor courseSutton Math Von Neumann Math

    Sutton Journalism Murrow JournalismNiven Math Fermi PhysicsNiven Physics Einstein PhysicsWilson Physics

    ----------------join SC and IC ------------------SCI student course instructor

    Sutton Math Von NeumannSutton Journalism MurrowNiven Math Von NeumannNiven Physics Fermi

    Niven Physics Einstein (spurious row)Wilson Physics Fermi (spurious row)Wilson Physics Einstein

    50

  • 7/27/2019 01~Lecture Notes

    51/66

    Decomposition 2 (better).....eliminates the delete anomalySI (no FD) and I -> CAdvantages eliminates the delete anomaly, losslessDisadvantage - dependency SC -> I is not preserved

    SI student instructor IC instructor courseSutton Von Neumann Von Neumann MathSutton Murrow Murrow JournalismNiven Von Neumann Fermi PhysicsNiven Fermi Einstein PhysicsWilson Einstein Dantzig Math (new)Sutton Dantzig (new)

    The new row is allowed in SI using unique(student,instructor) in the create table command, and thejoin of SI and IC is lossless. However, a join of SI and IC now produces the following two rows:

    student course instructorSutton Math Von Neumann

    Sutton Math Dantzig which violates the FD SC -> I.

    Oracle, for instance, has no way to automatically check SC->I, although you could write a proceduredo this at the expense of a lot of overhead.

    Decomposition 3 (tradeoff between integrity and performance)SC -> I and I -> C (two tables with redundant data)Problems -extra updates and storage cost

    51

  • 7/27/2019 01~Lecture Notes

    52/66

    VII. An Example of Logical Database Design

    Requirements Specification

    The management of a large retail store would like a database to keep track of salesactivities. The requirements analysis for this database led to the following six entities andtheir unique identifiers:

    Entity Entity key Key length(max) No. ofin characters occurrences

    Customer cust-no 6 80,000Job job-no 24 80Order order-no 9 200,000Salesperson sales-id 20 150

    Department dept-no 2 10Item item-no 6 5,000

    The following assertions describe the data relationships:

    Each customer has one job-title, but different customers may have the same job-title. Each customer may place many orders, but only one customer may place a particularorder. Each department has many salespeople, but each salesperson must work in only onedepartment. Each department has many items for sale, but each item is sold in only one

    department. (Item means item type, like IBM PC). For each order, items ordered in different departments must involve different

    salespeople, but all items ordered within one department must be handled byexactly one salesperson. In other words, for each order, each item has exactly onesalesperson; and for each order, each department has exactly one salesperson.

    ER Construct FDs

    Customer(many):Job(one) cust-no -> job-titleOrder(many): Customer(one) order-no -> cust-noSalesperson(many): Department(one) sales-id -> dept-noItem(many): Department(one) item-no -> dept-noOrder(many): Item(many): order-no,item-no->sales-id

    Salesperson(one)Order(many): Department(many): order-no,dept-no-> sales-id

    Salesperson(one)

    52

  • 7/27/2019 01~Lecture Notes

    53/66

    Figure 7.1

    create table customer(cust_no char(6),job_title varchar(256),

    primary key (cust_no),foreign key (job_title) referencesjobon delete set null on update cascade);

    create tablejob(job_no char(6),job_title varchar(256),primary key (job_no));

    create table order(order_no char(9),cust_no char(6) not null,primary key (order_no),foreign key (cust_no) references customer

    on delete set null on update cascade);

    create table salesperson(sales_id char(10)sales_name varchar(256),dept_no char(2),primary key (sales_id),

    53

  • 7/27/2019 01~Lecture Notes

    54/66

    foreign key (dept_no) references departmenton delete set null on update cascade);

    create table department(dept_no char(2),

    dept_name varchar(256),manager_name varchar(256),primary key (dept_no));

    create table item(item_no char(6),dept_no char(2),primary key (item_no),foreign key (dept_no) references department

    on delete set null on update cascade);

    create table order_item_sales(order_no char(9),item_no char(6),sales_id varchar(256) not null,primary key (order_no, item_no),foreign key (order_no) references order

    on delete cascade on update cascade,foreign key (item_no) references item

    on delete cascade on update cascade,foreign key (sales_id) references salesperson

    on delete cascade on update cascade);

    create table order_dept_sales(order_no char(9),dept_no char(2),sales_id varchar(256) not null,primary key (order_no, dept_no),foreign key (order_no) references order

    on delete cascade on update cascade,foreign key (dept_no) references department

    on delete cascade on update cascade,foreign key (sales_id) references salesperson

    on delete cascade on update cascade);

    Table Primary key Likely non-keys

    customer cust_no job_title, cust_name, cust_addressorder order_no cust_no, item_no, date_of_purchase,pricesalesperson sales_id dept_no, sales_name, phone_noitem item_no dept_no, color, model_no

    54

  • 7/27/2019 01~Lecture Notes

    55/66

    order_item_sales order_no,item_no sales_idorder_dept_sales order_no,dept_no sales_id

    55

  • 7/27/2019 01~Lecture Notes

    56/66

    56

  • 7/27/2019 01~Lecture Notes

    57/66

  • 7/27/2019 01~Lecture Notes

    58/66

  • 7/27/2019 01~Lecture Notes

    59/66

    Figure 8.10 Star schema for the productivity tracking process

    Table 8.2 Data warehouse bus for widget example

    Shape

    Color

    Texture

    Density

    Size

    EstimateDate

    WinDate

    Customer

    Promotion

    CostCenter

    SchedStartDate

    SchedStartTime

    SchedFinishDate

    SchedFinishTime

    ActualStartDate

    ActualStartTime

    ActualFinishDate

    ActualFinishTime

    Employee

    InvoiceDate

    Estimating x x x x x x x x x x

    Scheduling x x x x x x x x x

    Productivity Tracking x x x x x x

    Job Costing x x x x x x x x x

    59

    dd job numberfk cost center idfk employee idfk actual start date id

    fk actual start time idfk actual finish date idfk actual finish time idwidget quantityfinished on timeestimated hoursactual hours

    Productivity Detail

    Employee

    Cost CenterActual Start Date

    Actual Start Time

    Actual Finish Date

    Actual Finish Time

  • 7/27/2019 01~Lecture Notes

    60/66

    Figure 8.11 Star schema for the job costing process

    Figure 8.12 Schema for the job costing daily snapshot

    60

    fk shape idfk color idfk texture idfk density idfk size id

    fk estimate date iddd estimate numberfk win date iddd job numberfk customer idfk promotion idfk cost center idfk invoice date idwidget quantityestimated hourshourly rateestimated costmarkupdiscountpriceactual hoursactual cost

    Job Costing DetailShape Color

    Texture Density

    Size

    Customer

    Promotion Cost Center

    EstimateDate

    Win Date

    InvoiceDate

    fk invoice date idwidget quantityestimated hoursestimated cost

    priceactual hoursactual cost

    Job Costing Daily Snapshot

    InvoiceDate

  • 7/27/2019 01~Lecture Notes

    61/66

    On-Line Analytical Processing (OLAP)

    Figure 8.13 Product graph labeled with aggregation level coordinates

    If we express Equation 8.1 in different terms, the problem of exponential explosionbecomes more apparent. Let g be the geometric mean of the number of hierarchical levelsin the dimensions. Then Equation 8.1 becomes Equation 8.2.

    Possible views = gd (8.2)

    61

    d

    Possible views = h i (8.1)i=1

    Calendar Dimension(first ordinate)

    0: date id1: month2: quarter3: year4: all

    Customer Dimension(second ordinate)

    0: cust id1: city2: state3: all

    (0, 0)

    (1, 0) (0, 1)

    (1, 1) (0, 2)

    (1, 2) (0, 3)

    (1, 3)

    (2, 0)

    (2, 1)

    (2, 2)

    (2, 3)

    (3, 0)

    (3, 1)

    (3, 2)

    (3, 3)

    (4, 0)

    (4, 1)

    (4, 2)

    (4, 3)

    Fact Table

    P=0.9

    0.9 0.9

    0.9 0.9 0.9 0.9

    0.1

    0.1

    0.1

    0.1

    0.1 0.1 0.1

    0.7290.0810.081 0.0810.0090.009 0.0090.001

    a = 2 leftsP

    2= 0.009

    C = 3 bins

    a = 1 leftP

    1= 0.081

    C = 3 bins

    a = 0 leftsP

    0= 0.729

    C = 1 bin32

    31

    30

    a = 3 lefts

    P3 = 0.001

    C = 1 bin33

  • 7/27/2019 01~Lecture Notes

    62/66

    Selection of Materialized Views

    Figure 8.16 Example hypercube lattice structure

    Table 8.3 Two iterations of HRU, based on Figure 8.16

    Table 8.4 First iteration of PGA, based on Figure 8.16

    Table 8.5 Second iteration of PGA, based on Figure 8.16

    62

    5.2M x 4 = 20.8M0 x 4 = 00 x 4 = 0

    5.99M x 2 = 11.98M5.8M x 2 = 11.6M5.9M x 2 = 11.8M

    6M - 1

    0 x 2 = 00 x 2 = 0

    0.79M x 2 = 1.58M0.6M x 2 = 1.2M

    5.9M x 2 = 11.8M

    0.8M - 1

    {p, s}{c, s}{c, p}

    {s}{p}{c}

    {}

    Iteration 1 Benefit Iteration 2 Benefit

    c = Customer

    p = Parts = Supplier

    {c, p, s} 6M

    {p, s} 0.8M {c, s} 6M {c, p} 6M

    {s} 0.01M {p} 0.2M {c} 0.1M

    {} 1

    Fact Table

    Candidates

    {p, s}{s}{}

    Iteration 1 Benefit

    5.2M x 4 = 20.8M5.99M x 2 = 11.98M

    6M - 1

    Candidates

    0 x 2 = 00.79M x 2 = 1.58M5.9M x 2 = 11.8M

    6M - 1

    {c, s}{s}{c}{}

    Iteration 2 Benefit

  • 7/27/2019 01~Lecture Notes

    63/66

    63

  • 7/27/2019 01~Lecture Notes

    64/66

    IX. CASE Tools for Logical Database Design

    Software tools provide significant value to database designers by

    a) Dramatically reduced the complexity of conceptual and logical design, both

    of which can be rather subtle to do well.

    b) Automating the transformation of the logical design to a physical design (at

    least the basic physical design).) to create the physical database.

    c) Providing reporting, round trip engineering and reverse engineering that

    make such tools invaluable in maintaining systems over a long period of

    time.

    Key Capabilities to Watch For

    Complete round trip engineering

    UML design

    Schema evolution, Change management

    Reverse engineering of existing systems

    Team support, allowing multiple people to work on the same project concurrently.

    Integrated with Eclipse and .NET and other tooling products

    Component and convention re-use (being able to reuse naming standard, domain,

    logical models over multiple design projects)

    Reusable assets (extensibility, template, ...)

    Reporting

    Basic Transformations of the Conceptual Model to SQL Tables

    Transform each entity into a table containing the key and nonkey attributesof the entity.

    Transform every many-to-many binary or binary recursive relationship into

    a relationship table with the keys of the entities and the attributes of the

    relationship.

    64

  • 7/27/2019 01~Lecture Notes

    65/66

    Transform every ternary or higher-level n-ary relationship into a relationship

    table.

    Similarly these tools produce the transformation table types described in Chapter 5:

    An entity table with the same information content as the original entity.

    An entity table with the embedded foreign key of the parent entity.

    A relationship table with the foreign keys of all the entities in the

    relationship.

    65

  • 7/27/2019 01~Lecture Notes

    66/66

    Figure 9.3 Rational Data Architect entity relationship modeling (courtesy IBM)

    Generating a database from a design

    Database support

    Collaborative support

    1. Concurrency control.

    2. Merge and collaboration capabilities that allow designers to combine designs

    or merge their latest changes into a larger design project.

    Distributed development

    Application life cycle tooling integration

    Design compliance checking

    Design and Normalization

    Discover 1st 2nd 3rd normalization

    Index and Storage

    check for excessive indexing

    Naming Standards

    Security Compliance

    Sarbanes-Oxley compliance

    Valid data model and rules

    Model syntax checks

    Reporting

    Modeling a Data Warehouse

    Semi-Structured data, XML

    ##