01~lecture notes
TRANSCRIPT
-
7/27/2019 01~Lecture Notes
1/66
Database Modeling and Design: Logical Design
4th Edition
Toby Teorey, Sam Lightstone, Tom Nadeau
Lecture Notes
Contents
I. Introduction ...................................................................2Relational database life cycle 3Characteristics of a good database design process 6
II. The Entity-Relationship (ER) Model ....7
Basic ER concepts 7Ternary relationships 11
III. The Unified Modeling Language (UML)....13Class diagrams 13Activity diagrams 19Rules of thumb for UML 21
IV. Requirements Analysis and Conceptual Data Modeling...22Requirements analysis 22
Conceptual data modeling 24View integration methods 25Entity Clustering 30
V. Transforming the Conceptual Model to SQL...32
VI. Normalization and normal forms 38First normal form to third normal form (3NF) and BCNF 383NF synthesis algorithm (Bernstein) 43
VII. An Example of Logical Database Design48
VIII. Business Intelligence.......52Data warehousing 52On-line analytical processing (OLAP) 58
IX. CASE Tools for Logical Database Design.60
1
-
7/27/2019 01~Lecture Notes
2/66
I. Introduction
Introductory Concepts
dataa fact, something upon which an inference is based (information or knowledge has
value, data has cost)
data itemsmallest named unit of data that has meaning in the real world (examples:last name, address, ssn, political party)
data aggregate (or group) -- a collection of related data items that form awhole concept; a simple group is a fixed collection, e.g. date (month, day, year); arepeating group is a variable length collection, e.g. a set of aliases.
recordgroup of related data items treated as a unit by an application program(examples: presidents, elections, congresses)
filecollection of records of a single type (examples: president, election)
databasecollection of interrelated stored data that serves the needs of multiple userswithin one or more organizations; a collection of tables in the relational model.
database management system (DBMS) -- a generalized software system for storingand manipulating databases. Includes logical view (schema, sub-schema), physical view(access methods, clustering), data manipulation language, data definition language,utilities - security, recovery, integrity, etc.
database administrator (DBA) -- person or group responsible for the effective use ofdatabase technology in an organization or enterprise.
Objectives of Database Management
1. Data availabilitymake an integrated collection of data available to a wide variety ofusers
* at reasonable costperformance in query update, eliminate or control dataredundancy* in meaningful formatdata definition language, data dictionary* easy accessquery language (4GL, SQL, forms, windows, menus);
embedded SQL, etc.; utilities for editing, report generation, sorting
2. Data integrityinsure correctness and validity* checkpoint/restart/recovery* concurrency control and multi-user updates* accounting, audit trail (financial, legal)
3. Privacy (the goal) and security (the means)* schema/sub-schema, passwords
2
-
7/27/2019 01~Lecture Notes
3/66
4. Management controlDBA: lifecycle control, training, maintenance
5. Data independence (a relative term) -- avoids reprogramming of applications, allowseasier conversion and reorganization
* physical data independenceprogram unaffected by changes in the storage structure oraccess methods
* logical data independenceprogram unaffected by changes in the schema
* Social Security Administration example- changed benefit checks from $999.99 to $9999.99 format- had to change 600 application programs- 20,000 work hours needed to make the changes (10 work years)
*Y2K (year 2000) problemmany systems store 2-digit years (e.g. 02-OCT-98) in
their programs and databases, that give incorrect results when used in date arithmetic(especially subtraction), so that 00 is still interpreted as 1900 rather than 2000. Fixingthis problem requires many hours of reprogramming and database alterations for manycompanies and government agencies.
Relational Database Lifecycle
1. Requirements analysis* natural data relationships (process-independent)* usage requirements (process-dependent)* hardware/software platform (OS, DBMS)* performance and integrity constraints* result: requirements specification document, data dictionary entries
2. Logical database design2.1 Conceptual data modeling2.2 View integration2.3 Transformation of the conceptual model to SQL tables2.4 Normalization of SQL tables*result: global database schema, transformed to table definitions
3. Physical database design
* index selection* materialized views, clustering, partitioning, denormalization*data distribution over the network
4. Database implementation, monitoring, and modification
3
-
7/27/2019 01~Lecture Notes
4/66
4
-
7/27/2019 01~Lecture Notes
5/66
5
-
7/27/2019 01~Lecture Notes
6/66
6
Figure 1.2
-
7/27/2019 01~Lecture Notes
7/66
Characteristics of a Good Database Design Process
* Iterative requirements analysis- interview top-down- use simple models for data flow and data relationships
- verify model
* Stepwise refinement and iterative re-design
* Well-defined design review team and process-database designers-DBMS software group-end users in the application areas when to review
* When to do a design review- after requirements analysis & conceptual design- after physical design
- after implementation (tuning) meeting format
*Design review rules- short documentation in advance- formal presentation- criticize product, not person- goal is to locate problems, do solutions off line
7
-
7/27/2019 01~Lecture Notes
8/66
II. Entity-Relationship (ER) Model
Basic ER Modeling Concepts
Entity - a class of real world objects having common characteristics and properties aboutwhich we wish to record information.
Relationship - an association among two or more entities
* occurrence - instance of a relationship is the collective instances of the related entities* degree - number of entities associated in the relationship (binary, ternary, other n-ary)* connectivity - one-to-one, one-to-many, many-to-many* existence dependency (constraint) - optional/mandatory
Attribute - a characteristic of an entity or relationship
* Identifier - uniquely determines an instance of an entity* Identity dependence - when a portion of an identifier is inherited from another entity* Multi-valued - same attribute having many values for one entity* Surrogate - system created and controlled unique key (e.g. Oracles create sequence)
8
-
7/27/2019 01~Lecture Notes
9/66
9
Figure 2.1
-
7/27/2019 01~Lecture Notes
10/66
Figure 2.2
10
-
7/27/2019 01~Lecture Notes
11/66
Figure 2.3
11
-
7/27/2019 01~Lecture Notes
12/66
12
Super-class (super-type)/subclass (subtype) relationship
Generalization
* similarities are generalized to a super-class entity, differences are specialized to a subclass entity,
called an ISA relationship (specialization is the inverse relationship)
* disjointness constraint - there is no overlap among subclasses
* completeness constraint - constrains subclasses to be all-inclusive of the super-class or not (i.e.total or partial coverage of the superclass)
* special property: hierarchical in nature
* special property: inheritance - subclass inherits the primary key of the super-class, super-class hascommon nonkey attributes, each subclass has specialized non-key attributes
Aggregation
* part-of relationship among entities to a higher type aggregate entity (contains is the inverserelationship)
* attributes within an entity, data aggregate (mo-day-year)
* entity clustering variation: membership or is-member-of relationship
-
7/27/2019 01~Lecture Notes
13/66
Figure 2.4
Constraints in ER modeling
* exclusion constraint - restricts an entity to be related to only of several other
* entities at a given point in time
- mandatory/optional- specifies lower bound of connectivity of entity instances- participating in a relationship as 1 or 0
* uniqueness constraint one-to-one functional dependency among key attributesin a relationship: binary, ternary, or higher n-ary
13
-
7/27/2019 01~Lecture Notes
14/66
Ternary Relationships
Figure 2.6
III. The Unified Modeling Language (UML)
14
-
7/27/2019 01~Lecture Notes
15/66
Class Diagrams
Figure 3.1 Basic UML class diagram constructs
Figure 3.2 Selected UML relationship types (parallel to Figure 2.2)
15
Employe
emanaged
manager
1
*
Department Division1*
Skill Projectskillused
*Employee
assignment* *
Department Employe
e
manager
11
*
Department Employe
e1
Department Employe
e
manager
10..1
Office Employe
e
occupant
0..*1
Employe
e
Project**
WorkAssignment
task-assignmentstart-date
reflexiveassociation
binaryassociation
ternaryassociation
Degree
Multiplicitiesone-to-one
one-to-
many
many-to-many
optional
mandatory
Existence
-
7/27/2019 01~Lecture Notes
16/66
Figure 3.3 UML generalization constructs (parallel to Figure 2.4)
Figure 3.4 UML aggregation constructs (parallel to Figure 2.5)
16
Manager
Secretary
Employee
Engineer
Technician
Individual
Employee Customer
EmpCust
Completeenumeration of
sub-classes
Program
ElectronicDocumentation
SoftwareProduct
Teacher TextBook
Course
-
7/27/2019 01~Lecture Notes
17/66
Figure 3.5 UML n-ary relationship (parallel to Figure 2.7)
Figure 3.6 UML constructs illustrating primary keys
Example from the Music Industry
17
Car
pk vinmileage
color
Primary key as astereotype
Compositionexample withprimary keys
Invoice
pkinv_numcustomer_idinv_date
LineItem
pkinv_numpkline_numdescriptionamount
1 .. *
-
7/27/2019 01~Lecture Notes
18/66
Figure 3.7 Example of related packages
Figure 3.8 Example illustrating classes grouped into packages
18
Music Media Distributio
n
Distribution
StudioPublisherRetailStore
Medi
aMusicMediaAlbumCDTrack
Musi
cGroupArtistComposerLyricistMusicianInstrumentSongRendition
-
7/27/2019 01~Lecture Notes
19/66
Figure 3.9 Relationships between classes in the music package
19
StudioMusic
Media
Rendition
Producer
Album CD
Track
Publishe
rGrou
p
Artist
-
7/27/2019 01~Lecture Notes
20/66
Figure 3.10 Classes of the media package, and related classes
20
-
7/27/2019 01~Lecture Notes
21/66
Activity Diagrams
Figure 3.11 UML activity diagram constructs
21
-
7/27/2019 01~Lecture Notes
22/66
22
Customer Manufacturer
Generatequote
Requestquote
[acceptable]
Review quote
[unacceptable]
Placeorder
Enterorder
Produce
order
Ship order
Receiveorder
Generateinvoice
Receiveinvoice
Pay Recordpayment
-
7/27/2019 01~Lecture Notes
23/66
Figure 3.12 UML activity diagram, manufacturing example
23
-
7/27/2019 01~Lecture Notes
24/66
Rules of Thumb for UML Usage
1. Decide what you wish to communicate first, and then focus yourdescription. Illustrate the details that further your purpose, and elide the rest..Be concise.
2. Keep each UML diagram to one page. Diagrams are easier to understand ifthey can be seen in one shot.
3. Use the UML when it is useful. Don't feel compelled to write a UMLdocument just because you feel you need a UML document.
4. Accompany your diagrams with textual descriptions..
5. Take care to clearly organize each diagram. Avoid crossing associations. Groupelements together if there is a connection in your mind.
24
-
7/27/2019 01~Lecture Notes
25/66
IV. Requirements Analysis and Conceptual Data Modeling
Requirements Analysis
Purpose - identify the real-world situation in enough detailto be able to define database components. Collect two types of data: natural data (input tothe database) and processing data (output from the database).
Natural data requirements (what goes into the database)
1. Organizational objectives- sell more cars this year- move into to recreational vehicle market
2. Information system objectives
- keep track of competitors products and prices- improve quality and timing of data to management regarding production schedule delays,etc.- keep track of vital resources needed to produce and market a product
3. Organizational structure/chart
4. Administrative and operational policies- annual review of employees- weekly progress reports- monthly inventory check- trip expense submission
5. Data elements, relationships, constraints, computing environment
Processing requirements (what comes out of the database)
1. Existing applications - manual, computerized
2. Perceived new applications
* quantifies how data is used by applications
* should be a subset of data identified in the natural relationships(but may not be due to unforeseen applications)
* problem - many future applications may be unknown
25
-
7/27/2019 01~Lecture Notes
26/66
Data and Process Dictionary Entries for Requirements Analysis
in the Database Design Lifecycle
Entity Description (possibly in a data dictionary)Name customerReference-no 4201
Cardinality 10,000Growth rate 100 per monthSynonyms user, buyerRole (or description) someone who purchases or rents a
product made by the company.Security level 0 (customer list is public)Subtypes adults, minorsKey attribute(s) cust-noNon-key attribute(s) cust-name, addr, phone, payment-status Relationship toother entities salesperson, order, productUsed in which applications billing, advertising
Attribute description (data elements in a data dictionary)Name cust-noReference-no 4202Range of legal values 1 to 999,999Synonyms cno, customer-numberData type integerDescription customer id number set by the company.Key or nonkey keySource of data table of allowable id numbersUsed in applications billingAttribute trigger /*describes actions that occur when a
data element is queried or updated*/
Relationship descriptionName purchaseReference-no 511037Degree binaryEntities and connectivity customer(0,n), product(1,n)Synonyms buyAttributes (of the relationship) quantity, order-noAssertions a customer must have purchased at
least one product, but some productsmay not have been purchased as yet by
any customers.
Process (application) descriptionName payrollReference-no 163Frequency bi-weeklyPriority 10Deadline noon FridaysData elements used emp-name, emp-salary
26
-
7/27/2019 01~Lecture Notes
27/66
Entities used employeeData volume (how many entities) implicit from entity cardinality
Interviews at different levels
Top management - business definition, plan/objectives, future plans
Middle management - functions in operational areas, technical areas, job-titles, job functionsEmployees - individual tasks, data needed, data outSpecific end-users of a DBMS - applications and data of interest
Basic rules in interviewing
1. Investigate the business first2. Agree with the interviewee on format for documentation (ERD, DFD, etc.)3. Define human tasks and known computer applications4. Develop and verify the flow diagram(s) and ER diagram(s)5. Relate applications to data (this helps your programmers)
Example: order entry clerk
Function: Take customer orders and either fill them or make adjustments.Frequency: daily
Task Def Volume Data Elements
1. Create order 2000 A, B, E, H2. Validate order 2000 A, B, G, H, J3. Fill out error form 25 A, C4. Reserve item/price 6000 A, D, H
5. Request alternate items 75 A, E, I, K,M6. Enter unit price 5925 A, F, J, N
Conceptual Data Modeling
1. Classify entities and attributes* entities should contain descriptive information* multivalued attributes should be classified as entities* attributes should be attached to the entities they most directly describe
2. Identify the generalization hierarchies
3. Define relationships binary, unary, ternary
27
-
7/27/2019 01~Lecture Notes
28/66
View Integration Methods
Goal in schema integration- to create a non-redundant unified (global) conceptual schema
(1) completeness - all components must appear in the global schema
(2) minimality - remove redundant concepts in the global schema
(3) understandability - does global schema make sense?
1. Comparing of schemas
* look for correspondence (identity) among entities
* detect possible conflicts- naming conflictshomonyms - same name for different conceptssynonyms - different names for the same concept- structural conflictstype conflicts - different modeling construct for the same concept (e. g. order as an entity,attribute, relationship)- dependency conflicts - connectivity is different for different views (e.g. job-title vs. job-title-history)- key conflicts - same concept but different keys are assigned (e.g. ID-no vs. SSN)- behavioral conflicts - different integrity constraints (e.g. null rules for optional/mandatory:insert/delete rules)
* determine inter-schema properties- possible new relationships to combine schemas- possible abstractions on existing entities or create new super-classes (super-types)
2. Conforming of schemas
* resolve conflicts (often user interaction is required)
* conform or align schemas to make compatible for integration
* transform the schema via
- renaming (homonyms, synonyms, key conflicts)- type transformations (type or dependency conflicts)- modify assertions (behavioral conflicts)
28
-
7/27/2019 01~Lecture Notes
29/66
3. Merging and restructuring
* superimpose entities
* restructure result of superimposition
29
-
7/27/2019 01~Lecture Notes
30/66
Figure 4.5
30
-
7/27/2019 01~Lecture Notes
31/66
Figure 4.6
31
-
7/27/2019 01~Lecture Notes
32/66
Figure 4.7
32
-
7/27/2019 01~Lecture Notes
33/66
Entity Clustering for ER Models
Motivation
* conceptual (ER) models are difficult to read and understand for large and complex databases, e.g.
10,000 or more data elements
* there is a need for a tool to abstract the conceptual database schema (e. g. clustering of the ERdiagram)
* potential applications
- end user communication
- application design team communication
- documentation of the database conceptual schema (in coordination with the data dictionary)
Clustering Methodology
Given an extended ER diagram for a database.....
Step 1. Define points of grouping within functional areas.
Step 2. Form entity clusters* group entities within the same functional area* resolve conflicts by combining at a higher functional grouping
Step 3. Form higher entity clusters.
Step 4. Validate the cluster diagram.* check for consistency of interfaces.* end-users must concur with each level.
33
-
7/27/2019 01~Lecture Notes
34/66
Figure 4.8
34
-
7/27/2019 01~Lecture Notes
35/66
V. Transforming the Conceptual Data Model to SQL
Tables
* Entity directly to a SQL table
* Many-to-many binary relationship directly to a SQL table, taking the 2 primarykeys in the 2 entities associated with this relationship as foreign keys in the new table
* One-to-many binary relationship primary key on one side entity copied as aforeign key in the many side entitys table
* Recursive binary relationship same rules as other binary relationships
* Ternary relationship directly to a SQL table, taking the 3 primary keys of the 3entities associated with this relationship as foreign keys in the new table
*Attribute of an entity directly to be an attribute of the table transformed from thisentity
* Generalization super-class (super-type) entity directly to a SQL table
* Generalization subclass (subtype) entity directly to a SQL table, but with theprimary key of its super-class (super-type) propagated down as a foreign key into its table
35
Figure 4.9
-
7/27/2019 01~Lecture Notes
36/66
* Mandatory constraint (1 lower bound) on the one side of a one-to-many
relationship the foreign key in the many side table associated with the primary keyin the one side table should be set as not null (when the lower bound is 0, nulls areallowed as the default in SQL)
36
-
7/27/2019 01~Lecture Notes
37/66
37
-
7/27/2019 01~Lecture Notes
38/66
Figure 5.1
38
-
7/27/2019 01~Lecture Notes
39/66
Figure 5.3
39
-
7/27/2019 01~Lecture Notes
40/66
Figure 5.5
40
-
7/27/2019 01~Lecture Notes
41/66
Figure 5.7
41
-
7/27/2019 01~Lecture Notes
42/66
VI.Normalization
First normal form (1NF) to third normal form (3NF) and BCNF
Goals of normalization1. Integrity2. Maintainability
Side effects of normalization* Reduced storage space required (usually, but it could increase)* Simpler queries (sometimes, but some could be more complex)* Simpler updates (sometimes, but some could be more complex)
First normal form (1NF) -- a table R is in 1NF iff all underlyingdomains contain only atomic values, i.e. there are no repeating groups in
a row.
functional dependencygiven a table R, a set of attributes B is functionally dependent onanother set of attributes A if at each instant of time each A value is associated with only one Bvalue. This is denoted by A -> B. A trivial FD is of the form XY --> X (subset).
super-key -- a set of one or more attributes, which, when taken collectively, allows us to identifyuniquely an entity or table.
candidate keyany subset of the attributes of a super-key that is also a super-key, but notreducible.
primary key -- arbitrarily selected from the set of candidate keys, as needed for indexing.
Third normal form (3NF)A table is in 3NF if, for every nontrivial FD X --> A, either:
(1) attribute X is a super-key, or(2) attribute A is a member of a candidate key (prime attribute)
Boyce-Codd normal form (BCNF)A table is in BCNF if, for every nontrivial FD X --> A,
(1) attribute X is a super-key.
42
-
7/27/2019 01~Lecture Notes
43/66
First Normal Form (single table)
TABLE SUPPLIER_PART (100k rows, 73 bytes/row => 7.3 MB)SNUM SNAME STATUS CITY PNUM PNAME WT QTY SHIPDATES1 SMITH 20 LONDON P1 NUT 12 3 1-4-90S1 SMITH 20 LONDON P2 BOLT 22 2 2-17-90S1 SMITH 20 LONDON P3 WRENCH 27 6 11-5-89S1 SMITH 20 LONDON P4 WRENCH 24 2 6-30-91S1 SMITH 20 LONDON P5 CLAMP 22 1 8-12-91S1 SMITH 20 LONDON P6 LEVEL 19 5 4-21-91S2 JONES 10 PARIS P1 NUT 12 3 5-3-90S2 JONES 10 PARIS P2 BOLT 22 4 12-31-90S3 BLAKE 10 PARIS P3 WRENCH 27 4 3-25-91S3 BLAKE 10 PARIS P5 CLAMP 22 2 3-27-91S4 CLARK 20 LONDON P2 BOLT 22 2 10-31-89
S4 CLARK 20 LONDON P4 WRENCH 24 3 7-14-90S4 CLARK 20 LONDON P5 CLAMP 22 7 8-20-90S5 ADAMS 30 ATHENS P5 CLAMP 22 5 8-11-91
Functional dependenciesSNUM --> SNAME, STATUS,CITYCITY --> STATUSPNUM --> PNAME, WTSNUM,PNUM,SHIPDATE --> QTY
Attribute sizes (bytes)
SNUM 5 PNAME 10SNAME 20 WT 5STATUS 2 QTY 5CITY 10 SHIPDATE 8PNUM 8 Total size 73
Third Normal Form
TABLE PART (100 rows, 23 bytes/row => 2.3 KB)PNUM PNAME WT Functional dependenciesP1 NUT 12 PNUM --> PNAME, WTP2 BOLT 17
P3 WRENCH 17P4 WRENCH 24P5 CLAMP 12P6 LEVEL 19
TABLE SHIPMENT (100k rows, 26 bytes/row => 2.6 MB)SNUM PNUM QTY SHIPDATE Functional dependency
43
-
7/27/2019 01~Lecture Notes
44/66
S1 P1 3 1-4-90 SNUM, PNUM, SHIPDATE--> QTYS1 P2 2 2-17-90S1 P3 6 11-5-89S1 P4 2 6-30-90S1 P5 1 8-12-91S1 P6 5 4-21-91S2 P1 3 5-3-90S2 P2 4 12-31-90S3 P3 4 3-25-91S3 P5 2 3-27-91S4 P2 2 10-31-89S4 P4 3 7-14-90S4 P5 7 8-20-90S5 P5 5 8-11-91
NOT Third Normal Form
TABLE SUPPLIER (200 rows, 37 bytes/row => 7.4 KB)SNUM SNAME STATUS CITY Functional dependenciesS1 SMITH 20 LONDON SNUM --> SNAME, STATUS, CITYS2 JONES 10 PARIS CITY --> STATUSS3 BLAKE 10 PARISS4 CLARK 20 LONDONS5 ADAMS 30 ATHENSDecomposition of Table Supplier into two Third Normal Form (3NF) Tables
44
-
7/27/2019 01~Lecture Notes
45/66
Third Normal Form
TABLE SUPPLIER_W/O_STATUS (200 rows, 35 bytes/row => 7 KB)SNUM SNAME CITY Functional dependencyS1 SMITH LONDON SNUM --> SNAME, CITY
S2 JONES PARISS3 BLAKE PARISS4 CLARK LONDONS5 ADAMS ATHENS
TABLE CITY_AND_STATUS (100 rows, 12 bytes/row => 1.2 KB)CITY STATUS Functional dependencyLONDON 20 CITY --> STATUSPARIS 10ATHENS 30
In general, the FDs can be derived from1. Explicit assertions given2. ER diagram (implied by ER constructs)3. Intuition (your experience with the problem data)
45
-
7/27/2019 01~Lecture Notes
46/66
Functional Dependency Inference rules
(Armstrongs Axioms)
1. ReflexivityIf Y is a subset of the attributes of X, then X->Y.
X = ABCD, Y = ABC => X->YX->X trivial case
2. AugmentationIf X->Y and Z is a subset of table R (i.e. Z is any set of attributes in R), then XZ -> YZ .
3. TransitivityIf X->Y and Y->Z then X->Z.
4. Pseudo-transitivityIf X->Y and YW->Z then XW->Z.(transitivity is a special case of pseudo-transitivity when W is null)
5. UnionIf X->Y and X->Z then X->YZ.
6. DecompositionIf X->YZ then X->Y and X->Z.
Superkey Rule 1.Any FD involving all attributes of a table definesa super-key on the LHS of the FD.
Given: any FD containing all attributes in the table R(W,X,Y,Z), i.e. XY -> WZ.Proof:(1) XY -> WZ given(2) XY -> XY by the reflexivity axiom(3) XY -> XYWZ by the union axiom(4) XY uniquely determines every attribute in table R, as shown in (3)(5) XY uniquely defines table R, by the definition of a table as having no duplicate rows(6) XY is therefore a super-key, by the definition of a super-key.
Super-key Rule 2.Any attribute that functionally determines aSuper-key of a table, is also a super-key for that table.
Given: Attribute A is a super-key for table R(A,B,C,D,E), and E -> A.Proof:(1) Attribute A uniquely defines each row in table R, by the def. of a super-key(2) A -> ABCDE by the definition of a super-key and a relational table(3) E -> A given(4) E -> ABCDE by the transitivity axiom(5) E is a super-key for table R, by the definition of a super-key.
46
-
7/27/2019 01~Lecture Notes
47/66
3NF Synthesis Algorithm (Bernstein)
Basic definitions
g e H set of FDs
H+ closure of H - set of all FDs derivable from H using all the FD inference rules
H cover of H - any set of FDs from which every FD in H+ canbe derived
H(non-redundant) non-redundant cover of H, i.e. a cover which contains no proper subset
which is also a cover. Can be determined with quadratic complexity O(n2).
ExampleGiven a set of FDs H, determine a minimal set of tables in 3NF,while preserving all FDs and maintaining only lossless decomposition/joins.
H: AB->C DM->NP D->KLA->DEFG D->ME->G L->DF->DJ PR->SG->DI PQR->ST
Step 1: Eliminate any extraneous attributes in the left handsides of the FDs. We want to reduce the left hand sides of as many FDs as possible.In general: XY->Z and X->Z => Y is extraneous (Reduction Rule 1)
XYZ->W and X->Y => Y is extraneous (Reduction Rule 2)For this example we mix left side reduction with the union and decomposition axioms:
DM->NP => D->NP => D -> MNPD->M D->M
PQR->ST => PQR->S, PQR->T => PQR->.TPR->S PR->S PR->S
Step 2: Find a non-redundant cover H of H, i.e. eliminate any FDderivable from others in H using the inference rules (most frequently the transitivity axiom).
A->E->G => eliminate A->G from the coverA->F->D => eliminate A->D from the cover
Step 3: Partition H into tables such that all FDs with the
same left side are in one table, thus eliminating any non-fully functional FDs. (Note: creatingtables at this point would be a feasible solution for 3NF, but not necessarily minimal.)
R1: AB->C R4: G->DI R7: L->DR2: A->EF R5: F->DJ R8: PQR->TR3: E->G R6: D->KLMNPR9: PR->S
Step 4: Merge equivalent keys, i.e. merge tables where all FDs satisfy 3NF.
47
-
7/27/2019 01~Lecture Notes
48/66
4.1 Write out the closure of all LHS attributes resulting from Step 3, based on transitivities.
4.2 Using the closures, find tables that are subsets of other groups and try to merge them. Use Rule1 and Rule 2 to establish if the merge will result in FDs with super-keys on the LHS. If not, tryusing the axioms to modify the FDs to fit the definition of super-keys.
4.3 After the subsets are exhausted, look for any overlaps among tables and apply Rules 1 and 2(and the axioms) again.
In this example, note that R7 (L->D) has a subset of the attributes of R6 (D->KLMNP). Thereforewe merge to a single table with FDs D->KLMNP, L->D because it satisfies 3NF: D is a super-key by Rule 1 and L is a super-key by Rule 2.
Final 3NF (and BCNF) table attributes, FDs, and candidate keys:R1: ABC (AB->C with key AB) R5: DFJ (F->DJ with key F)R2: AEF (A->EF with key A) R6: DKLMNP (D->KLMNP, L->D, w/keys D, L)
R3: EG (E->G with key E) R7: PQRT (PQR->T with key PQR)R4: DGI (G->DI with key G) R8: PRS (PR->S with key PR)
Step 4a.Check to see whether all tables are also BCNF. For any table that is not BCNF, add theappropriate partially redundant table to eliminate the delete anomaly.
48
-
7/27/2019 01~Lecture Notes
49/66
Maiers Example using 3NF Synthesis
[Maier,D. The Theory of Relational Databases, Computer Science Press, 1983]
R = {A,B,C,D,E,F,G,H,I,J,K }
Functional dependencies (FDs):(1) E --> A B C D F G H I J K (7) H I --> J(2) A B C --> E D F G H I J K (8) I J --> H(3) A B D --> E C F G H I J K (9) H J --> I(4) G --> H I J(5) C F --> K(6) D F --> K
Step 1 - No reduction of determinants necessary.Step 2 - Find non-redundant cover.
(4) G->HIJ => eliminate HIJ from (1), (2), and (3)(7) HI->J => reduce (4) to G->HI, eliminating J from (4)(5) CF -> K => eliminate K from (1) and (3)(6) DF->K => eliminate K from (2)(1) E->DFG => eliminate DFG from (2)(1) E->CFG => eliminate CFG from (3)
Step 3 - Partition into groups with the same left side.G1: E->ABCDFG G6: DF->KG2: ABC->E G7: HI->JG3: ABD->E G8: IJ->HG4: G->HI G9: HJ->IG5: CF->K
Step 4 - Merge equivalent keys, forming new groups. Construct final set of tables, attributes, FDs, ancandidate keys.
R1: ABCDEFG ( E->ABCDFG, ABC->E, ABD->E with keys E, ABC, ABD)R2: GHI (G->HI with key G)R3: CFK (CF->K with key CF)R4: DFK (DF->K with key DFR5: HIJ (HI->J, IJ->H, HJ->I with keys HI, IJ, HJ)
49
-
7/27/2019 01~Lecture Notes
50/66
Example of a 3NF table that is not BCNF,
i.e. it has further anomalies:
S = student, C = course, I = instructor
SC -> I For each course, each student is taught by only one instructor. A course may be taught by mthan one instructor.
I -> C Each instructor teaches only one course.
This table is 3NF with a candidate key SC:SCI student course instructor
Sutton Math Von NeumannSutton Journalism MurrowNiven Math Von NeumannNiven Physics Fermi
Wilson Physics Einstein
Delete anomaly: If Sutton drops Journalism, then we have no record of Murrow teaching JournalismHow can we decompose this table into BCNF?
Decomposition 1 (bad)........eliminates the delete anomalySC (no FDs) and I -> C (two tables)Problems - 1. lossy join
2. dependency SC -> I is not preserved
SC student course IC instructor courseSutton Math Von Neumann Math
Sutton Journalism Murrow JournalismNiven Math Fermi PhysicsNiven Physics Einstein PhysicsWilson Physics
----------------join SC and IC ------------------SCI student course instructor
Sutton Math Von NeumannSutton Journalism MurrowNiven Math Von NeumannNiven Physics Fermi
Niven Physics Einstein (spurious row)Wilson Physics Fermi (spurious row)Wilson Physics Einstein
50
-
7/27/2019 01~Lecture Notes
51/66
Decomposition 2 (better).....eliminates the delete anomalySI (no FD) and I -> CAdvantages eliminates the delete anomaly, losslessDisadvantage - dependency SC -> I is not preserved
SI student instructor IC instructor courseSutton Von Neumann Von Neumann MathSutton Murrow Murrow JournalismNiven Von Neumann Fermi PhysicsNiven Fermi Einstein PhysicsWilson Einstein Dantzig Math (new)Sutton Dantzig (new)
The new row is allowed in SI using unique(student,instructor) in the create table command, and thejoin of SI and IC is lossless. However, a join of SI and IC now produces the following two rows:
student course instructorSutton Math Von Neumann
Sutton Math Dantzig which violates the FD SC -> I.
Oracle, for instance, has no way to automatically check SC->I, although you could write a proceduredo this at the expense of a lot of overhead.
Decomposition 3 (tradeoff between integrity and performance)SC -> I and I -> C (two tables with redundant data)Problems -extra updates and storage cost
51
-
7/27/2019 01~Lecture Notes
52/66
VII. An Example of Logical Database Design
Requirements Specification
The management of a large retail store would like a database to keep track of salesactivities. The requirements analysis for this database led to the following six entities andtheir unique identifiers:
Entity Entity key Key length(max) No. ofin characters occurrences
Customer cust-no 6 80,000Job job-no 24 80Order order-no 9 200,000Salesperson sales-id 20 150
Department dept-no 2 10Item item-no 6 5,000
The following assertions describe the data relationships:
Each customer has one job-title, but different customers may have the same job-title. Each customer may place many orders, but only one customer may place a particularorder. Each department has many salespeople, but each salesperson must work in only onedepartment. Each department has many items for sale, but each item is sold in only one
department. (Item means item type, like IBM PC). For each order, items ordered in different departments must involve different
salespeople, but all items ordered within one department must be handled byexactly one salesperson. In other words, for each order, each item has exactly onesalesperson; and for each order, each department has exactly one salesperson.
ER Construct FDs
Customer(many):Job(one) cust-no -> job-titleOrder(many): Customer(one) order-no -> cust-noSalesperson(many): Department(one) sales-id -> dept-noItem(many): Department(one) item-no -> dept-noOrder(many): Item(many): order-no,item-no->sales-id
Salesperson(one)Order(many): Department(many): order-no,dept-no-> sales-id
Salesperson(one)
52
-
7/27/2019 01~Lecture Notes
53/66
Figure 7.1
create table customer(cust_no char(6),job_title varchar(256),
primary key (cust_no),foreign key (job_title) referencesjobon delete set null on update cascade);
create tablejob(job_no char(6),job_title varchar(256),primary key (job_no));
create table order(order_no char(9),cust_no char(6) not null,primary key (order_no),foreign key (cust_no) references customer
on delete set null on update cascade);
create table salesperson(sales_id char(10)sales_name varchar(256),dept_no char(2),primary key (sales_id),
53
-
7/27/2019 01~Lecture Notes
54/66
foreign key (dept_no) references departmenton delete set null on update cascade);
create table department(dept_no char(2),
dept_name varchar(256),manager_name varchar(256),primary key (dept_no));
create table item(item_no char(6),dept_no char(2),primary key (item_no),foreign key (dept_no) references department
on delete set null on update cascade);
create table order_item_sales(order_no char(9),item_no char(6),sales_id varchar(256) not null,primary key (order_no, item_no),foreign key (order_no) references order
on delete cascade on update cascade,foreign key (item_no) references item
on delete cascade on update cascade,foreign key (sales_id) references salesperson
on delete cascade on update cascade);
create table order_dept_sales(order_no char(9),dept_no char(2),sales_id varchar(256) not null,primary key (order_no, dept_no),foreign key (order_no) references order
on delete cascade on update cascade,foreign key (dept_no) references department
on delete cascade on update cascade,foreign key (sales_id) references salesperson
on delete cascade on update cascade);
Table Primary key Likely non-keys
customer cust_no job_title, cust_name, cust_addressorder order_no cust_no, item_no, date_of_purchase,pricesalesperson sales_id dept_no, sales_name, phone_noitem item_no dept_no, color, model_no
54
-
7/27/2019 01~Lecture Notes
55/66
order_item_sales order_no,item_no sales_idorder_dept_sales order_no,dept_no sales_id
55
-
7/27/2019 01~Lecture Notes
56/66
56
-
7/27/2019 01~Lecture Notes
57/66
-
7/27/2019 01~Lecture Notes
58/66
-
7/27/2019 01~Lecture Notes
59/66
Figure 8.10 Star schema for the productivity tracking process
Table 8.2 Data warehouse bus for widget example
Shape
Color
Texture
Density
Size
EstimateDate
WinDate
Customer
Promotion
CostCenter
SchedStartDate
SchedStartTime
SchedFinishDate
SchedFinishTime
ActualStartDate
ActualStartTime
ActualFinishDate
ActualFinishTime
Employee
InvoiceDate
Estimating x x x x x x x x x x
Scheduling x x x x x x x x x
Productivity Tracking x x x x x x
Job Costing x x x x x x x x x
59
dd job numberfk cost center idfk employee idfk actual start date id
fk actual start time idfk actual finish date idfk actual finish time idwidget quantityfinished on timeestimated hoursactual hours
Productivity Detail
Employee
Cost CenterActual Start Date
Actual Start Time
Actual Finish Date
Actual Finish Time
-
7/27/2019 01~Lecture Notes
60/66
Figure 8.11 Star schema for the job costing process
Figure 8.12 Schema for the job costing daily snapshot
60
fk shape idfk color idfk texture idfk density idfk size id
fk estimate date iddd estimate numberfk win date iddd job numberfk customer idfk promotion idfk cost center idfk invoice date idwidget quantityestimated hourshourly rateestimated costmarkupdiscountpriceactual hoursactual cost
Job Costing DetailShape Color
Texture Density
Size
Customer
Promotion Cost Center
EstimateDate
Win Date
InvoiceDate
fk invoice date idwidget quantityestimated hoursestimated cost
priceactual hoursactual cost
Job Costing Daily Snapshot
InvoiceDate
-
7/27/2019 01~Lecture Notes
61/66
On-Line Analytical Processing (OLAP)
Figure 8.13 Product graph labeled with aggregation level coordinates
If we express Equation 8.1 in different terms, the problem of exponential explosionbecomes more apparent. Let g be the geometric mean of the number of hierarchical levelsin the dimensions. Then Equation 8.1 becomes Equation 8.2.
Possible views = gd (8.2)
61
d
Possible views = h i (8.1)i=1
Calendar Dimension(first ordinate)
0: date id1: month2: quarter3: year4: all
Customer Dimension(second ordinate)
0: cust id1: city2: state3: all
(0, 0)
(1, 0) (0, 1)
(1, 1) (0, 2)
(1, 2) (0, 3)
(1, 3)
(2, 0)
(2, 1)
(2, 2)
(2, 3)
(3, 0)
(3, 1)
(3, 2)
(3, 3)
(4, 0)
(4, 1)
(4, 2)
(4, 3)
Fact Table
P=0.9
0.9 0.9
0.9 0.9 0.9 0.9
0.1
0.1
0.1
0.1
0.1 0.1 0.1
0.7290.0810.081 0.0810.0090.009 0.0090.001
a = 2 leftsP
2= 0.009
C = 3 bins
a = 1 leftP
1= 0.081
C = 3 bins
a = 0 leftsP
0= 0.729
C = 1 bin32
31
30
a = 3 lefts
P3 = 0.001
C = 1 bin33
-
7/27/2019 01~Lecture Notes
62/66
Selection of Materialized Views
Figure 8.16 Example hypercube lattice structure
Table 8.3 Two iterations of HRU, based on Figure 8.16
Table 8.4 First iteration of PGA, based on Figure 8.16
Table 8.5 Second iteration of PGA, based on Figure 8.16
62
5.2M x 4 = 20.8M0 x 4 = 00 x 4 = 0
5.99M x 2 = 11.98M5.8M x 2 = 11.6M5.9M x 2 = 11.8M
6M - 1
0 x 2 = 00 x 2 = 0
0.79M x 2 = 1.58M0.6M x 2 = 1.2M
5.9M x 2 = 11.8M
0.8M - 1
{p, s}{c, s}{c, p}
{s}{p}{c}
{}
Iteration 1 Benefit Iteration 2 Benefit
c = Customer
p = Parts = Supplier
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
Fact Table
Candidates
{p, s}{s}{}
Iteration 1 Benefit
5.2M x 4 = 20.8M5.99M x 2 = 11.98M
6M - 1
Candidates
0 x 2 = 00.79M x 2 = 1.58M5.9M x 2 = 11.8M
6M - 1
{c, s}{s}{c}{}
Iteration 2 Benefit
-
7/27/2019 01~Lecture Notes
63/66
63
-
7/27/2019 01~Lecture Notes
64/66
IX. CASE Tools for Logical Database Design
Software tools provide significant value to database designers by
a) Dramatically reduced the complexity of conceptual and logical design, both
of which can be rather subtle to do well.
b) Automating the transformation of the logical design to a physical design (at
least the basic physical design).) to create the physical database.
c) Providing reporting, round trip engineering and reverse engineering that
make such tools invaluable in maintaining systems over a long period of
time.
Key Capabilities to Watch For
Complete round trip engineering
UML design
Schema evolution, Change management
Reverse engineering of existing systems
Team support, allowing multiple people to work on the same project concurrently.
Integrated with Eclipse and .NET and other tooling products
Component and convention re-use (being able to reuse naming standard, domain,
logical models over multiple design projects)
Reusable assets (extensibility, template, ...)
Reporting
Basic Transformations of the Conceptual Model to SQL Tables
Transform each entity into a table containing the key and nonkey attributesof the entity.
Transform every many-to-many binary or binary recursive relationship into
a relationship table with the keys of the entities and the attributes of the
relationship.
64
-
7/27/2019 01~Lecture Notes
65/66
Transform every ternary or higher-level n-ary relationship into a relationship
table.
Similarly these tools produce the transformation table types described in Chapter 5:
An entity table with the same information content as the original entity.
An entity table with the embedded foreign key of the parent entity.
A relationship table with the foreign keys of all the entities in the
relationship.
65
-
7/27/2019 01~Lecture Notes
66/66
Figure 9.3 Rational Data Architect entity relationship modeling (courtesy IBM)
Generating a database from a design
Database support
Collaborative support
1. Concurrency control.
2. Merge and collaboration capabilities that allow designers to combine designs
or merge their latest changes into a larger design project.
Distributed development
Application life cycle tooling integration
Design compliance checking
Design and Normalization
Discover 1st 2nd 3rd normalization
Index and Storage
check for excessive indexing
Naming Standards
Security Compliance
Sarbanes-Oxley compliance
Valid data model and rules
Model syntax checks
Reporting
Modeling a Data Warehouse
Semi-Structured data, XML
##