matthew p. johnson, ocl1, cisdd cuny, f20041 orcl1 oracle 8i: sql & pl/sql session #2 matthew p....
TRANSCRIPT
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
1
ORCL1 Oracle 8i:SQL & PL/SQLSession #2
Matthew P. Johnson
CISDD, CUNY
Fall, 2004
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
2
Agenda Last time: E/R models, some design issues This time: More design “carving at the joints”
Redundancy Whether an element should be an attribute or entity
set Replacing a relationships with entity sets
Constraints Identifying & specifying key attributes to an entity set Recognizing other types of single-valued constraints Representing referential integrity constraints Identifying & representing general constraints
Weak entity sets
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
3
Design Principles Faithfulness Avoiding redundancy Simplicity Choice of relationships Picking elements
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
4
Avoiding redundancy Say everything exactly once
Minimize database storage requirements More important: prevent possible update errors
simplest but not only e.g.: modify data one place but not the other – more later
Example: Spot the redundancy
Studios MoviesOwn
StudioName
Name
Length
Name
Address
Redundancy: Movies “knows” the studio two ways
Phone
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
5
Spot more redundancy
Different redundancy: studio info listed for every movie!
Movies
StudioName
Name
Length
SAddress
SPhone
Name Length Studio SAddress SPhonePulp Fiction … Miramax NYC 212-…Sylvia … Miramax NYC 212-…Jay & Sil. Bob … Miramax NYC 212-…
…
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
6
Don’t add relships that are implied
Students Courses
TAs
Enrolls
TA-of
Assist
Suppose each course again has <=1 TA
Q: Is the following good design?
A: If TAs other than the course’s TA can help students, then yes;
if not, then no: we can connect Students and TAs by going through Courses; redundant!
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
7
Correct E/R models may contain loops
Person plays multiple roles: employee of company buyer of product
price
address name ssn
Person
buys
makes
employs
CompanyProduct
name category
stockprice
name
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
8
More design
Repeating TA names & IDs – redundant TA is not TAing any course now lose TA’s data! TA should get its own ES
Students CoursesEnrolls
Q: What’s wrong with this design?
A:
TA-Name TA-ID
TA-Email
Course-ID CName
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
9
Opposite problem: Entity or attribute? Some E/Rs improved by removing entities Can convert Entity E attributes of F
1. R:FE is many-one one-one counts because special case
2. Attributes for E are independent of each other knowing one att val doesn’t tell us another att val
Then remove E add all attributes of E to F
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
10
Students CoursesEnrolls
TA-Name AssistsTA
Entity attributeCName
Room
Students CoursesEnrolls
CName
Room
TA-Name
Course-ID
Course-ID
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
11
Convert TA entity again?
No! Multiple TAs allowed Violates condition (1) Redundant course data
Students CoursesEnrolls
AssistsTA
CName CID Room TA-NameDBMS 46 123 HowardDBMS 46 123 Wesley
…
CName
Room
Course-ID
TA-Name
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
12
Convert TA entity again?
Students CoursesEnrolls
AssistsTA
CName
Room
Course-ID
TA-ID TA-Favorite-Color
No! TA has dependent fields Violates condition (2)
How can it tell? Redundant TA data
CName TA-Name TA-ID TA-ColorDBMS Ralph 678 GreenA.Soft. Ralph 678 Green
…
TA-Name
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
13
Entity or attributes? Should student address be an entity or an attribute? If student may have multiple addresses, must be entity
campus address, permanent address attributes cannot be set-valued
If we need to examine structure of address, must be entity find all students from NYS but not NYC
If attribute, then it’s probably a simple string no structure! NB: this choice is a microcosm of entire miniworld (much) power of a DB comes from the structure
imposed on the data
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
14
Larger example DB design Application: library database. Authors have written
books about various subjects; different libraries in the system may carry these books.
Entities (with attributes in parentheses): Authors (ssn, name, phone, birthdate) Books (ISDN, title) Subjects (sname, sid) Libraries (lname)
Relations [associating entities in square brackets]: Wrote-on [Authors, Subjects] Cover [Libraries, Subjects] On [Books, Subjects]
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
15
E/R of DB designName
Author
ssn phone birthdate
wrote-on
SubjectSNameTitle
Carries
LibraryLName
On Book
ISBN
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
16
Poor initial design First design is a poor model of this system Some info not captured:
How many copies does a lib. have of a given book? What edition of a book does the library have?
Design problems: no direct relship associating authors and books no direct relship associating libraries and books
Common queries complex and difficult/expensive What libraries carry books by a given author? What books has a given author written? Who is the author of a given book?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
17
Larger example DB design 2 Application: library database as before
Entities (with attributes in parentheses): Authors (ssn, name, phone, birthdate) Books (ISDN, title) Subjects (sname, sid) Libraries (lname)
Relations [associating entities in square brackets] (attributes in parentheses): Wrote [Authors, Books] Carries [Libraries, Books] (quantity, edition) On [Books , Subjects]
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
18
E/R of improved DB design
Rule of thumb: often queried together make closely connected
Name
Author
ssn phone birthdate
wrote
BookISBN
TitleCarries
LibraryLName
Edition
Quantity
On Subject
SName
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
19
Next topic: Constraints Review: programmer-defined rules stating what
should always be true about consistent databases Restrictions on data:
Keys (e.g. SSNs uniquely identify people) Single value constraints (e.g. everyone has 1 father) Referential Integrity (e.g. person’s record refers to father
father must exist) Domain constraints (e.g. gender in M/F, age in 0..150) General constraints (e.g. no more than 10 customers per
sales rep) Can’t infer constraints from data
may hold “accidentally” they are a part of the schema
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
20
E/R keys Uniquely identifies entity in ES Attribute or set of attributes
Two entities cannot agree on all key attributes These attributes determine all others
Every ES should have a key possibly including all attributes
Primary key attributes underlined More than one possible key:
Candidate keys, primary key
Practical tip: create intentional key attribute E.g. SSN, course-id, employee-id, etc. SSN likely shorter than (name,address) Prevents quasi-redundancy
address
name ssn
Person
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
21
Single-valued constraints “at most one” value
sharp arrows E.g. attributes: could be null or one Many-one relationships: the “one” part is
single-valued. Can think of key atts as (non-null) single-
valued
TACourse Assists
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
22
Referential integrity “Exactly one value” NOT NULL attributes Relationships
Non-null value refers to entity that exists Refer to entity with foreign key HTML analogy: no broken links Programming analogy: no dangling pointers Ways of handling deletion:
Prevent deletion as long as referrer exist Enforce deletion of all referrers
InstructorCourse Taught
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
23
Referential integrity – E/R e.g.
Insertion – must refer to existing entity Suppose need to add
course: “Oracle” instructor: MPJ
Q: Which order? Q: What if relship were exactly-exactly?
i.e., referential integrity in both directions? A: Put both inserts in one xact – later
Students CoursesEnrolls
Instructor
Taught
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
24
Other kinds of constraints Domain constraints
E.g. date: must be after 1980 Enumerated type: grades A through F, no E No specific E/R notation: mention with attribute or
relationship General constraints:
A class may have no more than 100 students; a student may not have more than 6 courses:
Students CoursesEnroll <=6<=100
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
25
Next topic: Weak entity sets (2.4) Definition:
Some or all key attributes belong to another ES Why:
An entity set is part of a hierarchy (not ISA) Connecting entity sets
The key consists of 0, 1 or more of its own attributes Key attributes of entity sets from supporting
relationships
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
26
Conditions of Supporting relationships
Supporting relationship R:EF R is many-one (E-F) or one-one R is binary Referential integrity from E to F
i.e. a rounded arrow Those atts supplied to E are the key attributes of F F itself may be weak
Another entity set G, and so on recursively
A1
A2
RE F
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
27
If several supporting relationships from E to F Keys of several different entities from F appear as
foreign key of E
Other many-one relationships Not necessarily supporting
Requirements for weak entity sets
From
By
Purchases A1
A2
A3
People
StoresAt-store
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
28
Weak entity sets Example: Hierarchy – species & genus Idea: species name unique per genus only
Species
name
Belongs-to Genus
name
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
29
Video store connecting entity sets e.g. was a weak entity set
Key: date, MID,SID, CID
Weak entity sets
MID
SID
CID
Rental
StoreOf
MovieOf
BuyerOf
date
Product
Store
Customer
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
30
E/R design summary Subject/design choices:
Should a concept be modeled as an ES or an att? Should a concept be modeled as an ES or a
relship? Identifying relationships: binary or multiway?
Constraints in the ER Model: Important in determining the best design. Much data semantics can (and should) be captured Normalization improves further – later
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
31
Agenda Last time: finished E/R models per se This time: Intro to relational model Converting ER diagrams to relations Functional dependencies
Keys and superkeys in terms of FDs Finding keys for relations Rules of FDs
Normalization
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
32
Next topic: the Relational Data Model
Database Model(E/R, other)
Relational Schema
Physicalstorage
Diagrams (E/R) Tables: column names: attributes rows: tuples
Complex file organizationand index structures.
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
33
Relations as tables
Name Price Category Manufacturer
gizmo $19.99 gadgets GizmoWorksPower gizmo $29.99 gadgets GizmoWorksSingleTouch $149.99 photography CanonMultiTouch $203.99 household Hitachi
tuples/rows/records/entities
Attribute names Product
table/relation
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
34
Relational terminology Relation is composed of tuples Tuples composed of attribute values
Attribute has atomic types
Relation schema: relation name + attribute names + attribute types
Relation instance: set of tuples order doesn’t matter
Database schema: set of relation schemas Database instance: relation instance for every
relation in the schema
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
35
Relations as sets Remember: math relation is a subset of the cross-
product of the attribute value sets R subset-of S x T Product subset-of Name x Price x Cat x Mft
One member of Product relation: (gizmo, $19.99, gadgets, GizmoWorks) in Product
DB Relation instance = math relation
Q: If relations are sets, why call “instances”? A: R is a member of the powerset P(SxT)
powerset = set of all subsets
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
36
More on tuples Formally, can also be a mapping
from attribute names to (correctly typed) values: name gizmo price $19.99 category gadgets manufacturer GizmoWorks
NB: ordered tuple is equiv to mapping Both ways supported in SQL
Sometimes we refer to a tuple by itself (note order of attributes) (gizmo, $19.99, gadgets, GizmoWorks) or Product(gizmo, $19.99, gadgets, GizmoWorks).
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
37
Updates/modifications The database maintains a current database state Modifications of data:
add a tuple delete a tuple update an attribute value in a tuple
DB Relation instance = math relation Idea: we saw partic. Product DB instance
add, delete rows different DB rel. instances technically, different math relations to DBMS, still the same relation/table
Modifications to the data are frequent Updates to the schema are rare, painful (Why?)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
38
E/R models to relations Recall justification:
design is easier in E/R implementation is easier/faster in R
Parallel to program compilation: design is easier in C/Java/whatever implemen. is easier/faster in machine/byte code
Strategy1. apply semi-mechanical conversion rules
2. improve by combining some relations
3. improve by normalization involves finding functional dependencies
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
39
E/R conversion rules Entity set … relation
attributes: attributes of entity set key: key of ES
Relationship relation attributes: keys of entity-sets/roles key: depends on multiplicity
NB: mapping of types is not one-one We’ll see: mapping one tokens is not one-one
Special treatment: Weak entity sets Isa relations & subclasses
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
40
Entity Sets Entity set Students
ssn
name
address
Students
John
Howard
Name
South Carolina444-555-6666
Park Avenue111-222-3333
AddressSSN
Rel: Students
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
41
Entity Sets
Course
CourseID
CourseName
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
42
Binary many-to-many relationships Key: keys of both entities
Why we learned to recognize keys
C30.0046444-555-6666
C20.0056111-222-3333
C20.0046111-222-3333
CourseIDssn
Relation: Enrolls
EnrollsS_addr
S_NameStudents Course
Course-Name
CourseID
ssn
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
43
Many-to-one relationships
Key: keys of many entity
Movies Studiosowns
2003SyliaM202
1999Mr. Ripley.M101
YearTitleMovieID
Movies
OrlandoDisneyS73
NYCMiramaxS35
AddressNameStudioID
Studios
S35
S73
StudioID
CN22222
CN11111
CopyrightNo
M202
M101
MovieIDOwns
CopyrightNo
MovieID
Title
Year StudioID
NameAddress
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
44
Improving on many-one Note rules applied:
Movies Rel.: all atts from Movies ES Studios Rel: all atts from Studios ES Owns Rel: att key atts from Movies & Studios ESs
But: Owns:MoviesStudios is many-one for each row in Movies, there’s a(/no) row in Owns just add the Owns data to Movies
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
45
Many-to-one: a better design
Q: What if a movie’s Owns row were missing?
2003SyliaM202
1999Mr. Ripley.M101
YearTitleMovieID
Movies
S35
S73
StudioID
CN22222
CN11111
CopyrightNo
M202
M101
MovieID
Owns
CN22222
CN11111
CopyrightNo
S35
S73
StudioID
2003
1999
Year
SyliaM202
Talent Mr. Ripley
M101
TitleMovieID
Movies’
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
46
Many-to-many relationships again Won’t work for many-many relationships
acts
MovieID Title Year
M101 Mr. Ripley 1999
M202 Sylia 2003
M303 P.D. Love 2002
StarID Name Address
T400 Gwyneth P. Bev.Hills
T401 P.S. Hoffman Hollywood
T402 Jude Law Palm Springs
MovieID StarID
M101 T400
M202 T400
M101 T401
M101 T402
M303 T401
Movies
Stars
Acts
Movies Stars
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
47
Many-to-many relationships again
MovieID Title Year StarID
M101 Talented Mr. Ripley 1999 T400
M101 Talented Mr. Ripley 1999 T401
M101 Talented Mr. Ripley 1999 T402
M202 Sylia 2003 T400
M303 Punch Drunk Love 2003 T401
And here’s why:
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
48
Multiway relationships & roles
Different roles treated as different entity sets Key: keys of the many entities
Students Courses
TAs
tutors graders
enrolls
TA_SSN Name
SSN CourseID
Name Name
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
49
Multiway relationships & roles
Enrolls(S_SSN, Course_ID, Tutor_SSN, Grader_SSN)
SSN Name
111-11-1111 George
222-22-2222 Dick
TA_SSN Name
333-33-3333 Wesley
444-44-4444 Howard
555-55-5555 John
Students TAsCourseID Name
C20.0046 Databases
C20.0056 Software
Courses
S_SSN CourseID Tutor_SSN Grader_SSN
111-11-1111 C20.0046 333-33-3333 444-44-4444
222-22-2222 C20.0046 444-44-4444 555-55-5555
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
50
Converting weak ESs – differences
Atts of Crew Rel are: attributes of Crew key attributes of
supporting ESs
Crew Unit-of Studio
StudioNameCrew_ID
address
C2Miramax
C1Disney
C1Miramax
Crew_IDStudioName
Crew
Supporting relships are omitted (why?)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
51
Weak entity sets - relationships
Crew Studio
StudioNameCrew_ID
address
Insurance
IName
Address 1260 7th Av.NYBlueCross
1250 6th Av.NYAetna
AddressIName
InsuranceSubscribes
Unit-of
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
52
Weak entity sets - relationships Non-support relationships for weak ESs are
converted keys include entire weak ES key
C21
C22
C21
Crew_ID
Aetna
BlueCross
Aetna
Insurer
Universal
Disney
Universal
StudioName
Subscribes
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
53
Conversion example Video store rental example, plus some atts
Q: Conversion to relations?
Rental
VideoStore
Customer
Movie
date
yearMNameaddress
Cname
MID
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
54
Conversion example, continued Resulting binary-relationship version
Q: Conversion to relations?
Rental
Customer
Store
Movie
StoreOf
MovieOf
BuyerOf
dateyearMName
address
Cname
MID
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
55
Converting inheritance hierarchies No best way Several non-ideal methods:
E/R-style: each ES relation OO-style: each possible “object” relation nulls-style: each rooted hierarchy relation
non-applicable fields filled in with nulls
Pros & cons for each method, exist situations favoring it
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
56
Converting inheritance hierarchies
Movies
Cartoons Murder-Mysteries
isa isaVoices
Weapon
stars
length title year
Lion King
Component
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
57
Inheritance: E/R-style conversion Each ES relation
Root entity set: Movies(title, year, length)
1301993Lion King
1988
1990
1980
Year
110
115
120
length
Roger Rabbit
Scream
Star Wars
Title
Knife1990R. Rabbit
1988
Year
Knife
murderWeapon
Scream
Title
Subclass: MurderMysteries(title, year, murderWeapon)
Subclass: Cartoons(title, year)
1993Lion King
1990
Year
Roger Rabbit
Title
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
58
E/R-style & quasi-redundancy Name and year of Roger Rabbit were listed in three
different rows (in different tables) Suppose title changes (“Roger” “Roget”)
must change all three places Q: Is this redundancy? A: No!
name and year are independent multiple movies may have same name
Real redundancy reqs. dependency two rows agree on SSN must agree on rest
conflicting hair colors in these rows is an error two rows agree on movie title may still disagree
conflicting years may be correct – or may not be Better: introduce “movie-id” key att
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
59
Subclasses: object-oriented approach Every possible “subtree” (what’s this?):
1. Movies
2. Movies + Cartoons
3. Movies + Murder-Mysteries
4. Movies + Cartoons + Murder-Mysteries
Title Year length
Star Wars 1980 120
Title Year length Murder-Weapon
Scream 1988 110 Knife
Title Year length
Lion King 1990 115
Title Year length Murder-Weapon
Roger Rabbit 1988 110 Knife
1. 3.
2. 4.
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
60
Subclasses: nulls approach One relation for entire hierarchy Any non-applicable fields are NULL
Q: How do we know if a movie is a MM? Q: How do we know if a movie is a cartoon?
Title Year length Murder-Weapon
Star Wars 1980 120 NULL
Lion King 1993 130 NULL
Scream 1988 110 Knife
Roger Rabbit 1990 115 Knife
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
61
Agenda Last time: relational model Homework 1 is up, due next Tuesday This time:
1. Functional dependencies Keys and superkeys in terms of FDs Finding keys for relations
2. Rules for combining FDs Next time: anomalies & normalization
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
62
Next topic: Functional dependencies (3.4) FDs are constraints
part of the schema can’t tell from particular relation instances FD may hold for some instances “accidentally”
Finding all FDs is part of DB design Used in normalization
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
63
Functional dependencies Definition:
Notation: Read: Ai functionally determines Bj
If two tuples agree on the attributes
A1, A2, …, AnA1, A2, …, An
then they must also agree on the attributes
B1, B2, …, BmB1, B2, …, Bm
A1, A2, …, An B1, B2, …, BmA1, A2, …, An B1, B2, …, Bm
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
64
Typical Examples of FDs Product
name price, manufacturer
Person ssn name, age father’s/husband’s-name last-name zipcode state phone state (notwithstanding inter-state area codes)
Company name stockprice, president symbol name name symbol
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
65
To check A B, erase all other columns; for each rows t1, t2
i.e., check if remaining relation is many-one no “divergences” i.e., if AB is a well-defined function thus, functional dependency
Functional dependencies
Bm...B1Am...A1
t1
t2
if t 1, t 2 agree here then t 1, t 2 agree here
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
66
FDs Example
Product(name, category, color, department, price)
name colorcategory departmentcolor, category price
name colorcategory departmentcolor, category price
Consider these FDs:
What do they say ?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
67
FDs ExampleFDs are constraints:• On some instances they hold• On others they don’t
name category color department price
Gizmo Gadget Green Toys 49
Tweaker Gadget Green Toys 99
Does this instance satisfy all the FDs ?
name colorcategory departmentcolor, category price
name colorcategory departmentcolor, category price
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
68
FDs Example
name category color department price
Gizmo Gadget Green Toys 49
Tweaker Gadget Black Toys 99
Gizmo Stationary GreenOffice-supp.
59
What about this one ?
name colorcategory departmentcolor, category price
name colorcategory departmentcolor, category price
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
69
Q: Is PositionPhone an FD here?
A: It is for this instance, but no, presumably not in general
Others FDs? EmpID Name, Phone, Position but Phone Position
Recognizing FDs
EmpID Name Phone PositionE0045 Smith 1234 ClerkE1847 John 9876 SalesrepE1111 Smith 9876 SalesrepE9999 Mary 1234 Lawyer
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
70
Keys of relations {A1A2A3…An} is a key for relation R if
1. A1A2A3…An functionally determine all other attributes
Usual notation: A1A2A3…An B1B2…Bk
rels = sets distinct rows can’t agree on all Ai
2. A1A2A3…An is minimal No proper subset of A1A2A3…An functionally determines
all other attributes of R
Primary key: chosen if there are several possible keys
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
71
Keys example Relation: Student(Name, Address, DoB,
Email, Credits) Which (/why) of the following are keys?
SSN Name, Address (on reasonable assumptions) Name, SSN Email, SSN Email
NB: minimal != smallest
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
72
Superkeys A set of attributes that contains a key Satisfies first condition:
functionally determines every other attribute in the relation
Might not satisfy the second condition: minimality may be possible to peel away some attributes
from the superkey keys are superkeys
key are special case of superkey superkey set is superset of key set
name;ssn is a superkey but not a key
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
73
Discovering keys for relations Relation entity set
Key of relation = (minimized) key of entity set Relation binary relationship
Many-many: union of keys of both entity sets Many(M)-one(O): only key of M (Why?) One-one: key of either entity set (but not both!)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
74
Example – entity sets Key of entity set = (minimized) key of relation
Student(Name, Address, DoB, SSN, Email, Credits)
Student
Name
Address
DoB
SSN
Credits
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
75
Example – many-many Many-many key: union of both ES keys
Student Enrolls Course
SSN Credits CourseID Name
Enrolls(SSN,CourseID)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
76
Example – many-one Key of many ES but not of one ES
keys from both would be non-minimal
Course MeetsIn Room
CourseID Name RoomNo Capacity
MeetsIn(CourseID,RoomNo)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
77
Example – one-one Keys of both ESs included in relation Key is key of either ES (but not both!)
Husbands Married Wives
SSN Name SSN Name
Married(HSSN, WSSN) or
Married(HSSN, WSSN)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
78
Discovering keys: multiway Multiway relationships:
Multiple ways – may not be obvious R:F,G,HE is many-one E’s key is included
but not part of key Recall that relship atts are implicitly many-one
Course Enrolls Student
CourseID Name SSN NameSection
RoomNo Capacity
Enrolls(CourseID,SSN,RoomNo)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
79
Combining FDs (3.5)If some FDs are satisfied, thenothers are satisfied too
If all these FDs are true:name colorcategory departmentcolor, category price
name colorcategory departmentcolor, category price
Then this FD also holds: name, category pricename, category price
Why?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
80
Splitting & combining FDsSplitting rule:
Combining rule:
Note: doesn’t apply to the left side
Q: Does it apply to the left side?
A1A2…An B1B2…BmA1A2…An B1B2…Bm
A1, A2, …, An B1
A1, A2, …, An B2
. . . . .A1, A2, …, An Bm
A1, A2, …, An B1
A1, A2, …, An B2
. . . . .A1, A2, …, An Bm
Bm...B1Am...A1
t1
t2
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
81
Reflexive rule: trivial FDs
FD A1A2…An B1B2…Bk may be Trivial: Bs are a subset of As Nontrivial: >=1 of the Bs is not among the As Completely nontrivial: none of the Bs is among the As
Trivial elimination rule: Eliminate common attributes from Bs, to get an equivalent
completely nontrivial FD
A1, A2, …, An AiA1, A2, …, An Ai with i in 1..n is a trivial FD
A1 … An
t
t’
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
82
Transitive ruleIf
and
then
A1, A2, …, An B1, B2, …, BmA1, A2, …, An B1, B2, …, Bm
B1, B2, …, Bm C1, C2, …, CpB1, B2, …, Bm C1, C2, …, Cp
A1, A2, …, An C1, C2, …, CpA1, A2, …, An C1, C2, …, Cp
A1 … Am B1 … Bm C1 ... Cp
t
t’
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
83
Example R(A,B,C) Each of three determines other two Q: What are the FDs?
Closure of singleton sets Closure of doubletons
Q: What are the keys? Q: What are the minimal bases?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
84
Examples of Keys Product(name, price, category, color)
name, category price
category color
Keys are: {name, category}
Enrollment(student, address, course, room, time)student address
room, time course
student, course room, time
Keys are: [in class]
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
85
Agenda Last time: FDs Project part 2 up soon This time:
1. Anomalies
2. Normalization Next time: Relational Algebra
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
86
Review examples: finding FDs Product(name, price, category, color)
name, category price
category color
Keys are: {name, category}
Enrollment(student, address, course, room, time)student address
room, time course
student, course room, time
Keys are: [in class]
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
87
Another review example Relation R(A,B,C) Each of three attributes determines other two Q: What are the FDs?
Closure of singleton sets Closure of doubletons
Q: What are the keys? Q: What are the minimal bases?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
88
Next topic: Anomalies (3.6) Identify anomalies in existing schema How to decompose a relation Boyce-Codd Normal Form (BCNF) Recovering information from a decomposition Third Normal Form
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
89
Types of anomalies Redundancy
Repeat info unnecessarily in several tuples Update anomalies:
Change info in one tuple but not in another Deletion anomalies:
Delete some values & lose other values too Insert anomalies:
Inserting row means NULL-ing some fields
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
90
Example of anomalies
Redundancy: name, address Update anomaly: Bill moves Delete anom.: Bill doesn’t pay bills, lose phones lose Bill! Underlying cause: SSN-phone is many-many Effect: partial dependency ssn name, address
Name SSN Mailing-address Phone
Michael 123 NY 212-111-1111
Michael 123 NY 917-111-1111
Hilary 456 DC 202-222-2222
Hilary 456 DC 914-222-2222
Bill 789 Chappaqua 914-222-2222
Bill 789 Chappaqua 212-333-3333
SSN Name, Mailing-addressSSN Name, Mailing-address SSN PhoneSSN Phone
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
91
Decomposition by projection Soln: replace anomalous R with projections of
R onto two subsets of attributes Projection: an operation in Relational Algebra
projection = SELECT in SQL
Projecting R onto attributes (A1,…,An) means removing all other attributes Result of projection is another relation Yields tuples whose fields are A1,…,An
Resulting duplicates ignored
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
92
Projection for decomposition
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
A1, ..., An B1, ..., Bm C1, ..., Cp = all attributes, usually disjoint setsR1 and R2 may (/not) be reassembled to produce original R.
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)R2(A1, ..., An, C1, ..., Cp)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
93
Chappaqua789Bill
NY123Hilary
NY123Michael
Mailing-addressSSNName
Decomposition example
The anomalies are gone No more redundant data Easy to for Bill to move Okay for Bill to lose all phones
Break the relation into two:
Name SSN Mailing-address Phone
Michael 123 NY 212-111-1111
Michael 123 NY 917-111-1111
Hilary 456 DC 202-222-2222
Hilary 456 DC 914-222-2222
Bill 789 Chappaqua 914-222-2222
Bill 789 Chappaqua 212-333-3333
212-333-3333789
914-222-2222789
914-222-2222567
202-222-2222456
917-111-1111123
212-111-1111123
PhoneSSN
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
94
Thus: high-level strategy
PersonbuysProduct
name
price name ssn
Conceptual Model:
Relational Model:plus FD’s
Normalization:Eliminates anomalies
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
95
Using FDs to produce good schemas Start with set of relations Define FDs (and keys) for them based on real
world Transform your relations to “normal form”
(normalize them) Do this using “decomposition”
Intuitively, good design means No anomalies Can reconstruct all original information
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
96
Decomposition terminology Projection: eliminating certain attributes from
relation Decomposition: separating a relation into two by
projection Join: (re)assembling two relations
Whenever a row from R1 and a row from R2 have the same value for att A join to form a row of R3
If the original data can be reproduced after a decomposition by joining the relations, then the decomposition was lossless We join on the attributes R1 and R2 have in common (As)
If it can’t, the decomposition was lossy
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
97
A decomposition is lossless if we can recover:
R(A,B,C)
R1(B,C) R2(B,A)
R’(A,B,C) should be the same as R(A,B,C)
R’ is in general larger than R. Must ensure R’ = R
Decompose
Recover
Lossless DecompositionsLossless Decompositions
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
98
Lossless decomposition Sometimes the data can be reproduced:
(MSOffice, 100) + (MSOffice, WP) (MSOffice, 100, WP) (MSOffice, 100) + (MSOffice, DB) (MSOffice, 100, DB) (Oracle, 1000) + (Oracle, DB) (Oracle, 1000, DB)
Name Price Category
MSOffice 100 WP
Oracle 1000 DB
MSOffice 100 DB
Name Price
MSOffice 100
Oracle 1000
MSOffice 100
Name Category
MSOffice WP
Oracle DB
MSOffice DB
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
99
Lossy decomposition Sometimes it’s not:
(MSOffice, WP) + (100, WP) (MSOffice, 100, WP) (Oracle, DB) + (1000, DB) (Oracle, 1000, DB) (Oracle, DB) + (100, DB) (Oracle, 100, DB) (MSOffice, DB) + (1000, DB) (MSOffice, 1000, DB) (MSOffice, DB) + (100, DB) (MSOffice, 100, DB)
Name Price Category
MSOffice 100 WP
Oracle 1000 DB
MSOffice 100 DB
Name Category
MSOffice WP
Oracle DB
MSOffice DB
Price Category
100 WP
1000 DB
100 DB
What’swrong?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
100
Ensuring lossless decomposition
Examples: name price, so first decomposition was lossless name category, so second decomposition was lossy
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
If A1, ..., An B1, ..., Bm
Then the decomposition is lossless
R1(A1, ..., An, B1, ..., Bm)R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)R2(A1, ..., An, C1, ..., Cp)
Note: don’t necessarily need A1, ..., An C1, ..., Cp
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
101
Quick lossless/lossy example
At a glance: can we decompose into R1(Y,X), R2(Y,Z)?
At a glance: can we decompose into R1(X,Y), R2(X,Z)?
X Y Z
1 2 3
4 2 5
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
102
Next topic: Normal Forms First Normal Form = all attributes are atomic
As opposed to set-valued Assumed all along
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
103
Most important: BCNFA simple condition for removing anomalies from relations:
I.e.: The left side must always contain a keyI.e: If a set of attributes determines other attributes, it must determine all the attributes
A relation R is in BCNF if:
If As Bs is a non-trivial dependency
in R , then As is a superkey for R
A relation R is in BCNF if:
If As Bs is a non-trivial dependency
in R , then As is a superkey for R
Codd: Ted Codd, IBM researcher, inventor of relational model, 1970
Boyce: Ray Boyce, IBM researcher, helped develop SQL in the 1970s
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
104
BCNF decomposition algorithmRepeat choose A1, …, Am B1, …, Bn that violates the BNCF condition
split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2
Until no more violations
Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition
split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2
Until no more violations
A’s OthersB’s
R1 R2
//Heuristic: choose Bs as large as possible
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
105
Boyce-Codd Normal Form Name/phone example is not BCNF:
{ssn,phone} is key FD: ssn name,mailing-address holds
Violates BCNF: ssn is not a superkey
Its decomposition is BCNF Only superkeys anything else
Name SSN Mailing-address Phone
Michael 123 NY 212-111-1111
Michael 123 NY 917-111-1111
Name SSN Mailing-address
Michael 123 NY
SSN PhoneNumber
123 212-111-1111
123 917-111-1111
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
106
BCNF Decomposition Larger example: multiple decompositions {Title, Year, Studio, President, Pres-Address} FDs:
Title Year Studio Studio President President Pres-Address Studio President, Pres-Address (why?)
No many-many this time Problem cause: transitive FDs:
Title,year studio president
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
107
BCNF Decomposition Illegal: As Bs, where As don’t include key Decompose: Studio President, Pres-Address
As = {studio} Bs = {president, pres-address} Cs = {title, year}
Result:1. Studios(studio, president, pres-address)2. Movies(studio, title, year)
Is (2) in BCNF? Is in (1) BCNF? Key: Studio FD: President Pres-Address Q: Does president studio? If so, president is a key But if not, it violates BCNF
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
108
BCNF Decomposition Studios(studio, president, pres-address) Illegal: As Bs, where As don’t include key Decompose: President Pres-Address
As = {president} Bs = {pres-address} Cs = {studio}
{Studio, President, Pres-Address} becomes {President, Pres-Address} {Studio, President}
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
109
BCNF and two-att relations Must a two-attribute relation be in BCNF?
Case 1: there are no non-trivial FDs Case 2: A B but not B A Case 3: B A but not A B Case 4: Both A B and B A
Note that relations may have multiple keys BCNF requires a key on the left, not all keys
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
110
Agenda Last time: Normalization Homework 1 due now Project part 2 is up, due on the 19th (Thurs.) This time:
1. Finish BCNF
2. 3NF
3. 4NF
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
111
BCNF Review Q: What’s required for BCNF?
Q: What’s the slogan for BCNF?
Q: Who are B & C?
Q: What are the two types of violations?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
112
BCNF Review Q: How do we fix a non-BCNF relation?
Q: If AsBs violates BCNF, what do we do? Q: In this case, could the decomposition be lossy?
Q: Under what circumstances could a decomposition be lossy?
Q: How do we combine two relations?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
113
Decomposition algorithm example R(N,O,R,P) F = {N O, O R, R N}
Key: N,P Violations of BCNF: N O, OR, N OR
which kinds of violations are these? Pick N OR (on board) Can we rejoin? (on board) What happens if we pick N O instead? Can we rejoin? (on board)
Name Office Residence Phone
George Pres. WH 202-…
George Pres. WH 486-…
Dick VP NO 202-…
Dick VP NO 307-…
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
114
Lossless BCNF decomposition Consider simple relation: R(A,B,C) Only FD: A B (assume C!A) Key: A,C
Diff vars from text! Also goes through if assumption is false BCNF violation (which kind?): no key on the left Thus: Decomposition to BCNF: Create R1(A,B) and R2(A,C) Could this be lossy? We will join R1 and R2 on A to find out
Q: If C A, then what kind do we have?
Q: Since C ! A, what kind of bad FD do we have?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
115
Lossless BCNF decomposition Suppose R contains (b,a,c) and (b’,a,c’) In projection onto (B,A):
(b,a,c) (b,a), (b’,a,c’) (b’,a) In projection onto (A,C):
(b,a,c) (a,c), (b’,a,c’) (a,c’) In joining, (b’,a), (a,c) (b’,a,c) Q: Is/must/can this be correct? A: Yes! A B, so b = b’ So this was lossless We assumed C!A, but argument also goes
through when CA Moral: BCNF decomp alg is always lossless
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
116
BCNF summary BCNF decomposition is lossless
Can reproduce original by joining Saw last time: Every 2-attribute relation is in
BCNF Final set of decomposed relations might be
different depending on Order of bad FDs chosen
Saw last time: But all results will be in BCNF
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
117
A problem with BCNF Relation: R(Title, Theater, Neighboorhood) FDs:
Title,N’hood Theater Assume movie can’t play twice in same neighborhood
Theater N’hood Keys:
{Title, N’hood} {Theater, Title}
Title Theater N’hood
City of God Angelica Village
Fog of War Angelica Village
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
118
A problem with BCNF BCNF violation: Theater N’hood Decompose:
{Theater, N’Hood} {Theater, Title}
Resulting relations:
VillageAngelica
N’hoodTheater
R1
Fog of WarAngelica
City of GodAngelica
TitleTheater
R2
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
119
Problem - continued Suppose we add new rows to R1 and R2:
Their join:
City of GodVillageFilm Forum
Village
Village
N’hood
Fog of War
City of God
Title
Angelica
Angelica
Theater
(R’)
Theater N’hood
Angelica Village
Film Forum Village
Theater Title
Angelica City of God
Angelica Fog of War
Film Forum City of God
R1 R2
A and B could not enforce FD Title,N’hood Theater
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
120
Third normal form: motivation There are some situations in which
BCNF is not dependency-preserving, and Efficient checking for FD violation on updates is
important
In these cases BCNF is too severe a req. Solution: define a weaker normal form, called
Third Normal Form in which FDs can be checked on individual relations
without performing a join (no inter-relational FDs) to which relations can be converted, preserving both
data and FDs
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
121
Third Normal Form BCNF decomposition is not dependency-preserving! We now define the (weaker) Third Normal Form
Turns out: this example was already in 3NF
A relation R is in 3rd normal form if :
For every nontrivial dependency A1, A2, ..., An Bfor R, {A1, A2, ..., An } is a super-key for R, or B is part of a key, i.e., B is prime
A relation R is in 3rd normal form if :
For every nontrivial dependency A1, A2, ..., An Bfor R, {A1, A2, ..., An } is a super-key for R, or B is part of a key, i.e., B is prime
Tradeoff:BCNF = no FD anomalies, but may lose some FDs3NF = keeps all FDs, but may have some anomalies
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
122
BCNF: vices and virtues Be clear on the problem just described v. the
arg. that BCNF decomp is lossless BCNF decomp does not lose data
Resulting relations can be rejoined to obtain the original
But: it can can lose dependencies After decomp, possible to add rows whose
corresponding rows would be illegal in (rejoined) original
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
123
Recap: goals of normalization When we decompose a relation R with FDs F into
R1..Rn we want:
1. lossless-join decomposition – no data lost
2. no/little redundancy: the relations Ri should be in either BCNF or at least 3NF
3. Dependency preservation: if Fi be the set of dependencies in F+ that include only attributes in Ri:
F is the “sum” of the FDs of the new relations (F1 F2 F3 … Fn)+ = F+
Otherwise checking updates for violation of FDs may require computing joins, which is expensive
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
124
Dependency preservation Saw that last req. didn’t hold in move-theater
example Did it hold in R(N,O,R,P) example?
(on board)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
125
Testing for 3NF For each dependency X Y, use attribute closure
to check if X is a superkey If X is not a superkey, verify that each attribute in Y
is prime This test is rather more expensive, since it involves finding
candidate keys Testing for 3NF is NP-complete (in what?) Interestingly, decomposition into 3NF can be done in
polynomial time Testing for 3NF is harder than decomposing into 3NF!
Optimization: need to check only FDs in F, need not check all FDs in F+ (why?)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
126
3NF Example R = (J, K, L) F = (JK L, L K) Two candidate keys: JK and JL R is in 3NF
JK L JK is a superkey L K K is prime
BCNF decomposition yields R1 = (L,K), R2 = (L,J)
testing for JK L requires a join There is some redundancy in R
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
127
BCNF and 3NF Comparison Example of problems due to redundancy in 3NF
R = (J, K, L) F = (JK L, L K)
A schema that is in 3NF but not BCNF has the problems of: redundancy (e.g., the relationship between l1 and k1) need to use null values (if allowed!), e.g. to represent the
relationship between l2 and k2 when there is no corresponding value for attribute J
J K L
j1 k1 l1
j2 k1 l1
j3 k1 l1
NULL k2 l2
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
128
Comparison of BCNF and 3NF It is always possible to decompose a relation
into relations in 3NF such that: the decomposition is lossless the dependencies are preserved
It is always possible to decompose a relation into relations in BCNF such that: the decomposition is lossless but it may not be possible to preserve
dependencies But may eliminate more redundancy
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
129
The Normal Forms (so far) 1NF: every attribute has an atomic value 2NF: 1NF and no partial dependencies 3NF: for each FD X Y either
it is trivial, or X is a superkey, or Y is a part of some key
BCNF: 3NF and third 3NF option disallowed I.e, 2NF and no transitive dependencies
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
130
Distinguishing examples 1NF but not 2NF: R(Name, SSN ,Mailing-
address,Phone) Key: SSN,Phone Partial: ssn name, address
2NF but not 3NF: R(Title,Year,Studio,Pres,Pres-Addr) Key: Title,Year Transitive: studio president
3NF but not BCNF: R(Title, Theater, N’hood) Title,N’hood Theater Prime-on-right: Theater N’hood
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
131
Design Goals Goal for a relational database design is:
No redundancy Lossless Join Dependency Preservation
If we cannot achieve this, we accept one of dependency loss use of more expensive inter-relational methods to preserve
dependencies data redundancy due to use of 3NF
Interesting: SQL does not provide a direct way of specifying FDs other than superkeys can specify FDs using assertions, but they are expensive to test
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
132
3NF 3NF means we may have anomalies Example: TEACH(student, teacher, subject)
student, subject teacher (students not allowed in the same subject with two teachers)
teacher subject (each teacher teaches one subject) Subject is prime, so this is 3NF
But we have anomalies: Insertion: cannot insert a teacher until we have a
student taking his subject If we convert to BCNF, we lost student,
subject teacher
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
133
BCNF and over-normalization What is the problem? Schema overload – trying to capture two meanings:
1) subject X can be taught by teacher Y 2) student Z takes subject W from teacher V
What to do? 3NF has anomalies, normalizing to BCNF loses FDs One soln: keep the 3NF TEACH and another
(BCNF) relation SUBJECT-TAUGHT (teacher, subject)
Still (more!) redundancy, but no more insert and delete anomalies
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
134
Normalization Review Q: What’s required for BCNF?
Q: What are the two types of violations?
Q: What’s the loophole for 3NF?
Q: How do we fix a non-BCNF relation?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
135
Normalization Review Q: If AsBs violates BCNF, what do we do?
Q: In this case, could the decomposition be lossy?
Q: How do we combine two relations?
Q: Can BCNF decomp. lose FDs?
Q: Can 3NF decomp. lose FDs?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
136
Redundancy in BCNF
Lots of redundancy! Key? All fields
None determined by others! Non-trivial FDs? None! In BCNF? Yes!
Name Streets Citys Jobs
Michael 111 East 60th Street New York Mayor
Michael 222 Brompton Road London Mayor
Michael 111 East 60th Street New York CEO
Michael 222 Brompton Road London CEO
Hilary 333 Some Street Chappaqua Senator
Hilary 444 Embassy Row Washington Senator
Hilary 333 Some Street Chappaqua First Lady
Hilary 444 Embassy Row Washington First Lady
Hilary 333 Some Street Chappaqua Lawyer
Hilary 444 Embassy Row Washington Lawyer
Now what? New concept, leading
to another normal form: Multivalued
dependencies
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
137
E/R to relational model
courses Depts
Computer-allocation
room
number
givenBy
name chair