data modeling cmpsci 445avid.cs.umass.edu/courses/445/f2008/07-datamodeling.pdf · data modeling...

35
1 Data Modeling CMPSCI 445 Fall 2008

Upload: others

Post on 02-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

1

Data ModelingCMPSCI 445

Fall 2008

Page 2: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

Exercise

• Design a schema to store the data used by iTunes (or a similar application)

• Things to keep in mind:– Can all necessary information be represented?– Are properties represented in more than one

place?– Are common operations efficient?

2

Page 3: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

iTunes information

• Database includes: songs, artists, albums, album artwork, playlists

• Sample fields include: song name, artist, year, albumName, trackNum, genre, lastPlayedDate, playCount, albumArtwork, songRating, albumRating?, albumReleaseDate, artistDOB

• playLists: an ordered list of songs 3

Page 4: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

4

Relational Schema Design

PersonbuysProduct

name

price name ssn

ER Model

Relational Schemaplus Integrity Constraints

Normalized schema

Conceptual Design

Logical design

SchemaRefinement

Page 5: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

5

Entity / Relationship Diagrams

Entity sets Product

address

buys

Attributes

Relationships

Page 6: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

6address name ssn

Person

buys

makes

employs

CompanyProduct

name category

stockprice

name

price

Page 7: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

7

Keys in E/R Diagrams

• Every entity set must have a key

Product

name category

price

Page 8: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

8

Multiplicity of E/R Relations

• one-one:

• many-one

• many-many

123

abcd

123

abcd

123

abcd

Page 9: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

9address name ssn

Person

buys

makes

employs

CompanyProduct

name category

stockprice

name

price

“A product is made by at most one company”

Page 10: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

10

Referential Integrity Constraints

CompanyProduct makes

CompanyProduct makes

Each product made by at most one company.(Some products made by no company)

Each product made by exactly one company.

Page 11: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

11

Multi-way Relationships

Purchase

Product

Person

Store

Page 12: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

Q: what does the arrow mean ?

Rental

VideoStore

Person

Movie

Invoice

Arrows in Multiway Relationships

A: if I know the store, person, invoice, I know the movie too

Page 13: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

13

Q: what do these arrow mean ?

Rental

VideoStore

Person

Movie

Invoice

Arrows in Multiway Relationships

A: store, person, invoice determines movie and store, invoice, movie determines person

Page 14: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

14

Roles in Relationships

Purchase

What if we need an entity set twice in one relationship?

Product

Person

Store

salesperson buyer

Page 15: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

15

Attributes on Relationships

Purchase

Product

Person

Store

date

Page 16: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

Product

name category

price

isa isa

Educational ProductSoftware Product

Age Groupplatforms

Subclasses

Page 17: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

17

Design Principles

PurchaseProduct Person

What’s wrong?

Page 18: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

18

Design Principles:What’s Wrong?

Purchase

Product

Store

date

personNamepersonAddr

Moral: pick the right kind of entities.

Page 19: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

19

Design Principles:What’s Wrong?

Purchase

Product

Person

Store

dateDates

Moral: don’t complicate life unnecessarily.

Page 20: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

20

From E/R Diagramsto a Relational Schema

• Entity set relation• Relationship relation

Page 21: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

21

Entity Set to Relation

Product

name category

price

Product(name, category, price)

name category price

gizmo gadgets $19.99

Page 22: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

22

Relationships to Relations

makes CompanyProduct

name category

Stock price

name

Makes(product-name, product-category, company-name, year)

Start Yearprice

(watch out for attribute name conflicts)

Page 23: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

23

Relationships to Relations

makes CompanyProduct

name category

Stock price

name

No need for Makes. Modify Product:

Product(name, category, price, startYear, companyName)

Start Yearprice

Page 24: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

24

Multi-way Relationships to Relations

Purchase

Product

Person

Storename price

ssn name

name address

Purchase(prodName,stName,ssn)

Page 25: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

25

Product

name category

price

isa isa

Educational ProductSoftware Product

Age Groupplatforms

Name Price Category

Gizmo 99 gadget

Camera 49 photo

Toy 39 gadget

Name platforms

Gizmo unix

Name Age Group

Gizmo todler

Toy retired

Product

Sw.Product

Ed.Product

Page 26: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

Normalization

26

Page 27: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

Evils of Redundancy• When a database schema is poorly designed we get

anomalies.• Redundancy is at the root of several problems

associated with relational schemas:

Redundant storage: data is repeatedUpdate anomalies: need to change in several placesInsertion anomalies: may not be able to add data we want toDeletion anomalies: may lose data when we donʼt want to

Page 28: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

AnomaliesHourly_emps( ssn, name, lot, rating, hourly_wages, hours_worked)

Suppose hourly wages is determined by rating. rating → hourly_wages

ssn name lot rating hourly_wages hours_worked

29 brutus 48 8 10 40

85 art 22 8 10 30

95 bob 35 5 7 30

96 frodo 35 5 7 32

22 dustin 35 8 10 40

•Redundant storage: association between rating 8 and hourly wages 10 repeated 3 times. •Update anomalies: hourly_wages updated in first tuple but not second•Insertion anomalies: must know hourly_wage for rating value•Deletion anomalies: delete all tuples with certain rating value, lost assoc.

Page 29: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

Can null values fix problems?• Not really.• Insertion anomaly:

– What if we know rating and hourly_wages for some rating, but there is no employee with that rating?

– No. ssn canʼt be null.• Deletion anomaly:

– If last employee with some rating and hourly_wages value is deleted, replace with nulls?

– No. ssn canʼt be null.29

Page 30: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

Schema Refinement

• Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.

• Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).

• Decomposition should be used judiciously: Is there reason to decompose a relation? What problems (if any) does the decomposition cause?

29

Page 31: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

Data Anomalies

Anomalies:• Redundancy = repeat data• Update anomalies = Fred moves to “Bellevue”• Deletion anomalies = Joe deletes his phone number: what is his city ?

Persons may have several phones:

SSN Name, City

Name SSN PhoneNumber City

Fred 123-45-6789 206-555-1234 Seattle

Fred 123-45-6789 206-555-6543 Seattle

Joe 987-65-4321 908-555-2121 Westfield

but not SSN PhoneNumber

Page 32: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

32

Relation DecompositionBreak the relation into two:

Name SSN City

Fred 123-45-6789 Seattle

Joe 987-65-4321 Westfield

SSN PhoneNumber

123-45-6789 206-555-1234

123-45-6789 206-555-6543

987-65-4321 908-555-2121

Anomalies have gone:• No more repeated data• Easy to move Fred to “Bellevue” (how ?)• Easy to delete all Joeʼs phone number (how ?)

Name SSN PhoneNumber City

Fred 123-45-6789 206-555-1234 Seattle

Fred 123-45-6789 206-555-6543 Seattle

Joe 987-65-4321 908-555-2121 Westfield

Page 33: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

33

Decompositions in General

R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)

Page 34: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

34

Lossless Decomposition

• Sometimes it is correct:Name Price Category

Gizmo 19.99 GadgetOneClick 24.99 Camera

Gizmo 19.99 Camera

Name Price

Gizmo 19.99OneClick 24.99

Name Category

Gizmo GadgetOneClick Camera

Gizmo Camera

Page 35: Data Modeling CMPSCI 445avid.cs.umass.edu/courses/445/f2008/07-DataModeling.pdf · Data Modeling CMPSCI 445 Fall 2008. Exercise • Design a schema to store the data used by ... Evils

• Sometimes it is not:

Lossy Decomposition

Name Price Category

Gizmo 19.99 Gadget

OneClick 24.99 Camera

Gizmo 19.99 Camera

Name Category

Gizmo Gadget

OneClick Camera

Gizmo Camera

Price Category

19.99 Gadget

24.99 Camera

19.99 Camera

Whatʼswrong ??