cpsc 504: data management review of relational model 1/2 laks v.s. lakshmanan dept. of cs ubc

21
CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

Post on 20-Dec-2015

225 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

CPSC 504: Data ManagementReview of Relational Model

1/2Laks V.S. Lakshmanan

Dept. of CS UBC

Page 2: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

The Basic Model

• A DB = a (finite) set of (finite) relations. • A relation has a well-defined schema – its

set of attributes, each with its domain: e.g., dom(songName), dom(Artist), dom(AlbumNo), dom(fileFormat), …

• A relation – a subset of the Cartesian (aka cross) product of its attribute domains:

• e.g., myfavs dom(AlbumNo) x dom(songName) x dom(Artist) x dom(fileFormat).

Page 3: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

How do you get there?

• In practice, you often use the ER (for entity-relationship) model as an informal spec of a design of a relational database schema.

• Schema = structure or table template.

• Relation = table instance.

Page 4: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

Example ER schema

Actor MovieStars in

Name

DOB Addr

Role

Title Year Dir

A schema in the ER model is specified using an ER diagram.

Page 5: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

Some constraints• Each relation has a key: candidate keys,

primary keys, superkeys, …? • Functional dependency (FD): formalization

and generalization of key. • r(R), X, Y R. r |= XY iff any pair of tuples

in r that agree on X, necessarily agree on Y. • e.g.: AlbumNoArtist, AlbumNo, songNameAlbumNo, …,

fileFormat. • {AlbumNo, songName} – a key, but not

AlbumNo. • Note: ER diagrams can express a rich class

of (but not all) constraints.

Page 6: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

More on FDs • FDs help determine keys: there are efficient

algorithms as well as an assortment of (computationally) hard problems.

• FDs are pointers to redundancies in data: e.g., same Artist (value) will appear many times, once for each song in an album of that artist.

• Why is redundancy (in representation) bad? efficient algorithms for redundancy removal –

the so-called normal forms. • There is a full assortment of them – 1NF, 2NF,

3NF, BCNF, 4NF, … • FDs/keys help with indexing, with query

optimization, with data exchange, etc.

Page 7: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

More on Normal Forms

• 1NF really says tuples must be “flat”. • Other NFs really seek to capture different

degrees of redundancy (and eliminate them).

• 3NF, BCNF – some of the most important ones in practice.

• Normalization algorithms: start from U, the set of all attributes, and figure out which attr sets should stay together, which ones should split up.

Page 8: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

Some more on FDs

• Constraints such as FDs are called integrity constraints. Why?

• ICs are statements in predicate logic. • What does an FD really say? • Need to reason with ICs, so:

– Can determine keys, – Can check if certain ICs are enforced. – Can understand interaction of ICs.

Page 9: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

Last bits on FDs

• FDs admit a sound and complete axiom system: – Reflexivity: if X Y, then YX. – Augmentation: if WZ and X Y, then

WYZX. – Transitivity: if XY and YZ, then XZ.

• Not all classes of ICs admit a S&C axiom system.

• Some consequences of the above axiom system for FDs.

Page 10: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

ICs, Tableaux, and Queries • An FD can also be stated in a

“pictorial” form: A B C D

x1 x2 _ _

x1 x3 _ _

x2 = x3

Schema/columnheaders

Pattern rows.

Summary row/ Consequent/Conclusion.

Page 11: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

ICs, Tableuax, and Queries

• Tableaux are general; can express what conditions must hold when certain patterns are present in data.

• Can express equality generating dependencies (egd’s) and tuple generating dependencies (tgd’s).

• Can also express queries (by using summary row for saying what you want in the output).

Page 12: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

ICs, Tableaux, and Queries • Tableaux can also be expressed in

the form of rules, also known as “Horn clauses”.

• Examples: X2 = X3 r(X1,X2,_,_), r(X1,X3,_,_).

Which really is saying … X2 = X3 r(X1,X2,Y3,Y4),

r(X1,X3,Y5,Y6). Logically speaking, what are these

rules saying? p(X,Y) a(X,Z), b(Z,Y), a(Y,W), W>100.

Page 13: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

Some more on NFs – 3NF

• Def. of 3NF: r(R), given set of applicable FDs F, is in 3NF iff for every XA that holds for r, X is a superkey for r according to F OR A belongs to some key.

• e.g.: r(AlbumNo, Artist, songName, fileFormat) is not in 3NF, while r1(AlbumNo, Artist), r2(AlbumNo, songName, fileFormat) are. Why?

Page 14: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

Some more on NFs – BCNF

• Def. of BCNF: r(R), given applicable FDs F, is in BCNF iff for every XA that holds for r, X is a superkey for r according to F.

• Clearly stronger than 3NF. • e.g.: {S(treet), (s)T(ate), Z(ip)}, with

STZ, ZT. – it’s in 3NF. • Not in BCNF. What’s the redundancy here? • {ZT, ZS} is in BCNF. • Can you anticipate any problems with the

design {ZT, ZS}? • Majority of 3NF designs in practice tend to

be in BCNF.

Page 15: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

More on ICs

• Inclusion depencency (IND): r1(Emp, Dept, Sal), r2(Mgr, Dept, Budget). r2[Mgr] r1[Emp]: every manager is an employee.

• INDs are a generalization of referential integrity constraints.

• FDs, INDs. – most fundamental. • What can we say about reasoning

with INDs, or with INDs + FDs?

Page 16: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

More on INDs

• Good news -- a complete axiomatization. • Reflexivity: r[X]r[X]. • Projection+permutation: if r[A1,…,Am]

s[B1…Bm], then r[Ai1…Aik] s[Bi1…Bik]. • Transitivity: if r[X] s[Y] and s[Y]

t[Z], then r[X] t[Z]. • But, while testing whether F |= f can be

done in linear time, testing I |= i is PSPACE-complete!

Page 17: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

Story of FDs+INDs

• Recall: FDs admit complete axiomatization, as do INDs.

• But, put together, there is no complete (k-ary) axiomatization for INDs+FDs!

• We have seen some (very) positive results and some negative results.

• What approaches can you think of for tackling the negative situations?

Page 18: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

Last bits on ICs • There is a rich landscape of lot more classes of

ICs. • Fundamentally split into two syntactic classes:

– Equality generating dependencies (egds), e.g., FDs. – Tuple generating dependencies (tgds), e.g., INDs.

• The value is NOT that these fancier ICs arise as is in practice.

• Rather, they arise in solving various problems: e.g., integrating data from diverse sources.

• E.g. tgd: (\forall M,D,B) [r2(M, D, B) (\exists S) r1(M, D, S)].

• What is it saying, logically speaking?

Page 19: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

More Reading

• J.D. Ullman: Principles of Database and Knowledge-Base Systems, vol. I and II. CS Press, 1988.

• R. Ramakrishnan & J. Gehrke. Database Management Sytems. Mc-Graw Hill, latest edition.

• S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. 1995. Available online from DBLP.

Page 20: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

Some Resources for Research Literature

• DBLP • Citeseer• ACM digital library• If you aren’t already a member,

consider becoming ACM student member: really a great deal!

Page 21: CPSC 504: Data Management Review of Relational Model 1/2 Laks V.S. Lakshmanan Dept. of CS UBC

Some top conferences/journals for DMM Research

• ACM SIGMOD/PODS • VLDB / PVLDB • IEEE ICDE • ICDT / EDBT • KDD, PKDD, ICDM, SDM, … • CIKM • WWW, WSDM, ICWSM, • TODS, IS, VLDBJ, TKDE, TCS, JACM,

JCSS, I&C, …