introduction to databases: relational and xml models and languages instructors: bertram ludaescher...
TRANSCRIPT
Introduction to Databases:Introduction to Databases:Relational and XML Models Relational and XML Models and Languagesand Languages
Introduction to Databases:Introduction to Databases:Relational and XML Models Relational and XML Models and Languagesand Languages
Instructors:
Bertram LudaescherKai Lin
Instructors:
Bertram LudaescherKai Lin
2Introduction to Databases, B. Ludaescher & K. Lin
Overview
• 09:15-10:20 Relational Databases (1h05’)• 10:20-10:30 BREAK (10’)• 10:30-11:50 Relational Databases (1h20’)• 11:50-13:15 LUNCH (1h25’)• 13:15-13:45 Demo & Hands-on (30’)• 13:45-15:10 XML: Basics (1h25’)• 15:10-15:30 BREAK (20’)• 15:30-16:30 XML: Querying (1h)• 16:30-17:00 Demo & Hands-on (30’)
3Introduction to Databases, B. Ludaescher & K. Lin
Scope• Today: Introduction to Databases, in
particular– Relational database model– Relational Operations and Queries– Constraints– XML “data” model– Querying and transforming XML– Some demos & simple hands-on exercise
• Tomorrow: – Introduction to Knowledge Representation and
Ontologies
• But first: … déjà vu …
4Introduction to Databases, B. Ludaescher & K. Lin
5Introduction to Databases, B. Ludaescher & K. Lin
What is a Database System?
• Database (system) = – Database Instance (set of tables of rows)– Database Management System (DBMS)
• Origins in the commercial world: – to organize, query, and manipulate data
more effectively, efficiently, and independently
• Scientific databases– often special features:
• spatial, temporal, spatiotemporal, GIS, units, uncertainty, raw & derived data, …
6Introduction to Databases, B. Ludaescher & K. Lin
Why not just use files as “databases”?• For some applications: yeah… why not?• But in general:
– scanning & ’grep’ing large files can be very inefficient– no language support for selecting desired data, joining
them, etc.• cannot express the kinds of questions/queries you’d like to ask• ‘grep’ is no substitute for a query language
– redundant and/or inconsistent storage of data– no transaction management and concurrency control
among multiple users– no security– no recovery– no data independence (application data)– no data modeling support– …
7Introduction to Databases, B. Ludaescher & K. Lin
Features of a Database System• A data model (relational, object-
oriented, XML) prescribes how data can be organized:– as relations (tables) of tuples (rows)– as classes of (linked) objects– as XML trees
• A (database) schema (stored in the “data dictionary”) defines the structure of a specific database instance:– Relational schema– OO schema– XML Schema (or XML DTD)
8Introduction to Databases, B. Ludaescher & K. Lin
Features of a Database System• Data is treated uniformly and separately from the
application• Efficient data access • Queries and views are expressed over the schema• Integrity constraints (checking and enforcement)• Transactions combine sets of operations into
logical units (all-or-nothing)• Synchronization of concurrent user transactions• Recovery (after system crash)
– not to be confused w/ backup– instead: guarantee consistency by “roll-back” of partially
executed transactions (how? Hint: logging)
• …
9Introduction to Databases, B. Ludaescher & K. Lin
DB features, e.g., Concurrency Control
• Concurrent execution of simultaneous requests – long before web servers where around... – transaction management guarantees consistency
despite concurrent/interleaved execution• Transaction (= sequence of read/write operations)
– Atomicity: a transaction is executed completely or not at all
– Consistency: a transaction creates a new consistent DB state, i.e., in which all integrity constraints are maintained
– Isolation: to the user, a transaction seems to run in isolation
– Durability: the effect of a successful (“committed”) transaction remains even after system failure
10Introduction to Databases, B. Ludaescher & K. Lin
Levels of Abstraction: Architecture Overview
DBinstances
Physical level
Logical (“conceptual”) level
View 1 View 2 View nConceptual…
Level
Index structures
Tables
Export schemas
ER-Model(Entity-Relationship) OO Models (Classes…)
part of DB design conceptual design
… often lost in the process…
logical dataindependence
physical data independence
User
11Introduction to Databases, B. Ludaescher & K. Lin
Database Design: Entity-Relationship (ER) Model
• Entities:• Relationships:• Attributes:• ER Model:
– initial, high-level DB design (conceptual model)– easy to map to a relational schema (database tables)– comes with more constraints (cardinalities, aggregation) and
extensions: EER (is-a => class hierarchies)– related: UML (Unified Modeling Language) class diagrams
Employee Departmentworks-for
Name Salary ManagerName
since
12Introduction to Databases, B. Ludaescher & K. Lin
The Relational Model
• Relation/Table Name:– employee, dept
• Attributes = Column Names:– Emp, Salary, DeptNo, Name, Mgr
• Relational Schema:– employee(Emp:string,
Salary:integer, DeptNo:integer), ...
• Tuple = Row of the table:– (“tom”, “60000”, “1”)
• Relation = Set of tuples:– {(...), (...), ...}
Emp Salary DNotom 60k 1tim 57k 1sally 45k 3carol 30k 1carol 35k 2….
Employee
DNo Name Mgr 1 Toys carol2 Comp. carol3 Shoes sam
Department
FK: foreign key, pointing to another key
13Introduction to Databases, B. Ludaescher & K. Lin
Ex: Creating a Relational Database in SQL
CREATE TABLE employee (ssn CHAR(11),name VARCHAR(30),deptNo INTEGER,PRIMARY KEY (ssn),
FOREIGN KEY (deptNo) REFERENCES department )
CREATE TABLE department (deptNo INTEGER,name VARCHAR(20),manager CHAR(11),PRIMARY KEY (deptNo),
FOREIGN KEY (manager) REFERENCES employee(ssn) )
14Introduction to Databases, B. Ludaescher & K. Lin
What is a Query?• Intuitively:
– An “executable question” in terms of a database schema
– Evaluating a query Q against a database instance D yields a set of answer objects:
• Relational tuples or XML elements
• Example:– Who are the employees in the ‘Toys’ dept.?– Who is (are) the manager(s) of ‘Tom’?– Show all pairs (Employee, Mgr)
• Technically: – A mapping from an input schema (the given table
schemas) to a result schema (the new columns you are interested in) defined in some query language
15Introduction to Databases, B. Ludaescher & K. Lin
Why (Declarative) Query Languages?
• Things we talk and think about in PLs and QLs … – Assembly languages:
• registers, memory locations, jumps, ...
– C and the likes: • if-then-else, for, while, memory (de-)allocation,
pointers, ...
– Object-oriented languages:• C++: C plus objects, methods, classes, ...• Java: objects, methods, classes, references, ... • Smalltalk: objects, objects, objects, ... • OQL: object-query language
,,Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt.”“The limits of my language mean the limits of my world.”
Ludwig Wittgenstein, Tractatus Logico-Philosophicus
“If you have a hammer, everything looks like a nail.”
16Introduction to Databases, B. Ludaescher & K. Lin
Why (Declarative) Query Languages?
• Things we talk and think about in PLs and QLs …– Functional languages (Haskell, ML):
• (higher-order) functions, fold(l|r), recursion, patterns, ...
=> Relational languages (SQL, Datalog)• relations (tables), tuples (rows); conceptual level: ER• relational operations: , , , , ..., ,,,,,..., , , |X|
=> Semistructured/XML (Tree) & Graph Query Languages
• trees, graphs, nodes, edges, children nodes, siblings, … • XPath, XQuery, …
• Also: – Focus on what, and not how!
17Introduction to Databases, B. Ludaescher & K. Lin
Example: Querying a Relational Database
Emp SalaryDeptNoanne 62k 2john 60k 1
Employee
DeptNoMgr
1 anne2 anne
Department
SELECT e.Emp, d.MgrFROM Employee e, Department dWHERE e.DeptNo = d.DeptNo
Emp Mgrjohn anneanne anne
result
join
input tables
SQL query (or view def.)
answer (or view)
we don’t say how to evaluate this expression
18Introduction to Databases, B. Ludaescher & K. Lin
Example Query: SQL vs DATALOG
• “List all employees and their managers”
• In SQL:SELECT e.name, d.managerFROM Employee e, Department dWHERE e.deptNo = d.deptNo
• In DATALOG:q(E, M) :- employee(E, S, D), department(D, N,
M).
a “join” operation
19Introduction to Databases, B. Ludaescher & K. Lin
Important Relational Operations• select(R, Condition)
– filter rows of a table wrt. a condition
• project(R, Attr) – remove unwanted columns; keep rest
• join(R1, A2, R2, A2, Condition) – find “matches” in a “related” table – e.g. match R1.foreign key = R2.primary key
• cartesian product(R1, R2)• union (“OR”), intersection (“AND”)• set-difference (“NOT IN”)
20Introduction to Databases, B. Ludaescher & K. Lin
Relational Operations (in DATALOG)
condition
Y1=Y2Y
independent
same
multiple rules union results
(query) output :– (query) input(query) output :– (query) input
21Introduction to Databases, B. Ludaescher & K. Lin
Demo
Relational Queries in DATALOG
22Introduction to Databases, B. Ludaescher & K. Lin
Queries, Views, Integrity Constraints• … can all be seen as “special queries”
• Query q(…) :- … ad-hoc queries
• View v(…) :- … exported views;
• Integrity Constraints– ic (…) :- …. MgrSal < EmpSal … – say what shouldn’t happen– if it does: alert the user (or refuse an update,
…)
23Introduction to Databases, B. Ludaescher & K. Lin
Query Evaluation vs Reasoning• Query evaluation
– Given a database instance D and a query Q, run Q(D)– What databases do all the time
• Reasoning (aka “Semantic Query Optimization”)– Given a query Q and a constraint C, “optimize” Q&C
(e.g., given C, Q might be unsatisfiable)– Given Q1 and Q2 decide whether Q1 Q2– Given Q1,Q2, C decide whether Q1 Q2 | C
– Note: we are NOT given a database instance D here; just the schema and the query/IC expressions
24Introduction to Databases, B. Ludaescher & K. Lin
Summary QLs for Relational Databases
Natural Hoin: same attribute name add condition that values must match
25Introduction to Databases, B. Ludaescher & K. Lin
Relational Algebra
26Introduction to Databases, B. Ludaescher & K. Lin
Relational Algebra
27Introduction to Databases, B. Ludaescher & K. Lin
Relational Algebra
28Introduction to Databases, B. Ludaescher & K. Lin
Relational Algebra
29Introduction to Databases, B. Ludaescher & K. Lin
Relational Algebra
30Introduction to Databases, B. Ludaescher & K. Lin
Hands-on Part
DBDesigner 4
31Introduction to Databases, B. Ludaescher & K. Lin
DBDesigner 4 • An open source (GPL) database design tool 1. Goto http://www.fabforce.net/dbdesigner4/ 2. Download and install (5 min)3. Open the example schema Order (File -> Open -> Order), and
examine the relations between the tables forumtopic and forumpost. Find the foreign key used in the relation postHasTopic (5 min)
4. Connect to a sample MySQL database host: geon07.sdsc.edu port: 3306 database: summer_institute username: root password: [blank]
(5 min)5. Select all records in the table forumtopic (2 min)6. Select all records in the table forumpost, and sort the result
according to their idforumpost (3min)7. Find all forum posts with the topic Cars (5 min)8. Insert a record into the table forumpost (5 min)
32Introduction to Databases, B. Ludaescher & K. Lin
Additional Material
(not presented)
33Introduction to Databases, B. Ludaescher & K. Lin
Non-Relational Data Models• Relational model is “flat”: atomic data values
– nesting is modeled via “pointers” (foreign keys) and “Skolem-ids”
– extension: nested relational model (“tables within tables”, cf. nested HTML tables)
– values can be nested lists {...}, tuples (...), sets [...]– ISO standard(s): SQL– identity is value based
• Object-oriented data model:– complex (structured) objects with object-identity (oid)– class and type hierarchies (sub-/superclass, sub-/supertype)– OODB schema may be very close to “world model” (no
translation into tables)(+) queries fit your OO schema(-) (new) queries that don’t fit nicely
– ODMG standard, OQL (Object Query Language)
34Introduction to Databases, B. Ludaescher & K. Lin
Example: Object Query Language (OQL)
• Q: what does this OQL query compute?• Note the use of path expressions like e.manager.children=> Semistructured/Graph Databases
SELECT DISTINCT STRUCT( E: e.name, C: e.manager.name, M: ( SELECT c.name FROM c IN e.children
WHERE FOR ALL d IN e.manager.children: c.age > d.age ) ) FROM e IN Employees;
35Introduction to Databases, B. Ludaescher & K. Lin
A Graph Database
36Introduction to Databases, B. Ludaescher & K. Lin
Querying Graphs with OO-Path Expressions
?- dblp."Inf. Systems".L.P, substr("Volume",L), P : person.spouse[lives_in = P.lives_in].
?- dblp."Inf. Systems".L."Michael E. Senko".Answer:
L="Volume 1, 1975”;L="Volume 5, 1980".
37Introduction to Databases, B. Ludaescher & K. Lin
Constructs for Querying Graphs
Example: ?- dblp . any* . (if(vldb)| if(sigmod))
38Introduction to Databases, B. Ludaescher & K. Lin
Keys, Keys, and more Keys• A key is a minimal set of attributes that:
– uniquely identify a tuple– determine every other attribute value in a tuple
• There may be many keys for a relation; we designate one as the primary key
• The phrase candidate key is used in place of “key” where “the key” denotes the primary key
• A superkey is a superset of a key (i.e., not necessarily minimal)
39Introduction to Databases, B. Ludaescher & K. Lin
Example of “good” design
Employee(EName, SSN, BDate, Address, DNumber)Employee(EName, SSN, BDate, Address, DNumber)
Example of “bad” design (why is it bad?)
Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)
Normalization of Relations
Department(DName, DNumber, DMgrSSN)Department(DName, DNumber, DMgrSSN)
40Introduction to Databases, B. Ludaescher & K. Lin
The description of the department (DName, DMgrSSN) is repeated for every employee that works in that department.
Redundancy!
The department is described redundantly.This leads to update anomalies! (… and wastes space)
Digression (for experts only): • redundancy can be used to increase performance; e.g. “materialized views”)
What’s Wrong?
Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)
41Introduction to Databases, B. Ludaescher & K. Lin
Update Anomalies (caused by redundancy)
Insertion AnomaliesIf you insert an employee
Need to know which department he/she worksNeed to know the description information for that department
If you want to insert a department, you can’t … until there is at least one employee
Deletion Anomalies: if you delete an employee, is that dept. gone? was this the last employee in that dept?
* Modification Anomalies: if you change DName, for example, it needs to be changed everywhere!
42Introduction to Databases, B. Ludaescher & K. Lin
Null values also cause problemsNull values might help in special cases, but are
not a general solution to update anomalies
For example, they may:– Waste space– Make it hard to specify and understand joins– Make it hard to aggregate (count, sum, etc.)– Have different meanings:
• Attribute does not apply to this tuple• Attribute value is unkown• Value is known but absent (not yet recorded)
– Causes problems with queries (can’t interpret query answers)
43Introduction to Databases, B. Ludaescher & K. Lin
Why worry about normalization?
• To … • … reduce redundancy & update anomalies
• … reduce the need for null values
• Last not least: it’s a solved problem:•all algorithms & proofs have been worked out
Here: Normalization based on FDs (there’s more…)
44Introduction to Databases, B. Ludaescher & K. Lin
Functional DependenciesStatement that if two tuples agree on attributes A
they must agree on attributes B
A B
If the value of the first attribute(s), A, is known, then the value of the second attribute(s), B, is known (read “A determines B”)
We want to know if it is always true in the application
45Introduction to Databases, B. Ludaescher & K. Lin
Functional Dependencies
Examples of functional dependencies: social-security-number employee-name course-number course-title
Examples that are NOT functional dependencies
course-number - book course-number - car-color
46Introduction to Databases, B. Ludaescher & K. Lin
Remember what it means to be a function:
x f(x) x g(x) x h(x)1 2 1 2 1 101 3 2 2 2 202 5 3 5 3 303 5
we are looking for functional relationships(that must occur in a relation) among attribute values
What is a functional dependency?
f is not a functionfor x=1, f(x) is not unique
47Introduction to Databases, B. Ludaescher & K. Lin
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
What are the FDs?
48Introduction to Databases, B. Ludaescher & K. Lin
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
12
3
4
5
6
7
8
9
10
What are the FDs?
49Introduction to Databases, B. Ludaescher & K. Lin
EMPLOYEE (SSN, NAME, SALARY, JOB_DESC)
Why do we care?
If all FDs are “implied by the key”
it means that the DBMS only enforces keys (not FDs) and the DBMS is going to enforce the keys anyway
otherwise, we have to perform expensive operations to maintain consistency (code or check statements)
50Introduction to Databases, B. Ludaescher & K. Lin
Decomposition
Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD) based on the projection operator.
Decomposition should be used judiciously:– Is there reason to decompose a relation?
(via Normal Forms)– What problems (if any) does the
decomposition cause? (lost information or dependencies?)
51Introduction to Databases, B. Ludaescher & K. Lin
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
EMP1(SSN, ENAME, BDATE, ADDRESS, DNUM)
X(DNUM, DNAME, DMGRSSN)
EMP2(SSN, ENAME)
X(PNUM, PNAME, PLOCATION)
Y(SSN, PNUM, HOURS)
12
3 4
5 6
7
8 9
10
How can we decompose using the Project Operator?
52Introduction to Databases, B. Ludaescher & K. Lin
Correct DecompositionsHow do we know if a decomposition is correct?That we haven’t lost anything?
• We have three goals:
lossless-join decomposition (don’t throw any information away)(be able to reconstruct the original relation)
dependency preservationall of the FDs end up in just one relation (not split across two or more relations)
Boyce-Codd Normal Form (BCNF) - no redundancy
53Introduction to Databases, B. Ludaescher & K. Lin
Lossless Decompositions• What is a lossless decomposition?• What is a lossy decomposition?
When R is decomposed into R1 and R2
Check to see if (R1 R2) = R
if it is a lossy decomposition, then R1 R2
gives you TOO MANY tuples.
Note: we are doing a natural join
54Introduction to Databases, B. Ludaescher & K. Lin
Example: a lossy decomposition
Now join them using natural join: we get extra tuples!!!
1 smith p2 billing
Original: Employee SSN Name Project Ptitle
1 Smith p1 accounting 2 Jones p1 accounting 3 Smith p2 billing
Decomposition: Employee SSN Name Project PID Ptitle Name
1 Smith p1 accounting Smith 2 Jones p1 accounting Jones 3 Smith p2 billing Smith
55Introduction to Databases, B. Ludaescher & K. Lin
Testing for a Lossless DecompositionFor decomposition into two relations
Let R1 and R2 form a decomposition of R.
R1 and R2 are both sets of attributes from R.
For the decomposition to be lossless ... The attributes in common must be a key for 1 of
the relations!
Note: You are joining a key and a foreign key
56Introduction to Databases, B. Ludaescher & K. Lin
Example: test for a lossless decomposition
Employee(SSN, name, project, p-title)
decomposition: Employee (SSN, name) Project (project, p-title, name)
Which attribute is in common? Employee.Name and Project.Name
Is name a key for either of these two tables?NO! We have a problem.
57Introduction to Databases, B. Ludaescher & K. Lin
Example: test for a lossless decomposition
Employee(SSN, name, project, p-title)
decomposition: Employee (SSN, name) Project (project, p-title)
Which attribute is in common? None
Is this decomposition lossless? NO! We have a problem.
58Introduction to Databases, B. Ludaescher & K. Lin
Example: test for a lossless decomposition
Employee(SSN, name, project, p-title)
decomposition: Employee (SSN, name, project) Project (project, p-title)
Which attribute is in common? Employee.project and Project.project
Is project a key for either of these two tables?YES! We have a lossless decomposition.
59Introduction to Databases, B. Ludaescher & K. Lin
Some Preliminary Normal Form Definitions
Prime Attribute - an attribute A is prime if it is a member of a key, otherwise it is nonprime
Partial Dependency - given an FD XY, Y is partially dependent on X if there is a proper subset X of X such that XY.
Transitive Dependency - A is transitively
dependent upon X if XY, Y-/X, and YA and AXY.
60Introduction to Databases, B. Ludaescher & K. Lin
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
12
3
4
5
6
7
8
9
10
What are the FDs?
61Introduction to Databases, B. Ludaescher & K. Lin
Normal Forms Based on FDs
1NF - all attribute values (domain values) are atomic(part of the definition of the relational model)
2NF - 1NF + no key partial dependencies (every nonprime attribute fully depends on every key of R)
R(A B C D) B C (not allowed)
3NF - 2NF + no nonprime attribute is transitively dependent upon any key
R(A B C D) AB C, C D (not allowed)
BCNF - 3NF + no attribute is transitively dependent upon any key (all FDs determined by a key)
R(A B C D) AB C, C B (not allowed)
62Introduction to Databases, B. Ludaescher & K. Lin
BCNF, Lossless, and Dependency-Preserving
(first choice)
3NF, Lossless and Dependency-Preserving
(second choice)
because sometimes you can’t preserve all dependencies
What’s the Goal?
63Introduction to Databases, B. Ludaescher & K. Lin
Finding all of the FDsArmstrong’s Axioms
Reflexivity: If X Y, then X YTrivially, all attributes A A, e.g., name name and gender gender
Augmentation: If X Y, then XZ YZ for any ZTransitivity: If X Y and Y Z, then X Z
X, Y, and Z are sets of attributes in RArmstrong’s Axioms are a sound & complete set of inference rules
Union: If X Y and X Z, then X YZDecomposition: If X YZ, then X Y and X Z
64Introduction to Databases, B. Ludaescher & K. Lin
Finding all FDs: the closure of a set of FDs
The closure of a set of FDs:
Let F be a set of FDs and F+ the closure of F.
F+ is the set of all FDs implied (or derivable) from F using Armstrong’s Axioms
F+ is computed by applying the inference rules until no new FDs can be found
65Introduction to Databases, B. Ludaescher & K. Lin
What is “Dependency Preserving”
Suppose F is the original set of FDs and G is the set of FDs after decomposition
If we compute F+ and G+, and F+ = G+ then the decomposition is dependency preserving
66Introduction to Databases, B. Ludaescher & K. Lin
Other Results (textbook)
Algorithms To:– Compute F+
– Compute the Minimal Cover for F– Find a dependency-preserving
decomposition into 3NF– Find a lossless-join decomposition into
BCNF– Find a lossless-join & dependency
preserving decomposition into 3NF
67Introduction to Databases, B. Ludaescher & K. Lin
addr(number, street, city, state, zip)
number, street, city, state zipzip state
If we decompose, this FD won’t occur within one relation. Thus, since the DBMS only enforces keys (and not FDs directly), this can’t be enforced.
(If we leave it alone, it is in 3NF)
Example: Not dependency preserving
68Introduction to Databases, B. Ludaescher & K. Lin
Lossless join decomposition algorithm1. Set D = {R} (the current set of relations)
2. While there is a relation in D that’s not in BCNFChoose a relation scheme Q that is not in BCNFFind a FD X Y in Q that violates BCNFReplace Q in D by (Q – Y) and (X Y)
End While;
3. Identify dependences that are not preserved (X A).
4. Add XA as a table to the set D
69Introduction to Databases, B. Ludaescher & K. Lin
Tuning the Conceptual Schema
The choice of conceptual schema should be guided by the workload, in addition to redundancy issues:
– We may settle for a 3NF schema rather than BCNF.
– E.g., the workload may influence our choice to decompose a relation into 3NF or BCNF.
– We may further decompose a BCNF schema!
– We might denormalize (i.e., undo a decomposition step), or add fields to a relation. We must take care to avoid the problems caused by the redundancy!
70Introduction to Databases, B. Ludaescher & K. Lin
Decomposition of one table into several
A relation is replaced by a collection of relations that are projections. Most important case. (Vertical partitioning)
• Sometimes, we might want to replace a relation by a collection of relations that are selections. (Horizontal partitioning)
– Each new relation has the same schema as the original, but a subset of the rows.
– Collectively, new relations contain all rows of the original. Typically, the new relations are disjoint.
Why might we do this?
71Introduction to Databases, B. Ludaescher & K. Lin
Tuning the Conceptual Schema
Vertical decomposition using the project operator
Coursec# cname instructor room days
Course2c# cname
Course1c# room days
Course3c# instructor
72Introduction to Databases, B. Ludaescher & K. Lin
Tuning the Conceptual Schema
Horizontal partition/decomposition using the select operator
Coursec# cname instructor room days
Undergraduate-Coursec# cname instructor room days
Graduate-Coursec# cname instructor room days
73Introduction to Databases, B. Ludaescher & K. Lin
Tuning the Conceptual Schema
Denormalizing……always introduces redundancy!Course-Offering (offering#, quarter, c#, instructor, time, days)
Course-Offering (offering#, quarter, c#, cname, instructor, room, time, days)
This introduces redundancy - which must be managed.
Course (c#, cname) Instructor-Room (instructor, room)
74Introduction to Databases, B. Ludaescher & K. Lin
Denormalization interferes with dependency preservation
Course-Offering (offering#, quarter, c#, cname, instructor, room, time,
days)
Note that having the FD instructor room (and nothing else) in a single table allows the FD to be enforced (by enforcing the candidate key of instructor).
Instructor-Room (instructor, room)
Denormalization