Download - Dzielenie relacyjne (Relational division) Bazy i hurtownie danych, TWO1, 2010 dbdw-winter_2010-11
Dzielenie relacyjne(Relational division)
Bazy i hurtownie danych, TWO1, 2010
https://ophelia.cs.put.poznan.pl/webdav/dbdw/students/dbdw-winter_2010-11/
Bazy i hurtownie danych 2
References
1. V.M. Matos, R. Grasser, Assessing performance of the relational division operator. Data Base Management 2001, 22-20-30, 1-15
2. V.M. Matos, R. Grasser, A Simpler (and Better) SQL Approach to Relational Division, Journal of Information System Education 2002, 13 (2), 85-88.
21/11/2010
Bazy i hurtownie danych 3
Relational Division
The basic operators of the relational algebra:– Union (UNION)– Difference (MINUS/EXCEPT)– Cartesian product– Projection & selection (SELECT ... FROM...)
Additional operators added to the relational algebra: – Join most popular in practice– Rename (AS)– Intersection (INTERSECT)– Division
21/11/2010
Bazy i hurtownie danych 4
Relational Division
The division operator is less common than select-project-join queries, however, it is applicable to many common queries:– Find suppliers who supply all the engine parts– Find students who have taken all the core courses– Find customers who have ordered all items from a given line of products
The division operator can be also employed in data mining algorithms (e.g., generation of association rules)
21/11/2010
Bazy i hurtownie danych 5
Informal Definition
The division operator allows verifying whether or not a candidate subject is related to each of the values held in the base set.
The base set is called the divisor (or denominator T2[B]), and the table holding the subject’s data is called the dividend (or nominator T1[A, B]).
The expression T1[A, B]/T2[B] selects the A values from the dividend table T1[A, B], whose B values are a superset of those B values held in the divisor table T2[B].
21/11/2010
Bazy i hurtownie danych 7
More Practical Example #1
Given the relations ORDERS[SID, PID, QTY] and PRODUCTS[PID, PRICE] find all the stores that ordered at least 10 of all products priced over 15 $
21/11/2010
SID PID QTY
s2 p2 25
s2 p3 20
s2 p4 20
s1 p1 25
s1 p4 4
s1 p3 12
s1 p2 30
s3 p4 12
s3 p2 14
ORDERS
PID PRICE
p1 10
p2 25
p3 18
p4 20
PRODUCTS
ORDERS[SID,PID] / PRODUCTS[PID] = { s2 }
Bazy i hurtownie danych 8
More Practical Example #2
Given the relations ORDERS[SID, PID, PMNT] and PRODUCTS[PID, PRICE] we focus on ordered products priced over 15$ and paid either with cash or a credit card. What would be the result of ORDERS[SID, PID, PMNT] / PRODUCTS[PID]?
21/11/2010
SID PID PMNT
s1 p1 cash
s1 p2 bank
s1 p2 credit
s1 p3 cash
s1 p4 bank
s1 p3 credit
s2 p2 cash
s2 p2 bank
s2 p3 credit
ORDERS
PID PRICE
p1 10
p2 25
p3 18
p4 14
PRODUCTS
ORDERS[SID,PID,PMNT] / PRODUCTS[PID] = { (s1, credit) }
Bazy i hurtownie danych 9
Formal Definition: Relational Algebra
Let’s assume that the numerator table T1 always consists of two columns A and B, and the denominator has only one B attribute. Then, the expression T1[A, B]/T2[B] is semantically equivalent to:
T1[A, B]/T2[B] = T1[A] – ((T1[A] × T2[B]) – T1[A, B])[A]
21/11/2010
Bazy i hurtownie danych 11
Formal Definition:Tuple-Calculus
Using relational tuple-calculus language, the division operator can be rephrased as follows:
T1[A, B]/T2[B] = { t1[A] / t1 T1 and for-all t2 (t2 T2 exists t3 (t3 T1 and
(t1[A] = t3[A]) and(t2[B] = t3[B]))) }
21/11/2010
Bazy i hurtownie danych 12
Formal Definition: NFNF Databases(Non First-Normal Form)
Assumption of a NFNF format (fields with sets of atomic values) results in a much simplified definition of the division operator in tuple-calculus (attribute *B in T1 and T2 is defined as a set of atomic values):
T1[A, B]/T2[B] = { t1[A] / t1 T1 and t2 T2 and t2[*B] t1[*B] }
21/11/2010
Bazy i hurtownie danych 14
SQL Implementation: Q0
SELECT A FROM T1WHERE B IN (SELECT B FROM T2)GROUP BY AHAVING COUNT(*) = (SELECT COUNT(*) FROM T2)
21/11/2010
Bazy i hurtownie danych 15
SQL Implementation: Q1(Byzantine Method )
Based on the formal predicate calculus definition modified to fit SQL:– The universal quantifier for-all x (f(x)) replaced by not exists x
(not f(x))– The implication X Y replaced by (not(X) or Y)
T1[A, B]/T2[B] = { t1[A] / t1 T1 and not exists t2 (not(not( t2 T2)
or (exists t3 (t3 T1 and (t1[A] = t3[A]) and(t2[B] = t3[B]))))) }
21/11/2010
Bazy i hurtownie danych 16
SQL Implementation: Q1
Previous definition is equivalent (following De Morgan’s law not (P or Q) = not P and not Q ) to:
T1[A, B]/T2[B] = { t1[A] / t1 T1 and not exists t2 (( t2 T2) and
(not exists t3 (t3 T1 and (t1[A] = t3[A]) and(t2[B] = t3[B])))) }
21/11/2010
Bazy i hurtownie danych 17
SQL Implementation: Q1
SELECT DISTINCT x.A FROM T1 AS xWHERE NOT EXISTS (SELECT *
FROM T2 yWHERE NOT EXISTS (SELECT *
FROM T1 AS zWHERE (z.A=x.A) AND (z.B=y.B)))
21/11/2010
Bazy i hurtownie danych 18
SQL Implementation: Q2
Based on the algebraic definition of the division operator and broken into two steps:
SELECT DISTINCT y.A, z.B INTO T3FROM T1 AS y, T2 AS zWHERE NOT EXISTS (SELECT *
FROM T1WHERE (T1.A = y.A) AND (T1.B=z.B))
SELECT DISTINCT A FROM T1WHERE NOT EXISTS (SELECT *
FROM T3WHERE (T3.A=T1.A))
21/11/2010
T1[A, B]/T2[B] = T1[A] – ((T1[A] × T2[B]) – T1[A, B])[A]
Bazy i hurtownie danych 19
SQL Implementation: Q3
Based on the definition for the NFNF and tuple-calculus:
SELECT DISTINCT x.AFROM T1 AS xWHERE (SELECT COUNT(*) FROM T2) =
(SELECT COUNT(*) FROM T1, T2 WHERE (T1.A=x.A) AND (T1.B=T2.B))
21/11/2010
Bazy i hurtownie danych 20
Zero Division
The divide operator is defined in such a way that T1[A,B]/T2[B] produces exactly all A values in T1 each time that T2[B] is empty.
An empty set would be a more appropriate answer this is how Q0 works.
21/11/2010
Bazy i hurtownie danych 21
Experimental Evaluation of Q0…Q3
Assume basic structure of tables (T1[A, B], T2[B], integer or char) Conduct an experiment with the following settings:
– number of A-values in T1 = 10 000,– number of B-values in T1 = 100,– number of B-values in T2 = 0, 20, 40, 60, 80, 100.
Use provided scripts to generate sample tables and to run specific queries (Q0…Q3).
Observer performance of specific queries (execution time – CPU time). Collect the observations in a tabular and graphical form.
21/11/2010
https://ophelia.cs.put.poznan.pl/webdav/dbdw/students/dbdw-winter_2010-11/