lecture 4 ra - github pages...•need to write c program, use a graph engine, or modern sql… name1...
TRANSCRIPT
-
CS639:DataManagementfor
DataScienceLecture4:RelationalAlgebra
TheodorosRekatsinas1
-
2
Announcements• Assignment1
• HintsandGrading
-
Today’sLecture
1. TheRelationalModel&RelationalAlgebra
2. RelationalAlgebraPt.II
3
-
1.TheRelationalModel&RelationalAlgebra
4
-
Whatyouwilllearnaboutinthissection
1. TheRelationalModel
2. RelationalAlgebra:BasicOperators
5
-
6
Levelsofabstraction
-
Motivation
TheRelationalmodelisprecise,implementable,andwecanoperateonit
(query/update,etc.)
Databasemapsinternallyintothisprocedurallanguage.
-
TheRelationalModel:Schemata
• RelationalSchema:
Students(sid: string, name: string, gpa: float)
AttributesString,float,int,etc.arethedomains oftheattributes
Relationname
-
9
TheRelationalModel:Data
sid name gpa
001 Bob 3.2
002 Joe 2.8
003 Mary 3.8
004 Alice 3.5
Student
Anattribute (orcolumn)isatypeddataentrypresentineachtupleintherelation
Thenumberofattributesisthearity oftherelation
-
10
TheRelationalModel:Data
sid name gpa
001 Bob 3.2
002 Joe 2.8
003 Mary 3.8
004 Alice 3.5
Student
Atuple orrow (orrecord)isasingleentryinthetablehavingtheattributesspecifiedbytheschema
Thenumberoftuplesisthecardinality oftherelation
-
11
TheRelationalModel:DataStudent
Arelationalinstance isaset oftuplesallconformingtothesameschema
InpracticeDBMSsrelaxthesetrequirement,andusemultisets.
sid name gpa
001 Bob 3.2
002 Joe 2.8
003 Mary 3.8
004 Alice 3.5
-
• Arelationalschema describesthedatathatiscontainedinarelationalinstance
ToReiterate
LetR(f1:Dom1,…,fm:Domm)bearelationalschema then,aninstanceofRisasubsetofDom1 xDom2 x…xDomn
Inthisway,arelationalschema Risatotalfunctionfromattributenames totypes
-
• Arelationalschema describesthedatathatiscontainedinarelationalinstance
OneMoreTime
ArelationRofarity t isafunction:R:Dom1 x…xDomt à {0,1}
Then,theschemaissimplythesignatureofthefunction
I.e.returnswhetherornotatupleofmatchingtypesisamemberofit
Noteherethatordermatters,attributenamedoesn’t…We’ll(mostly)workwiththeothermodel(lastslide)in
whichattributenamematters,orderdoesn’t!
-
Arelationaldatabase
• Arelationaldatabaseschema isasetofrelationalschemata,oneforeachrelation
• Arelationaldatabaseinstance isasetofrelationalinstances,oneforeachrelation
Twoconventions:1. Wecallrelationaldatabaseinstancesassimplydatabases2. Weassumeallinstancesarevalid,i.e.,satisfythedomainconstraints
-
RemembertheCMS
• RelationDBSchema• Students(sid:string,name:string,gpa:float)• Courses(cid:string,cname:string,credits:int)• Enrolled(sid:string,cid:string,grade:string)
Sid Name Gpa101 Bob 3.2123 Mary 3.8
Students
cid cname credits564 564-2 4308 417 2
Coursessid cid Grade123 564 A
Enrolled
RelationInstances
15
Notethattheschemasimposeeffectivedomain/typeconstraints,i.e.Gpacan’tbe“Apple”
-
2nd PartoftheModel:Querying
“FindnamesofallstudentswithGPA>3.5”
Wedon’ttellthesystem howorwhere togetthedata- justwhatwewant,i.e.,Queryingisdeclarative
SELECT S.nameFROM Students SWHERE S.gpa > 3.5;
Tomakethishappen,weneedtotranslatethedeclarativequeryintoaseriesofoperators…we’llseethisnext!
-
Virtuesofthemodel
• Physicalindependence(logicaltoo),Declarative
• Simple,elegantclean:Everythingisarelation
-
RelationalAlgebra
-
RDBMSArchitecture
HowdoesaSQLenginework?
SQLQuery
RelationalAlgebra(RA)
Plan
OptimizedRAPlan Execution
Declarativequery(fromuser)
Translatetorelationalalgebraexpression
Findlogicallyequivalent- butmoreefficient- RAexpression
Executeeachoperatoroftheoptimizedplan!
-
RDBMSArchitecture
HowdoesaSQLenginework?
SQLQuery
RelationalAlgebra(RA)
Plan
OptimizedRAPlan Execution
RelationalAlgebraallowsustotranslatedeclarative(SQL)queriesintopreciseandoptimizableexpressions!
-
• Fivebasicoperators:1. Selection: s2. Projection:P3. CartesianProduct:´4. Union:È5. Difference:-
• Derivedorauxiliaryoperators:• Intersection,complement• Joins(natural,equi-join,thetajoin,semi-join)• Renaming: r• Division
RelationalAlgebra(RA)
-
Keepinmind:RAoperatesonsets!
• RDBMSsusemultisets,howeverinrelationalalgebraformalismwewillconsidersets!
• Also:wewillconsiderthenamedperspective,whereeveryattributemusthaveauniquename• àattributeorderdoesnotmatter…
NowontothebasicRAoperators…
-
• Returnsalltupleswhichsatisfyacondition• Notation: sc(R)• Examples• sSalary >40000 (Employee)• sname =“Smith” (Employee)
• Theconditionccanbe=,,³,
1.Selection(𝜎)
SELECT *FROM StudentsWHERE gpa > 3.5;
SQL:
RA:𝜎"#$&'.)(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)
Students(sid,sname,gpa)
-
sSalary >40000 (Employee)
SSN Name Salary1234545 John 2000005423341 Smith 6000004352342 Fred 500000
SSN Name Salary5423341 Smith 6000004352342 Fred 500000
Anotherexample:
-
• Eliminatescolumns,thenremovesduplicates• Notation:P A1,…,An (R)• Example:projectsocial-securitynumberandnames:• P SSN,Name (Employee)• Outputschema:Answer(SSN,Name)
2.Projection(Π)
SELECT DISTINCTsname,gpa
FROM Students;
SQL:
RA:Π45$67,"#$(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)
Students(sid,sname,gpa)
-
P Name,Salary (Employee)
SSN Name Salary1234545 John 2000005423341 John 6000004352342 John 200000
Name SalaryJohn 200000John 600000
Anotherexample:
-
NotethatRAOperatorsareCompositional!
SELECT DISTINCTsname,gpa
FROM StudentsWHERE gpa > 3.5;
Students(sid,sname,gpa)
HowdowerepresentthisqueryinRA?
Π45$67,"#$(𝜎"#$&'.)(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠))
𝜎"#$&'.)(Π45$67,"#$(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠))
Aretheselogicallyequivalent?
-
• EachtupleinR1witheachtupleinR2• Notation:R1´ R2• Example:• Employee´ Dependents
• Rareinpractice;mainlyusedtoexpressjoins
3.Cross-Product(×)
SELECT *FROM Students, People;
SQL:
RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠×𝑃𝑒𝑜𝑝𝑙𝑒
Students(sid,sname,gpa)People(ssn,pname,address)
-
ssn pname address1234545 John 216 Rosse
5423341 Bob 217 Rosse
sid sname gpa001 John 3.4
002 Bob 1.3
𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠×𝑃𝑒𝑜𝑝𝑙𝑒
×
ssn pname address sid sname gpa1234545 John 216 Rosse 001 John 3.4
5423341 Bob 217 Rosse 001 John 3.4
1234545 John 216 Rosse 002 Bob 1.3
5423341 Bob 216 Rosse 002 Bob 1.3
People StudentsAnotherexample:
-
• Changestheschema,nottheinstance• A‘special’operator- neitherbasicnorderived• Notation:r B1,…,Bn (R)
• Note:thisisshorthandfortheproperform(sincenames,notordermatters!):• r A1àB1,…,AnàBn (R)
Renaming(𝜌)
SELECTsid AS studId,sname AS name,gpa AS gradePtAvg
FROM Students;
SQL:
RA:𝜌4?@ABA,5$67,"C$A7D?EF"(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)
Students(sid,sname,gpa)
Wecareaboutthisoperatorbecause weareworkinginanamedperspective
-
sid sname gpa001 John 3.4
002 Bob 1.3
𝜌4?@ABA,5$67,"C$A7D?EF"(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)
Students
studId name gradePtAvg001 John 3.4
002 Bob 1.3
Students
Anotherexample:
-
• Notation:R1⋈R2
• JoinsR1 andR2 onequalityofallsharedattributes• IfR1 hasattributesetA,andR2 hasattributesetB,andtheyshareattributesA⋂B=C,canalsobewritten:R1⋈ 𝐶R2
• OurfirstexampleofaderivedRA operator:• Meaning:R1⋈ R2 =PAUB(sC=D(𝜌J→L(R1)´ R2))• Where:
• Therename𝜌J→L renamesthesharedattributesinoneoftherelations• TheselectionsC=Dchecksequalityofthesharedattributes• TheprojectionPAUBeliminatestheduplicatecommonattributes
NaturalJoin(⋈)
SELECT DISTINCTssid, S.name, gpa,ssn, address
FROM Students S,People P
WHERE S.name = P.name;
SQL:
RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈ 𝑃𝑒𝑜𝑝𝑙𝑒
Students(sid,name,gpa)People(ssn,name,address)
-
ssn P.name address1234545 John 216 Rosse
5423341 Bob 217 Rosse
sid S.name gpa001 John 3.4
002 Bob 1.3
𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈ 𝑃𝑒𝑜𝑝𝑙𝑒
⋈
sid S.name gpa ssn address001 John 3.4 1234545 216 Rosse
002 Bob 1.3 5423341 216 Rosse
PeoplePStudentsSAnotherexample:
-
NaturalJoin
• GivenschemasR(A,B,C,D),S(A,C,E),whatistheschemaofR⋈S?
• GivenR(A,B,C),S(D,E),whatisR⋈S?
• GivenR(A,B),S(A,B),whatisR⋈S?
-
Example:ConvertingSQLQuery->RA
SELECT DISTINCTgpa,address
FROM Students S,People P
WHERE gpa > 3.5 ANDsname = pname;
Π"#$,$AAC744(𝜎"#$&'.)(𝑆 ⋈ 𝑃))
Students(sid,sname,gpa)People(ssn,sname,address)
-
LogicalEquivalenceofRAPlans
• GivenrelationsR(A,B)andS(B,C):
• Here,projection&selectioncommute:• 𝜎EM)(ΠE(𝑅)) = ΠE(𝜎EM)(𝑅))
• Whatabouthere?• 𝜎EM)(ΠP(𝑅))?= ΠP(𝜎EM)(𝑅))
-
1.Union(È) and2.Difference(–)
• R1È R2• Example:• ActiveEmployeesÈ RetiredEmployees
• R1– R2• Example:• AllEmployees -- RetiredEmployees
R1 R2
R1 R2
-
WhataboutIntersection(Ç) ?
• Itisaderivedoperator• R1Ç R2=R1– (R1– R2)• Alsoexpressedasajoin!• Example
• UnionizedEmployeesÇ RetiredEmployees
R1 R2
-
RAExpressionsCanGetComplex!
PersonPurchasePersonProduct
sname=fred sname=gizmo
P pidP ssn
seller-ssn=ssn
pid=pid
buyer-ssn=ssn
P name
-
RAhasLimitations!
• Cannotcompute“transitiveclosure”
• FindalldirectandindirectrelativesofFred• CannotexpressinRA!!!
• NeedtowriteCprogram,useagraphengine,ormodernSQL…
Name1 Name2 RelationshipFred Mary FatherMary Joe CousinMary Bill SpouseNancy Lou Sister