cse 414 final examination - courses.cs.washington.edupreparedstatement: preparestatement...
TRANSCRIPT
CSE414FinalExamination
Name:__________Solution____________
Topic Points Max
I RelationalData 30
II DBApplications 36
III SemistructuredData 12
IV DBMSImplementation 48
V BigData 26
Total 152
• Theexamisclosedbook.• Noelectronicdevicesmaybeused.• Youareallowedtwoletter-sizedpagesofnotes(bothsides,8+ptfont).
• Allquestionshaveoneanswerunlessstatedotherwise.• Questionswithoneanswerareworth2points.• Questionswithmanyanswersareworth1pointperavailableoption(4-6pointstotal).
• Youhave1hourand50minutestocompletethetest.• Ifyouarenotdoneafter1hourand30minutes,pleasestayinyourseatuntiltheend.
I.RelationalData
Considerthefollowingschemaforasimplesocialnetwork.Itmaintainsacollectionof
users,identifiedbyunique“handles”(shortnames).
User(handle, name, home_city, bio) Friend(handle1, handle2) Message(handle, text, from_city)
Thenextfewproblemswillconsiderqueriesforansweringthefollowingquestions:
A. Findthehandlesofuserswhowrotemessageswhileincitiesotherthantheoneinwhichtheylive.
B. Findthehandlesofuserswhoarefriendswithsomeonewhoisfriendswiththem.
C. FindthenamesofuserswhoeitherliveinKansasCityorhavewrittenamessagefromKansasCity.
D. FindthenamesofuserswhobothliveinKansasCityandhavewrittenamessagefromKansasCity.
Afteryouhavereadthesequestions,turnthepage….
(Therearenoproblemsonthispage.)
Foreachextendedrelationalalgebraexpressionbelow,indicatewhichofthequestions
above(A–D),ifany,itanswers.Ifnoquestionmatches,thenleavethespaceblank.
1. δ(πhandle1σhandle1=handle3[Friend⋈(ρhandle2,handle3Friend)]) ________B_______
(Recallthattheρoperatorsimplyrenamesthecolumnsofatable.)
2. δ(πhandle[δ(πhandle,from_cityMessage)–πhandle,home_cityUser]) ________A_______
3. δ(πname[([πhandleσhome_city=‘KansasCity’User]∪ ________C_______
[πhandleσfrom_city=‘KansasCity’Message])⋈User])
4. δ[πhandleσfrom_city<>home_city(User⋈Message)] ________A_______
ForeachSQLquerybelow,indicatewhichofthequestionsfrombefore(A–D),ifany,it
answers.Ifnoquestionmatches,thenleavethespaceblank.
5. SELECT name ________C_______ FROM User U WHERE home_city = ‘Kansas City’ OR EXISTS (SELECT name FROM Message WHERE handle = U.handle AND from_city = ‘Kansas City’)
6. (SELECT name FROM User ________D_______ WHERE home_city = ‘Kansas City’) INTERSECT (SELECT name FROM User U, Message M WHERE U.handle = M.handle AND from_city = ‘Kansas City’)
7. SELECT DISTINCT U.handle _______A________ FROM User U, Message M WHERE U.handle = Message.handle AND from_city <> home_city
8. SELECT F1.handle1 ________________ FROM Friend F1, Friend F2 WHERE F1.handle2 = F2.handle2 AND F1.handle1 = F2.handle1 GROUP BY F1.handle1
ForeachDatalogquerybelow,indicatewhichofthequestionsfromearlier(A–D),ifany,it
answers.Ifnoquestionmatches,thenleavethespaceblank.
Ineachitem,theruleoftheformQN,whereNisanumber,definesthequeryinquestion.
9. Q1(h) :- User(h, _, hc, _), ________A_______ Message(h, _, fc), hc <> fc.
10. Q2(h) :- User(h, _, “Kansas City”, _). ________________ Q2(h) :- User(h, _, hc, _), Message(h, _, fc), hc = “Kansas City”.
11. Q3(h) :- User(h, _, “Kansas City”, _), ________D_______ Message(h, _, “Kansas City”).
12. FoF(h1, h2, 1) :- Friend(h1, h2). ________B_______ FoF(h1, h2, d) := FoF(h1, h3, d-1), Friend(h3, h2). Q4(h) :- FoF(h, h, 2).
13. ConsiderthefollowingSQLquery:
SELECT handle, bio, min(home_city) FROM User NATURAL JOIN Message GROUP BY handle, from_city
Whichsectionofthequeryisillegal?
handle bio
min(home_city) from_city
14. Considerthefollowingexpressioninstandardrelationalalgebra:
πfrom_city[Message⋈(σhandle>‘c’User–σhandle>‘d’User)]
Canthesameresultbecomputedusingasimpleselect-from-wherequery(i.e.,onewith
nosubqueries,grouping,oraggregation)?
Yes No
15. Whichofthefollowingisanadvantageofusinganauto-generated,uniqueintegerIDastheprimarykeyforUserinsteadofthehandle?(Choosethebestanswer.)
usesmuchlessdiskspace makessearchesbyprimarykey
muchmoreefficient
makeshandlechanges makeshashindexesanoption
muchmoreefficient insteadofonlyB+treeindexes
II.DBApplications
1. Whichofthefollowingaretupleconstraintsthatcanbeautomaticallycheckedbya
DBMSwhentuplesarechangedoraddedtothetable?Circleallthatapply.
valueincolumnisastring valueincolumnisthekeyofatupleintableT
valueincolumnisnotnull valueincolumnisapositiveinteger
valueincolumnisunique valuesintwocolumnsaredifferent
inthatcolumnofthetable
2. SupposethatR(a,b,c)andS(c,d)arerelations.IfRhasadependencyb→c,thenthesamedependencymustexistin:
R⨉S: Yes No
R⋈S: Yes No
3. ForeachJDBCstatementtypebelow,towhichmethoddoyoupassthestringdescribing
yourSQLquery?(RecallthatStatementscanexecutemultiplequeries.)
Statement: createStatement executeQuery PreparedStatement: prepareStatement executeQuery
4. UsingSQLServer,ifoneJDBCConnectionsetsitsisolationleveltoreaduncommitted,onlythatconnectionmayseebehaviorthatisnotserializable?
Yes No
5. Whichtierofa3-tieredarchitectureismostlikelytobecometheperformance
bottleneckasyouscaletoaverylargenumberofusers?
browsers harddisks
webservers DBserver
ConsiderthefollowingSQLschema:
CREATE TABLE Company( name VARCHAR(100) PRIMARY KEY, address VARCHAR(200)); CREATE TABLE Product( name VARCHAR(100) PRIMARY KEY, price FLOAT, made_by VARCHAR(100) FOREIGN KEY REFERENCES Company);
6. WhichofthefollowingE/Rdiagramscouldproducethatschema?
7. WhichofthefollowingdescribestherelationshipbetweenProductsandCompanies?
onetoone onetomany
manytoone manytomany
8. WhichofthefollowingtypesofrelationshipsbetweenProductsandCompaniescould
berepresentedinSQLwithoutanadditionaltable?Circleallthatapply.
onetoone onetomany
manytoone manytomany
ThenexttwoquestionsrefertotheschemaforUserandMessagethatwesawinPartI.
9. Whichofthefollowingchangestothedataforasingleuserwould,mostlikely,require
changingmorerowsofthedatabase?
changinghandle changingname
10. Whichofthefollowingwouldbethebestwaytoreducethenumberofrowsthatneedtobechangedthecaseabove?
addanindexonMessage(handle) use(handle, name)astheprimarykeyofUser
addanindexonMessage(text) useanauto-generated,uniqueID
astheprimarykeyofUser
Supposethatwechangedtheschemafromearliersothat(1)theauthor’snameisincluded
(inadditiontotheirhandle)and(2)thecity’szipcodeisincluded:
Message(handle, name, text, from_city, city_zip)
11. Supposethatwealloweduserstochangeboththeirnamesandhandles.Eachitembelowgivesonechoiceofthemanynamesandhandlesauserhashadthatwecould
storewiththemessage.Inwhichsituations,wouldhandle→nameinMessagestillbeafunctionaldependency?Circleallthatapply.
currenthandle currenthandle
nameattimewritten currentname
handleattimewritten firsthandleofauthor
nameattimewritten currentname
12. WhichofthefollowingcouldresultfromaBCNFdecompositionofthistable?Circleallthatapply.
City(name, zip) Message(name, text, from_city) Author(handle, name) Message(handle, text, from_city)
13. Whichofthesewouldmakesensetoaddtotheschema?Circleallthatapply.
City Author
III.SemistructuredData
SupposethatwestoreallthedataforoursocialnetworkinasingledatasetofUsers:
[{“handle”: “biebs”, “name”: “Justin Bieber”, “home_city”: “Somewhere, Canada”, “bio”: “…”, “friends”: [“kimkardashian”, “shaq”, …], “messages”: [ {“text”: “:-* :-* :-* :-*”, “from_city”: “Los Angeles, CA”}, {“text”: “New. Music. Friday.”, “from_city”: “Los Angeles, CA”}, … ]} … ]
ConsiderthefollowingSQL++query:
SELECT DISTINCT m.from_city, n.from_city FROM Users u, u.messages m, Users v, v.messages n WHERE u.handle <> v.handle AND m.text = n.text
1. WhichoftheSQLqueriesbelowreturnthesameresultastheSQL++queryabove?
(EachSQLqueryusesthesameschemaforUserandMessageaswesawinPartI.)Circleallthatapply.
SELECT DISTINCT m.from_city, n.from_city FROM User u, Message m, User v, Message n WHERE u.handle = m.handle AND v.handle = m.handle AND u.handle <> v.handle AND m.text = n.text SELECT DISTINCT m.from_city, n.from_city FROM Message m, Message n WHERE m.handle <> n.handle AND m.text <> n.text SELECT m.from_city, n.from_city FROM Message m, Message n WHERE m.handle <> n.handle AND m.text <> n.text GROUP BY m.from_city, n.from_city SELECT DISTINCT m.from_city, n.from_city FROM Message m, Message n, User v WHERE m.handle <> v.handle AND m.text = n.text
ConsiderthefollowingSQL++query:
SELECT max(coll_count(m)) FROM User u, u.messages m WHERE m.text > 3 AND u.handle = “biebs” ORDER BY u.name
2. Whichpartsofthequeryaboveareillegal?Circleallthatapply.
coll_count(m) ORDER BY u.name
u.handle = “biebs” m.text > 3
ConsiderthefollowingSQL++query:
SELECT u.handle, count(*) FROM User u, u.messages m WHERE u.home_city = ‘Los Angeles, CA’ GROUP BY u.handle
3. WhichofthefollowingSQL++queriesreturnthesameresult?Circleallthatapply.
SELECT u.handle, coll_count(u.messages) FROM User u WHERE u.home_city = ‘Los Angeles, CA’ SELECT u.handle, coll_count((SELECT * FROM u.messages)) FROM User u WHERE u.home_city = ‘Los Angeles, CA’
4. WhichofthefollowingSQLqueriesreturnthesameresult?Circleallthatapply.
SELECT u.handle, count(*) FROM User u, Message m WHERE u.handle = m.handle AND u.home_city = ‘Los Angeles, CA’ GROUP BY u.handle
SELECT u.handle, count(*) FROM User u NATURAL JOIN Message WHERE u.home_city = ‘Los Angeles, CA' GROUP BY u.handle
IV.DBMSImplementation
1. SupposethatA=100,transactionT1wantstoincreaseAby100,andT2wantsto
doubleA.ConsiderthescheduleR1(A),R2(A),W1(A),W2(A).Whichofthefollowing
propertiesholdforthisschedule?Circleallthatapply.
transactionsareisolated serializable
followsstrict2PLlocking conflictserializable
2. SupposethatT1andT2areaccessingelementBinaglobally-distributeddatabase
fromoppositesidesoftheplanet.Whichofthefollowingismostdifficulttosupport?
R1(B)beforeR2(B) R1(B)beforeW2(B)
W1(B)beforeR2(B) W1(B)beforeW2(B)
3. Intheabsenceofrollbacks,whichofthefollowingensuresthatlockingresultina
serializableschedule?Circleallthatapply.
two-phaselocking usingreadandwritelocks
stricttwo-phaselocking usingread,pending,reserved,and
exclusivelocks
4. WhichoftheACIDpropertiesaremostatriskofbeingviolatediftheDBMScrashesmid-
waythroughcommittingresultstodisk.Circletwo.
atomicity consistency
isolation durability
5. Intheabsenceofrollbacks,whichofthefollowingcanbeusedtodeterminewhethera
scheduleisconflictserializable?Circleallthatapply.
checkforcyclesinprecedencegraph
simulatea2PLschemeandcheckfordeadlocks
simulatestrict2PLandcheckfordeadlocks
seeifyoucantochangethescheduleintoaserialorder
byswappingadjacentelements
6. Ifaddingmoreclientsisdecreasingtherateoftransactionspersecond,whichofthe
followingtechniqueswilllikelyincreasetheratewiththeleastpossibilityofbugs?
readcommittedisolationlevel addmorewebservers
readuncommittedisolationlevel admissioncontrol
7. SupposewearecomputingR⋈S,withbothtableslarge(1M+rows).Theestimatedcostofablocknestedloopjoinisapproximatelycutinhalfif(circleallthatapply):
numberoftuplesofRhalved tuplesperblockofRisdoubled
withnumberofblocksunchanged withnumberoftuplesunchanged
numberoftuplesofSishalved tuplesperblockofSisdoubled
withnumberofblocksunchanged withnumberoftuplesunchanged
8. Comparedtoregular(tuple-based)nestedloopjoin,theestimatedcostofablocknested
loopjoinis:
alwaysfaster neverfaster
sometimesfaster,sometimesslower neverslower
9. Ifeachblockcontainsonlyasingletuple,thencomparedtoablocknestedloopjoin,the
estimatedcostofanindexedjoinis:
alwaysfaster neverfaster
sometimesfaster,sometimesslower neverslower
10. Supposethattheprimarykeyoftableisanauto-generated,UUIDwithseeminglyrandomvalues.Comparedtoanunclusteredindexontheprimarykey,theclustered
indexcreatedautomaticallybySQLservermakeseachquery:
alwaysfaster alwaysslower
sometimesfaster,sometimesslower alwaysthesame
Inthenextfewproblems,wewillcomputethecost
ofthequeryplanontheright,whichfindstheresult
oftheexpressionπe,h(R⋈S⋈T)overthetablesR(e,f),S(f,g),andT(g,h).(Bothjoinsnatural.)
Theplanontherightisstillincomplete,however,becausethechoiceofalgorithmfortwoparts,JoinA
andJoinB,havenotyetbeenfilledin.
Inthenextfewproblems(startingonthenextpage),
wewillconsiderthecostofthisplanwithdifferent
choicesofalgorithmsforthesetwopoints.
Youcanassumethefollowingstatisticsaboutthesetables:
Table #tuples #blocks
R 1,000 100
S 5,000 200
T 100,000 10,000
R⋈S 5,000 20
Furthermore,youcanassumethefollowingstatisticsabouttheircolumns:
Column #distinct Low High
R.f 100 1 1000
S.f 1000 1 2000
S.g 5000 1 2,000
T.g 1000 1 10,000
UsefulFormulas
EstimatedcostofXjoinY:
• Usingablocknestedloop,thecostisB(X)+B(X)B(Y),whereB(X)is#blocksinX.• UsingaclusteredindexonY(A),thecostisB(X)+T(X)B(Y)*E,whereT(X)is#
tuplesinX,andEistheselectivityoftheconditionA = c.• UsinganunclusteredindexonY(A),thecostisB(X)+T(X)T(Y)*E.
Estimatedselectivityofconditions:
• ForA = c,theselectivityis1/(#distinctvaluesofA)• ForA < c,theselectivityis(c-lowestvalueofA)/(highest-lowestvalueofA)
11. WhatistheestimatedcostofJoinBintheplanifweimplementitwithablocknestedloopjoin?
20+20*10,000=200,020
12. WhatistheestimatedcostofJoinBtheifweuseanindexedjoin?AssumethatwehaveaclusteredindexonT(g).
20+5,000*10,000/1000=50,020
13. WhatistheestimatedcostofJoinAintheplanifweimplementitwithablocknestedloopjoin?
100+100*200=20,100
14. WhatistheestimatedcostofJoinAifweuseanindexedjoin?AssumethatwehaveanunclusteredindexonS(f).
100+1,000*5,000/1000=5,100
15. WhatisthetotalcostofthisplanifweusethebestchoiceofjoinalgorithmforAandB?
Inadditiontothetwojoinsabove,weneedtowriteR⋈Stodisk,whichcosts20IOs.Thefinalprojectionhasnocost.Thus,thetotalcostis55,170.
YouareimplementingaDBapplicationusingSQLServer.Atfirst,asyouaddmoreusers,
youseeaniceincreaseincompletedtransactionspersecond.However,afterreachinga
highlevelofsimultaneoususers,youfindthatcompletedtransactionspersecondstarts
droppingasyouaddmoreusers.
YoudeterminethattheproblemisSQLServer’srow-levellocking.Toimprove
performance,youwouldliketorunsomequeriesusingtable-levellockinginstead.
Yourideaisasfollows.Youcreateaspecialtable:
Tables(name VARCHAR(100), last_write_by INT);
Itwillcontainonerowforeachtable,indicatingthenameofthattableandthenumberof
thetransactionthatlastwrotetoit.Eachtransactionwillstartbyreadingtherowsof
Tablesforthetablesthatitswantstoreadandwritingtherowsforthetablesthatitwantstowrite,whichwillrequireSqlServertoacquirereadorwritelocksonthoserows.Inthat
manner,therow-levellocksinSQLServerTablesforactastable-levellocks.
16. Afterthetransactionhasacquiredtheappropriatetablelocksanditexecutesitsoperationsagainsttheactualtables,willSQLServerstillacquirelocksontherowsof
thosetables?
Yes No
17. Toimproveperformance,youdecidetouseasecondDBconnectiontoexecutethequeryonceyouhaveacquiredthetablelocksusingyourprimaryDBconnection.What
istheweakestisolationlevelthatyoucansafelyuseforthissecondconnectionifyou
stillwantyourtransactionstobeACID?
readuncommitted readcommitted
repeatableread serializable
18. Whichadditionalchangeswouldmakedeadlockimpossibleinthissystem?
none(alreadyimpossible) takewritelocksbeforereadlocks
locktablesinalphabeticalorder takereadlocksbeforewritelocks
19. SupposethatSQLServerusedread/writelockswhereareadlockisalwaysgrantedprovidednowritelockhasyetbeenacquired.Inonesentence,describeaproblemwiththisapproachthatwouldbefixedbyswitchingtoSQLite’sschemewith4locktypes.
Writerscouldbewaitingforeverforatimewithnoreadlocks.
V.BigDataSystems
1. WhichfeaturesdoparallelDBsprovidethatdataflowenginesdon’t?Circleallthatapply.
declarativequeries scaletopetabytesofdata
transactions computeresultsofnon-monotonequeries
2. Thepurposeofawatermarkinastreamingsystemistocontrolwhat?
whichdatacanbeignored batchsize
howquicklyresultsmustbe numberofMapReducejobsexecuted
updatedwhennewdataarrives
3. The“shufflesort”phaseofMapReduceismostlikewhichoperationinSQL?
innerjoin groupby
outerjoin count(*)
4. ThephysicalplansinSparkSQLaddwhatelementbeyondwhatisinthelogicalplans?
wheretoacquirelocks choiceofbroadcastvsparalleljoin
wheretowritetodisk choiceofhashvsB+treeindexes
5. AdataflowpipelinewithT“reshuffle”operationscanbeexecutedwithatmosthowmanyMapReducejobs?
T-1 T
T+1 2T
Youarehiredbyawebcompanythatmakesapopularproductwithover100millionusers.
Toachievethatscale,theybuilttheirappontopofaNoSQLsystem.However,theyfound
thatitishardtoanswercomplexquestionsaboutthedatastoredinthatsystem.
TheCEOhasnowhiredyoutofixthisproblem.Shetellsyouthatyoucanhireateamof
programmerstohelpyou,butyoucannotchangeawayfromtheNoSQLsystemtheyare
alreadyusing—thatwouldbetoocostly.
Thankfully,theNoSQLsystemtheyuse,anextensiblerecordsystem(or“BigTable”),does
supportMapReduce.(Unfortunately,thesystemprovidesyouwithnostatisticsaboutthe
columnsintherecords.)Afterthinkingitover,youdecidetobuildaqueryenginethatlets
employeesexecuteSQLqueriesoverthedatabyturningSQLintoMapReducejobs.
6. Whichofthefollowingwouldyourteamnotneedtoimplementinthecodethatyou
writeforthissystem?Circleallthatapply.
parserforSQL conversionfromSQLtoRA
estimateselectivityof putRAoperatorsintogroups
conditionsinwhereclause thatrequireonlyonereshuffle
takelocksondataelements startbackupjobsforanystragglers
accordingtotwo-phaselocking
7. Whichofthefollowingwouldbethebestestimatorofthecostofaphysicalplaninthis
system?
totalCPUcycles totalCPUcyclespernode
totalnetworkI/O numberofreshuffles
8. Supposethatyouwantedtomaintainstatisticsaboutthecolumns.Yourideaistodo
thatbyaddingarecord,foreachcolumn,thatstoresitsstatistics.Whichofthefollowing
featureswouldyourequireoftheNoSQLsystem?Circleallthatapply.
row-leveltransactions generaltransactions
two-phaselocking stricttwo-phaselocking
9. DosomeNoSQLsystemsofferthatfeature?
Yes No