compsci516 database systems lecture 20 (additional ... · mongodb • mongodb is an open source...
TRANSCRIPT
CompSci 516DatabaseSystems
Lecture20(Additional/OptionalSlides)
NoSQL/MongoDB
Instructor:Sudeepa Roy
DukeCS,Fall2018 CompSci516:DatabaseSystems 1
DukeCS,Fall2018 CompSci516:DatabaseSystems 2
AdditionalandOptionalSlidesonMongoDB
(MaybeusefulforHW3)https://docs.mongodb.comhttps://docs.mongodb.com/manual/reference/sql-comparison/
MongoDB
• MongoDBisanopensourcedocumentstorewritteninC++• providesindexesoncollections• lockless• providesadocumentquerymechanism• supportsautomaticsharding• Replicationismostlyusedforfailover• doesnotprovidetheglobalconsistencyofatraditionalDBMS
– butyoucangetlocalconsistencyontheup-to-dateprimarycopyofadocument
• supportsdynamicquerieswithautomaticuseofindices,likeRDBMSs
• alsosupportsmap-reduce– helpscomplexaggregationsacrossdocs
• providesatomicoperationsonfields
DukeCS,Fall2018 CompSci516:DatabaseSystems 3
Optionalslide:Readyourself
MongoDB:AtomicOpsonFields• Theupdatecommandsupports“modifiers”thatfacilitateatomic
changestoindividualvalues– $setsetsavalue– $inc incrementsavalue– $pushappendsavaluetoanarray– $pushAll appendsseveralvaluestoanarray– $pullremovesavaluefromanarray,and$pullAll removesseveral
valuesfromanarray• Sincetheseupdatesnormallyoccur“inplace”,theyavoidthe
overheadofareturntriptotheserver• Thereisan“updateifcurrent”conventionforchangingadocument
onlyiffieldvaluesmatchagivenpreviousvalue• MongoDBsupportsafindAndModify commandtoperforman
atomicupdateandimmediatelyreturntheupdateddocument– usefulforimplementingqueuesandotherdatastructuresrequiring
atomicity
DukeCS,Fall2018 CompSci516:DatabaseSystems 4
Optionalslide:Readyourself
MongoDB:Index• MongoDBindicesareexplicitlydefinedusinganensureIndex call– anyexistingindicesareautomaticallyusedforqueryprocessing
• Tofindallproductsreleasedlastyear(2015)orlatercostingunder$100youcouldwrite:
• db.products.find({released:{$gte:newDate(2015,1,1,)},price{‘$lte’:100},})
DukeCS,Fall2018 CompSci516:DatabaseSystems 5
Optionalslide:Readyourself
MongoDB:Data
• MongoDBstoresdatainabinaryJSON-likeformatcalledBSON– BSONsupportsboolean,integer,float,date,stringandbinarytypes
–MongoDBcanalsosupportlargebinaryobjects,eg.imagesandvideos
– Thesearestoredinchunksthatcanbestreamedbacktotheclientforefficientdelivery
DukeCS,Fall2018 CompSci516:DatabaseSystems 6
Optionalslide:Readyourself
MongoDB:Replication
• MongoDBsupportsmaster-slavereplicationwithautomaticfailoverandrecovery– Replication(andrecovery)isdoneatthelevelofshards
– Replicationisasynchronousforhigherperformance,sosomeupdatesmaybelostonacrash
DukeCS,Fall2018 CompSci516:DatabaseSystems 7
Optionalslide:Readyourself
AdditionalSlidesonMongoDBandJSON
DukeCS,Fall2018 CompSci516:DatabaseSystems 8
ACK:SlidesbyProf.JunYang
JSON(JavaScriptObjectNotation)
• Verylightweightdataexchangeformat– MuchlessverboseandeasiertoparsethanXML– IncreasinglyusedfordataexchangeoverWeb:manyWebAPIsuseJSONtoreturnresponses/results
• BasedonJavaScript– ConformstoJavaScriptobject/arraysyntax—youcandirectlymanipulateJSONrepresentationsinJavaScript
• Butithasgainedwidespreadsupportbyallprogramminglanguages
9
ExampleJSONvs.XML
10
<bibliography><book ISBN="ISBN-10" price="80.00"><title>Foundations of Databases</title><author>Abiteboul</author><author>Hull</author><author>Vianu</author><publisher>Addison Wesley</publisher><year>1995</year><section><title>Section 1</title><section><title>Section 1.1</title></section><section><title>Section 1.2</title></section>
</section><section><title>Section 2</title>
</section></book>
</bibliography>
[{ "ISBN": "ISBN-10","price": 80.00,"title": "Foundations of Databases","authors": [ "Abiteboul", "Hull", "Vianu" ],"publisher": "Addison Wesley","year": 1995,"sections": [{ "title": "Section 1","sections": [{ "title": "Section 1.1” },{ "title": "Section 1.2” }
]},{ "title": "Section 2" }
]}, … …
]
JSONdatamodel
• Twobasicconstructs– Array:comma-separatedlistof“things”enclosedbybrackets
• Orderisimportant– Object:comma-separatedsetofpairsenclosedbybraces;eachpairconsistsofanattributename(string)andavalue(any“thing”)• Orderisunimportant• Attributenames“should”beuniquewithinanobject
• Simpletypes:numbers,strings(indoublequotes),andspecialvalues“true”,“false”,and“null”
• Thing=asimplevalueoranarrayoranobject11
[{ "ISBN": "ISBN-10","price": 80.00,"title": "Foundations of Databases","authors": [ "Abiteboul", "Hull", "Vianu" ],"publisher": "Addison Wesley","year": 1995,"sections": [{ "title": "Section 1","sections": [{ "title": "Section 1.1” },{ "title": "Section 1.2” }
]},{ "title": "Section 2" }
]}, … …
]
JSONSchema
• Recalltheadvantagesofhavingaschema– Definesastructure,helpscatcherrors,facilitatesexchange/automation,informsoptimization…
• JustlikerelationaldataandXML,JSONisgettingaschemastandardtoo!– Upandcoming,butstilladraftatthisstage
12
{"definitions": {"sections": {"type": "array","description": "Sections.","sections": {"$ref":"#definitions/sections"},"minItems": 0
}},"title": "Book","type": "object","properties": {"ISBN": {"type": "string","description": "The book's ISBN number."
},"price": {"type": "number","description": "The book's price.","exclusiveMinimum": 0
},... ..."sections": {"$ref":"#definitions/sections"},}
}... ...
}
MongoDB
• Oneofthe“NoSQL”posterchildren• Startedin2007• Targetingsemi-structureddatainJSON• Designedtobeeasyto“scaleout”• Goodsupportforindexing,partitioning,replication
• NiceintegrationinWebdevelopmentstacks• Not-so-greatsupportforjoins(orcomplexqueries)ortransactions
13
InsideaMongoDBdatabase
• Database=anumberof“collections”• Collection=alistof“documents”• Document=aJSONobject– Musthavean_id attributewhosevaluecanuniquelyidentifyadocumentwithinthecollection
☞Inotherwords,adatabasehascollectionsofsimilarlystructured“documents”– Muchliketablesofrecords,asopposedtoonebigXMLdocumentthatcontainsalldata
14
QueryingMongoDB
• find() andsort()– Analogoustosingle-tableselection/projection/sort
• “Aggregation”pipeline– With“stages”analogoustorelationaloperators– Join,group-by,restructuring,etc.
• MapReduce:– Supportsuser-definedfunctions– Wewillsavethistopicuntillaterinthiscourse
☞Wewon’tcoversyntaxforcreating/updatingMongoDBdatabasesinlecture– See“Help”ofthecoursewebsiteandreadthemanuals!
15
Keyfeaturestolookoutfor
• QuerieswrittenasJSONobjectsthemselves!– Naturalinsomecases(e.g.,forspecifyingconditionsonsubsetsofattributes),butawkward/misleadinginothers
• Simplepathexpressionsusingthe“dotnotation”– AnalogoustoXPath“/”
• Arrayswithinobjects– Workonnestedarraydirectlyusingconstructslikedot-indexnotation,$elemMatch,$map,and$filter
– Or“unnest”anarraysoitselementsgetpairedwiththeownerobjectinturnforpipelineprocessing• Afundamentalconceptinworkingwithnesteddata
16
BasicMongoDBfind()
• Allbooksdb.bib.find()
• Bookswithtitle“FoundationsofDatabases”db.bib.find({ title: "Foundations of Databases" })
• Bookswhosetitlecontains“Database”or“database”andwhosepriceislowerthan$50db.bib.find({ title:/[dD]atabase/, price:{$lt:50} })
• Bookswithpricebetween$70and$100db.bib.find({$and:[{price:{$gte:70}}, {price:{$lte:100}}]})
– Bytheway,whywouldn’tthefollowingwork?db.bib.find({ price:{$gte:70}, price:{$lte:100} })
• BooksauthoredbyWidomdb.bib.find({ authors: "Widom" })
– Notetheimplicitexistentialquantification 17
• Assumedb referstothedatabaseanddb.bib referstothecollectionofbooks
• Add.toArray() atendtogetprettyoutput• YouneedtodothisforHomework3!
Nogeneral“twig”matching!
• Supposeforamomentpublisher isanobjectitself,withattributesname,state,andcountry
• Thefollowingquerywon’tgetyoudatabasebooksbyUSpublishers:db.bib.find({ title: /[dD]atabase/,
publisher: { country: "US" } })
– Instead,theconditiononpublisher issatisfiedonlyifitisanobjectwithexactlyoneattribute,andthisattributemustbenamedcountry andhasvalue"US"
– WhathappensisthatMongoDBcheckstheequalityagainst{country: "US"} asanobject,notasapattern!
18
Moreonnestedstructures
• DotnotationforXPath-likepathexpressions– Bookswheresomesubsectiontitlecontains“1.1”
db.bib.find({ "sections.sections.title": /1\.1/ })• Notewethatneedtoquotetheexpression• Again,iftheexpressionreturnsmultiplethings,theconditiononlyneedstoholdforatleastoneofthem
• Use$elemMatch toensurethatthesamearrayelementsatisfiesmultipleconditions,e.g.:db.bib.find({ sections: { $elemMatch: {
title: /Section/,"sections.title": /1\.1/
}}})
• Dotnotationforspecifyingarrayelements– BookswhosefirstauthorisAbiteboul
db.bib.find({ "authors.0": "Abiteboul" })• Note0-basedindexing;again,needtoquotetheexpression
19
find() withprojectionandsorting
• Listjustthebookpricesandnothingelsedb.bib.find({ price: { $exists: true } },
{ _id: 0, price: 1 })– The(optional)secondargumenttofind() specifieswhattoproject:1meanstoreturn,0meanstoomit• _id isreturnedbydefaultunlessotherwisespecified
• Listbooksbutnotsubsections,orderedbyISBNdb.bib.find({}, {"sections.sections":0}).sort({ISBN:1})
– Outputfromfind() isfurthersortedbysort() ,where1/-1meanascending/descendingorder
☞“Aggregationpipelines”(next)arebettersuitedforconstructingmorecomplexoutput
20
MongoDBaggregationpipeline
• Idea:thinkofaqueryasperformingasequenceof“stages,”eachtransforminganinputsequenceofJSONobjectstoanoutputsequenceofJSONobjects
• “Aggregation”isamisnomer:thereareallkindsofstages– Selection($match),projection($project),sorting($sort)
• Muchofwhichfind() andsort() alreadydo– Computing/addingattributeswithgeneralizedprojection($project/$addFields),unnesting embeddedarrays($unwind),andrestructuringoutput($replaceRoot)• Operatorstotransform/filterarrays($map/$filter)
– Join($lookup)– Groupingandaggregation($group)
• Operatorstoaggregate(e.g.,$sum)orcollectintoanarray($push)21
Example:ThecongressMongoDBdatabase
• Twocollections,people andcommittees– Eachobjectinpeople isalegislator• roles =arrayofobjects
– Eachobjectincommittees isacommittee• members =arrayofobjects• subcommittees =anarrayofsubcommitteeobjects,eachwithitsownmembers array• Eachmemberobject’sid fieldreferencesalegislator
_id
22
23
[{
"_id" : "B000944","birthday" : ISODate("1952-11-09T00:00:00Z"),"gender" : "M","name" : "Sherrod Brown","roles" : [{"district" : 13,"enddate" : ISODate("1995-01-03T00:00:00Z"),"party" : "Democrat","startdate" : ISODate("1993-01-05T00:00:00Z"),"state" : "OH","type" : "rep"
},{"district" : 13,"enddate" : ISODate("1997-01-03T00:00:00Z"),"party" : "Democrat","startdate" : ISODate("1995-01-04T00:00:00Z"),"state" : "OH","type" : "rep"
}, … …]
},… …
]
[{"_id" : "HSAG","displayname" : "House Committee on Agriculture","type" : "house","members" : [{"id" : "C001062","role" : "Chair"
},{"id" : "G000289"
}, … …],"subcommittees" : [{"code" : "15","displayname" : "Conservation and Forestry","members" : [{"id" : "L000491","role" : "Chair"
},{"id" : "T000467"
}, … …]
}, … …]
},… …
]
Selection/projection/sorting
FindRepublicanlegislators,outputonlytheirnameandgender,sortbyname
db.people.aggregate([{ $match: {
"roles.party": "Republican”} },{ $project: {
_id: false,name: true,gender: true
} },{ $sort: {
name: 1} }
])
24
• aggregate() takesanarrayofstages• Noteagainquotingthedotnatation• Noteagainthesemanticsofcomparinga
listofvalues:i.e.,thequeryfindslegislatorswhohaveeverservedrolesasRepublicans
Generalizedprojection
FindRepublicanlegislators,outputtheirname,gender,androlesasanarrayoftypes(sen orrep)
db.people.aggregate([{ $match: {
"roles.party": "Republican"} },{ $addFields: {
compact_roles: {$map: { input: "$roles",
as: "role",in: "$$role.type" }
}} },{ $project: {
_id: false,name: true,gender: true,roles: "$compact_roles"
} }])
25
• Use“: "$xxx" ”totellMongoDBtointerpretxxx asafieldinthe“current”objectinsteadofjustastringliteral
• In$map,as definesanewvariabletoloopoverelementsintheinput array
• Foreachinputelement,$mapcomputesthein expressionandappendsitsvaluetotheoutputarray• Use“: "$$xxx" ”totellMongoDB
thatxxx isanewvariablecreatedduringexecution(asopposedtoafieldinthecurrentobject)
Unnesting andrestructuring
Createalistofsubcommittees:foreach,simplydisplayitsnameandthenameofthecommitteeitbelongsto
db.committees.aggregate([{ $unwind: "$subcommittees" },{ $replaceRoot: { newRoot: {
committee: "$displayname",subcommittee: "$subcommittees.displayname”
} } }])
26
Foreachinputcommittee,$unwind loopsoveritssubcommittees array,oneelementatatime,andoutputsacopyofthecommitteeobject,withitssubcommittees valuereplacedwiththissingleelement
Join
Foreachcommittee(ignoreitssubcommittees),displayitsnameandthenameofitschairdb.committees.aggregate([{ $addFields: {
chair_member: { $filter: {input: "$members",as: "member",cond: { $eq: ["$$member.role",
"Chairman"] }} }
} },{ $lookup: {
from: "people",localField: "chair_member.id",foreignField: "_id",as: "chair_person"
} },{ $project: {
_id: false,name: "$displayname",chair: { $arrayElemAt:["$chair_person.name",0] }
} },])
27
• In$lookup,localField specifiestheattributeinthecurrentobjectwhosevaluewillbeusedforlookup
• from specifiesthecollectioninwhichtolookforjoiningobjects;foreignField specifiestheattributethereintobejoined
• $lookup createsanattributeinthecurrentobjectwiththenamespecifiedbyas,andsetsitvaluetoanarrayholdingalljoiningobjects
☞ Non-equalityjoinsarealsopossible,withmorecomplexsyntax
$arrayElemAt extractsanarrayelementbyitsindex("chair_person.0.name" doesn’tworkhere)
• $filter filtersinput arrayaccordingtocond andproducesandoutputarray
Groupingandaggregation
• Countlegislatorsbygender,andlistthenamesoflegislatorsforeachgender
db.people.aggregate([{ $group: {
_id: "$gender",count: { $sum: 1 },list: { $push: "$name" }
} }])
28
• Therequired_id specifiesthegroupingexpression,whosevaluebecomestheidentifyingattributeofoutputobjects(onepergroup)
• Otherattributesholdaggregatevalues,computedusingaggregationoperators• $sum computeatotalbyaddingeachinput• $push createsanarraybyappendingeachinput
Summaryanddiscussion
• JSONislikemuchmorelightweightversionofXML– Butperhapsnotasgoodformixedcontents
• WritingqueriesJSONissometimesconvenient,butconfusinginmanysituations
• Queryasaspipeline:lessdeclarative,butarguablyeasiertoimplement(especiallytoparallelize)
• Nestedstructuresrequiresmorequeryconstructs– $unwind stage,$elemMatch/$map/$filter/$push/$arrayElemAt operators,etc.– Distinctionbetweenthetop-levelandnestedarraysisannoying
• E.g.,$match stageand$filter operatorbasicallydothesamething• XQueryismuchnicerinthisregard(withabilitytonestqueriesinreturn)
☞ ThereisactuallyXQuery-likelanguageforJSONcalled“JSONiq,”butitremainslessknown
29