cs 4604: introduction to database management...
TRANSCRIPT
CS4604:IntroductiontoDatabaseManagementSystems
B.AdityaPrakashLecture#1:Introduction
CourseInformation§ InstructorB.AdityaPrakash,Torg3160F,[email protected]– OfficeHours:12noon-1pmTuesdaysandThursdays– Andbyappointment– IncludestringCS4604insubjectinanyemailyousendme
§ TeachingAssistantsDeepikaRamasubramanian,McBryde106,[email protected]– OfficeHours:Wednesday:9-10:30am.Friday:10:30am-12noon
§ ClassMeetingTimeTuesdaysandThursdays,9:30-10:45am,McB113
§ KeepinginTouchCoursewebsitehttp://courses.cs.vt.edu/~cs4604updatedregularlythroughthesemester– Piazzalinkonthewebsite
Prakash2018 VTCS4604 2
Textbook§ RequiredDatabaseManagementSystems,byRaghuRamakrishnanandJohannesGehrke.3rdEd.McGrawHill.Webpageforthebook(witherrata)http://pages.cs.wisc.edu/~dbbook/
§ Optional:– Garcia-Molina,UllmanandWidom,3rdEd.– Silberschatz,KorthandSudarshan,6thEd.
Prakash2018 VTCS4604 3
CourseGrading
§ Projectisspreadover3deliverables§ Submithardcopiesofhomeworksandprojectassignmentsatthestartofclassontheduedate
§ Eachclasshasrequiredreading(oncoursewebpage)
§ NoPop-QuizzesJ
Homework 30% 6-7
Midtermexam 20% (Tentative)Oct4,classtime
Finalexam 30% Dec7,Friday
Courseproject 20% 3assignments
Prakash2018 VTCS4604 4
CourseProject
§ Wewillputprojectoverviewlater(firstprojectassignment)
§ 2,or3personsperproject.§ Projectrunstheentiresemesterwithregularassignmentsandafinalimplementationassignment.
Prakash2018 VTCS4604 5
ClassPolicies§ Makesureyougothroughthedetailedpoliciesonwebsite:
http://courses.cs.vt.edu/~cs4604/Fall18/policies.html
§ Lectures:Informmeinadvance,ifyouhavetoleaveaclassearlyorcomelateforanyreason.
§ Latepolicy:4‘slip’days(tobeusedonlyforHWsnotproject)
§ Howtosubmitlate:seewebpage§ Exams:noaidsallowed,except:– 1pagewithyournotes(bothsides),forthemidterm– 2suchpages,forthefinal
Prakash2018 VTCS4604 6
WhyStudyDatabases?§ Academic– Databasesinvolvemanyaspectsofcomputerscience– Fertileareaofresearch– ThreeTuringawardsindatabases
§ Programmer– aplethoraofapplicationsinvolveusingandaccessingdatabases
§ Businessman– Everybodyneedsdatabases=>lotsofmoneytobemade
§ Student– GetthoselastthreecreditsandIdon’thavetocomebacktoBlacksburgeveragain!
– Google,Oracle,Microsoft,Facebooketc.willhireme!– Databasessoundcool!– ???
Prakash2018 VTCS4604 7
WhatWillYouLearninCS4604?§ Implementation
– Whatisunder-the-hoodofaDBlikeOracle/MySQL?§ Design
– Howdoyoumodelyourdataandstructureyourinformationinadatabase?
§ Programming– HowdoyouusethecapabilitiesofaDBMS?
§ CS4604achievesabalancebetween– afirmtheoreticalfoundationtodesigningmoderate-sizeddatabases
– creating,querying,andimplementingrealisticdatabasesandconnectingthemtoapplications
Prakash2018 VTCS4604 8
CourseOutline§ Weeks1–4:Query/
ManipulationLanguagesandDataModeling– RelationalAlgebra– Datadefinition– ProgrammingwithSQL– Entity-Relationship(E/R)approach
– SpecifyingConstraints– GoodE/Rdesign
§ Weeks5–8:Indexes,ProcessingandOptimization– Storing– Hashing/Sorting– QueryOptimization– NoSQLandHadoop
§ Week9-10:RelationalDesign– FunctionalDependencies– Normalizationtoavoidredundancy
§ Week11-12:ConcurrencyControl– Transactions– LoggingandRecovery
§ Week13–14:Students’choice– PracticeProblems– XML– Dataminingandwarehousing
Prakash2018 VTCS4604 9
WhatisthegoalofaDBMS?
§ Electronicrecord-keepingFastandconvenientaccesstoinformation§ DBMS==databasemanagementsystem– `Relational’inthisclass– data+setofinstructionstoaccess/manipulatedata
Prakash2018 VTCS4604 10
WhatisaDBMS?§ FeaturesofaDBMS– Supportmassiveamountsofdata– Persistentstorage– Efficientandconvenientaccess– Secure,concurrent,andatomicaccess
§ Examples?– Searchengines,bankingsystems,airlinereservations,corporaterecords,payrolls,salesinventories.
– Newapplications:Wikis,social/biological/multimedia/scientific/geographicdata,heterogeneousdata.
Prakash2018 VTCS4604 11
FeaturesofaDBMS• Supportmassiveamountsofdata
– Giga/tera/petabytes– Fartoobigformainmemory
• Persistentstorage– Programsupdate,query,manipulatedata.– Datacontinuestolivelongafterprogramfinishes.
• Efficientandconvenientaccess– Efficient:donotsearchentiredatabasetoansweraquery.– Convenient:allowuserstoquerythedataaseasilyaspossible.
• Secure,concurrent,andatomicaccess– Allowmultipleuserstoaccessdatabasesimultaneously.– Allowauseraccesstoonlytoauthorizeddata.– Providesomeguaranteeofreliabilityagainstsystemfailures.
Prakash2018 VTCS4604 12
ExampleScenario
§ Students,takingclasses,obtaininggrades– FindmyGPA– <andotherad-hocqueries>
Prakash2018 VTCS4604 13
Obvioussolution1:Folders
§ Advantages?– Cheap;Easy-to-use
§ Disadvantages?– Noad-hocqueries– Nosharing– LargePhysicalfoot-print
Prakash2018 VTCS4604 14
ObviousSolution++
§ FlatfilesandC(C++,Java…)programs– E.g.one(ormore)UNIX/DOSfiles,withstudentrecordsandtheircourses
Prakash2018 VTCS4604 15
ObviousSolution++
§ Layoutforstudentrecords?– CSV(‘comma-separated-values’) Hermione Grainger,123,Potions,A
Draco Malfoy,111,Potions,B
Harry Potter,234,Potions,A
Ron Weasley,345,Potions,C
Prakash2018 VTCS4604 16
ObviousSolution++
§ Layoutforstudentrecords?– OtherpossibilitieslikeHermione Grainger,123 123,Potions,A
Draco Malfoy,111 111,Potions,B
Harry Potter,234 234,Potions,A
Ron Weasley,345 345,Potions,C
Prakash2018 VTCS4604 17
Problems?
§ inconvenientaccesstodata(need‘C++’expertize,plusknowledgeoffile-layout)– dataisolation
§ dataredundancy(andinconsistencies)§ integrityproblems§ atomicityproblems§ concurrent-accessproblems§ securityproblems§ …….
Prakash2018 VTCS4604 18
Problems-Why?
§ Twomainreasons:– file-layoutdescriptionisburiedwithintheCprogramsand
– thereisnosupportfortransactions(concurrencyandrecovery)
Prakash2018 VTCS4604 19
DBMSshandleexactlythesetwoproblems
ExampleScenario§ RDBMS=“Relational”DBMS§ Therelationalmodelusesrelationsortablestostructuredata§ ClassListrelation:
§ Relationseparatesthelogicalview(externals)fromthephysicalview(internals)
§ Simplequerylanguages(SQL)foraccessing/modifyingdata– FindallstudentswhosegradesarebetterthanB.– SELECTStudentFROMClassListWHEREGrade>“B”
Student Course Grade
HermioneGrainger Potions A
DracoMalfoy Potions B
HarryPotter Potions A
RonWeasley Potions C
Prakash2018 VTCS4604 20
DBMSArchitecture
Prakash2018 VTCS4604 21
TransactionProcessing§ Oneormoredatabaseoperationsaregroupedintoa“transaction”
§ Transactionsshouldmeetthe“ACIDtest”– Atomicity:All-or-nothingexecutionoftransactions.
– Consistency:Databaseshaveconsistencyrules(e.g.whatdataisvalid).AtransactionshouldNOTviolatethedatabase’sconsistency.Ifitdoes,itneedstoberolledback.
– Isolation:Eachtransactionmustappeartobeexecutedasifnoothertransactionisexecutingatthesametime.
– Durability:Anychangeatransactionmakestothedatabaseshouldpersistandnotbelost.
Prakash2018 VTCS4604 22
Prakash2018 VTCS4604 23
Disadvantagesover(flat)files?
Prakash2018 VTCS4604 24
Disadvantagesover(flat)files
§ Price§ additionalexpertise(SQL/DBA)(hence:over-killforsmall,single-userdatasetsBut:mobilephones(eg.,android)usesqlite)
ABriefHistoryofDBMS§ Theearliestdatabases(1960s)evolvedfromfilesystems
– Filesystems• Allowstorageoflargeamountsofdataoveralongperiodoftime• Filesystemsdonotsupport:
– Efficientaccessofdataitemswhoselocationinaparticularfileisnotknown
– Logicalstructureofdataislimitedtocreationofdirectorystructures– Concurrentaccess:Multipleusersmodifyingasinglefilegeneratenon-uniformresults
• Navigationalandhierarchical• UserprogrammedthequeriesbywalkingfromnodetonodeintheDBMS.
§ RelationalDBMS(1970stonow)– Viewdatabaseintermsofrelationsortables– High-levelqueryanddefinitionlanguagessuchasSQL– Allowusertospecifywhat(s)hewants,nothowtogetwhat(s)hewants
§ Object-orientedDBMS(1980s)– Inspiredbyobject-orientedlanguages– Object-relationalDBMS
Prakash2018 VTCS4604 25
TheDBMSIndustry§ ADBMSisasoftwaresystem.
§ MajorDBMSvendors:Oracle,Microsoft,IBM,Sybase
§ Free/Open-sourceDBMS:MySQL,PostgreSQL,Firebird.– UsedbycompaniessuchasGoogle,Yahoo,Lycos,BASF….
§ Allare“relational”(or“object-relational”)DBMS.
§ Amulti-billiondollarindustry
Prakash2018 VTCS4604 26
Prakash2018 VTCS4604 27
Fundamentalconcepts
§ 3-levelarchitecture§ logicaldataindependence§ physicaldataindependence
Prakash2018 VTCS4604 28
3-levelarchitecture
§ viewlevel§ logicallevel§ physicallevel
v1 v2v3
Prakash2018 VTCS4604 29
3-levelarchitecture
§ viewlevel§ logicallevel:eg.,tables– STUDENT(ssn,name)– TAKES(ssn,cid,grade)
§ physicallevel:– howarethesetablesstored,howmanybytes/attributeetc
Prakash2018 VTCS4604 30
3-levelarchitecture
§ viewlevel,eg:– v1:selectssnfromstudent– v2:selectssn,c-idfromtakes
§ logicallevel§ physicallevel
Prakash2018 VTCS4604 31
3-levelarchitecture
§ ->hence,physicalandlogicaldataindependence:
§ logicalD.I.:– ???
§ physicalD.I.:– ???
Prakash2018 VTCS4604 32
3-levelarchitecture
§ ->hence,physicalandlogicaldataindependence:
§ logicalD.I.:– canadd(drop)column;add/droptable
§ physicalD.I.:– canaddindex;changerecordorder
Prakash2018 VTCS4604 33
Databaseusers
§ ‘naive’users§ casualusers§ applicationprogrammers§ [DBA(Databaseadministrator)]
Prakash2018 VTCS4604 34
Casualusers
DBMS
data
andmeta-data=catalog
select*fromstudent
Prakash2018 VTCS4604 35
``Naive’’users
Pictorially:
DBMS
data
andmeta-data=catalog
app.(eg.,reportgenerator)
Prakash2018 VTCS4604 36
App.programmers
§ thosewhowritetheapplications(likethe‘reportgenerator’)
Prakash2018 VTCS4604 37
DBAdministrator(DBA)
§ Duties?
Prakash2018 VTCS4604 38
DBAdministrator(DBA)
§ schemadefinition(‘logical’level)§ physicalschema(storagestructure,accessmethods
§ schemasmodifications§ grantingauthorizations§ integrityconstraintspecification
Prakash2018 VTCS4604 39
Overallsystemarchitecture
§ [Users]§ DBMS– queryprocessor– storagemanager– transactionmanager
§ [Files]
Prakash2018 VTCS4604 40
DDLint.DMLproc.
queryeval.app.pgm(o)
trans.mgr
emb.DML
buff.mgr filemgr
data meta-data
queryproc.
storagemgr.
naive app.pgmr casual DBA users
Prakash2018 VTCS4604 41
Overallsystemarchitecture
§ queryprocessor– DMLcompiler– embeddedDMLpre-compiler– DDLinterpreter– Queryevaluationengine
Prakash2018 VTCS4604 42
Overallsystemarchitecture(cont’d)
§ storagemanager– authorizationandintegritymanager– transactionmanager– buffermanager– filemanager
Prakash2018 VTCS4604 43
Overallsystemarchitecture(cont’d)
§ Files– datafiles– datadictionary=catalog(=meta-data)– indices– statisticaldata
Prakash2018 VTCS4604 44
Someexamples:
§ DBAdoingaDDL(datadefinitionlanguage)operation,eg.,createtablestudent...
Prakash2018 VTCS4604 45
DDLint.DMLproc.
queryeval.app.pgm(o)
trans.mgr
emb.DML
buff.mgr filemgr
data meta-data
queryproc.
storagemgr.
naive app.pgmr casual DBA users
Prakash2018 VTCS4604 46
Someexamples:
§ casualuser,askingforanupdate,eg.:updatestudentsetnameto‘smith’wheressn=‘345’
Prakash2018 VTCS4604 47
DDLint.DMLproc.
queryeval.app.pgm(o)
trans.mgr
emb.DML
buff.mgr filemgr
data meta-data
queryproc.
storagemgr.
naive app.pgmr casual DBA users
Prakash2018 VTCS4604 48
DDLint.DMLproc.
queryeval.app.pgm(o)
trans.mgr
emb.DML
buff.mgr filemgr
data meta-data
queryproc.
storagemgr.
naive app.pgmr casual DBA users
Prakash2018 VTCS4604 49
DDLint.DMLproc.
queryeval.app.pgm(o)
trans.mgr
emb.DML
buff.mgr filemgr
data meta-data
queryproc.
storagemgr.
naive app.pgmr casual DBA users
Prakash2018 VTCS4604 50
Someexamples:
§ app.programmer,creatingareport,egmain(){....execsql“select*fromstudent”...}
Prakash2018 VTCS4604 51
DDLint.DMLproc.
queryeval.app.pgm(o)
trans.mgr
emb.DML
buff.mgr filemgr
data meta-data
queryproc.
storagemgr.
naive app.pgmr casual DBA users
pgm(src)
Prakash2018 VTCS4604 52
Someexamples:
§ ‘naive’user,runningthepreviousapp.
Prakash2018 VTCS4604 53
DDLint.DMLproc.
queryeval.app.pgm(o)
trans.mgr
emb.DML
buff.mgr filemgr
data meta-data
queryproc.
storagemgr.
naive app.pgmr casual DBA users
pgm(src)
Prakash2018 VTCS4604 54
Conclusions
§ (relational)DBMSs:electronicrecordkeepers§ customizethemwithcreatetablecommands§ askSQLqueriestoretrieveinfo
Prakash2018 VTCS4604 55
Conclusionscontd
mainadvantagesover(flat)files&scripts:§ logical+physicaldataindependence(ie.,flexibilityofaddingnewattributes,newtablesandindices)
§ concurrencycontrolandrecovery