dbms introduction

20
1 Lecture 1: Introduction to databases Timothy G. Griffin Easter Term 2008    IB/Dip/IIG www.cl.cam.ac.uk/Teaching/current/Databases/

Upload: shikha-yadav

Post on 12-Oct-2015

28 views

Category:

Documents


1 download

DESCRIPTION

DBMS

TRANSCRIPT

  • Lecture 1:Introduction to databasesTimothy G. Griffin Easter Term 2008 IB/Dip/IIG

    www.cl.cam.ac.uk/Teaching/current/Databases/

  • Database PrehistoryData entry Storage and retrieval Query processing Sorting

  • Early AutomationData management and application code were all tangled togetherHard to modifyHard to generalizeMany competing approachesData manipulation code written at very low levels of abstraction

  • Our Hero --- E. F. CoddEdgar F. "Ted" Codd ( August 23, 1923 - April 18, 2003) was a British computer scientist who invented relational databases while working for IBM. He was born in Portland, Dorset, studied mathsand chemistry at Oxford. He was a pilot in the Royal Air Force during WWII. In 1948 he joined IBMin New York as a mathematical programmer. He fled the USA to Canada during the McCarthy period. Later, he returned to the USA to earn a doctorate in CSfrom the University of Michigan in Ann Arbor. He then joined IBM research in San Jose.

    His 1970 paper A Relational Model of Data for Large Shared Data Banks changed everything.

    In the mid 1990s he coined the term OLAP.

  • Database Management Systems (DBMSs)Raw Resources (bare metal) DBMSYour Applications Go HereDatabase abstractionsallow this interface to be cleanly defined and this allows applications and data management systems to be implemented separately.

  • Today, Database Systems are Ubiquitous Data Distrib.Service ToolsEnd UsersService DBProduction DBDevelopmentDBSubmittersSubmission toolsAdd value(computation)Add value (review etc.)Data exchangeOther archivesQ/C etcDatabasedesignReleases&UpdatesReleases&UpdatesDatabase system design from the European Bioinformatics Institute (Hinxton UK)

  • What is a database system?A database is a large, integrated collection of dataA database contains a model of something!A database management system (DBMS) is a software system designed to store, manage and facilitate access to the database

  • What does a database system do?Manages Very Large Amounts of DataSupports efficient access to Very Large Amounts of DataSupports concurrent access to Very Large Amounts of DataSupports secure, atomic access to Very Large Amounts of Data

  • Databases are a Rich Area for Computer ScienceProgramming languages and software engineering (obviously)Data structures and algorithms (obviously)Logic, discrete maths, computation theorySome of todays most beautiful theoretical results are in finite model theory --- an area derived directly from database theorySystems problems: concurrency, operating systems, file organisation, networks, distributed systems Many of the concepts covered in this course are classical --- they form the heart of the subject. But the field of databases is still evolving andproducing new and interesting research (hinted at in lectures 11 & 12).

  • What this course is aboutAccording to Ullman, there are three aspects to studying databases:Modelling and design of databasesProgrammingDBMS implementationThis course addresses 1 and 2

  • Course OutlineIntroductionEntity-Relationship ModelThe Relational Model The Relational AlgebraThe Relational CalculusSchema refinement: Functional dependenciesSchema refinement: NormalisationTransactionsOnline Analytical Processing (OLAP) More OLAPBasic SQL and Integrity ConstraintsFurther relational algebra, further SQL

  • Recommended ReadingDate, An introduction to database systems, 8th ed.Elmasri & Navathe, Fundamentals of database systems, 4th ed.Silberschatz, Korth & Sudarshan, Database system concepts, 4th ed.Ullman & Widom, A first course in database systems.OLAPDB2/400: Mastering Data Warehousing Functions. (IBM Redbook) Chapters 1 & 2 only. http://www.redbooks.ibm.com/abstracts/sg245184.html

    Data Warehousing and OLAP Hector Garcia-Molina (Stanford University) http://www.cs.uh.edu/~ceick/6340/dw-olap.ppt Data Warehousing and OLAP Technology for Data Mining Department of Computing London Metropolitan University http://learning.unl.ac.uk/csp002n/CSP002N_wk2.ppt

  • Some systems to play withmysql:www.mysql.orgOpen source, quite powerfulPostgreSQL:www.postgresql.orgOpen source, powerfulMicrosoft Access:Simple system, lots of nice GUI wrappersCommercial systems:Oracle 10g (www.oracle.com)SQL Server 2000 (www.microsoft.com/sql)DB2 (www.ibm.com/db2)

  • Database system architectureIt is common to describe databases in two waysThe logical level:What users see, the program or query language interface, The physical level:How files are organised, what indexing mechanisms are used, It is traditional to split the logical level into two: overall database design (conceptual) and the views that various users get to seeA schema is a description of a database

  • Three-level architectureConceptualSchemaInternalSchemaExternalSchema 1ExternalSchema 2ExternalSchema n

  • Logical and physical data independenceData independence is the ability to change the schema at one level of the database system without changing the schema at the next higher levelLogical data independence is the capacity to change the conceptual schema without changing the user viewsPhysical data independence is the capacity to change the internal schema without having to change the conceptual schema or user views

  • Database design processRequirements analysisUser needs; what must database do?Conceptual designHigh-level description; often using E/R modelLogical designTranslate E/R model into (typically) relational schemaSchema refinementCheck schema for redundancies and anomaliesPhysical design/tuningConsider typical workloads, and further optimiseNext Lecture

  • The Fundamental Tradeoff of Database Performance TuningDe-normalized data can often result in faster query responseNormalized data leads to better transaction throughput, and avoids update anomalies (corruption of data integrity)

    What is more important in your database --- query responseor transaction throughput? The answer will vary. What do the extreme ends of the spectrum look like? Yes, indexing data can speed up transactions, but this just proves the point --- an index IS redundant data. General rule of thumb: indexing will slow down transactions!

  • A Theme of this Course:OLTP vs. OLAP OLTP = Online Transaction ProcessingNeed to support many concurrent transactions (updates and queries) Normally associated with the operational database that supports day-to-day activities of an organization.OLAP = Online Analytic Processing Often based on data extracted from operational database, as well as other sourcesUsed in long-term analysis, business trends.

  • Design Heterogeneity Data Distrib.Service ToolsEnd UsersService DBProduction DBDevelopmentDBSubmittersSubmission toolsAdd value(computation)Add value (review etc.)Data exchangeOther archivesQ/C etcDatabasedesignReleases&UpdatesReleases&UpdatesDatabase system design from the European Bioinformatics Institute (Hinxton UK) De-normalized Derived Tables--- for fast accessNormalized Tables