slides adapted from silberschatz et al. and ramakrishnan et al. cs 542 – database management...
Post on 21-Dec-2015
222 views
TRANSCRIPT
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
CS 542 – Database CS 542 – Database Management Systems Management Systems
Summer 2004Summer 2004
Kajal ClaypoolKajal Claypool
[email protected]@cs.wpi.edu
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Administrative StuffAdministrative Stuff
Class web page:Class web page: http://www.cs.wpi.edu/~kajal/courses/cs542-S04/indehttp://www.cs.wpi.edu/~kajal/courses/cs542-S04/inde
x.htmlx.html
Collaboration Policy:Collaboration Policy: Homework assignments are individual effortsHomework assignments are individual efforts
• Copying from Web, friends, old assignments, any source => Copying from Web, friends, old assignments, any source => cheating!cheating!
• ““Copying” -> taking anything verbatim, finding the main idea Copying” -> taking anything verbatim, finding the main idea and using it.and using it.
• 1st cheating offense -> F for assignment1st cheating offense -> F for assignment• 2nd cheating offense -> F for the course2nd cheating offense -> F for the course
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Introduction to Introduction to Database SystemsDatabase Systems
An Overview of DBMSAn Overview of DBMS
Chapter 1Chapter 1
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
OutlineOutline
Purpose of Database SystemsPurpose of Database Systems View of DataView of Data Data ModelsData Models Data Definition LanguageData Definition Language Data Manipulation LanguageData Manipulation Language Transaction ManagementTransaction Management Storage ManagementStorage Management Database AdministratorDatabase Administrator Database usersDatabase users Overall System ArchitectureOverall System Architecture
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Why study Databases?Why study Databases?
DatabasesDatabases are everywhere: are everywhere: Banking: all transactionsBanking: all transactions Airlines: reservations, schedulesAirlines: reservations, schedules Universities: registration, gradesUniversities: registration, grades Sales: customers, products, purchasesSales: customers, products, purchases Manufacturing: production, inventory, orders, supply chainManufacturing: production, inventory, orders, supply chain Human resources: employee records, salaries, tax deductionsHuman resources: employee records, salaries, tax deductions
DBMSDBMS brings together many different CS areas brings together many different CS areas OS, Algorithms, AI, logic, languages, multimediaOS, Algorithms, AI, logic, languages, multimedia
DBAs make a lot of money!DBAs make a lot of money!
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
What is a DBMS?What is a DBMS?
DBMS: Database Management SystemDBMS: Database Management System Collection of interrelated data Collection of interrelated data Set of sophisticated programs to access that dataSet of sophisticated programs to access that data
Database: Collection of data Database: Collection of data Usually information relevant to an enterpriseUsually information relevant to an enterprise Models the real worldModels the real world
• Entities (students and courses)Entities (students and courses)• Relationship between entities (a student takes a course)Relationship between entities (a student takes a course)
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Why a DBMS?Why a DBMS? In the early days, database applications In the early days, database applications
were built on top of file systemswere built on top of file systems Drawbacks of using file systems to store Drawbacks of using file systems to store
data:data: Data redundancy and inconsistencyData redundancy and inconsistency
• Format of one file may be different from format of another file!Format of one file may be different from format of another file! Difficulty in accessing dataDifficulty in accessing data
• Need special programs to now generate a list of all customers Need special programs to now generate a list of all customers in a particular postal-code areain a particular postal-code area
Data isolationData isolation• Because data may be scattered over various files, it becomes Because data may be scattered over various files, it becomes
hard for the programmer to grab the information from all hard for the programmer to grab the information from all locationslocations
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Problems (contd.)Problems (contd.) Integrity problemsIntegrity problems
Integrity constraints (e.g. account balance > 0) become part of program codeIntegrity constraints (e.g. account balance > 0) become part of program code Hard to add new constraints or change existing onesHard to add new constraints or change existing ones
• Example: Bank decides that savings balance mustExample: Bank decides that savings balance must always be greater than always be greater than $50.$50.
AtomicityAtomicity:: Certain actions must be treated as one. This is hard to do if data is scattered Certain actions must be treated as one. This is hard to do if data is scattered
over many files.over many files.• Example: if you withdraw money, the bank must withdraw and then update the Example: if you withdraw money, the bank must withdraw and then update the
balance as one operationbalance as one operation
Concurrent Access:Concurrent Access: Often for efficiency you may allow many users to access the files at the same Often for efficiency you may allow many users to access the files at the same
time. But you must guarantee that my money does not go into your account! time. But you must guarantee that my money does not go into your account! This is hard with files.This is hard with files.
Security:Security: Not every user should be able to see everything! Hard to control with files. Not every user should be able to see everything! Hard to control with files.
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
A DBMSA DBMS
Database systems offer solutions to all the Database systems offer solutions to all the above problemsabove problems
provide uniform structures for provide uniform structures for storagestorage of information of information provide mechanisms for provide mechanisms for manipulatingmanipulating the information the information ensure the ensure the safetysafety of the information stored despite system of the information stored despite system
crashes or attempts at unauthorized accesscrashes or attempts at unauthorized access avoid anomalousavoid anomalous results if data is to be shared among many results if data is to be shared among many
usersusers Their Goal: do all of this in a way that is both Their Goal: do all of this in a way that is both convenientconvenient
and and efficientefficient..
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Data StorageData Storage
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Information StorageInformation Storage
Remember “convenience” and “efficient” are two requirements Remember “convenience” and “efficient” are two requirements for a DBMS systemfor a DBMS system
To efficiently retrieve data , data is often stored as To efficiently retrieve data , data is often stored as complexcomplex data data structures. However, these complex data structures adversely structures. However, these complex data structures adversely effect the requirement “convenience of database system users”effect the requirement “convenience of database system users”
Solution: Solution: Data AbstractionData Abstraction Simplify the user’s interaction with the DBMSSimplify the user’s interaction with the DBMS
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Data AbstractionData Abstraction Physical LevelPhysical Level
Lowest levelLowest level Describes “how” data (a record) is actually storedDescribes “how” data (a record) is actually stored
Logical LevelLogical Level Next higher levelNext higher level Describes “what” data are stored in the database, and what relationships Describes “what” data are stored in the database, and what relationships
exist between those dataexist between those data Logical level user does not need to be aware of the complexity of the Logical level user does not need to be aware of the complexity of the
physical level structuresphysical level structures View LevelView Level
Highest level of abstractionHighest level of abstraction Only part of the entire database is visible, I.e., information that a user may Only part of the entire database is visible, I.e., information that a user may
need to seeneed to see Also provides a security mechanismAlso provides a security mechanism
• Example: a teller in a bank does not needExample: a teller in a bank does not need to see the salaries of all bank to see the salaries of all bank employees!employees!
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
View of DataView of Data
An architecture for a database system
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Data Abstraction (contd.)Data Abstraction (contd.) Example: a record in PascalExample: a record in Pascal
typetype customer = customer = recordrecordnamename : string; : string;streetstreet : string; : string;citycity : integer; : integer;
endend;;
Physical level: Physical level: Stored as contiguous blocksStored as contiguous blocks
Logical level:Logical level: As shown aboveAs shown above
View level:View level: A subsetA subset
typetype customer = customer = recordrecordnamename : string; : string;
endend;;
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Instances and SchemasInstances and Schemas Concept is similar to types and variables in programming Concept is similar to types and variables in programming
languageslanguages SchemaSchema – – the logical structure of the database the logical structure of the database
e.g., the database consists of information about a set of customers and e.g., the database consists of information about a set of customers and accounts and the relationship between themaccounts and the relationship between them
Analogous to type information of a variable in a programAnalogous to type information of a variable in a program Physical schemaPhysical schema: : database design at the physical leveldatabase design at the physical level Logical schemaLogical schema: : database design at the logical leveldatabase design at the logical level
InstanceInstance – – the actual content of the database at a particular point in time the actual content of the database at a particular point in time Analogous to the value of a variableAnalogous to the value of a variable
Physical Physical DataData Independence Independence – – the ability to modify the physical the ability to modify the physical schema without changing the logical schemaschema without changing the logical schema
Applications depend on the logical schemaApplications depend on the logical schema In general, the interfaces between the various levels and components should be In general, the interfaces between the various levels and components should be
well defined so that changes in some parts do not seriously influence others.well defined so that changes in some parts do not seriously influence others.
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Data ModelsData Models A collection of concepts for describing A collection of concepts for describing
data data data relationshipsdata relationships data semanticsdata semantics data constraintsdata constraints
Examples of Data Models:Examples of Data Models: Entity-Relationship modelEntity-Relationship model Relational modelRelational model Object-oriented modelObject-oriented model Semi-structured data models (XML)Semi-structured data models (XML) Older models: network model and hierarchical modelOlder models: network model and hierarchical model
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Entity-Relationship ModelEntity-Relationship Model E-R model of real worldE-R model of real world
Entities (objects) Entities (objects) • E.g. customers, accounts, bank branchE.g. customers, accounts, bank branch
Relationships between entitiesRelationships between entities• E.g. Account A-101 is held by customer JohnsonE.g. Account A-101 is held by customer Johnson• Relationship set Relationship set depositordepositor associates customers with accounts associates customers with accounts
Widely used for database designWidely used for database design Database design in E-R model usually converted to design in the Database design in E-R model usually converted to design in the
relational model (coming up next) which is used for storage and relational model (coming up next) which is used for storage and processingprocessing
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Relational ModelRelational Model
Example of tabular data in the relational modelExample of tabular data in the relational model
customer-name
Customer-idcustomer-street
customer-city
account-number
Johnson
Smith
Johnson
Jones
Smith
192-83-7465
019-28-3746
192-83-7465
321-12-3123
019-28-3746
Alma
North
Alma
Main
North
Palo Alto
Rye
Palo Alto
Harrison
Rye
A-101
A-215
A-201
A-217
A-201
Attributes
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
A Sample Relational A Sample Relational DatabaseDatabase
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Data ManipulationData Manipulation
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Data Definition Language Data Definition Language (DDL)(DDL)
Specification notation for defining the database schemaSpecification notation for defining the database schema E.g. E.g.
create tablecreate table accountaccount ( ( account-numberaccount-number charchar(10),(10), balancebalance integerinteger))
DDL compiler generates a set of tables stored in a DDL compiler generates a set of tables stored in a data dictionarydata dictionary Data dictionary contains metadata (i.e., data about data)Data dictionary contains metadata (i.e., data about data)
e.g.: schema of a table is metadatae.g.: schema of a table is metadata Data Data storage and definitionstorage and definition language language
language in which the storage structure and access methods used by language in which the storage structure and access methods used by the database system are specifiedthe database system are specified
Usually an extension of the data definition languageUsually an extension of the data definition language
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Data Manipulation Language Data Manipulation Language (DML)(DML)
Language for accessing and manipulating the data organized by the Language for accessing and manipulating the data organized by the appropriate data modelappropriate data model
DML also known as query languageDML also known as query language It can:It can:
• Retrieve information stored in the databaseRetrieve information stored in the database• Insert new informationInsert new information• Delete existing informationDelete existing information• Modify existing informationModify existing information
Two classes of languages Two classes of languages Procedural – Procedural –
• user specifies what data is required and how to get those data user specifies what data is required and how to get those data Nonprocedural (or Declarative)Nonprocedural (or Declarative)
• user specifies what data is required without specifying how to get those datauser specifies what data is required without specifying how to get those data
The portion of DML that is involved with information retrieval is called a The portion of DML that is involved with information retrieval is called a query languagequery language
SQL is the most widely used query languageSQL is the most widely used query language The DML component of SQL is nonproceduralThe DML component of SQL is nonprocedural
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
SQLSQL SQL: widely used non-procedural languageSQL: widely used non-procedural language
E.g. find the name of the customer with customer-id 192-83-7465E.g. find the name of the customer with customer-id 192-83-7465selectselect customer.customer-namecustomer.customer-namefromfrom customercustomerwherewhere customer.customer-idcustomer.customer-id = ‘192-83-7465’ = ‘192-83-7465’
E.g. find the balances of all accounts held by the customer with customer-id 192-E.g. find the balances of all accounts held by the customer with customer-id 192-83-746583-7465
selectselect account.balanceaccount.balancefromfrom depositordepositor, , accountaccountwherewhere depositor.customer-iddepositor.customer-id = ‘192-83-7465’ = ‘192-83-7465’ andand depositor.account-number = account.account-numberdepositor.account-number = account.account-number
Application programs generally access databases through one ofApplication programs generally access databases through one of Language extensions to allow embedded SQLLanguage extensions to allow embedded SQL Application program interface (e.g. ODBC/JDBC) which allow SQL queries to be Application program interface (e.g. ODBC/JDBC) which allow SQL queries to be
sent to a databasesent to a database
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
SafetySafety
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Database UsersDatabase Users
Users are differentiated by the way they expect to interact with Users are differentiated by the way they expect to interact with the systemthe system
Application programmers –Application programmers – interact with system through DML callsinteract with system through DML calls
Sophisticated users – Sophisticated users – form requests in a database query languageform requests in a database query language
Specialized users – Specialized users – write specialized database applications that do not fit into the write specialized database applications that do not fit into the
traditional data processing frameworktraditional data processing framework Naïve users – Naïve users –
invoke one of the permanent application programs that have been invoke one of the permanent application programs that have been written previouslywritten previously
E.g. people accessing database over the web, bank tellers, clerical E.g. people accessing database over the web, bank tellers, clerical staffstaff
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Database AdministratorDatabase Administrator
Coordinates all the activities of the database system; the Coordinates all the activities of the database system; the database administrator has a good understanding of the database administrator has a good understanding of the enterprise’s information resources and needs.enterprise’s information resources and needs.
Database administrator's duties include:Database administrator's duties include: Schema definitionSchema definition Storage structure and access method definitionStorage structure and access method definition Schema and physical organization modificationSchema and physical organization modification Granting user authority to access the databaseGranting user authority to access the database Specifying integrity constraintsSpecifying integrity constraints Acting as liaison with usersActing as liaison with users Monitoring performance and responding to changes in requirementsMonitoring performance and responding to changes in requirements
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Safe SharingSafe Sharing
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Concurrency ControlConcurrency Control
Concurrent execution of user programs is essential for good Concurrent execution of user programs is essential for good DBMS performance.DBMS performance.
Because disk accesses are frequent, and relatively slow, it is Because disk accesses are frequent, and relatively slow, it is important to keep the cpu humming by working on several user important to keep the cpu humming by working on several user programs concurrentlyprograms concurrently
Interleaving actions of different user programs can lead to Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is inconsistency: e.g., check is cleared while account balance is being computedbeing computed
DBMS ensures that such problems do not arise. Users can DBMS ensures that such problems do not arise. Users can pretend that they are using a single-user system.pretend that they are using a single-user system.
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Transaction: Transaction: An Execution of a DB ProgramAn Execution of a DB Program
Key concept is Key concept is transactiontransaction:: An atomic sequence of database actions (read/write)An atomic sequence of database actions (read/write)
Each transaction, executed completely, must leave the database in Each transaction, executed completely, must leave the database in a a consistentconsistent state if the DB is consistent when the transaction state if the DB is consistent when the transaction begins.begins.
Users can specify some simple integrity constraints on the data, and the DBMS Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraintswill enforce these constraints
Beyond this, the DBMS does not really understand the semantics of your data Beyond this, the DBMS does not really understand the semantics of your data (eg. It does not understand how the interest is calculated on the account (eg. It does not understand how the interest is calculated on the account balance)balance)
Thus, ensuring that the transaction (run alone) preserves consistency is Thus, ensuring that the transaction (run alone) preserves consistency is ultimately the user’s responsibility!ultimately the user’s responsibility!
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Scheduling Concurrent Scheduling Concurrent TransactionsTransactions
DBMS ensures that the execution of {T1, T2….Tn} is equivalent DBMS ensures that the execution of {T1, T2….Tn} is equivalent to some serial execution of T1’, T2’…Tn’to some serial execution of T1’, T2’…Tn’
Before reading/writing an object, a transaction requests a lock on the object, Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of and waits till the DBMS gives it the lock. All locks are released at the end of the transaction.the transaction.
This is called Strict 2PL locking protocolThis is called Strict 2PL locking protocol
Main Idea:Main Idea: If an action of Ti (say writing X) affects Tj (which perhaps reads X), one of If an action of Ti (say writing X) affects Tj (which perhaps reads X), one of
them say Ti will obtain the lock first and Tj is forced to wait until Ti them say Ti will obtain the lock first and Tj is forced to wait until Ti completes; this effectively orders the transactionscompletes; this effectively orders the transactions
What if Tj already has a lock on Y and Ti requests Y? What if Tj already has a lock on Y and Ti requests Y?
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Ensuring AtomicityEnsuring Atomicity DBMS ensures atomicity, all or nothing, even if the system crashes DBMS ensures atomicity, all or nothing, even if the system crashes
in the middle of a Xact.in the middle of a Xact. Idea:Idea:
Keep a log (history) of all actions carried out by the DBMS while Keep a log (history) of all actions carried out by the DBMS while executing a set of Xactsexecuting a set of Xacts
• Before a change is made to the database, the corresponding log entry is Before a change is made to the database, the corresponding log entry is made and forced into a safe locationmade and forced into a safe location
• After a crash, the effects of partially executed Xacts are undone using the After a crash, the effects of partially executed Xacts are undone using the log.log.
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Overall System Structure Overall System Structure
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
SummarySummary
DBMS used to maintain and query large datasetsDBMS used to maintain and query large datasets Benefits include recovery from system crashes, concurrent Benefits include recovery from system crashes, concurrent
access, quick application development, data integrity and access, quick application development, data integrity and securitysecurity
Levels of abstraction give data independenceLevels of abstraction give data independence A DBMS typically has a layered architectureA DBMS typically has a layered architecture DBAs are one of the highest paid computer jobs and still in high DBAs are one of the highest paid computer jobs and still in high
demand!demand! DBMS R&D is one of biggest research areas in industry DBMS R&D is one of biggest research areas in industry
(Microsoft, IBM, Oracle) and in academia!(Microsoft, IBM, Oracle) and in academia!
Slides adapted from Silberschatz et al. and RamSlides adapted from Silberschatz et al. and Ramakrishnan et al.akrishnan et al.
Where to next?Where to next?
Storage and Manipulation of DataStorage and Manipulation of Data Storage at the Logical levelStorage at the Logical level
• Chapter 2Chapter 2 Manipulation of Data Manipulation of Data
• Chapter 3Chapter 3