basic concepts course: 03-60-315cs315.joan.myweb.cs.uwindsor.ca/documents/courseware/60-315...
TRANSCRIPT
-
Basic Concepts
Course: 03-60-315
Dr. Joan Morrissey School of Computer Science
University of Windsor Windsor, Canada
-
2
Some Basic Concepts
DATA BASE: collection of related data - facts that can be recorded and
have an explicit meaning.
Examples:
• Names, phone numbers, email addresses of people you know. (Small)
• A library, a bank or a university. (Large)
A database can be centralized e.g. SIS, faculty payroll at the U of W or
distributed e.g. RBC which has branches all over the world.
In a distributed system, each local location keeps local data but can
participate in a global query such as “How much in mortgage loans do we
have for all locations?” We will not be looking at distributed systems.
-
3
Basic Concepts……..
Generally, we define a database to be a set of related data with the
following characteristics:
• Data is related not random – it is logically coherent. There are no irrelevant details in the database - so shouldn’t put weather and student data in the one database.
• Designed, built and populated for a specific group of users and applications – e.g. students and SIS. Here the students are the users and SIS is the application (or program) used by students.
• Represents some aspect of the real world often called a miniworld or Universe of Discourse (UoD). For example, a banking database – nothing stored only banking related info.
• Can be of any size and complexity – personal contact list (small and not complex) or Revenue Canada ( very large and very complex as tax rules change frequently and data is stored for many years).
-
4
DBMS – what it allows us to do…
DBMS: collection of software for designing and maintaining DB. Allows
the following:
• Data to be described – what records, what fields, what data types, files, etc. Data is described using a Data Definition Language (DDL).
– We have to describe the data format to the system so that we can retrieve it later. The description of the data is called the schema or metadata.
• Data to be physically stored – with efficient file structures. Need to store large amounts of data for a long period of time – securely.
– All file structures allow for the efficient non-volatile storage of our data.
– Note that any industrial DB will never fit into RAM and it would not give permanent storage anyway. Data is buffered to and from RAM for processing.
– Solid state drives ? Degrade quickly with a large number of R & W’s.
-
5
DBMS – what it allows us to do…
• Queries, both interactive (e.g. use of SIS, form based) and application programs (e.g. payroll). Also known as Data Manipulation Language (DML).
– Use query language (with SQL) e.g. forms for interactive queries such as looking at your grades on SIS.
– Use Java/C++ with embedded SQL for application programs such as payroll.
– Could also use PL/SQL. More efficient but more complicated.
– Note that in SQL there is a difference between the DDL and the DML but it’s not that important. One is used to define your DB, the other is used to retrieve data from the stored DB.
• Updates – including deletes (delete a student from a course via VW), addition (add students) and modifications (e.g. change of all grades after a final exam) to the data.
-
6
DBMS – what it allows us to do…
• Concurrent access by many users – without data corruption.
– We want several users to be able to use the database at once but it must be done correctly. One user can’t be allowed to interfere with another and have incorrect data as a result.
– Done by locking the database at different levels with different types of locks. Also need transaction control….reads and writes must be done in the correct order.
To recap: a DBMS allows us to do 5 different things:
1. Describe our data.
2. Physically store our data.
3. Query our data.
4. Add, delete and modify data.
5. Provides safe, correct concurrent access by multiple users to the same data.
-
7
Basic Concepts…..
Database System:
• DB + description of the data (description is also called meta-data)
• DBMS – the software which controls the DB.
• End users – e.g. professor checking student record. Usually done interactively.
• Applications software/programs – e.g. payroll, SIS, eCV, eGRADE, updating of DARS and so on. Applications are just programs that perform some task.
MUST UNDERSTAND WHAT LIFE WAS LIKE BEFORE DBMSs
TO REALLY APPRECIATE THEM !!!!
Think of life before cell phones, iPods, wireless internet....
-
8
Traditional File Processing (TFP) …early 1980s!
Program 1 Payroll
Program 2 SIS
Program 3 Finances
.... and so on - many programs. Interface very basic – as was output!
Code Queries
Metadata & file structures
Code Queries
Metadata & file structures
Code Queries
Metadata & file structures
.
Data Data Data
Data file might be the same in many programs!
Problem !!
Output Output Output
-
9
Database Management Systems now! Like Oracle
DBMS software
All application programs needed e.g. payroll, SIS, eGrade , eCV and so on. Written in, for example C++ and using embedded SQL commands. Might also be written in PL/SQL.
Metadata, file structures + one copy of the data
All interactive users using, for example, SIS. And people running applications like payroll.
Output
-
10
Databases Vs. Traditional File Processing (TFP)
Databases evolved from simple file systems .....from about the late 1980s
• TFP = use traditional programming languages, usually Cobol, to maintain & process files.
TFP does not allow just one description of the data or one single file
structure. The descriptions of the data and the file structures were coded
into each program! What a waste of time
• Coding of schema and file structure repeated for every program written! This is the opposite of having meta data where we have just one coding of the data and file structures. Furthermore, if a better file structure became available, then all programs needed to be changed or remain out of date.
• Meta data allows a DBMS to be general purpose. System is independent of data to be stored. That is, you can buy a product like Oracle “off the shelf”, insert your data, meta data and write relevant applications like SIS, payroll and so on. Not so in TFP!!
-
11
Databases Vs. Traditional File Processing (TFP) In TFP there is no “query language”. Query code was written as part of the
program and was not usually efficient. Usually like “enter 1 to do X, enter 2 to do
y”, and so on.
• Oracle optimizes your queries but if you are programming the query, it may be terribly inefficient. For example, it might contain very large joins!
There is no concept of concurrent access in TFP – that is, the use of the software
by many people at once. For example, registration then and now.
Every program had its own copy of the data and file structures. This causes major
problems: – incorrect data can be present in different copies of the data when they should all be
the same. Therefore we get inconsistent data – bad!
– Also, the more copies of the data you have, the harder it is to keep the data secure.
– New & better file structure meant recoding in all programs.
– Two students would not be able to access SIS at the same time !
-
12
Databases Vs. Traditional File Processing (TFP)
There is no data independence. For example in TFP you must rewrite your
program if to want to take advantage of any new software e.g. a different &
better file structures . Not true in Oracle which has data independence.
DBMS allows data abstraction – user does not need to be aware of data
structures and file structures. For example, you using SIS!
• Details hidden by data model, for example, the relational data model.
• Not true in TFP as the user writes all the code for describing, storing and manipulating data.
In TFP there are no views – each program is a view. DBMSs allow views
so that users see only the data they are interested in. Views can also be used
for security purposes. Views give a lot of flexibility:
• Different formats – for example, currency can be in US$ or CDN$.
• Different languages e.g. Website may be in German or English. And so on ....
-
13
Advantages of using a DBMS (not the 5 basic things a DBMS allows us to do!)
Controls redundancy – only one copy of the data and one file structure & thus
• Eliminates duplication of effort in programming all data descriptions and file structures in each program.
• Eliminates waste of storage - space in RAM and on the hard disk.
• Eliminates inconsistency & security problems.
Allows data sharing but gives problem of concurrent access. For example, two
people trying to edit the same piece of data at the same time – must be done
correctly.
• Solution – provide concurrency control software that ensures transactions are performed correctly. Difficult problem as it involves locking data at certain levels and, in some cases, serializing transactions (parts of programs). Can’t serialize everything. If done in parallel, must make sure that reads and writes are done in the correct order – otherwise you may get an F not an A+ !!
• Not of concern in 60-315. Done in 60-415.
-
14
Advantages of using a DBMS (Contd..) Provides Security and Authorization – e.g. passwords, card keys, biometric
access and so on.
Provides multiple interfaces – natural language, graphical, menus, forms etc.
• SIS interface is good example of a menu/form type interface where you fill in the “blanks”. Most used type of interface for interactive programs. SIS is a combo of menu & forms. Point, click and fill in .....
Enforces security (and other types of) constraints – only “correct” data is
allowed and stored.
• Example: Can’t withdraw $1000 if only $5 in account and no overdraft ability. Or, student can’t register online for 6 courses on SIS - except in some cases.
-
15
Advantages of using a DBMS (Contd..) Provides backup and recovery software.
• Backup is easy –but must be done !
• Recovery can be difficult e.g. disk failure is catastrophic so keep duplicate copy of DB elsewhere. A lot was learned after 9/11 about copies of DBs.
• ATM failure is another example of where recovery is needed. You ask to take out money but power failure prevents you from getting your money.
– Done by logging each part of transaction to allow for rollback. In the ATM case, the debit would be logged to non-volatile hardware but not the delivery of your money. Therefore, can “roll back” your account so that the debit didn’t take place.
-
16
Data Model
Data Model: a set of concepts used to describe the data and a set of
operations used to manipulate the data.
• High level e.g. Object Oriented – no details of physical storage just objects & the relationships between objects. Along with semantic DBMSs, mostly theoretical at this point. It was found that they didn’t have the functionality required by today’s database applications.
• Low level e.g. an inverted list – lots of physical details. Not usually found any more in DBMSs but are being used again in search engines!
• Implementation e.g. relational, network and hierarchical – latter two hardly used today so we concentrate on the relational model, which is the most used in industry.
-
17
3-Level Architecture
3-Level Architecture: Proposed as a “ gold standard” for design and
implementation of DBMS – but often ignored. Proposed in late ‘70s – by an
ANSI-SPARK committee - so terminology rather old but still the ideal for
implementation as it gives you all the “good” properties of a DBMS – good
design that is error free and easy to use.
-
18
3-Level Architecture
-
19
3-Level Architecture: Notes
1. Users (A1, A2, B1, B2 etc) can be interactive users or applications programs.
2. DSL = data sub-language e.g. SQL. A DSL is a language embedded in a
language like Java to do data retrieval for programs.
• Java provides the features that are lacking in SQL and SQL provides the data retrieval and optimization of queries which are missing from a language like Java.
• PL/SQL would be best!
3. The external views here are different to views in SQL. More on this later.
4. The conceptual view or schema describes all the data in the system – each and
every record with all its fields and the data type of each field.
-
20
3-Level Architecture: Notes (Contd…)
5. An external view is a subset of the conceptual view. The union of all external
views = conceptual view. External views serve two purposes:
• Users see only the data they are interested in e.g. SIS.
• Views can be used for security – e.g. to stop faculty/staff seeing confidential information about others e.g. salary.
6. An example of an external to conceptual mapping is a field might be viewed as
US$ at the external view but seen as CDN$ at the conceptual level.
7. An example of a conceptual mapping to an internal mapping could be that data
is seen as relational tables at the conceptual level but is seen as plain records at
the internal level. (More later & exercise.)
-
21
3-Level Architecture: Notes (Contd…)
8. The internal level describes each record in a very physical way – in terms of the
data types and the number of bytes used for each field in each record.
• However, there are no actual references to addresses on the hard disk(s) used. This was never part of the architecture.
9. Architecture says nothing about storage devices – just as well as there have
been so many advances since it was developed!
10. The Data Base Administrator (DBA) can interact with all levels. Users and
programs can only interact with the external level.
• Ensures that the conceptual, internal levels and mappings are all correct.
• Chaos would result if any user could interact with all levels and mappings.
-
22
Aims of 3-Level Architecture
Data Independence: The opposite is data dependence - knowledge of data and file
structures is built into the logic and code of programs – as in TFP. Undesirable
since: Different applications would want to see the data in different ways and the
DBA needs the flexibility to change file structures.
Definition: Data Independence is the immunity of applications to changes at both
the conceptual level and internal level.
• Logical Data Independence: changes to conceptual level do not affect external level. Example: if we add a field to one view, all programs using the other views will be not have to be changed/recoded.
• Physical Data Independence: changes to internal level do not affect external level. Example: If we switch from B+-trees to hash files at the internal level, the only changes are done to the Conceptual/Internal mapping but not to the external level. So no program needs to be rewritten. (Reality now but not back in 70’s & early 80’s. Nothing was said then about file structures.)
-
23
Aims of 3-Level Architecture
Multiple user views – because people want to see data differently and to
provide for security needs.
Use of schemas to make DB “off the shelf”. That was not the terminology
used then but it is the DBMS software which truly makes it “off the shelf”.
For example,
• We buy Oracle, enter our description of the date (and constraints) and enter data.
• Then we code all the programs needed.
• Oracle can be used for many different types of applications.
-
24
Example of a DB with just one record and what that might look at each level (very simplified!)
External –View 1 External – View 2 employ record emp# varchar (20) salary varchar (30)
emp# record emp-number varchar (20) dept–no varchar (20)
Conceptual emp record employee number varchar (20) employee department varchar (20) employee salary varchar (30)
Internal employee record length = 70 bytes employee# length = 20 bytes, offset = 0, Key dept# length = 20 bytes, offset = 20 pay length = 30 bytes, offset = 40
Internal Record
Conceptual employee record
Two external employee records • Note: the mappings (different field names at each level) and that each record has a unique name.
• Must allow enough space in the internal schema for the data at the conceptual level & for the external views.
• Must have a “key” in the internal schema to uniquely identify each record and allow it to be retrieved.
-
25
Another View of the Architecture Oracle is more like this…
Data Metadata
+ file structures
Query Processor
Transaction Manager
Storage Manager
Schema Modification
Queries by both users and
applications programs
Modifications
Constraints Security concurrency control
Storage component →
←
-
26
Storage component: contains meta data and data.
• Meta data includes names of relations, attributes, data types, info on indexes.
• Most common file structure is some sort of B-tree or hashing
Storage Manager: retrieves data from storage; modifies data as requested;
stores data and deletes data as requested.
• It is the interface between the stored data and the users/applications that use it.
Another View of the Architecture Description of the components
-
27
Query Processor: Processes and optimizes all queries, that is –
• Accepts all requests for data retrieval and modification.
• Translates query (e.g. in SQL) into a sequence of requests for stored data.
• Optimizes queries – to minimize costly disk accesses.
Transaction Manager: Contains all constraints, security measures, and
concurrency control methods.
• Responsible for concurrency control and integrity maintenance during transactions (sequential group of queries – e.g. transfer funds from bank account to credit card).
• *It one off to one side as messages to users won’t have to go back through it – but that’s all! All queries, including read only ones will go through it as read locks have to acquired etc,.
* Different to notes on website.
Another View of the Architecture Description of the components (Contd…)
Basic Concepts
Course: 03-60-315
Dr. Joan Morrissey
School of Computer Science
University of Windsor
Windsor, Canada
‹#›
1
Some Basic Concepts
DATA BASE: collection of related data - facts that can be recorded and have an explicit meaning.
Examples:
Names, phone numbers, email addresses of people you know. (Small)
A library, a bank or a university. (Large)
A database can be centralized e.g. SIS, faculty payroll at the U of W or distributed e.g. RBC which has branches all over the world.
In a distributed system, each local location keeps local data but can participate in a global query such as “How much in mortgage loans do we have for all locations?” We will not be looking at distributed systems.
‹#›
Basic Concepts……..
Generally, we define a database to be a set of related data with the following characteristics:
Data is related not random – it is logically coherent. There are no irrelevant details in the database - so shouldn’t put weather and student data in the one database.
Designed, built and populated for a specific group of users and applications – e.g. students and SIS. Here the students are the users and SIS is the application (or program) used by students.
Represents some aspect of the real world often called a miniworld or Universe of Discourse (UoD). For example, a banking database – nothing stored only banking related info.
Can be of any size and complexity – personal contact list (small and not complex) or Revenue Canada ( very large and very complex as tax rules change frequently and data is stored for many years).
‹#›
DBMS – what it allows us to do…
DBMS: collection of software for designing and maintaining DB. Allows the following:
Data to be described – what records, what fields, what data types, files, etc. Data is described using a Data Definition Language (DDL).
We have to describe the data format to the system so that we can retrieve it later. The description of the data is called the schema or metadata.
Data to be physically stored – with efficient file structures. Need to store large amounts of data for a long period of time – securely.
All file structures allow for the efficient non-volatile storage of our data.
Note that any industrial DB will never fit into RAM and it would not give permanent storage anyway. Data is buffered to and from RAM for processing.
Solid state drives ? Degrade quickly with a large number of R & W’s.
‹#›
More about storage mechanisms (file structures) later !
4
DBMS – what it allows us to do…
Queries, both interactive (e.g. use of SIS, form based) and application programs (e.g. payroll). Also known as Data Manipulation Language (DML).
Use query language (with SQL) e.g. forms for interactive queries such as looking at your grades on SIS.
Use Java/C++ with embedded SQL for application programs such as payroll.
Could also use PL/SQL. More efficient but more complicated.
Note that in SQL there is a difference between the DDL and the DML but it’s not that important. One is used to define your DB, the other is used to retrieve data from the stored DB.
Updates – including deletes (delete a student from a course via VW), addition (add students) and modifications (e.g. change of all grades after a final exam) to the data.
‹#›
DBMS – what it allows us to do…
Concurrent access by many users – without data corruption.
We want several users to be able to use the database at once but it must be done correctly. One user can’t be allowed to interfere with another and have incorrect data as a result.
Done by locking the database at different levels with different types of locks. Also need transaction control….reads and writes must be done in the correct order.
To recap: a DBMS allows us to do 5 different things:
1. Describe our data.
2. Physically store our data.
3. Query our data.
4. Add, delete and modify data.
5. Provides safe, correct concurrent access by multiple users to the same data.
‹#›
Basic Concepts…..
Database System:
DB + description of the data (description is also called meta-data)
DBMS – the software which controls the DB.
End users – e.g. professor checking student record. Usually done interactively.
Applications software/programs – e.g. payroll, SIS, eCV, eGRADE, updating of DARS and so on. Applications are just programs that perform some task.
MUST UNDERSTAND WHAT LIFE WAS LIKE BEFORE DBMSs TO REALLY APPRECIATE THEM !!!!
Think of life before cell phones, iPods, wireless internet....
‹#›
Traditional File Processing (TFP) …early 1980s!
Program 1PayrollProgram 2SISProgram 3Finances....and so on - many programs. Interface very basic – as was output!
CodeQueriesMetadata & file structures CodeQueriesMetadata & file structures CodeQueriesMetadata & file structures .
Data
Data
Data
Data file might be the same in many programs! Problem !!
Output
Output
Output
‹#›
Describe what registration was like for me.
8
Database Management Systems now! Like Oracle
DBMS software All application programs needed e.g. payroll, SIS, eGrade , eCV and so on. Written in, for example C++ and using embedded SQL commands. Might also be written in PL/SQL.
Metadata, file structures + one copy of the data
All interactive users using, for example, SIS. And people running applications like payroll.
Output
‹#›
GUI Interface & Nice output
9
Databases Vs. Traditional File Processing (TFP)
Databases evolved from simple file systems .....from about the late 1980s
TFP = use traditional programming languages, usually Cobol, to maintain & process files.
TFP does not allow just one description of the data or one single file structure. The descriptions of the data and the file structures were coded into each program! What a waste of time
Coding of schema and file structure repeated for every program written! This is the opposite of having meta data where we have just one coding of the data and file structures. Furthermore, if a better file structure became available, then all programs needed to be changed or remain out of date.
Meta data allows a DBMS to be general purpose. System is independent of data to be stored. That is, you can buy a product like Oracle “off the shelf”, insert your data, meta data and write relevant applications like SIS, payroll and so on. Not so in TFP!!
‹#›
Databases Vs. Traditional File Processing (TFP)
In TFP there is no “query language”. Query code was written as part of the program and was not usually efficient. Usually like “enter 1 to do X, enter 2 to do y”, and so on.
Oracle optimizes your queries but if you are programming the query, it may be terribly inefficient. For example, it might contain very large joins!
There is no concept of concurrent access in TFP – that is, the use of the software by many people at once. For example, registration then and now.
Every program had its own copy of the data and file structures. This causes major problems:
incorrect data can be present in different copies of the data when they should all be the same. Therefore we get inconsistent data – bad!
Also, the more copies of the data you have, the harder it is to keep the data secure.
New & better file structure meant recoding in all programs.
Two students would not be able to access SIS at the same time !
‹#›
Databases Vs. Traditional File Processing (TFP)
There is no data independence. For example in TFP you must rewrite your program if to want to take advantage of any new software e.g. a different & better file structures . Not true in Oracle which has data independence.
DBMS allows data abstraction – user does not need to be aware of data structures and file structures. For example, you using SIS!
Details hidden by data model, for example, the relational data model.
Not true in TFP as the user writes all the code for describing, storing and manipulating data.
In TFP there are no views – each program is a view. DBMSs allow views so that users see only the data they are interested in. Views can also be used for security purposes. Views give a lot of flexibility:
Different formats – for example, currency can be in US$ or CDN$.
Different languages e.g. Website may be in German or English. And so on ....
‹#›
Advantages of using a DBMS (not the 5 basic things a DBMS allows us to do!)
Controls redundancy – only one copy of the data and one file structure & thus
Eliminates duplication of effort in programming all data descriptions and file structures in each program.
Eliminates waste of storage - space in RAM and on the hard disk.
Eliminates inconsistency & security problems.
Allows data sharing but gives problem of concurrent access. For example, two people trying to edit the same piece of data at the same time – must be done correctly.
Solution – provide concurrency control software that ensures transactions are performed correctly. Difficult problem as it involves locking data at certain levels and, in some cases, serializing transactions (parts of programs). Can’t serialize everything. If done in parallel, must make sure that reads and writes are done in the correct order – otherwise you may get an F not an A+ !!
Not of concern in 60-315. Done in 60-415.
‹#›
Advantages of using a DBMS (Contd..)
Provides Security and Authorization – e.g. passwords, card keys, biometric access and so on.
Provides multiple interfaces – natural language, graphical, menus, forms etc.
SIS interface is good example of a menu/form type interface where you fill in the “blanks”. Most used type of interface for interactive programs. SIS is a combo of menu & forms. Point, click and fill in .....
Enforces security (and other types of) constraints – only “correct” data is allowed and stored.
Example: Can’t withdraw $1000 if only $5 in account and no overdraft ability. Or, student can’t register online for 6 courses on SIS - except in some cases.
‹#›
Advantages of using a DBMS (Contd..)
Provides backup and recovery software.
Backup is easy –but must be done !
Recovery can be difficult e.g. disk failure is catastrophic so keep duplicate copy of DB elsewhere. A lot was learned after 9/11 about copies of DBs.
ATM failure is another example of where recovery is needed. You ask to take out money but power failure prevents you from getting your money.
Done by logging each part of transaction to allow for rollback. In the ATM case, the debit would be logged to non-volatile hardware but not the delivery of your money. Therefore, can “roll back” your account so that the debit didn’t take place.
‹#›
Data Model
Data Model: a set of concepts used to describe the data and a set of operations used to manipulate the data.
High level e.g. Object Oriented – no details of physical storage just objects & the relationships between objects. Along with semantic DBMSs, mostly theoretical at this point. It was found that they didn’t have the functionality required by today’s database applications.
Low level e.g. an inverted list – lots of physical details. Not usually found any more in DBMSs but are being used again in search engines!
Implementation e.g. relational, network and hierarchical – latter two hardly used today so we concentrate on the relational model, which is the most used in industry.
‹#›
3-Level Architecture
3-Level Architecture: Proposed as a “ gold standard” for design and implementation of DBMS – but often ignored. Proposed in late ‘70s – by an ANSI-SPARK committee - so terminology rather old but still the ideal for implementation as it gives you all the “good” properties of a DBMS – good design that is error free and easy to use.
‹#›
3-Level Architecture
‹#›
3-Level Architecture: Notes
Users (A1, A2, B1, B2 etc) can be interactive users or applications programs.
DSL = data sub-language e.g. SQL. A DSL is a language embedded in a language like Java to do data retrieval for programs.
Java provides the features that are lacking in SQL and SQL provides the data retrieval and optimization of queries which are missing from a language like Java.
PL/SQL would be best!
The external views here are different to views in SQL. More on this later.
The conceptual view or schema describes all the data in the system – each and every record with all its fields and the data type of each field.
‹#›
3-Level Architecture: Notes (Contd…)
An external view is a subset of the conceptual view. The union of all external views = conceptual view. External views serve two purposes:
Users see only the data they are interested in e.g. SIS.
Views can be used for security – e.g. to stop faculty/staff seeing confidential information about others e.g. salary.
An example of an external to conceptual mapping is a field might be viewed as US$ at the external view but seen as CDN$ at the conceptual level.
An example of a conceptual mapping to an internal mapping could be that data is seen as relational tables at the conceptual level but is seen as plain records at the internal level. (More later & exercise.)
‹#›
3-Level Architecture: Notes (Contd…)
The internal level describes each record in a very physical way – in terms of the data types and the number of bytes used for each field in each record.
However, there are no actual references to addresses on the hard disk(s) used. This was never part of the architecture.
Architecture says nothing about storage devices – just as well as there have been so many advances since it was developed!
The Data Base Administrator (DBA) can interact with all levels. Users and programs can only interact with the external level.
Ensures that the conceptual, internal levels and mappings are all correct.
Chaos would result if any user could interact with all levels and mappings.
‹#›
Aims of 3-Level Architecture
Data Independence: The opposite is data dependence - knowledge of data and file structures is built into the logic and code of programs – as in TFP. Undesirable since: Different applications would want to see the data in different ways and the DBA needs the flexibility to change file structures.
Definition: Data Independence is the immunity of applications to changes at both the conceptual level and internal level.
Logical Data Independence: changes to conceptual level do not affect external level. Example: if we add a field to one view, all programs using the other views will be not have to be changed/recoded.
Physical Data Independence: changes to internal level do not affect external level. Example: If we switch from B+-trees to hash files at the internal level, the only changes are done to the Conceptual/Internal mapping but not to the external level. So no program needs to be rewritten. (Reality now but not back in 70’s & early 80’s. Nothing was said then about file structures.)
‹#›
Aims of 3-Level Architecture
Multiple user views – because people want to see data differently and to provide for security needs.
Use of schemas to make DB “off the shelf”. That was not the terminology used then but it is the DBMS software which truly makes it “off the shelf”.
For example,
We buy Oracle, enter our description of the date (and constraints) and enter data.
Then we code all the programs needed.
Oracle can be used for many different types of applications.
‹#›
Example of a DB with just one record and what that might look at each level (very simplified!)
External –View 1External – View 2
employ record emp# varchar (20) salary varchar (30) emp# record emp-number varchar (20) dept–no varchar (20)
Conceptual
emp record
employee number varchar (20)
employee department varchar (20)
employee salary varchar (30)
Internal
employee record length = 70 bytes employee# length = 20 bytes, offset = 0, Key dept# length = 20 bytes, offset = 20 pay length = 30 bytes, offset = 40
Internal Record
Conceptual employee record
Two external employee records
Note: the mappings (different field names at each level) and that each record has a unique name.
Must allow enough space in the internal schema for the data at the conceptual level & for the external views.
Must have a “key” in the internal schema to uniquely identify each record and allow it to be retrieved.
‹#›
Another View of the ArchitectureOracle is more like this…
Data
Metadata
+
file structures
Query
Processor
Transaction
Manager
Storage
Manager
Schema
Modification
Queries by both
users and
applications
programs
Modifications
Constraints
Security
concurrency
control
Storage component
→
←
‹#›
Storage component: contains meta data and data.
Meta data includes names of relations, attributes, data types, info on indexes.
Most common file structure is some sort of B-tree or hashing
Storage Manager: retrieves data from storage; modifies data as requested; stores data and deletes data as requested.
It is the interface between the stored data and the users/applications that use it.
Another View of the ArchitectureDescription of the components
‹#›
Query Processor: Processes and optimizes all queries, that is –
Accepts all requests for data retrieval and modification.
Translates query (e.g. in SQL) into a sequence of requests for stored data.
Optimizes queries – to minimize costly disk accesses.
Transaction Manager: Contains all constraints, security measures, and concurrency control methods.
Responsible for concurrency control and integrity maintenance during transactions (sequential group of queries – e.g. transfer funds from bank account to credit card).
*It one off to one side as messages to users won’t have to go back through it – but that’s all! All queries, including read only ones will go through it as read locks have to acquired etc,.
* Different to notes on website.
Another View of the ArchitectureDescription of the components (Contd…)
‹#›