basic concepts course: 03-60-315cs315.joan.myweb.cs.uwindsor.ca/documents/courseware/60-315...

27
Basic Concepts Course: 03-60-315 Dr. Joan Morrissey School of Computer Science University of Windsor Windsor, Canada

Upload: others

Post on 19-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

  • Basic Concepts

    Course: 03-60-315

    Dr. Joan Morrissey School of Computer Science

    University of Windsor Windsor, Canada

  • 2

    Some Basic Concepts

    DATA BASE: collection of related data - facts that can be recorded and

    have an explicit meaning.

    Examples:

    • Names, phone numbers, email addresses of people you know. (Small)

    • A library, a bank or a university. (Large)

    A database can be centralized e.g. SIS, faculty payroll at the U of W or

    distributed e.g. RBC which has branches all over the world.

    In a distributed system, each local location keeps local data but can

    participate in a global query such as “How much in mortgage loans do we

    have for all locations?” We will not be looking at distributed systems.

  • 3

    Basic Concepts……..

    Generally, we define a database to be a set of related data with the

    following characteristics:

    • Data is related not random – it is logically coherent. There are no irrelevant details in the database - so shouldn’t put weather and student data in the one database.

    • Designed, built and populated for a specific group of users and applications – e.g. students and SIS. Here the students are the users and SIS is the application (or program) used by students.

    • Represents some aspect of the real world often called a miniworld or Universe of Discourse (UoD). For example, a banking database – nothing stored only banking related info.

    • Can be of any size and complexity – personal contact list (small and not complex) or Revenue Canada ( very large and very complex as tax rules change frequently and data is stored for many years).

  • 4

    DBMS – what it allows us to do…

    DBMS: collection of software for designing and maintaining DB. Allows

    the following:

    • Data to be described – what records, what fields, what data types, files, etc. Data is described using a Data Definition Language (DDL).

    – We have to describe the data format to the system so that we can retrieve it later. The description of the data is called the schema or metadata.

    • Data to be physically stored – with efficient file structures. Need to store large amounts of data for a long period of time – securely.

    – All file structures allow for the efficient non-volatile storage of our data.

    – Note that any industrial DB will never fit into RAM and it would not give permanent storage anyway. Data is buffered to and from RAM for processing.

    – Solid state drives ? Degrade quickly with a large number of R & W’s.

  • 5

    DBMS – what it allows us to do…

    • Queries, both interactive (e.g. use of SIS, form based) and application programs (e.g. payroll). Also known as Data Manipulation Language (DML).

    – Use query language (with SQL) e.g. forms for interactive queries such as looking at your grades on SIS.

    – Use Java/C++ with embedded SQL for application programs such as payroll.

    – Could also use PL/SQL. More efficient but more complicated.

    – Note that in SQL there is a difference between the DDL and the DML but it’s not that important. One is used to define your DB, the other is used to retrieve data from the stored DB.

    • Updates – including deletes (delete a student from a course via VW), addition (add students) and modifications (e.g. change of all grades after a final exam) to the data.

  • 6

    DBMS – what it allows us to do…

    • Concurrent access by many users – without data corruption.

    – We want several users to be able to use the database at once but it must be done correctly. One user can’t be allowed to interfere with another and have incorrect data as a result.

    – Done by locking the database at different levels with different types of locks. Also need transaction control….reads and writes must be done in the correct order.

    To recap: a DBMS allows us to do 5 different things:

    1. Describe our data.

    2. Physically store our data.

    3. Query our data.

    4. Add, delete and modify data.

    5. Provides safe, correct concurrent access by multiple users to the same data.

  • 7

    Basic Concepts…..

    Database System:

    • DB + description of the data (description is also called meta-data)

    • DBMS – the software which controls the DB.

    • End users – e.g. professor checking student record. Usually done interactively.

    • Applications software/programs – e.g. payroll, SIS, eCV, eGRADE, updating of DARS and so on. Applications are just programs that perform some task.

    MUST UNDERSTAND WHAT LIFE WAS LIKE BEFORE DBMSs

    TO REALLY APPRECIATE THEM !!!!

    Think of life before cell phones, iPods, wireless internet....

  • 8

    Traditional File Processing (TFP) …early 1980s!

    Program 1 Payroll

    Program 2 SIS

    Program 3 Finances

    .... and so on - many programs. Interface very basic – as was output!

    Code Queries

    Metadata & file structures

    Code Queries

    Metadata & file structures

    Code Queries

    Metadata & file structures

    .

    Data Data Data

    Data file might be the same in many programs!

    Problem !!

    Output Output Output

  • 9

    Database Management Systems now! Like Oracle

    DBMS software

    All application programs needed e.g. payroll, SIS, eGrade , eCV and so on. Written in, for example C++ and using embedded SQL commands. Might also be written in PL/SQL.

    Metadata, file structures + one copy of the data

    All interactive users using, for example, SIS. And people running applications like payroll.

    Output

  • 10

    Databases Vs. Traditional File Processing (TFP)

    Databases evolved from simple file systems .....from about the late 1980s

    • TFP = use traditional programming languages, usually Cobol, to maintain & process files.

    TFP does not allow just one description of the data or one single file

    structure. The descriptions of the data and the file structures were coded

    into each program! What a waste of time

    • Coding of schema and file structure repeated for every program written! This is the opposite of having meta data where we have just one coding of the data and file structures. Furthermore, if a better file structure became available, then all programs needed to be changed or remain out of date.

    • Meta data allows a DBMS to be general purpose. System is independent of data to be stored. That is, you can buy a product like Oracle “off the shelf”, insert your data, meta data and write relevant applications like SIS, payroll and so on. Not so in TFP!!

  • 11

    Databases Vs. Traditional File Processing (TFP) In TFP there is no “query language”. Query code was written as part of the

    program and was not usually efficient. Usually like “enter 1 to do X, enter 2 to do

    y”, and so on.

    • Oracle optimizes your queries but if you are programming the query, it may be terribly inefficient. For example, it might contain very large joins!

    There is no concept of concurrent access in TFP – that is, the use of the software

    by many people at once. For example, registration then and now.

    Every program had its own copy of the data and file structures. This causes major

    problems: – incorrect data can be present in different copies of the data when they should all be

    the same. Therefore we get inconsistent data – bad!

    – Also, the more copies of the data you have, the harder it is to keep the data secure.

    – New & better file structure meant recoding in all programs.

    – Two students would not be able to access SIS at the same time !

  • 12

    Databases Vs. Traditional File Processing (TFP)

    There is no data independence. For example in TFP you must rewrite your

    program if to want to take advantage of any new software e.g. a different &

    better file structures . Not true in Oracle which has data independence.

    DBMS allows data abstraction – user does not need to be aware of data

    structures and file structures. For example, you using SIS!

    • Details hidden by data model, for example, the relational data model.

    • Not true in TFP as the user writes all the code for describing, storing and manipulating data.

    In TFP there are no views – each program is a view. DBMSs allow views

    so that users see only the data they are interested in. Views can also be used

    for security purposes. Views give a lot of flexibility:

    • Different formats – for example, currency can be in US$ or CDN$.

    • Different languages e.g. Website may be in German or English. And so on ....

  • 13

    Advantages of using a DBMS (not the 5 basic things a DBMS allows us to do!)

    Controls redundancy – only one copy of the data and one file structure & thus

    • Eliminates duplication of effort in programming all data descriptions and file structures in each program.

    • Eliminates waste of storage - space in RAM and on the hard disk.

    • Eliminates inconsistency & security problems.

    Allows data sharing but gives problem of concurrent access. For example, two

    people trying to edit the same piece of data at the same time – must be done

    correctly.

    • Solution – provide concurrency control software that ensures transactions are performed correctly. Difficult problem as it involves locking data at certain levels and, in some cases, serializing transactions (parts of programs). Can’t serialize everything. If done in parallel, must make sure that reads and writes are done in the correct order – otherwise you may get an F not an A+ !!

    • Not of concern in 60-315. Done in 60-415.

  • 14

    Advantages of using a DBMS (Contd..) Provides Security and Authorization – e.g. passwords, card keys, biometric

    access and so on.

    Provides multiple interfaces – natural language, graphical, menus, forms etc.

    • SIS interface is good example of a menu/form type interface where you fill in the “blanks”. Most used type of interface for interactive programs. SIS is a combo of menu & forms. Point, click and fill in .....

    Enforces security (and other types of) constraints – only “correct” data is

    allowed and stored.

    • Example: Can’t withdraw $1000 if only $5 in account and no overdraft ability. Or, student can’t register online for 6 courses on SIS - except in some cases.

  • 15

    Advantages of using a DBMS (Contd..) Provides backup and recovery software.

    • Backup is easy –but must be done !

    • Recovery can be difficult e.g. disk failure is catastrophic so keep duplicate copy of DB elsewhere. A lot was learned after 9/11 about copies of DBs.

    • ATM failure is another example of where recovery is needed. You ask to take out money but power failure prevents you from getting your money.

    – Done by logging each part of transaction to allow for rollback. In the ATM case, the debit would be logged to non-volatile hardware but not the delivery of your money. Therefore, can “roll back” your account so that the debit didn’t take place.

  • 16

    Data Model

    Data Model: a set of concepts used to describe the data and a set of

    operations used to manipulate the data.

    • High level e.g. Object Oriented – no details of physical storage just objects & the relationships between objects. Along with semantic DBMSs, mostly theoretical at this point. It was found that they didn’t have the functionality required by today’s database applications.

    • Low level e.g. an inverted list – lots of physical details. Not usually found any more in DBMSs but are being used again in search engines!

    • Implementation e.g. relational, network and hierarchical – latter two hardly used today so we concentrate on the relational model, which is the most used in industry.

  • 17

    3-Level Architecture

    3-Level Architecture: Proposed as a “ gold standard” for design and

    implementation of DBMS – but often ignored. Proposed in late ‘70s – by an

    ANSI-SPARK committee - so terminology rather old but still the ideal for

    implementation as it gives you all the “good” properties of a DBMS – good

    design that is error free and easy to use.

  • 18

    3-Level Architecture

  • 19

    3-Level Architecture: Notes

    1. Users (A1, A2, B1, B2 etc) can be interactive users or applications programs.

    2. DSL = data sub-language e.g. SQL. A DSL is a language embedded in a

    language like Java to do data retrieval for programs.

    • Java provides the features that are lacking in SQL and SQL provides the data retrieval and optimization of queries which are missing from a language like Java.

    • PL/SQL would be best!

    3. The external views here are different to views in SQL. More on this later.

    4. The conceptual view or schema describes all the data in the system – each and

    every record with all its fields and the data type of each field.

  • 20

    3-Level Architecture: Notes (Contd…)

    5. An external view is a subset of the conceptual view. The union of all external

    views = conceptual view. External views serve two purposes:

    • Users see only the data they are interested in e.g. SIS.

    • Views can be used for security – e.g. to stop faculty/staff seeing confidential information about others e.g. salary.

    6. An example of an external to conceptual mapping is a field might be viewed as

    US$ at the external view but seen as CDN$ at the conceptual level.

    7. An example of a conceptual mapping to an internal mapping could be that data

    is seen as relational tables at the conceptual level but is seen as plain records at

    the internal level. (More later & exercise.)

  • 21

    3-Level Architecture: Notes (Contd…)

    8. The internal level describes each record in a very physical way – in terms of the

    data types and the number of bytes used for each field in each record.

    • However, there are no actual references to addresses on the hard disk(s) used. This was never part of the architecture.

    9. Architecture says nothing about storage devices – just as well as there have

    been so many advances since it was developed!

    10. The Data Base Administrator (DBA) can interact with all levels. Users and

    programs can only interact with the external level.

    • Ensures that the conceptual, internal levels and mappings are all correct.

    • Chaos would result if any user could interact with all levels and mappings.

  • 22

    Aims of 3-Level Architecture

    Data Independence: The opposite is data dependence - knowledge of data and file

    structures is built into the logic and code of programs – as in TFP. Undesirable

    since: Different applications would want to see the data in different ways and the

    DBA needs the flexibility to change file structures.

    Definition: Data Independence is the immunity of applications to changes at both

    the conceptual level and internal level.

    • Logical Data Independence: changes to conceptual level do not affect external level. Example: if we add a field to one view, all programs using the other views will be not have to be changed/recoded.

    • Physical Data Independence: changes to internal level do not affect external level. Example: If we switch from B+-trees to hash files at the internal level, the only changes are done to the Conceptual/Internal mapping but not to the external level. So no program needs to be rewritten. (Reality now but not back in 70’s & early 80’s. Nothing was said then about file structures.)

  • 23

    Aims of 3-Level Architecture

    Multiple user views – because people want to see data differently and to

    provide for security needs.

    Use of schemas to make DB “off the shelf”. That was not the terminology

    used then but it is the DBMS software which truly makes it “off the shelf”.

    For example,

    • We buy Oracle, enter our description of the date (and constraints) and enter data.

    • Then we code all the programs needed.

    • Oracle can be used for many different types of applications.

  • 24

    Example of a DB with just one record and what that might look at each level (very simplified!)

    External –View 1 External – View 2 employ record emp# varchar (20) salary varchar (30)

    emp# record emp-number varchar (20) dept–no varchar (20)

    Conceptual emp record employee number varchar (20) employee department varchar (20) employee salary varchar (30)

    Internal employee record length = 70 bytes employee# length = 20 bytes, offset = 0, Key dept# length = 20 bytes, offset = 20 pay length = 30 bytes, offset = 40

    Internal Record

    Conceptual employee record

    Two external employee records • Note: the mappings (different field names at each level) and that each record has a unique name.

    • Must allow enough space in the internal schema for the data at the conceptual level & for the external views.

    • Must have a “key” in the internal schema to uniquely identify each record and allow it to be retrieved.

  • 25

    Another View of the Architecture Oracle is more like this…

    Data Metadata

    + file structures

    Query Processor

    Transaction Manager

    Storage Manager

    Schema Modification

    Queries by both users and

    applications programs

    Modifications

    Constraints Security concurrency control

    Storage component →

  • 26

    Storage component: contains meta data and data.

    • Meta data includes names of relations, attributes, data types, info on indexes.

    • Most common file structure is some sort of B-tree or hashing

    Storage Manager: retrieves data from storage; modifies data as requested;

    stores data and deletes data as requested.

    • It is the interface between the stored data and the users/applications that use it.

    Another View of the Architecture Description of the components

  • 27

    Query Processor: Processes and optimizes all queries, that is –

    • Accepts all requests for data retrieval and modification.

    • Translates query (e.g. in SQL) into a sequence of requests for stored data.

    • Optimizes queries – to minimize costly disk accesses.

    Transaction Manager: Contains all constraints, security measures, and

    concurrency control methods.

    • Responsible for concurrency control and integrity maintenance during transactions (sequential group of queries – e.g. transfer funds from bank account to credit card).

    • *It one off to one side as messages to users won’t have to go back through it – but that’s all! All queries, including read only ones will go through it as read locks have to acquired etc,.

    * Different to notes on website.

    Another View of the Architecture Description of the components (Contd…)

    Basic Concepts

    Course: 03-60-315

    Dr. Joan Morrissey

    School of Computer Science

    University of Windsor

    Windsor, Canada

    ‹#›

    1

    Some Basic Concepts

    DATA BASE: collection of related data - facts that can be recorded and have an explicit meaning.

    Examples:

    Names, phone numbers, email addresses of people you know. (Small)

    A library, a bank or a university. (Large)

    A database can be centralized e.g. SIS, faculty payroll at the U of W or distributed e.g. RBC which has branches all over the world.

    In a distributed system, each local location keeps local data but can participate in a global query such as “How much in mortgage loans do we have for all locations?” We will not be looking at distributed systems.

    ‹#›

    Basic Concepts……..

    Generally, we define a database to be a set of related data with the following characteristics:

    Data is related not random – it is logically coherent. There are no irrelevant details in the database - so shouldn’t put weather and student data in the one database.

     Designed, built and populated for a specific group of users and applications – e.g. students and SIS. Here the students are the users and SIS is the application (or program) used by students.

    Represents some aspect of the real world often called a miniworld or Universe of Discourse (UoD). For example, a banking database – nothing stored only banking related info.

    Can be of any size and complexity – personal contact list (small and not complex) or Revenue Canada ( very large and very complex as tax rules change frequently and data is stored for many years).

    ‹#›

    DBMS – what it allows us to do…

    DBMS: collection of software for designing and maintaining DB. Allows the following:

    Data to be described – what records, what fields, what data types, files, etc. Data is described using a Data Definition Language (DDL).

    We have to describe the data format to the system so that we can retrieve it later. The description of the data is called the schema or metadata.

    Data to be physically stored – with efficient file structures. Need to store large amounts of data for a long period of time – securely.

    All file structures allow for the efficient non-volatile storage of our data.

    Note that any industrial DB will never fit into RAM and it would not give permanent storage anyway. Data is buffered to and from RAM for processing.

    Solid state drives ? Degrade quickly with a large number of R & W’s.

    ‹#›

    More about storage mechanisms (file structures) later !

    4

    DBMS – what it allows us to do…

    Queries, both interactive (e.g. use of SIS, form based) and application programs (e.g. payroll). Also known as Data Manipulation Language (DML).

    Use query language (with SQL) e.g. forms for interactive queries such as looking at your grades on SIS.

    Use Java/C++ with embedded SQL for application programs such as payroll.

    Could also use PL/SQL. More efficient but more complicated.

    Note that in SQL there is a difference between the DDL and the DML but it’s not that important. One is used to define your DB, the other is used to retrieve data from the stored DB.

     Updates – including deletes (delete a student from a course via VW), addition (add students) and modifications (e.g. change of all grades after a final exam) to the data.

    ‹#›

    DBMS – what it allows us to do…

    Concurrent access by many users – without data corruption.

    We want several users to be able to use the database at once but it must be done correctly. One user can’t be allowed to interfere with another and have incorrect data as a result.

    Done by locking the database at different levels with different types of locks. Also need transaction control….reads and writes must be done in the correct order.

    To recap: a DBMS allows us to do 5 different things:

    1. Describe our data.

    2. Physically store our data.

    3. Query our data.

    4. Add, delete and modify data.

    5. Provides safe, correct concurrent access by multiple users to the same data.

    ‹#›

    Basic Concepts…..

    Database System:

    DB + description of the data (description is also called meta-data)

     DBMS – the software which controls the DB.

     End users – e.g. professor checking student record. Usually done interactively.

     Applications software/programs – e.g. payroll, SIS, eCV, eGRADE, updating of DARS and so on. Applications are just programs that perform some task.

    MUST UNDERSTAND WHAT LIFE WAS LIKE BEFORE DBMSs TO REALLY APPRECIATE THEM !!!!

    Think of life before cell phones, iPods, wireless internet....

    ‹#›

    Traditional File Processing (TFP) …early 1980s!

    Program 1PayrollProgram 2SISProgram 3Finances....and so on - many programs. Interface very basic – as was output!

    CodeQueriesMetadata & file structures CodeQueriesMetadata & file structures CodeQueriesMetadata & file structures .

    Data

    Data

    Data

    Data file might be the same in many programs! Problem !!

    Output

    Output

    Output

    ‹#›

    Describe what registration was like for me.

    8

    Database Management Systems now! Like Oracle

    DBMS software All application programs needed e.g. payroll, SIS, eGrade , eCV and so on. Written in, for example C++ and using embedded SQL commands. Might also be written in PL/SQL.

    Metadata, file structures + one copy of the data

    All interactive users using, for example, SIS. And people running applications like payroll.

    Output

    ‹#›

    GUI Interface & Nice output

    9

    Databases Vs. Traditional File Processing (TFP)

    Databases evolved from simple file systems .....from about the late 1980s

    TFP = use traditional programming languages, usually Cobol, to maintain & process files.

    TFP does not allow just one description of the data or one single file structure. The descriptions of the data and the file structures were coded into each program! What a waste of time

    Coding of schema and file structure repeated for every program written! This is the opposite of having meta data where we have just one coding of the data and file structures. Furthermore, if a better file structure became available, then all programs needed to be changed or remain out of date.

    Meta data allows a DBMS to be general purpose. System is independent of data to be stored. That is, you can buy a product like Oracle “off the shelf”, insert your data, meta data and write relevant applications like SIS, payroll and so on. Not so in TFP!!

    ‹#›

    Databases Vs. Traditional File Processing (TFP)

    In TFP there is no “query language”. Query code was written as part of the program and was not usually efficient. Usually like “enter 1 to do X, enter 2 to do y”, and so on.

    Oracle optimizes your queries but if you are programming the query, it may be terribly inefficient. For example, it might contain very large joins!

     There is no concept of concurrent access in TFP – that is, the use of the software by many people at once. For example, registration then and now.

    Every program had its own copy of the data and file structures. This causes major problems:

    incorrect data can be present in different copies of the data when they should all be the same. Therefore we get inconsistent data – bad!

    Also, the more copies of the data you have, the harder it is to keep the data secure.

    New & better file structure meant recoding in all programs.

    Two students would not be able to access SIS at the same time !

    ‹#›

    Databases Vs. Traditional File Processing (TFP)

    There is no data independence. For example in TFP you must rewrite your program if to want to take advantage of any new software e.g. a different & better file structures . Not true in Oracle which has data independence.

    DBMS allows data abstraction – user does not need to be aware of data structures and file structures. For example, you using SIS!

    Details hidden by data model, for example, the relational data model.

    Not true in TFP as the user writes all the code for describing, storing and manipulating data.

    In TFP there are no views – each program is a view. DBMSs allow views so that users see only the data they are interested in. Views can also be used for security purposes. Views give a lot of flexibility:

    Different formats – for example, currency can be in US$ or CDN$.

    Different languages e.g. Website may be in German or English. And so on ....

    ‹#›

    Advantages of using a DBMS (not the 5 basic things a DBMS allows us to do!)

    Controls redundancy – only one copy of the data and one file structure & thus

    Eliminates duplication of effort in programming all data descriptions and file structures in each program.

    Eliminates waste of storage - space in RAM and on the hard disk.

    Eliminates inconsistency & security problems.

    Allows data sharing but gives problem of concurrent access. For example, two people trying to edit the same piece of data at the same time – must be done correctly.

    Solution – provide concurrency control software that ensures transactions are performed correctly. Difficult problem as it involves locking data at certain levels and, in some cases, serializing transactions (parts of programs). Can’t serialize everything. If done in parallel, must make sure that reads and writes are done in the correct order – otherwise you may get an F not an A+ !!

    Not of concern in 60-315. Done in 60-415.

    ‹#›

    Advantages of using a DBMS (Contd..)

    Provides Security and Authorization – e.g. passwords, card keys, biometric access and so on.

    Provides multiple interfaces – natural language, graphical, menus, forms etc.

    SIS interface is good example of a menu/form type interface where you fill in the “blanks”. Most used type of interface for interactive programs. SIS is a combo of menu & forms. Point, click and fill in .....

    Enforces security (and other types of) constraints – only “correct” data is allowed and stored.

    Example: Can’t withdraw $1000 if only $5 in account and no overdraft ability. Or, student can’t register online for 6 courses on SIS - except in some cases.

    ‹#›

    Advantages of using a DBMS (Contd..)

    Provides backup and recovery software.

    Backup is easy –but must be done !

    Recovery can be difficult e.g. disk failure is catastrophic so keep duplicate copy of DB elsewhere. A lot was learned after 9/11 about copies of DBs.

    ATM failure is another example of where recovery is needed. You ask to take out money but power failure prevents you from getting your money.

    Done by logging each part of transaction to allow for rollback. In the ATM case, the debit would be logged to non-volatile hardware but not the delivery of your money. Therefore, can “roll back” your account so that the debit didn’t take place.

    ‹#›

    Data Model

    Data Model: a set of concepts used to describe the data and a set of operations used to manipulate the data.

    High level e.g. Object Oriented – no details of physical storage just objects & the relationships between objects. Along with semantic DBMSs, mostly theoretical at this point. It was found that they didn’t have the functionality required by today’s database applications.

     Low level e.g. an inverted list – lots of physical details. Not usually found any more in DBMSs but are being used again in search engines!

     Implementation e.g. relational, network and hierarchical – latter two hardly used today so we concentrate on the relational model, which is the most used in industry.

    ‹#›

    3-Level Architecture

    3-Level Architecture: Proposed as a “ gold standard” for design and implementation of DBMS – but often ignored. Proposed in late ‘70s – by an ANSI-SPARK committee - so terminology rather old but still the ideal for implementation as it gives you all the “good” properties of a DBMS – good design that is error free and easy to use.

    ‹#›

    3-Level Architecture

    ‹#›

    3-Level Architecture: Notes

    Users (A1, A2, B1, B2 etc) can be interactive users or applications programs.

    DSL = data sub-language e.g. SQL. A DSL is a language embedded in a language like Java to do data retrieval for programs.

    Java provides the features that are lacking in SQL and SQL provides the data retrieval and optimization of queries which are missing from a language like Java.

    PL/SQL would be best!

    The external views here are different to views in SQL. More on this later.

    The conceptual view or schema describes all the data in the system – each and every record with all its fields and the data type of each field.

    ‹#›

    3-Level Architecture: Notes (Contd…)

    An external view is a subset of the conceptual view. The union of all external views = conceptual view. External views serve two purposes:

    Users see only the data they are interested in e.g. SIS.

    Views can be used for security – e.g. to stop faculty/staff seeing confidential information about others e.g. salary.

    An example of an external to conceptual mapping is a field might be viewed as US$ at the external view but seen as CDN$ at the conceptual level.

    An example of a conceptual mapping to an internal mapping could be that data is seen as relational tables at the conceptual level but is seen as plain records at the internal level. (More later & exercise.)

    ‹#›

    3-Level Architecture: Notes (Contd…)

    The internal level describes each record in a very physical way – in terms of the data types and the number of bytes used for each field in each record.

    However, there are no actual references to addresses on the hard disk(s) used. This was never part of the architecture.

    Architecture says nothing about storage devices – just as well as there have been so many advances since it was developed!

    The Data Base Administrator (DBA) can interact with all levels. Users and programs can only interact with the external level.

    Ensures that the conceptual, internal levels and mappings are all correct.

    Chaos would result if any user could interact with all levels and mappings.

    ‹#›

    Aims of 3-Level Architecture

    Data Independence: The opposite is data dependence - knowledge of data and file structures is built into the logic and code of programs – as in TFP. Undesirable since: Different applications would want to see the data in different ways and the DBA needs the flexibility to change file structures.

     Definition: Data Independence is the immunity of applications to changes at both the conceptual level and internal level.

    Logical Data Independence: changes to conceptual level do not affect external level. Example: if we add a field to one view, all programs using the other views will be not have to be changed/recoded.

    Physical Data Independence: changes to internal level do not affect external level. Example: If we switch from B+-trees to hash files at the internal level, the only changes are done to the Conceptual/Internal mapping but not to the external level. So no program needs to be rewritten. (Reality now but not back in 70’s & early 80’s. Nothing was said then about file structures.)

    ‹#›

    Aims of 3-Level Architecture

    Multiple user views – because people want to see data differently and to provide for security needs.

    Use of schemas to make DB “off the shelf”. That was not the terminology used then but it is the DBMS software which truly makes it “off the shelf”.

    For example,

    We buy Oracle, enter our description of the date (and constraints) and enter data.

    Then we code all the programs needed.

    Oracle can be used for many different types of applications.

    ‹#›

    Example of a DB with just one record and what that might look at each level (very simplified!)

    External –View 1External – View 2

    employ record emp# varchar (20) salary varchar (30) emp# record emp-number varchar (20) dept–no varchar (20)

    Conceptual

    emp record

    employee number varchar (20)

    employee department varchar (20)

    employee salary varchar (30)

    Internal

    employee record length = 70 bytes employee# length = 20 bytes, offset = 0, Key dept# length = 20 bytes, offset = 20 pay length = 30 bytes, offset = 40

    Internal Record

    Conceptual employee record

    Two external employee records

    Note: the mappings (different field names at each level) and that each record has a unique name.

    Must allow enough space in the internal schema for the data at the conceptual level & for the external views.

    Must have a “key” in the internal schema to uniquely identify each record and allow it to be retrieved.

    ‹#›

    Another View of the ArchitectureOracle is more like this…

    Data

    Metadata

    +

    file structures

    Query

    Processor

    Transaction

    Manager

    Storage

    Manager

    Schema

    Modification

    Queries by both

    users and

    applications

    programs

    Modifications

    Constraints

    Security

    concurrency

    control

    Storage component

    ‹#›

    Storage component: contains meta data and data.

    Meta data includes names of relations, attributes, data types, info on indexes.

    Most common file structure is some sort of B-tree or hashing

     Storage Manager: retrieves data from storage; modifies data as requested; stores data and deletes data as requested.

    It is the interface between the stored data and the users/applications that use it.

    Another View of the ArchitectureDescription of the components

    ‹#›

    Query Processor: Processes and optimizes all queries, that is –

    Accepts all requests for data retrieval and modification.

    Translates query (e.g. in SQL) into a sequence of requests for stored data.

    Optimizes queries – to minimize costly disk accesses.

    Transaction Manager: Contains all constraints, security measures, and concurrency control methods.

    Responsible for concurrency control and integrity maintenance during transactions (sequential group of queries – e.g. transfer funds from bank account to credit card).

    *It one off to one side as messages to users won’t have to go back through it – but that’s all! All queries, including read only ones will go through it as read locks have to acquired etc,.

    * Different to notes on website.

    Another View of the ArchitectureDescription of the components (Contd…)

    ‹#›