introduction to databases
DESCRIPTION
Introduction to Databases. Data Organisation Definition Data modelling SQL DBMS functions. Basics of data Organisation:. DATA HIERARCHY (four categories) Fields = represent a single data item Records = made up of a related set of fields describing one instance of an entity - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/1.jpg)
Introduction to Databases
Data Organisation
Definition
Data modelling
SQL
DBMS functions
![Page 2: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/2.jpg)
Basics of data Organisation:
DATA HIERARCHY (four categories)
• Fields = represent a single data item
• Records = made up of a related set of fields describing one instance of an entity
• File / Table = a set of related records - as many as instances (occurrence) in the set
• Database = a collection of related files
![Page 3: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/3.jpg)
Example of data structure
Name First name Telephone
Sampras Pete 45 25 65 65Healy Margaret 25 58 96 63Clinton Bill 12 25 28 89Henry Thierry 25 78 85 85
Fields
Records
File / Table + Other files =>complete dataStructure = DB
![Page 4: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/4.jpg)
"A collection of interrelated data stored together with controlled redundancy, to serve one or more applications in an optimal fashion; the data is stored so that it is independent of the application programs which use it; a common and controlled approach is used in adding new data and in modifying existing data within the database."
Database: Definition.
![Page 5: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/5.jpg)
• A collection of interrelated data stored together
• with controlled redundancy
• to serve one or more applications in an optimal fashion
• the data is stored so that it is independent of the application programs which use it
• a common and controlled approach is used in adding new data and in modifying existing data within the database.
Definition - closer look
![Page 6: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/6.jpg)
Advantages of Databases:
• data are independent from applications - stored centrally
• data repository accessible to any new program
• data are not duplicated in different locations
• programmers do not have to write extensive descriptions of the files
• These save enough money and time to offset the extra costs of setting and maintaining DBs
![Page 7: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/7.jpg)
Disadvantages of DBs:
• Data are more accessible so more easily abused
• Large DBs require expensive hardware and software
• specialised / scarce personnel is required to develop and maintain large DBs
• People / business units may object to “their” data being widely available in a DB
![Page 8: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/8.jpg)
Characteristics of DBs…
• High concurrency (high performance under load)
• Multi-user (read does not interfere with write)
• Data consistency – changes to data don’t affect running queries + no phantom data changes
• High degree of recoverability (pull the plug test)
![Page 9: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/9.jpg)
ACID test
• Atomicity
• Consistency
• Isolation
• Durability
All or nothing
Preserve consistency of database
Transactions are independent
Once committed data is preserved
![Page 10: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/10.jpg)
DataBase Management System (DBMS):
• program that makes it possible to:
– create– use
– maintain a database
• It provides an interface / translation mechanism between the logical organisation of the data stored in the DB and the physical organisation of the data
![Page 11: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/11.jpg)
Using a database:
Two main functions of the DBMS :
• Query language - for people who are not programmer (greatest advantage of DB)
• Data manipulation language - for programmers who want to modify the links between data elements within the DB
• Also, Host Language - the language used by programmers to develop the rest of the application - eg: Visual Basic for Applications (VBA) / Oracle developer 2000
![Page 12: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/12.jpg)
Different types of DBs:
• creating the DB = specifying the links between data items
• different types of relationships can be specified - ie different logical views
• they correspond to three main types of DBMSs:
– Hierarchical DBs– Network DBs– Relational DBs– Object Oriented DBs
![Page 13: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/13.jpg)
Hierarchical DBs:
• data item are related as “Parent” and “Child” in a tree-like structure
• “parent” means data item is higher in the tree than “child” and connected to it
• one “parent” can have more than one “child”, but one “child” can only have one “parent”
• most common platform = IBM’s Information Management System (IMS)
![Page 14: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/14.jpg)
Example
Customers
Orders
Items
Unit of packaging
Payments
Currency
Substitution Product
Very fast retrieval
![Page 15: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/15.jpg)
Undesirable side effects:
• Insertion of record:
– dependent record cannot be added without a parent– eg: units of packaging cannot be added without linkage
to an existing item
• Deletion of record:
– deletion of a parent deletes all children– deleting an existing item will delete its replacement
items
• Impossible to have two parents = trouble
![Page 16: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/16.jpg)
Network DBs:
• same as parent and children in Hierarchical DB, but children can have more than one parent
• It is also possible to link items upwards to other items parents
• practically, it means that the DBMS is more flexible for data retrieval
![Page 17: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/17.jpg)
Example
Customers
Orders
Items
Unit of packaging
Payments
Currency
Substitution Product
Suppliers
![Page 18: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/18.jpg)
Relational DBs:
• Data items stored in tables
• Specific fields in tables related to other field in other tables (joint)
• infinite number of possible viewpoints on the data (queries)
• Highly flexible DB but overly slow for complex searches
• Oracle, SyBase, Ingres, Access, Paradox for Windows...
![Page 19: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/19.jpg)
Describing relationships
• Attempt at modelling the business elements (entities) and their relationships (links)
• Can be based on users’ descriptions of the business processes
• Specifies dependencies between the data items
• Coded in an Entity-Relationship Diagram (ERD)
![Page 20: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/20.jpg)
Types of Relationships
• one-to-one: one instance of one data item corresponds to one instance of another
• one-to-many: one instance to many instances
• many-to-many: many instance correspond to many instances
• Also some relationships may be:
– compulsory– optional
![Page 21: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/21.jpg)
Example
• Student registering system
• What are the entities?
• What type of relationship do they have?
• Draw the diagram
![Page 22: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/22.jpg)
Entity Relationship Diagram
![Page 23: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/23.jpg)
Next step - creating the data structure
• Few rules - a lot of experience
• Can get quite complex (paramount for the speed of the DB)
• Tables must be normalised - ie redundancy is limited to the strict minimum by an algorithm
• In practice, normalisation is not always the best
![Page 24: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/24.jpg)
Data Structure Diagrams
• Describe the underlying structure of the DB: the complete logical structure
• Data items are stored in tables linked by pointers
– attribute pointers: data fields in one table that will link it to another (common information)
– logical pointers: specific links that exist between tables
• Tables have a key
• If an attribute seems to belong to a relationship rather than an attribute, it may mean an associative entity must be added
![Page 25: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/25.jpg)
ORDER
order numberItem descriptionItem PriceQuantity orderedCustomer numberItem number
Item
Item numberItem descriptionItem costQuantity on hand
Customer
Customer numberCustomer nameCustomer addressCustomer balanceCustomer special rate
1
2
3
4
* compulsory attributes0 optional attributes
![Page 26: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/26.jpg)
Definitions
• Entity
• Attributes
• Instance(s)
• Domain
• Key (candidate primary and foreign)
![Page 27: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/27.jpg)
Definitions
• Relationship
• Ordinality
• Cardinality
• Associative Entity
![Page 28: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/28.jpg)
Some test questions
• Is it a bird is it a plane?
• Is it an entity or an attribute?
![Page 29: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/29.jpg)
Normalisation
• Process of simplifying the relationships amongst data items as much as possible (see example provided - handout)
• Through an iterative process, structure of data is refined to 1NF, 2NF, 3NF etc.
• Reasons for normalisation:
– to simplify retrieval (speed of response)– to simplify maintenance (updates, deletion, insertions)– to reduce the need to restructure the data for each new
application
![Page 30: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/30.jpg)
First Normal Form
• design record structure so that each record looks the same (same length, no repeating groups)
• repetition within a record means one relation was missed = create new relation
• elements of repeating groups are stored as a separate entity, in a separate table
• normalised records have a fixed length and expanded primary key
![Page 31: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/31.jpg)
Second Normal Form
• Record must be in first normal form first
• each item in the record must be fully dependent on the key for identification
• Functional dependency means a data item’s value is uniquely associated with another’s
• only on-to-one relationship between elements in the same file
• otherwise split into more tables
![Page 32: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/32.jpg)
Third normal form
• to remove transitive dependencies
• when one item is dependent on an item which is dependent from the key in the file
• relationship is split to avoid data being lost inadvertently
• this will give greater flexibility for the design of the application + eliminate deletion problems
• in practice, 3 NF not used all the time - speed of retrieval can be affected
![Page 33: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/33.jpg)
Beyond data modeling
• Model must be normalised – purpose ?
• Outcome is a set of tables = logical design
• Then, design can be warped until it meets the realistic constraints of the system
• Eg: what business problem are we trying to solve? – see handout [riccardi p. 113, 127]
![Page 34: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/34.jpg)
Realistic constraints
• Users cannot cope with too many tables
• Too much development required in hiding complex data structure
• Too much administration
• Optimisation is impossible with too many tables
• Actually: RDBs can be quite slow!
![Page 35: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/35.jpg)
Key practical questions
• What are the most important tasks that the DB MUST accomplish efficiently?
• How must the DB be rigged physically to address these?
• What coding practices will keep the coding clean and simple?
• What additional demands arise from the need for resilience and security?
![Page 36: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/36.jpg)
Analysis - Three Levels of Schema
Internal Schema
Logical Schema
External Schema 2 External Schema …External Schema 1
Disk Array
Tables Tables
![Page 37: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/37.jpg)
4 way trade-off
Performance
Clarity of code
Ease of use
Security
![Page 38: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/38.jpg)
Key decisions
• Oracle offers many different ways to do things– Indexes– Backups…
• Good analysis is not only about knowing these => understanding whether they are appropriate
• Failure to think it through => unworkable model
• Particularly, predicting performance must be done properly– Ok on the technical side, tricky on the business side
![Page 39: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/39.jpg)
Design optimisation
• Sources of problems:– Network traffic– Excess CPU usage
• But physical I/O is greatest threat (different from physical I/O)
• Disks still the slowest in the loop
• Solution: minimise or re-schedule access
• Also try to minimise the impact of Q4 (e.g. mirroring, internal consistency checks…)
![Page 40: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/40.jpg)
Creating links between the tables
• use common fields to join tables / queries
• very easy when data is properly normalised
• Gives total flexibility in terms of data retrieval
• Main strength of RDBs (SQL)
![Page 41: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/41.jpg)
Structured Query Language
• used for defining and manipulating data in Relational DBs
• aimed at:
– reducing training costs– increasing productivity– improve application portability– increase application longevity– reduce dependency on single vendors– enable cross systems communication
• In practice, SQLs can be a bit different
![Page 42: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/42.jpg)
Querying RDBs with SQL• use a form of pseudo english to retrieve data in
a view (which looks like a table)
• syntax is based on a number of “clauses”
• Select: specifies what data elements will be included in the view
• From: lists the tables involved
• Where: specifies conditions to filter the data
– specific values sought– links between tables
![Page 43: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/43.jpg)
Example with one table
• find the name and address of customer number 1217
![Page 44: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/44.jpg)
Example with a range
• find the items which are priced between £50 and £15000
![Page 45: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/45.jpg)
Example with two tables
• find the rep name of all customers
![Page 46: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/46.jpg)
Example with two tables
• same for customer Robson only
![Page 47: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/47.jpg)
Use of a Search Condition - nested queries
• find the name and address of the customer who ordered order # 110
![Page 48: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/48.jpg)
Additional syntax
• Add computation in the “select” statement:
– select SUM(price)– select AVG(price), MAX, MIN, COUNT
• Simplify comparisons with a BETWEEN clause and LIKE clause (with *, ?)
• Add sorting instruction after the where clause
– ORDER BY name (alphabetical)– ORDER BY price (ascending)
• Provide aggregate information by grouping data:
– GROUP BY customer
![Page 49: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/49.jpg)
• find contents (item# and description) of order 110:
![Page 50: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/50.jpg)
• find the average price of the cars for sale
• find the average price of all orders taken so far by customer “Jones”
![Page 51: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/51.jpg)
• find how much cash customer “Barry” has generated in total
![Page 52: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/52.jpg)
find the average price of all orders taken so far
![Page 53: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/53.jpg)
![Page 54: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/54.jpg)
CUSTOMER TABLE
PRODUCT TABLE
ITEM TABLE
SALES_ORDER TABLE
PRICE TABLE
Oracle Demo Set -Sales Order Processing
![Page 55: Introduction to Databases](https://reader036.vdocuments.us/reader036/viewer/2022062518/568142e5550346895daf42c4/html5/thumbnails/55.jpg)
DEPARTMENT TABLE
EMPLOYEE TABLE
JOB TABLE
LOCATION TABLE
Oracle Demo Set -Employee Data