2005-10-06 - slide 1is 202 – fall 2005 prof. ray larson uc berkeley sims sims 202: information...
Post on 21-Dec-2015
218 views
TRANSCRIPT
2005-10-06 - SLIDE 1IS 202 – FALL 2005
Prof. Ray Larson
UC Berkeley SIMS
SIMS 202:
Information Organization
and Retrieval
Introduction to Database Design
2005-10-06 - SLIDE 2IS 202 – FALL 2005
Lecture Overview
• Databases and Database Design
• Database Life Cycle
• Database Design
• ER Diagrams
• Discussion
2005-10-06 - SLIDE 3IS 202 – FALL 2005
Lecture Overview
• Databases and Database Design
• Database Life Cycle
• Database Design
• ER Diagrams
• Discussion
2005-10-06 - SLIDE 4IS 202 – FALL 2005
What is a Database?
2005-10-06 - SLIDE 5IS 202 – FALL 2005
Files and Databases
• File: A collection of records or documents dealing with one organization, person, area or subject (Rowley)– Manual (paper) files– Computer files
• Database: A collection of similar records with relationships between the records (Rowley)– Bibliographic, statistical, business data,
images, etc.
2005-10-06 - SLIDE 6IS 202 – FALL 2005
Database
• A Database is a collection of stored operational data used by the application systems of some particular enterprise (C.J. Date)– Paper “Databases”
• Still contain a large portion of the world’s knowledge
– File-Based Data Processing Systems• Early batch processing of (primarily) business data
– Database Management Systems (DBMS)
2005-10-06 - SLIDE 7IS 202 – FALL 2005
Why DBMS?
• History– 50’s and 60’s all applications were custom
built for particular needs– File based– Many similar/duplicative applications dealing
with collections of business data– Early DBMS were extensions of programming
languages– 1970 - E.F. Codd and the Relational Model– 1979 - Ashton-Tate and first Microcomputer
DBMS
2005-10-06 - SLIDE 8IS 202 – FALL 2005
From File Systems to DBMS
• Problems with file processing systems– Inconsistent data– Inflexibility– Limited data sharing– Poor enforcement of standards– Excessive program maintenance
2005-10-06 - SLIDE 9IS 202 – FALL 2005
DBMS Benefits
• Minimal data redundancy• Consistency of data• Integration of data• Sharing of data• Ease of application development• Uniform security, privacy, and integrity controls• Data accessibility and responsiveness• Data independence• Reduced program maintenance
2005-10-06 - SLIDE 10IS 202 – FALL 2005
Terms and Concepts
• Data independence– Physical representation and location of data
and the use of that data are separated• The application doesn’t need to know how or
where the database has stored the data, but just how to ask for it
• Moving a database from one DBMS to another should not have a material effect on application program
• Recoding, adding fields, etc. in the database should not affect applications
2005-10-06 - SLIDE 11IS 202 – FALL 2005
Database Environment
CASE Tools
DBMS
UserInterface
ApplicationPrograms
Repository Database
2005-10-06 - SLIDE 12IS 202 – FALL 2005
Database Components
DBMS===============
Design toolsTable CreationForm CreationQuery CreationReport Creation
Procedural language
compiler (4GL)=============
Run timeForm processorQuery processor
Report WriterLanguage Run time
UserInterface
Applications
ApplicationProgramsDatabase
Database contains:User’s DataMetadataIndexesApplication Metadata
2005-10-06 - SLIDE 13IS 202 – FALL 2005
Types of Database Systems
• PC databases
• Centralized database
• Client/server databases
• Distributed databases
• Database models
2005-10-06 - SLIDE 14IS 202 – FALL 2005
PC Databases
E.g.:AccessFoxProDbaseEtc.
2005-10-06 - SLIDE 15IS 202 – FALL 2005
Centralized Databases
Central Computer
2005-10-06 - SLIDE 16IS 202 – FALL 2005
Client Server Databases
NetworkClient
Client
Client
DatabaseServer
2005-10-06 - SLIDE 17IS 202 – FALL 2005
Distributed Databases
computercomputer
computer
Location A
Location CLocation B
HomogeneousDatabases
2005-10-06 - SLIDE 18IS 202 – FALL 2005
Distributed Databases
Local Network
DatabaseServer
Client
Client
CommServer
Remote Comp.
Remote Comp.
HeterogeneousOr FederatedDatabases
2005-10-06 - SLIDE 19IS 202 – FALL 2005
Terms and Concepts
• A “database application” is an application program (or set of related programs) that is used to perform a series of database activities:– Create
• Add new data to the database
– Read• Read current data from the database
– Update• Update or modify current database data
– Delete• Remove current On behalf of database users
2005-10-06 - SLIDE 20IS 202 – FALL 2005
Terms and Concepts
• Enterprise– Organization
• Entity– Person, Place, Thing, Event, Concept...
• Attributes– Data elements (facts) about some entity– Also sometimes called fields or items or domains
• Data values– Instances of a particular attribute for a particular
entity
2005-10-06 - SLIDE 21IS 202 – FALL 2005
Terms and Concepts
• Key– An attribute or set of attributes used to identify
or locate records in a file
• Primary Key– An attribute or set of attributes that uniquely
identifies each record in a file
2005-10-06 - SLIDE 22IS 202 – FALL 2005
Terms and Concepts
• Models– (1) Levels or views of the Database
• Conceptual, logical, physical
– (2) DBMS types• Relational, Hierarchic, Network, Object-Oriented,
Object-Relational
2005-10-06 - SLIDE 23IS 202 – FALL 2005
Models (1)
ConceptualModel
LogicalModel
External Model
Conceptual requirements
Conceptual requirements
Conceptual requirements
Conceptual requirements
Application 1
Application 1
Application 2 Application 3 Application 4
Application 2
Application 3
Application 4
External Model
External Model
External Model
Internal Model
More later on this…
2005-10-06 - SLIDE 24IS 202 – FALL 2005
Data Models(2): History
• Hierarchical Model (1960’s and 1970’s)– Similar to data structures in programming
languages
Books(id, title)
Publisher SubjectsAuthors
(first, last)
2005-10-06 - SLIDE 25IS 202 – FALL 2005
Data Models(2): History
• Network Model (1970’s)– Provides for single entries of data and
navigational “links” through chains of data.
Subjects Books
Authors
Publishers
2005-10-06 - SLIDE 26IS 202 – FALL 2005
Data Models(2): History
• Relational Model (1980’s)– Provides a conceptually simple model for data
as relations (typically considered “tables”) with all data visible
Book ID Title pubid Author id1 Introductio 2 12 The history 4 23 New stuff ab 3 34 Another title 2 45 And yet more 1 5
pubid pubname1 Harper2 Addison3 Oxford4 Que
Authorid Author name1 Smith2 Wynar3 Jones4 Duncan5 Applegate
Subid Subject1 cataloging2 history3 stuff
Book ID Subid1 22 13 34 24 3
2005-10-06 - SLIDE 27IS 202 – FALL 2005
Data Models(2): History
• Object Oriented Data Model (1990’s)– Encapsulates data and operations as
“Objects”
Books(id, title)
Publisher SubjectsAuthors
(first, last)
2005-10-06 - SLIDE 28IS 202 – FALL 2005
Data Models(2): History
• Object-Relational Model (1990’s)– Combines the well-known properties of the
Relational Model with such OO features as:• User-defined datatypes• User-defined functions• Inheritance and sub-classing
• All of the major enterprise DBMS systems are now Object-Relational or incorporate Object-Relational features
2005-10-06 - SLIDE 29IS 202 – FALL 2005
Lecture Overview
• Databases and Database Design
• Database Life Cycle
• Database Design
• ER Diagrams
• Discussion
2005-10-06 - SLIDE 30IS 202 – FALL 2005
Database System Life Cycle
Growth,Change, &
Maintenance6
Operations5
Integration4
Design1
Conversion3
PhysicalCreation
2
2005-10-06 - SLIDE 31IS 202 – FALL 2005
Design (more later)
• Determination of the needs of the organization– That is: “what are the business rules of this
organization”, what information does it use and need?
• Development of the Conceptual Model of the database– In Database design this is typically done with Entity-
Relationship diagramming techniques
• Construction of a Data Dictionary• Development of the Logical Model
2005-10-06 - SLIDE 32IS 202 – FALL 2005
Physical Creation
• Development of the Physical Model of the Database– Data formats and types– Determination of indexes, etc.
• Load a prototype database and test
• Determine and implement security, privacy and access controls
• Determine and implement integrity constraints
2005-10-06 - SLIDE 33IS 202 – FALL 2005
Conversion
• Convert existing data sets and applications to use the new database– May need programs, conversion utilities to
convert old data to new formats
2005-10-06 - SLIDE 34IS 202 – FALL 2005
Integration
• Overlaps with Phase 3
• Integration of converted applications and new applications into the new database
2005-10-06 - SLIDE 35IS 202 – FALL 2005
Operations
• All applications run full-scale
• Privacy, security, access control must be in place
• Recovery and Backup procedures must be established and used
2005-10-06 - SLIDE 36IS 202 – FALL 2005
Growth, Change, and Maintenance
• Change is a way of life– Applications, data requirements, reports, etc.
will all change as new needs and requirements are found
– The Database and applications and will need to be modified to meet the needs of changes
2005-10-06 - SLIDE 37IS 202 – FALL 2005
Another View of the Life Cycle
Operations5
Conversion3
PhysicalCreation
2Growth, Change
6
Integration4
Design1
2005-10-06 - SLIDE 38IS 202 – FALL 2005
Lecture Overview
• Databases and Database Design
• Database Life Cycle
• Database Design
• ER Diagrams
• Discussion
2005-10-06 - SLIDE 39IS 202 – FALL 2005
Database design process
ConceptualModel
LogicalModel
External Model
Conceptual requirements
Conceptual requirements
Conceptual requirements
Conceptual requirements
Application 1
Application 1
Application 2 Application 3 Application 4
Application 2
Application 3
Application 4
External Model
External Model
External Model
Internal Model
2005-10-06 - SLIDE 40IS 202 – FALL 2005
Conceptual Design
• Conceptual Model– Merge the collective needs of all applications– Determine what Entities are being used
• Some object about which information is to maintained
– What are the Attributes of those entities?• Properties or characteristics of the entity• What attributes uniquely identify the entity
– What are the Relationships between entities• How the entities interact with each other?
2005-10-06 - SLIDE 41IS 202 – FALL 2005
Developing a Conceptual Model
• Overall view of the database that integrates all the needed information discovered during the requirements analysis
• Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database systems or implementation details
• Can also be represented using other modeling tools (such as UML)
2005-10-06 - SLIDE 42IS 202 – FALL 2005
Database Design Process
ConceptualModel
LogicalModel
External Model
Conceptual requirements
Conceptual requirements
Conceptual requirements
Conceptual requirements
Application 1
Application 1
Application 2 Application 3 Application 4
Application 2
Application 3
Application 4
External Model
External Model
External Model
Internal Model
2005-10-06 - SLIDE 43IS 202 – FALL 2005
Logical Design
• Logical Model– How is each entity and relationship
represented in the Data Model of the DBMS• Hierarchic?• Network?• Relational?• Object-Oriented?
2005-10-06 - SLIDE 44IS 202 – FALL 2005
Database Design Process
ConceptualModel
LogicalModel
External Model
Conceptual requirements
Conceptual requirements
Conceptual requirements
Conceptual requirements
Application 1
Application 1
Application 2 Application 3 Application 4
Application 2
Application 3
Application 4
External Model
External Model
External Model
Internal Model
2005-10-06 - SLIDE 45IS 202 – FALL 2005
Physical Design
• Internal Model– Choices of index file structure– Choices of data storage formats– Choices of disk layout
2005-10-06 - SLIDE 46IS 202 – FALL 2005
Database Design Process
ConceptualModel
LogicalModel
External Model
Conceptual requirements
Conceptual requirements
Conceptual requirements
Conceptual requirements
Application 1
Application 1
Application 2 Application 3 Application 4
Application 2
Application 3
Application 4
External Model
External Model
External Model
Internal Model
2005-10-06 - SLIDE 47IS 202 – FALL 2005
Database Application Design
• External Model– User views of the integrated database – Making the old (or updated) applications work
with the new database design
2005-10-06 - SLIDE 48IS 202 – FALL 2005
Lecture Overview
• Databases and Database Design
• Database Life Cycle
• Database Design
• ER Diagrams
• Discussion
2005-10-06 - SLIDE 49IS 202 – FALL 2005
ER Diagrams
• You have already seen a UML form of one…
2005-10-06 - SLIDE 50IS 202 – FALL 2005
ER Diagrams
• Entity-Relationship (ER) diagrams are concerned with the entities involved in the organization/database and the relationships between those entities (as well as the attributes of each entity
2005-10-06 - SLIDE 51IS 202 – FALL 2005
Entity
• An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information– Persons (e.g.: customers in a business,
employees, authors)– Things (e.g.: purchase orders, meetings,
parts, companies)
Employee
2005-10-06 - SLIDE 52IS 202 – FALL 2005
Attributes
• Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it (this is the Metadata for the entities)
Employee
Last
Middle
First
Name SSN
Age
Birthdate
Projects
2005-10-06 - SLIDE 53IS 202 – FALL 2005
Relationships
• Relationships are the associations between entities
• They can involve one or more entities and belong to particular relationship types
2005-10-06 - SLIDE 54IS 202 – FALL 2005
Relationships
ClassAttendsStudent
PartSuppliesproject parts
Supplier
Project
2005-10-06 - SLIDE 55IS 202 – FALL 2005
Types of Relationships
• Concerned only with cardinality of relationship
TruckAssignedEmployee
ProjectAssignedEmployee
ProjectAssignedEmployee
1 1
n
n
1
m
Chen ER notation
2005-10-06 - SLIDE 56IS 202 – FALL 2005
Other Notations
TruckAssignedEmployee
ProjectAssignedEmployee
ProjectAssignedEmployee
“Crow’s Foot”
2005-10-06 - SLIDE 57IS 202 – FALL 2005
Other Notations
TruckAssignedEmployee
ProjectAssignedEmployee
ProjectAssignedEmployee
IDEFIX Notation
2005-10-06 - SLIDE 58IS 202 – FALL 2005
More Complex Relationships
ProjectEvaluationEmployee
Manager
1/n/n
1/1/1
n/n/1
ProjectAssignedEmployee 4(2-10) 1
SSN ProjectDate
ManagesEmployee
Manages
Is Managed By
1
n
2005-10-06 - SLIDE 59IS 202 – FALL 2005
Weak Entities
• Owe existence entirely to another entity
Order-lineContainsOrder
Invoice #
Part#
Rep#
QuantityInvoice#
2005-10-06 - SLIDE 60IS 202 – FALL 2005
Supertype and Subtype Entities
ClerkIs one ofSales-rep
Invoice
Other
Employee
Sold
Manages
2005-10-06 - SLIDE 61IS 202 – FALL 2005
Many to Many Relationships
Employee
ProjectIsAssigned
ProjectAssignment
Assigned
SSN
Proj#
SSN
Proj#Hours
2005-10-06 - SLIDE 62IS 202 – FALL 2005
UML diagraming
• ER diagrams are often now expressed using UML instead…
2005-10-06 - SLIDE 63IS 202 – FALL 2005
UML Class Diagram
DIVEORDS
Order NoCustomer NoSale DateShipviaPaymentMethodCCNumberNo of PeopleDepart DateReturn DateDestinationVacation Cost
CalcTotalInvoice()CalcEquipment()
Class Name
List of Attributes
List of operations
2005-10-06 - SLIDE 64IS 202 – FALL 2005
Differences from Entities in ER
• Entities can be represented by Class diagrams
• But Classes of objects also have additional operations associated with them
2005-10-06 - SLIDE 65IS 202 – FALL 2005
Operations
• Three basic types for database– Constructor– Query– Update
2005-10-06 - SLIDE 66IS 202 – FALL 2005
Associations
• An association is a relationship that describes a set of links between or among objects.
• An association can have a name that describes the nature of this relationship. You can put a triangle next to this name to indicate the direction in which the name should be read.
2005-10-06 - SLIDE 67IS 202 – FALL 2005
Associations
• An association contains an ordered list of association ends. – An association with exactly two association
ends is called a binary association– An association with more than two ends is
called an n-ary association.
2005-10-06 - SLIDE 68IS 202 – FALL 2005
Associations: Unary relationships
PersonIs-married-to
0..1
0..1
Employeemanages
*
0..1 manager
2005-10-06 - SLIDE 69IS 202 – FALL 2005
Associations: Binary Relationship
EmployeeParkingPlace
One-to-one
Is-assigned0..1 0..1
ProductLine
Product
One-to-many
contains1 *
Student Course
Many-to-many
Registers-for* *
2005-10-06 - SLIDE 70IS 202 – FALL 2005
Associations: Ternary Relationships
Vendor Warehouse* *Supplies
Part
*
2005-10-06 - SLIDE 71IS 202 – FALL 2005
Association Classes
Student CourseRegisters-for
* *
Registration________________TermGrade________________CheckEligibility()
Computer Account_________________
acctIDPassword
ServerSpace* 0..1issues
2005-10-06 - SLIDE 72IS 202 – FALL 2005
Derived Attributes, Associations, and Roles
Student_________
namessn
dateOfBirth/age
Course Offering
____________term
sectiontime
location
Registers-for
* 1
Course ____________
crseCodecrseTitlecreditHrs
* *
Scheduled-for
{age = currentDate – dateOfBirth}
* *
/Takes
/participant
Derivedattribute
Derived role
Derived association
2005-10-06 - SLIDE 73IS 202 – FALL 2005
GeneralizationEmployee
____________empName
empNumberaddress
dateHired____________
printLabel()
Hourly Employee_______________
HourlyRate_______________computeWages()
Salaried Employee_______________
Annual Salstockoption
_______________Contributepension()
Consultant_______________contractNumber
billingRate_______________
computeFees()
2005-10-06 - SLIDE 74IS 202 – FALL 2005
Lecture Overview
• Databases and Database Design
• Database Life Cycle
• ER Diagrams
• Discussion
2005-10-06 - SLIDE 75IS 202 – FALL 2005
Discussion
• Why use DBMS for web-based system development?
• Why Not use IR systems?
• Can you use both?
• Other Questions?
2005-10-06 - SLIDE 76IS 202 – FALL 2005
Next Time
• Normalization and the Relational Model