data modeling workshopexpert-data.com/uploads/datamodellingprimer.pdfquote from ims documentation, a...
TRANSCRIPT
Data Modeling
Workshop
Expert Data Solutions, Inc.
Prepared by:
Mark Hyatt
Scope of this Workshop
• The Relational Model
• Beginning Data Modeling
In the Beginning, there was Chaos
DATA
Quote from IMS documentation, a well knowpre-relational system:
“Logically deleting a logical child preventsfurther access to the logical child using its logicalparent. Unidirectional logical child segments areassumed to be logically deleted. A logical parentis considered logically deleted when all its logicalchildren are physically deleted. For physicallypaired logical relationships, the physical childpaired to the logical child must also be physicallydeleted before the logical parent is consideredlogically deleted.”
Pre-relational Problems
What is a Relational Database?
• Implementation of data structures and controls basedupon Relational Theory
• Relational Algebra first described in 1970 by Dr. E. F.Codd
• Dr. Codd placed database research on a solidscientific footing
• Set theory based mathematical rules for a data model
• All Relational Database Management Systemscurrently use the Structured Query Language (SQL)for data access and control
• Introductory reading:
A Database Primer by C.J. Date
Entity-Relationship Model
• Used when mapping a real-world system to aRelational Database Management System
• Categorizes all elements of a system as eitheran entity or a relationship
• Three basic steps:– Identify the entity types of your system
– Identify attributes of each entity
– Identify the relationships between entities
• Normalize the model
• Create tables to implement the E-R model
Normalization
• A more complete discussion of Normalizationcomes later in the presentation, but basically, theprocess ensures that each entity has a completeprimary key and that all attributes contribute to adescription of that unique key.
• Normally, each entity is represented by a table,however, this is not always the case whenimplementing entity sub-types.
Drawing Notation
There are several standard notations in use for Entity-Relationship Modeling. The three most commonly used are IE(Information Engineering), IDEF1X (Integration Definition forInformation Modeling) and Barker Notation (developed byRichard Barker and several others at the British Consultingfirm CACI and then adopted by Oracle for its CASE modelingtools). A brief summary of the differences between thesenotations follows, but a detailed discussion of the variousstrengths and weaknesses of each style is beyond the scope ofthis presentation. The examples in this presentation use theOracle CASE*Method, or Barker notation, as it is arguably thesimplest to read and so discuss diagrams with non-technicalbusiness users.
Notation
Notation IDEF1X IE Barker
Relationship Multiplicity
• Zero or one
• One only
• Zero or more
• One or more
Line StyleSolid line signifies Identifying
relationshipAll lines solid and carry no
significanceTwo part line - solid signifies mandatory and dashed optional
Identifying Relationships Property not visible in drawing
Entity Subtypes
Or Constraint N/A N/A
Exclusive Or (XOR) Constraint
N/A
P
1
Z
Child ChildParent Parent
SubtypeSupertype
Subtype
SubtypeSupertype
Subtype
Subtype
Subtype
Supertype
This
Pronoun
That
Us
Pronoun
Them
Us
Pronoun
Them
Entity Type
• Commonly referred to simply as an Entity
• A thing of significance to the system
• Logical equivalent of nouns (person, place, orthing)
• First step of Logical Data Modeling is toidentify Entity Types
Department
Example
Attributes
• Fully describes Entity Types
• Logical equivalent of adjectives
• Each Entity must contain at least oneAttribute and one or more Attributescombined should uniquely identify the EntityType
Department
# Department Number
Department Name
Department Head
Example
Relationships
• Logical equivalent of verbs
• Describes an association between two things(Entities)
• Three basic associations will be discussed:
• one-to-many
• one-to-one
• many-to-many
Department
Department
Skill
Employee
Employee
Dept Head
Relationships
• Each end of a relationship can also be optional ormandatory – “may (be)” or “must (be)”
• Optional relationships are shown as dashed andmandatory as solid lines in Barker Notation.
• They can be read in both directions with key wordssubstituted for optionality and multiplicity, alongwith entity names and the relationship roles.
Department Employeestaffed by
assigned toExample:
Each Department may be staffed by one or more Employees
Each Employee must be assigned to one and only one Department
Physical Data Model
• We will start with a better understanding of thephysical model since that is what will be familiar tomost users.
• Also, the ultimate goal of all Data Modeling is thedevelopment of a well functioning and wellunderstood physical database.
Database Tables
• The structure that physically holds data in a relationaldatabase
• Physical implementation of logical “entity types” andlogical “relationships”
• Must have at least one column (attribute)
• Contains zero or more rows
• Each row in the table has one value in each column(may be NULL)
• Rows are not named, which means the attribute(column) values uniquely define the row, and not bysome pre-assigned label.
Table Examples
PUPPIES
PUPPY_NO
NAME
KENNEL_NO
KENNELS
KENNEL_NO
NAME
LOCATION
PUPPIES
PUPPY_NO NAME KENNEL_NO
52 Fido 953 Fifi 954 Spot 5
KENNELS
KENNEL_NO NAME LOCATION
5 ABC Kennels Any City8 Super Dogs Metropolis9 Bat Kennels Gotham
Guidelines for Creating Tables
• Use descriptive names for tables, columns, indexes, and clusters
• Use only well-known or organization standard abbreviations
• Be consistent in the use of singular and plural forms of table names and columns. Plural for table names, singular for column names.
• Create a primary key for every table
• Avoid duplication of data across and within tables (normalize)
• Document the meaning of each table and column with the COMMENT statement
• Create a unique index for the primary key
Columns
• Called a field in non-relational systems
• Represents one and only one attribute of the entity or a foreign key defining a relationship
• Name should indicate its meaning
• Should not contain any hidden meaning, and therefore should not represent multiple attributes (overloaded fields may be used in programs, but not in the database)
• Identify columns by name, not by position
• Should not consider the physical order within the table as significant
PUPPIES
52 Fido 953 Fifi 954 Spot 5
Column PUPPIES.NAME
Column Example
PUPPY_NO NAME KENNEL_NO
Rows
• Stores the data of a table
• Represents one occurrence (or instance) of the entity or relationship represented by the table
• Should never be duplicated within a table
• Should not consider the physical (natural) row order within the table as significant
• Number of rows likely to be quite dynamic
• Roughly equivalent to a record in non-relational databases
Scientific Basis
In mathematical terms, a table is verysimilar to an unordered set, without abeginning or end. You achieve logicaldata independence by not relying on afixed internal ordering of a table’s rows.
PUPPIES
52 Fido 953 Fifi 954 Spot 5
Row Example
PUPPY_NO NAME KENNEL_NO
Value
• The data referenced by the intersection of a given rowand column
• Assumes the data type of its column
• May be NULL (logically “unknown” or “missing”)
• Note that NULL is not the same as 0 or blank
• Any expression using NULL will evaluate to NULL,e.g. 10 * NULL is NULL. The only comparisonoperators you should use with null values are ISNULL and IS NOT NULL. A null value cannot be eitherequal to or not equal to any other value but is ratherunknown
• Oracle treats conditions evaluating to unknown asFALSE and returns NO error message.
PUPPIES
52 Fido 953 Fifi 954 Spot 5
Value Example
PUPPY_NO NAME KENNEL_NO
Views
• A logical representation of a table orcombination of tables using a stored query
• Data is derived from the tables on which it isbased (base tables) and do not actuallycontain data
• Base tables may in turn be actual tables orother views
• All operations performed on views actuallyaffect the base table
PUPPIES_KENNELS_VIEW
PUPPY_NO PUPPY_NAME KENNEL_NAME LOCATION
52 Fido XYZ Kennels This City53 Fifi XYZ Kennels This City54 Spot ABC Kennels That City
View Example
View Uses
• Provide additional table security byrestricting access to a predetermined set ofrows and/or columns
• Hide data complexity
• Reduce syntactic complexity
• Present data from a different perspective
• Provide a level of referential integrity
• A single view can be created which willdisplay only eastern sales data to the Easternregional staff, another view with only westernsales data to the Western regional staff, andanother with all sales data to the Nationalsales manager, all from the same base table
Security Views
• A single view may be used to return rowsjoined from multiple tables, so users needonly remember the simpler view name
Hide Data Complexity
• Views allow users to select data from multipletables and even multiple databases withoutactually knowing how to perform the joinoperation
Reduce Syntactic Complexity
• Provides a means to rename columns withoutactually affecting the table(s) on which theview is based
Different Perspective
• Using the WITH CHECK option of CREATEVIEW specifies that inserts and updatesperformed through the view should not beallowed when it results in rows that the viewcannot select, and therefore would disappearfrom the view when committed
• In other words, ensuring that no value isdefined that is not supported by theconditions of the view query
Additional Relational Integrity
Indexes
• Provide quick access to rows in a table
• Enforce uniqueness of rows within a table
• Best practices direct creation of a uniqueindex on the primary key of each table
• Oracle, DB2, and SQL Server, however, do notphysically require a unique index for everytable
• Transparent to users
Clusters
• Means of structuring the data in one or moretables so that the rows are physically close toone another
• Improves the performance of certainoperations and decreases the performance ofothers
• Transparent to users
Primary Keys
• Value uniquely identifies each row of the table
• Consists of one or more columns (compositeprimary key)
• Not required by the DBMS, but best practicestypically require a primary key for allproduction database tables
• Values should not change, as relationshipsare most often represented through primarykeys
• Only one primary key allowed in a table
• Implies UNIQUE and NOT NULL
PUPPIES
PUPPY_NO
NAME
KENNEL_NO
KENNELS
KENNEL_NO
NAME
LOCATION
Primary Key Examples
PK
PK
Foreign Keys
• Represent relationships between tables
• Consists of one or more columns (compositeforeign key)
• Values are derived from the primary key ofanother (or same) table
• May contain multiple foreign keys representingmultiple relationships (3rd Normal Form)
• The mechanism used for supporting referentialintegrity
PUPPIES
PUPPY_NO
NAME
KENNEL_NO
KENNELS
KENNEL_NO
NAME
LOCATION
Foreign Key Example
FK
Unique Keys
• Has all the properties of a primary key but fills adifferent function
• NOT NULL constraint must accompany the UNIQUEconstraint
• Forces a unique value for each row in the table
• Not used to identify each row but only to ensureuniqueness
• An example would be to dictate that every truck in acompany have a unique license plate number, butthe primary key is a company assigned trucknumber since the license plate number may changeover time.
Database Terminology
Formal Informal Nonrelational equivalent term
Relational Terms
Relation
Attribute
Tuple
Table
Column
Row
File
Field
Record
Database Terminology
• Relation: Formal term that embraces the conceptsof both table and view, not to be confused with arelationship
• Entity Type: A logical thing of significance, whetherreal or imagined, about which information needs tobe known or held, and implemented as a Relation(normally a table)
• Relationship: An association between two Entities
• Attribute: Any detail that serves to qualify, identify,classify, quantify or express the state of an entity
• Tuple: A set of values for an attribute, synonymouswith row
Data Normalization
• Process by which data is configured formaximum flexibility and minimum redundancyin a relational environment
• Consists of 5 Rules
“The only glory most of us
have to hope for is the glory
of being normal.”Katherine Fullerton Gerould
Data Normalization
Data Normalization implies that you are working inthe physical database world (Physical DataModeling), taking un-normalized real world data and“normalizing” it to improve your data model. Thishas been a traditional requirement when convertinglegacy data systems to relational.
Logical Data Modeling by contrast, should result ina fully normalized state when performed correctly.
Rules for Data Normalization
1. Eliminate Repeating Groups
2. Eliminate Redundant Data
3. Eliminate Columns Not Dependent on Key
4. Isolate Independent Multiple Relationships
5. Isolate Semantically Related Multiple Relationships
1 - Eliminate Repeating Groups
• Make a separate table for each set of related attributes, and give each table a primary key
In order to answer a simple
question such as “Can Fifi roll
over?”, we must first find Fifi’s
Puppy record, then scan the list of
tricks until we find the answer. This
is awkward and inefficient. By
moving the Puppy’s Tricks into their
own entity you can answer the
question with a direct retrieval by
finding if the Puppy No. and Trick
Name occur together.
Data items for Puppies
Puppy Number
Puppy Name
Kennel Number
Kennel Name
Kennel Location
Trick ID 1..n
Trick Name 1..n
Trick Where Learned 1..n
Skill Level 1..n
1 - Eliminate Repeating Groups
• Make a separate table for each set of related attributes, and give each table a primary key
Data items for Puppies
Puppy Number
Puppy Name
Kennel Number
Kennel Name
Kennel Location
Trick ID 1..n
Trick Name 1..n
Trick Where Learned 1..n
Skill Level 1..n
PUPPIES
PUPPY_NO
NAME
KENNEL_NO
KENNEL_NAME
KENNEL_LOCATION
TRICKS
PUPPY_NO
TRICK_ID
NAME
WHERE_LEARNED
SKILL_LEVEL
Primary Key
Primary Key
2 - Eliminate Redundant Data
PUPPIES
PUPPY_NO
NAME
KENNEL_NO
KENNEL_NAME
KENNEL_LOCATION
PUPPY_TRICKS
PUPPY_NO
TRICK_ID
WHERE_LEARNED
SKILL_LEVEL
• If an attribute depends on only part of a multi-valued key, remove it to a separate table
TRICKS
TRICK_ID
NAME
TRICKS
PUPPY_NO TRICK_ID TRICK_NAME WHERE_LEARNED SKILL_LEVEL
52 27 Roll Over 16 953 16 Nose Stand 9 954 27 Roll Over 9 5
Redundant
Data
2 - Eliminate Redundant Data
Second normal form is used to make data maintenance
more efficient and also to prevent update and delete
anomalies.
An update anomaly would occur if you wanted to reclassify
a trick and missed updating a trick name in some table
using it. By storing the name once for each Trick ID, it is
updated once in one place and referred to by the new
classification everywhere else.
A delete anomaly could occur if the only occurrence of a
trick were deleted with the removal of a record from the
Puppies table with cascading delete of the puppy from
related Tricks. There would then be no occurrence of the
trick name anywhere.
3 - Eliminate Columns not Dependent on Key
PUPPIES
PUPPY_NO
NAME
KENNEL_NO
KENNEL_NAME
KENNEL_LOCATION
• If attributes do not contribute to a description of the key, remove them to a separate table
Puppies table satisfies 1st normal in that there are
no repeating groups, and 2nd normal since there
are no multi-valued keys, but kennel name and
kennel location describe only a kennel, and not a
puppy.
The purpose of 3rd normal is basically the same
as 2nd, and that is to avoid update and delete
anomalies and ease data maintenance. For
example, if there were currently no puppies from a
particular kennel then there would be no record of
that kennel’s existence.
3 - Eliminate Columns not Dependent on Key
• If attributes do not contribute to a description of the key, remove them to a separate table
PUPPY_TRICKS
PUPPY_NO
TRICK_ID
WHERE_LEARNED
SKILL_LEVEL
TRICKS
TRICK_ID
NAME
KENNELS
KENNEL_NO
NAME
LOCATION
PUPPIES
PUPPY_NO
NAME
KENNEL_NO
Third Normal FormPUPPIES
PUPPY_NO
NAME
KENNEL_NO
KENNEL_NAME
KENNEL_LOCATION
Third Normal Form is sufficient for most situations
“The rules leading to and including third normal form can be summed up in a single statement: Each attribute must be a fact about the key, the whole key, and nothing but the key.”
Wiorkowski and Kull
DB2 Design &
Development Guide
4 - Isolate Independent Multiple Relationships
• No table may contain two or more 1:n or n:m relationships that are not directly related
PUPPY_TRICKS
PUPPY_NO
TRICK_ID
WHERE_LEARNED
SKILL_LEVEL
COSTUME
New attribute added
Puppy Tricks is an example of a many to many
relationship in that a puppy may know many tricks
and many puppies may know the same trick.
Suppose a new attribute is added to the Puppy
Tricks table to show puppies that can wear a
certain Costume. The trick and costume attributes
do not share a meaningful relationship. Just
because a puppy can do a specific trick and it can
wear a certain costume, does not necessarily
mean that it can do them together. Therefore, you
should move costumes to a different table.
4 - Isolate Independent Multiple Relationships
PUPPY_TRICKS
PUPPY_NO
TRICK_ID
WHERE_LEARNED
SKILL_LEVEL
PUPPY_COSTUMES
PUPPY_NO
COSTUMENew attribute added
• No table may contain two or more 1:n or n:m relationships that are not directly related
PUPPY_TRICKS
PUPPY_NO
TRICK_ID
WHERE_LEARNED
SKILL_LEVEL
COSTUME
5 - Isolate Semantically Related Multiple Relationships
• There may be practical constraints on information that justify separating logically related many-to-many relationships
KENNELS_BREEDERS_BREEDS
KENNEL_NO
BREEDER
BREED
The need for 5th normal form becomes clear when considering inserts
and deletes. Suppose a kennel decides to offer three new breeds, and
it already deals with three breeders that can supply those breeds. This
will require nine new rows in the Kennels-Breeders-Breeds table, one
for each breeder/breed combination.
{Table to record which breeds are
available in each kennel, and
which breeder supplies dogs to
those kennels.}
• By breaking up the table, only three new entries are required; just thenew breeds for that kennel in Kennels-Breeds. There would be no needto record new breeders since they would already be recorded in theKennels-Breeders table for that kennel, and the Breeders-Breeds tablewould already contain the connection between those breeders and thebreeds.
• Further, the Breeder-Breeds table could have helped the kennel decidefrom which breeder they could acquire the new breeds.
• Fifth normal form can mean important savings if an application involvessignificant update activity.
KENNELS_BREEDS
KENNEL_NO
BREED
KENNELS_BREEDERS
KENNEL_NO
BREEDER
BREEDERS_BREEDS
BREEDER
BREED
KENNELS_BREEDERS_BREEDS
KENNEL_NO
BREEDER
BREED
“In anything at all, perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away.”
Saint-Exupéry
Wind, Sand, and Stars
Entity-Relationship Modeling
• Develop fully normalized Logical Data Modelsnaturally
• Standard process supported by the data modelingtools of almost all Database Design products
• Defines the Logical definition of a system first, inorder to build the best possible Physical database
• Model may be selectively de-normalized forphysical performance considerations
We will illustrate this process using the same sample data items as
in Data Normalization, but working from a logical model instead of
physical tables.
Identify Distinctive Entities from Data Items
Data items for Puppies
Puppy Number
Puppy Name
Kennel Number
Kennel Name
Kennel Location
Trick ID 1..n
Trick Name 1..n
Trick Where Learned 1..n
Skill Level 1..n
Puppy
Puppy No
Name
Trick
Trick ID
Name
Kennel
Kennel No
Name
Location
• Group only those attributes that specifically describe each specific entity (person, place, or thing)
Extra Data Items
Trick Where Learned 1..n
Skill Level 1..n
Identify Distinctive Entities from Data Items
The extra data items not included in the entities
actually provide descriptive information about a
relationship and will be worked with next.
Note the use of singular form with the entities
as opposed to the plural with tables. An entity,
logically speaking, is a single type of thing,
whereas a table is set of things with related
characteristics (relation).
Trick
Trick ID
Name
Extra Data Items
Trick Where Learned 1..n
Skill Level 1..n
Identify Relationships Between Entities
Kennel
Kennel No
Name
Location
Where Learned
Skill Level
Information carried by Relationship
Some relationships carry amplifying information
other than that the entities are related in a
particular way. Non-standard relationships must
be resolved by use of an additional entity.
Try reading the relationships before proceeding
contain
contained in
know known by
Puppy
Puppy No
Name
Trick
Trick ID
Name
Identify Relationships Between Entities
Kennel
Kennel No
Name
Location
contain
contained in
Puppy
Puppy No
Name
Each Kennel may contain one or more Puppies
and
Each Puppy must be contained in one and only one Kennel
Identify Relationships Between Entities
know known by
Puppy
Puppy No
Name
Trick
Trick ID
Name
Each Puppy may know one or more Tricks
and
Each Trick may be known by one or more Puppies
Resolve Non-standard Relationships
Puppy
Puppy No
Name
Kennel
Kennel No
Name
Location
Puppy Trick
Puppy No
Trick ID
Where Learned
Skill Level
contain
contained in
know
known by
included in
include
Trick
Trick ID
Name
Resolve Non-standard Relationships
Puppy
Puppy No
Nameknow
known by
included in
include
Trick
Trick ID
Name
Even though “Trick Where Learned” and “Skill Level” logically describe the
relationship, the only place that this information can be really used is in an entity.
Any 1:n and n:m relationship that carries with it additional information is
considered to be non-standard and must be resolved. Some data modeling tools
and methodologies allow you to define these types of relationships without an
intervening entity, but most do not, and ultimately, they will have to be
implemented in a physical table anyway.
Also note that 1:1 relationships may be defined, and are typically most useful
when extending entities that cannot be modified, such as those included in a
third-party application system.
Puppy Trick
Puppy No
Trick ID
Where Learned
Skill Level
Incorporate New Attributes/Entities
Kennel
Kennel No
Name
Location
contain
contained in
know
known by
included in
Costume
Costume ID
Description
Puppy
Puppy No
Name
Puppy Trick
Puppy No
Trick ID
Where Learned
Skill Level
Trick
Trick ID
Name
Additional analysis identifies an
additional data item. If it further
describes an existing Entity, then
add it as an Attribute, otherwise
create a new Entity.
Costume does not describe any
existing entity so it needs its own.
A better choice for design is to
use an automatic sequence
number for the Costume ID
rather than the actual costume
name for a primary key.
include
Define New Relationships
Trick
Trick ID
Name
Kennel
Kennel No
Name
Location
contain
contained in
know
known by
included in
include
Costume
Costume ID
Description
worn by
trained
to wear
Puppy
Puppy No
Name
Puppy Trick
Puppy Trick ID
Where Learned
Skill Level
Process Repeated for any new Data Items
contain
contained in
know
known by
included in
include
Costume
Costume ID
Description
worn by
trained to
wear
Breeder
Breeder ID
Name
Breed
Breed ID
Name
raised by
raise
supplied
by
supply
sell
Puppy
Puppy No
Name
Kennel
Kennel No
Name
Location
Puppy Trick
Puppy Trick ID
Where Learned
Skill Level
Trick
Trick ID
Name
Many-to-many relationships are perfectly valid logical constructs. Oracle Designer as well as most other ER
Modeling tools will automatically implement them physically with an intersection table containing only the foreign
keys to each related entity’s primary key. They must be manually converted to an intersection entity if they have
amplifying attributes.
Note that tables will be used to implement both entities and relationships. An intersection table is one that
implements a many-to-many relationship.
Entity-Relationship Model
• Fully normalized from the beginning
• Helps identify analysis gaps
• Facilitates building the best possible Physical database
• Documents all types of persons, places, and things of interest to the organization
• Flexible to increased data needs
• Provides the full power available in the relational model