data modeling workshopexpert-data.com/uploads/datamodellingprimer.pdfquote from ims documentation, a...

Data Modeling

Workshop

Expert Data Solutions, Inc.

Prepared by:

Mark Hyatt

Scope of this Workshop

• The Relational Model

• Beginning Data Modeling

In the Beginning, there was Chaos

DATA

Quote from IMS documentation, a well knowpre-relational system:

“Logically deleting a logical child preventsfurther access to the logical child using its logicalparent. Unidirectional logical child segments areassumed to be logically deleted. A logical parentis considered logically deleted when all its logicalchildren are physically deleted. For physicallypaired logical relationships, the physical childpaired to the logical child must also be physicallydeleted before the logical parent is consideredlogically deleted.”

Pre-relational Problems

What is a Relational Database?

• Implementation of data structures and controls basedupon Relational Theory

• Relational Algebra first described in 1970 by Dr. E. F.Codd

• Dr. Codd placed database research on a solidscientific footing

• Set theory based mathematical rules for a data model

• All Relational Database Management Systemscurrently use the Structured Query Language (SQL)for data access and control

• Introductory reading:

A Database Primer by C.J. Date

Entity-Relationship Model

• Used when mapping a real-world system to aRelational Database Management System

• Categorizes all elements of a system as eitheran entity or a relationship

• Three basic steps:– Identify the entity types of your system

– Identify attributes of each entity

– Identify the relationships between entities

• Normalize the model

• Create tables to implement the E-R model

Normalization

• A more complete discussion of Normalizationcomes later in the presentation, but basically, theprocess ensures that each entity has a completeprimary key and that all attributes contribute to adescription of that unique key.

• Normally, each entity is represented by a table,however, this is not always the case whenimplementing entity sub-types.

Drawing Notation

There are several standard notations in use for Entity-Relationship Modeling. The three most commonly used are IE(Information Engineering), IDEF1X (Integration Definition forInformation Modeling) and Barker Notation (developed byRichard Barker and several others at the British Consultingfirm CACI and then adopted by Oracle for its CASE modelingtools). A brief summary of the differences between thesenotations follows, but a detailed discussion of the variousstrengths and weaknesses of each style is beyond the scope ofthis presentation. The examples in this presentation use theOracle CASE*Method, or Barker notation, as it is arguably thesimplest to read and so discuss diagrams with non-technicalbusiness users.

Notation

Notation IDEF1X IE Barker

Relationship Multiplicity

• Zero or one

• One only

• Zero or more

• One or more

Line StyleSolid line signifies Identifying

relationshipAll lines solid and carry no

significanceTwo part line - solid signifies mandatory and dashed optional

Identifying Relationships Property not visible in drawing

Entity Subtypes

Or Constraint N/A N/A

Exclusive Or (XOR) Constraint

N/A

P

1

Z

Child ChildParent Parent

SubtypeSupertype

Subtype

SubtypeSupertype

Subtype

Subtype

Subtype

Supertype

This

Pronoun

That

Us

Pronoun

Them

Us

Pronoun

Them

Entity Type

• Commonly referred to simply as an Entity

• A thing of significance to the system

• Logical equivalent of nouns (person, place, orthing)

• First step of Logical Data Modeling is toidentify Entity Types

Department

Example

Attributes

• Fully describes Entity Types

• Logical equivalent of adjectives

• Each Entity must contain at least oneAttribute and one or more Attributescombined should uniquely identify the EntityType

Department

# Department Number

Department Name

Department Head

Example

Relationships

• Logical equivalent of verbs

• Describes an association between two things(Entities)

• Three basic associations will be discussed:

• one-to-many

• one-to-one

• many-to-many

Department

Department

Skill

Employee

Employee

Dept Head

Relationships

• Each end of a relationship can also be optional ormandatory – “may (be)” or “must (be)”

• Optional relationships are shown as dashed andmandatory as solid lines in Barker Notation.

• They can be read in both directions with key wordssubstituted for optionality and multiplicity, alongwith entity names and the relationship roles.

Department Employeestaffed by

assigned toExample:

Each Department may be staffed by one or more Employees

Each Employee must be assigned to one and only one Department

Physical Data Model

• We will start with a better understanding of thephysical model since that is what will be familiar tomost users.

• Also, the ultimate goal of all Data Modeling is thedevelopment of a well functioning and wellunderstood physical database.

Database Tables

• The structure that physically holds data in a relationaldatabase

• Physical implementation of logical “entity types” andlogical “relationships”

• Must have at least one column (attribute)

• Contains zero or more rows

• Each row in the table has one value in each column(may be NULL)

• Rows are not named, which means the attribute(column) values uniquely define the row, and not bysome pre-assigned label.

Table Examples

PUPPIES

PUPPY_NO

NAME

KENNEL_NO

KENNELS

KENNEL_NO

NAME

LOCATION

PUPPIES

PUPPY_NO NAME KENNEL_NO

52 Fido 953 Fifi 954 Spot 5

KENNELS

KENNEL_NO NAME LOCATION

5 ABC Kennels Any City8 Super Dogs Metropolis9 Bat Kennels Gotham

Guidelines for Creating Tables

• Use descriptive names for tables, columns, indexes, and clusters

• Use only well-known or organization standard abbreviations

• Be consistent in the use of singular and plural forms of table names and columns. Plural for table names, singular for column names.

• Create a primary key for every table

• Avoid duplication of data across and within tables (normalize)

• Document the meaning of each table and column with the COMMENT statement

• Create a unique index for the primary key

Columns

• Called a field in non-relational systems

• Represents one and only one attribute of the entity or a foreign key defining a relationship

• Name should indicate its meaning

• Should not contain any hidden meaning, and therefore should not represent multiple attributes (overloaded fields may be used in programs, but not in the database)

• Identify columns by name, not by position

• Should not consider the physical order within the table as significant

PUPPIES


Column PUPPIES.NAME

Column Example


Rows

• Stores the data of a table

• Represents one occurrence (or instance) of the entity or relationship represented by the table

• Should never be duplicated within a table

• Should not consider the physical (natural) row order within the table as significant

• Number of rows likely to be quite dynamic

• Roughly equivalent to a record in non-relational databases

Scientific Basis

In mathematical terms, a table is verysimilar to an unordered set, without abeginning or end. You achieve logicaldata independence by not relying on afixed internal ordering of a table’s rows.

PUPPIES


Row Example


Value

• The data referenced by the intersection of a given rowand column

• Assumes the data type of its column

• May be NULL (logically “unknown” or “missing”)

• Note that NULL is not the same as 0 or blank

• Any expression using NULL will evaluate to NULL,e.g. 10 * NULL is NULL. The only comparisonoperators you should use with null values are ISNULL and IS NOT NULL. A null value cannot be eitherequal to or not equal to any other value but is ratherunknown

• Oracle treats conditions evaluating to unknown asFALSE and returns NO error message.

PUPPIES


Value Example


Views

• A logical representation of a table orcombination of tables using a stored query

• Data is derived from the tables on which it isbased (base tables) and do not actuallycontain data

• Base tables may in turn be actual tables orother views

• All operations performed on views actuallyaffect the base table

PUPPIES_KENNELS_VIEW

PUPPY_NO PUPPY_NAME KENNEL_NAME LOCATION

52 Fido XYZ Kennels This City53 Fifi XYZ Kennels This City54 Spot ABC Kennels That City

View Example

View Uses

• Provide additional table security byrestricting access to a predetermined set ofrows and/or columns

• Hide data complexity

• Reduce syntactic complexity

• Present data from a different perspective

• Provide a level of referential integrity

• A single view can be created which willdisplay only eastern sales data to the Easternregional staff, another view with only westernsales data to the Western regional staff, andanother with all sales data to the Nationalsales manager, all from the same base table

Security Views

• A single view may be used to return rowsjoined from multiple tables, so users needonly remember the simpler view name

Hide Data Complexity

• Views allow users to select data from multipletables and even multiple databases withoutactually knowing how to perform the joinoperation

Reduce Syntactic Complexity

• Provides a means to rename columns withoutactually affecting the table(s) on which theview is based

Different Perspective

• Using the WITH CHECK option of CREATEVIEW specifies that inserts and updatesperformed through the view should not beallowed when it results in rows that the viewcannot select, and therefore would disappearfrom the view when committed

• In other words, ensuring that no value isdefined that is not supported by theconditions of the view query

Additional Relational Integrity

Indexes

• Provide quick access to rows in a table

• Enforce uniqueness of rows within a table

• Best practices direct creation of a uniqueindex on the primary key of each table

• Oracle, DB2, and SQL Server, however, do notphysically require a unique index for everytable

• Transparent to users

Clusters

• Means of structuring the data in one or moretables so that the rows are physically close toone another

• Improves the performance of certainoperations and decreases the performance ofothers

• Transparent to users

Primary Keys

• Value uniquely identifies each row of the table

• Consists of one or more columns (compositeprimary key)

• Not required by the DBMS, but best practicestypically require a primary key for allproduction database tables

• Values should not change, as relationshipsare most often represented through primarykeys

• Only one primary key allowed in a table

• Implies UNIQUE and NOT NULL

PUPPIES

PUPPY_NO

NAME

KENNEL_NO

KENNELS

KENNEL_NO

NAME

LOCATION

Primary Key Examples

PK

PK

Foreign Keys

• Represent relationships between tables

• Consists of one or more columns (compositeforeign key)

• Values are derived from the primary key ofanother (or same) table

• May contain multiple foreign keys representingmultiple relationships (3rd Normal Form)

• The mechanism used for supporting referentialintegrity

PUPPIES

PUPPY_NO

NAME

KENNEL_NO

KENNELS

KENNEL_NO

NAME

LOCATION

Foreign Key Example

FK

Unique Keys

• Has all the properties of a primary key but fills adifferent function

• NOT NULL constraint must accompany the UNIQUEconstraint

• Forces a unique value for each row in the table

• Not used to identify each row but only to ensureuniqueness

• An example would be to dictate that every truck in acompany have a unique license plate number, butthe primary key is a company assigned trucknumber since the license plate number may changeover time.

Database Terminology

Formal Informal Nonrelational equivalent term

Relational Terms

Relation

Attribute

Tuple

Table

Column

Row

File

Field

Record

Database Terminology

• Relation: Formal term that embraces the conceptsof both table and view, not to be confused with arelationship

• Entity Type: A logical thing of significance, whetherreal or imagined, about which information needs tobe known or held, and implemented as a Relation(normally a table)

• Relationship: An association between two Entities

• Attribute: Any detail that serves to qualify, identify,classify, quantify or express the state of an entity

• Tuple: A set of values for an attribute, synonymouswith row

Data Normalization

• Process by which data is configured formaximum flexibility and minimum redundancyin a relational environment

• Consists of 5 Rules

“The only glory most of us

have to hope for is the glory

of being normal.”Katherine Fullerton Gerould

Data Normalization

Data Normalization implies that you are working inthe physical database world (Physical DataModeling), taking un-normalized real world data and“normalizing” it to improve your data model. Thishas been a traditional requirement when convertinglegacy data systems to relational.

Logical Data Modeling by contrast, should result ina fully normalized state when performed correctly.

Rules for Data Normalization

1. Eliminate Repeating Groups

2. Eliminate Redundant Data

3. Eliminate Columns Not Dependent on Key

4. Isolate Independent Multiple Relationships

5. Isolate Semantically Related Multiple Relationships

1 - Eliminate Repeating Groups

• Make a separate table for each set of related attributes, and give each table a primary key

In order to answer a simple

question such as “Can Fifi roll

over?”, we must first find Fifi’s

Puppy record, then scan the list of

tricks until we find the answer. This

is awkward and inefficient. By

moving the Puppy’s Tricks into their

own entity you can answer the

question with a direct retrieval by

finding if the Puppy No. and Trick

Name occur together.

Data items for Puppies

Puppy Number

Puppy Name

Kennel Number

Kennel Name

Kennel Location

Trick ID 1..n

Trick Name 1..n

Trick Where Learned 1..n

Skill Level 1..n

1 - Eliminate Repeating Groups

• Make a separate table for each set of related attributes, and give each table a primary key


Puppy Number

Puppy Name

Kennel Number

Kennel Name

Kennel Location

Trick ID 1..n

Trick Name 1..n


Skill Level 1..n

PUPPIES

PUPPY_NO

NAME

KENNEL_NO

KENNEL_NAME

KENNEL_LOCATION

TRICKS

PUPPY_NO

TRICK_ID

NAME

WHERE_LEARNED

SKILL_LEVEL

Primary Key

Primary Key

2 - Eliminate Redundant Data

PUPPIES

PUPPY_NO

NAME

KENNEL_NO

KENNEL_NAME

KENNEL_LOCATION

PUPPY_TRICKS

PUPPY_NO

TRICK_ID

WHERE_LEARNED

SKILL_LEVEL

• If an attribute depends on only part of a multi-valued key, remove it to a separate table

TRICKS

TRICK_ID

NAME

TRICKS

PUPPY_NO TRICK_ID TRICK_NAME WHERE_LEARNED SKILL_LEVEL

52 27 Roll Over 16 953 16 Nose Stand 9 954 27 Roll Over 9 5

Redundant

Data

2 - Eliminate Redundant Data

Second normal form is used to make data maintenance

more efficient and also to prevent update and delete

anomalies.

An update anomaly would occur if you wanted to reclassify

a trick and missed updating a trick name in some table

using it. By storing the name once for each Trick ID, it is

updated once in one place and referred to by the new

classification everywhere else.

A delete anomaly could occur if the only occurrence of a

trick were deleted with the removal of a record from the

Puppies table with cascading delete of the puppy from

related Tricks. There would then be no occurrence of the

trick name anywhere.

3 - Eliminate Columns not Dependent on Key

PUPPIES

PUPPY_NO

NAME

KENNEL_NO

KENNEL_NAME

KENNEL_LOCATION

• If attributes do not contribute to a description of the key, remove them to a separate table

Puppies table satisfies 1st normal in that there are

no repeating groups, and 2nd normal since there

are no multi-valued keys, but kennel name and

kennel location describe only a kennel, and not a

puppy.

The purpose of 3rd normal is basically the same

as 2nd, and that is to avoid update and delete

anomalies and ease data maintenance. For

example, if there were currently no puppies from a

particular kennel then there would be no record of

that kennel’s existence.

3 - Eliminate Columns not Dependent on Key

• If attributes do not contribute to a description of the key, remove them to a separate table

PUPPY_TRICKS

PUPPY_NO

TRICK_ID

WHERE_LEARNED

SKILL_LEVEL

TRICKS

TRICK_ID

NAME

KENNELS

KENNEL_NO

NAME

LOCATION

PUPPIES

PUPPY_NO

NAME

KENNEL_NO

Third Normal FormPUPPIES

PUPPY_NO

NAME

KENNEL_NO

KENNEL_NAME

KENNEL_LOCATION

Third Normal Form is sufficient for most situations

“The rules leading to and including third normal form can be summed up in a single statement: Each attribute must be a fact about the key, the whole key, and nothing but the key.”

Wiorkowski and Kull

DB2 Design &

Development Guide

4 - Isolate Independent Multiple Relationships

• No table may contain two or more 1:n or n:m relationships that are not directly related

PUPPY_TRICKS

PUPPY_NO

TRICK_ID

WHERE_LEARNED

SKILL_LEVEL

COSTUME

New attribute added

Puppy Tricks is an example of a many to many

relationship in that a puppy may know many tricks

and many puppies may know the same trick.

Suppose a new attribute is added to the Puppy

Tricks table to show puppies that can wear a

certain Costume. The trick and costume attributes

do not share a meaningful relationship. Just

because a puppy can do a specific trick and it can

wear a certain costume, does not necessarily

mean that it can do them together. Therefore, you

should move costumes to a different table.

4 - Isolate Independent Multiple Relationships

PUPPY_TRICKS

PUPPY_NO

TRICK_ID

WHERE_LEARNED

SKILL_LEVEL

PUPPY_COSTUMES

PUPPY_NO

COSTUMENew attribute added

• No table may contain two or more 1:n or n:m relationships that are not directly related

PUPPY_TRICKS

PUPPY_NO

TRICK_ID

WHERE_LEARNED

SKILL_LEVEL

COSTUME

5 - Isolate Semantically Related Multiple Relationships

• There may be practical constraints on information that justify separating logically related many-to-many relationships

KENNELS_BREEDERS_BREEDS

KENNEL_NO

BREEDER

BREED

The need for 5th normal form becomes clear when considering inserts

and deletes. Suppose a kennel decides to offer three new breeds, and

it already deals with three breeders that can supply those breeds. This

will require nine new rows in the Kennels-Breeders-Breeds table, one

for each breeder/breed combination.

{Table to record which breeds are

available in each kennel, and

which breeder supplies dogs to

those kennels.}

• By breaking up the table, only three new entries are required; just thenew breeds for that kennel in Kennels-Breeds. There would be no needto record new breeders since they would already be recorded in theKennels-Breeders table for that kennel, and the Breeders-Breeds tablewould already contain the connection between those breeders and thebreeds.

• Further, the Breeder-Breeds table could have helped the kennel decidefrom which breeder they could acquire the new breeds.

• Fifth normal form can mean important savings if an application involvessignificant update activity.

KENNELS_BREEDS

KENNEL_NO

BREED

KENNELS_BREEDERS

KENNEL_NO

BREEDER

BREEDERS_BREEDS

BREEDER

BREED

KENNELS_BREEDERS_BREEDS

KENNEL_NO

BREEDER

BREED

“In anything at all, perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away.”

Saint-Exupéry

Wind, Sand, and Stars

Entity-Relationship Modeling

• Develop fully normalized Logical Data Modelsnaturally

• Standard process supported by the data modelingtools of almost all Database Design products

• Defines the Logical definition of a system first, inorder to build the best possible Physical database

• Model may be selectively de-normalized forphysical performance considerations

We will illustrate this process using the same sample data items as

in Data Normalization, but working from a logical model instead of

physical tables.

Identify Distinctive Entities from Data Items


Puppy Number

Puppy Name

Kennel Number

Kennel Name

Kennel Location

Trick ID 1..n

Trick Name 1..n


Skill Level 1..n

Puppy

Puppy No

Name

Trick

Trick ID

Name

Kennel

Kennel No

Name

Location

• Group only those attributes that specifically describe each specific entity (person, place, or thing)

Extra Data Items


Skill Level 1..n

Identify Distinctive Entities from Data Items

The extra data items not included in the entities

actually provide descriptive information about a

relationship and will be worked with next.

Note the use of singular form with the entities

as opposed to the plural with tables. An entity,

logically speaking, is a single type of thing,

whereas a table is set of things with related

characteristics (relation).

Trick

Trick ID

Name

Extra Data Items


Skill Level 1..n

Identify Relationships Between Entities

Kennel

Kennel No

Name

Location

Where Learned

Skill Level

Information carried by Relationship

Some relationships carry amplifying information

other than that the entities are related in a

particular way. Non-standard relationships must

be resolved by use of an additional entity.

Try reading the relationships before proceeding

contain

contained in

know known by

Puppy

Puppy No

Name

Trick

Trick ID

Name


Kennel

Kennel No

Name

Location

contain

contained in

Puppy

Puppy No

Name

Each Kennel may contain one or more Puppies

and

Each Puppy must be contained in one and only one Kennel


know known by

Puppy

Puppy No

Name

Trick

Trick ID

Name

Each Puppy may know one or more Tricks

and

Each Trick may be known by one or more Puppies

Resolve Non-standard Relationships

Puppy

Puppy No

Name

Kennel

Kennel No

Name

Location

Puppy Trick

Puppy No

Trick ID

Where Learned

Skill Level

contain

contained in

know

known by

included in

include

Trick

Trick ID

Name

Resolve Non-standard Relationships

Puppy

Puppy No

Nameknow

known by

included in

include

Trick

Trick ID

Name

Even though “Trick Where Learned” and “Skill Level” logically describe the

relationship, the only place that this information can be really used is in an entity.

Any 1:n and n:m relationship that carries with it additional information is

considered to be non-standard and must be resolved. Some data modeling tools

and methodologies allow you to define these types of relationships without an

intervening entity, but most do not, and ultimately, they will have to be

implemented in a physical table anyway.

Also note that 1:1 relationships may be defined, and are typically most useful

when extending entities that cannot be modified, such as those included in a

third-party application system.

Puppy Trick

Puppy No

Trick ID

Where Learned

Skill Level

Incorporate New Attributes/Entities

Kennel

Kennel No

Name

Location

contain

contained in

know

known by

included in

Costume

Costume ID

Description

Puppy

Puppy No

Name

Puppy Trick

Puppy No

Trick ID

Where Learned

Skill Level

Trick

Trick ID

Name

Additional analysis identifies an

additional data item. If it further

describes an existing Entity, then

add it as an Attribute, otherwise

create a new Entity.

Costume does not describe any

existing entity so it needs its own.

A better choice for design is to

use an automatic sequence

number for the Costume ID

rather than the actual costume

name for a primary key.

include

Define New Relationships

Trick

Trick ID

Name

Kennel

Kennel No

Name

Location

contain

contained in

know

known by

included in

include

Costume

Costume ID

Description

worn by

trained

to wear

Puppy

Puppy No

Name

Puppy Trick

Puppy Trick ID

Where Learned

Skill Level

Process Repeated for any new Data Items

contain

contained in

know

known by

included in

include

Costume

Costume ID

Description

worn by

trained to

wear

Breeder

Breeder ID

Name

Breed

Breed ID

Name

raised by

raise

supplied

by

supply

sell

Puppy

Puppy No

Name

Kennel

Kennel No

Name

Location

Puppy Trick

Puppy Trick ID

Where Learned

Skill Level

Trick

Trick ID

Name

Many-to-many relationships are perfectly valid logical constructs. Oracle Designer as well as most other ER

Modeling tools will automatically implement them physically with an intersection table containing only the foreign

keys to each related entity’s primary key. They must be manually converted to an intersection entity if they have

amplifying attributes.

Note that tables will be used to implement both entities and relationships. An intersection table is one that

implements a many-to-many relationship.

Entity-Relationship Model

• Fully normalized from the beginning

• Helps identify analysis gaps

• Facilitates building the best possible Physical database

• Documents all types of persons, places, and things of interest to the organization

• Flexible to increased data needs

• Provides the full power available in the relational model

Questions?

[email protected]

mailto:[email protected]?subject=Data%20Modeling