developerdmodel
TRANSCRIPT
-
7/31/2019 developerdmodel
1/23
Michael Brydon (mj [email protected])Last update: 02-May-01 1 o f 23
Lesson 3: An introduction to data modeling
3.1 Introduction: The importance of
conceptual models
Before you sit down in front of the keyboardand start creat ing a database applicat ion, i t iscritical that you take a step back and consideryour business problemin this case, the kitchensupply scenario presented in Lesson 2from aconcept ualpoint of view. To facilitate this
process, a number of conceptual modelingtechniques have been developed by computerscientists, psychologists, and consultants.
For our purposes, we can think of aconceptual model as a pictureof t heinformat ion system we are going to build.
To use an analogy, conceptual models areto informat ion systems what blueprintsare to buildings.
There are many different conceptual modelingtechniques used in practice. Each techniqueuses a dif ferent set of symbols and may focus on
a dif ferent part of the problem (e.g., data,processes, informat ion flows, objects, and soon). Despit e dif ferences in notation and focus,however, the underlying rationale forconceptual modeling techniques is always the
same: understand the problem before you startconstructing a solution.
There are two important things to keep in mindwhen learning about and doing data modeling:
1. Data modeling is f irst and foremost a toolfor communicat ion.Their i s no single rightmodel. Instead, a valuable model highlightstricky issues, allows users, designers, andimplementors to discuss the issues using thesame vocabulary, and leads to better designdecisions.
2. The modeling process is inherent lyiterative: you create a model, check itsassumpt ions with users, make the necessarychanges, and repeat the cycle unt il you are
sure you understand the critical issues.
In this background lesson, you are going to use adata modeling techniquespecif ically, Entity-Relationship Diagrams (ERDs)to model thebusiness scenario from Lesson 2. The datamodel you create in this lesson will form the
foundat ion of the database that you usethroughout the remaining lessons.
?
http://scenario.pdf/http://scenario.pdf/http://scenario.pdf/http://scenario.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
2/23
Introduct ion: The importance of conceptualAn int roduct ion to data modeling
3.1.1 What is data modeling?
A data model is a simply a diagram thatdescribes the most important things in your
business environment from a data-centric pointof view. To illustrate, consider the simple ERDshown in Figure 3.1. The purpose of t he diagramis to describe the relationship between the datastored about products and the data storedabout the organizations that supply theproducts.
3.1.1.1 Entities and attributes
The rectangles in Figure 3.1 are called entitytypes (typicall y short ened to ent it ies ) and
the ovals are called attributes. The ent it ies arethe things in the business environment aboutwhich we want to store data. The attributesprovide us with a means of organizing andstructuring the data. For example, we need tostore certain information about the productsthat we sell, such as the typical selling price of
the product ( Unit price ) and the quanti ty ofthe product current ly in inventory ( Qty onhand ). These pieces of data are att ributes ofthe Product entity.
It is important to note t hat the precise mannerin which data are usedand processedwithin a
particular business application is a separateissue from data modeling. For example, thedata model says nothing about howthe value of Qty on hand is changed over t ime. The focusin data modeling is on capturing data about theenvironment . You wil l learn how to change thisdata (e.g., process orders so that the inventory
values are updated) once you have masteredthe art of database design.
A data modeler assumes that if the rightdata is available, the other elements ofthe applicat ion wil l f all into placeeff ort lessly and wonderfully. For now, t his
is a good working assumption.
FIGURE 3.1: An ERD showing a relationshipbetween products and suppliers.
Product
supplied
by
Supplier
Unit price
ProductID
Qty on hand
Name
Address
Entity
Attributes
Relationship
Cardinality
?
http://map.pdf/ -
7/31/2019 developerdmodel
3/23
Introduct ion: The importance of conceptualAn int roduct ion to data modeling
3.1.1.2 Notation for relationships
In addit ion to ent it ies and at t ributes, Figure 3.1shows a relationship between the two ent it ies
using a line and a diamond. The relationshipconst ruct is usednot surprisinglyto indicatethe existence or absence of a relationshipbetween entit ies. A crows foot at either end ofa relationship line is used to denote thecardinality of the relationship.
For example, the crows foot on the productside of t he relationship in Figure 3.1 indicatesthat a particular supplier may provide yourcompany with several different products, suchas bowls, spatulas, wire whisks and so on. Theabsenceof a crows foot on the supplier sideindicates that each product in your inventory is
provided by a single supplier. Thus, therelationship in Figure 3.1 indicates that youalways buy all your wire whisks from the samecompany.
3.1.1.3 Modeling assumptions
The relationship shown in Figure 3.1 is calledone-to-many: each supplier supplies manyproducts (where many means any numberincluding zero ) but each product is supplied byone supplier (where one means at mostone ).
The decision to use a one-to-many relationship
reflects an assumption about the business
environment in which your wholesale companyoperates. However, it is easy to imagine adif ferent environment in which each product issupplied by multiplesuppliers. For example,many suppliers may carry a part icular brand ofwire whisk. When you run out of whisks, it is upto you to decide where to place your order. Inother words, it is possible that a many-to-manyrelationship exists between suppliers andproducts.
If mult iple supplier exist , at t ributes of t heproduct, such as it s price and product numbermay vary from supplier to supplier. In thissit uat ion, the data requirements of a many-to-many environment are slight ly more complexthan those of the one-to-many environment. Ifyou design and implement your database around
the one-to-many assumption but then discoverthat certain goods are supplied by multiplesuppliers, much effort is going to be required tofix the problem.
Herein lies the point of drawing an ERD:The diagram makes your assumpt ionsabout the relationships wit hin a part icularbusiness environment explicit beforeyoustart building things.
3.1.1.4 The role of the modeler
In the environment used in these tutorials, you
are the user, the designer, and the implementor
?
http://map.pdf/ -
7/31/2019 developerdmodel
4/23
Introduct ion: The importance of conceptualAn int roduct ion to data modeling
of the system. In a more realistic environment,however, these roles are played by dif ferentindividuals (or groups) wit h dif ferentbackgrounds and priorit ies. For example, acommon stereotype of implementors(programmers, database specialists, and so on)is that they seldom leave their cubicles tocommunicate with end-users of the softwarethey are wri t ing. Similarly, it is generally safe toassume that users have no interest in, or
understanding of, low-level technical details(such as the cardinality of relat ionships onERDs, mechanisms to enforce referent ialintegrity, and so on). Thus, it is up to thebusiness analyst to bridge the communicationgap between the different groups involved inthe construction, use, and administration of an
information system.As a business analyst (or more generall y, adesigner), it is critical that you walk throughyour conceptual models with users and makesure that your modeling assumptions areappropriate. In some cases, you may have toexamine sample data from the existingcomputer-based or manual system to determinewhether (for instance) there are any productsthat are supplied by multiple suppliers.
At the modeling stage, making changes such asconverting a one-to-many relationship to amany-to-many relationship is t ri vialall t hat is
required is the addit ion of a crows foot to one
end of the relationship, as shown in Figure 3.2.In contrast , making the same change once youhave implemented tables, built a userinterface, and writ ten code is a t ime-consumingand frustrating chore.
Generally, you can count on the 10
ruleof t humb when building software: t hecost of making a change increases by anorder of magnitude for each stage of thesystems development lifecycle that youcomplete.
FIGURE 3.2: An ERD for an environment inwhich there is a many-to-many relationship
between products and suppliers.
Product
supplied
by
Supplier
Unit price
ProductID
Qty on hand
Name
Address
The addition of asecond crowsfoot transformsthe one-to-manyrelationship into amany-to-manyrelationship.
!
http://map.pdf/ -
7/31/2019 developerdmodel
5/23
Introduct ion: The importance of conceptualAn int roduct ion to data modeling
3.1.2 Core modeling constructs and notation
Data modelers typically adopt a set ofnotational conventions so that their diagrams
are consistent . For example, large ITorganizations and consultancies typically adopta methodology1a set of tools and proceduresfor applying the tools that specif ies thenotat ion used within the organization. Enforcingstandardizat ion in t his way facili tates teamworkon large projects. Similarly, if a computer-
aided soft ware engineer ing (CASE) tool is usedfor conceptual modeling and design, notationalconvent ions are oft en enforced by the soft ware.
What follows is a brief summary of thenotational conventions that I use when drawingERDs. Keep in mind, however, that ERDs are f irst
and foremost a tool f or communicat ion betweenhumans. As such, the precise notat ion you use isnot part icularly important as long as people canread and understand the diagrams. Withexperience, you wil l come to realize thatdifferences in the shapes of the boxes and lineshave little effect on the core concepts of data
modeling.
3.1.2.1 Entities
Entities are drawn as rectangular boxescontaining a noun in singular form, as shown in
Figure 3.3.
You will see later t hat each ent it y you drawult imately becomes a table in your database.You might want to keep this t ransformationfrom entity to table in mind when selecting thenames of your enti t ies. For example, yourentity names should be short but descriptive.
3.1.2.2 Relationships
A relationship between entit ies is drawn as aline bisected by a diamond. The diamondcontains a verb (or short verb phrase) thatdescribes the nature of the relationship
between the entities, as shown in Figure 3.4.
Named relat ionships are used to make the ERDsmore readable. However, unlike entit y names,relationship names never show up in the finaldatabase. Consequently, it does not reallymat ter how you label your relat ionships, as long1 It can be argued that t he term method is
grammatically preferable. In Europe, for example,
the term method tends to be favored.
Customer
FIGURE 3.3: An entity named Customer .
http://map.pdf/ -
7/31/2019 developerdmodel
6/23
Introduct ion: The importance of conceptualAn int roduct ion to data modeling
as the labels make the diagram easier to
interpret.To illustrate, consider the relationship betweenproducts and suppliers shown in Figure 3.1. Therelationship is described by the verb phrase supplied by . Alt hough one could have optedfor t he short er relationship name has instead,the result ing diagram (e.g., Supplier has
product ) would be more dif f icult for readers ofthe diagram to interpret.
3.1.2.3 Relationship direction
One issue that sometimes troubles neophytedata modelers is that the directionof t he
relationship is not made explicit on thediagram. Returning to Figure 3.1, it is obviousto me (since I drew the diagram) that therelationship should be read: Product issuppli ed bysupplier. Reading the relat ionshipin the other direct ion ( Supplier is suppli edbyproduct ) makes very lit t le sense to anyone who
is familiar with the particular problem domain.
General ly, ERDs make certain assumpt ionsabout the readers knowledge of the underlyingbusiness domain.
A notational convention support ed bysome CASE tools is to require two namesfor each relationship: one that makessense in one direction (e.g., is suppliedby ), and another that makes sense in t heopposit e direct ion (e.g., supplies ).
Although double-naming may make thediagram easier to read, i t also addsclutter (twice as many labels) andimposes an additional burden on themodeler.
3.1.2.4 Cardinality
As discussed in Sect ion 3.1.1.2, the cardinalityof a relationship const rains the number ofinstances of one entity type that can beassociated with a single instance of the otherentity type.
The cardinality of relationships has animportant impact on number andstructure of the tables in the database.Consequently, it is important to get thecardinali ty right on paper before start ingthe implementation.
buys
FIGURE 3.4: A relat ionship named buys .
?
!
http://map.pdf/ -
7/31/2019 developerdmodel
7/23
Introduct ion: The importance of conceptualAn int roduct ion to data modeling
There are three fundamental types ofcardinality in ERDs:
One-to-many You have already seen an
example of a one-to-many relationship inFigure 3.1. You wil l soon discover that one-to-many relationships are the bread andbutter of relational databases.
One-to-one At this point in your datamodeling career, you should avoid one-to-one relationships. To illustrate the basicissue, consider the ERD shown in Figure 3.5.Based on an exist ing paper-based system,the modeler has assumed that eachcustomer is associated wit h one customerrecord (i.e., a paper form containinginformation about the customer, such as
address, fax number, and so on). Clearly,each customer has only one customerrecord and each customer record belongs toa single customer. However, if we automatethe system and get rid of the paper form,then there is no reason not to combine theCustomer and Customer Record ent it ies into
a single entity called Customer.
In many cases, one-to-one relationshipsindicate a modeling error. When you havea one-to-one relationship such as the oneshown in Figure 3.5, you should combinethe two entities into a single entity.
Many-to-many The world is full of many-to-many relationships. A well-used exampleis Student takes course. Many-to-manyrelat ionships also arise when you considerthe historyof an ent it y. To il lust rate,consider the ERD shown in Figure 3.6. Atf irst glance, t he relationship betweenFamily and Single-Family Dwelling (SFD)might seem to be one-to-one since aparticular family can only live in one SFD ata t ime and each SFD can (by definit ion) onlycontain a single family. However, it ispossible for a family to live in different
houses over time. Similarly, it is possiblethat many famil ies inhabit a part icularhouse over the years. Thus, if the conceptof t ime is considered, the relationshipbecomes many-t o-many.
We will discuss how you go about determiningcardinalit y in subsequent sections. At this point ,
it is sufficient to recognize that there are two
!
FIGURE 3.5: An incorrect one-to-onerelationship
Customer associatedwith
CustomerRecord
http://map.pdf/ -
7/31/2019 developerdmodel
8/23
Introduct ion: The importance of conceptualAn int roduct ion to data modeling
popular (and equivalent) approaches todenoting one and many on an ERD: thecrows foot notation you have already seen andthe 1:N notat ion.
1. Crows foot notation In the crows footnotation, three little lines (resembling acrows foot ) are used to indicate many .
Not surprisingly, the absence of a crowsfoot indicates one . Thus, t he relationshipin Figure 3.7 indicates that each product issupplied by at most one supplier, whereas each supplier may supply many products. .
2. 1:N notat ion In the 1:N notation, thesymbol N (and/ or M ) is used to indicate many whereas 1 is used to indicate one . An example of the 1:N notation isshown in Figure 3.8.
The ERD model also supports addit ionalcardinality information in the form ofcardinality constraints. To keep things
simple, however, the discussion ofcardinali t y const raints is deferred unti lSect ion 6.3.2.
3.1.2.5 Attributes
At t ributes are propert ies or characterist ics of a
particularly entity about which we wish tocollect and store data. In addition, there istypicall y one at t ribute that uniquely identi fiesparticular instances of the entity. For example,each of your customers may have a uniquecustomer ID. Such attributes are known as keyattributes.
FIGURE 3.6: What is the cardinali t y of thisrelationship?
Family livesin
Single-Family
Dwelling
Product suppliedby
Supplier
FIGURE 3.7: A one-to-many relationship incrows foot notation
Productsupplied
by Supplier
N 1
FIGURE 3.8: A one-to-many relat ionship in1:N notation
?
http://tables2.pdf/http://tables2.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
9/23
Learning object ives9 o f 23
An int roduct ion to data modeling
For ERDs that are drawn manually, attributesare t radit ionally shown as ovals containing thename of the attribute. If the attribute is a key,it is typically underlined. A number of
at t ributes for t he Customer entit y are shown inFigure 3.9.
Adding attributes to ERDs can result invery clut tered diagrams. Some CASE toolslist the at t ributes inside the enti tyrectangle (see the discussion of CASEtools in Sect ion 3.4.5). Another way to
reduce clut ter is to only show a handful ofcritical attributes on the diagram.
3.2 Learning objectives
! understand the core constructs of theent it y-relat ionship model
! create an ERD based on yourunderstanding of a business scenario
! use associative entities to add
attributes to relationships! gain some famil iarity wit h the role of
data modeling and CASE tools in t hedevelopment process
3.3 Exercises
In the sections that follow, we step through theconstruction of an ERD for the kitchen supplyscenario. By following along, you should gain abetter understanding of the basic techniquesinvolved in data modeling as well as some of thedesign pit falls that should be avoided.
If this lesson was on how to play golf, youwould not read it and then assume thatyou are a good golfer. Golf is a skill thatrequires both theoretical knowledge andhours of practice. Thus, the only way tobecome a good golfer is to acquire a solidunderstanding of the fundamentals and
then go out and hit thousands of balls.The same principles apply to datamodeling.
3.3.1 Starting simple
Let us begin with the simplest and most
essential statement one can make about the
Customer
CustID
Name Phone No.
Contact person
FIGURE 3.9: A number of attributes of theCustomer entity are shown in ovals.
?!
E iA i d i d d li
http://map.pdf/ -
7/31/2019 developerdmodel
10/23
Exercises10 o f 23
An int roduct ion to data modeling
wholesaling environment: cust omers buyproducts. It is natural that you would want toboth automate and informate (recallSect ion 2.4) t his important business process.
3.3.1.1 Step one: identify the entities
Ent it ies are physical things, organizations, rolesand events about which we want to storeinformation. In the wholesaling scenario, twoentit ies are immediately obvious: Customer and
Product. However, before we add the entit ies toour diagram, it is important that we have a f irmunderstanding of what exact lythese entitiescorrespond to in t he real world:
1. Customers In the wholesalingenvironment , a customer is an organization,not a person. There may be a single personat the organization through whom weconduct our business. We will refer to thisperson as the contact person for thecustomer in order to maintain a cleandistinction between people andorganizations.
2. Products It is not immediately clearwhether t he Product ent it y refers to aspecif ic it em or a class of similar items. Forexample, one of the products you sell is the Fat Cat mug. The Product ID of t he mug is 88 4017 and it normally sells for $5.50.Note, however, that there are many
individual Fat Cat mugs and each one is
slightly dif ferent due to irregularit ies,variations in paint ing, and so on. In ourcase, there are advantages to ignoring theindividualit y of each mug and treating them
all as a single group of interchangeableit ems. Thus, when we talk about aproduct or a SKU , we are talking aboutan ent ire classof similar instances, notindividual instances themselves.1
Having made these assumptions explicit ly, we
can now create our first ERD.
Take out a piece of paper and a pencil.
Unless you have a special-purpose CASEtool, it is seldom worth the effort to drawthe early drafts of your conceptual
models on a computer.
ERDs typicall y require many modif icat ionsso you should not invest much timemaking your diagrams look nice. In fact ,the diagram you are about to begin will
1 In some environments, it may be necessary to treat
products as individual items. For example, in the
aerospace industry, there is a requirement to track
individual parts by serial number in case a part fails.
The requirement for unit eff ectivit y necessit ates a
dif ferent set of assumptions about t he Product ent it y
and thus leads to a different database design.
?
!
E iA i t d t i t d t d li
http://scenario.pdf/http://scenario.pdf/http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
11/23
Exercises11 o f 23
An int roduct ion to data modeling
end up the in recycle bin by the end ofthe lesson.
Add the Customer and Product entities to
your diagram, as shown in Figure 3.10.
3.3.1.2 Step two: specify a relationship between the
entities
We know that customers buy products and thatproducts are bought by customers. It is a simplemat ter t o create a relationship to communicatethis fact.
Add a relationship line between theCustomer and Product entities.
Label t he relat ionship buys , as shown inFigure 3.11.
Unlike flow charts, the arrangement ofboxes and the direction of lines in an ERDhave no signif icanceany arrangement
that f its on the page is valid. Similarly,
the relationship line does not denote any
type of sequence or f low of informat ion.
3.3.1.3 Step three: determine the cardinality of the
relationship
Each customer can buy many productsindeed,that is the whole purpose of being in this l ine ofbusiness. To show this possibil it y on our ERD, weadd a crows foot to the Product side of therelat ionship l ine.
Similarly, each product can be purchased bymany customers. For example, a number of ourcustomers may chose to stock the Fat Catmug (keeping in mind that the product refers to
a style of mug, not an individual mug). As aresult, a crows foot is added to the Customerside of the relationship.
Designate the buys relationship as many-to-many using the crows foot notation, asshown in Figure 3.12.
FIGURE 3.10: Add the first two entities toyour ERD.
Customer Product
?
FIGURE 3.11: Add a relationship betweenthe two enti t ies.
Customer Productbuys
ExercisesAn int roduct ion to data modeling
http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
12/23
Exercises12 o f 23
An int roduct ion to data modeling
3.3.1.4 Step four: identify a few important attributes
There is a need to f ind a balance between thedescript iveness of the ERD and the ease withwhich others can decipher it . As such, I preferto show only a handful of attributes on the
ERDs.
Add a small number of important att ributesto your diagram, as shown in Figure 3.13.
By adding at t ributes such as Qty on hand tothe Product ent it y, it is clear t hat the ent it y
refers to a class of products, not an individual
product. In cont rast , if Serial number wereadded to the diagram as an at t ribute instead of Qty on hand , the reader of the ERD wouldcome to a different conclusion about themeaning of the Product enti ty.
3.3.2 Dealing with many-to-many
relationships
Although the diagram in Figure 3.13 istechnically correct, it is missing a great deal ofinformat ion about how a purchase t ransact ion
occurs in reali ty.
To il lust rate, consider the at t ribute Unit pricebelonging to the Product ent it y. Unit pricecontains the defaultsell ing price of theproduct. To understand why it is the defaultprice, consider the case of a Fat Cat mug that
typically sells for $5.50. What if there is a
FIGURE 3.12: Indicat e t he many-to-manycardinality of the relationship.
Customer Productbuys
Add a crows foot to the customerside to indicate that each productis purchased by many customers.
1111
Add a second crows foot to theproduct side to indicate that eachcustomer purchases many products.
2222
FIGURE 3.13: An init ial ERD for t he kit chensupply environment.
Customer Productbuys
ProductID
Unit price Qty on hand
CustID
Name Contact person
ExercisesAn int roduct ion to data modeling
http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
13/23
Exercises13 o f 23
An int roduct ion to data modeling
part icular mug that has a minor f law? Althoughthe mug can still be sold, the customer mayexpect a discounted price to compensate forthe flaw. The question is therefore: Where on
the diagram do we indicate the act ualsellingprice of a part icular mug?
A second example is the purchase quantity.What if the customer purchases a dozen FatCat mugs? Where is this informat ion recorded?The Qty ordered at t ribute does not belong to
the Customer entity because customers ordermany products besides mugs. Similarly, Qtyordered does not belong to the Product ent it ybecause different customers may orderdif ferent quanti t ies.
This is a problem t hat t ypically arises in many-
to-many relationships: There are certainimportant at t ributes that do not seem to belongto eit her of t he ent it ies part icipat ing in therelationship. The solut ion is to assign theat t ributes to the relationship it self .
3.3.2.1 Attributes of relationships
The issues surrounding price and order quantityarise because the attributes belong to theinteractionof t he ent it ies in t he many-to-manyrelationship. Thus, the price of a flawed mug isan at t ribute of a part icular product being soldto a particular customer on a particular day.
To summarize, there are a number of att ributesthat should be att ached to the buysrelat ionship in Figure 3.13:
Date the date on which the purchase ismade;
Actual pr ice the price at which the it em(or multiple items within the same class ofproducts) are actually sold to the customer;
Quant it y ordered the number of items
with a certain product ID requested by thecustomer; and,
Quanti ty shipped the actual number ofitems shipped to the customer.
3.3.2.2 Associative entities
Given the number and importance of theat t ributes at tached to the buys relat ionship,it makes sense to t reat the relationship as anent it y in it s own right. To transform arelationship into an entity on an ERD, we use aspecial symbol called an associati ve ent it y. Thenotat ion for an associat ive entit y is a
relationship diamond nested inside of an entityrectangle, as shown in Figure 3.14.
To t ransform your many-to-many relat ionship(without attributes) into an associative entity(wit h at t ributes), do the following:
ExercisesAn int roduct ion to data modeling
http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
14/23
Exercises14 o f 23
An int roduct ion to data modeling
Draw a rectangle around the buysrelationship.
Replace the relationship name buys with
an appropriate noun, for example Sale .
Remember, ent it iesincluding associat iveentit iesare named with nouns.
Decompose the many-to-many relationshipinto two one-to-many relationships.
Add the attributes to the associative entity,as shown in Figure 3.14.
Although Sale is now t reated as an ent ity,there is no requirement to add
relationship diamonds between Product
and Sale or Customer and Sale. Anassociative entity serves as an entity anda relationship at the same time.
The meaning of the Sale associative entity inFigure 3.14 is the following: Each customer canbe involved in many sales t ransactions, but eachindividual sales transaction involves only onecustomer. Similarly, each product can beinvolved in many sales transactions, but eachsales transaction involves only one type ofproduct.
3.3.2.3 Illustration
To better understand how an associative enti tyworks, it is wort hwhile to jump ahead a bit andconsider what the data might look l ike. In
Figure 3.15, sample data for the Customer,Product, and Sale entities are shown. There area couple of interesting things to notice aboutthe sample data:
1. Each ent it y contains informat ion relevant tothat entity only. For example, Customeronly contains informat ion about customers;Product only contains information aboutproducts, and so on.
2. Each row in the Sale ent it y shows thedetails of a single sales transaction. Eachsales transaction consists of a particularproduct being sold t o a part icular customer.
The transaction-specific information (such
FIGURE 3.14: Transform a many-t o-manyrelationship into an associative entity.
Customer ProductSale
DateQuantity ordered
Quantity shipped Actual price
?
!
ExercisesAn int roduct ion to data modeling
http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
15/23
Exercises15 o f 23
An int roduct ion to data modeling
as the actual selling price and quantity
ordered) are at t ributes of t he Sale ent ity.3. By using the data for t he Sale entit y, i t is
possible to determine which products havebeen purchased by a particular customer.Similarly, it is possible to determine whichcustomers have purchased a part icularproduct.
4. Only the minimum amount of informat ion
required to ident if y the customer andproduct is included in the Sale associativeentity. For example, neither the name ofthe customer or t he description of t heproduct appear in Sale since thisinformat ion can easily be found elsewhereusing the values of Customer ID and
Product ID respectively. In the context of
FIGURE 3.15: Data showing t he role of an associat ive ent ity
Each customer can
participate in manysales transactions.
Each product canparticipate in manysales transactions.
Customer Product
Sale
ExercisesAn int roduct ion to data modeling
http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
16/23
Exercises16 o f 23
An int roduct ion to data modeling
the Sale ent ity, the Customer ID andProduct ID attributes are called foreignkeys (foreign keys are so important thatLesson 6 is devoted to the topic).
3.3.3 Revising the ERD
There is an important problem with the ERD asit s now stands. The const raint that each salestransaction involves only a single productappears to be at odds wit h the realit y of t he
business situation described in Lesson 2. Forexample, the sales t ransact ions wit h whichwe are most familiarthe customer orders youreceive by faxtypically request manydif ferent products: a dozen Fat Cat mugs,two dozen spatulas, some wire whisks, and soon. The mismatch between the diagram and the
business environment means that the ERD mustbe revised.
3.3.3.1 Identifying the problem
The problem with the ERD in Figure 3.14 is thatignores the t echnology(broadly speaking) used
by customers to place orders. Specifically,customers normall y wait until they need enoughstock t o make an order worthwhile. In addit ion,factors such as minimum order values andshipping costs favor the batchingof small,single-product orders into large, mult i-productorders.
By taking the technology used for ordering intoaccount, it becomes clear that we have failedto model an important event ent it y: the arrivalof an order.
Be carefulnot all pieces of paper in theexist ing business process areautomatically event ent it ies. Forexample, the invoices that we send to ourcustomers are more properly t hought ofas reports(which can be generated fromthe informat ion already contained inother enti t ies). Wit h pract ice, t hedistinction between entities and non-entities will become clear.
3.3.3.2 Adding the new entity
In this sect ion, you are going to modify your ERDto include an Order entity.
Create a new ERD consist ing of ent it ies forcustomers, orders and products.
Here is where a CASE tool pays off : you
can delete entities or move entitiesaround the screen and the relat ionshipsand connector lines follow automatically.
Create relat ionships to reflect the fact thatcustomers place orders and orders consistof products.
!
?
ExercisesAn int roduct ion to data modeling
http://tables2.pdf/http://scenario.pdf/http://scenario.pdf/http://tables2.pdf/http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
17/23
17 o f 23
g
Add cardinali ty symbols to reflect the factthat each customer can place many orders,but each order belongs to a single customer.
Add cardinali ty symbols to reflect the factthat each order can contain many productsand each product may be contained in manyorders.
Add a handful of attributes to the diagramto help clarify the meaning of the entities.
The result ing ERD is shown in Figure 3.16.
3.3.3.3 Creating OrderDetails
Although Figure 3.16 is a great improvementover our previous ERD, much of the sameinformation missing from Figure 3.13such asactual price and quantity orderedis missingfrom the new ERD. As a consequence, we mustt ransform the contains relationship into anassociative entity with its own attributes.
Transform the contains relationship into
an associative ent it y using the proceduredescribed in Sect ion 3.3.2.2.
To remain consistent with MICROSOFTssample databases, I recommend using thename Order Detail for t he associat iveent it y. Alternat ives include Line Item
and Order Item .
The result ing ERD is shown in Figure 3.17. Tohelp understand the relationship between theOrder entity and the Order Detail associative
ent it y, look at t he orders included in the projectpackage. Each order has header information(such as customer, order dat e, and so on) andmult iple order details. Each detail hasinformat ion such as quant it y ordered, quant it yshipped, and price.
?
FIGURE 3.16: Revise t he ERD to include anOrder entity.
Customer
Product
places Order
contains
ProductID
Unit price
Qty on hand
CustID Name
Contact person
OrderID
Date
DiscussionAn int roduct ion to data modeling
http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
18/23
18 o f 23
g
3.4 Discussion
3.4.1 Logical versus physical models
A distinction is typically made between logicaland physical data models:
Logical data models logical models
capture general informat ion about ent it ies
and relationships and are used forcommunication with business users.
Physical data models physical models
serve as a precise specif icat ion for theimplemented system. As a consequence,the models must take into account thetechnology used to store the data. Forexample, a given logical data modelt ranslates into very dif ferent physical datamodels depending on whether the target
technology is a file-based system, arelat ional database, or an object-orienteddatabase.
Normally, you start with a high-level logicalmodel and refine with the help of users overseveral iterat ions. Once you are happy wit h thelogical model, you transform it into a physicalmodel and hand it to a database administrator(DBA) for implementation as a database.
For relational databases, the translationprocess from logical to physical is relativelyst raightforward and involves the foll owingsteps:
1. Decompose all many-to-manyrelationshipsSince the relationaldatabase model does not support many-to-many relationships, you must replace allmany-to-many relationships withassociative entities as described in
Sect ion 3.3.2.2.
FIGURE 3.17: Create an associat ive ent it y tomodel individual order details.
Customer
Product
places Order
ProductID
Unit price
Qty on hand
CustID Name
Contact person
OrderID
Date
Orderdetail
Actual price
Quantity ordered
Quantity shipped
DiscussionAn int roduct ion to data modeling
http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
19/23
19 o f 23
2. Add at t ributes The data model should be fully at t ributed before handing it over t oa DBA.
3. Identify primary keys Each entityrequires an attribute that uniquelyidentifies instances.
4. Add foreign keys In relational databases,relationships between entities areimplemented using foreign keys.
At this point , i t is not crit ical t hat you
understand each of these steps (you wil l get lotsof practice in subsequent lessons). What isimportant is that you understand that there is aclear progression f rom high-level, graphical,conceptual models to low-level databaseschemas.
3.4.2 ERDs versus the ACCESS relationship
window
If you have used the relationship window inMICROSOFT ACCESS, you know that the relat ionshipdiagrams used by ACCESS resemble ERDs. Theprimary dif ferences between the two are
The relat ionships between tables are notnamed in ACCESS.
Many-to-many relationships are notsupported. That is, the relationshipswindow permits physical data models only.
ACCESSuses the symbols 1 and insteadof 1 and N , as shown in Figure 3.18.
3.4.3 Why do I need to know about data
modeling?
Clearly, you do not needto know t he intr icaciesof data modeling to start using a databasepackage such as MICROSOFT ACCESS. In fact,
MICROSOFT has gone through great lengths
FIGURE 3.18: The relationship window inACCESSuses a notation similar to that used in
ERDs.
The ACCESS relationship window supportsphysical data modeling. One-to-manyrelationships are denoted using 1 and .
DiscussionAn int roduct ion to data modeling
http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
20/23
20 o f 23
between the release of ACCESS version 2.0 andACCESS 2000 to make the product moreaccessible to data modeling neophytes. Forexample, there is a table analyzer, a table
design wizard, and all sorts of other aidsint ended to automate the database designprocess. In my view, there are three problemswith MICROSOFTs dumbing-down st rategy:
1. No wizard or add-in t ool is going to changethe fact that database management systems
(DBMSs) are specialized software packagesthat presuppose an enormous amount ofprior knowledge. Even the error messagesgenerated by ACCESS (as you will soondiscover) can only be understood if youhave a firm grasp on the theoreticalfundamentals of the relational database
model. For example, what do you do whenyou accidentally violate a referent ialintegri t y const raint ? What is a primarykey ? What is an ambiguous outer j oin ?
2. Trends in applicat ion developmentincreasingly emphasize data and de-emphasize programming. Thus, a solid
understanding of your data is critical. Putanother way, j ust about anyone can build areasonably sophisticated system if theunderlying database is well designed(indeed, by doing these tutorials, you willsee just how far you can go wit hout wri t inga single line of programming code). In
contrast, if the database design is poor, youwill have to be a programming wizard j ustto create the illusion that the applicationworks.
3. CASE tools from vendors such as ORACLE,COMPUTER ASSOCIATES, VISIO (now part ofMICROSOFT), and many others t ranslate ERDsdirectly into database tables. Thus, if youknow how to draw diagrams similar to theone shown in Figure 3.17, you know how to
design databases.In short, the last thing you want to do is rely onwizards to shelter you from the database designprocess. Instead, you want to get in at thenitty-gritty level and understand the trade-offsbetween various designs. Once the design iscomplete you can use wizards and shortcuts for
everyt hing else.
3.4.4 How do I learn about data modeling?
One important problem that I perceive as auniversity professor is that despite theimportance of data modeling, i t is very dif f icult
to find good practical training as a datamodeler.
In the standard computer science databasecourse, we tend to focus on theoret ical issuessuch as relational algebra, set theory, indexing,normalization, and so on. Although such
knowledge is certainly important for DBAs and
Appl icat ion to the project21 f 23
An int roduct ion to data modeling
http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
21/23
21 o f 23
other technical professionals, it provides littleguidance when we are faced with real-worldmodeling problems.
Conversely, the introductory informationsystems (IS) courses that we of fer in businessschools provide only cursory treatment of datamodeling and database design. I suppose therationale is that it should be possible to hire acomputer science graduate to do the datamodeling!
3.4.5 CASE tools and the design process
Computer-aided software engineering (CASE)tools are software packages that simplify theprocess of creating conceptual models. Inaddit ion, some CASE packages are more t han
j ust drawing tools: they t ranslate the datamodels into database tables, programming codetemplates, and so on.
To illustrate, consider the diagram inFigure 3.19 which was drawn using the database modeling tool in VISIO ENTERPRISE.The database modeling tool allows a designer t o
start with a a physical ERD-l ike diagram and addimplementation-level metadata (data aboutdata). For example, in Figure 3.20, t hephysical-level propert ies of the ActualPriceat t ribute are specif ied in a dialog box.
Once the physical-level metadata has been
added to the model, many CASE tools can
generate table schemas for t he targetdatabase. For example, ORACLE DESIGNER cangenerate st ructured query language (SQL) datadefinition commands for ORACLE databases. VISIO
ENTERPRISEcan generate SQL or create the tablesin various database packages (including ACCESS)directly.
Given CASE tools with this type of functionality,it is clear t hat the most import ant skil l fordatabase designers is not a complete knowledge
of SQL syntax. Instead, it is the ability toanalyze a real-world problem and create a datamodel that accurately and elegantly capturesthe critical elements of the problem.
3.5 Application to the project
Complete your own ERD for the order entryscenario. Remember that at t his point , t hescope of t he proj ect is very l imit ed. It wil lbe expanded somewhat in subsequentlessons.
HINT: If you get stuck, you can refer to the VISIOdiagram in Figure 3.19. Keep in mind,however, that Figure 3.19 is a physicalERD; you should be working with logicalERDs at this stage of the developmentprocess.
Appl icat ion to the project22 f 23
An int roduct ion to data modeling
http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
22/23
22 o f 23
CustNameBillingAddressCityProvState
PostalCodeRegionCodeContactPersonContactPhoneOnLineOrdering
CustID
Customer
CustIDOrderDateProcessed
OrderID
Order
places
QtyOrderedQtyShippedActualPrice
OrderID
ProductID
OrderDetail
consists of
DescriptionUnitUnitPriceQtyOnHand
ProductID
Product
is a
FIGURE 3.19: An physical database model for the kitchen supply environment created using a CASEtool.
VISIO ENTERPRISE canbe used to drawvarious styles oflogical and physicaldata models. This isan example of aphysical model.
Attributes areshown within thebody of the entityand primary keysare shown in bold.
Crows footnotation is usedindicate thecardinality of therelationships.
Since this is a physical database diagram, thereare no many-to-many relationships or associativeentities. For example, the Order Detail associativeentity is shown as a regular entity.
Appl icat ion to the project23 o f 23
An int roduct ion to data modeling
http://map.pdf/http://map.pdf/ -
7/31/2019 developerdmodel
23/23
23 o f 23
FIGURE 3.20: A CASE tool can be used t o add implementation details to a graphical model.
The implementation details ofeach attribute can be saved withthe model. This informationpermits VISIO to automaticallygenerate the table schema forthe database.
http://map.pdf/