developerdmodel

Upload: taqi-shah

Post on 05-Apr-2018

212 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/31/2019 developerdmodel

    1/23

    Michael Brydon (mj [email protected])Last update: 02-May-01 1 o f 23

    Lesson 3: An introduction to data modeling

    3.1 Introduction: The importance of

    conceptual models

    Before you sit down in front of the keyboardand start creat ing a database applicat ion, i t iscritical that you take a step back and consideryour business problemin this case, the kitchensupply scenario presented in Lesson 2from aconcept ualpoint of view. To facilitate this

    process, a number of conceptual modelingtechniques have been developed by computerscientists, psychologists, and consultants.

    For our purposes, we can think of aconceptual model as a pictureof t heinformat ion system we are going to build.

    To use an analogy, conceptual models areto informat ion systems what blueprintsare to buildings.

    There are many different conceptual modelingtechniques used in practice. Each techniqueuses a dif ferent set of symbols and may focus on

    a dif ferent part of the problem (e.g., data,processes, informat ion flows, objects, and soon). Despit e dif ferences in notation and focus,however, the underlying rationale forconceptual modeling techniques is always the

    same: understand the problem before you startconstructing a solution.

    There are two important things to keep in mindwhen learning about and doing data modeling:

    1. Data modeling is f irst and foremost a toolfor communicat ion.Their i s no single rightmodel. Instead, a valuable model highlightstricky issues, allows users, designers, andimplementors to discuss the issues using thesame vocabulary, and leads to better designdecisions.

    2. The modeling process is inherent lyiterative: you create a model, check itsassumpt ions with users, make the necessarychanges, and repeat the cycle unt il you are

    sure you understand the critical issues.

    In this background lesson, you are going to use adata modeling techniquespecif ically, Entity-Relationship Diagrams (ERDs)to model thebusiness scenario from Lesson 2. The datamodel you create in this lesson will form the

    foundat ion of the database that you usethroughout the remaining lessons.

    ?

    http://scenario.pdf/http://scenario.pdf/http://scenario.pdf/http://scenario.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    2/23

    Introduct ion: The importance of conceptualAn int roduct ion to data modeling

    3.1.1 What is data modeling?

    A data model is a simply a diagram thatdescribes the most important things in your

    business environment from a data-centric pointof view. To illustrate, consider the simple ERDshown in Figure 3.1. The purpose of t he diagramis to describe the relationship between the datastored about products and the data storedabout the organizations that supply theproducts.

    3.1.1.1 Entities and attributes

    The rectangles in Figure 3.1 are called entitytypes (typicall y short ened to ent it ies ) and

    the ovals are called attributes. The ent it ies arethe things in the business environment aboutwhich we want to store data. The attributesprovide us with a means of organizing andstructuring the data. For example, we need tostore certain information about the productsthat we sell, such as the typical selling price of

    the product ( Unit price ) and the quanti ty ofthe product current ly in inventory ( Qty onhand ). These pieces of data are att ributes ofthe Product entity.

    It is important to note t hat the precise mannerin which data are usedand processedwithin a

    particular business application is a separateissue from data modeling. For example, thedata model says nothing about howthe value of Qty on hand is changed over t ime. The focusin data modeling is on capturing data about theenvironment . You wil l learn how to change thisdata (e.g., process orders so that the inventory

    values are updated) once you have masteredthe art of database design.

    A data modeler assumes that if the rightdata is available, the other elements ofthe applicat ion wil l f all into placeeff ort lessly and wonderfully. For now, t his

    is a good working assumption.

    FIGURE 3.1: An ERD showing a relationshipbetween products and suppliers.

    Product

    supplied

    by

    Supplier

    Unit price

    ProductID

    Qty on hand

    Name

    Address

    Entity

    Attributes

    Relationship

    Cardinality

    ?

    http://map.pdf/
  • 7/31/2019 developerdmodel

    3/23

    Introduct ion: The importance of conceptualAn int roduct ion to data modeling

    3.1.1.2 Notation for relationships

    In addit ion to ent it ies and at t ributes, Figure 3.1shows a relationship between the two ent it ies

    using a line and a diamond. The relationshipconst ruct is usednot surprisinglyto indicatethe existence or absence of a relationshipbetween entit ies. A crows foot at either end ofa relationship line is used to denote thecardinality of the relationship.

    For example, the crows foot on the productside of t he relationship in Figure 3.1 indicatesthat a particular supplier may provide yourcompany with several different products, suchas bowls, spatulas, wire whisks and so on. Theabsenceof a crows foot on the supplier sideindicates that each product in your inventory is

    provided by a single supplier. Thus, therelationship in Figure 3.1 indicates that youalways buy all your wire whisks from the samecompany.

    3.1.1.3 Modeling assumptions

    The relationship shown in Figure 3.1 is calledone-to-many: each supplier supplies manyproducts (where many means any numberincluding zero ) but each product is supplied byone supplier (where one means at mostone ).

    The decision to use a one-to-many relationship

    reflects an assumption about the business

    environment in which your wholesale companyoperates. However, it is easy to imagine adif ferent environment in which each product issupplied by multiplesuppliers. For example,many suppliers may carry a part icular brand ofwire whisk. When you run out of whisks, it is upto you to decide where to place your order. Inother words, it is possible that a many-to-manyrelationship exists between suppliers andproducts.

    If mult iple supplier exist , at t ributes of t heproduct, such as it s price and product numbermay vary from supplier to supplier. In thissit uat ion, the data requirements of a many-to-many environment are slight ly more complexthan those of the one-to-many environment. Ifyou design and implement your database around

    the one-to-many assumption but then discoverthat certain goods are supplied by multiplesuppliers, much effort is going to be required tofix the problem.

    Herein lies the point of drawing an ERD:The diagram makes your assumpt ionsabout the relationships wit hin a part icularbusiness environment explicit beforeyoustart building things.

    3.1.1.4 The role of the modeler

    In the environment used in these tutorials, you

    are the user, the designer, and the implementor

    ?

    http://map.pdf/
  • 7/31/2019 developerdmodel

    4/23

    Introduct ion: The importance of conceptualAn int roduct ion to data modeling

    of the system. In a more realistic environment,however, these roles are played by dif ferentindividuals (or groups) wit h dif ferentbackgrounds and priorit ies. For example, acommon stereotype of implementors(programmers, database specialists, and so on)is that they seldom leave their cubicles tocommunicate with end-users of the softwarethey are wri t ing. Similarly, it is generally safe toassume that users have no interest in, or

    understanding of, low-level technical details(such as the cardinality of relat ionships onERDs, mechanisms to enforce referent ialintegrity, and so on). Thus, it is up to thebusiness analyst to bridge the communicationgap between the different groups involved inthe construction, use, and administration of an

    information system.As a business analyst (or more generall y, adesigner), it is critical that you walk throughyour conceptual models with users and makesure that your modeling assumptions areappropriate. In some cases, you may have toexamine sample data from the existingcomputer-based or manual system to determinewhether (for instance) there are any productsthat are supplied by multiple suppliers.

    At the modeling stage, making changes such asconverting a one-to-many relationship to amany-to-many relationship is t ri vialall t hat is

    required is the addit ion of a crows foot to one

    end of the relationship, as shown in Figure 3.2.In contrast , making the same change once youhave implemented tables, built a userinterface, and writ ten code is a t ime-consumingand frustrating chore.

    Generally, you can count on the 10

    ruleof t humb when building software: t hecost of making a change increases by anorder of magnitude for each stage of thesystems development lifecycle that youcomplete.

    FIGURE 3.2: An ERD for an environment inwhich there is a many-to-many relationship

    between products and suppliers.

    Product

    supplied

    by

    Supplier

    Unit price

    ProductID

    Qty on hand

    Name

    Address

    The addition of asecond crowsfoot transformsthe one-to-manyrelationship into amany-to-manyrelationship.

    !

    http://map.pdf/
  • 7/31/2019 developerdmodel

    5/23

    Introduct ion: The importance of conceptualAn int roduct ion to data modeling

    3.1.2 Core modeling constructs and notation

    Data modelers typically adopt a set ofnotational conventions so that their diagrams

    are consistent . For example, large ITorganizations and consultancies typically adopta methodology1a set of tools and proceduresfor applying the tools that specif ies thenotat ion used within the organization. Enforcingstandardizat ion in t his way facili tates teamworkon large projects. Similarly, if a computer-

    aided soft ware engineer ing (CASE) tool is usedfor conceptual modeling and design, notationalconvent ions are oft en enforced by the soft ware.

    What follows is a brief summary of thenotational conventions that I use when drawingERDs. Keep in mind, however, that ERDs are f irst

    and foremost a tool f or communicat ion betweenhumans. As such, the precise notat ion you use isnot part icularly important as long as people canread and understand the diagrams. Withexperience, you wil l come to realize thatdifferences in the shapes of the boxes and lineshave little effect on the core concepts of data

    modeling.

    3.1.2.1 Entities

    Entities are drawn as rectangular boxescontaining a noun in singular form, as shown in

    Figure 3.3.

    You will see later t hat each ent it y you drawult imately becomes a table in your database.You might want to keep this t ransformationfrom entity to table in mind when selecting thenames of your enti t ies. For example, yourentity names should be short but descriptive.

    3.1.2.2 Relationships

    A relationship between entit ies is drawn as aline bisected by a diamond. The diamondcontains a verb (or short verb phrase) thatdescribes the nature of the relationship

    between the entities, as shown in Figure 3.4.

    Named relat ionships are used to make the ERDsmore readable. However, unlike entit y names,relationship names never show up in the finaldatabase. Consequently, it does not reallymat ter how you label your relat ionships, as long1 It can be argued that t he term method is

    grammatically preferable. In Europe, for example,

    the term method tends to be favored.

    Customer

    FIGURE 3.3: An entity named Customer .

    http://map.pdf/
  • 7/31/2019 developerdmodel

    6/23

    Introduct ion: The importance of conceptualAn int roduct ion to data modeling

    as the labels make the diagram easier to

    interpret.To illustrate, consider the relationship betweenproducts and suppliers shown in Figure 3.1. Therelationship is described by the verb phrase supplied by . Alt hough one could have optedfor t he short er relationship name has instead,the result ing diagram (e.g., Supplier has

    product ) would be more dif f icult for readers ofthe diagram to interpret.

    3.1.2.3 Relationship direction

    One issue that sometimes troubles neophytedata modelers is that the directionof t he

    relationship is not made explicit on thediagram. Returning to Figure 3.1, it is obviousto me (since I drew the diagram) that therelationship should be read: Product issuppli ed bysupplier. Reading the relat ionshipin the other direct ion ( Supplier is suppli edbyproduct ) makes very lit t le sense to anyone who

    is familiar with the particular problem domain.

    General ly, ERDs make certain assumpt ionsabout the readers knowledge of the underlyingbusiness domain.

    A notational convention support ed bysome CASE tools is to require two namesfor each relationship: one that makessense in one direction (e.g., is suppliedby ), and another that makes sense in t heopposit e direct ion (e.g., supplies ).

    Although double-naming may make thediagram easier to read, i t also addsclutter (twice as many labels) andimposes an additional burden on themodeler.

    3.1.2.4 Cardinality

    As discussed in Sect ion 3.1.1.2, the cardinalityof a relationship const rains the number ofinstances of one entity type that can beassociated with a single instance of the otherentity type.

    The cardinality of relationships has animportant impact on number andstructure of the tables in the database.Consequently, it is important to get thecardinali ty right on paper before start ingthe implementation.

    buys

    FIGURE 3.4: A relat ionship named buys .

    ?

    !

    http://map.pdf/
  • 7/31/2019 developerdmodel

    7/23

    Introduct ion: The importance of conceptualAn int roduct ion to data modeling

    There are three fundamental types ofcardinality in ERDs:

    One-to-many You have already seen an

    example of a one-to-many relationship inFigure 3.1. You wil l soon discover that one-to-many relationships are the bread andbutter of relational databases.

    One-to-one At this point in your datamodeling career, you should avoid one-to-one relationships. To illustrate the basicissue, consider the ERD shown in Figure 3.5.Based on an exist ing paper-based system,the modeler has assumed that eachcustomer is associated wit h one customerrecord (i.e., a paper form containinginformation about the customer, such as

    address, fax number, and so on). Clearly,each customer has only one customerrecord and each customer record belongs toa single customer. However, if we automatethe system and get rid of the paper form,then there is no reason not to combine theCustomer and Customer Record ent it ies into

    a single entity called Customer.

    In many cases, one-to-one relationshipsindicate a modeling error. When you havea one-to-one relationship such as the oneshown in Figure 3.5, you should combinethe two entities into a single entity.

    Many-to-many The world is full of many-to-many relationships. A well-used exampleis Student takes course. Many-to-manyrelat ionships also arise when you considerthe historyof an ent it y. To il lust rate,consider the ERD shown in Figure 3.6. Atf irst glance, t he relationship betweenFamily and Single-Family Dwelling (SFD)might seem to be one-to-one since aparticular family can only live in one SFD ata t ime and each SFD can (by definit ion) onlycontain a single family. However, it ispossible for a family to live in different

    houses over time. Similarly, it is possiblethat many famil ies inhabit a part icularhouse over the years. Thus, if the conceptof t ime is considered, the relationshipbecomes many-t o-many.

    We will discuss how you go about determiningcardinalit y in subsequent sections. At this point ,

    it is sufficient to recognize that there are two

    !

    FIGURE 3.5: An incorrect one-to-onerelationship

    Customer associatedwith

    CustomerRecord

    http://map.pdf/
  • 7/31/2019 developerdmodel

    8/23

    Introduct ion: The importance of conceptualAn int roduct ion to data modeling

    popular (and equivalent) approaches todenoting one and many on an ERD: thecrows foot notation you have already seen andthe 1:N notat ion.

    1. Crows foot notation In the crows footnotation, three little lines (resembling acrows foot ) are used to indicate many .

    Not surprisingly, the absence of a crowsfoot indicates one . Thus, t he relationshipin Figure 3.7 indicates that each product issupplied by at most one supplier, whereas each supplier may supply many products. .

    2. 1:N notat ion In the 1:N notation, thesymbol N (and/ or M ) is used to indicate many whereas 1 is used to indicate one . An example of the 1:N notation isshown in Figure 3.8.

    The ERD model also supports addit ionalcardinality information in the form ofcardinality constraints. To keep things

    simple, however, the discussion ofcardinali t y const raints is deferred unti lSect ion 6.3.2.

    3.1.2.5 Attributes

    At t ributes are propert ies or characterist ics of a

    particularly entity about which we wish tocollect and store data. In addition, there istypicall y one at t ribute that uniquely identi fiesparticular instances of the entity. For example,each of your customers may have a uniquecustomer ID. Such attributes are known as keyattributes.

    FIGURE 3.6: What is the cardinali t y of thisrelationship?

    Family livesin

    Single-Family

    Dwelling

    Product suppliedby

    Supplier

    FIGURE 3.7: A one-to-many relationship incrows foot notation

    Productsupplied

    by Supplier

    N 1

    FIGURE 3.8: A one-to-many relat ionship in1:N notation

    ?

    http://tables2.pdf/http://tables2.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    9/23

    Learning object ives9 o f 23

    An int roduct ion to data modeling

    For ERDs that are drawn manually, attributesare t radit ionally shown as ovals containing thename of the attribute. If the attribute is a key,it is typically underlined. A number of

    at t ributes for t he Customer entit y are shown inFigure 3.9.

    Adding attributes to ERDs can result invery clut tered diagrams. Some CASE toolslist the at t ributes inside the enti tyrectangle (see the discussion of CASEtools in Sect ion 3.4.5). Another way to

    reduce clut ter is to only show a handful ofcritical attributes on the diagram.

    3.2 Learning objectives

    ! understand the core constructs of theent it y-relat ionship model

    ! create an ERD based on yourunderstanding of a business scenario

    ! use associative entities to add

    attributes to relationships! gain some famil iarity wit h the role of

    data modeling and CASE tools in t hedevelopment process

    3.3 Exercises

    In the sections that follow, we step through theconstruction of an ERD for the kitchen supplyscenario. By following along, you should gain abetter understanding of the basic techniquesinvolved in data modeling as well as some of thedesign pit falls that should be avoided.

    If this lesson was on how to play golf, youwould not read it and then assume thatyou are a good golfer. Golf is a skill thatrequires both theoretical knowledge andhours of practice. Thus, the only way tobecome a good golfer is to acquire a solidunderstanding of the fundamentals and

    then go out and hit thousands of balls.The same principles apply to datamodeling.

    3.3.1 Starting simple

    Let us begin with the simplest and most

    essential statement one can make about the

    Customer

    CustID

    Name Phone No.

    Contact person

    FIGURE 3.9: A number of attributes of theCustomer entity are shown in ovals.

    ?!

    E iA i d i d d li

    http://map.pdf/
  • 7/31/2019 developerdmodel

    10/23

    Exercises10 o f 23

    An int roduct ion to data modeling

    wholesaling environment: cust omers buyproducts. It is natural that you would want toboth automate and informate (recallSect ion 2.4) t his important business process.

    3.3.1.1 Step one: identify the entities

    Ent it ies are physical things, organizations, rolesand events about which we want to storeinformation. In the wholesaling scenario, twoentit ies are immediately obvious: Customer and

    Product. However, before we add the entit ies toour diagram, it is important that we have a f irmunderstanding of what exact lythese entitiescorrespond to in t he real world:

    1. Customers In the wholesalingenvironment , a customer is an organization,not a person. There may be a single personat the organization through whom weconduct our business. We will refer to thisperson as the contact person for thecustomer in order to maintain a cleandistinction between people andorganizations.

    2. Products It is not immediately clearwhether t he Product ent it y refers to aspecif ic it em or a class of similar items. Forexample, one of the products you sell is the Fat Cat mug. The Product ID of t he mug is 88 4017 and it normally sells for $5.50.Note, however, that there are many

    individual Fat Cat mugs and each one is

    slightly dif ferent due to irregularit ies,variations in paint ing, and so on. In ourcase, there are advantages to ignoring theindividualit y of each mug and treating them

    all as a single group of interchangeableit ems. Thus, when we talk about aproduct or a SKU , we are talking aboutan ent ire classof similar instances, notindividual instances themselves.1

    Having made these assumptions explicit ly, we

    can now create our first ERD.

    Take out a piece of paper and a pencil.

    Unless you have a special-purpose CASEtool, it is seldom worth the effort to drawthe early drafts of your conceptual

    models on a computer.

    ERDs typicall y require many modif icat ionsso you should not invest much timemaking your diagrams look nice. In fact ,the diagram you are about to begin will

    1 In some environments, it may be necessary to treat

    products as individual items. For example, in the

    aerospace industry, there is a requirement to track

    individual parts by serial number in case a part fails.

    The requirement for unit eff ectivit y necessit ates a

    dif ferent set of assumptions about t he Product ent it y

    and thus leads to a different database design.

    ?

    !

    E iA i t d t i t d t d li

    http://scenario.pdf/http://scenario.pdf/http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    11/23

    Exercises11 o f 23

    An int roduct ion to data modeling

    end up the in recycle bin by the end ofthe lesson.

    Add the Customer and Product entities to

    your diagram, as shown in Figure 3.10.

    3.3.1.2 Step two: specify a relationship between the

    entities

    We know that customers buy products and thatproducts are bought by customers. It is a simplemat ter t o create a relationship to communicatethis fact.

    Add a relationship line between theCustomer and Product entities.

    Label t he relat ionship buys , as shown inFigure 3.11.

    Unlike flow charts, the arrangement ofboxes and the direction of lines in an ERDhave no signif icanceany arrangement

    that f its on the page is valid. Similarly,

    the relationship line does not denote any

    type of sequence or f low of informat ion.

    3.3.1.3 Step three: determine the cardinality of the

    relationship

    Each customer can buy many productsindeed,that is the whole purpose of being in this l ine ofbusiness. To show this possibil it y on our ERD, weadd a crows foot to the Product side of therelat ionship l ine.

    Similarly, each product can be purchased bymany customers. For example, a number of ourcustomers may chose to stock the Fat Catmug (keeping in mind that the product refers to

    a style of mug, not an individual mug). As aresult, a crows foot is added to the Customerside of the relationship.

    Designate the buys relationship as many-to-many using the crows foot notation, asshown in Figure 3.12.

    FIGURE 3.10: Add the first two entities toyour ERD.

    Customer Product

    ?

    FIGURE 3.11: Add a relationship betweenthe two enti t ies.

    Customer Productbuys

    ExercisesAn int roduct ion to data modeling

    http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    12/23

    Exercises12 o f 23

    An int roduct ion to data modeling

    3.3.1.4 Step four: identify a few important attributes

    There is a need to f ind a balance between thedescript iveness of the ERD and the ease withwhich others can decipher it . As such, I preferto show only a handful of attributes on the

    ERDs.

    Add a small number of important att ributesto your diagram, as shown in Figure 3.13.

    By adding at t ributes such as Qty on hand tothe Product ent it y, it is clear t hat the ent it y

    refers to a class of products, not an individual

    product. In cont rast , if Serial number wereadded to the diagram as an at t ribute instead of Qty on hand , the reader of the ERD wouldcome to a different conclusion about themeaning of the Product enti ty.

    3.3.2 Dealing with many-to-many

    relationships

    Although the diagram in Figure 3.13 istechnically correct, it is missing a great deal ofinformat ion about how a purchase t ransact ion

    occurs in reali ty.

    To il lust rate, consider the at t ribute Unit pricebelonging to the Product ent it y. Unit pricecontains the defaultsell ing price of theproduct. To understand why it is the defaultprice, consider the case of a Fat Cat mug that

    typically sells for $5.50. What if there is a

    FIGURE 3.12: Indicat e t he many-to-manycardinality of the relationship.

    Customer Productbuys

    Add a crows foot to the customerside to indicate that each productis purchased by many customers.

    1111

    Add a second crows foot to theproduct side to indicate that eachcustomer purchases many products.

    2222

    FIGURE 3.13: An init ial ERD for t he kit chensupply environment.

    Customer Productbuys

    ProductID

    Unit price Qty on hand

    CustID

    Name Contact person

    ExercisesAn int roduct ion to data modeling

    http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    13/23

    Exercises13 o f 23

    An int roduct ion to data modeling

    part icular mug that has a minor f law? Althoughthe mug can still be sold, the customer mayexpect a discounted price to compensate forthe flaw. The question is therefore: Where on

    the diagram do we indicate the act ualsellingprice of a part icular mug?

    A second example is the purchase quantity.What if the customer purchases a dozen FatCat mugs? Where is this informat ion recorded?The Qty ordered at t ribute does not belong to

    the Customer entity because customers ordermany products besides mugs. Similarly, Qtyordered does not belong to the Product ent it ybecause different customers may orderdif ferent quanti t ies.

    This is a problem t hat t ypically arises in many-

    to-many relationships: There are certainimportant at t ributes that do not seem to belongto eit her of t he ent it ies part icipat ing in therelationship. The solut ion is to assign theat t ributes to the relationship it self .

    3.3.2.1 Attributes of relationships

    The issues surrounding price and order quantityarise because the attributes belong to theinteractionof t he ent it ies in t he many-to-manyrelationship. Thus, the price of a flawed mug isan at t ribute of a part icular product being soldto a particular customer on a particular day.

    To summarize, there are a number of att ributesthat should be att ached to the buysrelat ionship in Figure 3.13:

    Date the date on which the purchase ismade;

    Actual pr ice the price at which the it em(or multiple items within the same class ofproducts) are actually sold to the customer;

    Quant it y ordered the number of items

    with a certain product ID requested by thecustomer; and,

    Quanti ty shipped the actual number ofitems shipped to the customer.

    3.3.2.2 Associative entities

    Given the number and importance of theat t ributes at tached to the buys relat ionship,it makes sense to t reat the relationship as anent it y in it s own right. To transform arelationship into an entity on an ERD, we use aspecial symbol called an associati ve ent it y. Thenotat ion for an associat ive entit y is a

    relationship diamond nested inside of an entityrectangle, as shown in Figure 3.14.

    To t ransform your many-to-many relat ionship(without attributes) into an associative entity(wit h at t ributes), do the following:

    ExercisesAn int roduct ion to data modeling

    http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    14/23

    Exercises14 o f 23

    An int roduct ion to data modeling

    Draw a rectangle around the buysrelationship.

    Replace the relationship name buys with

    an appropriate noun, for example Sale .

    Remember, ent it iesincluding associat iveentit iesare named with nouns.

    Decompose the many-to-many relationshipinto two one-to-many relationships.

    Add the attributes to the associative entity,as shown in Figure 3.14.

    Although Sale is now t reated as an ent ity,there is no requirement to add

    relationship diamonds between Product

    and Sale or Customer and Sale. Anassociative entity serves as an entity anda relationship at the same time.

    The meaning of the Sale associative entity inFigure 3.14 is the following: Each customer canbe involved in many sales t ransactions, but eachindividual sales transaction involves only onecustomer. Similarly, each product can beinvolved in many sales transactions, but eachsales transaction involves only one type ofproduct.

    3.3.2.3 Illustration

    To better understand how an associative enti tyworks, it is wort hwhile to jump ahead a bit andconsider what the data might look l ike. In

    Figure 3.15, sample data for the Customer,Product, and Sale entities are shown. There area couple of interesting things to notice aboutthe sample data:

    1. Each ent it y contains informat ion relevant tothat entity only. For example, Customeronly contains informat ion about customers;Product only contains information aboutproducts, and so on.

    2. Each row in the Sale ent it y shows thedetails of a single sales transaction. Eachsales transaction consists of a particularproduct being sold t o a part icular customer.

    The transaction-specific information (such

    FIGURE 3.14: Transform a many-t o-manyrelationship into an associative entity.

    Customer ProductSale

    DateQuantity ordered

    Quantity shipped Actual price

    ?

    !

    ExercisesAn int roduct ion to data modeling

    http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    15/23

    Exercises15 o f 23

    An int roduct ion to data modeling

    as the actual selling price and quantity

    ordered) are at t ributes of t he Sale ent ity.3. By using the data for t he Sale entit y, i t is

    possible to determine which products havebeen purchased by a particular customer.Similarly, it is possible to determine whichcustomers have purchased a part icularproduct.

    4. Only the minimum amount of informat ion

    required to ident if y the customer andproduct is included in the Sale associativeentity. For example, neither the name ofthe customer or t he description of t heproduct appear in Sale since thisinformat ion can easily be found elsewhereusing the values of Customer ID and

    Product ID respectively. In the context of

    FIGURE 3.15: Data showing t he role of an associat ive ent ity

    Each customer can

    participate in manysales transactions.

    Each product canparticipate in manysales transactions.

    Customer Product

    Sale

    ExercisesAn int roduct ion to data modeling

    http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    16/23

    Exercises16 o f 23

    An int roduct ion to data modeling

    the Sale ent ity, the Customer ID andProduct ID attributes are called foreignkeys (foreign keys are so important thatLesson 6 is devoted to the topic).

    3.3.3 Revising the ERD

    There is an important problem with the ERD asit s now stands. The const raint that each salestransaction involves only a single productappears to be at odds wit h the realit y of t he

    business situation described in Lesson 2. Forexample, the sales t ransact ions wit h whichwe are most familiarthe customer orders youreceive by faxtypically request manydif ferent products: a dozen Fat Cat mugs,two dozen spatulas, some wire whisks, and soon. The mismatch between the diagram and the

    business environment means that the ERD mustbe revised.

    3.3.3.1 Identifying the problem

    The problem with the ERD in Figure 3.14 is thatignores the t echnology(broadly speaking) used

    by customers to place orders. Specifically,customers normall y wait until they need enoughstock t o make an order worthwhile. In addit ion,factors such as minimum order values andshipping costs favor the batchingof small,single-product orders into large, mult i-productorders.

    By taking the technology used for ordering intoaccount, it becomes clear that we have failedto model an important event ent it y: the arrivalof an order.

    Be carefulnot all pieces of paper in theexist ing business process areautomatically event ent it ies. Forexample, the invoices that we send to ourcustomers are more properly t hought ofas reports(which can be generated fromthe informat ion already contained inother enti t ies). Wit h pract ice, t hedistinction between entities and non-entities will become clear.

    3.3.3.2 Adding the new entity

    In this sect ion, you are going to modify your ERDto include an Order entity.

    Create a new ERD consist ing of ent it ies forcustomers, orders and products.

    Here is where a CASE tool pays off : you

    can delete entities or move entitiesaround the screen and the relat ionshipsand connector lines follow automatically.

    Create relat ionships to reflect the fact thatcustomers place orders and orders consistof products.

    !

    ?

    ExercisesAn int roduct ion to data modeling

    http://tables2.pdf/http://scenario.pdf/http://scenario.pdf/http://tables2.pdf/http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    17/23

    17 o f 23

    g

    Add cardinali ty symbols to reflect the factthat each customer can place many orders,but each order belongs to a single customer.

    Add cardinali ty symbols to reflect the factthat each order can contain many productsand each product may be contained in manyorders.

    Add a handful of attributes to the diagramto help clarify the meaning of the entities.

    The result ing ERD is shown in Figure 3.16.

    3.3.3.3 Creating OrderDetails

    Although Figure 3.16 is a great improvementover our previous ERD, much of the sameinformation missing from Figure 3.13such asactual price and quantity orderedis missingfrom the new ERD. As a consequence, we mustt ransform the contains relationship into anassociative entity with its own attributes.

    Transform the contains relationship into

    an associative ent it y using the proceduredescribed in Sect ion 3.3.2.2.

    To remain consistent with MICROSOFTssample databases, I recommend using thename Order Detail for t he associat iveent it y. Alternat ives include Line Item

    and Order Item .

    The result ing ERD is shown in Figure 3.17. Tohelp understand the relationship between theOrder entity and the Order Detail associative

    ent it y, look at t he orders included in the projectpackage. Each order has header information(such as customer, order dat e, and so on) andmult iple order details. Each detail hasinformat ion such as quant it y ordered, quant it yshipped, and price.

    ?

    FIGURE 3.16: Revise t he ERD to include anOrder entity.

    Customer

    Product

    places Order

    contains

    ProductID

    Unit price

    Qty on hand

    CustID Name

    Contact person

    OrderID

    Date

    DiscussionAn int roduct ion to data modeling

    http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    18/23

    18 o f 23

    g

    3.4 Discussion

    3.4.1 Logical versus physical models

    A distinction is typically made between logicaland physical data models:

    Logical data models logical models

    capture general informat ion about ent it ies

    and relationships and are used forcommunication with business users.

    Physical data models physical models

    serve as a precise specif icat ion for theimplemented system. As a consequence,the models must take into account thetechnology used to store the data. Forexample, a given logical data modelt ranslates into very dif ferent physical datamodels depending on whether the target

    technology is a file-based system, arelat ional database, or an object-orienteddatabase.

    Normally, you start with a high-level logicalmodel and refine with the help of users overseveral iterat ions. Once you are happy wit h thelogical model, you transform it into a physicalmodel and hand it to a database administrator(DBA) for implementation as a database.

    For relational databases, the translationprocess from logical to physical is relativelyst raightforward and involves the foll owingsteps:

    1. Decompose all many-to-manyrelationshipsSince the relationaldatabase model does not support many-to-many relationships, you must replace allmany-to-many relationships withassociative entities as described in

    Sect ion 3.3.2.2.

    FIGURE 3.17: Create an associat ive ent it y tomodel individual order details.

    Customer

    Product

    places Order

    ProductID

    Unit price

    Qty on hand

    CustID Name

    Contact person

    OrderID

    Date

    Orderdetail

    Actual price

    Quantity ordered

    Quantity shipped

    DiscussionAn int roduct ion to data modeling

    http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    19/23

    19 o f 23

    2. Add at t ributes The data model should be fully at t ributed before handing it over t oa DBA.

    3. Identify primary keys Each entityrequires an attribute that uniquelyidentifies instances.

    4. Add foreign keys In relational databases,relationships between entities areimplemented using foreign keys.

    At this point , i t is not crit ical t hat you

    understand each of these steps (you wil l get lotsof practice in subsequent lessons). What isimportant is that you understand that there is aclear progression f rom high-level, graphical,conceptual models to low-level databaseschemas.

    3.4.2 ERDs versus the ACCESS relationship

    window

    If you have used the relationship window inMICROSOFT ACCESS, you know that the relat ionshipdiagrams used by ACCESS resemble ERDs. Theprimary dif ferences between the two are

    The relat ionships between tables are notnamed in ACCESS.

    Many-to-many relationships are notsupported. That is, the relationshipswindow permits physical data models only.

    ACCESSuses the symbols 1 and insteadof 1 and N , as shown in Figure 3.18.

    3.4.3 Why do I need to know about data

    modeling?

    Clearly, you do not needto know t he intr icaciesof data modeling to start using a databasepackage such as MICROSOFT ACCESS. In fact,

    MICROSOFT has gone through great lengths

    FIGURE 3.18: The relationship window inACCESSuses a notation similar to that used in

    ERDs.

    The ACCESS relationship window supportsphysical data modeling. One-to-manyrelationships are denoted using 1 and .

    DiscussionAn int roduct ion to data modeling

    http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    20/23

    20 o f 23

    between the release of ACCESS version 2.0 andACCESS 2000 to make the product moreaccessible to data modeling neophytes. Forexample, there is a table analyzer, a table

    design wizard, and all sorts of other aidsint ended to automate the database designprocess. In my view, there are three problemswith MICROSOFTs dumbing-down st rategy:

    1. No wizard or add-in t ool is going to changethe fact that database management systems

    (DBMSs) are specialized software packagesthat presuppose an enormous amount ofprior knowledge. Even the error messagesgenerated by ACCESS (as you will soondiscover) can only be understood if youhave a firm grasp on the theoreticalfundamentals of the relational database

    model. For example, what do you do whenyou accidentally violate a referent ialintegri t y const raint ? What is a primarykey ? What is an ambiguous outer j oin ?

    2. Trends in applicat ion developmentincreasingly emphasize data and de-emphasize programming. Thus, a solid

    understanding of your data is critical. Putanother way, j ust about anyone can build areasonably sophisticated system if theunderlying database is well designed(indeed, by doing these tutorials, you willsee just how far you can go wit hout wri t inga single line of programming code). In

    contrast, if the database design is poor, youwill have to be a programming wizard j ustto create the illusion that the applicationworks.

    3. CASE tools from vendors such as ORACLE,COMPUTER ASSOCIATES, VISIO (now part ofMICROSOFT), and many others t ranslate ERDsdirectly into database tables. Thus, if youknow how to draw diagrams similar to theone shown in Figure 3.17, you know how to

    design databases.In short, the last thing you want to do is rely onwizards to shelter you from the database designprocess. Instead, you want to get in at thenitty-gritty level and understand the trade-offsbetween various designs. Once the design iscomplete you can use wizards and shortcuts for

    everyt hing else.

    3.4.4 How do I learn about data modeling?

    One important problem that I perceive as auniversity professor is that despite theimportance of data modeling, i t is very dif f icult

    to find good practical training as a datamodeler.

    In the standard computer science databasecourse, we tend to focus on theoret ical issuessuch as relational algebra, set theory, indexing,normalization, and so on. Although such

    knowledge is certainly important for DBAs and

    Appl icat ion to the project21 f 23

    An int roduct ion to data modeling

    http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    21/23

    21 o f 23

    other technical professionals, it provides littleguidance when we are faced with real-worldmodeling problems.

    Conversely, the introductory informationsystems (IS) courses that we of fer in businessschools provide only cursory treatment of datamodeling and database design. I suppose therationale is that it should be possible to hire acomputer science graduate to do the datamodeling!

    3.4.5 CASE tools and the design process

    Computer-aided software engineering (CASE)tools are software packages that simplify theprocess of creating conceptual models. Inaddit ion, some CASE packages are more t han

    j ust drawing tools: they t ranslate the datamodels into database tables, programming codetemplates, and so on.

    To illustrate, consider the diagram inFigure 3.19 which was drawn using the database modeling tool in VISIO ENTERPRISE.The database modeling tool allows a designer t o

    start with a a physical ERD-l ike diagram and addimplementation-level metadata (data aboutdata). For example, in Figure 3.20, t hephysical-level propert ies of the ActualPriceat t ribute are specif ied in a dialog box.

    Once the physical-level metadata has been

    added to the model, many CASE tools can

    generate table schemas for t he targetdatabase. For example, ORACLE DESIGNER cangenerate st ructured query language (SQL) datadefinition commands for ORACLE databases. VISIO

    ENTERPRISEcan generate SQL or create the tablesin various database packages (including ACCESS)directly.

    Given CASE tools with this type of functionality,it is clear t hat the most import ant skil l fordatabase designers is not a complete knowledge

    of SQL syntax. Instead, it is the ability toanalyze a real-world problem and create a datamodel that accurately and elegantly capturesthe critical elements of the problem.

    3.5 Application to the project

    Complete your own ERD for the order entryscenario. Remember that at t his point , t hescope of t he proj ect is very l imit ed. It wil lbe expanded somewhat in subsequentlessons.

    HINT: If you get stuck, you can refer to the VISIOdiagram in Figure 3.19. Keep in mind,however, that Figure 3.19 is a physicalERD; you should be working with logicalERDs at this stage of the developmentprocess.

    Appl icat ion to the project22 f 23

    An int roduct ion to data modeling

    http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    22/23

    22 o f 23

    CustNameBillingAddressCityProvState

    PostalCodeRegionCodeContactPersonContactPhoneOnLineOrdering

    CustID

    Customer

    CustIDOrderDateProcessed

    OrderID

    Order

    places

    QtyOrderedQtyShippedActualPrice

    OrderID

    ProductID

    OrderDetail

    consists of

    DescriptionUnitUnitPriceQtyOnHand

    ProductID

    Product

    is a

    FIGURE 3.19: An physical database model for the kitchen supply environment created using a CASEtool.

    VISIO ENTERPRISE canbe used to drawvarious styles oflogical and physicaldata models. This isan example of aphysical model.

    Attributes areshown within thebody of the entityand primary keysare shown in bold.

    Crows footnotation is usedindicate thecardinality of therelationships.

    Since this is a physical database diagram, thereare no many-to-many relationships or associativeentities. For example, the Order Detail associativeentity is shown as a regular entity.

    Appl icat ion to the project23 o f 23

    An int roduct ion to data modeling

    http://map.pdf/http://map.pdf/
  • 7/31/2019 developerdmodel

    23/23

    23 o f 23

    FIGURE 3.20: A CASE tool can be used t o add implementation details to a graphical model.

    The implementation details ofeach attribute can be saved withthe model. This informationpermits VISIO to automaticallygenerate the table schema forthe database.

    http://map.pdf/