
    Chapter I

    Conceptual Modeling Solutions for the Data Warehouse

    Stefano Rizzi
    DEIS-University of Bologna, Italy

    Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.

    Abstract

    In the context of data warehouse design, a basic role is played by conceptual modeling, which provides a higher level of abstraction in describing the warehousing process and architecture in all its aspects, aimed at achieving independence of implementation issues. This chapter focuses on a conceptual model called the DFM that suits the variety of modeling situations that may be encountered in real projects of small to large complexity. The aim of the chapter is to propose a comprehensive set of solutions for conceptual modeling according to the DFM and to give the designer a practical guide for applying them in the context of a design methodology. Besides the basic concepts of multidimensional modeling, the other issues discussed are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.


    Introduction

    Operational databases are focused on recording transactions, thus they are prevalently characterized by an OLTP (online transaction processing) workload. Conversely, data warehouses (DWs) allow complex analysis of data aimed at decision support; the workload they support has completely different characteristics, and is widely known as OLAP (online analytical processing). Traditionally, OLAP applications are based on multidimensional modeling that intuitively represents data under the metaphor of a cube whose cells correspond to events that occurred in the business domain (Figure 1). Each event is quantified by a set of measures; each edge of the cube corresponds to a relevant dimension for analysis, typically associated with a hierarchy of attributes that further describe it. The multidimensional model has a twofold benefit. On the one hand, it is close to the way of thinking of data analyzers, who are used to the spreadsheet metaphor; therefore it helps users understand data. On the other hand, it supports performance improvement as its simple structure allows designers to predict the user intentions.

    Figure 1. The cube metaphor for multidimensional modeling

    Multidimensional modeling and OLAP workloads require specialized design techniques. In the context of design, a basic role is played by conceptual modeling that provides a higher level of abstraction in describing the warehousing process and architecture in all its aspects, aimed at achieving independence of implementation issues. Conceptual modeling is widely recognized to be the necessary foundation for building a database that is well-documented and fully satisfies the user requirements; usually, it relies on a graphical notation that facilitates writing, understanding, and managing conceptual schemata by both designers and users.

    Unfortunately, in the field of data warehousing there still is no consensus about a formalism for conceptual modeling (Sen & Sinha, 2005). The entity/relationship (E/R) model is widespread in the enterprises as a conceptual formalism to provide standard documentation for relational information systems, and a great deal of effort has been made to use E/R schemata as the input for designing nonrelational databases as well (Fahrner & Vossen, 1995); nevertheless, as E/R is oriented to support queries that navigate associations between data rather than synthesize them, it is not well suited for data warehousing (Kimball, 1996). Actually, the E/R model has enough expressivity to represent most concepts necessary for modeling a DW; on the other hand, in its basic form, it is not able to properly emphasize the key aspects of the multidimensional model, so that its usage for DWs is expensive from the point of view of the graphical notation and not intuitive (Golfarelli, Maio, & Rizzi, 1998).

    Some designers claim to use star schemata for conceptual modeling. A star schema is the standard implementation of the multidimensional model on relational platforms; it is just a (denormalized) relational schema, so it merely defines a set of relations and integrity constraints. Using the star schema for conceptual modeling is like starting to build a complex piece of software by writing the code, without the support of any static, functional, or dynamic model, which typically leads to very poor results from the points of view of adherence to user requirements, of maintenance, and of reuse.

    For all these reasons, in the last few years the research literature has proposed several original approaches for modeling a DW, some based on extensions of E/R, some on extensions of UML. This chapter focuses on an ad hoc conceptual model, the dimensional fact model (DFM), that was first proposed in Golfarelli et al. (1998) and continuously enriched and refined during the following years in order to optimally suit the variety of modeling situations that may be encountered in real projects of small to large complexity. The aim of the chapter is to propose a comprehensive set of solutions for conceptual modeling according to the DFM and to give a practical guide for applying them in the context of a design methodology. Besides the basic concepts of multidimensional modeling, namely facts, dimensions, measures, and hierarchies, the other issues discussed are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.

    After reviewing the related literature in the next section, in the third and fourth sections, we introduce the constructs of DFM for basic and advanced modeling, respectively. Then, in the fifth section we briefly discuss the different methodological approaches to conceptual design. Finally, in the sixth section we outline the open issues in conceptual modeling, and in the last section we draw the conclusions.


    Related Literature

    In the context of data warehousing, the literature proposed several approaches to multidimensional modeling. Some of them have no graphical support and are aimed at establishing a formal foundation for representing cubes and hierarchies as well as an algebra for querying them (Agrawal, Gupta, & Sarawagi, 1995; Cabibbo & Torlone, 1998; Datta & Thomas, 1997; Franconi & Kamble, 2004a; Gyssens & Lakshmanan, 1997; Li & Wang, 1996; Pedersen & Jensen, 1999; Vassiliadis, 1998); since we believe that a distinguishing feature of conceptual models is that of providing a graphical support to be easily understood by both designers and users when discussing and validating requirements, we will not discuss them.

    The approaches to "strict" conceptual modeling for DWs devised so far are summarized in Table 1. For each model, the table shows if it is associated to some method for conceptual design and if it is based on E/R, is object-oriented, or is an ad hoc model.

                 E/R extension                  object-oriented                   ad hoc
    no method    Franconi and Kamble (2004b);   Abelló et al. (2002);             Tsois et al. (2001)
                 Sapia et al. (1998);           Nguyen, Tjoa, and Wagner (2000)
                 Tryfona et al. (1999)
    method       -                              Luján-Mora et al. (2002)          Golfarelli et al. (1998);
                                                                                  Hüsemann et al. (2000)

    Table 1. Approaches to conceptual modeling

    The discussion about whether E/R-based, object-oriented, or ad hoc models are preferable is controversial. Some claim that E/R extensions should be adopted since (1) E/R has been tested for years; (2) designers are familiar with E/R; (3) E/R has proven flexible and powerful enough to adapt to a variety of application domains; and (4) several important research results were obtained for the E/R (Sapia, Blaschka, Höfling, & Dinter, 1998; Tryfona, Busborg, & Borch Christiansen, 1999). On the other hand, advocates of object-oriented models argue that (1) they are more expressive and better represent static and dynamic properties of information systems; (2) they provide powerful mechanisms for expressing requirements and constraints; (3) object-orientation is currently the dominant trend in data modeling; and (4) UML, in particular, is a standard and is naturally extensible (Abelló, Samos, & Saltor, 2002; Luján-Mora, Trujillo, & Song, 2002). Finally, we believe that ad hoc models compensate for the lack of familiarity from designers with the fact that (1) they achieve better notational economy; (2) they give proper emphasis to the peculiarities of the multidimensional model, thus (3) they are more intuitive and readable by nonexpert users. In particular, they can model some constraints related to functional dependencies (e.g., convergences and cross-dimensional attributes) in a simpler way than UML, which requires the use of formal expressions written, for instance, in OCL.

    A comparison of the different models done by Tsois, Karayannidis, and Sellis (2001) pointed out that, abstracting from their graphical form, the core expressivity is similar. In confirmation of this, we show in Figure 2 how the same simple fact could be modeled through an E/R based, an object-oriented, and an ad hoc approach.

    Figure 2. The SALE fact modeled through a starER (Sapia et al., 1998), a UML class diagram (Luján-Mora et al., 2002), and a fact schema (Hüsemann, Lechtenbörger, & Vossen, 2000)


    The Dimensional Fact Model: Basic Modeling

    In this chapter we focus on an ad hoc model called the dimensional fact model. The DFM is a graphical conceptual model, specifically devised for multidimensional modeling, aimed at:

    • Effectively supporting conceptual design
    • Providing an environment on which user queries can be intuitively expressed
    • Supporting the dialogue between the designer and the end users to refine the specification of requirements
    • Creating a stable platform to ground logical design
    • Providing an expressive and non-ambiguous design documentation

    The representation of reality built using the DFM consists of a set of fact schemata. The basic concepts modeled are facts, measures, dimensions, and hierarchies. In the following we intuitively define these concepts, referring the reader to Figure 3 that depicts a simple fact schema for modeling invoices at line granularity; a formal definition of the same concepts can be found in Golfarelli et al. (1998).

    Definition 1: A fact is a focus of interest for the decision-making process; typically, it models a set of events occurring in the enterprise world. A fact is graphically represented by a box with two sections, one for the fact name and one for the measures.

    Figure 3. A basic fact schema for the INVOICE LINE fact

    Examples of facts in the trade domain are sales, shipments, purchases, claims; in the financial domain: stock exchange transactions, contracts for insurance policies, granting of loans, bank statements, credit card purchases. It is essential for a fact to have some dynamic aspects, that is, to evolve somehow across time.

    Guideline 1: The concepts represented in the data source by frequently-updated archives are good candidates for facts; those represented by almost-static archives are not.

    As a matter of fact, very few things are completely static; even the relationship between cities and regions might change, if some border were revised. Thus, the choice of facts should be based either on the average periodicity of changes, or on the specific interests of analysis. For instance, assigning a new sales manager to a sales department occurs less frequently than coupling a promotion to a product; thus, while the relationship between promotions and products is a good candidate to be modeled as a fact, that between sales managers and departments is not, except for the personnel manager, who is interested in analyzing the turnover!

    Definition 2: A measure is a numerical property of a fact, and describes one of its quantitative aspects of interest for analysis. Measures are included in the bottom section of the fact.

    For instance, each invoice line is measured by the number of units sold, the price per unit, the net amount, and so forth. The reason why measures should be numerical is that they are used for computations. A fact may also have no measures, if the only interesting thing to be recorded is the occurrence of events; in this case the fact scheme is said to be empty and is typically queried to count the events that occurred.

    Definition 3: A dimension is a fact property with a finite domain and describes one of its analysis coordinates. The set of dimensions of a fact determines its finest representation granularity. Graphically, dimensions are represented as circles attached to the fact by straight lines.

    Typical dimensions for the invoice fact are product, customer, agent, and date.


    Guideline 2: At least one of the dimensions of the fact should represent time, at any granularity.

    The relationship between measures and dimensions is expressed, at the instance level, by the concept of event.

    Definition 4: A primary event is an occurrence of a fact, and is identified by a tuple of values, one for each dimension. Each primary event is described by one value for each measure.

    Primary events are the elemental information which can be represented (in the cube metaphor, they correspond to the cube cells). In the invoice example they model the invoicing of one product to one customer made by one agent on one day; it is not possible to distinguish between invoices possibly made with different types (e.g., active, passive, returned, etc.) or in different hours of the day.
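
    Definitions 1 to 4 can be made concrete with a small sketch. The Python below is only an illustration, not part of the DFM: the class FactSchema, the function add_primary_event, the measure identifiers, and the sample values are all assumed names and data. It stores the cube as a dictionary whose keys are tuples with one value per dimension and whose values hold one value per measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactSchema:
    """Illustrative container: a fact name, its dimensions, its measures."""
    name: str
    dimensions: tuple
    measures: tuple


# Dimensions follow the text; the measure identifiers are assumed names.
INVOICE_LINE = FactSchema(
    name="INVOICE LINE",
    dimensions=("product", "customer", "agent", "date"),
    measures=("quantity", "unit_price", "net_amount"),
)


def add_primary_event(cube, schema, coordinates, values):
    """Store one primary event: one value per dimension identifies it,
    one value per measure describes it."""
    if set(coordinates) != set(schema.dimensions):
        raise ValueError("exactly one value per dimension is required")
    if set(values) != set(schema.measures):
        raise ValueError("exactly one value per measure is required")
    key = tuple(coordinates[d] for d in schema.dimensions)
    if key in cube:
        raise KeyError(f"a primary event already exists for {key}")
    cube[key] = dict(values)
    return key


# Hypothetical sample event: one product, one customer, one agent, one day.
cube = {}
key = add_primary_event(
    cube,
    INVOICE_LINE,
    {"product": "milk", "customer": "Smith", "agent": "Black", "date": "2005-01-10"},
    {"quantity": 12, "unit_price": 2, "net_amount": 24},
)
```

    Because the key is the tuple of dimension values, two invoices for the same product, customer, agent, and day cannot be told apart in this structure, which mirrors the remark above about invoice types and hours of the day.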

    Guideline 3: If the granularity of primary events as determined by the set of dimensions is coarser than the granularity of tuples in the data source, measures should be defined as either aggregations of numerical attributes in the data source, or as counts of tuples.
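
    A minimal sketch of Guideline 3, under the assumption that the data source records invoice rows by hour while the fact is kept by day. The function name to_primary_events, the row layout, and the row_count measure are hypothetical; the sketch sums a numerical attribute and counts source tuples for each coarser primary event.

```python
def to_primary_events(source_rows, dimensions, numeric_attrs):
    """Collapse source tuples to the granularity fixed by `dimensions`.

    Each numerical attribute becomes a measure defined as a sum, and an
    extra measure `row_count` counts the source tuples per primary event.
    """
    events = {}
    for row in source_rows:
        key = tuple(row[d] for d in dimensions)
        event = events.setdefault(
            key, {**{attr: 0 for attr in numeric_attrs}, "row_count": 0}
        )
        for attr in numeric_attrs:
            event[attr] += row[attr]
        event["row_count"] += 1
    return events


# Hypothetical source rows, finer than the fact (they carry the hour).
rows = [
    {"product": "milk", "date": "2005-01-10", "hour": 9, "quantity": 3},
    {"product": "milk", "date": "2005-01-10", "hour": 15, "quantity": 2},
    {"product": "bread", "date": "2005-01-10", "hour": 9, "quantity": 1},
]
events = to_primary_events(rows, ("product", "date"), ("quantity",))
```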

    Remarkably, some multidimensional models in the literature focus on treating dimensions and measures symmetrically (Agrawal et al., 1995; Gyssens & Lakshmanan, 1997). This is an important achievement from both the point of view of the uniformity of the logical model and that of the flexibility of OLAP operators. Nevertheless we claim that, at a conceptual level, distinguishing between measures and dimensions is important since it allows logical design to be more specifically aimed at the efficiency required by data warehousing applications.

    Aggregation is the basic OLAP operation, since it allows significant information useful for decision support to be summarized from large amounts of data. From a conceptual point of view, aggregation is carried out on primary events thanks to the definition of dimension attributes and hierarchies.

    Definition 5: A dimension attribute is a property, with a finite domain, of a dimension. Like dimensions, it is represented by a circle.

    For instance, a product is described by its type, category, and brand; a customer, by its city and its nation. The relationships between dimension attributes are expressed by hierarchies.


    Definition 6: A hierarchy is a directed tree, rooted in a dimension, whose nodes are all the dimension attributes that describe that dimension, and whose arcs model many-to-one associations between pairs of dimension attributes. Arcs are graphically represented by straight lines.

    Guideline 4: Hierarchies should reproduce the pattern of interattribute functional dependencies expressed by the data source.

    Hierarchies determine how primary events can be aggregated into secondary events and selected significantly for the decision-making process. The dimension in which a hierarchy is rooted defines its finest aggregation granularity, while the other dimension attributes define progressively coarser granularities. For instance, thanks to the existence of a many-to-one association between products and their categories, the invoicing events may be grouped according to the category of the products.

    Definition 7: Given a set of dimension attributes, each tuple of their values identifies a secondary event that aggregates all the corresponding primary events. Each secondary event is described by a value for each measure that summarizes the values taken by the same measure in the corresponding primary events.
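
    The step from primary to secondary events can be sketched as a roll-up along one many-to-one arc. The example below is an assumption-laden illustration: roll_up, the product-to-category mapping, and the quantities are invented, only a single additive measure is kept, and the sum is used as the summarizing operator.

```python
from collections import defaultdict

# Hypothetical many-to-one arc of the product hierarchy.
PRODUCT_TO_CATEGORY = {"milk": "food", "bread": "food", "shirt": "clothing"}


def roll_up(primary_events, dim_index, parent_of):
    """Aggregate events by replacing the coordinate at `dim_index` with its
    parent in the hierarchy and summing the measure of merged events."""
    secondary = defaultdict(int)
    for coords, quantity in primary_events.items():
        coarser = list(coords)
        coarser[dim_index] = parent_of[coords[dim_index]]
        secondary[tuple(coarser)] += quantity
    return dict(secondary)


# Primary events keyed by (product, date); the value is the quantity sold.
primary = {
    ("milk", "2005-01-10"): 5,
    ("bread", "2005-01-10"): 1,
    ("shirt", "2005-01-10"): 2,
    ("milk", "2005-01-11"): 4,
}
by_category = roll_up(primary, 0, PRODUCT_TO_CATEGORY)
```

    Each key of by_category is a tuple of values for the attribute set (category, date), so it identifies one secondary event in the sense of Definition 7.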

    We close this section by surveying some alternative terminology used either in the literature or in the commercial tools. There is substantial agreement on using the term dimensions to designate the "entry points" to classify and identify events; while we refer in particular to the attribute determining the minimum fact granularity, sometimes the whole hierarchies are named as dimensions (for instance, the term "time dimension" often refers to the whole hierarchy built on dimension date). Measures are sometimes called variables or metrics. Finally, in some data warehousing tools, the term hierarchy denotes each single branch of the tree rooted in a dimension.

    The Dimensional Fact Model: Advanced Modeling

    The constructs we introduce in this section, with the support of Figure 4, are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity. Though some of them are not necessary in the simplest and most common modeling situations, they are quite useful in order to better express the multitude of conceptual shades that characterize real-world scenarios. In particular we will see how, following the introduction of some of these constructs, hierarchies will no longer be defined as trees to become, in the general case, directed graphs.

    Descriptive Attributes

    In several cases it is useful to represent additional information about a dimension attribute, though it is not interesting to use such information for aggregation. For instance, the user may ask to know the address of each store, but the user will hardly be interested in aggregating sales according to the address of the store.

    Definition 8: A descriptive attribute specifies a property of a dimension attribute, to which it is related by an x-to-one association. Descriptive attributes are not used for aggregation; they are always leaves of their hierarchy and are graphically represented by horizontal lines.

    There are two main reasons why a descriptive attribute should not be used for aggregation:

    Guideline 5: A descriptive attribute either has a continuously-valued domain (for instance, the weight of a product), or is related to a dimension attribute by a one-to-one association (for instance, the address of a customer).

    Cross-Dimension Attributes

    Definition 9: A cross-dimension attribute is a (either dimension or descriptive) attribute whose value is determined by the combination of two or more dimension attributes, possibly belonging to different hierarchies. It is denoted by connecting through a curve line the arcs that determine it.

    For instance, if the VAT on a product depends on both the product category and the state where the product is sold, it can be represented by a cross-dimension attribute as shown in Figure 4.
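
    One way to read Definition 9 is as a functional dependency whose left-hand side is a combination of attributes. The sketch below checks that reading on sample rows; the function is_determined_by and the VAT percentages are hypothetical and do not come from Figure 4.

```python
def is_determined_by(rows, determinants, attribute):
    """Return True if every combination of `determinants` values is
    associated with a single value of `attribute` in `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[d] for d in determinants)
        # setdefault stores the first value met and returns the stored one.
        if seen.setdefault(key, row[attribute]) != row[attribute]:
            return False
    return True


# Hypothetical VAT percentages, invented for the illustration.
rows = [
    {"category": "food", "state": "CA", "vat": 0},
    {"category": "food", "state": "NY", "vat": 4},
    {"category": "clothing", "state": "CA", "vat": 8},
    {"category": "food", "state": "CA", "vat": 0},
]

needs_both = (
    is_determined_by(rows, ("category", "state"), "vat")
    and not is_determined_by(rows, ("category",), "vat")
    and not is_determined_by(rows, ("state",), "vat")
)
```

    In this sample, neither category nor state alone determines the VAT, while their combination does, which is the situation a cross-dimension attribute is meant to express.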


    Convergence

    Consider the geographic hierarchy on dimension customer (Figure 4): customers live in cities, which are grouped into states belonging to nations. Suppose that customers are grouped into sales districts as well, and that no inclusion relationships exist between districts and cities/states; on the other hand, sales districts never cross the nation boundaries. In this case, each customer belongs to exactly one nation whichever of the two paths is followed (customer → city → state → nation or customer → sales district → nation).

    Definition 10: A convergence takes place when two dimension attributes within a hierarchy are connected by two or more alternative paths of many-to-one associations. Convergences are represented by letting two or more arcs converge on the same dimension attribute.

    The existence of apparently equal attributes does not always determine a convergence. If in the invoice fact we had a brand city attribute on the product hierarchy, representing the city where a brand is manufactured, there would be no convergence with attribute (customer) city, since a product manufactured in a city can obviously be sold to customers of other cities as well.
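
    Whether two paths really converge can be checked on instance data: following either path from the same customer must lead to the same nation. The sketch below assumes each arc is given as a dictionary; the helper names and all customer, city, and district values are invented.

```python
def follow(value, path):
    """Follow a sequence of many-to-one mappings starting from `value`."""
    for mapping in path:
        value = mapping[value]
    return value


def is_convergence(members, path_a, path_b):
    """True if both paths lead every member to the same final value."""
    return all(follow(m, path_a) == follow(m, path_b) for m in members)


# Hypothetical instance data for the two paths of the customer hierarchy.
CITY_OF = {"Smith": "San Francisco", "Jones": "Toronto"}
STATE_OF = {"San Francisco": "California", "Toronto": "Ontario"}
NATION_OF_STATE = {"California": "USA", "Ontario": "Canada"}
DISTRICT_OF = {"Smith": "West", "Jones": "North"}
NATION_OF_DISTRICT = {"West": "USA", "North": "Canada"}

geo_path = [CITY_OF, STATE_OF, NATION_OF_STATE]
district_path = [DISTRICT_OF, NATION_OF_DISTRICT]
ok = is_convergence(["Smith", "Jones"], geo_path, district_path)

# A district that crosses a nation boundary breaks the convergence.
crossing = [{"Smith": "West", "Jones": "West"}, NATION_OF_DISTRICT]
broken = is_convergence(["Smith", "Jones"], geo_path, crossing)
```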

    Figure 4. The complete fact schema for the INVOICE LINE fact


    Optional Arcs

    Definition 11: An optional arc models the fact that an association represented within the fact scheme is undefined for a subset of the events. An optional arc is graphically denoted by marking it with a dash.

    For instance, attribute diet takes a value only for food products; for the other products, it is undefined.

    In the presence of a set of optional arcs exiting from the same dimension attribute, their coverage can be denoted in order to pose a constraint on the optionalities involved. Like for IS-A hierarchies in the E/R model, the coverage of a set of optional arcs is characterized by two independent coordinates. Let a be a dimension attribute, and b1, ..., bm be its children attributes connected by optional arcs:

    • The coverage is total if each value of a always corresponds to a value for at least one of its children; conversely, if some values of a exist for which all of its children are undefined, the coverage is said to be partial.
    • The coverage is disjoint if each value of a corresponds to a value for, at most, one of its children; conversely, if some values of a exist that correspond to values for two or more children, the coverage is said to be overlapped.

    Thus, overall, there are four possible coverages, denoted by T-D, T-O, P-D, and P-O. Figure 4 shows an example of optionality annotated with its coverage. We assume that products can have three types: food, clothing, and household; since expiration date and size are defined only for, respectively, food and clothing, the coverage is partial and disjoint.
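
    The two coverage coordinates can be computed directly from instance data. In the sketch below, which is only an illustration, each optional child is a dictionary from values of a to child values, and a missing or None entry means "undefined"; the function name coverage and the sample products are assumptions.

```python
def coverage(parent_values, children):
    """Classify the coverage of a set of optional arcs as T-D, T-O, P-D or P-O.

    `children` maps each child attribute name to a dict from parent value to
    child value; a missing key or a None value means "undefined".
    """
    total = True
    disjoint = True
    for value in parent_values:
        defined = sum(
            1 for mapping in children.values() if mapping.get(value) is not None
        )
        if defined == 0:
            total = False      # some parent value has no defined child
        if defined > 1:
            disjoint = False   # some parent value has several defined children
    return ("T" if total else "P") + "-" + ("D" if disjoint else "O")


# Hypothetical products: milk is food, shirt is clothing, pan is household.
children = {
    "expiration_date": {"milk": "2005-02-01"},
    "size": {"shirt": "M"},
}
label = coverage(["milk", "shirt", "pan"], children)
```

    With the household product included, no child is defined for it, so the coverage comes out partial; since no product has both an expiration date and a size, it is also disjoint.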

    Multiple Arcs

    In most cases, as already said, hierarchies include attributes related by many-to-one associations. On the other hand, in some situations it is necessary to include also attributes that, for a single value taken by their father attribute, take several values.

    Definition 12: A multiple arc is an arc, within a hierarchy, modeling a many-to-many association between the two dimension attributes it connects. Graphically, it is denoted by doubling the line that represents the arc.


    Consider the fact schema modeling the sales of books in a library, represented in Figure 5, whose dimensions are date and book. Users will probably be interested in analyzing sales for each book author; on the other hand, since some books have two or more authors, the relationship between book and author must be modeled as a multiple arc.

    Guideline 6: In presence of many-to-many associations, summarizability is no longer guaranteed, unless the multiple arc is properly weighted. Multiple arcs should be used sparingly since, in ROLAP logical design, they require complex solutions.

    Summarizability is the property of correctly summarizing measures along hierarchies (Lenz & Shoshani, 1997). Weights restore summarizability, but their introduction is artificial in several cases; for instance, in the book sales fact, each author of a multiauthored book should be assigned a normalized weight expressing her "contribution" to the book.
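
    The effect of weighting a multiple arc can be shown with a short sketch. The names sales_by_author and author_weights, the book titles, and the equal split between co-authors are assumptions made for the illustration; exact fractions are used so that the per-author totals add up to the overall total without rounding.

```python
from collections import defaultdict
from fractions import Fraction


def sales_by_author(sales_by_book, author_weights):
    """Distribute each book's sales over its authors using normalized weights."""
    totals = defaultdict(Fraction)
    for book, sold in sales_by_book.items():
        weights = author_weights[book]
        if sum(weights.values()) != 1:
            raise ValueError(f"weights for {book!r} must sum to 1")
        for author, weight in weights.items():
            totals[author] += sold * weight
    return dict(totals)


# Hypothetical data: one co-authored book and one single-author book.
sales = {"Book A": 10, "Book B": 4}
weights = {
    "Book A": {"Ann": Fraction(1, 2), "Bob": Fraction(1, 2)},
    "Book B": {"Ann": Fraction(1)},
}
per_author = sales_by_author(sales, weights)
```

    Without weights, counting the 10 copies of the co-authored book once per author would give 24 copies over all authors instead of 14, which is the summarizability problem Guideline 6 warns about.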

    Shared Hierarchies

    Sometimes, large portions of hierarchies are replicated twice or more in the same fact schema. A typical example is the temporal hierarchy: a fact frequently has more than one dimension of type date, with different semantics, and it may be useful to define on each of them a temporal hierarchy month-week-year. Another example are geographic hierarchies, that may be defined starting from any location attribute in the fact schema. To avoid redundancy, the DFM provides a graphical shorthand for denoting hierarchy sharing. Figure 4 shows two examples of shared hierarchies. Fact INVOICE LINE has two date dimensions, with semantics invoice date and order date, respectively. This is denoted by doubling the circle that represents attribute date and specifying two roles invoice and order on the entering arcs. The second shared hierarchy is the one on agent, that may have two roles: the ordering agent, that is a dimension, and the agent who is responsible for a customer (optional).

    Figure 5. The fact schema for the SALES fact

    Guideline 8: Explicitly representing shared hierarchies on the fact schema is important since, during ROLAP logical design, it enables ad hoc solutions aimed at avoiding replication of data in dimension tables.

    Ragged Hierarchies

    Let a1, ..., an be a sequence of dimension attributes that define a path within a hierarchy (such as city, state, nation). Up to now we assumed that, for each value of a1, exactly one value for every other attribute on the path exists. In the previous case, this is actually true for each city in the U.S., while it is false for most European countries where no decomposition in states is defined (see Figure 6).

    Definition 13: A ragged (or incomplete) hierarchy is a hierarchy where, for some instances, the values of one or more attributes are missing (since undefined or unknown). A ragged hierarchy is graphically denoted by marking with a dash the attributes whose values may be missing.

    As stated by Niemi (2001), within a ragged hierarchy each aggregation level has precise and consistent semantics, but the different hierarchy instances may have different length since one or more levels are missing, making the interlevel relationships not uniform (the father of "San Francisco" belongs to level state, the father of "Rome" to level nation).

    There is a noticeable difference between a ragged hierarchy and an optional arc. In the first case we model the fact that, for some hierarchy instances, there is no value for one or more attributes in any position of the hierarchy. Conversely, through an optional arc we model the fact that there is no value for an attribute and for all of its descendents.

    Figure 6. Ragged geographic hierarchies

    Guideline 9: Ragged hierarchies may lead to summarizability problems. A way for avoiding them is to fragment a fact into two or more facts, each including a subset of the hierarchies characterized by uniform interlevel relationships.

    Thus, in the invoice example, fragmenting INVOICE LINE into U.S. INVOICE LINE and E.U. INVOICE LINE (the first with the state attribute, the second without state) restores the completeness of the geographic hierarchy.

    Unbalanced Hierarchies

    Definition 14: An unbalanced (or recursive) hierarchy is a hierarchy where, though interattribute relationships are consistent, the instances may have different length. Graphically, it is represented by introducing a cycle within the hierarchy.

    A typical example of unbalanced hierarchy is the one that models the dependence interrelationships between working persons. Figure 4 includes an unbalanced hierarchy on sale agents: there are no fixed roles for the different agents, and the different "leaf" agents have a variable number of supervisor agents above them.

    Guideline 10: Recursive hierarchies lead to complex solutions during ROLAP logical design and to poor querying performance. A way for avoiding them is to "unroll" them for a given number of times.

    For instance, in the agent example, if the user states that two is the maximum number of interesting levels for the dependence relationship, the customer hierarchy could be transformed as in Figure 7.
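
    Unrolling can be sketched as replacing the recursive supervisor relationship with a fixed number of attributes per agent. The function unroll, the supervisor_of mapping, and the agent names below are hypothetical; levels that do not exist for a given agent are filled with None.

```python
def unroll(agent, supervisor_of, levels):
    """Return the first `levels` supervisors above `agent`, nearest first.

    Missing levels are reported as None, so every agent gets the same
    number of attributes regardless of the depth of its chain.
    """
    chain = []
    current = agent
    for _ in range(levels):
        current = supervisor_of.get(current) if current is not None else None
        chain.append(current)
    return chain


# Hypothetical supervision chain: Ann -> Bob -> Cy -> Dee.
supervisor_of = {"Ann": "Bob", "Bob": "Cy", "Cy": "Dee"}

two_levels_ann = unroll("Ann", supervisor_of, 2)
two_levels_cy = unroll("Cy", supervisor_of, 2)
```

    With two levels, Ann's third-level supervisor is no longer reachable; that loss is the price of turning the cycle into a fixed-length path.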

    Dynamic Hierarchies

    Time is a key factor in data warehousing systems, since the decision process is often based on the evaluation of historical series and on the comparison between snapshots of the enterprise taken at different moments. The multidimensional models implicitly assume that the only dynamic components described in a cube are the events that instantiate it; hierarchies are traditionally considered to be static. Of course this is not correct: sales managers alternate, though slowly, on different departments; new products are added every week to those already being sold; the product categories change, and their relationships with products change; sales districts can be modified, and a customer may be moved from one district to another.

    The conceptual representation of hierarchy dynamicity is strictly related to its impact on user queries. In fact, in presence of a dynamic hierarchy we may picture three different temporal scenarios for analyzing events (SAP, 1998):

    • Today for yesterday: All events are referred to the current configuration of hierarchies. Thus, assuming on January 1, 2005 the responsible agent for customer Smith has changed from Mr. Black to Mr. White, and that a new customer O'Hara has been acquired and assigned to Mr. Black, when computing the agent commissions all invoices for Smith are attributed to Mr. White, while only invoices for O'Hara are attributed to Mr. Black.
    • Yesterday for today: All events are referred to some past configuration of hierarchies. In the previous example, all invoices for Smith are attributed to Mr. Black, while invoices for O'Hara are not considered.
    • Today or yesterday (or historical truth): Each event is referred to the configuration hierarchies had at the time the event occurred. Thus, the invoices for Smith up to 2004 and those for O'Hara are attributed to Mr. Black, while invoices for Smith from 2005 are attributed to Mr. White.
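
    The three scenarios differ only in which date is used to look up the customer-agent arc. The sketch below replays the Smith and O'Hara example under assumed data: the assignment start dates before 2005, the invoice dates and amounts, and the names agent_at and attribute_invoices are all invented for the illustration.

```python
from datetime import date

# (customer, responsible agent, valid from); the 2000 start date is assumed.
ASSIGNMENTS = [
    ("Smith", "Black", date(2000, 1, 1)),
    ("Smith", "White", date(2005, 1, 1)),
    ("O'Hara", "Black", date(2005, 1, 1)),
]


def agent_at(customer, when):
    """Agent responsible for `customer` on day `when`, or None if unknown."""
    valid = [(start, agent) for c, agent, start in ASSIGNMENTS
             if c == customer and start <= when]
    return max(valid)[1] if valid else None


def attribute_invoices(invoices, scenario, today, past=None):
    """Total invoice amounts per agent under one temporal scenario."""
    totals = {}
    for customer, day, amount in invoices:
        if scenario == "today for yesterday":
            agent = agent_at(customer, today)   # current configuration
        elif scenario == "yesterday for today":
            agent = agent_at(customer, past)    # one past configuration
        elif scenario == "today or yesterday":
            agent = agent_at(customer, day)     # configuration at event time
        else:
            raise ValueError(f"unknown scenario: {scenario}")
        if agent is None:
            continue  # the customer did not exist in that configuration
        totals[agent] = totals.get(agent, 0) + amount
    return totals


# Hypothetical invoices: (customer, invoice date, amount).
invoices = [
    ("Smith", date(2004, 6, 1), 100),
    ("Smith", date(2005, 3, 1), 50),
    ("O'Hara", date(2005, 2, 1), 30),
]
today, past = date(2005, 6, 30), date(2004, 12, 31)
current = attribute_invoices(invoices, "today for yesterday", today)
frozen = attribute_invoices(invoices, "yesterday for today", today, past)
historical = attribute_invoices(invoices, "today or yesterday", today)
```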

    While in the agent example, dynamicity concerns an arc of a hierarchy, the one expressing the many-to-one association between customer and agent, in some cases it may as well concern a dimension attribute: for instance, the name of a product category may change. Even in this case, the different scenarios are defined in much the same way as before.

    On the conceptual schema, it is useful to denote which scenarios the user is interested in for each arc and attribute, since this heavily impacts on the specific solutions to be adopted during logical design. By default, we will assume that the only interesting scenario is today for yesterday: it is the most common one, and the one whose implementation on the star schema is simplest. If some attributes or arcs require different scenarios, the designer should specify them on a table like Table 2.

    Figure 7. Unrolling the agent hierarchy

    Additivity

    Aggregation requires defining a proper operator to compose the measure values characterizing primary events into measure values characterizing each secondary event. From this point of view, we may distinguish three types of measures (Lenz & Shoshani, 1997):

    • Flow measures: They refer to a time period, and are cumulatively evaluated at the end of that period. Examples are the number of products sold in a day, the monthly revenue, the number of those born in a year.
    • Stock measures: They are evaluated at particular moments in time. Examples are the number of products in a warehouse, the number of inhabitants of a city, the temperature measured by a gauge.
    • Unit measures: They are evaluated at particular moments in time, but they are expressed in relative terms. Examples are the unit price of a product, the discount percentage, the exchange rate of a currency.

    The aggregation operators that can be used on the three types of measures are summarized in Table 3.

    arc/attribute    today for yesterday    yesterday for today    today or yesterday

    customer-resp. agent YES YES YES

    customer-city YES YES

    sale district YES

    Table 2. Temporal scenarios for the INVOICE fact

                      temporal hierarchies    nontemporal hierarchies
    flow measures     SUM, AVG, MIN, MAX      SUM, AVG, MIN, MAX
    stock measures    AVG, MIN, MAX           SUM, AVG, MIN, MAX
    unit measures     AVG, MIN, MAX           AVG, MIN, MAX

    Table 3. Valid aggregation operators for the three types of measures (Lenz & Shoshani, 1997)
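    Table 3 can be encoded as a simple lookup, for instance to let a design tool reject invalid aggregations. The following Python sketch is only an illustration; the names `VALID_OPERATORS`, `valid_operators`, and `check_operator` are hypothetical.

```python
# Table 3 as a lookup: (measure type, hierarchy is temporal?) -> valid operators.
VALID_OPERATORS = {
    ("flow", True): frozenset({"SUM", "AVG", "MIN", "MAX"}),
    ("flow", False): frozenset({"SUM", "AVG", "MIN", "MAX"}),
    ("stock", True): frozenset({"AVG", "MIN", "MAX"}),
    ("stock", False): frozenset({"SUM", "AVG", "MIN", "MAX"}),
    ("unit", True): frozenset({"AVG", "MIN", "MAX"}),
    ("unit", False): frozenset({"AVG", "MIN", "MAX"}),
}


def valid_operators(measure_type, temporal):
    """Return the operators Table 3 allows for this measure/hierarchy pair."""
    return VALID_OPERATORS[(measure_type, temporal)]


def check_operator(measure_type, temporal, operator):
    """Raise ValueError if `operator` is not listed in Table 3 for this case."""
    if operator not in valid_operators(measure_type, temporal):
        kind = "temporal" if temporal else "nontemporal"
        raise ValueError(
            f"{operator} is not valid for {measure_type} measures "
            f"along {kind} hierarchies"
        )
```

    For example, `check_operator("stock", True, "SUM")` raises an error, since summing a stock level over time is not allowed, while `check_operator("stock", False, "SUM")` passes.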


    Definition 15: A measure is said to be additive along a dimension if its values can be aggregated along the corresponding hierarchy by the sum operator; otherwise it is called nonadditive. A nonadditive measure is nonaggregable if no other aggregation operator can be used on it.

    Table 3 shows that, in general, flow measures are additive along all dimensions, stock measures are nonadditive along temporal hierarchies, and unit measures are nonadditive along all dimensions.

    On the invoice scheme, most measures are additive. For instance, quantity has flow type: the total quantity invoiced in a month is the sum of the quantities invoiced in the single days of that month. Measure unit price has unit type and is nonadditive along all dimensions. Though it cannot be summed up, it can still be aggregated by using operators such as average, maximum, and minimum.

    Since additivity is the most frequent case, in order to simplify the graphic notation in the DFM, only the exceptions are represented explicitly. In particular, a measure is connected to the dimensions along which it is nonadditive by a dashed line labeled with the other aggregation operators (if any) which can be used instead. If a measure is aggregated through the same operator along all dimensions, that operator can be simply reported on its side (see for instance unit price in Figure 4).
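    As a small worked example, the sketch below classifies a measure according to Definition 15 and rolls up invoice events from days to months, summing quantity and averaging unit price. The event values are invented for illustration, and the plain (unweighted) average is just one possible choice of AVG; `classify` and `roll_up_to_month` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean


def classify(valid_ops):
    """Classify a measure along one dimension, following Definition 15."""
    if "SUM" in valid_ops:
        return "additive"
    if valid_ops:
        return "nonadditive"
    return "nonaggregable"


# Primary events: (day, quantity, unit price). Values are made up.
EVENTS = [
    ("2005-03-01", 10, 2.0),
    ("2005-03-02", 5, 4.0),
    ("2005-04-01", 7, 3.0),
]


def roll_up_to_month(events):
    """Aggregate daily events into monthly secondary events."""
    by_month = defaultdict(list)
    for day, quantity, unit_price in events:
        by_month[day[:7]].append((quantity, unit_price))  # "YYYY-MM" key
    return {
        month: {
            # quantity is a flow measure: additive, so SUM is valid
            "quantity": sum(q for q, _ in rows),
            # unit price is a unit measure: nonadditive, so AVG is used
            "unit_price_avg": mean(p for _, p in rows),
        }
        for month, rows in by_month.items()
    }
```

    Here `classify({"AVG", "MIN", "MAX"})` yields "nonadditive", matching the treatment of unit price in the text.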

    Approaches to Conceptual Design

    In this section we discuss how conceptual design can be framed within a methodology for DW design. The approaches to DW design are usually classified in two categories (Winter & Strauch, 2003):

    • Data-driven (or supply-driven) approaches design the DW starting from a detailed analysis of the data sources; user requirements impact on design by allowing the designer to select which chunks of data are relevant for decision making and by determining their structure according to the multidimensional model (Golfarelli et al., 1998; Hüsemann et al., 2000).

    • Requirement-driven (or demand-driven) approaches start from determining the information requirements of end users, and how to map these requirements onto the available data sources is investigated only a posteriori (Prakash & Gosain, 2003; Schiefer, List & Bruckner, 2002).

    While data-driven approaches somewhat simplify the design of ETL (extraction, transformation, and loading), since each piece of data in the DW is rooted in one or more attributes of the sources, they give user requirements a secondary role in determining the information contents for analysis, and give the designer little support in identifying facts, dimensions, and measures. Conversely, requirement-driven approaches bring user requirements to the foreground, but require a larger effort when designing ETL.

    Data-Driven Approaches

    Data-driven approaches are feasible when all of the following are true: (1) detailed knowledge of data sources is available a priori or easily achievable; (2) the source schemata exhibit a good degree of normalization; (3) the complexity of source schemata is not high. In practice, when the chosen architecture for the DW relies on a reconciled level (or operational data store) these requirements are largely satisfied: in fact, normalization and detailed knowledge are guaranteed by the source integration process. The same holds, thanks to a careful source recognition activity, in the frequent case when the source is a single relational database, well-designed and not very large.

    In a data-driven approach, requirement analysis is typically carried out informally, based on simple requirement glossaries (Lechtenbörger, 2001) rather than on formal diagrams. Conceptual design is then heavily rooted on source schemata and can be largely automated. In particular, the designer is actively supported in identifying dimensions and measures, in building hierarchies, in detecting convergences and shared hierarchies. For instance, the approach proposed by Golfarelli et al. (1998) consists of five steps that, starting from the source schema expressed either by an E/R schema or a relational schema, create the conceptual schema for the DW:

    1. Choose facts of interest on the source schema.
    2. For each fact, build an attribute tree that captures the functional dependencies expressed by the source schema.
    3. Edit the attribute trees by adding/deleting attributes and functional dependencies.
    4. Choose dimensions and measures.
    5. Create the fact schemata.

    While step 2 is completely automated, some advanced constructs of the DFM are manually applied by the designer during step 5.

    On-the-field experience shows that, when applicable, the data-driven approach is preferable since it reduces the overall time necessary for design. In fact, not only can conceptual design be partially automated, but even ETL design is made easier since the mapping between the data sources and the DW is derived at no additional cost during conceptual design.
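    To give an idea of why step 2 lends itself to automation, the following Python sketch builds an attribute tree by recursively navigating functional dependencies. It is a simplified illustration under stated assumptions, not the algorithm of Golfarelli et al. (1998): the source schema is reduced to a hypothetical dictionary `FDS` mapping each attribute to the attributes it functionally determines, and the schema content itself is invented.

```python
# Hypothetical source schema: attribute -> attributes it functionally determines.
FDS = {
    "invoice line": ["product", "customer", "date", "quantity", "unit price"],
    "product": ["category"],
    "customer": ["city", "agent"],
    "city": ["sale district"],
}


def build_attribute_tree(root, fds, path=()):
    """Return the attribute tree rooted in `root` as nested dictionaries.

    Each key is an attribute reached through a functional dependency;
    its value is the subtree below it. Attributes already on the current
    path are skipped so that cyclic dependencies do not loop forever.
    """
    path = path + (root,)
    return {
        attr: build_attribute_tree(attr, fds, path)
        for attr in fds.get(root, [])
        if attr not in path
    }


tree = build_attribute_tree("invoice line", FDS)
```

    In this sketch the fact chosen in step 1 is the root ("invoice line"), and the resulting nested dictionary is what the designer would then edit in step 3.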


    Requirement-Driven Approaches

    Conversely, within a requirement-driven framework, in the absence of knowledge of the source schema, the building of hierarchies cannot be automated; the main assurance of a satisfactory result is the skill and experience of the designer, and the designer's ability to interact with the domain experts. In this case it may be worth adopting formal techniques for specifying requirements in order to more accurately capture users' needs; for instance, the goal-oriented approach proposed by Giorgini, Rizzi, and Garzetti (2005) is based on an extension of the Tropos formalism and includes the following steps:

    1. Create, in the Tropos formalism, an organizational model that represents the stakeholders, their relationships, their goals as well as the relevant facts for the organization and the attributes that describe them.

    2. Create, in the Tropos formalism, a decisional model that expresses the analysis goals of decision makers and their information needs.

    3. Create preliminary fact schemata from the decisional model.

    4. Edit the fact schemata, for instance, by detecting functional dependencies between dimensions, recognizing optional dimensions, and unifying measures that only differ for the aggregation operator.

    This approach is, in our view, more difficult to pursue than the previous one. Nevertheless, it is the only alternative when a detailed analysis of data sources cannot be made (for instance, when the DW is fed from an ERP system), or when the sources come from legacy systems whose complexity discourages recognition and normalization.

    Mixed Approaches

    Finally, a few mixed approaches to design have also been devised, aimed at joining the facilities of data-driven approaches with the guarantees of requirement-driven ones (Bonifati, Cattaneo, Ceri, Fuggetta, & Paraboschi, 2001; Giorgini et al., 2005). Here the user requirements, captured by means of a goal-oriented formalism, are matched with the schema of the source database to drive the algorithm that generates the conceptual schema for the DW. For instance, the approach proposed by Giorgini et al. (2005) encompasses the following steps:

    1. Create, in the Tropos formalism, an organizational model that represents the stakeholders, their relationships, their goals, as well as the relevant facts for the organization and the attributes that describe them.

    2. Create, in the Tropos formalism, a decisional model that expresses the analysis goals of decision makers and their information needs.

    3. Map facts, dimensions, and measures identified during requirement analysis onto entities in the source schema.

    4. Generate a preliminary conceptual schema by navigating the functional dependencies expressed by the source schema.

    5. Edit the fact schemata to fully meet the user expectations.

    Note that, though step 4 may be based on the same algorithm employed in step 2 of the data-driven approach, here navigation is not blind but rather it is actively biased by the user requirements. Thus, the preliminary fact schemata generated here may be considerably simpler and smaller than those obtained in the data-driven approach. Besides, while in that approach the analyst is asked to identify facts, dimensions, and measures directly on the source schema, here such identification is driven by the diagrams developed during requirement analysis.

    Overall, the mixed framework is recommendable when source schemata are well-known but their size and complexity are substantial. In fact, the cost for a more careful and formal analysis of requirements is balanced by the quickening of conceptual design.
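    The difference between blind and requirement-biased navigation can be sketched in Python as follows. This is an illustrative simplification, not the algorithm of Giorgini et al. (2005): functional dependencies are given as a hypothetical dictionary `FDS`, the user requirements are reduced to a set of attribute names (`REQUIRED`), and a branch is kept only if it is required or leads to a required attribute.

```python
# Hypothetical source schema: attribute -> attributes it functionally determines.
FDS = {
    "invoice line": ["product", "customer", "date", "quantity", "unit price"],
    "product": ["category"],
    "customer": ["city", "agent"],
    "city": ["sale district"],
}

# Attributes assumed to have been identified during requirement analysis.
REQUIRED = {"category", "sale district", "quantity"}


def build_biased_tree(root, fds, required, path=()):
    """Navigate functional dependencies, keeping only useful branches.

    A child attribute is kept if it is itself required or if some
    required attribute is reachable through it; everything else is
    pruned while the tree is being built.
    """
    path = path + (root,)
    tree = {}
    for attr in fds.get(root, []):
        if attr in path:  # avoid looping on cyclic dependencies
            continue
        subtree = build_biased_tree(attr, fds, required, path)
        if attr in required or subtree:
            tree[attr] = subtree
    return tree


biased = build_biased_tree("invoice line", FDS, REQUIRED)
```

    With these assumed requirements, `agent`, `date`, and `unit price` are dropped, while `customer` and `city` survive only as intermediate nodes on the way to `sale district`, so the preliminary schema is smaller than the one blind navigation would produce.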

    Open Issues

    A lot of work has been done in the field of conceptual modeling for DWs; nevertheless some very important issues still remain open. We report some of them in this section, as they emerged during joint discussion at the Perspective Seminar on Data Warehousing at the Crossroads that took place at Dagstuhl, Germany in August 2004.

    • Lack of a standard: Though several conceptual models have been proposed, none of them has been accepted as a standard so far, and all vendors propose their own proprietary design methods. We see two main reasons for this: (1) though the conceptual models devised are semantically rich, some of the modeled properties cannot be expressed in the target logical models, so the translation from conceptual to logical is incomplete; and (2) commercial CASE tools currently enable designers to directly draw logical schemata, thus no industrial push is given to any of the models. On the other hand, a unified conceptual model for DWs, implemented by sophisticated CASE tools, would be a valuable support for both the research and industrial communities.


    • Design patterns: In software engineering, design patterns are a precious support for designers since they propose standard solutions to address common modeling problems. Recently, some preliminary attempts have been made to identify relevant patterns for multidimensional design, aimed at assisting DW designers during their modeling tasks by providing an approach for recognizing dimensions in a systematic and usable way (Jones & Song, 2005). Though we agree that DW design would undoubtedly benefit from adopting a pattern-based approach, and we also recognize the utility of patterns in increasing the effectiveness of teaching how to design, we believe that further research is necessary in order to achieve a more comprehensive characterization of multidimensional patterns for both conceptual and logical design.

    • Modeling security: Information security is a serious requirement that must be carefully considered in software engineering, not in isolation but as an issue underlying all stages of the development life cycle, from requirement analysis to implementation and maintenance. The problem of information security is even bigger in DWs, as these systems are used to discover crucial business information in strategic decision making. Some approaches to security in DWs, focused, for instance, on access control and multilevel security, can be found in the literature (see, for instance, Priebe & Pernul, 2000), but none of them treats security as comprising all stages of the DW development cycle. Besides, the classical security model used in transactional databases, centered on tables, rows, and attributes, is unsuitable for DWs and should be replaced by an ad hoc model centered on the main concepts of multidimensional modeling such as facts, dimensions, and measures.

    • Modeling ETL: ETL is a cornerstone of the data warehousing process, and its design and implementation may easily take 50% of the total time for setting up a DW. In the literature some approaches were devised for conceptual modeling of the ETL process from either the functional (Vassiliadis, Simitsis, & Skiadopoulos, 2002), the dynamic (Bouzeghoub, Fabret, & Matulovic, 1999), or the static (Calvanese, De Giacomo, Lenzerini, Nardi, & Rosati, 1998) points of view. Recently, some interesting work on translating conceptual into logical ETL schemata has also been done (Simitsis, 2005). Nevertheless, issues such as the optimization of ETL logical schemata are not very well understood. Besides, there is a need for techniques that automatically propagate changes that occurred in the source schemas to the ETL process.

    Conclusion

    In this chapter we have proposed a set of solutions for conceptual modeling of a DW according to the DFM. Since 1998, the DFM has been successfully adopted in real DW projects, mainly in the fields of retail, large distribution, telecommunications, health, justice, and instruction, where it has proved expressive enough to capture a wide variety of modeling situations. Remarkably, in most projects the DFM was also used to directly support dialogue with end users aimed at validating requirements, and to express the expected workload for the DW to be used for logical and physical design. This was made possible by the adoption of a CASE tool named WAND (warehouse integrated designer), entirely developed at the University of Bologna, that assists the designer in structuring a DW. WAND carries out data-driven conceptual design in a semiautomatic fashion starting from the logical scheme of the source database (see Figure 8), allows for a core workload to be defined on the conceptual scheme, and carries out workload-based logical design to produce an optimized relational scheme for the DW (Golfarelli & Rizzi, 2001).

    Overall, our on-the-field experience confirmed that adopting conceptual modeling within a DW project brings great advantages since:

    • Conceptual schemata are the best support for discussing, verifying, and refining user specifications since they achieve the optimal trade-off between expressivity and clarity. Star schemata could hardly be used for this purpose.

    • For the same reason, conceptual schemata are an irreplaceable component of the documentation for the DW project.

    • They provide a solid and platform-independent foundation for logical and physical design.

    • They are an effective support for maintaining and extending the DW.

    • They make turn-over of designers and administrators on a DW project quicker and simpler.

    Figure 8. Editing a fact schema in WAND


    References

    Abelló, A., Samos, J., & Saltor, F. (2002, July 17-19). YAM2 (Yet another multidimensional model): An extension of UML. In Proceedings of the International Database Engineering & Applications Symposium (pp. 172-181). Edmonton, Canada.

    Agrawal, R., Gupta, A., & Sarawagi, S. (1995). Modeling multidimensional databases (IBM Research Report). IBM Almaden Research Center, San Jose, CA.

    Bonifati, A., Cattaneo, F., Ceri, S., Fuggetta, A., & Paraboschi, S. (2001). Designing data marts for data warehouses. ACM Transactions on Software Engineering and Methodology, 10(4), 452-483.

    Bouzeghoub, M., Fabret, F., & Matulovic, M. (1999). Modeling data warehouse refreshment process as a workflow application. In Proceedings of the International Workshop on Design and Management of Data Warehouses, Heidelberg, Germany.

    Cabibbo, L., & Torlone, R. (1998, March 23-27). A logical approach to multidimensional databases. In Proceedings of the International Conference on Extending Database Technology (pp. 183-197). Valencia, Spain.

    Calvanese, D., De Giacomo, G., Lenzerini, M., Nardi, D., & Rosati, R. (1998, August 20-22). Information integration: Conceptual modeling and reasoning support. In Proceedings of the International Conference on Cooperative Information Systems (pp. 280-291). New York.

    Datta, A., & Thomas, H. (1997). A conceptual model and algebra for on-line analytical processing in data warehouses. In Proceedings of the Workshop for Information Technology and Systems (pp. 91-100).

    Fahrner, C., & Vossen, G. (1995). A survey of database transformations based on the entity-relationship model. Data & Knowledge Engineering, 15(3), 213-250.

    Franconi, E., & Kamble, A. (2004a, June 7-11). The GMD data model and algebra for multidimensional information. In Proceedings of the Conference on Advanced Information Systems Engineering (pp. 446-462). Riga, Latvia.

    Franconi, E., & Kamble, A. (2004b). A data warehouse conceptual data model. In Proceedings of the International Conference on Statistical and Scientific Database Management (pp. 435-436).

    Giorgini, P., Rizzi, S., & Garzetti, M. (2005, November 4-5). Goal-oriented requirement analysis for data warehouse design. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP (pp. 47-56). Bremen, Germany.

    Golfarelli, M., Maio, D., & Rizzi, S. (1998). The dimensional fact model: A conceptual model for data warehouses. International Journal of Cooperative Information Systems, 7(2-3), 215-247.


    Golfarelli, M., & Rizzi, S. (2001, April 2-6). WAND: A CASE tool for data warehouse design. In Demo Proceedings of the International Conference on Data Engineering (pp. 7-9). Heidelberg, Germany.

    Gyssens, M., & Lakshmanan, L. V. S. (1997). A foundation for multi-dimensional databases. In Proceedings of the International Conference on Very Large Data Bases (pp. 106-115). Athens, Greece.

    Hüsemann, B., Lechtenbörger, J., & Vossen, G. (2000). Conceptual data warehouse design. In Proceedings of the International Workshop on Design and Management of Data Warehouses, Stockholm, Sweden.

    Jones, M. E., & Song, I. Y. (2005). Dimensional modeling: Identifying, classifying & applying patterns. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP (pp. 29-38). Bremen, Germany.

    Kimball, R. (1996). The data warehouse toolkit. New York: John Wiley & Sons.

    Lechtenbörger, J. (2001). Data warehouse schema design (Tech. Rep. No. 79). DISDBIS Akademische Verlagsgesellschaft Aka GmbH, Germany.

    Lenz, H. J., & Shoshani, A. (1997). Summarizability in OLAP and statistical databases. In Proceedings of the 9th International Conference on Statistical and Scientific Database Management (pp. 132-143). Washington, DC.

    Li, C., & Wang, X. S. (1996). A data model for supporting on-line analytical processing. In Proceedings of the International Conference on Information and Knowledge Management (pp. 81-88). Rockville, Maryland.

    Luján-Mora, S., Trujillo, J., & Song, I. Y. (2002). Extending the UML for multidimensional modeling. In Proceedings of the International Conference on the Unified Modeling Language (pp. 290-304). Dresden, Germany.

    Niemi, T., Nummenmaa, J., & Thanisch, P. (2001, June 4). Logical multidimensional database design for ragged and unbalanced aggregation. In Proceedings of the 3rd International Workshop on Design and Management of Data Warehouses, Interlaken, Switzerland (p. 7).

    Nguyen, T. B., Tjoa, A. M., & Wagner, R. (2000). An object-oriented multidimensional data model for OLAP. In Proceedings of the International Conference on Web-Age Information Management (pp. 69-82). Shanghai, China.

    Pedersen, T. B., & Jensen, C. (1999). Multidimensional data modeling for complex data. In Proceedings of the International Conference on Data Engineering (pp. 336-345). Sydney, Australia.

    Prakash, N., & Gosain, A. (2003). Requirements driven data warehouse development. In Proceedings of the Conference on Advanced Information Systems Engineering - Short Papers, Klagenfurt/Velden, Austria.

    Priebe, T., & Pernul, G. (2000). Towards OLAP security design: Survey and research issues. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP (pp. 33-40). Washington, DC.


    SAP. (1998). Data modeling with BW. SAP America Inc. and SAP AG, Rockville, MD.

    Sapia, C., Blaschka, M., Hofling, G., & Dinter, B. (1998). Extending the E/R model for the multidimensional paradigm. In Proceedings of the International Conference on Conceptual Modeling, Singapore.

    Schiefer, J., List, B., & Bruckner, R. (2002). A holistic approach for managing requirements of data warehouse systems. In Proceedings of the Americas Conference on Information Systems.

    Sen, A., & Sinha, A. P. (2005). A comparison of data warehousing methodologies. Communications of the ACM, 48(3), 79-84.

    Simitsis, A. (2005). Mapping conceptual to logical models for ETL processes. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP (pp. 67-76). Bremen, Germany.

    Tryfona, N., Busborg, F., & Borch Christiansen, J. G. (1999). starER: A conceptual model for data warehouse design. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP, Kansas City, Kansas (pp. 3-8).

    Tsois, A., Karayannidis, N., & Sellis, T. (2001). MAC: Conceptual data modeling for OLAP. In Proceedings of the International Workshop on Design and Management of Data Warehouses (pp. 5.1-5.11). Interlaken, Switzerland.

    Vassiliadis, P. (1998). Modeling multidimensional databases, cubes and cube operations. In Proceedings of the 10th International Conference on Statistical and Scientific Database Management, Capri, Italy.

    Vassiliadis, P., Simitsis, A., & Skiadopoulos, S. (2002, November 8). Conceptual modeling for ETL processes. In Proceedings of the ACM International Workshop on Data Warehousing and OLAP (pp. 14-21). McLean, VA.

    Winter, R., & Strauch, B. (2003). A method for demand-driven information requirements analysis in data warehousing projects. In Proceedings of the Hawaii International Conference on System Sciences, Kona (pp. 1359-1365).

    Endnote

    1 In this chapter we will only consider dynamicity at the instance level. Dynamicity at the schema level is related to the problem of evolution of DWs and is outside the scope of this chapter.