ddb_presentation3skinny fact tables

Upload: calebkpabiteytetteh

Post on 06-Jul-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    1/26

     

    • As the fact table contains the vast volume of

    records it is important that it is memory spaceefficient

    • Foreign keys are usually represented in integer

    form and do not require much memory space

    • Facts too are often numeric properties and canusually be represented as integers (contrast todimensional attributes which are usually long textstrings)

    • This space efficiency is critical to the memoryspace consumption of the data warehouse

      Skinny fact tables

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    2/26

     

    Keys

    • hoose the data warehouse keys to be

    meaningless surrogate keys! "et a surrogate key be a simple integer

    ! #$byte ($$$$$$$$%$$$$$$$$%$$$$$$$$%$$$$$$$$)

    can contain &'& values ( & billion positiveintegers% starting with )

    Keys

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    3/26

     

    Key

    • *se surrogate keys also for the Time dimension

    ! +,"$based date key% is typically - bytes% so # bytesare wasted! bypassing .oins leads to embedding knowledge of thecalendar in the application% rather than reading itfrom the time dimension

    ! it is not possible to encode a data stamp as /0 do notknow1% /0t has not happen yet1% etc

    • Avoid smart keys• Avoid production keys! production may decide to reuse keys! the company may acquire a competitor and therebychange the key building rules

    ! changed record% but deliberately not changed key

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    4/26

     

    2ntity$relationship modelling$ a logical design technique to eliminate data

    redundancy to keep consistency and storageefficiency

    $ makes transaction simple and deterministic$ 23 models for enterprise are usually complex%

    e4g4 they often have hundreds% or eventhousands% of entities5tables

    6imensional modelling$ a logical design technique that present data in a

    intuitive way and that allow high$performanceaccess$ aims at model decision support data$ easier to navigate for the user and high

    performance

    Dimensional modelling vs. ER-modelling

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    5/26

     

    Why dimensional modelling?

    • the logical model is easy understand• a predictable standard framework for end user

    applications• the logical design can be done nearly independent of

    expected query pattern• handle changes easy $ at least adding new dimensional

    attributes• high performance /browsing1 across the attributes%

    eliminating .oins and make use bit vector indexes• strategy to handling aggregates% e4g4 summery records that

    are logical redundant with base table to enhance query

    performance• the database engine can make strong assumption how tooptimise

    • strategies for handling slowly changing dimensions%heterogeneous products% event$handling (/factless fact

    tables1)

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    6/26

     

    Steps in the Design Process

    4 hoose a business process to model

    A business process is a ma.or operational process in an organisation% that is supported by some kind

     of a legacy system(s) from which data can be

     collected% e4g4% orders% invoices% shipments%

     inventory4

    &4 hoose the grain of the business process

    The grains is the level of detail at which the data isrepresented in the 674 Typical grains are

    individual transactions% individual daily (monthly)

    snapshots4

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    7/26

     

    eps n e es gn rocess

    '4 hoose the dimensions that will apply toeach fact table record4 Typicaldimensions are time% product% customer%

    store% etc4

    #4hoose the measured facts that will

    populate fact table 24g4% quantity sold%dollars sold

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    8/26

     

    Eample of a Dimensional !odel

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    9/26

     

    "ggregation• Aggregations can be created on$the$fly or by theprocess of pre$aggregation

    • An aggregate is a fact table record representing asummarisation of base$level fact table records

    ! ategory$level product aggregates by store by day! 6istrict$level store aggregates by product by day! 8onthly sales aggregates by product by store! ategory$level product aggregates by store district

    by day! ategory$level product aggregates by store districtby month

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    10/26

     

    #$%P vs. #$"P

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    11/26

     

    How to store Aggregates?

     

    • as new "evel fields in an already existing

    Fact table

    • as new fact tables

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    12/26

     

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    13/26

     

    "n Eample

    9 :sold napkin5day9 :sold tissue5day9 :sold paper5day

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    14/26

     

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    15/26

     

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    16/26

     

    ;ew Fact table for Aggregates

    The creation of aggregate fact tablerequires the creation of<a derivative dimensionan artificial key for each new derivative dimension

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    17/26

     

    &o' to store "ggregates

    • as new "evel fields in an already existing Fact table

    ! problems with double count! visible for the users

    • as new fact tables

    = no problems with double count= invisible for the users= are easily introduced and5or reduced at differentpoints in time

    = simpler metadata= simpler choice of key= the si>e of the field for the summarised data doesnot increase the si>e of the field for the basic data

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    18/26

     

    "ggregation (avigator 

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    19/26

     

    "n Eample of S)$ )*ery

    A E l f SQL Q (2)

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    20/26

     

     An Example of SQL Query (2)

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    21/26

     

     Aggregation Navigator 

    • 0nsulates end user applications from thechanging portfolio of aggregates

    • Allows the 6?A to dynamically andseamlessly for the end user ad.ust theaggregates without having to roll overthe applications base

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    22/26

     

     Aggregation Summary

    • @re$aggregation demands more storagespace but provides better queryperformance

    • "owest level of aggregation is determinedby the granularity of the fact table

    • Aggregation is easier when facts are alladditive

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    23/26

     

    Fats an! A!!itive "roperty

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    24/26

     

    Semia!!itive fat Example

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    25/26

     

    Numeri #easures of $ntensity

  • 8/17/2019 DDB_presentation3Skinny Fact Tables

    26/26

     

    En!