03 data warehousing

Upload: kiran

Post on 09-Apr-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/8/2019 03 Data Warehousing

    1/17

    Data Warehousing

    Chapter # 3

    Carlo Vercellis

  • 8/8/2019 03 Data Warehousing

    2/17

    Data Warehouse data warehouse is the foremost

    repository for the data available fordeveloping business intelligencearchitectures and decision supportsystems.

    Data Warehousing indicates the wholeset of interrelated activities involvedin designing, implementing and usinga data warehouse.

  • 8/8/2019 03 Data Warehousing

    3/17

    Data Sources of Data Warehouse Internal Data

    External Data

    Personal Data

  • 8/8/2019 03 Data Warehousing

    4/17

    Internal Data

    Transactional or Operational Systemscapturing day to day businessprocesses back-ofce systems, that collect basic

    transactional records such as orders, invoices,inventories, production and logistics data;

    front-ofce systems, that contain dataoriginating from call-center activities,customer assistance, execution of marketingcampaigns;

    web-based systems, that gather salestransactions on e-commerce web sites, visitsto websites, data available on forms lled out

    by existing and prospective customers.

  • 8/8/2019 03 Data Warehousing

    5/17

    External Data Data present externally to an organization

    For example:

    Some agencies gather and make available

    data relative to sales, market share andfuture trend predictions for specicbusiness industries, as well as economicand nancial indicators.

    Other agencies provide data market

    surveys and consumer opinions collectedthrough questionnaires.

    Geographic Information Systems (GIS),which represent a set of applications foracquiring, organizing, storing and

    presenting territorial data.

  • 8/8/2019 03 Data Warehousing

    6/17

    Personal Data Information and Personal

    Assessments stored inside

    worksheets or local databases locatedin their computers.

  • 8/8/2019 03 Data Warehousing

    7/17

    OLTP versus OLAP Software applications that are at the heart

    of operational systems are referred to ason-line transaction processing (OLTP).

    The whole set of tools aimed at performingbusiness intelligence analyses andsupporting decision-making processes goby the name of on-line analytical

    processing (OLAP).

    We can therefore assume that the function of a data warehouseis to provide input data to OLAP applications.

  • 8/8/2019 03 Data Warehousing

    8/17

  • 8/8/2019 03 Data Warehousing

    9/17

    Data MartsFunctional or departmental data

    warehouse of a smaller size and a

    more specic type than the overallcompany data warehouse.

  • 8/8/2019 03 Data Warehousing

    10/17

  • 8/8/2019 03 Data Warehousing

    11/17

    Architecture & Functions of a DW

  • 8/8/2019 03 Data Warehousing

    12/17

    Data Warehouse: Implementation

    Approach

    Top-down. The top-down methodology is based on the overalldesign of the data warehouse, and is therefore more

    systematic. However, it implies longer development times andhigher risks of not being completed within schedule since thewhole data warehouse is actually being developed.

    Bottom-up. The bottom-up method is based on the use ofprototypes and therefore system extensions are madeaccording to a step-by-step scheme. This approach is usuallyquicker, provides more tangible results but lacks an overallvision of the entire system to be developed.

    Mixed. The mixed methodology is based on the overall design ofthe data warehouse, but then proceeds with a prototypingapproach, by sequentially implementing different parts of the

    entire system. This approach is highly practical and usuallypreferable, since it allows small and controlled steps to be

  • 8/8/2019 03 Data Warehousing

    13/17

    ETL Tools Extraction. During the rst phase, data are extracted

    from the available internal and external sources.

    Transformation. The goal of the cleaning andtransformation phase is to improve the quality of the

    data extracted from the different sources, through thecorrection of inconsistencies, inaccuracies and missingvalues.

    Loading. Finally, after being extracted andtransformed, data are loaded into the tables of the

    data warehouse to make them available to analystsand decision support applications.

  • 8/8/2019 03 Data Warehousing

    14/17

    Dimensional Model: Star Schema

  • 8/8/2019 03 Data Warehousing

    15/17

    Three-Dimensional Cube

  • 8/8/2019 03 Data Warehousing

    16/17

    Hierarchies of concepts and OLAP

    operations Drill-Down

    Drill Up

    Slice & Dice Pivot

  • 8/8/2019 03 Data Warehousing

    17/17

    Materialization of cubes of data OLAP analyses developed by knowledge

    workers may need to access theinformation associated with severalcuboids, based on the specic queries andanalyses being carried out. In order toguarantee adequate response time, it mightbe useful to design a data warehousewhere all (or at least a large portion of)values of the measures of interest

    associated with all possible cuboids arepre-calculated. This approach is termedfull materialization of the informationrelative to the data cubes.