01 introduction to dw

Upload: balasubramanian-sharma

Post on 02-Apr-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/27/2019 01 Introduction to DW

    1/64

    Systech Education

    DW Concepts

    Introduction to DW

    Author: Deepak Natarajan &Aditya Gollapudi

  • 7/27/2019 01 Introduction to DW

    2/64

    Introduction to DW Systech Education 2

    Course Agenda

    Session 1 What is DW?

    Elements of DW

    Session 2 OLTP vs. OLAP

    Types of OLAP Session 3

    Implementing Life Cycle

  • 7/27/2019 01 Introduction to DW

    3/64

    Introduction to DW Systech Education 3

    Purpose

    The purpose of this module is to give an insightinto the basic concepts and terminology of data

    warehousing and business intelligence

  • 7/27/2019 01 Introduction to DW

    4/64

    Introduction to DW Systech Education 4

    Objective

    Upon completion of this chapter a participant can

    Define a Data warehouse.

    Understand the elements of DW.

    Differentiate between OLTP and OLAP system.

    Explain the types of OLAP system

    Understand the implementation life cycle

  • 7/27/2019 01 Introduction to DW

    5/64

    Systech Education

    What is DW?

    Data Warehouse

  • 7/27/2019 01 Introduction to DW

    6/64

    Introduction to DW Systech Education 6

    The ProblemIBM

    VAX

    Freds PC

    Hunter #?

    WLIC

    WASTE

    ?

    ?

    HP

    SI ERS

    ?

  • 7/27/2019 01 Introduction to DW

    7/64

    Introduction to DW Systech Education 7

    The Solution

    DATAWAREHOUSE

    meta data

  • 7/27/2019 01 Introduction to DW

    8/64

    Introduction to DW Systech Education 8

    Data Warehouse

    A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in

    support of management's decision making process.

  • 7/27/2019 01 Introduction to DW

    9/64

    Introduction to DW Systech Education 9

    Data Warehouse

    Subject Oriented:

    Data that gives information about a particular subject

    instead of about a company's ongoing operations.

    Integrated:

    Data that is gathered into the data warehouse from a

    variety of sources and merged into a coherent whole.

  • 7/27/2019 01 Introduction to DW

    10/64

    Introduction to DW Systech Education 10

    Data Warehouse

    Time-variant:

    All data in the data warehouse is identified with a

    particular time period.

    Non-volatile

    Data is stable in a data warehouse. More data is added

    but data is never removed. This enables management togain a consistent picture of the business.

  • 7/27/2019 01 Introduction to DW

    11/64

    Systech Education

    Elements of DW

    Source SystemStaging AreaDimension ModelData MartData Warehouse ODSEDWKimballs ApproachInmons Approach

  • 7/27/2019 01 Introduction to DW

    12/64

    Introduction to DW Systech Education 12

    Source System

    An operational system of record whose function isto capture the transactions of the business.

    Characteristics of Source System:

    Priorities are uptime and availability.

    Queries against source system are narrow and severely

    restricted. Maintains little historical data.

  • 7/27/2019 01 Introduction to DW

    13/64

    Introduction to DW Systech Education 13

    Staging Area

    It is a storage area and set of processes that clean,transform, combine, de-duplicate, household,

    archive and prepare source data for use in the datawarehouse.

    Characteristics of Staging Area:

    It is layered between the source system and presentationserver.

    It does not provide query and presentation services.

  • 7/27/2019 01 Introduction to DW

    14/64

    Introduction to DW Systech Education 14

    Dimensional Model

    It is a technique for modeling data that is alternative toentity-relationship (E-R) modeling. A dimensional model

    contains the same information as an E-R model butpackages the data in a symmetric format.

    Components Of Dimensional Model Fact Table

    Is the primary table in each dimensional model that is meant to contain

    measurements of the business. Dimension Table

    Each dimension is defined by its primary key that serves as the basisfor referential integrity with any given fact table.

  • 7/27/2019 01 Introduction to DW

    15/64

    Introduction to DW Systech Education 15

    Data Mart

    A logical subset of the complete data warehouse. Adata mart is usually built for a single part of the

    business and organized around single businessprocess.

    Every data mart must be represented by a dimensional

    model. Basis for top-down and bottom-up approach in data

    warehouse.

  • 7/27/2019 01 Introduction to DW

    16/64

    Introduction to DW Systech Education 16

    Data Warehouse

    The queryable source of data in the enterprise. Thedata warehouse is the union of all the constituent

    data marts. Historical data is maintained.

    Data is fed from the data staging area.

    It is also frequently updated on a controlled load basis asdata is corrected, snapshots are accumulated and label arechanged.

  • 7/27/2019 01 Introduction to DW

    17/64

    Introduction to DW Systech Education 17

    Operational Data Store

    An ODS is an integrated database of operationaldata. Its sources include legacy systems and it

    contains current or near term data. An ODS maycontain 30 to 60 days of information, while a datawarehouse typically contains years of data.

  • 7/27/2019 01 Introduction to DW

    18/64

    Introduction to DW Systech Education 18

    Operational Data Store

    An ODS is usually designed to contain low level oratomic (indivisible) data such as transactions and

    prices.

    Only Data warehouse contains aggregate data.

  • 7/27/2019 01 Introduction to DW

    19/64

    Introduction to DW Systech Education 19

    Enterprise Data Warehouse

    An Enterprise Data Warehouse is a data warehousecontaining all publishable quality data of a

    permanent nature collected by an organization. Thisinevitably includes historic data from multiple datasources.

  • 7/27/2019 01 Introduction to DW

    20/64

    Introduction to DW Systech Education 20

    Operational transaction data is usually excluded dueto its volatile nature. Enterprise data warehouses are

    valuable resources, are costly to construct, andrequire a long time to evolve.

  • 7/27/2019 01 Introduction to DW

    21/64

    Introduction to DW Systech Education 21

    Kimballs Approach

    Start with clearly defined user requirements.

    Build a subject area at a time based on a star

    schema. The data warehouse is the union of all the data

    marts but only if the dimensions conform across allthe fact tables.

    Before loading the facts and dimensions, firstconcentrate on the staging area.

  • 7/27/2019 01 Introduction to DW

    22/64

    Introduction to DW Systech Education 22

    Inmons Approach

    Advocates normalized enterprise data warehouse.

    Time variant data structures.

    Dont be concerned about requirements too much

    up front.

    Build it and they will come.

  • 7/27/2019 01 Introduction to DW

    23/64

    Systech Education

    OLTP vs. OLAP

    OLTPER Model

    OLAPER Model

    Relationship between ER Model and Dimension ModelAdvantages of Dimension Modeling

  • 7/27/2019 01 Introduction to DW

    24/64

    Introduction to DW Systech Education 24

    OLTP - ER Model

    Logical design technique that seeks to eliminatedata redundancy

    conceptual data model that views the real world asentities and relationships

    Entity-Relationship diagram is used to visually

    represents data objects

  • 7/27/2019 01 Introduction to DW

    25/64

    Introduction to DW Systech Education 25

    Facts about ER Models

    The ER modeling technique is a discipline used toilluminate the microscopic relationships among data

    elements.

    The highest art form of ER modeling is to removeall redundancy in the data.

    This is immensely beneficial to transactionprocessing because transactions are made verysimple and deterministic.

  • 7/27/2019 01 Introduction to DW

    26/64

    Introduction to DW Systech Education 26

    Example:- The transaction of updating a customer'saddress may devolve to a single record lookup in a

    customer address master table. This lookup iscontrolled by a customer address key, which definesuniqueness of the customer address record andallows an indexed lookup that is extremely fast.

    It is safe to say that the success of transactionprocessing in relational databases is mostly due tothe discipline of ER modeling

  • 7/27/2019 01 Introduction to DW

    27/64

    Introduction to DW Systech Education 27

  • 7/27/2019 01 Introduction to DW

    28/64

    Introduction to DW Systech Education 28

    OLAPDimensional Modeling

    Logical design technique that seeks to present thedata in a standard framework that is intuitive and

    allows for high performance access.

    Every dimensional model is composed of

    One table with a multi part key called the fact table

    A set of smaller tables called dimension tables

  • 7/27/2019 01 Introduction to DW

    29/64

    Introduction to DW Systech Education 29

  • 7/27/2019 01 Introduction to DW

    30/64

    Introduction to DW Systech Education 30

    The Relation between Dimensional

    Modeling and Entity-RelationshipModeling

    A single entity relationship diagram breaks downinto multiple fact table diagrams

    ER diagrams are useful, but they are meant to be

    viewed in small sections, not all at once.

  • 7/27/2019 01 Introduction to DW

    31/64

    Introduction to DW Systech Education 31

    OLTP OLAP

    OLTP is a class of program thatfacilitates and manages transaction-oriented applications, typically for

    data entry and retrieval transactions

    OLAP enables a user to easily andselectively extract and view datafrom different points-of-view.

    Source of data:

    Operational data; OLTPs are theoriginal source of the data

    Source of data:

    Consolidation data;

    OLAP data comes from the various

    OLTP databases

  • 7/27/2019 01 Introduction to DW

    32/64

    Introduction to DW Systech Education 32

    OLTP OLAP

    Purpose of Data:

    To control and run

    fundamental business tasks

    Purpose of Data:

    To help with planning,

    problem solving, and

    decision support

    What the data reveals:

    A snapshot of ongoing business

    processes

    What the data reveals:

    Multi-dimensional views of variouskinds of business activities

  • 7/27/2019 01 Introduction to DW

    33/64

    Introduction to DW Systech Education 33

    OLTP OLAP

    Inserts and Updates:

    Short and fast inserts

    and updates initiated

    by end users

    Inserts and Updates:

    Short and fast inserts

    and updates initiated

    by end users

    Queries: Relatively standardized

    and simple queries returning

    relatively few records

    Queries: Often complex queries

    involving aggregations

  • 7/27/2019 01 Introduction to DW

    34/64

    Introduction to DW Systech Education 34

    OLTP OLAP

    Processing speed: Typically very fast Processing speed: Depends on the

    amount of data involved;

    batch data refreshes and

    complex queries may

    take many hours;query speed can be

    improved by creating indexes

    Space requirements: Can berelatively small

    if historical data is archived

    Space requirements: Larger due tothe existence of aggregationstructures and history data;

    requires more indexes

    than OLTP

  • 7/27/2019 01 Introduction to DW

    35/64

    Introduction to DW Systech Education 35

    OLTP OLAP

    Database design: Highlynormalized with many tables

    Database design: Typically de-normalized with fewer tables;

    use of star and/or snowflake

    schemas

    Data Access:Very frequent access;small quantities of data peroperation;

    Reading, writing,

    modifying, deletion

    Data Access: Moderate accessfrequency;large quantities of data;

    predominantly reading

    operations

  • 7/27/2019 01 Introduction to DW

    36/64

    Introduction to DW Systech Education 36

    OLTP OLAP

    Backup and recovery: Backupreligiously; operational data iscritical to run the business,

    data loss is likely to entailsignificant monetary

    loss and legal liability

    Backup and recovery: Instead ofregular backups,

    some environments

    may consider simply

    reloading the OLTP data

    as a recovery method

  • 7/27/2019 01 Introduction to DW

    37/64

    Introduction to DW Systech Education 37

    Steps Involved in Converting ER Model

    to Dimensional Model The first step in converting an ER diagram to a set of DM

    diagrams is to separate the ER diagram into its discrete

    business processes and to model each one separately. The second step is to select those many-to-many

    relationships in the ER model containing numeric andadditive nonkey facts and to designate them as fact tables.

    The third step is to denormalize all of the remaining tablesinto flat tables with single-part keys that connect directly tothe fact tables. These tables become the dimension tables.

  • 7/27/2019 01 Introduction to DW

    38/64

    Introduction to DW Systech Education 38

    Structure of a Dimensional Model evolved

    from an ER Model The master DM model of a data warehouse for a

    large enterprise will consist of somewhere between

    10 and 25 very similar-looking star join schemas. Each star join will have four to 12 dimension tables.

    If the design has been done correctly, many of these

    dimension tables will be shared from fact table tofact table.

  • 7/27/2019 01 Introduction to DW

    39/64

    Introduction to DW Systech Education 39

    Dimension Model and Drilling

    Drilling Down means adding more dimensionattributes to the SQL answer set from within a

    single star join. Drilling Up means removing more dimension

    attributes from the SQL answer set within a singlestar join.

    Drilling Across means linking separate fact tablestogether through the conformed (shared)dimensions.

  • 7/27/2019 01 Introduction to DW

    40/64

    Introduction to DW Systech Education 40

    Advantages of Dimensional Modeling

    predictable, standard framework

    predictable framework of the star join schema

    withstands unexpected changes in user behavior

    extensible to accommodate unexpected new dataelements and new design decisions

    administrative utilities and software processes helpsto manage and use aggregates

  • 7/27/2019 01 Introduction to DW

    41/64

    Systech Education

    Types of OLAP

    ROLAP

    MOLAPHOLAP

  • 7/27/2019 01 Introduction to DW

    42/64

    Introduction to DW Systech Education 42

    Types of OLAP

    ROLAP Relational OLAP. ROLAP systems store data in the

    relational database. MOLAP

    Multidimensional OLAP. MOLAP systems store data inthe multidimensional cubes.

    HOLAP HOLAP technologies attempt to combine the advantages

    of MOLAP and ROLAP.

  • 7/27/2019 01 Introduction to DW

    43/64

    Introduction to DW Systech Education 43

    ROLAP

    Relational Viewers

    SQL-

    Read

    SourceSystems

    DataWarehouseServer

    Clients

    Relational Data

  • 7/27/2019 01 Introduction to DW

    44/64

    Introduction to DW Systech Education 44

    MOLAP

    Multi Dimensional Viewers

    Multi Dimensional Data

    RDBMSServer

    MDBMSServer

    Clients

    SQL-Read

  • 7/27/2019 01 Introduction to DW

    45/64

    Introduction to DW Systech Education 45

  • 7/27/2019 01 Introduction to DW

    46/64

    Introduction to DW Systech Education 46

    HOLAP

    Multi Dimensional Viewers

    Multi Dimensional Data

    SQL ReachTrough

    RDBMSServer

    MDBMSServer

    Clients

  • 7/27/2019 01 Introduction to DW

    47/64

    Systech Education

    Implementing Life Cycle

    PlanningBusiness Requirements DefinitionDimensional Modeling

    Physical DesignData Staging Design and DevelopmentTechnical Architecture Design

  • 7/27/2019 01 Introduction to DW

    48/64

    Systech Education

    Implementing Life Cycle

    (Contd..)

    Product Selection and InstallationEnd User Application SpecificationEnd User Application Development

    DeploymentMaintenance and GrowthProject Management

  • 7/27/2019 01 Introduction to DW

    49/64

    Introduction to DW Systech Education 49

    Lifecycle Approach

    Successful implementation of a data warehousedepends on the appropriate integration of

    numerous tasks and components. You need to coordinate the many facets of a data

    warehouse and demonstrate strength across allaspects of the project for success.

    The Business Dimensional Lifecycle ensures thatthe project pieces are brought together in the rightorder and at the right time.

  • 7/27/2019 01 Introduction to DW

    50/64

    Introduction to DW Systech Education 50

    Business Dimensional Lifecycle

    ProjectPlanning

    BusinessRequirementDefinition

    TechnicalArchitecture

    Design

    DimensionalModeling

    End-UserApplication

    Specification

    ProductSelection &Installation

    PhysicalDesign

    End-UserApplication

    Development

    Data StagingDesign &

    Development

    Deployment Maintenance& Growth

    Project Management

    Technology

    Track

    Data

    Track

    Application

    Track

  • 7/27/2019 01 Introduction to DW

    51/64

    Introduction to DW Systech Education 51

    Project Planning

    The lifecycle begins with project planning.

    It addresses the definition and scoping of the data

    warehouse project, including early critical tasks likereadiness assessment and business justification.

    Then project planning focuses on resource and skill-levelstaffing requirements coupled with project task

    assignments, duration and sequencing. Project planning is dependent on the business requirements,

    as denoted by the two-way arrow between the activities.

  • 7/27/2019 01 Introduction to DW

    52/64

    Introduction to DW Systech Education 52

    Business Requirements Definition

    A data warehouses success is greatly increased by a sound

    understanding of the business end users and their analytical

    requirements. The designers must understand the key factors driving the

    business to effectively determine business requirements andtranslate them into design considerations.

    The business requirements establish the foundation for thethree parallel tracks focused on technology, data and enduser applications.

  • 7/27/2019 01 Introduction to DW

    53/64

    Introduction to DW Systech Education 53

    Dimension Modeling

    The definition of the business requirementsdetermines the data needed to address business

    users analytical requirements. Then data models to support these analyses are

    designed by the construction a matrix thatrepresents key business processes and their

    dimensionality which will serve as a blueprint toensure that the data warehouse is extensible acrossthe organization over time.

  • 7/27/2019 01 Introduction to DW

    54/64

    Introduction to DW Systech Education 54

    Dimension Modeling (Cont.)

    Coupling this data analysis with our earlierunderstanding of the business requirements, we

    then develop a dimensional model with a fact tablegrain, associated dimensions, attributes, andhierarchical drill paths and facts.

    The logical database design is completed with the

    appropriate table structure and primary/foreign keyrelationships and also the preliminary aggregationplan is also developed.

  • 7/27/2019 01 Introduction to DW

    55/64

    Introduction to DW Systech Education 55

    Physical Design

    Physical database design focuses on defining thephysical structures necessary to support the logical

    database design. Primary elements include defining the naming

    standards and setting up the database environment.

    Preliminary indexing and partitioning strategies arealso determined.

  • 7/27/2019 01 Introduction to DW

    56/64

    Introduction to DW Systech Education 56

    Data Staging Design & Development

    The data staging process has three major steps:

    Extraction, Transformation and Load

    The extract process always exposes data quality issues thathave been buried within the operational data store.

    If not addressed properly they can significantly impact thecredibility of the data warehouse.

    Also two warehouse staging processes need to be build: One for the initial population of the data warehouse.

    Another for the regular, incremental loads.

  • 7/27/2019 01 Introduction to DW

    57/64

    Introduction to DW Systech Education 57

    Technical Architecture Design

    The technical architecture design establishes theoverall architecture framework and vision.

    Three factors must be considered simultaneously toestablish the data warehouse technical architecturedesign:

    Business requirements, current technical environment,and planned strategic technical directions.

  • 7/27/2019 01 Introduction to DW

    58/64

    Introduction to DW Systech Education 58

    Product Selection and Installation

    Using the technical architecture design as framework,specific architectural components such as the hardwareplatform, database management system, data staging tool,or data access tool will need to be evaluated and selected.

    A standard technical evaluation process is defined alongwith specific evaluation factors for each architecturalcomponent.

    Once the product has been evaluated and selected, they arethen installed and thoroughly tested to ensure end-to-endintegration with the data warehouse environment.

  • 7/27/2019 01 Introduction to DW

    59/64

    Introduction to DW Systech Education 59

    End User Application Specification

    A set of standard end user application is usuallydefined since not all business users need ad hoc

    access to the data warehouse. Application specification describe the report

    template, user driven parameters, and requiredcalculations.

    These specifications ensure that the developmentteam and the business users have a commonunderstanding of the applications to be delivered.

  • 7/27/2019 01 Introduction to DW

    60/64

    Introduction to DW Systech Education 60

    End User Application Development

    The development of the end user applicationsinvolves configuring the tool metadata and

    constructing the specified reports. Optimally these applications are build using an

    advanced data access tool that provides significantproductivity gains for the development team.

    In addition, it offers a powerful mechanism forbusiness users to easily modify existing reporttemplates.

  • 7/27/2019 01 Introduction to DW

    61/64

    Introduction to DW Systech Education 61

    Deployment

    Deployment represents the convergence of technology,data, and end user applications accessible from the businessusers desktop.

    Extensive planning is required to ensure that these puzzlepieces fit together properly.

    Business users education integrating all aspects of theconvergence must be developed and delivered.

    In addition, user support and communication or feedbackstrategies should be established before any business usershave access to the data warehouse.

  • 7/27/2019 01 Introduction to DW

    62/64

    Introduction to DW Systech Education 62

    Maintenance & Growth

    Focus on the business users must continue byproviding them with ongoing support and

    education. Also focus on the backroom must continue to

    ensure that the processes and procedures are in

    place for effective ongoing operations of the datawarehouse.

  • 7/27/2019 01 Introduction to DW

    63/64

    Introduction to DW Systech Education 63

    Project Management

    Used to ensure that Business Dimensional Lifecycleactivities remain on track and in sync.

    Activities include: Monitoring project status

    Issue tracking

    Change control to preserve scope boundaries

    The development of a comprehensive projectcommunication plan that addresses both the businessand information systems organizations

  • 7/27/2019 01 Introduction to DW

    64/64

    Guidelines for using the Business

    Dimension Life Cycle Business Lifecycle diagram identifies:

    High level task sequencing

    Activities that should be happening concurrentlythroughout the technology data

    Application tracts

    Focus on sequencing and concurrency and notabsolute timelines.