basics of data warehousing, mis and etl

36
Data Warehousing, MIS & ETL Overview SivaSatish.K

Upload: ravi-vakula

Post on 06-Apr-2015

266 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Basics of Data Warehousing, MIS and ETL

Data Warehousing, MIS & ETL

Overview

SivaSatish.K

Page 2: Basics of Data Warehousing, MIS and ETL

Data Warehousing

Page 3: Basics of Data Warehousing, MIS and ETL

Data & Information

Data, in real terms is raw and has no facts behind its existence.

On the other hand, information is the processed raw data (as per the business requirement) and supported by facts.

Example : For an example, Management requires “Number of booking (booking creation trend) for Asia in year 2007”. Whenever this data will be retrieved from data sources as per business requirements (any input system) and stored in data base, this would be information to the management.

Page 4: Basics of Data Warehousing, MIS and ETL

Data Warehousing

It stores the data related to the subject area of the organizations and controlled/owned by centrally coordinated organizations. – In other words, it’s a relational database which is

designed for performing analysis and execution of queries. It contains historical data also.

– Technique to implement data warehouse is known as “Data Warehousing”

Page 5: Basics of Data Warehousing, MIS and ETL

Why do we need data warehousing ?

History

Page 6: Basics of Data Warehousing, MIS and ETL

Reasons:

In OLTP approach, historical data was archived to update the recent information in the system and even it is bit difficult to update that information everywhere in OLTP. Unlike OLTP; DW stores historical data to compare trends over period of time (now a days, it’s in Tera bytes)

As it has one DB at the end, number of sources can be used to transfer data and sources can be relational DB, Flat File, Unix file. ETL tool handles these source files to transfer data from source to target.

Page 7: Basics of Data Warehousing, MIS and ETL

Continued…..

DW stores the necessary data for specific analysis; not jus all data.

Small companies focus on one area so they can have data in data marts ; on the contrary large companies can have data in data warehouse and data marts to facilitate their users’ need.

DW is commonly used by organizations to analyze trends over time. Trends can be of Revenue/Sales/Purchases/ BL generation etc.

Page 8: Basics of Data Warehousing, MIS and ETL

Continued……

Most importantly, Though Organizations may use data warehousing to view day-to-day operations but primarily DW is to facilitate strategic planning resulting from long-term data overviews and these overviews, are the pillars for next business models, forecasts and other reports.

Page 9: Basics of Data Warehousing, MIS and ETL

Approaches:

Normalized Approach Dimensional approach or De - Normalized

approach

Page 10: Basics of Data Warehousing, MIS and ETL

Normalized Vs Dimensional Approach:

Firslty, any of these approaches can be used to store the data to the data warehouse but objective makes a difference to implement these.

In normalized approach, data element is completely ignored for data warehouse.

With dimensional approach, multidimensional model with dimensional approach came into picture. Here, processed data or transaction data are partitioned into "facts" and “dimensions”.

Page 11: Basics of Data Warehousing, MIS and ETL

Dimensional Approach continued…...

Facts in simple meaning are ‘measures’ (figures such as # of booking created, Container quantity, Revenue, Gross Weight etc)

Where as Dimensions are the entity to which these facts belong such as Customer name, continent, Week, Month)

Hence, MIS is an example of Dimensional approach.

Page 12: Basics of Data Warehousing, MIS and ETL

Continued…with an example

One Dimensional:-– For SVVD (PAX-ANX-054 W), there are 4572

containers. Here, SVVD is the dimension and 4572 (container qty) is the measure i.e. fact.

– For Shipper name “ABC 123”, revenue is 1.2 million $. Here, shipper is the dimension and revenue is the measure i.e. fact corresponds to dimensions.

Page 13: Basics of Data Warehousing, MIS and ETL

Continued….

Multidimensional example:– What is the gross weight for 47GP Containers

traveling from New York to Singapore with shipment direction “E” in year 2006?

– Here in this example, Gross weight is the only measure (Fact) and container type (47GP), Load Port (New York) and Discharge Port (Singapore), shipment direction (E) and Year (2006) are various dimensions.

Page 14: Basics of Data Warehousing, MIS and ETL

Characteristics : Data Warehouse

Subject Oriented Integrated Non Volatile Time Variant

Page 15: Basics of Data Warehousing, MIS and ETL

Continued……

Subject oriented in a way that it revolves around the subject area of corporate such as “Booking”, if management wants to see booking pattern of last quarter like which country has maximum booking and shipment started ports.

Integrated in a way that database should avoid naming conflicts and inconsistencies among keywords used and somehow related to subject oriented feature also.

Page 16: Basics of Data Warehousing, MIS and ETL

Continued…..

Non volatile in a way that once stored in the database, data should not be changed as it depicts what has occurred in the past.

Time Variant in a way that, In order to compare the trends in business over the past years, managements/decision makers need large amounts of data. A data warehouse's focus on change over time is what is meant by the term time variant which is unlike OLTP, where performance requirements demand that historical data be moved to an archive.

Page 17: Basics of Data Warehousing, MIS and ETL

Architecture of Data Warehouse

Data warehouse basic Flow:

Data Sources:

-Relation data source-Flat File

End Users Analysis

Meta Data

Raw Data

Summarized data

End Users Analysis

End Users Analysis

Page 18: Basics of Data Warehousing, MIS and ETL

Terminologies: Freq. used in DW

Source : Data where it is coming from Target : Destination database or data mart Test director Test management and defect

management tool from Mercury Sunopsis/ Informatica : ETL dev tool used to

extract, load and transformation Process Packages, Scenarios, Interfaces,

procedures : Different objects and their hierarchy in Sunopsis ETL tool

Page 19: Basics of Data Warehousing, MIS and ETL

MIS : Management Information system

Page 20: Basics of Data Warehousing, MIS and ETL

MIS: Management Information system

Important Module in terms of strategic decisions:– As most of the strategic decisions based on the

information.– As management decisions are based on the

information (information is nothing, just the processed data), processed data has been stored in database in more organized and structured manner.

MIS deals only with Data (back- end) It is treated as DSS ( Decision support system)

Page 21: Basics of Data Warehousing, MIS and ETL

Continued…

As said, deals only with Data; MIS is a Data Warehouse used to store the data for various modules such as Shipment, Equipment, DnD, AR and Cost etc.

Page 22: Basics of Data Warehousing, MIS and ETL

MIS : Architecture

Pictorial Representation:

CSSSybase

ODS Sybase

Gemstone Sybase

CPF Sybase

CDW OracleWare house

EQPMart

AR Mart

CostMart

ShpmtMart

CDW(ETL)

Data mart(ETL)

OLAP Cubes

Server

Cognos Repository

BI Reports

BI Client

Page 23: Basics of Data Warehousing, MIS and ETL

The components and the purpose :

OSCAR – One of the source systems for the master and transactional data

Middleware – The Sunopsis ETL tool is used as the Middleware tool to Extract, Transform and Load the source data into the target systems

CDW (centralized Data Warehouse) – This is one of the target systems, which holds the data for all the Marts (Shipment, Cost, Trade PL, AR and Equipment) which is used for reporting

Page 24: Basics of Data Warehousing, MIS and ETL

Types of Testing

Data Functional If Require, Middleware

Page 25: Basics of Data Warehousing, MIS and ETL

Data Testing includes:

Initial data Testing, Data validation, Data quality, Data integration and Data volume testing. – Data Validation ensures Null/not null values,

duplicate records, behavior of surrogate keys/primary keys, Data type and format etc.

– Data quality testing ensures the accurate information in dimension and fact tables.

– Initial testing means verifying the data in CDW, Master tables which will be populated through OSCAR and its web application (Front End System)

Page 26: Basics of Data Warehousing, MIS and ETL

Continued….

– Data Integration testing ensures the integrity of data in target system if data populated from more than one source system.

– Moreover, if any change to the data happens through UI, then possibly test the same changes to the target data warehouse.

– Data Volume testing ensures that Volume of data is available from the OSCAR Test environment then perform data volume testing

Page 27: Basics of Data Warehousing, MIS and ETL

Functional Testing

Number of records from source system (OSCAR) for each individual dimension and facttable should be tested into target data warehouse, sometimes it may differ due to join conditions between tables in source system.

Testing will be done in target data warehouse after data loads from Source system to target system.

If any changes occur in data source, check corresponding changes in target data warehouse.

Page 28: Basics of Data Warehousing, MIS and ETL

Continued…..

Application specific features to be tested according to business rules/ transformation rules in target data warehouse. (E.g. booking id should be system generated unique identifier).

Page 29: Basics of Data Warehousing, MIS and ETL

Middleware (sunopsis) Testing

Verify the connection between source & target database.

If needed, Verify Source to stage and stage to target mappings and based on that only test what is in source and what should be in target

If an errors occurs, Check the error notification through Open view

For some target tables, there are two different packages/scenarios (for source to stage and stage to target), run both packages manually

Page 30: Basics of Data Warehousing, MIS and ETL

Continued….

Performance testing / Fail-over testing: Test performance and fail-over of Middleware with huge volume of data (If possible and Volume of data will be available from the OSCAR Test environment)

Page 31: Basics of Data Warehousing, MIS and ETL

Test cycle

Subject areas for master target tables: Customer, Geography, Organization, Commodity, Product, Vendor/Tariff, Service contract and Equipment.

Subject areas for transactional target tables: B/L, Booking, Container Movement, Empty poisoning plan, Job Order and Job order Invoice (Transportation/Non-Transportation)

Page 32: Basics of Data Warehousing, MIS and ETL

ETL

Overview

Page 33: Basics of Data Warehousing, MIS and ETL

Why ETL tool ?

While storing the data to the target database, data will be processed according to the various requirements of business (like ignore a particular country, region etc) and later on saved through ETL tools and technique to the database (target repository)

Generate a Unique Id in Shipment Mart.

Page 34: Basics of Data Warehousing, MIS and ETL

ETL: Extraction, Transform and Load.

Extract the data (Sales in different countries in Asia region) through UI/Any other Relational Source or Flat File.

Transform that data based on various requirements from business.

Load the data to the target (table) and store it in database.

Page 35: Basics of Data Warehousing, MIS and ETL

Continued…..

The Simplest way to understand the ETL concepts such as – ETL process, – ETL software, – ETL developments – and ETL testing starts with Data warehousing

concepts which was explained earlier.

Page 36: Basics of Data Warehousing, MIS and ETL

Recommended References:

http://en.wikipedia.org/wiki/Data_warehouse