olap ppt

37
Online Analytical Online Analytical Processing (OLAP) Processing (OLAP) By By Aman Sachdeva Aman Sachdeva 2309001 2309001 CSE CSE

Upload: asachdeva92

Post on 30-Oct-2014

172 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: OLAP PPT

Online Analytical Online Analytical Processing (OLAP)Processing (OLAP)

ByByAman SachdevaAman Sachdeva

23090012309001CSECSE

Page 2: OLAP PPT

Topics CoveredTopics Covered1.What is OLAP ?1.What is OLAP ?2.OLAP v/s OLTP2.OLAP v/s OLTP3.Three-Tier Design Support Systems3.Three-Tier Design Support Systems4. Approaches to OLAP servers4. Approaches to OLAP servers5. Multi-dimensional data model5. Multi-dimensional data model6. ROLAP6. ROLAP7. MOLAP7. MOLAP8. HOLAP8. HOLAP9.OLAP Benefits9.OLAP Benefits10.OLAP Applications10.OLAP Applications11. Conclusion11. Conclusion

Page 3: OLAP PPT

What is OLAP ?What is OLAP ?

• The dynamic synthesis, analysis, and The dynamic synthesis, analysis, and consolidation of large volumes of multi-consolidation of large volumes of multi-dimensional data, Codd (1993).dimensional data, Codd (1993).

• Describes a technology that uses a multi-Describes a technology that uses a multi-dimensional view of aggregate data to dimensional view of aggregate data to provide quick access to strategic information provide quick access to strategic information for the purposes of advanced analysis. for the purposes of advanced analysis.

• Enables users to gain a deeper Enables users to gain a deeper understanding and knowledge about various understanding and knowledge about various aspects of their corporate data through fast, aspects of their corporate data through fast, consistent, interactive access to a wide consistent, interactive access to a wide variety of possible views of the datavariety of possible views of the data

Page 4: OLAP PPT

HHistoryistory

In 1993, E. F. Codd came up with the term In 1993, E. F. Codd came up with the term online analytical processing (OLAP) and and proposed 12 criteria to define an OLAP proposed 12 criteria to define an OLAP databasedatabase

The term OLAP seems perfect to describe The term OLAP seems perfect to describe databases designed to facilitate decision databases designed to facilitate decision making (analysis) in an organizationmaking (analysis) in an organization

Page 5: OLAP PPT

OLAP v/s OLTPOLAP v/s OLTPOLTPOLTP OLAPOLAP

• Clerk, IT ProfessionalClerk, IT Professional

• Day to day Day to day operationsoperations

• Application-oriented Application-oriented (E-R based)(E-R based)

• Current, IsolatedCurrent, Isolated

• Structured, RepetitiveStructured, Repetitive

• Short, Simple Short, Simple transactiontransaction

• Read/writeRead/write

• TensTens

• ThousandsThousands

• 100 MB-GB100 MB-GB

• Knowledge workerKnowledge worker

• Decision supportDecision support

• Subject-oriented (Star, Subject-oriented (Star, snowflake)snowflake)

• Historical, ConsolidatedHistorical, Consolidated

• Ad hocAd hoc

• Complex queryComplex query

• Read MostlyRead Mostly

• MillionsMillions

• HundredsHundreds

• 100GB-TB100GB-TB

User

Function

DB Design

Data

View

Unit of work

Access

Records accessed

Users

Db size

Page 6: OLAP PPT

Three-Tier Decision Three-Tier Decision Support SystemsSupport Systems

• Warehouse database serverWarehouse database server– Almost always a relational DBMS, rarely flat filesAlmost always a relational DBMS, rarely flat files

• OLAP serversOLAP servers– Relational OLAP (ROLAP): extended relational DBMS Relational OLAP (ROLAP): extended relational DBMS

that maps operations on multidimensional data to that maps operations on multidimensional data to standard relational operatorsstandard relational operators

– Multidimensional OLAP (MOLAP): special-purpose Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data server that directly implements multidimensional data and operationsand operations

• ClientsClients– Query and reporting toolsQuery and reporting tools– Analysis toolsAnalysis tools– Data mining toolsData mining tools

Page 7: OLAP PPT

The Complete Decision Support The Complete Decision Support SystemSystem

Information Sources Data Warehouse Server(Tier 1)

OLAP Servers(Tier 2)

Clients(Tier 3)

OperationalDB’s

SemistructuredSources

extracttransformloadrefreshetc.

Data Marts

DataWarehouse

e.g., MOLAP

e.g., ROLAP

serve

OLAP

Query/Reporting

Data Mining

serve

serve

Page 8: OLAP PPT

Approaches to OLAP Approaches to OLAP ServersServers

Three possibilities for OLAP serversThree possibilities for OLAP servers

(1) Relational OLAP (ROLAP)(1) Relational OLAP (ROLAP)– Relational and specialized relational DBMS to store Relational and specialized relational DBMS to store

and manage warehouse dataand manage warehouse data– OLAP middleware to support missing piecesOLAP middleware to support missing pieces

((2) Multidimensional OLAP (MOLAP)2) Multidimensional OLAP (MOLAP)– Array-based storage structuresArray-based storage structures– Direct access to array data structuresDirect access to array data structures

(3) Hybrid OLAP (HOLAP)(3) Hybrid OLAP (HOLAP)– Storing detailed data in RDBMSStoring detailed data in RDBMS– Storing aggregated data in MDBMSStoring aggregated data in MDBMS– User access via MOLAP toolsUser access via MOLAP tools

Page 9: OLAP PPT

The Multi-Dimensional Data The Multi-Dimensional Data ModelModel

““Sales by product line over the past six months”Sales by product line over the past six months”

““Sales by store between 1990 and 1995”Sales by store between 1990 and 1995”

Prod Code Time Code Store Code Sales Qty

Store Info

Product Info

Time Info

. . .

Numerical MeasuresKey columns joining fact table

to dimension tables

Fact table for measures

Dimension tables

Page 10: OLAP PPT

ROLAP: Dimensional Modeling Using ROLAP: Dimensional Modeling Using Relational DBMSRelational DBMS

• Special schema design: Special schema design: star, snowflakestar, snowflake

• Special indexes: bitmap, multi-table joinSpecial indexes: bitmap, multi-table join

• Proven technology (relational model, DBMS), Proven technology (relational model, DBMS), tend to outperform specialized MDDB tend to outperform specialized MDDB especially on large data setsespecially on large data sets

• ProductsProducts– IBM DB2, Oracle, Sybase IQ, RedBrick, IBM DB2, Oracle, Sybase IQ, RedBrick,

InformixInformix

Page 11: OLAP PPT

Star Schema (in RDBMS)Star Schema (in RDBMS)

Page 12: OLAP PPT

Star Schema ExampleStar Schema Example

Page 13: OLAP PPT

The “Classic” Star SchemaThe “Classic” Star Schema

A single fact table, with A single fact table, with detail and summary datadetail and summary data

Fact table primary key Fact table primary key has only one key column has only one key column per dimensionper dimension

Each key is generatedEach key is generated Each dimension is a Each dimension is a

single table, highly de-single table, highly de-normalizednormalized

Benefits: Easy to understand, easy to define hierarchies, reduces # of physical joins, low maintenance, very simple metadata

PERIOD KEY

Store Dimension Time Dimension

Product Dimension

STORE KEYPRODUCT KEYPERIOD KEY

DollarsUnitsPrice

Period DescYearQuarterMonthDayCurrent FlagResolutionSequence

Fact Table

PRODUCT KEY

Store DescriptionCityStateDistrict IDDistrict Desc.Region_IDRegion Desc.Regional Mgr.Level

Product Desc.BrandColorSizeManufacturerLevel

STORE KEY

Page 14: OLAP PPT

The “Snowflake” SchemaThe “Snowflake” Schema

STORE KEY

Store Dimension

Store DescriptionCityStateDistrict IDRegion_IDRegional Mgr.

District_IDDistrict Desc.Region_ID

Region_ID

Region Desc.Regional Mgr.

STORE KEYPRODUCT KEYPERIOD KEY

DollarsUnitsPrice

Store Fact Table

Page 15: OLAP PPT

Aggregation in a Single Fact TableAggregation in a Single Fact Table

Drawbacks: Summary data in the fact table yields poorer performance for summary levels, huge dimension tables a problem

PERIOD KEY

Store Dimension Time Dimension

Product Dimension

STORE KEYPRODUCT KEYPERIOD KEY

DollarsUnitsPrice

Period DescYearQuarterMonthDayCurrent FlagResolutionSequence

Fact Table

PRODUCT KEY

Store DescriptionCityStateDistrict IDDistrict Desc.Region_IDRegion Desc.Regional Mgr.Level

Product Desc.BrandColorSizeManufacturerLevel

STORE KEY

Page 16: OLAP PPT

PERIOD KEY

Store Dimension Time Dimension

Product Dimension

STORE KEYPRODUCT KEYPERIOD KEY

DollarsUnitsPrice

Period DescYearQuarterMonthDayCurrent FlagSequence

Fact Table

PRODUCT KEY

Store DescriptionCityStateDistrict IDDistrict Desc.Region_IDRegion Desc.Regional Mgr.

Product Desc.BrandColorSizeManufacturer

STORE KEY

The “Fact Constellation” The “Fact Constellation” SchemaSchema

DollarsUnitsPrice

District Fact Table

District_IDPRODUCT_KEYPERIOD_KEY

DollarsUnitsPrice

Region Fact Table

Region_IDPRODUCT_KEYPERIOD_KEY

Page 17: OLAP PPT

TheThe

Aggregations using Aggregations using “Snowflake” Schema and “Snowflake” Schema and Multiple Fact TablesMultiple Fact Tables

• No No LEVELLEVEL in dimension tables in dimension tables

• Dimension tables are normalized by Dimension tables are normalized by decomposing at the attribute leveldecomposing at the attribute level

• Each dimension table has one key for Each dimension table has one key for each level of the dimensionís each level of the dimensionís hierarchy hierarchy

• The lowest level key joins the The lowest level key joins the dimension table to both the fact table dimension table to both the fact table and the lower level attribute tableand the lower level attribute table

How does it work? The best way is for the query to be built by understanding which summary levels exist, and finding the proper snowflaked attribute tables, constraining there for keys, then selecting from the fact table.

STORE KEY

Store Dimension

Store DescriptionCityStateDistrict IDDistrict Desc.Region_ IDRegion Desc.Regional Mgr.

District_ IDDistrict Desc.Region_ ID

Region_ ID

Region Desc.Regional Mgr.

STORE KEYPRODUCT KEYPERIOD KEY

DollarsUnitsPrice

Store Fact Table

DollarsUnitsPrice

District Fact Table

District_IDPRODUCT_KEYPERIOD_KEY Dollars

UnitsPrice

RegionFact Table

Region_IDPRODUCT_KEYPERIOD_KEY

Page 18: OLAP PPT

Aggregation Contd …Aggregation Contd …

Advantage: Best performance when queries involve aggregation

Disadvantage: Complicated maintenance and metadata, explosion in the number of tables in the database

STORE KEY

Store Dimension

Store DescriptionCityStateDistrict IDDistrict Desc.Region_ IDRegion Desc.Regional Mgr.

District_ IDDistrict Desc.Region_ ID

Region_ ID

Region Desc.Regional Mgr.

STORE KEYPRODUCT KEYPERIOD KEY

DollarsUnitsPrice

Store Fact Table

DollarsUnitsPrice

District Fact Table

District_IDPRODUCT_KEYPERIOD_KEY Dollars

UnitsPrice

RegionFact Table

Region_IDPRODUCT_KEYPERIOD_KEY

Page 19: OLAP PPT

AggregatesAggregates

sale prodId storeId date amtp1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44p1 s2 2 4

Add up amounts for day 1 In SQL: SELECT sum(amt) FROM SALE WHERE date = 1

81

Page 20: OLAP PPT

AggregatesAggregates Add up amounts by day In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

ans date sum1 812 48

sale prodId storeId date amtp1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44p1 s2 2 4

Page 21: OLAP PPT

Points to be noticed about ROLAPPoints to be noticed about ROLAP

• Defines complex, multi-dimensional data with Defines complex, multi-dimensional data with simple modelsimple model

• Reduces the number of joins a query has to Reduces the number of joins a query has to processprocess

• Allows the data warehouse to evolve with rel. Allows the data warehouse to evolve with rel. low maintenancelow maintenance

• Can contain both detailed and summarized Can contain both detailed and summarized data.data.

• ROLAP is based on familiar, proven, and ROLAP is based on familiar, proven, and already selected technologies.already selected technologies.

BUT!!!BUT!!!

• SQL for multi-dimensional manipulation of SQL for multi-dimensional manipulation of calculations.calculations.

Page 22: OLAP PPT

MOLAP: Dimensional Modeling MOLAP: Dimensional Modeling Using the Multi Dimensional ModelUsing the Multi Dimensional Model

• MDDB: a special-purpose data modelMDDB: a special-purpose data model

• Facts stored in multi-dimensional Facts stored in multi-dimensional arraysarrays

• Dimensions used to index arrayDimensions used to index array

• Sometimes on top of relational DBSometimes on top of relational DB

• ProductsProducts– Pilot, Arbor Essbase, GentiaPilot, Arbor Essbase, Gentia

Page 23: OLAP PPT

The MOLAP CubeThe MOLAP Cube

sale prodId storeId amtp1 s1 12p2 s1 11p1 s3 50p2 s2 8

s1 s2 s3p1 12 50p2 11 8

Fact table view: Multi-dimensional cube:

dimensions = 2

Page 24: OLAP PPT

3-D Cube3-D Cube

dimensions = 3

Multi-dimensional cube:Fact table view:

sale prodId storeId date amtp1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44p1 s2 2 4

day 2 s1 s2 s3p1 44 4p2 s1 s2 s3

p1 12 50p2 11 8

day 1

Page 25: OLAP PPT

ExampleExample

Store

Product

Time

M T W Th F S S

Juice

Milk

Coke

Cream

Soap

Bread

NYSF

LA

10

34

56

32

12

56

56 units of bread sold in LA on M

Dimensions:Time, Product, Store

Attributes:Product (upc, price, …)Store ……

Hierarchies:Product Brand …Day Week QuarterStore Region Country

roll-up to week

roll-up to brand

roll-up to region

Page 26: OLAP PPT

Cube Aggregation: Roll-upCube Aggregation: Roll-up

day 2 s1 s2 s3p1 44 4p2 s1 s2 s3

p1 12 50p2 11 8

day 1

s1 s2 s3p1 56 4 50p2 11 8

s1 s2 s3sum 67 12 50

sump1 110p2 19

129

. . .

drill-down

rollup

Example: computing sums

Page 27: OLAP PPT

Cube Operators for Roll-upCube Operators for Roll-up

day 2 s1 s2 s3p1 44 4p2 s1 s2 s3

p1 12 50p2 11 8

day 1

s1 s2 s3p1 56 4 50p2 11 8

s1 s2 s3sum 67 12 50

sump1 110p2 19

129

. . .

sale(s1,*,*)

sale(*,*,*)sale(s2,p2,*)

Page 28: OLAP PPT

s1 s2 s3 *p1 56 4 50 110p2 11 8 19* 67 12 50 129

Extended CubeExtended Cube

day 2 s1 s2 s3 *p1 44 4 48p2* 44 4 48s1 s2 s3 *

p1 12 50 62p2 11 8 19* 23 8 50 81

day 1

*

sale(*,p2,*)

Page 29: OLAP PPT

Points to be noticed about MOLAPPoints to be noticed about MOLAP

• Pre-calculating or pre-consolidating transactional data Pre-calculating or pre-consolidating transactional data improves speed. improves speed.

BUTBUTFully pre-consolidating incoming data, MDDs require an Fully pre-consolidating incoming data, MDDs require an enormous amount of overhead both in processing time and in enormous amount of overhead both in processing time and in storage. An input file of 200MB can easily expand to 5GBstorage. An input file of 200MB can easily expand to 5GB

MDDs are great candidates for the MDDs are great candidates for the <<50GB department data 50GB department data marts.marts.

• Rolling up and Drilling down through aggregate data.Rolling up and Drilling down through aggregate data.

• With MDDs, application design is essentially the definition of With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake.that the database schema be a star or snowflake.

Page 30: OLAP PPT

Hybrid OLAP (HOLAP)Hybrid OLAP (HOLAP)

• HOLAP = Hybrid OLAP:HOLAP = Hybrid OLAP:

– Best of both worldsBest of both worlds

– Storing detailed data in RDBMSStoring detailed data in RDBMS

– Storing aggregated data in MDBMSStoring aggregated data in MDBMS

– User access via MOLAP toolsUser access via MOLAP tools

Page 31: OLAP PPT

ExamplesExamples

• ROLAPROLAP– Telecommunication startup: call data records Telecommunication startup: call data records

(CDRs) (CDRs) – ECommerce SiteECommerce Site– Credit Card CompanyCredit Card Company

• MOLAPMOLAP– Analysis and budgeting in a financial departmentAnalysis and budgeting in a financial department– Sales analysisSales analysis

• HOLAPHOLAP– Sales department of a multi-national companySales department of a multi-national company– Banks and Financial Service ProvidersBanks and Financial Service Providers

Page 32: OLAP PPT

• Increased productivity of end-usersIncreased productivity of end-users..• Reduced backlog of applications Reduced backlog of applications

development for IT staff.development for IT staff.• Retention of organizational control over Retention of organizational control over

the integrity of corporate data.the integrity of corporate data.• Reduced query drag and network traffic on Reduced query drag and network traffic on

OLTP systems or on the data warehouse. OLTP systems or on the data warehouse. • Improved potential revenue and Improved potential revenue and

profitability.profitability.

OLAP BenefitsOLAP Benefits

Page 33: OLAP PPT

OLAP APPLICATIONSOLAP APPLICATIONS

• JIT(Just-In-Time) information is JIT(Just-In-Time) information is computed data that usually reflects computed data that usually reflects complex relationships and is often complex relationships and is often calculated on the fly. Also as data calculated on the fly. Also as data relationships may not be known in relationships may not be known in advance, the data model must be advance, the data model must be flexible.flexible.

Page 34: OLAP PPT

OLAP APPLICATIONSOLAP APPLICATIONS

• Although OLAP applications are found in Although OLAP applications are found in widely divergent functional areas, they all widely divergent functional areas, they all have the following key features:have the following key features:

– multi-dimensional views of datamulti-dimensional views of data

– support for complex calculationssupport for complex calculations

– Time intelligenceTime intelligence

Page 35: OLAP PPT

Examples of OLAP applications Examples of OLAP applications in various functional areasin various functional areas

Page 36: OLAP PPT

ConclusionConclusion

• IT organizations are faced with the challenge of delivering systems that allow knowledge workers to make strategic and tactical decisions based on corporate information. These decision support systems are referred to as Online Analytical Processing (OLAP) systems, and they allow knowledge workers to intuitively, quickly, and flexibly manipulate operational data using familiar business terms, in order to provide analytical insight.

Page 37: OLAP PPT

ThanksThanks

Any Queries ??Any Queries ??