business intelligence reminds on data warehousing

Post on 16-May-2022

6 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

BUSINESS INTELLIGENCE

Reminds on Data Warehousing

(details at the Decision Support Database course)

Data Sciencce & Business Informatics Degree

BI Architecture2

Laboratory of Data Science

Star-schema datawarehouse

A fact table with star-schema dimension tables only

3

time_key

day

day_of_the_week

month

quarter

year

time

branch_key

street

city

province

country

branch

Sales Fact Table

time_key

item_key

employee_key

branch_key

units_sold

dollars_sold

dollars_costMeasures

item_key

item_name

brand

type

supplier_type

item

employee_key

employee_name

employee_type

employee

Laboratory of Data Science

Snowflake-schema datawarehouse

A fact table with star-schema, snowflake and parent-child dimension tables

4

time_key

day

day_of_the_week

month

quarter

year

time

branch_key

street

city_key

branch

Sales Fact Table

time_key

item_key

employee_key

branch_key

units_sold

dollars_sold

dollars_costMeasures

item_key

item_name

brand

type

supplier_key

item

supplier_key

supplier_type

supplier

city_key

city

province

country

cityemp_key

emp_name

emp_boss_key

employees

Laboratory of Data Science

Which DBMS technology for DW?

Storage technology

Architecture

5

Laboratory of Data Science

Storage

RDBMS: record oriented structure

Laboratory of Data Science

6

Columnar: column oriented structure

Advantages:• Faster Scan

• Data Compression (e.g. State)

Storage

Correlation value-based database

Data cells contain the index to an order-set value

In-memory database

Data is stored in compressed format in main memory

Extraction-based system

Storage of attribute extracted from continuous data

flows (eg., web traffic, sensors)

...

Laboratory of Data Science

7

Architecture

Sequential

SQL query processing by a single

processor

Parallel

SQL query plan processing by a

multi-processor machine, with shared

memory

Distributed (Map-reduce)

SQL query processing distributed to

a set of independent machines

◼ Teradata SQL-MR, Hadoop HiveQL

Laboratory of Data Science

8

BI Architecture10

Laboratory of Data Science

K-dimensional cuboid11

Milk Bread … ... Pasta

Jan 13

… ...

Dec 13

Feb 13

Product.Category

Store.City

Time.Month

OrangeRome

PiseFlorence

An hyper-cube with K axes, with a level of some hierarchy at each axis. A

cell of the cuboid contains the values of metrics for the conditions given by

the cell coordinates.

Lucca

Laboratory of Data Science

Cube navigation by different users12

product

Branch manager look at sales

of his/her stores

for any product and any period

Product managers look at sales

of some products

in any period and in any market

Finance manager look at sales

of a period compared to the previous period

for any product and any market

tim

e

Laboratory of Data Science

Cuboids in SQL

13

SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND

F.time_key = T.time_key AND F.item_key = I.item_key

GROUP BY L.city, I.brand, T.month

MeasureAggregate

Star-Join

Hierarchy levels

Order or

pivoting

Laboratory of Data Science

Roll-up

How many cuboids?14

Product

Time

All

Time

Time

Product

All

All

Drill-Down

Roll-up

Drill-Down

Drill-Down

Roll-up

Laboratory of Data Science

15

Data Cube (extended cube, hypercube)

Total annual sales

of TVs in U.S.A.Date

Cou

ntr

y

*

*TV

VCRPC

1Qtr 2Qtr 3Qtr 4Qtr

U.S.A

Canada

Mexico

*

Laboratory of Data Science

Data cube in SQL Server16

SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND

F.time_key = T.time_key AND F.item_key = I.item_key

GROUP BY CUBE(L.city, I.brand, T.month)

MeasureAggregate

Star-Join

Hierarchy levels

Order or

pivoting

GROUP BY ROLLUP(L.city, I.brand, T.month) - all initial subsequences of the group-by attributes

Laboratory of Data Science

Slice and Dice17

Product

Time

Slice

Product

Time

Product =A,B,C

Laboratory of Data Science

Slice in SQL Server18

SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND

F.time_key = T.time_key AND F.item_key = I.item_key AND

T.year = 2016GROUP BY CUBE(L.city, I.brand, T.month)

MeasureAggregate

Star-Join

Hierarchy levels

Order or

pivoting

Slice

Laboratory of Data Science

top related