Post on 20-Dec-2015

The Data Warehouse Environment

Agenda

• The Structure of the Data Warehouse
• Subject Orientation
• Day 1 – Day n Phenomenon
• Granularity
• Partitioning as a Design Approach
• Structuring data in the Data Warehouse
• Data Warehouse: The Standard Manual
• Auditing and the Data Warehouse
• Cost Justification

The Structure of the Data Warehouse

• Older level of detail

• Current level of detail

• A level of lightly summarized data

• A level of highly summarized data

Subject Orientation

• The data warehouse is oriented to the major subject areas of the corporation that have been defined in the high-level corporate data model.

• Typical subject areas include the following:
– Customer
– Product
– Transaction or activity
– Policy
– Claim
– Account

Day 1 – Day n Phenomenon

Granularity

• The single most important aspect of the design of a data warehouse is the issue of granularity.

• Indeed, the issue of granularity permeates the entire architecture that surrounds the data warehouse environment.

• Granularity refers to the level of detail or summarization of the units of data in the data warehouse.

• The more detail there is, the lower the level of granularity.

• The less detail there is, the higher the level of granularity.
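
To make the two ends of the scale concrete, here is a minimal sketch in Python. The call records, the field names, and the `summarize_by_month` helper are hypothetical and invented purely for illustration. The sketch holds the same activity at a low level of granularity (one row per call) and at a higher level (one row per customer per month).

```python
from collections import defaultdict

# Hypothetical detail records: one row per phone call
# (high detail, so a LOW level of granularity).
calls = [
    {"customer": "C1", "date": "2002-01-03", "minutes": 12},
    {"customer": "C1", "date": "2002-01-17", "minutes": 5},
    {"customer": "C1", "date": "2002-02-02", "minutes": 8},
    {"customer": "C2", "date": "2002-01-09", "minutes": 30},
]

def summarize_by_month(rows):
    """Roll detail rows up to one summary row per customer per month
    (less detail, so a HIGHER level of granularity)."""
    totals = defaultdict(lambda: {"calls": 0, "minutes": 0})
    for row in rows:
        key = (row["customer"], row["date"][:7])  # (customer, "YYYY-MM")
        totals[key]["calls"] += 1
        totals[key]["minutes"] += row["minutes"]
    return dict(totals)

monthly = summarize_by_month(calls)
```

The summarized form has fewer rows to store and scan, but it can no longer answer a question about an individual call. That trade-off is why the choice of granularity affects so much of the design.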

Partitioning as a Design Approach

• A second major design issue of data in the warehouse (after that of granularity) is that of partitioning.

• Partitioning of data refers to the breakup of data into separate physical units that can be handled independently.

• Proper partitioning can benefit the data warehouse in several ways:
– Loading data
– Accessing data
– Archiving data
– Deleting data
– Monitoring data
– Storing data

• Partitioning data properly allows data to grow and to be managed. Data that is not partitioned properly cannot be managed easily and does not grow gracefully.

Partitioning of Data

• The purpose of partitioning of current detail data is to break data up into small, manageable physical units.

• Below are some of the tasks that cannot easily be performed when data resides in large physical units:
– Restructuring
– Indexing
– Sequential scanning, if needed
– Reorganization
– Recovery
– Monitoring

Partitioning of data (cont’d)

• Data can be divided by many criteria, such as:
– By date
– By line of business
– By geography
– By organizational unit
– By all of the above

• The choice of how to partition data is strictly up to the developer. As an example of how a life insurance company may choose to partition its data, consider the following physical units of data:

• 2000 health claims, 2001 health claims, 2002 health claims
• 1999 life claims, 2000 life claims, 2001 life claims, 2002 life claims
• 2000 casualty claims, 2001 casualty claims, 2002 casualty claims
• The insurance company has used two criteria, date (that is, year) and type of claim, to partition the data.
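
The insurance example can be sketched in a few lines of Python. This is only an illustration under assumptions: the claim records, the field names, and the `partition_key` and `partition_claims` helpers are hypothetical and do not come from the slides. Each claim is routed to a unit named by its year and its type of claim.

```python
from collections import defaultdict

def partition_key(claim):
    """Name the physical unit a claim belongs to, e.g. '2001 life claims'."""
    return f"{claim['year']} {claim['type']} claims"

def partition_claims(claims):
    """Group claims into separate units keyed by year and type of claim."""
    units = defaultdict(list)
    for claim in claims:
        units[partition_key(claim)].append(claim)
    return dict(units)

# Hypothetical sample claims.
claims = [
    {"id": 1, "year": 2000, "type": "health"},
    {"id": 2, "year": 2001, "type": "life"},
    {"id": 3, "year": 2001, "type": "life"},
    {"id": 4, "year": 2002, "type": "casualty"},
]

units = partition_claims(claims)
```

Adding a further criterion, such as geography, would only mean extending `partition_key`; the rest of the routing stays the same.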

Partitioning of data (cont’d)

• Partitioning can be done in many ways:
– Partition at the system level
– Partition at the application level

• As a rule, it makes sense to partition data warehouse data at the application level.
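
As a rough illustration of what partitioning at the application level can look like, the sketch below lets application code, rather than the DBMS or operating system, decide which physical file each row lands in. Everything here is an assumption made for the example: the CSV layout, the file naming scheme, and the `unit_path`, `load`, and `drop_unit` helpers are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["id", "year", "type"]

def unit_path(root, year, claim_type):
    """The application chooses the physical unit (one file per year and type)."""
    return Path(root) / f"{claim_type}_claims_{year}.csv"

def load(root, claims):
    """Append each claim to the file for its partition, creating it if needed."""
    for claim in claims:
        path = unit_path(root, claim["year"], claim["type"])
        is_new = not path.exists()
        with path.open("a", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDS)
            if is_new:
                writer.writeheader()
            writer.writerow(claim)

def drop_unit(root, year, claim_type):
    """Delete one partition without touching any other unit."""
    unit_path(root, year, claim_type).unlink()

root = tempfile.mkdtemp()  # scratch directory standing in for warehouse storage
load(root, [
    {"id": 1, "year": 2000, "type": "health"},
    {"id": 2, "year": 2001, "type": "life"},
    {"id": 3, "year": 2002, "type": "life"},
])
drop_unit(root, 2000, "health")
remaining = sorted(p.name for p in Path(root).iterdir())
```

Because each unit is a separate file, loading, archiving, or deleting one year of one claim type never requires reading or rewriting the others.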

Structuring data in the Data Warehouse

• There are many ways to structure data within the data warehouse. The most common are these:
– Simple cumulative
– Rolling summary
– Simple direct
– Continuous
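
The slide only names these structures, so the following Python sketch rests on assumptions about the first two. It takes "simple cumulative" to mean that each day's transactions are summarized into one record per day and every daily record is kept, and "rolling summary" to mean that daily records are later folded into coarser records such as weekly ones. The sample transactions and the `simple_cumulative` and `roll_up_weekly` helpers are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (day, amount).
transactions = [
    (date(2002, 1, 1), 100),
    (date(2002, 1, 1), 50),
    (date(2002, 1, 2), 70),
    (date(2002, 1, 8), 30),
]

def simple_cumulative(rows):
    """Assumed meaning: one summary record per day, all days retained."""
    daily = defaultdict(int)
    for day, amount in rows:
        daily[day] += amount
    return dict(daily)

def roll_up_weekly(daily):
    """Assumed rolling-summary step: fold daily records into one per ISO week."""
    weekly = defaultdict(int)
    for day, total in daily.items():
        iso = day.isocalendar()          # (ISO year, ISO week, weekday)
        weekly[(iso[0], iso[1])] += total
    return dict(weekly)

daily = simple_cumulative(transactions)
weekly = roll_up_weekly(daily)
```

Under these assumptions the cumulative form keeps more detail and grows by one record per day, while the rolled-up form stays compact at the cost of losing the daily breakdown.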

Structuring data in the Data Warehouse (cont’d)
