slowly changning dimension
TRANSCRIPT
-
8/8/2019 Slowly Changning Dimension
1/20
Slowly Changing Dimension
-
8/8/2019 Slowly Changning Dimension
2/20
In Data warehouse, there are two types of tables:
--Dimension table
--Fact table.
Fact tables change regularly as new information collects, whereasdimension table changes very often.
Although dimension tables are typically static lists, most dimensiontables do change over time. For example, we might need to updatean inventory dimension once a month to reflect new or changed partnumbers. Since these changes are smaller in magnitude comparedto changes in fact tables, these dimensions are known as slowlygrowing or slowly changing dimensions.
Slowly changing dimensions are dimension tables that have slowlyincreasing dimension data and updates to existing dimensions.When updating existing dimensions, we decide whether to keep allhistorical dimension data, no historical data, or just the current andprevious versions of dimension data.
-
8/8/2019 Slowly Changning Dimension
3/20
Using the Slowly Changing
Dimensions Wizard The Slowly Changing Dimensions Wizard creates mappings to load slowly changing dimension tables: Type 1 Dimension mapping. Loads a slowly changing dimension table by inserting new dimensions and
overwriting existing dimensions. Use this mapping when you do not want a history of previous dimension data.
Type 2 Dimension/Version Data mapping. Loads a slowly changing dimension table by inserting new andchanged dimensions using a version number and incremented primary key to track changes. Use this mappingwhen you want to keep a full history of dimension data and to track the progression of changes.
Type 2 Dimension/Flag Current mapping. Loads a slowly changing dimension table by inserting new andchanged dimensions using a flag to mark current dimension data and an incremented primary key to trackchanges.Use this mapping when you want to keep a full history of dimension data, tracking the progression ofchanges while flagging only the current dimension.
Type 2 Dimension/Effective Date Range mapping. Loads a slowly changing dimension table by inserting newand changed dimensions using a date range to define current dimension data. Use this mapping when you want tokeep a full history of dimension data, tracking changes with an effective date range.
Type 3 Dimension mapping.
Loads a slowly changing dimension table by inserting new dimensions and updatingvalues in existing dimensions. Use this mapping when you want to keep the current and previous dimension
values in the dimension table.
-
8/8/2019 Slowly Changning Dimension
4/20
SCD Overview:
Slowly Changing
Dimensions Mapping
Target Table History Data Handling
Type 1 Dimension Slowly ChangingDimension
None Inserts new dimensions. Overwrites existingdimensions with changed dimensions.
Type 2
Dimension/Version
Data
Slowly Changing
Dimension
Full Inserts new and changed dimensions. Creates a
version number and increments the primary keyto track the changes.
Type 2 Dimension/Flag
Current
Slowly ChangingDimension
Full Inserts new and changed dimensions. Flags thecurrent version and increments the primary key
to track the changes.
Type 2
Dimension/Effective
Date Range
Slowly ChangingDimension
Full Inserts new and changed dimensions. Createsan effective date range to track changes.
Type 3 Dimension Slowly ChangingDimension
Partial Inserts new dimensions. Updates changedvalues in existing dimensions. Optionally uses
the load date to track changes.
-
8/8/2019 Slowly Changning Dimension
5/20
Creating a Type 1 Dimension Mapping
The Type 1 Dimension mapping filters source rows based on user-
defined comparisons and inserts only those found to be new
dimensions to the target.
Rows containing changes to existing dimensions are updated in the
target by overwriting the existing dimension. In the Type 1Dimension mapping, all rows contain current dimension data.
Handling Keys :
When we use the Type 1 Dimension option, the Designer creates an
additional column in the mapping target, PM_PRIMARYKEY. In thiscolumn, the Integration Service generates a primary key for each
row written to the target, incrementing new key values by 1.
-
8/8/2019 Slowly Changning Dimension
6/20
Understanding the SCD 1 Mapping
The Type 1 Dimension mapping performs the following tasks:
Selects all rows.
Caches the existing target as a lookup table.
Compares logical key columns in the source against corresponding columns in the target lookup table.
Compares source columns against corresponding target columns if key columns match.
Flags new rows and changed rows.
Creates two data flows: one for new rows, one for changed rows.
Generates a primary key for new rows.
Inserts new rows to the target.
Updates changed rows in the target, overwriting existing rows.
-
8/8/2019 Slowly Changning Dimension
7/20
Data Example:
PM_PRIMARYKEY
ITEM STYLES
65000 Sandal 5
Data after first run:
Data after second run:
PM_PRIMARYK
EY
ITEM STYLES
65000 Sandal 12
-
8/8/2019 Slowly Changning Dimension
8/20
Creating a Type 2 Dimension/Version
Data Mapping The Type 2 Dimension/Version Data mapping filters source rows based on user-definedcomparisons and inserts both new and changed dimensions into the target. Changes are trackedin the target table by versioning the primary key and creating a version number for eachdimension in the table.
In the Type 2 Dimension/Version Data target, the current version of a dimension has the highestversion number and the highest incremented primary key of the dimension.
When we use this option, the Designer creates two additional fields in the target:--PM_PRIMARYKEY. The Integration Service generates a primary key for each row
written to the target.
--PM_VERSION_NUMBER. The Integration Service generates a version number foreach row written to the target.
Handling Keys : In a Type 2 Dimension/Version Data mapping, the Integration Service generates a new primary
key value for each new dimension it inserts into the target. An Expression transformationincrements key values by 1,000 for new dimensions.
When updating a dimension, the Integration Service increments the existing primary key by 1.
-
8/8/2019 Slowly Changning Dimension
9/20
Understanding the SCD 2 Mapping
using versioning
The Type 2 Dimension/Version Data mapping performs the following tasks:
Selects all rows.
Caches the existing target as a lookup table.
Compares logical key columns in the source against corresponding columns in the target lookup table.
Compares source columns against corresponding target columns if key columns match.
Flags new rows and changed rows.
Creates two data flows: one for new rows, one for changed rows.
Generates a primary key and version number for new rows.
Inserts new rows to the target.
Increments the primary key and version number for changed rows.
Inserts changed rows in the target.
-
8/8/2019 Slowly Changning Dimension
10/20
Data Example:
PM_PRIMARYKEY ITEM STYLES PM_VERSION_NUMBER
65000 Sandal 5 0
Data after first run:
Data after second run:
PM_PRIMARYKEY ITEM STYLES PM_VERSION_NUMBER
65000 Sandal 5 0
65001 Sandal 12 1
65002 Sandal 17 2
-
8/8/2019 Slowly Changning Dimension
11/20
Creating a Type 2 Dimension/Flag
Current Mapping The Type 2 Dimension/Flag Current mapping filters source rows based on user-defined
comparisons and inserts both new and changed dimensions into the target. Changes are trackedin the target table by flagging the current version of each dimension and versioning the primarykey.
In the Type 2 Dimension/Flag Current target, the current version of a dimension has a current flagset to 1 and the highest incremented primary key.
When we use this option, the Designer creates two additional fields in the target:-- PM_CURRENT_FLAG. The Integration Service flags the current row 1 and all
previous versions 0.
--PM_VERSION_NUMBER. The Integration Service generates a version number foreach row written to the target.
Handling Keys : The Integration Service generates a current flag of 1 for each row written to the target. This flag
indicates the dimension is new or newly updated. If the row is an update to an existing dimension,the Integration Service resets the existing dimension current flag to 0.
As a result, all current versions of a dimension appear in the target with a current flag of 1. Allprevious versions have a current flag of 0.
-
8/8/2019 Slowly Changning Dimension
12/20
Understanding the SCD 2 Mapping
using current flagThe Type 2 Dimension/Flag Current mapping performs the following tasks:
Selects all rows.
Caches the existing target as a lookup table.
Compares logical key columns in the source against corresponding columns in the target lookup table.
Compares source columns against corresponding target columns if key columns match.
Flags new rows and changed rows.
Creates two data flows: one for new rows, one for changed rows.
Generates a primary key and current flag for new rows.
Inserts new rows to the target.
Increments the existing primary key and sets the current flag for changed rows.
Inserts changed rows in the target.
Updates existing versions of the changed rows in the target, resetting the current flag to indicate the row is nolonger current.
-
8/8/2019 Slowly Changning Dimension
13/20
Data Example:
PM_PRIMARYKEY ITEM STYLES PM_CURRENT
_FLAG
65000 Sandal 5 0
Data after first run:
Data after three runs:
PM_PRIMARYKEY ITEM STYLES PM_CURRENT_FLAG
65000 Sandal 5 0
65001 Sandal 12 0
65002 Sandal 17 1
-
8/8/2019 Slowly Changning Dimension
14/20
Creating a Type 2 Dimension/Effective
Date Range Mapping The Type 2 Dimension/Effective Date Range mapping filters source rows based on user-defined comparisons and
inserts both new and changed dimensions into the target. Changes are tracked in the target table by maintainingan effective date range for each version of each dimension in the target.
In the Type 2 Dimension/Flag Current target, the current version of a dimension has a current flag set to 1 and thehighest incremented primary key.
When we use this option, the Designer creates two additional fields in the target:
-- PM_BEGIN_DATE For each new and changed dimension written to the target, the Integration
Service uses the system date to indicate the start of the effective date range for the dimension.--PM_END_DATE. For each dimension being updated, the Integration Service uses the system date
to indicate the end of the effective date range for the dimension.
--PM_VERSION_NUMBER. The Integration Service generates a version number for each rowwritten to the target.
Handling Keys : The Integration Service generates a begin date for each new and changed dimension it inserts into the target,
using the current system date. The end date for these dimensions is NULL.
Each time the Integration Service inserts a changed dimension, it updates the previous version of the dimension inthe target, using the current system date to fill the previously null end date column.
As a result, all current dimension data in the Type 2 Dimension/Effective Date Range target have null values in thePM_END_DATE column. All previous versions of dimension data have a system date in PM_END_DATE toindicate the end of the effective date range for each version.
-
8/8/2019 Slowly Changning Dimension
15/20
Understanding the SCD 2 Mapping
using Effective Date RangeThe Type 2 Dimension/Effective Date Range mapping performs the following tasks:
Selects all rows.
Caches the existing target as a lookup table.
Compares logical key columns in the source against corresponding columns in the target lookup table.
Compares source columns against corresponding target columns if key columns match.
Flags new rows and changed rows.
Creates three data flows: one for new rows, one for changed rows, one for updating existing rows.
Generates a primary key and beginning of the effective date range for new rows.
Inserts new rows to the target.
Generates a primary key and beginning of the effective date range for changed rows.
Inserts changed rows in the target.
Updates existing versions of the changed rows in the target, generating the end of the effective date range toindicate the row is no longer current.
-
8/8/2019 Slowly Changning Dimension
16/20
Data Example:
PM_PRIMARYKEY ITEM STYLES PM_BEGIN_DATE
PM_END_DATE
65000 Sandal 5 9/1/98
Data after first run:
Data after second run:PM_PRIMARYKEY ITEM STYLES PM_BEGIN
_DATEPM_END_DATE
65000 Sandal 5 9/1/98 9/1/99
65283 Sandal 12 9/1/99
-
8/8/2019 Slowly Changning Dimension
17/20
Creating a Type 3 Dimension Mapping
The Type 3 Dimension mapping filters source rows based on user-defined comparisons andinserts only those found to be new dimensions to the target. Rows containing changes to existingdimensions are updated in the target. When updating an existing dimension, the IntegrationService saves existing data in different columns of the same row and replaces the existing datawith the updates. The Integration Service optionally enters the system date as a timestamp foreach row it inserts or updates.
In the Type 3 Dimension target, each dimension contains current dimension data.
When we use this option, the Designer creates two additional fields in the target:
--PM_PRIMARYKEY. The Integration Service generates a primary key for each rowwritten to the target.
--PM_PREV_COLNAME. The Designer generates a previous column corresponding toeach column for which you want historical data. The Integration Service keeps the previousversion of dimension data in these columns.
Handling Keys : In the Type 3 Dimension mapping, the Integration Service generates a primary key value for eachnew row written to the target, incrementing key values by one. Updated rows retain their originalkey values.
-
8/8/2019 Slowly Changning Dimension
18/20
Understanding the SCD 3 Mapping
The Type 3 Dimension mapping performs the following tasks:
Selects all rows.
Caches the existing target as a lookup table.
Compares logical key columns in the source against corresponding columns in the target lookup table.
Compares source columns against corresponding target columns if key columns match.
Flags new rows and changed rows.
Creates two data flows: one for new rows, one for updating changed rows.
Generates a primary key and optionally notes the effective date for new rows.
Inserts new rows to the target.
Writes previous values for each changed row into previous columns and replaces previous values with updatedvalues.
Optionally uses the system date to note the effective date for inserted and updated values.
Updates changed rows in the target.
-
8/8/2019 Slowly Changning Dimension
19/20
Data Example:
PM_PRIMARYKEY ITEM STYLES PM_PREV_ST
YLES
65000 Sandal 5
Data after first run:
Data after second run:
PM_PRIMARYKEY ITEM STYLES PM_VERSION
_NUMBER
65000 Sandal 12 5
-
8/8/2019 Slowly Changning Dimension
20/20
Thank You.