isv innovation presented by isv innovation presented by business intelligence fundamentals: data...

23
Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

Upload: amber-mitchell

Post on 28-Dec-2015

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

ISV Innovation

Presented by

Business Intelligence Fundamentals: Data Loading

Ola EkdahlIT Mentors

9/12/08

Page 2: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

2Data Loading

Agenda

1.Dimension Table Load

2.Fact Table Load

3.Working with SCD’s

4.SSIS SCD Task

5.Load Data using Staging Tables

6.Top 10 Best Practices

Page 3: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

3

Data Flow Task

Encapsulates the data flow engine

Extract

Transform

Load

Data Loading

Page 4: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

4

Populating Dimension Tables

Y

Insert newrecord

Update changedcolumn(s)

Expire existingrecord

Transform Correlaterecords

N

N

Y

Type 2change?

Y

Type 1change?

New record?

Dimension source

Data Loading

Page 5: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

5

Populating Fact Tables

Y

Insert newrecord

Insert newdimension record

Lookupdimension key

N

Lookup failed?

Repeat for each dimension key

TransformFactsource

Data Loading

Page 6: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

6

Slowly Changing Dimension Transformation The wizard-based configuration promotes rapid ETL

development Supports

Type 0 (Fixed Attribute) Type 1 (Changing Attribute) Type 2 (Historical Attribute) Inferred member management

Automatically constructs the downstream data flow Handles the majority of slowly changing dimension

scenarios

Data Loading

Page 7: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

7

Populating Dimension Tables

Y

Insert newrecord

Update changedcolumn(s)

Expire existingrecord

Transform Correlaterecords

N

N

Y

Type 2change?

Y

Type 1change?

New record?

Dimension Source

Data Loading

Page 8: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

8Data Cleansing

Page 9: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

LastName update to Valdez-Smythe

SCD Type 19

Existing record is updated

History is not preserved

Data Loading

Page 10: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

SCD Type 210

Existing record is ‘expired’ and new record inserted

History is preserved

Most common form of Slowly Changing Dimension

SalesTerritoryKey update to 10

Data Loading

Page 11: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

11

Configuring the SCD Transformation Step 1 Select the target dimension table

Configure the relationship between the source data and the dimension table

The relationship is established with the business key stored in the dimension table

Data Loading

Page 12: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

12

Configuring the SCD Transformation Step 2 Select the columns and their change type:

Fixed (Type 0)

Changing (Type 1)

Historical (Type 2)

Data Loading

Page 13: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

13

Configuring the SCD Transformation Step 3 Configure the behavior if Fixed attributes change

Configure whether Changing attributes should update the current record or all matching records

Data Loading

Page 14: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

14

Configuring the SCD Transformation Step 4 Configure how Historical attributes identify current

and expired records: Single Boolean column, or

Start and End date columns

It is best practice to store Start and End dates

Data Loading

Page 15: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

15

Configuring the SCD Transformation Step 5 If inferred members are stored in the dimension

table, define how they are identified: When all columns with a change type are null, or

By a single Boolean column

Data Loading

Page 16: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

16

The Slowly Changing Dimension Transformation

Based on your configuration, the wizard completes the downstream data flow

Data Loading

Page 17: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

17Data Cleansing

Page 18: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

18Data Loading

Staging Tables

1. Create a staging table that matches the logical structure of the target table. Load the bulk data into this table. This loading can be much faster (compared to loading directly into the target table) because the staging table has no indexes or constraints on it. More importantly, while the new data is being loaded, the existing data is fully available for all transactions without any impact, because the data load is taking place on a separate staging table.

2. Create constraints and indexes on the staging table that are equivalent to those that exist on the target table. While you create constraints and indexes on the staging table, the existing data is fully available for all transactions without any impact, because this is taking place on the staging table.

3. Execute the ALTER TABLE … SWITCH statement to move the data from the staging table to an empty partition of the target table. This is a metadata-only operation, and is very fast (usually under a second) when there are no long transactions on the target table. This way, all the loading and indexing happens outside the main table, and then the data quickly moves into the main table. One important thing to note here is that the ALTER TABLE … SWITCH operation requires a schema modification (Sch-M) lock on the table. Long running transactions can block the switch operation from acquiring the Sch-M lock and make it wait

Page 19: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

19Data Loading

Partitions

In data warehouse applications, it is very common to partition data on the date dimension. This helps in periodically archiving or deleting the old data as new data comes in, with minimal effect on performance and availability. Depending on data volume, database designers pick yearly, quarterly, monthly, weekly, daily or even hourly partitions.

One way to partition a table is to create one partition for each incremental load. For example, if you load data on a daily basis, create one partition for each day. By using this partitioning approach, you can use the technique described earlier in this paper (load and index data in a staging table, and then use ALTER TABLE … SWITCH) very efficiently. One partition is added for every incremental load, and the number of partitions increases over time.

Page 20: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

Optimize PerformancePartitioning Query processing improvements

Partition-aware seeks

Parallel queryplan strategies

Partition-aligned indexed views Switched together

with the partition

Easy-to-switchpartitions

P1

P2

P3

New Partition

Agg(P1)

Agg(P2)

Agg(P3)

Aggregates for New Partition

Detail DataDay level Indexed View

Month, Year level

Switch new partition

Data Loading

Page 21: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

21Data Cleansing

Page 23: ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Loading Ola Ekdahl IT Mentors 9/12/08

ISV Innovation

Presented by

23

ISV Innovation

Presented by

www.isvinnovation.com

More recordings available at: