data integration best practices - ningapi.ning.com/.../distudiobestpractices.pdf · data...

63
www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146. Data Integration Best Practices (Healthy Habits for SAS® Data Integration Studio Users) SAS® Professionals Convention 14-16 July 2009

Upload: duongkien

Post on 28-Mar-2018

221 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

(Healthy Habits for SAS® Data Integration Studio Users)

SAS® Professionals Convention 14-16 July 2009

Page 2: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Abstract:

Version 9 of the SAS System offers tools to help developers and business users manage and organise the wealth of data and processes that face SAS professionals today. SAS® Data Integration Studio benefits from many features that support healthy habits for data integration, but they can only 'be of use' if they are 'being used'.

DI Studio allows customisation of the custom tree, error monitoring, job status handling, data validation, “conformed” data model support, self-documentation, and role assignment. Identification of the benefits behind using these functions is often enough to motivate users into controlled and organised methods of working.

This paper describes examples of best practice for developing data integration suites to ensure quality, efficiency and resilience is built into the heart of your enterprises’ information estate.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 3: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 4: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Structure

Challenge: How can you best deliver Business Intelligence from a variety of source systems across a diverse consumer base?

Solution: Employ a Data Integration flow structure.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 5: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Structure

Challenge: How can you best deliver Business Intelligence from a variety of source systems across a diverse consumer base?

Solution: Employ a Data Integration flow structure.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Source

Systems

Page 6: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Structure

Challenge: How can you best deliver Business Intelligence from a variety of source systems across a diverse consumer base?

Solution: Employ a Data Integration flow structure.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Source

Systems

Detailed

Data Model

Page 7: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Structure

Challenge: How can you best deliver Business Intelligence from a variety of source systems across a diverse consumer base?

Solution: Employ a Data Integration flow structure.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Source

Systems

Detailed

Data Model

Subject Specific Data Marts

Page 8: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Structure

Challenge: How can you best deliver Business Intelligence from a variety of source systems across a diverse consumer base?

Solution: Employ a Data Integration flow structure.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Source

Systems

Detailed

Data Model

Subject Specific Data Marts

Subject Specific Business Intelligence

Page 9: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 10: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Organisation

Challenge: How can you keep track of the thousands of jobs typically created in a data integration suite?

Solution: Utilise the custom tree in SAS Data Integration Studio.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 11: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Data Integration Organisation

Create folders for each integration layer.

Sub divide them by:

Jobs

Libraries

Tables

Number the folders preserve order.

Stick to methodology:

(e.g. don’t transform in capture layer)

Page 12: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 13: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Capture Control

Challenge: How can I perform incremental extracts from several source systems?

Solution: Define Capture Control Tables for each source table.

Status – To ensure smooth running of DI suite.

(Started, Failed, or Success)

From/To Datetimes – To extract against the “last updated” column in the database. Also useful to determine processing times as data increases day by day.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 14: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Capture Control

Send Job Status to dataset

with same name as the job.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 15: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Capture Control

Only extract records

which have updated

since last run.

Source Systems

Conformed Model

Capture Job

Page 16: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Capture Control

Only extract records

which have updated

since last run.

Source Systems

Conformed Model

CoreInfo Tables

Capture Job

Page 17: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Capture Control

Only extract records

which have updated

since last run.

Source Systems

Conformed Model

CoreInfo Tables

Pre Capture Job Post

Page 18: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Capture Control – Pre-Processing

SAS® Professionals Convention 14-16 July 2009

Update dates in CCT table for this source.

(&source_table._CCT)

Is this the first time the job has run

successfully today?

Did the previous run fail, or not finish?

Warn that duplicate facts will occur.

Warn that this is a replacement run.

No

No

Yes

Yes

Data Integration Best Practices

Page 19: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Capture Control – Post-Processing

SAS® Professionals Convention 14-16 July 2009

Update dates in CCT table for this source.

(&source_table._CCT)

Did the job run successfully ?

Update CCT table with Status= “Failed”.

No

Yes

Data Integration Best Practices

Page 20: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 21: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Error Monitoring

Challenge: How can I keep my production support department informed of job failures/successes ?

Solution: Email job statistics to designated mailbox.

Create User Transform called Email_Stats.

Add Email_Stats transform to each job.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 22: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Error Monitoring

Add Email_Statstransform to Job.

Page 23: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Error Monitoring

Drag Target table to one input.

Drag Email_Stats to other input.(Email_Stats table containsemail addresses of recipients).

Don’t “hard-code” email addresses. What happens when people leave? Different recipients for dev/prod.

Page 24: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Error Monitoring

Email_Statstransform properties.

Only emails if job has failed.

Page 25: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Error Monitoring

Last job in flow always sendsemail to Admin & Support.

Set Last Job to Yes.

Page 26: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 27: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Validation

Challenge: How can I ensure only clean data gets loaded into the warehouse?

Solution: Use the Data Validation transformation.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 28: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Validation

Challenge: How can I ensure only clean data gets loaded into the warehouse?

Solution: Use the Data Validation transformation.

Use the standard Invalid, Missing, Duplicate tabs.

Employ custom validation and apply a severity rating:

1 = Exclusion

2 = Correction

3 = Improvement

Store exceptions in permanent dataset for further analysis.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 29: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Data Validation

e.g.

Check for

Truncation of

Key columns

Page 30: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Data Validation

1) Create each condition

Page 31: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Data Validation

1) Create each condition

2) Determine validation

Page 32: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Data Validation

1) Create each condition

2) Determine validation

3) Define corrective action if required

Page 33: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Data Validation

1) Create each condition

2) Determine validation

3) Define corrective action if required

4) This gets written to temp dataset ETLS_EXCEPTIONS.

Page 34: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Data Validation

1) Create each condition

2) Determine validation

3) Define corrective action if required

4) This gets written to temp dataset ETLS_EXCEPTIONS.

5) Run %Append_Data_Quality Macro in post-processing.

Page 35: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Data Validation

1) Create each condition

2) Determine validation

3) Define corrective action if required

4) This gets written to temp dataset ETLS_EXCEPTIONS.

5) Run %Append_Data_Quality Macro in post-processing.

6) Use BI tools to investigate Data Quality issues (e.g. Particular source system requires cleansing)

Page 36: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Validation – %Append_Data_Quality Macro Logic.

SAS® Professionals Convention 14-16 July 2009

Append exceptions to permanent tableDQ_Error_Event.

Does ETLS_EXCEPTIONS

exist ?

Halt macro as no errors to process.

No

Yes

Data Integration Best Practices

Page 37: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Validation – Table Properties for DQ_ERROR_EVENT.

SAS® Professionals Convention 14-16 July 2009

Column name Description Type Length

Row_Extraction_Date Date-timestamp when the row was exported or extracted from the source system. Num (8)

Exception_Event_Date Date-timestamp when the exception was identified by the data warehouse processes. Num (8)

Job_Name The name of the ETL job which identified the exception. Char (64)

Table_Name The library and table name which contains the row and column containing the exception. Char (41)

Row_Number The row number containing the exception. Num (8)

Column_Name The column name containing the datum of the exception. Char (32)

Screen_Description The screen (data quality test) description. Char (256)

Exception_Description Standardised description of the exception . Char (256)

Exception_Action Automated data conform action (if any) . Char (256)

Exception_Severity The severity level of the DQ Error Event (1=Exclusion, 2=Correction, 3=Improvement ). Num (8)

Unconformed_ValueN Original value (numeric) before conforming . Num (8)

Conformed_ValueN Conformed (numeric) value . Num (8)

Unconformed_ValueC Original value (character) before conforming . Char (256)

Conformed_ValueC Conformed (character) value . Char (256)

Data Integration Best Practices

Page 38: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 39: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Scrambling

Challenge: How can I ensure I’m not holding sensitive production data on development/test systems.

Solution: Use Data Scrambling routines in non-production environments.

Often development source systems are created using production data, and warehouses can propagate the risk of breaching the data protection act.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 40: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Data Scrambling – Custom Transform

The %data_scrambler macro allows for columns to be scrambled or passed through normally.

Page 41: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Data Scrambling – Custom transform

Edit Paramters:SelectPass – don’t scramble key fields!

Scramble method:Ranuni FunctionMD5 FunctionTranslate Function

Page 42: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Data Scrambling – What about Production?

%let liveEnvironment = PROD;

%let thisEnvironment= %sysfunc(substr(%sysfunc(upcase(%sysfunc(getoption(METASERVER)))),1,4);

Don’t perform scramble routine if thisEnvironment = liveEnvironment. When runnning in Dev the METASERVER option should be different. Could set up a table with environment value in.

Page 43: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 44: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Conformed Model

Challenge: How can I track trends in my data when the source systems don’t hold history.

Solution: Use a conformed data model in a warehouse, using slowly changing dimensions where appropriate.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Re-Useable

Dimensions

Fact Tables

Page 45: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Conformed Model

In the Integrate layer use the SCD Type II Loader transform to make use of effective date processing.

Page 46: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Conformed Model

In the Integrate Layer use the Surrogate Key Generator to determine keys for dimension tables.

Page 47: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 48: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

SQL Optimisation

Challenge: How can I ensure the best possible SQL performance is achieved through my SQL Join transform.

Solution: Use the undocumented _Method option on the SQL procedure to determine processing.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 49: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

SQL Optimisation: _Method Option (SAS Note 33604)

Page 50: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 51: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Self Documentation

Challenge: How can I ensure the executed warehouse code is documented to an acceptable standard?

Solution: DI Studio self documents the code, based on descriptions in in the job and transform properties.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 52: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Self Documentation

Meaningful Job names

Descriptions of “why” not just “what”.

Page 53: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Self Documentation

Use Notes and Document Attachments.

Page 54: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Self Documentation

Descriptions & Notes are propagated through to the executable code, benefitting production support teams.

Page 55: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 56: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Role Assignment

Challenge: How can I address who is responsible for which job / entity?

Solution: Use Role Assignment in DI studio.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 57: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Role Assignment

Allocate names and roles where required.

Page 58: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 59: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Rename Standard Transforms

Challenge: How can I keep track of processing in a job which has a lot of transformations.

Solution: Don’t use the default transform names, but rename the default to something meaningful.

E.g. Rename “SQL Join” to “Merge Agent_Dim with Broker_Dim”

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 60: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Subjects:

Data Integration Structure

Data Integration Organisation

Capture Control (CCT Tables)

Error Monitoring

Data Validation

Data Protection (Scrambler)

Conformed Modelling

SQL Optimisation

Self Documentation

Role Assignment

Rename Standard Transforms

SAS DI Studio Version 3.4 under SAS Intelligence Platform 9.1.3

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 61: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Contributors

• Mick Collington

• Jethro Day

• Steve Morton

• Nick Treadgold

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 62: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Contributors

• Mick Collington

• Jethro Day

• Steve Morton

• Nick Treadgold

Data Integration Developer Group (SAS Professionals)

• Julien Heijster

• John Robertson

http://www.sasprofessionals.net/group/dataintegrationdeveloper/

forum/topics/data-integration-best

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009

Page 63: Data Integration Best Practices - Ningapi.ning.com/.../DIStudioBestPractices.pdf · Data Integration Best Practices SAS ... Run %Append_Data_Quality Macro in post-processing. 6) Use

www.definitivequality.com Copyright © 2009 Definitive Quality Solutions Limited. Registered in England No.:05141146.

Contributors

• Mick Collington

• Jethro Day

• Steve Morton

• Nick Treadgold

Data Integration Developer Group (SAS Professionals)

• Julien Heijster

• John Robertson

http://www.sasprofessionals.net/group/dataintegrationdeveloper/

forum/topics/data-integration-best

SAS.COM

Data Integration Best Practices

SAS® Professionals Convention 14-16 July 2009