etl - data extraction pointers

9
Data Extraction ETL - Step by Step process www.psibertech.com.sg

Upload: psibertech-solutions-pte-ltd

Post on 15-Jul-2015

177 views

Category:

Technology


3 download

TRANSCRIPT

Page 1: ETL - Data extraction pointers

Data ExtractionETL - Step by Step process

www.psibertech.com.sg

Page 2: ETL - Data extraction pointers

ETL Workflow

SQL Server

DB2

Oracle

Flat File

Extract

Transform

Load

Stage

Warehouse

Data Mart

Page 3: ETL - Data extraction pointers

Extraction

Extract is the process of reading data from a database.

• During extraction, the desired data is identified and extracted from many different sources, including database systems and applications.

• Very often, it is not possible to identify the specific subset of interest, therefore more data than necessary has to be extracted, so the identification of the relevant data will be done at a later point in time.

Page 4: ETL - Data extraction pointers

Ways to perform the extraction

• Update notification - if the source system is able to provide a notification that a record has been changed and describe the change, this is the easiest way to get the data.

• Incremental extract - some systems may not be able to provide notification that an update has occurred, but they are able to identify which records have been modified and provide an extract of such records.

• Full extract - some systems are not able to identify which data has been changed at all, so a full extract is the only way one can get the data out of the system. The full extract requires keeping a copy of the last extract in the same format in order to be able to identify changes.

Page 5: ETL - Data extraction pointers

Transform

Transform is the process of converting the extracted data from its previous form into the form it needs to be in, so that it can be used for further analysis.

• Depending on the chosen way of transportation, some transformations can be done during this process, too.

Page 6: ETL - Data extraction pointers

Load

Load is the process of writing the data into the target database.

• During the load step, it is necessary to ensure that the load is performed correctly and with as little resources as possible.

• The target of the Load process is often a database.

• In order to make the load process efficient, it is helpful to disable any constraints and indexes before the load and enable them back only after the load completes.

Page 7: ETL - Data extraction pointers

Fails

• As with every application, there is a possibility that the ETL process fails.

• This can be caused by missing extracts from one of the systems, missing values in one of the reference tables, or simply a connection or power outage.

• Therefore, it is necessary to design the ETL process keeping fail-recovery in mind.

Page 8: ETL - Data extraction pointers

Closing Note

• Designing and maintaining the ETL process is often considered one of the most difficult and resource-intensive portions of a data warehouse project.

• Many data warehousing projects use ETL tools to manage this process.

• Hire the right consultant to manage your ETL process seamlessly.

Page 9: ETL - Data extraction pointers

www.psibertech.com.sg+65 62689551