extraction, transformation, and loading (etl)
DESCRIPTION
Transformation. Extraction, Transformation, and Loading (ETL). Objectives. After completing this lesson, you should be able to do the following: Explain transformation using SQL CREATE TABLE AS SELECT (CTAS) UPDATE MERGE Multitable INSERT - PowerPoint PPT PresentationTRANSCRIPT
6Extraction, Transformation,and Loading (ETL)
Transformation
Copyright © 2005, Oracle. All rights reserved.6-2
Objectives
After completing this lesson, you should be able to do the following:
• Explain transformation using SQL– CREATE TABLE AS SELECT (CTAS)– UPDATE– MERGE– Multitable INSERT
• Describe transformation using table functions and PL/SQL
• Implement DML Error Logging
Copyright © 2005, Oracle. All rights reserved.6-3
Data Transformation
• Data transformations are often the most complex and costly part of the ETL process.
• Transformations can range from simple data conversion to complex scrubbing techniques.
• Many, if not all, transformations can occur within the Oracle database.
• External transformations outside the database are supported (flat files, for instance).
• Data can be transformed in two ways:– Multistage data transformation – Pipelined data transformation
Copyright © 2005, Oracle. All rights reserved.6-4
Multistage Data Transformation
Load into staging table.
NEW_SALES_STEP1
NEW_SALES_STEP2 Convert source
product keys
to warehouse
product keys.
SALES
Flat files
Validate customer
keys (look up in
customer
dimension table).
NEW_SALES_STEP3Insert into SALES
warehouse table.
Copyright © 2005, Oracle. All rights reserved.6-5
Pipelined Data Transformation
SALES
Flat files
Validate customer keys (look up in
customer dimension table).
External table
Convert source
product keys
to warehouse
product keys.
Insert into SALES
warehouse table.
Copyright © 2005, Oracle. All rights reserved.6-6
Transformation Mechanisms
You have the following choices for transforming data inside the database:
• SQL
• PL/SQL
• Table functions
Copyright © 2005, Oracle. All rights reserved.6-7
Transformation Using SQL
After data is loaded into the database, transformations can be executed using SQL operations:
• CREATE TABLE ... AS SELECT
• INSERT /*+APPEND*/ AS SELECT• UPDATE• MERGE• Multitable INSERT
Copyright © 2005, Oracle. All rights reserved.6-8
CREATE TABLE … AS SELECTand UPDATE
DESC sales_activity_directName Null? Type------------ ----- ---------SALES_DATE DATEPRODUCT_ID NUMBERCUSTOMER_ID NUMBERPROMOTION_ID NUMBERAMOUNT NUMBERQUANTITY NUMBER
INSERT /*+ APPEND NOLOGGING PARALLEL(sales) */INTO sales asSELECT product_id, customer_id, TRUNC(sales_date), 3,promotion_id, quantity, amountFROM sales_activity_direct;
Copyright © 2005, Oracle. All rights reserved.6-10
MERGE Statement: Overview
• MERGE statements provide the ability to conditionally UPDATE/INSERT into the database.
• The DELETE clause for UPDATE branch is provided for implicit data maintenance.
• Conditions are specified in the ON clause.
• MERGE statements do an UPDATE if the row exists and an INSERT if it is a new row.
• Using MERGE statements avoids multiple updates.– MERGE statements use a single SQL
statement to complete an UPDATE or INSERT or both.
Copyright © 2005, Oracle. All rights reserved.6-11
Data Warehousing MERGE: Example
MERGE INTO customer C
USING cust_src S
ON (c.customer_id = s.src_customer_id)
WHEN MATCHED THEN
UPDATE SET c.cust_address = s.cust_address
WHEN NOT MATCHED THEN
INSERT ( Customer_id, cust_first_name,…) VALUES (src_customer_id, src_first_name,…);
Copyright © 2005, Oracle. All rights reserved.6-12
Data Maintenance with MERGE/DELETE
MERGE USING product_changes SINTO products D ON (d.prod_id = s.prod_id)WHEN MATCHED THENUPDATE SET d.prod_list_price = s.prod_new_price, d.prod_status = s.prod_newstatusDELETE WHERE (d.prod_status = "OBSOLETE")WHEN NOT MATCHED THENINSERT (prod_id, prod_list_price, prod_status)VALUES (s.prod_id, s.prod_new_price, s.prod_new_status);
Copyright © 2005, Oracle. All rights reserved.6-13
Overview of Multitable INSERT Statements
• Allows the INSERT … AS SELECT statement to insert rows into multiple tables as part of a single DML statement
• Can be used in data warehousing systems to transfer data from one or more operational sources to a set of target tables
• Types of multitable INSERT statements:– Unconditional INSERT– Pivoting INSERT– Conditional ALL INSERT– Conditional FIRST INSERT
Copyright © 2005, Oracle. All rights reserved.6-14
Example of Unconditional INSERT
INSERT ALLINTO sales VALUES (product_id, customer_id, today, 3, promotion_id, quantity_per_day, amount_per_day)INTO costs VALUES (product_id, today, promotion_id, 3, product_cost, product_price)SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id, s.promotion_id, SUM(s.amount) AS amount_per_day, SUM(s.quantity) quantity_per_day, p.prod_min_price*0.8 AS product_cost, p.prod_list_price AS product_priceFROM sales_activity_direct s, products pWHERE s.product_id = p.prod_id AND TRUNC(sales_date) = TRUNC(SYSDATE)GROUP BY TRUNC(sales_date), s.product_id, s.customer_id, s.promotion_id, p.prod_min_price*0.8, p.prod_list_price;
Copyright © 2005, Oracle. All rights reserved.6-15
Example of Conditional ALL INSERT
INSERT ALL
WHEN product_id IN (SELECT product_id
FROM promotional_items) THEN
INTO promotional_sales VALUES(product_id,list_price)
WHEN order_mode = 'online' THEN
INTO web_orders VALUES(product_id, order_total)
SELECT product_id, list_price, order_total, order_mode
FROM orders;
Copyright © 2005, Oracle. All rights reserved.6-16
Example of Pivoting INSERT
INSERT ALL
INTO sales (product_id, customer_id, week, amount)
VALUES (product_id, customer_id, weekly_start_date, sales_sun)
INTO sales (product_id, customer_id, week, amount)
VALUES (product_id, customer_id, weekly_start_date+1, sales_mon)
INTO sales (product_id, customer_id, week, amount)
VALUES (product_id, customer_id, weekly_start_date+2, sales_tue)
INTO sales (product_id, customer_id, week, amount)
VALUES (product_id, customer_id, weekly_start_date+3, sales_wed)
...
SELECT product_id, customer_id, weekly_start_date, sales_sun,
sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat
FROM sales_summary;
Copyright © 2005, Oracle. All rights reserved.6-17
Example of Conditional FIRST INSERT
INSERT FIRST
WHEN order_total > 10000 THEN
INTO priority_handling VALUES (id)
WHEN order_total > 5000 THEN
INTO special_handling VALUES (id)
WHEN total > 3000 THEN
INTO privilege_handling VALUES (id)
ELSE
INTO regular_handling VALUES (id)
SELECT order_total , order_id id
FROM orders ;
Copyright © 2005, Oracle. All rights reserved.6-18
Overview of Table Functions
• A table function is a function that can produce a set of rows as output.– Input can be a set of rows.
• Table functions support pipelined and parallel execution of transformations using:– PL/SQL– C programming language– Java
• Table functions are used in the FROM clause of a SELECT statement.
Copyright © 2005, Oracle. All rights reserved.6-19
Creating Table Functions
CREATE OR REPLACE FUNCTION transform(p ref_cur_type)
RETURN table_order_items_type
PIPELINED
PARALLEL_ENABLE( PARTITION p BY ANY) IS
BEGIN
FOR rec IN p LOOP
… -- Transform this record
PIPE ROW (rec);
END LOOP;
RETURN;
END;
Copyright © 2005, Oracle. All rights reserved.6-21
Using Table Functions
SELECT *
FROM TABLE(transform(cursor(SELECT *
FROM
order_items_ext)));
INSERT /*+ APPEND, PARALLEL(order_items) */
INTO order_items
SELECT *
FROM TABLE(transform(cursor(SELECT *
FROM
order_items_ext)));
Copyright © 2005, Oracle. All rights reserved.6-22
DML Error Logging: Overview
• Aborting long-running bulk DML operations is wasteful of time and system resources.
• DML Error Logging allows bulk DML operations to continue processing after a DML error occurs.– Errors are logged to an error-logging table.
• DML Error Logging combines the power and speed of bulk processing with the functionality of row-level error handling.
Copyright © 2005, Oracle. All rights reserved.6-23
DML Error Logging Concepts
• An error-logging table is created and associated with an existing table by using the DBMS_ERRLOG package.
• The LOG ERRORS INTO... clause is added to the bulk DML load statement providing:– Error-logging table name– Statement tag– Reject limit
INSERT INTO bonusesSELECT employee_id, salary*1.1 FROM employeesWHERE commission_pct > .2LOG ERRORS INTO errlog ('my_bad') REJECT LIMIT 10;
EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses', 'errlog');
Copyright © 2005, Oracle. All rights reserved.6-25
Error-Logging Table
• The error-logging table logically consists of two parts:– Fixed: Columns that describe the error– Variable: Columns that contain data values
corresponding to the row in error
• Column names for the variable part determine the DML table columns logged when an error occurs.
• The columns of the DML table to be logged is the intersection of the column names of the DML table and the error-logging table.
Copyright © 2005, Oracle. All rights reserved.6-27
Error-Logging Table Format
Fixed Portion: Error information columns
User-supplied tag value via DML statement
varchar2(4000) ORA_ERR_TAG$
Type of operation: (I)nsert, (U)pdate, (D)elete
varchar2(2) ORA_ERR_OPTYP$
Row ID of row in error (UPDATE/DELETE)
row ID ORA_ERR_ROWID$
Oracle error message text varchar2(4000) ORA_ERR_MESG$
Oracle error number varchar2(10) ORA_ERR_NUMBER$
Information Data type Column name
Copyright © 2005, Oracle. All rights reserved.6-28
Inserting into a Table with Error Logging
Example:
CREATE TABLE bonuses (emp_id NUMBER, sal NUMBER CHECK(sal > 8000));
EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses', 'errlog');
INSERT INTO bonuses SELECT employee_id, salary*1.1 FROM employees WHERE commission_pct > .2 LOG ERRORS INTO errlog (’my_bad’) REJECT LIMIT 10;
1
2
3
Copyright © 2005, Oracle. All rights reserved.6-29
Summary
In this lesson, you should have learned how to:
• Explain transformation using SQL– CREATE TABLE AS SELECT (CTAS)– UPDATE– MERGE– Multitable INSERT
• Describe transformation using table functions
• Implement DML Error Logging
Copyright © 2005, Oracle. All rights reserved.6-30
Practice 6: Overview
This practice covers the following topics:
• Loading data using multitable inserts
• Loading data using multitable conditional insert
• Loading data using MERGE• Logging DML load errors