169 etl presentation
DESCRIPTION
etlTRANSCRIPT
-
ETL - Oracle Database Features and PL/SQL Techniques
Boyan Boev
CNsysBGOUG - 2005
-
Agenda
ETL ConceptsETL Process Development - RecommendationsOracle Database ConfigurationPL/SQL TechniquesDatabase FeaturesConclusionBGOUG - 2005
-
What is ETL?
ETL ElementsETL PurposeETL as a Part of Business Intelligence SolutionTypes of ETL Using Tools Custom PL/SQL Scripts BGOUG - 2005
-
ETL Extract, Transform and Load
ExtractPull the Data From the Source
TransformConvert the Input Format to the Target FormatEncode any Values
LoadInsert the Transformed Data to the Target TablesBGOUG - 2005
-
Agenda
ETL ConceptsETL Process Development - RecommendationsOracle Database ConfigurationPL/SQL TechniquesDatabase FeaturesConclusion
BGOUG - 2005
-
Oracle database configuration
Dont use ARCHIVE LOG Mode Reduce the Amount of I/O ActivityUse NOLOGGING DML StatementsALTER TABLE NOLOGGING;INSERT /*+ nologging*/ Change the stage schema data modelIndexesConstraintsTriggersBGOUG - 2005
-
Agenda
ETL ConceptsETL Process Development - RecommendationsOracle Database ConfigurationPL/SQL TechniquesDatabase FeaturesConclusion
BGOUG - 2005
-
ETL recommendations
PL/SQL Techniques
Bulk Binding Collection Types Pipelined Table Functions Package Variables Native CompilationBGOUG - 2005
-
PL/SQL Binding
Types of BindsIN-BindsOUT-Binds
Bind Options Single Row BindsBulk Binds
BGOUG - 2005
-
Single Row Binds
Cursor For LoopExample:BGOUG - 2005DECLARECURSORexample_cursor( p_id_in NUMBER ) IS SELECT * FROM customer_action WHERE cust_id = p_id_in;v_cust_id NUMBER := 1681;BEGINFOR rec INexample_cursor( v_cust_id ) LOOP INSERT INTO cust_action_hist() VALUES( ); END LOOP;END;/
-
BGOUG - 2005Context Switching
-
BGOUG - 2005Bulk Binding IN-Binds - An Array of Values is passed to SQL Engine OUT-Binds - SQL Engine populates a PL/SQL Bind Array
Context Switching once per Batch Instead of Once Per Row
Performance Increase to Up to 15 Times
-
BGOUG - 2005Bulk Bind Operators
Bulk Collect Specifying of the LIMIT clause Be Careful to handle last batchOPEN cursor LOOPFETCH cursor BULK COLLECT INTO LIMIT 100;EXIT WHEN cursor%NOTFOUND;END LOOP;
-
BGOUG - 2005Bulk Bind OperatorsFORALLBulk DML OperatorNot Looping Construct PL/SQL table is Referenced in the statement Handling and Reporting Exceptions VALUES OF , INDICES OF ( New in Oracle 10g )
FORALL i IN 1..20 INSERT INTO emp2 VALUES (enums(i), names(i), median(sals), ...);
-
BGOUG - 2005Collection Types
Associative Arrays( PL/SQL Tables )PL/SQL type only - not a SQL TypeEasy to Use
Nested Tables Shared Type
-
BGOUG - 2005Associative Arrays( PL/SQL Tables )
automatic element allocation no need to initialize
DECLARE TYPE EmpTabTyp IS TABLE OF employees%ROWTYPE INDEX BY PLS_INTEGER; emp_tab EmpTabTyp;BEGIN SELECT * BULK COLLECT INTO emp_tab FROM employees WHERE employee_id = 100;END;/
-
BGOUG - 2005Nested Tables
PL/SQL
Example
DECLARE TYPE nest_tab_t IS TABLE OF NUMBER; nt nest_tab_t := nest_tab_t(); BEGIN FOR i IN 1..100 LOOP nt.EXTEND; nt(i) := i; END LOOP; END;/
-
BGOUG - 2005Nested Tables
Global
Example
CREATE OR REPLACE TYPE email_demo_obj_t AS OBJECT ( email_idNUMBER, demo_codeNUMBER, valueVARCHAR2(30) ); /
CREATE OR REPLACE TYPE email_demo_nt_t AS TABLE OF email_demo_obj_t; /
-
BGOUG - 2005Nested Tables
SQL-Defined Nested Tables Sorted AggregatedIsed in Dynamic In-ListsJoined with SQL TablesJoined with other PL/SQL nested tablesNested Tables Enable Table FunctionsTABLE Operatortell Oracle to treat the Variable like SQL TableCAST OperatorExplicitly Tells Oracle the Data Type to be used to Handle the Operation
-
-
BGOUG - 2005Nested Tables
Global Nested tables example:DECLARE eml_dmo_nt email_demo_nt_t := email_demo_nt_t();BEGIN-- Some logic that populates the nested table eml_dmo_nt.EXTEND(3);eml_dmo_nt(1) := email_demo_obj_t(45, 3, '23');eml_dmo_nt(2) := email_demo_obj_t(22, 3, '41');eml_dmo_nt(3) := email_demo_obj_t(18, 7, 'over_100k');-- Process the data in assending order of email id.FOR r IN (SELECT * FROM TABLE(CAST(eml_dmo_nt AS email_demo_nt_t)) ORDER BY 1)LOOP dbms_output.put_line(r.email_id || ' ' || r.demo_id);END LOOP;END;/
-
BGOUG - 2005Table Functions
Data is Buffered in the Local Variables During Function Execution
Pipelined Table Functions
SELECT * FROM plsql_function;
-
BGOUG - 2005Pipelined Functions - exampleFUNCTION pipelined_demo( data SYS_REFCURSOR ) RETURN demo_nt_t PIPELINED IS CURSOR demo_cursor IS SELECT demo_obj_t(emp_id, value) FROM emp; emp_nt demo_nt_t;BEGIN OPEN email_demo_cur; LOOP EXIT WHEN email_demo_cur%NOTFOUND; FETCH email_demo_cur BULK COLLECT INTO eml_dmo_nt LIMIT 1000; FOR i IN 1..eml_dmo_nt.COUNT LOOP /* Apply some business logic on the object here, and return a row. */ PIPE ROW (eml_dmo_nt(i)); END LOOP; END LOOP; RETURN;END;
-
BGOUG - 2005Package Variables
What are Package Variables
Globally Declared Variables in Package, outside of a Procedure or Function DefinitionValues set by Package Initialization Code
Enable Data Caching
-
BGOUG - 2005Native Compilation
How it works? PL/SQL First is Compiled Down To P-CodeC Source Code is Generated from P-CodeNative Compiler is invoked Creating a 'C' Shared Object LibrarySubsequent Calls to PL/SQL Object are Run by the 'C' Library
NComp and Performance
-
Agenda
ETL ConceptsETL Process Development - RecommendationsOracle Database ConfigurationPL/SQL TechniquesDatabase FeaturesConclusion
BGOUG - 2005
-
ETL recommendations
Database Features and Utilities
External Tables( 9i )Transportable TablespacesDML Exception Handling( 10g R2 )Regular Expressions Support( 10g R1 )SQL*LoaderDirect Path INSERTParallel DML StatementsBGOUG - 2005
-
BGOUG - 2005Database Features
External tablesFile can be Queried as if it is a Real Database Table Can Sort, Aggregate, Filter Rows, etc.External File Can be Queried in ParallelOracle 9i - Read-Only using SQL*Loader EngineOnly read-only access to external filesOracle 10g - Read-Write using Data Pump EngineCan do read/write operations with external files
Transportable Tablespaces Oracle 10g - transport across different platforms
-
BGOUG - 2005
Database Features
DML Exception Handling ( New in 10g R2 ) Catch the Error and Move On: Error Logging ClauseEnabling:
SQL> exec dbms_errlog.CREATE_ERROR_LOG ('ACCOUNTS','ERR_ACCOUNTS');SQL> insert into accounts select * from accounts_ny log errors into err_accounts reject limit 200; SQL> select ORA_ERR_NUMBER$, ORA_ERR_MESG$, ACC_NO from err_accounts;
-
BGOUG - 2005Database Features
Regular Expression Support ( New in 10g R1 )
SQL*Loader
Direct Path Insert SQL*LoaderCREATE TABLE AS SELECT INSERT /*+ APPEND*/ SELECT
Parallel DML Statements
-
BGOUG - 2005Agenda
ETL ConceptsETL Process Development - RecommendationsOracle Database ConfigurationPL/SQL TechniquesDatabase FeaturesConclusion
-
BGOUG - 2005Example
create package with PIPELINED Function .... FUNCTION transform (new_data SYS_REFCURSOR) RETURN email_demo_nt_t PIPELINED PARALLEL_ENABLE ( PARTITION new_data BY ANY ) IS....Transformation is Just a Simple INSERT as SELECT Elegant solution to Parallel , Transactional Co-Processing INSERT /*+ append nologging*/ INTO ... SELECT /*+ parallel(a,4)*/ FROM .. TABLE( CAST( package.pipe_func_name( select * from table_3 ) as TYPE ) a);
-
BGOUG - 2005Conclusion
High Perofmance ETL Solution
-
BGOUG - 2005Recommended Readings Sites
http://www.oracle.com/technology//index.htmlhttp://rittman.nethttp://www.intelligententerprise.com/http://asktom.oracle.comhttp://www.dbazine.com/datawarehousehttp://dbasupport.com/
-
BGOUG - 2005Recommended Readings Books
http://www.amazon.com/exec/obidos/tg/detail/-/0471200247/qid=1129731960/sr=2-1/ref=pd_bbs_b_2_1/104-9260048-7507956?v=glance&s=books
-
BGOUG - 2005Recommended Readings Books
http://www.amazon.com/exec/obidos/tg/detail/-/0764567578/ref=pd_sim_b_1/104-9260048-7507956?%5Fencoding=UTF8&v=glance
-
BGOUG - 2005Recommended Readings Books
http://www.amazon.com/exec/obidos/tg/detail/-/1555583350/qid=1129732390/sr=2-2/ref=pd_bbs_b_2_2/104-9260048-7507956?v=glance&s=books
-
BGOUG - 2005
Questions or Comments
Boyan Boev
: ETL , . , Staging , , Database Features. , ETL .ETL Elements - ETL Extract, transform and load , , , . ETT extraction, transformation and transportationETL Purpose - ETL , , , , , , warehouse.ETL as a Part of Business Intelligence Solution - Data Warehouse , BI Solution , , , Business Intelligence . ETL , - Business Intelligence .
Extract staging . , , ETL . Oracle Databases, -Oracle , .. :- - , - - , ( , , , , ), Transform - , DW. : - - - - - - , , , , . , , , . DW . , , , .LOAD - data warehouse , , , - , .
, Oracle , ETL , 3 : staging PL/SQL feature utilities Oracle Database
, Oracle database, ETL . staging , PL/SQL Features Utilities. , staging . . staging , , , - NOARCHIVELOG MODE. / CPU. , / NOLOGGIND DML Statements, .. , REDO . NOLOGGING NOLOGGING - staging , Oracle , . , staging , DW , . , , , , ARCHIVE LOG . - . REDO . , a NOLOGGING NOLOGGING . , - staging , , (overhead) INSERT, UPDATE DELETE.
, , , ETL , PL/SQL .
PL/SQL :
PL/SQL SQL Statements BINDING. SQL IN OUT. BIND SQL . BIND , , .
BULK Cursor For Loop. , , INSERT . , Oracle PL/SQL . Oracle Database 2 engine- PL/SQL . PL/SQL Engine, SQL Statement SQL Engine. , PL/SQL SQL engines. , , . Engine-.explanation for context switching and perfomanceexplain performance benefitscontext switching schema SQL PL/SQL Engines. BIND , .. SQL Engine-a . , Oracle 8i 2 : BULK COLLECT FORALL, BULK BIND , - .Example: DECLARE TYPE sales_t IS TABLE OF f_sales_detail.sales_id%TYPE INDEX BY BINARY_INTEGER; sales_idssales_t; v_customer_idNUMBER := 1234; max_rowsCONSTANT NUMBER := 10000; CURSOR sales(p_customer_id NUMBER) IS SELECT sales_id FROM f_sales_detail WHERE customer_id = p_customer_id; BEGIN OPEN sales(v_customer_id); LOOP EXIT WHEN sales%NOTFOUND; FETCH sales BULK COLLECT INTO sales_ids LIMIT max_rows;
FORALL i IN 1..sales_ids.COUNT INSERT INTO sales_hist (customer_id, detail_id, process_date) VALUES (v_customer_id, sales_ids(i), sysdate); END LOOP; CLOSE sales; END; /
pipelined stage , nested , SQL. SELECT * FROM plsql_function; PIPELINED TABLE FUNCTION. ETL . , . . ( ETL ) . PIPELINED TABLE FUNCTIONS.
pipelined stage , nested , SQL. SELECT * FROM plsql_function; PIPELINED TABLE FUNCTION. ETL . , . . ( ETL ) . PIPELINED TABLE FUNCTIONS.
, ETL . 20 - , .
CREATE OR REPLACE PACKAGE BODY direct_mkt AS TYPE zip_array IS TABLE OF VARCHAR2(30) INDEX BY BINARY_INTEGER; zip_deref zip_array; PROCEDURE do_insert(f_name VARCHAR2, l_name VARCHAR2, zip NUMBER) IS BEGIN INSERT INTO user_data (f_nm, l_nm, city_nm) VALUES (f_name, l_name, zip_deref(zip)); COMMIT; END; -- Package initialization section. BEGIN FOR rec IN (SELECT zip_code, city FROM dma) LOOP zip_deref(rec.zip_code) := rec.city; END LOOP; END;
Caching Example:CREATE OR REPLACE PACKAGE BODY email_push AS TYPE email_array IS TABLE OF VARCHAR2(30) INDEX BY VARCHAR2(30); TYPE usermail_array IS TABLE OF email_array INDEX BY BINARY_INTEGER; user_emails usermail_array; FUNCTION lookup(p_userid NUMBER, p_email VARCHAR2) RETURN VARCHAR2 IS BEGIN RETURN user_emails(p_userid)(p_email); END;BEGIN FOR rec IN (SELECT user_id, email, l_name FROM user_emails) LOOP user_emails(rec.user_id)(rec.email) := rec.l_name; END LOOP;END; NComp, PL/SQL . PL/SQL Techniques: PL/SQL ETL . BULK BIND , , PL/SQL. , PIPELENED TABLES , . , features utilities , Oracle Database, ETL, ETL .Otnosno INSERT /*+ APPEND*/
Be sure you have disabled all reference constraints before executingthe direct load insert. If you do not, the append hint will be ignored, nowarnings will be issued, and a conventional insert will be used. Plus, theinsert will take a long time if there is a lot of data. Conventional path isused when using the INSERT...with the VALUES clause even if you use theAPPEND hint.
APPEND: SELECT COMMIT , INSERT-a DIRECT PATH , COMMIT SELECT , .
, , , ETL process. , . PIPELINED TABLE , , 1 , DIRECT PATH INSERT, REDO , NESTED TABLEs. , Oracle Database , , utilities features, ETL . ETL. , PL/SQL DW.This book should satisfy those who want a different perspective than the official Oracle documentation. It will cover all important aspects of a data warehouse while giving the necessary examples to make the reading a lively experience. - Tim Donar, Author and Systems Architect for Enterprise Data Warehouses Tuning a data warehouse database focuses on large transactions, mostly requiring what is known as throughput. Throughput is the passing of large amounts of information through a server, network and Internet environment, backwards and forwards, constantly! The ultimate objective of a data warehouse is the production of meaningful and useful reporting, from historical and archived data. The trick is to make the reports print within an acceptable time frame. A data model contains tables and relationships between tables. Tuning a data model involves Normalization and Denormalization. Different approaches are required depending on the application, such as OLTP or a Data Warehouse. Inappropriate database design can make SQL code impossible to tune. Poor data modeling can have a most profound effect on database performance since all SQL code is constructed from the data model. * Takes users beyond basics to critical issues in running most efficient data warehouse applications * Illustrates how to keep data going in and out in the most productive way possible * Focus is placed on Data Warehouse performance tuning