169 etl presentation

37
ETL - Oracle Database Features and PL/SQL Techniques Boyan Boev CNsys BGOUG - 2005

Upload: manjunathaug3

Post on 22-Nov-2015

24 views

Category:

Documents


2 download

DESCRIPTION

etl

TRANSCRIPT

  • ETL - Oracle Database Features and PL/SQL Techniques

    Boyan Boev

    CNsysBGOUG - 2005

  • Agenda

    ETL ConceptsETL Process Development - RecommendationsOracle Database ConfigurationPL/SQL TechniquesDatabase FeaturesConclusionBGOUG - 2005

  • What is ETL?

    ETL ElementsETL PurposeETL as a Part of Business Intelligence SolutionTypes of ETL Using Tools Custom PL/SQL Scripts BGOUG - 2005

  • ETL Extract, Transform and Load

    ExtractPull the Data From the Source

    TransformConvert the Input Format to the Target FormatEncode any Values

    LoadInsert the Transformed Data to the Target TablesBGOUG - 2005

  • Agenda

    ETL ConceptsETL Process Development - RecommendationsOracle Database ConfigurationPL/SQL TechniquesDatabase FeaturesConclusion

    BGOUG - 2005

  • Oracle database configuration

    Dont use ARCHIVE LOG Mode Reduce the Amount of I/O ActivityUse NOLOGGING DML StatementsALTER TABLE NOLOGGING;INSERT /*+ nologging*/ Change the stage schema data modelIndexesConstraintsTriggersBGOUG - 2005

  • Agenda

    ETL ConceptsETL Process Development - RecommendationsOracle Database ConfigurationPL/SQL TechniquesDatabase FeaturesConclusion

    BGOUG - 2005

  • ETL recommendations

    PL/SQL Techniques

    Bulk Binding Collection Types Pipelined Table Functions Package Variables Native CompilationBGOUG - 2005

  • PL/SQL Binding

    Types of BindsIN-BindsOUT-Binds

    Bind Options Single Row BindsBulk Binds

    BGOUG - 2005

  • Single Row Binds

    Cursor For LoopExample:BGOUG - 2005DECLARECURSORexample_cursor( p_id_in NUMBER ) IS SELECT * FROM customer_action WHERE cust_id = p_id_in;v_cust_id NUMBER := 1681;BEGINFOR rec INexample_cursor( v_cust_id ) LOOP INSERT INTO cust_action_hist() VALUES( ); END LOOP;END;/

  • BGOUG - 2005Context Switching

  • BGOUG - 2005Bulk Binding IN-Binds - An Array of Values is passed to SQL Engine OUT-Binds - SQL Engine populates a PL/SQL Bind Array

    Context Switching once per Batch Instead of Once Per Row

    Performance Increase to Up to 15 Times

  • BGOUG - 2005Bulk Bind Operators

    Bulk Collect Specifying of the LIMIT clause Be Careful to handle last batchOPEN cursor LOOPFETCH cursor BULK COLLECT INTO LIMIT 100;EXIT WHEN cursor%NOTFOUND;END LOOP;

  • BGOUG - 2005Bulk Bind OperatorsFORALLBulk DML OperatorNot Looping Construct PL/SQL table is Referenced in the statement Handling and Reporting Exceptions VALUES OF , INDICES OF ( New in Oracle 10g )

    FORALL i IN 1..20 INSERT INTO emp2 VALUES (enums(i), names(i), median(sals), ...);

  • BGOUG - 2005Collection Types

    Associative Arrays( PL/SQL Tables )PL/SQL type only - not a SQL TypeEasy to Use

    Nested Tables Shared Type

  • BGOUG - 2005Associative Arrays( PL/SQL Tables )

    automatic element allocation no need to initialize

    DECLARE TYPE EmpTabTyp IS TABLE OF employees%ROWTYPE INDEX BY PLS_INTEGER; emp_tab EmpTabTyp;BEGIN SELECT * BULK COLLECT INTO emp_tab FROM employees WHERE employee_id = 100;END;/

  • BGOUG - 2005Nested Tables

    PL/SQL

    Example

    DECLARE TYPE nest_tab_t IS TABLE OF NUMBER; nt nest_tab_t := nest_tab_t(); BEGIN FOR i IN 1..100 LOOP nt.EXTEND; nt(i) := i; END LOOP; END;/

  • BGOUG - 2005Nested Tables

    Global

    Example

    CREATE OR REPLACE TYPE email_demo_obj_t AS OBJECT ( email_idNUMBER, demo_codeNUMBER, valueVARCHAR2(30) ); /

    CREATE OR REPLACE TYPE email_demo_nt_t AS TABLE OF email_demo_obj_t; /

  • BGOUG - 2005Nested Tables

    SQL-Defined Nested Tables Sorted AggregatedIsed in Dynamic In-ListsJoined with SQL TablesJoined with other PL/SQL nested tablesNested Tables Enable Table FunctionsTABLE Operatortell Oracle to treat the Variable like SQL TableCAST OperatorExplicitly Tells Oracle the Data Type to be used to Handle the Operation

    -

  • BGOUG - 2005Nested Tables

    Global Nested tables example:DECLARE eml_dmo_nt email_demo_nt_t := email_demo_nt_t();BEGIN-- Some logic that populates the nested table eml_dmo_nt.EXTEND(3);eml_dmo_nt(1) := email_demo_obj_t(45, 3, '23');eml_dmo_nt(2) := email_demo_obj_t(22, 3, '41');eml_dmo_nt(3) := email_demo_obj_t(18, 7, 'over_100k');-- Process the data in assending order of email id.FOR r IN (SELECT * FROM TABLE(CAST(eml_dmo_nt AS email_demo_nt_t)) ORDER BY 1)LOOP dbms_output.put_line(r.email_id || ' ' || r.demo_id);END LOOP;END;/

  • BGOUG - 2005Table Functions

    Data is Buffered in the Local Variables During Function Execution

    Pipelined Table Functions

    SELECT * FROM plsql_function;

  • BGOUG - 2005Pipelined Functions - exampleFUNCTION pipelined_demo( data SYS_REFCURSOR ) RETURN demo_nt_t PIPELINED IS CURSOR demo_cursor IS SELECT demo_obj_t(emp_id, value) FROM emp; emp_nt demo_nt_t;BEGIN OPEN email_demo_cur; LOOP EXIT WHEN email_demo_cur%NOTFOUND; FETCH email_demo_cur BULK COLLECT INTO eml_dmo_nt LIMIT 1000; FOR i IN 1..eml_dmo_nt.COUNT LOOP /* Apply some business logic on the object here, and return a row. */ PIPE ROW (eml_dmo_nt(i)); END LOOP; END LOOP; RETURN;END;

  • BGOUG - 2005Package Variables

    What are Package Variables

    Globally Declared Variables in Package, outside of a Procedure or Function DefinitionValues set by Package Initialization Code

    Enable Data Caching

  • BGOUG - 2005Native Compilation

    How it works? PL/SQL First is Compiled Down To P-CodeC Source Code is Generated from P-CodeNative Compiler is invoked Creating a 'C' Shared Object LibrarySubsequent Calls to PL/SQL Object are Run by the 'C' Library

    NComp and Performance

  • Agenda

    ETL ConceptsETL Process Development - RecommendationsOracle Database ConfigurationPL/SQL TechniquesDatabase FeaturesConclusion

    BGOUG - 2005

  • ETL recommendations

    Database Features and Utilities

    External Tables( 9i )Transportable TablespacesDML Exception Handling( 10g R2 )Regular Expressions Support( 10g R1 )SQL*LoaderDirect Path INSERTParallel DML StatementsBGOUG - 2005

  • BGOUG - 2005Database Features

    External tablesFile can be Queried as if it is a Real Database Table Can Sort, Aggregate, Filter Rows, etc.External File Can be Queried in ParallelOracle 9i - Read-Only using SQL*Loader EngineOnly read-only access to external filesOracle 10g - Read-Write using Data Pump EngineCan do read/write operations with external files

    Transportable Tablespaces Oracle 10g - transport across different platforms

  • BGOUG - 2005

    Database Features

    DML Exception Handling ( New in 10g R2 ) Catch the Error and Move On: Error Logging ClauseEnabling:

    SQL> exec dbms_errlog.CREATE_ERROR_LOG ('ACCOUNTS','ERR_ACCOUNTS');SQL> insert into accounts select * from accounts_ny log errors into err_accounts reject limit 200; SQL> select ORA_ERR_NUMBER$, ORA_ERR_MESG$, ACC_NO from err_accounts;

  • BGOUG - 2005Database Features

    Regular Expression Support ( New in 10g R1 )

    SQL*Loader

    Direct Path Insert SQL*LoaderCREATE TABLE AS SELECT INSERT /*+ APPEND*/ SELECT

    Parallel DML Statements

  • BGOUG - 2005Agenda

    ETL ConceptsETL Process Development - RecommendationsOracle Database ConfigurationPL/SQL TechniquesDatabase FeaturesConclusion

  • BGOUG - 2005Example

    create package with PIPELINED Function .... FUNCTION transform (new_data SYS_REFCURSOR) RETURN email_demo_nt_t PIPELINED PARALLEL_ENABLE ( PARTITION new_data BY ANY ) IS....Transformation is Just a Simple INSERT as SELECT Elegant solution to Parallel , Transactional Co-Processing INSERT /*+ append nologging*/ INTO ... SELECT /*+ parallel(a,4)*/ FROM .. TABLE( CAST( package.pipe_func_name( select * from table_3 ) as TYPE ) a);

  • BGOUG - 2005Conclusion

    High Perofmance ETL Solution

  • BGOUG - 2005Recommended Readings Sites

    http://www.oracle.com/technology//index.htmlhttp://rittman.nethttp://www.intelligententerprise.com/http://asktom.oracle.comhttp://www.dbazine.com/datawarehousehttp://dbasupport.com/

  • BGOUG - 2005Recommended Readings Books

    http://www.amazon.com/exec/obidos/tg/detail/-/0471200247/qid=1129731960/sr=2-1/ref=pd_bbs_b_2_1/104-9260048-7507956?v=glance&s=books

  • BGOUG - 2005Recommended Readings Books

    http://www.amazon.com/exec/obidos/tg/detail/-/0764567578/ref=pd_sim_b_1/104-9260048-7507956?%5Fencoding=UTF8&v=glance

  • BGOUG - 2005Recommended Readings Books

    http://www.amazon.com/exec/obidos/tg/detail/-/1555583350/qid=1129732390/sr=2-2/ref=pd_bbs_b_2_2/104-9260048-7507956?v=glance&s=books

  • BGOUG - 2005

    Questions or Comments

    Boyan Boev

    [email protected]

    : ETL , . , Staging , , Database Features. , ETL .ETL Elements - ETL Extract, transform and load , , , . ETT extraction, transformation and transportationETL Purpose - ETL , , , , , , warehouse.ETL as a Part of Business Intelligence Solution - Data Warehouse , BI Solution , , , Business Intelligence . ETL , - Business Intelligence .

    Extract staging . , , ETL . Oracle Databases, -Oracle , .. :- - , - - , ( , , , , ), Transform - , DW. : - - - - - - , , , , . , , , . DW . , , , .LOAD - data warehouse , , , - , .

    , Oracle , ETL , 3 : staging PL/SQL feature utilities Oracle Database

    , Oracle database, ETL . staging , PL/SQL Features Utilities. , staging . . staging , , , - NOARCHIVELOG MODE. / CPU. , / NOLOGGIND DML Statements, .. , REDO . NOLOGGING NOLOGGING - staging , Oracle , . , staging , DW , . , , , , ARCHIVE LOG . - . REDO . , a NOLOGGING NOLOGGING . , - staging , , (overhead) INSERT, UPDATE DELETE.

    , , , ETL , PL/SQL .

    PL/SQL :

    PL/SQL SQL Statements BINDING. SQL IN OUT. BIND SQL . BIND , , .

    BULK Cursor For Loop. , , INSERT . , Oracle PL/SQL . Oracle Database 2 engine- PL/SQL . PL/SQL Engine, SQL Statement SQL Engine. , PL/SQL SQL engines. , , . Engine-.explanation for context switching and perfomanceexplain performance benefitscontext switching schema SQL PL/SQL Engines. BIND , .. SQL Engine-a . , Oracle 8i 2 : BULK COLLECT FORALL, BULK BIND , - .Example: DECLARE TYPE sales_t IS TABLE OF f_sales_detail.sales_id%TYPE INDEX BY BINARY_INTEGER; sales_idssales_t; v_customer_idNUMBER := 1234; max_rowsCONSTANT NUMBER := 10000; CURSOR sales(p_customer_id NUMBER) IS SELECT sales_id FROM f_sales_detail WHERE customer_id = p_customer_id; BEGIN OPEN sales(v_customer_id); LOOP EXIT WHEN sales%NOTFOUND; FETCH sales BULK COLLECT INTO sales_ids LIMIT max_rows;

    FORALL i IN 1..sales_ids.COUNT INSERT INTO sales_hist (customer_id, detail_id, process_date) VALUES (v_customer_id, sales_ids(i), sysdate); END LOOP; CLOSE sales; END; /

    pipelined stage , nested , SQL. SELECT * FROM plsql_function; PIPELINED TABLE FUNCTION. ETL . , . . ( ETL ) . PIPELINED TABLE FUNCTIONS.

    pipelined stage , nested , SQL. SELECT * FROM plsql_function; PIPELINED TABLE FUNCTION. ETL . , . . ( ETL ) . PIPELINED TABLE FUNCTIONS.

    , ETL . 20 - , .

    CREATE OR REPLACE PACKAGE BODY direct_mkt AS TYPE zip_array IS TABLE OF VARCHAR2(30) INDEX BY BINARY_INTEGER; zip_deref zip_array; PROCEDURE do_insert(f_name VARCHAR2, l_name VARCHAR2, zip NUMBER) IS BEGIN INSERT INTO user_data (f_nm, l_nm, city_nm) VALUES (f_name, l_name, zip_deref(zip)); COMMIT; END; -- Package initialization section. BEGIN FOR rec IN (SELECT zip_code, city FROM dma) LOOP zip_deref(rec.zip_code) := rec.city; END LOOP; END;

    Caching Example:CREATE OR REPLACE PACKAGE BODY email_push AS TYPE email_array IS TABLE OF VARCHAR2(30) INDEX BY VARCHAR2(30); TYPE usermail_array IS TABLE OF email_array INDEX BY BINARY_INTEGER; user_emails usermail_array; FUNCTION lookup(p_userid NUMBER, p_email VARCHAR2) RETURN VARCHAR2 IS BEGIN RETURN user_emails(p_userid)(p_email); END;BEGIN FOR rec IN (SELECT user_id, email, l_name FROM user_emails) LOOP user_emails(rec.user_id)(rec.email) := rec.l_name; END LOOP;END; NComp, PL/SQL . PL/SQL Techniques: PL/SQL ETL . BULK BIND , , PL/SQL. , PIPELENED TABLES , . , features utilities , Oracle Database, ETL, ETL .Otnosno INSERT /*+ APPEND*/

    Be sure you have disabled all reference constraints before executingthe direct load insert. If you do not, the append hint will be ignored, nowarnings will be issued, and a conventional insert will be used. Plus, theinsert will take a long time if there is a lot of data. Conventional path isused when using the INSERT...with the VALUES clause even if you use theAPPEND hint.

    APPEND: SELECT COMMIT , INSERT-a DIRECT PATH , COMMIT SELECT , .

    , , , ETL process. , . PIPELINED TABLE , , 1 , DIRECT PATH INSERT, REDO , NESTED TABLEs. , Oracle Database , , utilities features, ETL . ETL. , PL/SQL DW.This book should satisfy those who want a different perspective than the official Oracle documentation. It will cover all important aspects of a data warehouse while giving the necessary examples to make the reading a lively experience. - Tim Donar, Author and Systems Architect for Enterprise Data Warehouses Tuning a data warehouse database focuses on large transactions, mostly requiring what is known as throughput. Throughput is the passing of large amounts of information through a server, network and Internet environment, backwards and forwards, constantly! The ultimate objective of a data warehouse is the production of meaningful and useful reporting, from historical and archived data. The trick is to make the reports print within an acceptable time frame. A data model contains tables and relationships between tables. Tuning a data model involves Normalization and Denormalization. Different approaches are required depending on the application, such as OLTP or a Data Warehouse. Inappropriate database design can make SQL code impossible to tune. Poor data modeling can have a most profound effect on database performance since all SQL code is constructed from the data model. * Takes users beyond basics to critical issues in running most efficient data warehouse applications * Illustrates how to keep data going in and out in the most productive way possible * Focus is placed on Data Warehouse performance tuning