ooziehugoct2013-131112012637-phpapp01

Upload: jean-tan

Post on 03-Jun-2018


TRANSCRIPT

  • 8/12/2019 ooziehugoct2013-131112012637-phpapp01

    1/39

    Oozie: Now and Beyond

    Presented by Mona Chitnis, Hadoop User Group, Yahoo Sunnyvale, October 16, 2013


    Team In Action

    2 Yahoo Confidential & Proprietary

    - Alejandro Abdelnur
    - Mohammad Islam
    - Rohini Palaniswamy
    - Robert Kanter
    - Virag Kothari
    - Mona Chitnis
    - Ryota Egashira
    - Michelle Chiang
    - Bowen Zhang


    OVERVIEW


    Why Oozie?

    The Problem

    - Doing something on the grid often required multiple steps
      - MapReduce job
      - Pig job
      - Streaming job
      - HDFS operation (mkdir, chmod, etc.)
    - Multiple ad-hoc solutions existed
      - custom job control
      - shell scripts
      - cron
    - The cost of building and running apps was high
      - development and applications engineering
      - support, operations, and hardware

    The Need

    - Workflow scheduler with better support for grid jobs (native integration with Hadoop)
      - orchestrate dependencies between jobs
      - execute at a specific time or on data availability
      - retry jobs in the event of failures (reliable)
    - Common framework for communication and execution of the production process
      - sync (clocked dataset) awareness
      - async (unspecified freq) data awareness
    - Horizontally scalable and extensible system
      - Open-source
      - Workflows to couple resources instead of having a monolithic code base

    Oozie: a server-based workflow scheduling system to manage Hadoop jobs

    Overview


    Oozie: A Workflow Engine

    - Oozie executes workflows defined as a DAG of jobs
    - Job types include MapReduce, Pig, Hive, shell script, custom Java code, etc.
    - Introduced in Oozie 1.x

    [Figure: example workflow DAG with the nodes start, M/R job, decision (branches labeled MORE and ENOUGH), fork, Pig job, M/R job, join, Java, FS job, end, and kill; actions transition on OK or ERROR.]

    Control-flow nodes (start, kill, end | fork, join, decision)

    Action nodes (map reduce, pig, hive, distcp, java, fs, sub-workflow, shell, ssh, email)
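The control-flow and action nodes listed above can be pictured as a small graph walk. What follows is a minimal Python sketch of that idea, not Oozie's engine: the `run_workflow` helper, the node names, and the stubbed action callables are all hypothetical, and fork, join, and decision nodes are left out to keep it short. It only shows an action transitioning on OK or ERROR until it reaches `end` or `kill`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory workflow definition (this is not Oozie's XML format).
# Each action node maps to (callable, ok_target, error_target).
# The control-flow nodes "end" and "kill" terminate the walk.
Action = Tuple[Callable[[], bool], str, str]


def run_workflow(start: str, actions: Dict[str, Action]) -> Tuple[str, List[str]]:
    """Follow OK/ERROR transitions from `start` until "end" or "kill"."""
    visited: List[str] = []
    node = start
    while node not in ("end", "kill"):
        if node in visited:
            raise ValueError(f"cycle at {node!r}: a workflow must be a DAG")
        visited.append(node)
        run, ok_target, error_target = actions[node]
        # A truthy result stands for OK, a falsy one for ERROR.
        node = ok_target if run() else error_target
    status = "SUCCEEDED" if node == "end" else "KILLED"
    return status, visited


# Three stub actions chained one after another; every error goes to "kill".
workflow = {
    "mr-job": (lambda: True, "pig-job", "kill"),
    "pig-job": (lambda: True, "fs-job", "kill"),
    "fs-job": (lambda: True, "end", "kill"),
}

status, path = run_workflow("mr-job", workflow)
print(status, path)
```

Swapping one stub for `lambda: False` sends the walk to `kill` at that node, which mirrors the ERROR transition in the diagram.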


    Example M/R Action

    [Figure: XML of a map-reduce action, annotated with the following fields]

    - JT and NN
    - Mapper
    - Reducer
    - Queue Name
    - Input Directory
    - Output Directory
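The XML shown on this slide did not survive in the transcript, so the following is only a hedged Python sketch of what an action carrying the annotated fields might look like. The element names follow the Oozie workflow schema and the `mapred.*` keys are the classic Hadoop property names; the `build_mr_action` helper, the node name, the mapper and reducer classes, and the paths are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def build_mr_action(name, job_tracker, name_node, props, ok_to="end", error_to="kill"):
    """Return an <action> element that wraps a <map-reduce> action."""
    action = ET.Element("action", {"name": name})
    mr = ET.SubElement(action, "map-reduce")
    ET.SubElement(mr, "job-tracker").text = job_tracker  # JT
    ET.SubElement(mr, "name-node").text = name_node      # NN
    conf = ET.SubElement(mr, "configuration")
    for key, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    ET.SubElement(action, "ok", {"to": ok_to})
    ET.SubElement(action, "error", {"to": error_to})
    return action


action = build_mr_action(
    name="mr-node",                                    # hypothetical node name
    job_tracker="${jobTracker}",
    name_node="${nameNode}",
    props={
        "mapred.mapper.class": "org.example.MyMapper",    # hypothetical class
        "mapred.reducer.class": "org.example.MyReducer",  # hypothetical class
        "mapred.job.queue.name": "${queueName}",
        "mapred.input.dir": "/user/${wf:user()}/input",   # hypothetical path
        "mapred.output.dir": "/user/${wf:user()}/output", # hypothetical path
    },
)

# Pretty-print the element so the structure is easy to compare with the list above.
print(minidom.parseString(ET.tostring(action, encoding="unicode")).toprettyxml(indent="  "))
```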


    Workflow State Transitions

    Source: Chicago HUG, Dec 2012


    Oozie (Coordinator): A Scheduler

    - Oozie executes workflows based on
      - time dependency (frequency)
      - data dependency
    - Introduced in 2.x

    [Figure: the Oozie Client calls the Oozie Server through the WS API; inside the server, the Oozie Coordinator checks data availability against HDFS/HCat and drives the Oozie Workflow.]


    Oozie (Bundle): A Pipeline Framework

    - Users can define and execute a bundle of coordinator apps
      - large-scale data processing (inter-related coordinators)
      - operability and manageability of pipelines
    - Users can start/stop/suspend/resume/rerun at the bundle level
    - Introduced in 3.x; bundles are optional

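    [Figure: the same client/server diagram as for the coordinator (Oozie Client, WS API, Oozie Server, Oozie Coordinator, Oozie Workflow, Check Data Availability against HDFS/HCat), with a Bundle layer added.]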


    Layers of Abstraction in Oozie

    [Figure: nested layers of abstraction, from outermost to innermost]

    1. Bundle
    2. Coordinator
    3. Workflow


    Architectural Overview

    [Figure: architecture of the Oozie server, with the components below]

    - Oozie (Java web app), with Security and Instrumentation
    - Web Services (JSON/REST API): WS API and WS Callback
    - DAG Engine
    - Commands, executed asynchronously via the Command Queue
      - submit, start, rerun, suspend, resume, kill, info
      - start action, end action, check action
      - callback, signal, job notification
    - Executor Thread Pool
    - Recovery Daemon Thread
    - Action Executors: M/R, Pig, fs, sub-wf (pluggable, to support additional action types)
    - WF store, WF lib
    - Oracle DB


    Oozie Security, Multi-tenancy and Scalability

    [Figure: request flow between the End User, the Oozie Server, and the Hadoop Cluster (YARN RM, Launcher Mapper, actual M/R job)]

    1. Auth.: End User (Kerberos, Y! specific)
    2. Create Launcher Job (super-user)
    3. Execute User Job (doAs)
    4. Response
    5. Async Callback


    USE CASES


    Use Case 1: Time Triggers

    Execute your workflow every 15 minutes

    00:15 00:30 00:45 01:00
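The time trigger above amounts to stepping a clock from a start time in fixed increments. Here is a small Python sketch of that arithmetic; the `nominal_times` helper and the chosen dates are hypothetical, and the sketch assumes the start is inclusive and the end exclusive. It is not the coordinator's own implementation.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency_minutes):
    """Yield one nominal time per period, from start (inclusive) to end (exclusive)."""
    step = timedelta(minutes=frequency_minutes)
    current = start
    while current < end:
        yield current
        current += step


# A 15-minute frequency over one hour gives the four ticks shown on the slide.
times = list(
    nominal_times(datetime(2013, 10, 16, 0, 15), datetime(2013, 10, 16, 1, 15), 15)
)
print([t.strftime("%H:%M") for t in times])  # ['00:15', '00:30', '00:45', '01:00']
```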

    Use Cases and Common Patterns


    Use Case 2: Time and Data Triggers

    Materialize your workflow every hour, but only run it when the input data is ready (the data is loaded to the grid every hour)

    [Figure: hourly timeline (01:00, 02:00, 03:00, 04:00); at each tick the coordinator asks Hadoop "Input Data Exists?"]


    Use Case 2: Time and Data Triggers

    Dataset Definition
        uri-template: hdfs://bar:9000/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}

    Input Events Definition (resolved with the time at which the coordinator action is materialized, i.e. created)
        instance: ${current(0)}

    Action Definition
        workflow app-path: hdfs://bar:9000/usr/abc/logsprocessor-wf
        property: inputData = ${dataIn(inputLogs)}
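To make the dataset definition concrete, the sketch below resolves the slide's URI template for one nominal time and then checks that path against a set of existing paths. It is a simplified Python illustration under stated assumptions: `resolve_uri`, `input_ready`, and the `existing_paths` set are hypothetical stand-ins, and a real coordinator would ask HDFS or HCatalog instead of a Python set.

```python
from datetime import datetime

TEMPLATE = "hdfs://bar:9000/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}"


def resolve_uri(template, nominal_time):
    """Substitute the date placeholders with zero-padded fields of nominal_time."""
    fields = {
        "${YEAR}": f"{nominal_time:%Y}",
        "${MONTH}": f"{nominal_time:%m}",
        "${DAY}": f"{nominal_time:%d}",
        "${HOUR}": f"{nominal_time:%H}",
    }
    for placeholder, value in fields.items():
        template = template.replace(placeholder, value)
    return template


def input_ready(nominal_time, existing_paths):
    """Return True when the instance for nominal_time (like current(0)) exists."""
    return resolve_uri(TEMPLATE, nominal_time) in existing_paths


# Hypothetical listing: only the 01:00 instance has been loaded so far.
existing_paths = {"hdfs://bar:9000/app/logs/2013/03/12/01"}

print(resolve_uri(TEMPLATE, datetime(2013, 3, 12, 1)))
print(input_ready(datetime(2013, 3, 12, 1), existing_paths))
print(input_ready(datetime(2013, 3, 12, 2), existing_paths))
```

The action for 02:00 stays waiting until its directory appears, which is the "time and data trigger" behavior described on the slide.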


    Use Case 3: Rolling Window

    Access 15-minute datasets and roll them up into hourly datasets

    00:15 00:30 00:45 01:00 -> 01:00
    01:15 01:30 01:45 02:00 -> 02:00
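The rolling window can be sketched as "take the four 15-minute instances that end at the hourly nominal time". The Python below is an illustration only; the `rolling_window` helper and the dates are hypothetical, and the slide does not say which coordinator EL functions were used to express this.

```python
from datetime import datetime, timedelta


def rolling_window(nominal_time, dataset_minutes=15, count=4):
    """Return the `count` dataset instances ending at nominal_time, oldest first."""
    step = timedelta(minutes=dataset_minutes)
    return [nominal_time - step * back for back in range(count - 1, -1, -1)]


# Consecutive hourly actions consume disjoint sets of 15-minute instances.
for hour in (1, 2):
    action_time = datetime(2013, 10, 16, hour, 0)
    inputs = [t.strftime("%H:%M") for t in rolling_window(action_time)]
    print(action_time.strftime("%H:%M"), "<-", inputs)
```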


    Use Case 4: Sliding Window

    Access the last 24 hours of data, and roll them up every hour

    01:00 02:00 03:00 ... 24:00 -> 24:00
    02:00 03:00 04:00 ... +1 day 01:00 -> +1 day 01:00
    03:00 04:00 05:00 ... +1 day 02:00 -> +1 day 02:00
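Unlike the rolling window, consecutive sliding windows overlap: each hourly run drops the oldest hour and picks up the newest. A hedged Python sketch of that selection follows; `sliding_window` and the dates are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta


def sliding_window(nominal_time, hours=24):
    """Return the last `hours` hourly instances ending at nominal_time, oldest first."""
    return [nominal_time - timedelta(hours=back) for back in range(hours - 1, -1, -1)]


# "24:00" on the slide is midnight at the start of the next day.
first_run = sliding_window(datetime(2013, 3, 13, 0, 0))
second_run = sliding_window(datetime(2013, 3, 13, 1, 0))

print(first_run[0], "...", first_run[-1])
print(second_run[0], "...", second_run[-1])
print(len(set(first_run) & set(second_run)), "instances shared between the two runs")
```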


    Proven Scale and Multi-tenancy

    - 17 clusters
    - 13,000 jobs/server day
    - 2.8 M jobs/month
    - 16% of all Hadoop jobs
    - 75 products
    - 2,000+ projects
    - 255 monthly users
    - 5.4 M compute hrs/month
    - 770,000 workflows
      - Between 1 and 8 actions
      - Avg. 4 actions/workflow
    - 250 coordinator jobs/day
    - 67% of Oozie jobs kicked off through the coordinator

    Where are We Today


    Mix Of Job Types For Workflows

    [Chart: share of jobs by type, on a 0% to 100% axis. Values 39%, 29%, 28%, 4%; legend, in order: Pig, MapReduce, Java, Other.]

    SAMPLE USE OF JOB TYPES

    - Pig
      - Data processing/filtering
      - Aggregation
    - MapReduce
      - Publishing data (HDFS/HCat)
    - Java
      - Legacy code and logic
    - Others
      - Distcp and shell
      - Data copy/transfer


    FEATURE DEEP-DIVE


    Existing Features (Oozie 3.x)

    - HBase access through Oozie, via credentials
    - HCatalog access through Oozie, via credentials
    - Email action
    - DistCp action (intra- as well as inter-cluster copy)
    - Shell action (run any script, e.g. perl, python, hadoop CLI)
    - Workflow dry-run and Fork-Join validation
    - Bulk monitoring (REST API)
    - Coordinator EL functions for parameterized workflows
    - Job DAG

    What's New in Oozie


    HBase Credentials

    - Add the following in workflow.xml:
      - Add a "credentials" section. The type is "hbase".
      - Specify that the java action uses the credentials.
    - Put hbase-site.xml in the Oozie application path, and use <file> in workflow.xml to put hbase-site.xml in the distributed cache. A copy of hbase-site.xml can be found at gateway:/home/gs/conf/hbase/hbase-site.xml.
    - Put the JARs guava-*.jar, zookeeper-*.jar, hbase-*.jar, and protobuf-java-*.jar in the workflow lib dir.
    - Make sure you are using Oozie XSD version 0.3 or above for the <credentials> tag.
    - Refer to http://twiki.corp.yahoo.com/view/CCDI/UseHbaseCred

    Example workflow.xml (excerpt):
        <workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.3">
        ...
        <file>...hbase-site.xml#hbase-site.xml</file>
        </java>


    Oozie 4.0

    1. HCatalog Integration
    2. Job Notifications
    3. SLA Monitoring


    HCatalog Integration

    - Oozie now supports HCatalog datasets, in addition to HDFS
      - Query the HCat server directly, or
      - Receive partition-created notifications
    - With HDFS datasets, Oozie polls the NameNode to check data availability
      - Delay
      - Single source

    [Figure: Oozie repeatedly asks the NameNode "data exists?" for the HDFS paths /data/click/2013/03/10, /data/click/2013/03/11, /data/click/2013/03/12, ...]


    Latest Oozie 4.0 Features: HCatalog Integration

    - The HCat metastore has info about HDFS datasets, locations, and file formats.
    - Using the HCat loader and storer, a dataset can be consumed uniformly using Pig, Hive, and Map/Reduce in Oozie, using the database, table, partition abstraction.
    - Oozie is notified of partition availability via JMS messages, to trigger workflows immediately.
    - Use the JARs hcatalog-core.jar, webhcat-java-client.jar, hive-common.jar, hive-exec.jar, hive-metastore.jar, hive-serde.jar, and libfb303.jar in the workflow lib.
    - Docs - http://oozie.apache.org/docs/4.0.0/DG_HCatalogIntegration.html

    Pig script (excerpt):
        ... generate foo, bar;
        store C into '...' using org.apache.hcatalog.pig.HCatStorer('...');


    With HCatalog + Notifications: High-level Diagram

    [Figure: components HCatalog, Data Producer, HDFS]

    - Produce data (distcp, pig, M/R..) into HDFS at /data/click/2013/03/12
    - Update metadata in HCatalog (ALTER TABLE click ADD PARTITION(data=2013/03/12) location hdfs://data/click/2013/03/12)


    With HCatalog + Notifications: High-level Diagram

    [Figure: components Oozie, Message Bus (e.g., ActiveMQ), HCatalog, Data Producer, HDFS]

    1. Query/Poll Partition
    2. Register Topic


    With HCatalog + Notifications: High-level Diagram

    [Figure: components Oozie, Message Bus (e.g., ActiveMQ), HCatalog, Data Producer, HDFS]

    1. Query/Poll Partition
    2. Register Topic
    - Produce data (distcp, pig, M/R..) into HDFS at /data/click/2013/03/12
    - Update metadata in HCatalog (ALTER TABLE click ADD PARTITION(data=2013/03/12) location hdfs://data/click/2013/03/12)
    3. Push notification
    4. Notify New Partition
    - Start workflow


    Job Notifications

    - A notification event is sent on each job status change
    - Messages are sent to the configured JMS-compliant message broker
    - Users should write message listeners to listen on select topics (e.g. username)
    - To filter further, apply JMS selectors on messages
      - E.g. user, jobid, app-type, status, msg-type (JOB or SLA)
    - Docs - http://oozie.apache.org/docs/4.0.0/DG_JMSNotifications.html

    Filter desired app-types for notification:
        <property>
            <name>...


    SLA Monitoring

    - Oozie can actively track SLAs on jobs
      - Start-time, End-time, Duration
    - Event Status
      - START_MET, START_MISS
      - END_MET, END_MISS
      - DURATION_MET, DURATION_MISS
    - At any time, the SLA processing stage will reflect:
      - Not_Started
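The event statuses above pair each tracked quantity (start time, end time, duration) with a MET or MISS outcome. The following Python sketch shows one way such a comparison could be made for a finished job; the `sla_events` helper and its parameters are hypothetical, and it deliberately ignores the live case where a miss is raised because an expected time passes before the job has started or ended.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def sla_events(
    expected_start: datetime,
    expected_end: datetime,
    max_duration: timedelta,
    actual_start: Optional[datetime],
    actual_end: Optional[datetime],
) -> List[str]:
    """Compare actual times with expectations and return MET/MISS event names."""
    events: List[str] = []
    if actual_start is not None:
        events.append("START_MET" if actual_start <= expected_start else "START_MISS")
    if actual_end is not None:
        events.append("END_MET" if actual_end <= expected_end else "END_MISS")
    if actual_start is not None and actual_end is not None:
        duration = actual_end - actual_start
        events.append("DURATION_MET" if duration <= max_duration else "DURATION_MISS")
    return events


# A job that started 5 minutes late but still finished early and quickly.
events = sla_events(
    expected_start=datetime(2013, 10, 16, 1, 0),
    expected_end=datetime(2013, 10, 16, 2, 0),
    max_duration=timedelta(minutes=60),
    actual_start=datetime(2013, 10, 16, 1, 5),
    actual_end=datetime(2013, 10, 16, 1, 50),
)
print(events)  # ['START_MISS', 'END_MET', 'DURATION_MET']
```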


    SLA Monitoring Dashboard


    Demo


    Checking Oozie Job


    1. CLI (yoozie_client)

    $ oozie job -oozie http://localhost:11000/oozie -info 14-20090525161321-oozie-joe
    ------------------------------------------------------------------------
    Workflow Name : map-reduce-wf
    App Path      : hdfs://localhost:8020/user/joe/workflows/map-reduce
    Status        : SUCCEEDED
    Run           : 0
    User          : joe
    Group         : users
    Created       : 2009-05-26 05:01
    Started       : 2009-05-26 05:01
    Ended         : 2009-05-26 05:01

    Actions
    ------------------------------------------------------------------------
    Action Name  Type        Status  Transition  External Id            External Status  Error Code  Start             End
    ------------------------------------------------------------------------
    hadoop1      map-reduce  OK      end         job_200904281535_0254  SUCCEEDED        -           2009-05-26 05:01  2009-05-26 05:01
    ------------------------------------------------------------------------


    Checking / Debugging Oozie Jobs


    2. Web-Console

    e.g. http://my-oozie-server:4080/oozie

    Docs - https://cwiki.apache.org/confluence/display/OOZIE/Map+Reduce+Cookbook


    What else is out there?

    Oozie at ASF


    Oozie vs. Other Workflow Systems

    Champion           | Yahoo! (now ASF)                                | LinkedIn             | Spotify
    Apache Affiliation | TLP                                             | License only         | License only
    Language           | Java                                            | Java                 | Python
    Adoption           | High, part of all standard Hadoop distributions | Low                  | Low
    Code Complexity    | High (>100K lines)                              | Medium (< 50K lines) | Low (


    The Next Release

    - Scalability and performance improvements to handle higher loads
    - More 1 and 5 min frequency jobs
    - High Availability with Load Balancing
    - Flexible Cron-Based Scheduling
    - Handling cluster rolling upgrades for Hadoop 2.0

    Roadmap


    Q & A
