Oozie at Yahoo!, June 3rd 2014
by Ryota Egashira and Purshotam Shah (Yahoo)
Oozie at Yahoo
Purshotam Shah, Ryota Egashira
Oozie Meetup 06/03 2014
Table of Contents
▪Scale at Yahoo!
▪Scale and Performance
▪Features - Usability
▪High Availability and Load Balancing
▪Customer Asks
Scale at Yahoo!
▪Busiest cluster
› 1 million+ workflows per month
› 45 - 55K workflows per day
› 40 - 50K coord actions per day
› 800 - 900 coordinators (5m, 15m, 30m, hourly, daily and weekly)
› 30 - 40 bundles
▪Most complex bundle - 230 coordinators
▪Most complex workflow - 85 forks
▪Video Transcoding - 100-300 workflows per min
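As a quick sanity check on the figures above, the daily workflow range can be scaled to a month. This is only a back-of-the-envelope sketch: the 30-day month and the assumption that every day falls inside the quoted range are ours, not the slide's.

```python
# Back-of-the-envelope check of the busiest-cluster numbers from the slide.
DAYS_PER_MONTH = 30  # assumption: a 30-day month at a steady daily rate

daily_low, daily_high = 45_000, 55_000  # "45 - 55K workflows per day"

monthly_low = daily_low * DAYS_PER_MONTH
monthly_high = daily_high * DAYS_PER_MONTH

print(f"{monthly_low:,} to {monthly_high:,} workflows per month")
# prints "1,350,000 to 1,650,000 workflows per month"
```

Under those assumptions the daily range is consistent with the "1 million+ workflows per month" figure.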
Scale and Performance
▪ Database
› CLOB to BLOB to compress and store inline
› Remove unnecessary hadoop config stored in protoActionConf
› Select only needed columns instead of loading whole row
› Partition tables by created time (in Oracle)
▪ Other
› Huge improvements to materialization of coordinator actions
› Reduce Launcher overhead
• Merge the many small files created per action into one sequence file
• Launcher libraries shipped only once to HDFS
• Uber mode launcher with Hadoop 2.x
› Synchronously execute commands without queueing to speed up action transition
› Automatically kill abandoned coordinator jobs
› gzip compression for REST API
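Two of the database items above (compressing configuration into a BLOB, and selecting only the needed columns) can be illustrated with a small, self-contained sketch. It is not Oozie's actual schema or code: the table name `wf_jobs_demo`, its columns, and the sample configuration are hypothetical, and SQLite with `zlib` stands in for the production database and compression codec.

```python
import sqlite3
import zlib

# Hypothetical, repetitive Hadoop-style configuration XML standing in for a job conf.
conf_xml = "<configuration>" + "".join(
    f"<property><name>example.key.{i}</name><value>value-{i}</value></property>"
    for i in range(200)
) + "</configuration>"

# Compress the text before storing it, so it can live inline as a small BLOB.
compressed = zlib.compress(conf_xml.encode("utf-8"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wf_jobs_demo (id TEXT PRIMARY KEY, status TEXT, conf BLOB)"
)
conn.execute(
    "INSERT INTO wf_jobs_demo VALUES (?, ?, ?)",
    ("wf-0001", "RUNNING", compressed),
)

# Status check: select only the small column instead of loading the whole row.
(status,) = conn.execute(
    "SELECT status FROM wf_jobs_demo WHERE id = ?", ("wf-0001",)
).fetchone()

# Load and decompress the configuration only when it is actually needed.
(blob,) = conn.execute(
    "SELECT conf FROM wf_jobs_demo WHERE id = ?", ("wf-0001",)
).fetchone()
restored = zlib.decompress(blob).decode("utf-8")

print(status, len(conf_xml), len(compressed))
```

The repetitive XML compresses to a fraction of its original size, and the status query never touches the configuration column, which is the general idea behind both bullets.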
Features - Usability
▪UI improvements
› Active Jobs, Custom Global Filters, Child Jobs for Pig/Hive actions
▪Faster log streaming with more filters
▪Updating coordinator definition on the fly
▪Rerun workflows without having to specify all properties again
▪Mark coordinator and actions as ignored
▪Sharelib Enhancements
› Update on the fly without failing jobs
› Command to list the available sharelibs
› Specify directories using a metafile instead of a single sharelib directory
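The metafile idea can be sketched as follows. The slide does not spell out the file format, so this assumes a properties-style file with lines of the form `oozie.<name>=<comma-separated paths>`. The function name `parse_sharelib_mapping` and all paths are hypothetical and for illustration only.

```python
def parse_sharelib_mapping(text):
    """Parse properties-style lines into {sharelib name: [paths]}.

    Assumed line format (not confirmed by the slides):
        oozie.<name>=<path>[,<path>...]
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        name = key.strip()
        if name.startswith("oozie."):
            name = name[len("oozie."):]
        mapping[name] = [p.strip() for p in value.split(",") if p.strip()]
    return mapping


# Hypothetical metafile contents.
example_metafile = """\
# one entry per sharelib; each entry may list several directories or jars
oozie.pig=hdfs:///share/lib/pig/lib/
oozie.hive=hdfs:///share/lib/hive/lib/,hdfs:///share/lib/hive/extra.jar
"""

mapping = parse_sharelib_mapping(example_metafile)
for name, paths in mapping.items():
    print(name, paths)
```

The point of the design is that each sharelib name can map to several locations, rather than everything living under one fixed directory.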
High Availability and Load Balancing
▪HCat integration
▪SLA
▪Sharelib
▪Server-server authentication
▪Distributed sequence
Customer Asks
▪Coordinator dependency management
› Ability to view dependencies and rerun part of a pipeline
▪Better error handling and automatic retries
▪Ability to suspend/turn off SLA alerting
▪One-click launcher log viewing
▪Zero downtime
Yahoo Oozie Team
Rohini Palaniswamy, Mona Chitnis, Michelle Chiang, Purshotam Shah, Ryota Egashira, Olga L. Natkovich