Vinod Kumar Vavilapalli
Apache Hadoop PMC, Co-founder of YARN project
Hortonworks Inc
A Multi-Colored YARN
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
About.html
Apache Hadoop PMC, ASF Member
9 years of only Hadoop
– Finally the job adverts asking for “10 years of Hadoop experience” have validity
“Rewrote” the Hadoop processing side
– Became Apache Hadoop YARN
With me today
– Billie Rinaldi: VP Apache Accumulo, Apache Slider PMC, ASF Member
– Jayush Luniya: Apache Ambari PMC
– Vadim Vaks: Kickass field guy (Sr. Solutions Architect)
Hadoop Compute Platform Today
Layers that enable applications and higher order frameworks
It’s all about data!
Still a single colored yarn
Apache Hadoop YARN is pretty good at jobs, queries, short-running apps
– We will continue doing this
Admins and admin tools (Ambari) take care of statically provisioned services
Hadoop Compute Platform Today
[Diagram: Platform Services (Storage, Resource Management, Security, Management, Monitoring, Alerts, Governance) with processing frameworks MR, Tez, Spark, …]
Run everything in a single secure, multi-tenant, elastic Hadoop YARN cluster
– An ongoing journey
Adding new ‘stuff’ to this stack is an involved effort
Evolution of user focus
A need for reuse, composition and to keep building ‘upwards’
Applications & services & more complex combinations: assemblies
Examples: IoT applications, Apache Metron
Emerging needs of the platform
Examples: IoT applications, Apache Metron
• Simplified deployment of an assembly
– Ready-to-go packages
– Discovery
– Resource/capacity planning
• Management / monitoring / metrics of assemblies!
– “Start / stop” my business app end-to-end
– “Tell me what’s happening with my business application”
– “I don’t care whether HBase RegionServer is down or not, is my assembly healthy?”
• Scale up/down the entire app!
– “I got more input coming in, I don’t care how you scale individual pieces, but do scale the entire machinery”
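The two assembly-level asks above ("is my assembly healthy?" and "scale the entire machinery") can be sketched as a small data model. This is only an illustration: the `Component` and `Assembly` classes, the `min_healthy` threshold, and the uniform scaling factor are all hypothetical and are not part of YARN, Slider, or Ambari.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One service inside an assembly (hypothetical model, not a YARN type)."""
    name: str
    desired: int      # instances we want running
    running: int      # instances currently running
    min_healthy: int  # fewest running instances that still count as healthy

    def is_healthy(self) -> bool:
        return self.running >= self.min_healthy


@dataclass
class Assembly:
    """A named group of components that is managed as one unit."""
    name: str
    components: list

    def is_healthy(self) -> bool:
        # Healthy only if every component clears its own threshold, so a single
        # lost HBase RegionServer does not by itself flip the answer.
        return all(c.is_healthy() for c in self.components)

    def scale(self, factor: float) -> None:
        # Scale every component by the same factor, rounding up, never below 1.
        for c in self.components:
            c.desired = max(1, math.ceil(c.desired * factor))


iot = Assembly("iot", [
    Component("kafka", desired=3, running=3, min_healthy=2),
    Component("storm", desired=4, running=4, min_healthy=2),
    Component("hbase", desired=5, running=4, min_healthy=3),
])
```

Here `iot.is_healthy()` stays true even though one HBase instance is missing, and `iot.scale(1.5)` raises the desired count of every component in one call. A real platform would likely scale components by different ratios; the uniform factor just keeps the sketch short.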
Why on YARN?
Manual plumbing is very tiresome, not repeatable
Assemblies: similar to apps & services, but N× harder (because there are N services to grapple with)
Why not static allocations?
– Machines die
– Jobs (MapReduce, Spark) are tolerant of faults, but static services aren’t!
– Upfront capacity planning
– Cannot react to hardware or utilization changes without manual intervention
– Elasticity is a manual operation
This is fundamentally the same resource-management problem that YARN is built to address!
Why on YARN? (Contd.)
The Apache Hadoop ecosystem knows data services the best – YARN is data-first!
Big Data use-cases don’t stop at Hadoop services and apps
– Hive for all data, summary in a traditional on-demand DB for driving analysts
– Extracting results from HDP and hosting report servers, interactive UIs like Apache Zeppelin
Users don’t care about this separation
– Big Data is already a huge cluster on one side
– Asking for another infrastructure & needing separate management of this other stuff is burdensome
– Unified solution >> Silos
Hadoop Compute Platform Next
A colorful, multi-threaded yarn
For use-cases of various colors
Today’s applications, better
Simplified long-running applications
Bring your app easily
https://www.flickr.com/photos/happyskrappy/15699919424
Packaging
Containers
– Lightweight mechanism for packaging and resource isolation
– Popularized and made accessible by Docker
– Can replace VMs in some cases
– Or more accurately, VMs got used in places where they didn’t need to be
Native integration ++ in YARN
– Support for “Container Runtimes” in LCE (LinuxContainerExecutor): YARN-3611
– Process runtime
– Docker runtime
APIs
Applications need simple APIs
Need to be deployable “easily”
Simple REST API layer fronting YARN
– https://issues.apache.org/jira/browse/YARN-4793
– [Umbrella] Simplified API layer for services and beyond
Spawn services & manage them
Platform++
YARN itself is evolving to support services and complex apps
– https://issues.apache.org/jira/browse/YARN-4692
– [Umbrella] Simplified and first-class support for services in YARN
Scheduling
– Application priorities: YARN-1963
– Affinity / anti-affinity: YARN-1042
– Services as first-class citizens: preemption, reservations, etc.
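To make the anti-affinity item concrete, here is a minimal sketch of a greedy placer that refuses to put two containers of the same component on one node. It is a toy under stated assumptions, not YARN's scheduler and not the design tracked in YARN-1042: the `place` function, its inputs, and the "most free memory first" tie-break are all hypothetical.

```python
def place(requests, free_mb):
    """Greedy placement with per-component anti-affinity (illustrative only).

    requests: list of (component, memory_mb) tuples, one per container.
    free_mb:  dict of node name -> free memory in MB (copied, not mutated).
    Returns a list of (component, node) pairs in request order. Raises
    RuntimeError if a container cannot be placed without co-locating two
    containers of the same component.
    """
    free = dict(free_mb)
    used = {}  # component -> set of nodes already hosting that component
    placement = []
    for component, mem in requests:
        taken = used.setdefault(component, set())
        candidates = [n for n in free if n not in taken and free[n] >= mem]
        if not candidates:
            raise RuntimeError(f"no node satisfies anti-affinity for {component}")
        # Prefer the node with the most free memory; break ties by node name
        # so the result is deterministic.
        node = min(candidates, key=lambda n: (-free[n], n))
        free[node] -= mem
        taken.add(node)
        placement.append((component, node))
    return placement


nodes = {"n1": 4096, "n2": 4096, "n3": 2048}
plan = place([("hbase", 2048), ("hbase", 2048), ("web", 1024)], nodes)
```

The two `hbase` containers end up on different nodes, while the `web` container is free to share a node with either of them. Asking for more `hbase` containers than there are eligible nodes raises an error instead of silently violating the constraint.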
Platform++ (Contd.)
Application & services upgrades
– “Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
– YARN-4726
Simplified discovery of services via DNS mechanisms: YARN-4757
YARN Federation – to infinity and beyond: YARN-2915
Easier container sizing models: Resource profiles: YARN-3926
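The idea behind resource profiles is that an application asks for a named size instead of spelling out memory and cores each time. A rough sketch of that lookup follows; the profile names, the sizes, and the `resolve` helper are made up for illustration and do not reflect the configuration format or API proposed in YARN-3926.

```python
from typing import NamedTuple


class Resource(NamedTuple):
    memory_mb: int
    vcores: int


# Hypothetical profile table; names and sizes are invented for this example.
PROFILES = {
    "small": Resource(1024, 1),
    "medium": Resource(4096, 2),
    "large": Resource(16384, 8),
}


def resolve(profile, overrides=None):
    """Turn a profile name into a concrete Resource, applying optional overrides."""
    if profile not in PROFILES:
        raise KeyError(f"unknown profile: {profile}")
    return PROFILES[profile]._replace(**(overrides or {}))
```

For example, `resolve("medium")` yields 4096 MB and 2 vcores, and `resolve("small", {"vcores": 2})` keeps the small memory size while bumping the core count.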
Framework++
Platform is only as good as the tools
A native YARN framework
– https://issues.apache.org/jira/browse/YARN-4692
– [Umbrella] Native YARN framework layer for services and beyond
Slider supporting a DAG of apps
– https://issues.apache.org/jira/browse/SLIDER-875
User facing and operational experience
Modern YARN web UI: YARN-3368
Enhanced shell interfaces
Metrics: Timeline Service V2 – YARN-2928
Application & services monitoring, integration with other systems
First-class support for YARN-hosted services in Ambari
– https://issues.apache.org/jira/browse/AMBARI-17353
Use-cases.. Assemble!
[Diagram: Platform Services (Storage, Resource Management, Security, Service Discovery, Management, Monitoring, Alerts, Governance); processing frameworks MR, Tez, Spark, …; Holiday Assembly (HBase, WebServer); IoT Assembly (Kafka, Storm, HBase, Solr)]