The Hadoop Guarantee: Keeping Analytics Running on Time
TRANSCRIPT
Twitter Tag: #briefr The Briefing Room
Mission
Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today’s innovative technologies
Give vendors a chance to explain their product to savvy analysts
Allow audience members to pose serious questions... and get answers!
Topics
September: HADOOP 2.0
October: DATA MANAGEMENT
November: ANALYTICS
The Holy Grail of Hadoop
• Mixed workloads!
• Deep visibility into the cluster
• Ability to define & meet SLAs
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
@robinbloor
Pepperdata
Pepperdata offers a platform for managing and optimizing Hadoop clusters
The platform monitors and balances resources across multiple workloads and/or clusters in real time
Pepperdata provides an interactive dashboard with real-time visualizations and reports on hardware usage
Guest: Sean Suchter
Sean Suchter, Cofounder and CEO of Pepperdata. Sean was the founding GM of Microsoft’s Silicon Valley Search Technology Center, where he led the integration of Facebook and Twitter content into Bing search. Prior to Microsoft, Sean managed the Yahoo Search Technology team, the first production user of Hadoop. Sean joined Yahoo through the acquisition of Inktomi, and holds a B.S. in Engineering and Applied Science from Caltech.
©2015 Pepperdata
Pepperdata: Bringing Predictability & Reliability to Hadoop
Sean Suchter, CEO & Cofounder
September 15, 2015
Market Reality
Many organizations state that big data is a high priority for them, but many will fail to see a competitive advantage due to issues such as:
• Unreliability of Hadoop
• Growing skills gap
• Multitude of vendors & tools in the ecosystem
• Unpredictable jobs: bottlenecks, missed SLAs
• Poor visibility: lengthy troubleshooting, “flying blind”
• Inefficient cluster allocation: overbuilding, costs
Mature deployments have increasing requirements
Organizations today demand:
• Multi-tenancy (multiple workloads, multiple tenants)
• Internal deployments of Hadoop-as-a-Service
• Guaranteed SLAs
You need more than YARN
When jobs are scheduled
• YARN: Schedule jobs; pre-allocate memory, CPU
Once jobs are running
• Pepperdata: Allocate resources dynamically (maximize utilization)
• Pepperdata: Control hardware usage (priority jobs complete on time)
• Pepperdata: Prevent rogue jobs from harming high-priority jobs
During & after job runtime
• YARN: Node-level metrics
• Pepperdata: Node-level metrics plus real-time metrics by queue, user, job, task
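The talk does not describe how Pepperdata decides, at run time, which jobs get hardware and which get throttled. To illustrate the general idea behind "allocate resources dynamically" and "prevent rogue jobs from harming high-priority jobs", here is a minimal sketch under stated assumptions: a hypothetical `Job` record with an integer priority and a demand in abstract resource units (for example vcores on one node), and a hypothetical `rebalance` function that grants capacity in strict priority order. This is a plain greedy strict-priority allocator, not Pepperdata's algorithm.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; not a Pepperdata or YARN data structure.
    name: str
    priority: int  # higher number = more important (assumed convention)
    demand: int    # resource units the job wants right now (e.g. vcores)
    cap: int = 0   # ceiling assigned by the controller on the last pass


def rebalance(jobs, capacity):
    """Hypothetical strict-priority allocator for one node.

    Walk jobs from highest to lowest priority; each job is capped at
    min(demand, remaining capacity). A low-priority job with a huge
    demand therefore only receives what higher-priority jobs leave over.
    """
    remaining = capacity
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        job.cap = min(job.demand, remaining)
        remaining -= job.cap
    return {job.name: job.cap for job in jobs}


if __name__ == "__main__":
    jobs = [
        Job("etl-batch", priority=1, demand=12),   # greedy, low priority
        Job("sla-report", priority=10, demand=8),  # must finish on time
        Job("adhoc-query", priority=5, demand=6),
    ]
    print(rebalance(jobs, capacity=16))
```

On a 16-unit node, `sla-report` gets its full 8 units, `adhoc-query` its 6, and the greedy `etl-batch` job is held to the 2 that remain. Strict priority can starve low-priority work entirely, so a production controller would have to re-run a pass like this continuously as demands change and decide whether every job keeps some minimum share.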
Time and sweat won’t solve the problem
No human can make the thousands of decisions a second necessary for dynamic, real-time hardware resource management.
Pepperdata lets enterprises rely on Hadoop
Companies can now:
• Provide mission-critical applications in multi-tenant environments
• Monitor and control hardware usage dynamically and in real time
• Enable SLAs, increase throughput, and improve visibility
The Biological Analog
• Our human control system works at different speeds:
  • Internal systems – Enteric nervous system
  • Instant external reflex – Spinal cord
  • Fast external response – Motor systems
  • Considered response – The brain
• Swift external response is predictive analytics & triggers
• Considered response is analytics
Hadoop Evolution
• HDFS & MapReduce: serial, single batch
• HDFS + YARN + MapReduce: serial, multiple batch
• HDFS + YARN + Spark: serial, multiple microbatch
The Spark Dynamic
• Spark has become the de facto vehicle for many distinct Hadoop projects, such as analytics and data integration
• It can do “microbatch streaming,” but it is not ideal for very low-latency applications
• It has in-memory capability (up to 100x faster than MapReduce in memory, 10x on disk)
• Speed of development
• Spark SQL
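"Microbatch streaming" means chopping a continuous stream into small, fixed time windows and processing each window as a short batch job. The sketch below illustrates only that grouping step, in plain Python rather than Spark's API; the `microbatch` function and the `(timestamp, value)` event shape are hypothetical. It also shows why microbatching sets a latency floor: an event cannot be processed until its window closes, so it may wait up to one full interval.

```python
from collections import defaultdict


def microbatch(events, interval):
    """Hypothetical illustration of microbatching (not Spark's API).

    Groups (timestamp, value) events into consecutive windows of width
    `interval`. Each returned (window_start, values) pair would be handed
    to a batch job once the window [window_start, window_start + interval)
    has closed, which is why per-event latency can approach `interval`.
    """
    batches = defaultdict(list)
    for ts, value in events:
        window_start = (ts // interval) * interval  # align to window boundary
        batches[window_start].append(value)
    # Emit windows in time order, as a stream processor would.
    return sorted(batches.items())


if __name__ == "__main__":
    events = [(0, "a"), (1, "b"), (2, "c"), (5, "d")]
    print(microbatch(events, interval=2))
```

With a 2-second interval, the events above fall into three batches starting at t=0, t=2, and t=4. Spark Streaming exposes the same trade-off as a configurable batch interval: shorter intervals cut latency but add per-batch scheduling overhead.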
So What’s Missing?
• Resource allocation
• Resource management by “job”
• Dynamic prioritization of workloads
• Real-time monitoring
• Service management: performance and throughput feedback and controls
• Capacity planning
Operational Control
Hadoop has the potential to be the “scale-out OS” for data as soon as it can manage its resources.
• How easy is Pepperdata to implement? What’s the process?
• Roughly, what is the most complex environment, in terms of workloads, in which Pepperdata is deployed? Please describe.
• What is the Pepperdata proposition with respect to ROI?
• Are there any competing products?
• Which specific companies/products do you complement?
• Is there any Hadoop distribution that you prefer? If so, why?
Upcoming Topics
www.insideanalysis.com
September: HADOOP 2.0
October: DATA MANAGEMENT
November: ANALYTICS