Unraveling Hadoop Meltdown Mysteries
DESCRIPTION
As powerful and flexible as Hadoop is, jobs still sometimes fail or thrash unpredictably. Pepperdata co-founder and CEO Sean Suchter, one of the first commercial users of Hadoop in the early days at Yahoo, will give real-world examples of Hadoop meltdowns, complete with metrics, and explain what we can learn from them. He'll also show how to automatically increase Hadoop cluster throughput through fine-grained visibility into each job's hardware usage.
TRANSCRIPT
Meltdown Mysteries
Sean Suchter
Disks are thrashing!
Solution
• Make job author aware of surprising behavior.
• Modify job code & settings to be nicer to disks.
Nodes are dying!
Initial diagnosis…
• Nodes abruptly started swapping and becoming non-responsive. (Required physical power cycling)
• Job submitters report “I didn’t change anything”
• Question: What’s doing this to the cluster?
Cause & solution
• While the job didn’t change, its input data did.
• Stop that user’s jobs immediately.
• Better use of capacity scheduler virtual memory controls.
• Use Pepperdata protection to limit physical memory as well.
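The memory controls listed above boil down to comparing each task's footprint against a limit and acting before the node starts swapping. Below is a minimal Python sketch of that check. It is not Hadoop's or Pepperdata's implementation: the `TaskUsage` record and both functions are hypothetical, and the 2.1 ratio is borrowed from the default of YARN's `yarn.nodemanager.vmem-pmem-ratio` purely for illustration (the slides do not say which settings were used).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskUsage:
    """Hypothetical per-task memory sample (all sizes in MB)."""
    task_id: str
    requested_mb: int  # memory the task asked the scheduler for
    physical_mb: int   # resident memory actually in use
    virtual_mb: int    # virtual address space in use


def over_limit(task: TaskUsage, vmem_ratio: float = 2.1) -> list[str]:
    """Return the limits a task exceeds: 'physical', 'virtual', both, or none."""
    exceeded = []
    # Physical check: resident memory above what the task requested.
    if task.physical_mb > task.requested_mb:
        exceeded.append("physical")
    # Virtual check: address space above requested memory times the ratio.
    if task.virtual_mb > task.requested_mb * vmem_ratio:
        exceeded.append("virtual")
    return exceeded


def tasks_to_stop(tasks: list[TaskUsage], vmem_ratio: float = 2.1) -> dict[str, list[str]]:
    """Map each offending task id to the limits it exceeded."""
    offenders = {}
    for task in tasks:
        exceeded = over_limit(task, vmem_ratio)
        if exceeded:
            offenders[task.task_id] = exceeded
    return offenders
```

In this sketch a task that requested 1024 MB but holds 1500 MB resident with 2000 MB of virtual memory passes the virtual check and fails only the physical one, which is one reason to enforce both limits rather than the virtual limit alone.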
Take-away
• You see problems at the node level.
• You see the root causes at the task level.
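One way to picture this take-away: a node-level total shows which machine is in trouble, while a per-task breakdown on that node points at the job responsible. The Python sketch below assumes hypothetical `(node, job, task, rss_mb)` samples. The names and numbers are made up for illustration and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical samples: (node, job, task, resident memory in MB).
SAMPLES = [
    ("node1", "jobA", "a_m_0", 900),
    ("node1", "jobB", "b_m_0", 14000),
    ("node1", "jobB", "b_m_1", 13500),
    ("node2", "jobA", "a_m_1", 950),
    ("node2", "jobC", "c_r_0", 1200),
]


def overloaded_nodes(samples, node_capacity_mb):
    """Node-level view: total resident memory per node, kept if above capacity."""
    totals = defaultdict(int)
    for node, _job, _task, rss_mb in samples:
        totals[node] += rss_mb
    return {node: total for node, total in totals.items() if total > node_capacity_mb}


def jobs_by_memory(samples, node):
    """Task-level view: resident memory per job on one node, largest first."""
    per_job = defaultdict(int)
    for sample_node, job, _task, rss_mb in samples:
        if sample_node == node:
            per_job[job] += rss_mb
    return sorted(per_job.items(), key=lambda item: item[1], reverse=True)
```

Calling `overloaded_nodes(SAMPLES, 24000)` flags only `node1`, and `jobs_by_memory(SAMPLES, "node1")` then ranks `jobB` first, which is the step from symptom to root cause.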
Pepperdata meetup tomorrow!
• War Stories from the Hadoop Trenches
• Allen Wittenauer (Apache Hadoop committer and former LinkedIn)
• Eric Baldeschwieler (former Hortonworks CEO / CTO)
• Todd Nemet (Looker; former Altiscale, ClearStory Data, Cloudera)
• 6pm Wed 6/25
• Firehouse Brewery, 111 S Murphy, Sunnyvale
• http://www.meetup.com/pepperdata/