Hadoop Performance at LinkedIn

Download Hadoop Performance at LinkedIn

Post on 11-Nov-2014

9.411 views

Category:

Technology

4 download

Embed Size (px)

DESCRIPTION

This is part of a presentation I did at Intel a month or so ago. Some of the content has been removed due to NDA, etc.

TRANSCRIPT

<ul><li> 1. Grid OperationsHadoop Performance at LinkedInAllen WittenauerGrid Computing Architect2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 2. 2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 3. I have never seen a Hadoop cluster that was legitimately CPU bound. -- Milind Bhandarkar -- Milind Bhandarkar -- Milind Bhandarkar2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 4. X5650 - 6 Core @ 2.67 MHz2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 5. X5650 - 6 Core @ 2.67 MHz2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 6. I have only seen one Hadoop cluster that was legitimately CPU bound. -- Milind Bhandarkar -- Milind Bhandarkar -- Milind Bhandarkar2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 7. Why do we have such high CPU usage?2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 8. We do a lot of Graph Theory.2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 9. Ticket to Ride Ticket To Ride is a registered trademark of Days of Wonder 2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 10. Social Graph2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 11. 2nd Degree Connection2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 12. We under-commit our memory.2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 13. Our Hadoop Software Needs... The Plan... Tasks 2 GB of RAM = 1 GB of JVM Heap, .5-1GB for non-heap (Typically) 1 Super Active Threads TaskTracker 1.5 GB of RAM = 1 GB of JVM Heap, .5GB for non-heap 1-4 Super Active Threads DataNode 1.5 GB of RAM = 1 GB of JVM Heap, .5GB for non-heap 1-4 Super Active Threads RAM: 3GB + (task count * 2GB) + OS needs Threads: 8 + (task count) + OS needs2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 14. Our Hadoop Software Needs... The Reality Task Counts Westmere (5650): 6 Cores+HT = 12 Tasks Sandy Bridge (2640): 6 Cores+HT = 14 Tasks Most of our tasks leave at most .5 GB free = combined -&gt; very large buffer &amp; cache2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 15. We dont have as many disks per node.2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 16. Typical Hadoop Node Out in the Wild Most users dont know their actual needs Vendor advice... play it safe! Significantly more memory For the future! Badly written code Significantly more disk Hadoop is IO intensive! Greater task locality! Greater performance...but is it worth the cost...2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 17. What Happens With Fewer Disks? Physical footprint requirements are smaller Linux buffers &amp; caches are more efficient More per disk Fewer to manage Spindle count DOES matter... but the price/perf isnt there for our workflows. From a few years ago &amp; based on store.sun.com prices (so not real)... Nodes/Cores RAM/Bus Disks Time In Minutes HW Cost* 3/24 16/half 8 254.98 $37827 3/24 24/full 8 244.50 $38817 3/24 16/half 4 257.38 $21456 3/24 24/full 4 246.82 $22986 6/48 16/half 4 126.98 $429122012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 18. LinkedIn Node Configuration No RAID controller More cost for negative perf when doing JBOD 6 Drives Still fits in 1U w/SATA drives ~same perf as 8 drives Less metal = cheaper cost2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 19. Rack Level View If we assume we can use 40u in a rack then: More CPUs Just as many HDs More Network Potentially more RAM2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 20. We care about file system tuning.2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 21. LinkedIn Hadoop Disk/File Systems noatime Enabled writeback Enabled Each Disk (except root) Partitions: Swap MapReduce Spill Space HDFS Delayed Commits Why write once when you can do ganged writes more efficiently?2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 22. We care about job tuning.2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 23. LinkedIn Job Tuning Guidelines All jobs get reviewed prior to going to production. Task times should be between 5-15 minutes. Jobs should have less than 10,000 tasks. Jobs should be smart about # of files and the size of those files generated.2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 24. ... and the result?2012 LinkedIn Corporation. All Rights Reserved. </li> <li> 25. Why is LinkedIn Running so Hot? We do a lot of non-MapReduce work. RAM buffers and caches allow us to offset a lot of disk IO. We audit our jobs. As a result, our CPUs are actually busy.2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS </li> <li> 26. 2012 LinkedIn Corporation. All Rights Reserved. BUSINESS OPERATIONS </li> </ul>