Tackling Big Data with Hadoop

Posted on 17-May-2015

Category:

Technology

DESCRIPTION

An introduction to Hadoop, presented at Vermont Code Camp 2011.

TRANSCRIPT

  • 1. TACKLING BIG DATA WITH HADOOP. David Howell. Sunday, September 11, 11
  • 2. WHAT IS BIG DATA?
  • 3. WHAT IS BIG DATA? Google web crawl
  • 4. WHAT IS BIG DATA? stream of Twitter messages
  • 5. WHAT IS BIG DATA? Annoying Farmville requests on Facebook
  • 6. WHAT IS BIG DATA? terabyte-scale data sets; awkward to work with using traditional tools
  • 7. WHAT IS BIG DATA? requires distributed computing
  • 8. MEDIUM DATA: dozens to hundreds of gigabytes; still awkward to work with using traditional tools
  • 9. MAP-REDUCE: http://labs.google.com/papers/mapreduce.html
  • 10. (no text)
  • 11. (no text)
  • 12. COUNTING AT SCALE
  • 13.
        function map_1(t, search_phrase)
            emit(search_phrase, 1)

        sort and shuffle

        function reduce_1(search_phrase, counts)
            total = 0
            for count in counts
                total += count
            emit(search_phrase, total)

        function map_2(search_phrase, total)
            emit(total, search_phrase)

        sort and shuffle

        function reduce_2(total, search_phrases)
            for search_phrase in search_phrases
                emit(search_phrase, total)
  • 14.
        map | shuffle | reduce
        cat IN | sort | uniq -c > OUT

        map | shuffle | reduce
        awk '{print $2,$1}' OUT | sort > FINAL
  • 15. WHY BOTHER?
  • 16. HADOOP
  • 17. DISTRIBUTED COMPUTING PLATFORM
  • 18. TOOLS IN THE PLATFORM
        Map-Reduce APIs: Java, C++, UNIX pipes
        Higher Level APIs: Hive, Cascading, Pig
  • 19. THE ORIGIN STORY
  • 20. WHO'S USING IT?
  • 21. HADOOP: How does it work?
  • 22. (no text)
  • 23. (no text)
  • 24. (no text)
  • 25. (no text)
  • 26. DEMO!
  • 27. YOUR DATA PLATFORM: ad hoc, unstructured, prototyping, experiment, data-driven, curiosity, play
  • 28. LEARN MORE
        http://hadoop.apache.org/
        http://www.cloudera.com/
        Hadoop: The Definitive Guide
        @dehowell
        dave@poorlytrainedape.com
        http://github.com/dehowell/hadoop-crypto-demo
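
The two-stage counting job in the pseudocode of slide 13 can be tried out on a single machine. The following is a minimal Python sketch, not the speaker's demo code and not the Hadoop API: the `run_stage` helper, the `count_phrases` wrapper, and the sample log are hypothetical names and data introduced here for illustration. It assumes each input record is a `(timestamp, search_phrase)` pair and simulates "sort and shuffle" with an in-memory sort followed by grouping on the key.

```python
from itertools import groupby
from operator import itemgetter


def run_stage(records, mapper, reducer):
    """Run one map -> sort/shuffle -> reduce pass over (key, value) records.

    Hypothetical helper: a single-process stand-in for what a
    distributed framework would do across many machines.
    """
    # Map: every input record may emit zero or more (key, value) pairs.
    mapped = [pair for key, value in records for pair in mapper(key, value)]
    # Sort and shuffle: bring all pairs with the same key together.
    mapped.sort(key=itemgetter(0))
    output = []
    # Reduce: hand each key and the list of its values to the reducer.
    for key, group in groupby(mapped, key=itemgetter(0)):
        values = [value for _, value in group]
        output.extend(reducer(key, values))
    return output


def map_1(t, search_phrase):
    # Ignore the timestamp; emit a count of 1 per occurrence.
    yield search_phrase, 1


def reduce_1(search_phrase, counts):
    # Sum the 1s to get the total for this phrase.
    yield search_phrase, sum(counts)


def map_2(search_phrase, total):
    # Swap key and value so the next shuffle sorts by total.
    yield total, search_phrase


def reduce_2(total, search_phrases):
    # Emit each phrase with its total, now in ascending order of total.
    for search_phrase in search_phrases:
        yield search_phrase, total


def count_phrases(log):
    """Chain the two stages: count phrases, then order them by count."""
    counted = run_stage(log, map_1, reduce_1)
    return run_stage(counted, map_2, reduce_2)


# Hypothetical sample log of (timestamp, search_phrase) pairs.
sample_log = [
    (1, "hadoop"), (2, "pig"), (3, "hadoop"),
    (4, "hive"), (5, "hadoop"), (6, "pig"),
]
result = count_phrases(sample_log)
print(result)  # [('hive', 1), ('pig', 2), ('hadoop', 3)]
```

The first stage plays the role of `sort | uniq -c` from slide 14, and the second stage plays the role of the `awk ... | sort` step: swapping key and value lets the shuffle itself do the ordering by count.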