Million Monkeys User Group

Million Monkeys. Jesse Anderson | Curriculum Developer and Instructor. November 2012

Upload: jesse-anderson

Post on 06-Dec-2014


DESCRIPTION

Million Monkeys presentation given to Silicon Mountain Technology Group on 11-12-2012.

TRANSCRIPT

Page 1: Million Monkeys User Group

Million Monkeys

Jesse Anderson | Curriculum Developer and Instructor
November 2012

Page 2: Million Monkeys User Group

About Me

• Cloudera - Educational Services Team
• Twitter - @jessetanderson
• Blog and more info: http://www.jesse-anderson.com
• Screencasts on Pragmatic Programmers: Buy It Now on http://www.jesse-anderson.com
• President – Northern Nevada Software Developers Group

Page 3: Million Monkeys User Group

About Cloudera

• Cloudera is “The commercial Hadoop company”
• Founded by leading experts on Hadoop from Facebook, Google, Oracle and Yahoo
• Provides consulting and training services for Hadoop users
• Staff includes committers to virtually all Hadoop projects

Page 4: Million Monkeys User Group

Introduction

• Infinite Monkey Theorem
• Hadoop
• Million Monkeys Algorithm
• Business Case

Page 5: Million Monkeys User Group

Infinite Monkey Theorem

“A million monkeys on a million typewriters will eventually recreate Shakespeare”

Page 6: Million Monkeys User Group

Exponential Growth (aka Big Data)

The odds of finding a group of characters are 1 in 26 raised to the power of the number of contiguous characters:

1 in 26^n

Contiguous Characters    Combinations
8                        208,827,064,576
9                        5,429,503,678,976
10                       141,167,095,653,376
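The combination counts on this slide follow directly from the 26^n rule, and a short Python check can reproduce them. This is only an illustrative sketch: the function name `combinations` and the `alphabet_size` parameter are made up for this example and do not come from the presentation.

```python
def combinations(n: int, alphabet_size: int = 26) -> int:
    """Number of distinct strings of n contiguous characters over the alphabet."""
    return alphabet_size ** n


# Reproduce the rows of the slide's table for 8, 9 and 10 characters.
for n in (8, 9, 10):
    print(f"{n:>2}  {combinations(n):,}")
```

Each additional character multiplies the search space by 26, which is why the slide describes the growth as exponential.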

Page 7: Million Monkeys User Group

Hadoop

• Apache Project
• Reliable, Scalable, Distributed Computing
• Software Framework
• MapReduce
• Distributed File System (HDFS)
• Other projects

Page 8: Million Monkeys User Group

Map

Create or process the input data

Page 9: Million Monkeys User Group

Reduce

Process data from Map into something usable

Page 10: Million Monkeys User Group

Data Flow
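The Map and Reduce roles from the two preceding slides, and the way data moves between them, can be sketched in plain Python. This is not Hadoop code and not the presenter's example: it is a single-process word count, and the names `map_step`, `shuffle`, `reduce_step` and `run_job` are hypothetical. In a real Hadoop job the framework performs the grouping (shuffle) step and spreads the work across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_step(line: str) -> Iterator[Tuple[str, int]]:
    """Map: process one input record and emit (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key (done by the framework in a real cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: turn the grouped values for one key into something usable."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Data flow: input -> map -> shuffle -> reduce -> output."""
    mapped = (pair for line in lines for pair in map_step(line))
    grouped = shuffle(mapped)
    return dict(reduce_step(key, values) for key, values in grouped.items())


counts = run_job(["to be or not to be", "that is the question"])
print(counts["to"], counts["be"], counts["question"])  # prints: 2 2 1
```

Because each call to `map_step` depends only on its own input line, the map work can be split across machines without the calls coordinating with each other.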

Page 11: Million Monkeys User Group

Million Monkeys Algorithm
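The transcript preserves only the title of this slide, so the following is a toy sketch under stated assumptions rather than the presenter's algorithm. It assumes each "monkey" types random groups of n lowercase letters (the 26-letter alphabet from the Exponential Growth slide) and that a group counts as a match when it appears as contiguous letters in the target text. All function names are hypothetical, and n is kept at 2 so that matches show up quickly on one machine.

```python
import random
import string


def normalize(text: str) -> str:
    """Keep only the letters a-z, lowercased, matching a 26-letter alphabet."""
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)


def target_groups(text: str, n: int) -> set:
    """Every run of n contiguous letters that occurs in the text."""
    letters = normalize(text)
    return {letters[i:i + n] for i in range(len(letters) - n + 1)}


def monkey(seed: int, attempts: int, n: int):
    """One monkey: type `attempts` random groups of n letters (assumed map work)."""
    rng = random.Random(seed)  # seeded so each monkey is reproducible
    for _ in range(attempts):
        yield "".join(rng.choice(string.ascii_lowercase) for _ in range(n))


def find_matches(text: str, monkeys: int, attempts: int, n: int) -> set:
    """Collect the groups typed by any monkey that also occur in the text."""
    wanted = target_groups(text, n)
    found = set()
    for seed in range(monkeys):
        found.update(group for group in monkey(seed, attempts, n) if group in wanted)
    return found


TEXT = "To be, or not to be, that is the question"
matches = find_matches(TEXT, monkeys=4, attempts=5000, n=2)
print(f"{len(matches)} of {len(target_groups(TEXT, 2))} two-letter groups found")
```

Each monkey works independently of the others, which is the property that lets this kind of search be divided among many map tasks.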

Page 12: Million Monkeys User Group

Business Case

Page 13: Million Monkeys User Group

Hadoop Scalability

[Chart: Percent of Linear Scalability (y-axis, 0 to 100 percent) versus Nodes (x-axis, 1 to 20), comparing RDBMS and Hadoop. RDBMS = Relational Database]

Page 14: Million Monkeys User Group

Business Value of Scalability

Scaling does not require massive re-engineering and complete rewrites of code.

Adding more computers to the cluster gets a predictable increase in computational power and storage.

Save $$$. Save time.

Page 15: Million Monkeys User Group

Going Viral (and taking over the world)

26,000 unique visits from 119 countries in one day

Covered internationally in BBC, Wall Street Journal, Wired and Slashdot

Page 16: Million Monkeys User Group

Next Steps

• Books
  • Hadoop: The Definitive Guide - Tom White
  • Hadoop Operations - Eric Sammer
• Cloudera Training
  • Developer, Admin, Hive and Pig, HBase, Essentials
• CDH
  • Cloudera's Distribution Including Apache Hadoop
  • Open Source
  • VM Image

Page 17: Million Monkeys User Group

Conclusion

• MapReduce breaks up problem efficiently
• No code changes to scale
• Incredible scalability
• Enables previously impossible tasks
