apache hadoop by shah

APACHE HADOOP SHAH HUSSAIN 1213313318


Post on 17-Jul-2015



APACHE HADOOP

SHAH HUSSAIN

1213313318

DATA IS EVERYWHERE

DATA IS IMPORTANT

What is Hadoop?

Motivation of Hadoop

• How do you scale up applications?

  – Run jobs processing hundreds of terabytes of data

  – Takes 11 days to read on 1 computer

• Need lots of cheap computers

  – Fixes speed problem (15 minutes on 1000 computers), but…

  – Reliability problems

    • In large clusters, computers fail every day

    • Cluster size is not fixed

• Need common infrastructure

  – Must be efficient and reliable
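
The read-time figures on this slide can be checked with a quick calculation. The sketch below is only a back-of-envelope estimate: it assumes the data splits evenly and that adding machines scales perfectly, with no coordination overhead and no failed nodes. The function name `parallel_read_minutes` is made up for this illustration.

```python
# Back-of-envelope check of the slide's numbers.
# Assumption: ideal linear scaling (even split, no overhead, no failures).
MINUTES_PER_DAY = 24 * 60


def parallel_read_minutes(single_machine_days: float, machines: int) -> float:
    """Minutes needed to read the data when it is split evenly across machines."""
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return single_machine_days * MINUTES_PER_DAY / machines


if __name__ == "__main__":
    minutes = parallel_read_minutes(11, 1000)
    print(f"{minutes:.2f} minutes")  # prints "15.84 minutes"
```

Eleven days is 15,840 minutes, so 1000 machines bring the read down to roughly a quarter of an hour, in line with the "15 minutes" quoted above. Real clusters fall short of this ideal, which is where the reliability and infrastructure points come in.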

Motivation of Hadoop

• Open Source Apache Project

• Hadoop Core includes:

– Distributed File System - distributes data

– Map/Reduce - distributes application

• Written in Java

• Runs on

– Linux, Mac OS X, Windows, and Solaris

– Commodity hardware
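
To make the Map/Reduce idea concrete, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop's API and does not touch a distributed file system. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the way the input is split are assumptions made for this illustration. On a real cluster, each input split would live on a different machine and the framework would do the grouping step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    """Run map over every split, then shuffle, then reduce each key."""
    mapped: List[Tuple[str, int]] = []
    for split in splits:  # each split stands in for data on one machine
        for line in split:
            mapped.extend(map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = [["data is everywhere"], ["data is important"]]
    print(word_count(splits))
    # prints {'data': 2, 'is': 2, 'everywhere': 1, 'important': 1}
```

The map and reduce functions never share state, which is what lets a framework run them on many machines at once and rerun a piece if a machine fails.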

Fun Fact of Hadoop

"The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such. Googol is a kid’s term."

---- Doug Cutting, Hadoop project creator

History of Hadoop

Apache Nutch

Doug Cutting

“Map-reduce” 2004

“It is an important technique!”

Extended

The great journey begins…

Nowadays…

• When you visit Yahoo!, you are interacting with data processed with Hadoop!

Nowadays…

• Yahoo! has ~20,000 machines running Hadoop

• The largest clusters are currently 2000 nodes

• Several petabytes of user data (compressed, unreplicated)

• Yahoo! runs hundreds of thousands of jobs every month

Applications…

• Who uses Hadoop?

• Amazon

• AOL

• Facebook

• Fox Interactive Media

• Google

• IBM

• New York Times

• PowerSet (now Microsoft)

• Quantcast

• Rackspace/Mailtrust

• Veoh

• Yahoo!

References

• http://hadoop.apache.org/

• http://en.wikipedia.org/wiki/Apache_Hadoop

• https://github.com/apache/hadoop

• http://www.cloudera.com/content/cloudera/en/about/hadoop-and-big-data.html

Questions?

THANK YOU!