Technical Presentation on Hadoop
ABID MERCHANT ZAID KHAN
What is Hadoop?
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation.
Other notable users:
New York Times
Baidu
eHarmony
Rackspace
Hadoop in the real world:
Telecommunications
Data Warehousing
Market Research and Forecasting
Social Networking
Natural Language Processing (NLP)
Image and Video Processing
Academic Research
Financial Analysis
…
Hadoop’s History
Inspired by Google’s Google File System and MapReduce papers, circa 2003-2004.
Created by Doug Cutting.
Originally built to support distribution for the Nutch search engine.
Named after a stuffed toy elephant.
What is Hadoop NOT?
It isn’t a relational database... an online transaction processing system... a structured data store of any kind!
Components of Hadoop:
Hadoop Libraries
HDFS
YARN
MapReduce
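To make the MapReduce component more concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps, using word counting as the example. It is plain Python, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the framework would run the map and reduce steps in parallel across many nodes and read the input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a MapReduce framework performs
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data nodes"]))
```

Each map call looks at only one line and each reduce call at only one word, which is what would allow a framework to spread the work over many machines.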
Why is Hadoop important?
Challenges of using Hadoop:
There’s a widely acknowledged talent gap: entry-level programmers often lack the skills needed to be productive with MapReduce.
Data security.
Full-fledged data management and governance.
References:
http://www.sas.com/en_us/insights/big-data/hadoop.html
http://searchcloudcomputing.techtarget.com/definition/Hadoop
http://wiki.apache.org/hadoop/