Technical Presentation on Hadoop

Post on 11-Apr-2017

ABID MERCHANT ZAID KHAN

Technical Presentation on Hadoop

What is Hadoop?

“Hadoop” is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

Other Notable users

New York Times

Baidu

eHarmony

Rackspace

Hadoop in the real world

Telecommunications

Data Warehousing

Market Research and Forecasting

Social Networking

Natural Language Processing (NLP)

Image and Video Processing

Academic Research

Financial Analysis

Hadoop's History

Inspired by the Google File System and MapReduce papers (circa 2003-2004).

Created by Doug Cutting.

Originally built to support distribution for the Nutch search engine.

Named after a stuffed toy elephant.

What is Hadoop NOT?

It isn’t a relational database... an online transaction processing system... a structured data store of any kind!

Components of Hadoop:

Hadoop Libraries

HDFS

YARN

MapReduce
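
To make the MapReduce component more concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count. It is plain Python running in a single process, not the Hadoop API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration only. On a real cluster the map and reduce steps would run on many nodes, with the input read from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in order over an in-memory list of lines."""
    # Hypothetical stand-in for distributed execution: every line is
    # mapped here in one process instead of on separate cluster nodes.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        "Hadoop stores data in HDFS",
        "Hadoop processes data with MapReduce",
    ]
    print(word_count(sample))
```

The point of the split is that each map call only needs its own line and each reduce call only needs the values for one key, which is what lets the work be spread across machines.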

Why is Hadoop important?

Challenges of using Hadoop:

There’s a widely acknowledged talent gap. (Entry-level programmers who don’t have sufficient skills can find it difficult to be productive with MapReduce.)

Data security.

Full-fledged data management and governance.

References:

http://www.sas.com/en_us/insights/big-data/hadoop.html

http://searchcloudcomputing.techtarget.com/definition/Hadoop

http://wiki.apache.org/hadoop/
