Hadoop Technology

Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are automatically handled by the framework.

ABSTRACT

Problem Statement:

The total amount of digital data in the world has exploded in recent years, primarily because of the information (or data) generated by enterprises all over the globe. In 2006, the universal data was estimated at 0.18 zettabytes, and a tenfold growth to 1.8 zettabytes was forecast by 2011.

The problem is that while the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept up. A typical drive from 1990 could store 1370 MB of data and had a transfer speed of 4.4 MB/s, so all the data on a full drive could be read in around 300 seconds. In 2010, 1 TB drives are the standard hard disk size, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.
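The read times quoted above follow directly from capacity divided by transfer rate. The snippet below is a minimal sketch of that arithmetic; the function name is illustrative, and 1 TB is assumed to mean 1,000,000 MB (decimal units).

```python
def full_read_seconds(capacity_mb: float, transfer_mb_per_s: float) -> float:
    """Time to stream an entire drive once at its sustained transfer rate."""
    return capacity_mb / transfer_mb_per_s


# Figures quoted in the text; 1 TB is taken as 1,000,000 MB (assumption).
drive_1990 = full_read_seconds(1370, 4.4)
drive_2010 = full_read_seconds(1_000_000, 100)

print(f"1990 drive: {drive_1990:.0f} s")         # roughly five minutes
print(f"2010 drive: {drive_2010 / 3600:.2f} h")  # more than two and a half hours
```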

Solution Proposed:

Parallelisation:

An obvious solution to this problem is parallelisation. The input data is usually large, and the computations have to be distributed across hundreds or thousands of machines in order to finish in a reasonable amount of time. Reading 1 TB from a single hard drive takes a long time, but spreading the read over 100 different machines brings it down to under 2 minutes.
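As a rough check on the figure above, the sketch below estimates the read time when the data is split evenly across several machines. It assumes ideal scaling (no coordination or network overhead), a per-drive rate of 100 MB/s, and 1 TB = 1,000,000 MB; the function name is illustrative.

```python
def parallel_read_seconds(total_mb: float, rate_mb_per_s: float, machines: int) -> float:
    """Ideal-case time when every machine reads an equal share at the same time."""
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return (total_mb / machines) / rate_mb_per_s


TOTAL_MB = 1_000_000  # 1 TB in decimal units (assumption)
RATE = 100            # MB/s per drive, the rate used for the single-drive case

for n in (1, 10, 100):
    print(n, "machine(s):", parallel_read_seconds(TOTAL_MB, RATE, n), "s")
```

With 100 machines the estimate is 100 seconds, which matches the "under 2 minutes" figure; real clusters would be somewhat slower because of scheduling and network costs.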

Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion.

Hadoop solves the problem of hardware failure through replication. The Hadoop Distributed File System (HDFS) keeps redundant copies of the data, so that in the event of a failure another copy is available.
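The idea of replication can be illustrated with a toy model. The sketch below is not HDFS's actual placement policy (which is rack-aware); it simply places each block on several distinct nodes in round-robin order and checks which blocks remain readable after some nodes fail. All names, and the replication factor of 3, are assumptions made for the illustration.

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes in round-robin order."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for index, block in enumerate(blocks):
        start = index % len(nodes)
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_0", "blk_1", "blk_2", "blk_3", "blk_4"]
placement = place_replicas(blocks, nodes, replication=3)

# With three replicas per block, losing two nodes still leaves every block readable.
print(readable_blocks(placement, failed={"node1", "node2"}) == set(blocks))
```

With a single replica per block, the same failure would make some blocks unreadable, which is the case replication is meant to prevent.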

The second problem, combining and analysing data that is spread across many machines, is solved by a simple programming model: MapReduce. This programming paradigm abstracts the problem from data read/write to computation over a series of keys. HDFS and MapReduce are the most significant features of Hadoop.
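To make "computation over a series of keys" concrete, the following is a minimal single-process sketch of the MapReduce model using the classic word-count example. It does not use Hadoop's API; the function names are illustrative, and in a real cluster the map and reduce calls would run on different machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["Hadoop stores data", "Hadoop processes data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

The programmer writes only the map and reduce functions; grouping by key, and in Hadoop's case the distribution of work across nodes, is handled by the framework.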

Who Uses Hadoop?

Data Everywhere

Examples of Hadoop in action

• In the telecommunication industry
• In the media
• In the technology industry

Hadoop Architecture

• Two main components:
  - Distributed File System
  - MapReduce Engine

Hadoop (MapReduce) is a powerful framework that makes it easy to develop data-intensive applications. Its objective is to help build applications that scale to thousands of machines. Hadoop is well suited to data-intensive background applications and fits our project's requirements. Apart from running applications in parallel, Hadoop provides job monitoring features similar to those of Azure. If a machine crashes, its data can be recovered from other machines, which also take over its jobs automatically. Running Hadoop in the cloud also makes setup convenient: with a few commands we can allocate as many machines as needed to run Hadoop, which saves a lot of time and effort. We found that combining the cloud with Hadoop is a common way to set up large-scale applications at lower cost and with greater elasticity.
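The automatic takeover of jobs after a crash can be pictured with a toy scheduler. The sketch below is only an illustration of the idea, not Hadoop's scheduler: tasks are assigned to nodes in round-robin order, and any task whose node has crashed is rescheduled on a live node. All names are hypothetical.

```python
def run_job(tasks, nodes, failed_nodes):
    """Assign tasks round-robin; a task whose node has crashed is moved to a live node."""
    live = [node for node in nodes if node not in failed_nodes]
    if not live:
        raise RuntimeError("no live nodes left")
    results = {}
    for i, task in enumerate(tasks):
        preferred = nodes[i % len(nodes)]
        # Re-execute on a live node if the preferred node has crashed.
        node = preferred if preferred not in failed_nodes else live[i % len(live)]
        results[task] = node
    return results


assignment = run_job(["t0", "t1", "t2", "t3"], ["a", "b", "c"], failed_nodes={"b"})
print(assignment)
```

Every task still completes even though node "b" is down; the application code does not need to know which node ran which task.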

Conclusion

Resources

http://hadoop.apache.org

http://developer.yahoo.com/hadoop

http://www.cloudera.com/resources
