Apache Hadoop – Taking a Big Leap in Big Data

SPEC INDIA

Data has been piling up in organizations for years, but only recently, driven by the fervor around ‘Big Data’ and ‘Business Intelligence’, have organizations gained both the awareness and the means to store that data accurately and turn it into valuable information. As a result, they now deliberately retain their heaps of data and extract the insights they need, in the format they need.

One such open source framework for distributed processing of large volumes of data is Apache Hadoop, designed for reliable and robust computing. It takes a simple approach to storage and computation needs, scaling from a single server to clusters of many machines. Apache Hadoop is widely used in both research and production, and has become a de facto standard for the storage, processing, and analysis of hundreds of terabytes of data. It provides distributed parallel processing across the commodity servers that are already prevalent in the industry.

An apt solution for capturing and revealing the valuable information hidden in otherwise unusable bulk data, Apache Hadoop helps enterprises increase business efficiency and ROI by making useful information available on demand.

Modules Handled by Hadoop


• Hadoop Common
A set of common utilities that support the other Hadoop modules and subprojects, including the FileSystem abstraction, RPC, and serialization libraries.

• Hadoop Distributed File System (HDFS)
A distributed file system that provides access to application data and spans all the nodes in a Hadoop cluster, linking their storage into one big file system. It achieves reliability by replicating data across multiple nodes. HDFS is a Java-based file system that delivers scalable, reliable data storage; a short sketch of its Java API appears after this list.

• Hadoop YARN
A framework for job scheduling and cluster resource management. Its basic goal is to split the two roles of the original JobTracker, resource management and job scheduling, into separate components.

• Hadoop MapReduce
A system for parallel processing of large data sets. It handles the assignment of work to the nodes in a cluster and provides a software framework for writing applications that process large amounts of data across many machines with reliability and scalability; a minimal word-count sketch follows this list.
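
To make the HDFS description above concrete, here is a minimal sketch of writing and reading a file through the Hadoop FileSystem Java API. The NameNode URI, class name, and file path are illustrative assumptions, not taken from this article; a real cluster would supply its own configuration.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal HDFS read/write sketch; the NameNode address and path are illustrative.
public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020");   // assumed NameNode URI

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/demo/hello.txt");        // assumed HDFS path

    // Write a small file; HDFS splits it into blocks and replicates them
    // across DataNodes in the cluster.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
    }

    // Read the same file back through the single, cluster-wide file system view.
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(reader.readLine());
    }

    fs.close();
  }
}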
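
As a sketch of the MapReduce programming model mentioned above, the classic word-count job below reads text files from an input directory on HDFS and writes per-word counts to an output directory. It uses the standard org.apache.hadoop.mapreduce API; the class name and the input/output paths passed on the command line are illustrative.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The mapper emits (word, 1) for every token in a line of input.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // The reducer sums the counts for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, such a job is typically submitted with the hadoop jar command, and YARN schedules its map and reduce tasks onto cluster nodes, preferring nodes that hold the relevant data blocks.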

Why Hadoop?

With any number of Big Data frameworks available in the industry today, there are a few key reasons why Hadoop has been so widely accepted and continues to gain popularity. Let us glance through them.

Scalability: New nodes can be added easily, without changes to data formats, so the computing solution scales out smoothly.

Reduced cost of ownership: Because Hadoop brings parallel computing to commodity servers, costs drop remarkably, making the platform affordable for organizations.

Flexibility: Any kind of data can be absorbed, structured or unstructured, from a single source or many. Because Hadoop does not impose a schema on the data it stores, data sources can be combined as required to produce the necessary output.

Fault tolerance: Hadoop is built so that whenever a node is lost, the system redirects its work to another copy of the data; processing does not stop and no data is lost. A small sketch of the replication that underpins this behaviour appears below.
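
As an illustration of that replication, the hedged sketch below raises the replication factor of a single HDFS file through the FileSystem API; the file path is an assumption made for the example, not something mentioned in this article.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: adjust how many copies of a file's blocks HDFS keeps.
public class SetReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/user/demo/important-data.csv");  // assumed path

    // With three replicas, losing a single node still leaves two copies,
    // so reads continue and failed tasks can be re-run near another copy.
    fs.setReplication(file, (short) 3);
    System.out.println("Replication factor is now "
        + fs.getFileStatus(file).getReplication());

    fs.close();
  }
}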

SPEC INDIA has been involved in a variety of BI and Big Data services and has served a large clientele around the world. We have been working with Hadoop and MongoDB for Big Data services, and with Pentaho, Jaspersoft, and Tableau as BI tools. We are global certified partners with Pentaho. For Hadoop, we provide services such as Hadoop cluster setup, Sqoop integration with a Hadoop cluster to export HDFS data to MySQL, and analysis of website backlinks using Apache Hadoop and MapReduce. We would be glad to serve any of your BI and Big Data requirements.