An Introduction to Apache Hadoop

Introduction of Apache Hadoop Presenter: Prem Chand Mali, Mindfire Solutions Date: 30/01/2014

Upload: mindfire-solutions

Post on 10-May-2015


Category: Technology



DESCRIPTION

Apache Hadoop is a framework for running applications on large clusters built from commodity hardware.

TRANSCRIPT

Page 1: An  Introduction to Apache Hadoop

Introduction of Apache Hadoop

Presenter: Prem Chand Mali, Mindfire Solutions

Date: 30/01/2014

Page 2: An  Introduction to Apache Hadoop


About Me

• SCJP/OCJP - Oracle Certified Java Programmer

• MCP: 70-480 - Specialist certification in HTML5 with JavaScript and CSS3 Exam

Skills: Java, Swing, Spring, Hibernate, JavaFX, jQuery, PrototypeJS, ExtJS.

Connect Me : https://www.facebook.com/prem.c.mali http://www.linkedin.com/in/premmali https://twitter.com/prem_mali https://plus.google.com/106150245941317924019/about/p/pub

Contact Me : [email protected] / [email protected] mfsi_premchandm

Page 3: An  Introduction to Apache Hadoop

Agenda


History

What is Apache Hadoop

Why Apache Hadoop

HDFS

MapReduce

Q & A

Page 4: An  Introduction to Apache Hadoop

History

• Nutch: crawler-based search

• GFS and MapReduce papers published.

• Yahoo! hired Doug Cutting and gave him a dedicated team.


Page 5: An  Introduction to Apache Hadoop

What is Apache Hadoop?

• Apache Hadoop is an open-source software framework, licensed under the Apache v2 license, that supports data-intensive distributed applications. It supports running applications on large clusters of commodity hardware.

• Hadoop is designed with the fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and thus should be handled automatically in software by the framework.

• Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.


Page 6: An  Introduction to Apache Hadoop

What is Apache Hadoop?

• The Apache Hadoop framework is composed of the following modules:

– Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.

– Hadoop MapReduce - a programming model for large-scale data processing.

– Hadoop Common - contains libraries and utilities needed by other Hadoop modules.

– Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.


Page 7: An  Introduction to Apache Hadoop

Why Apache Hadoop?

• State of Data

– 90% of data was generated in the past three years.

– Type of data

    • Unstructured

    • Semi-structured

    • Relational

– The relational world can handle gigabytes of data.

• Distributed

• Scalable

• Flexible

• Fault tolerant

• Intelligent


Page 8: An  Introduction to Apache Hadoop

HDFS

• HDFS is the primary distributed storage used by Hadoop applications. It consists of the following two types of components:

– NameNode

– DataNode

• HDFS is well suited for distributed storage and distributed processing using commodity hardware.

• Hadoop supports shell-like commands to interact with HDFS directly.
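
The division of labour between the two component types can be sketched in a few lines of Python. This is a toy, in-memory model only, not Hadoop's API: the class name `ToyNameNode`, the 4-byte block size, the replication factor of 3, and the round-robin placement are all assumptions made for illustration. The NameNode object keeps only metadata (which blocks make up a file and which DataNodes hold each replica), while the DataNode dictionaries hold the block contents.

```python
BLOCK_SIZE = 4    # bytes per block; tiny on purpose (assumption for illustration)
REPLICATION = 3   # copies kept of every block (assumption for illustration)


class ToyNameNode:
    """Hypothetical stand-in for a NameNode: stores metadata, never file data."""

    def __init__(self, datanodes):
        # datanodes: dict mapping a DataNode name to its {block_id: bytes} store
        self.datanodes = datanodes
        # filename -> list of (block_id, [names of DataNodes holding a replica])
        self.block_map = {}

    def write(self, filename, data):
        """Split data into blocks and place each block on several DataNodes."""
        names = sorted(self.datanodes)
        copies = min(REPLICATION, len(names))
        placements = []
        for index, start in enumerate(range(0, len(data), BLOCK_SIZE)):
            block_id = f"{filename}#{index}"
            chunk = data[start:start + BLOCK_SIZE]
            # Round-robin placement: consecutive DataNodes, wrapping around.
            replicas = [names[(index + r) % len(names)] for r in range(copies)]
            for name in replicas:
                self.datanodes[name][block_id] = chunk
            placements.append((block_id, replicas))
        self.block_map[filename] = placements

    def read(self, filename, failed=frozenset()):
        """Reassemble a file, skipping any DataNodes listed in `failed`."""
        result = b""
        for block_id, replicas in self.block_map[filename]:
            live = [name for name in replicas if name not in failed]
            if not live:
                raise OSError(f"no live replica for block {block_id}")
            result += self.datanodes[live[0]][block_id]
        return result


if __name__ == "__main__":
    nodes = {"dn1": {}, "dn2": {}, "dn3": {}, "dn4": {}}
    namenode = ToyNameNode(nodes)
    namenode.write("notes.txt", b"hello hadoop")
    # The file is still readable with two of the four DataNodes down.
    print(namenode.read("notes.txt", failed={"dn1", "dn2"}))
```

With four DataNodes and three replicas per block, at most two DataNodes lack a copy of any given block, so a read in this model still succeeds when any two DataNodes are unavailable. This is the kind of software-level handling of hardware failure that the framework is designed around.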


Page 9: An  Introduction to Apache Hadoop

HDFS


Page 10: An  Introduction to Apache Hadoop

MapReduce


• MapReduce is a combination of the following three things:

– Map

– Shuffle

– Reduce

• It does its job through the JobTracker and TaskTracker.
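
The three steps can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use Hadoop; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, and there is no JobTracker or TaskTracker here, so the scheduling and distribution of tasks across machines is left out.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "hadoop processes data"]))
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` depends only on its own key, which is what lets a real cluster run many map and reduce tasks in parallel on different machines.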

Page 11: An  Introduction to Apache Hadoop

MapReduce


Page 12: An  Introduction to Apache Hadoop

MapReduce


Page 13: An  Introduction to Apache Hadoop

MapReduce


Page 14: An  Introduction to Apache Hadoop


Question and Answer

Page 15: An  Introduction to Apache Hadoop

Thank you


Page 16: An  Introduction to Apache Hadoop

www.mindfiresolutions.com

https://www.facebook.com/MindfireSolutions

http://www.linkedin.com/company/mindfire-solutions

http://twitter.com/mindfires
