Data Loading in Hadoop


Upload: skillspeed

Post on 16-Jul-2015

Category: Technology


TRANSCRIPT

Slide 1 © 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com

Data Loading Techniques in Hadoop

Slide 2

Session Objectives

• Introduction to Big Data and Hadoop

• Understanding Hadoop Clusters

• Data Loading Techniques in Hadoop

• Big Data & Hadoop Job Market

• Big Data & Hadoop Course Details

• Webinar by Skillspeed

Get Started with BIG Data & Hadoop

Slide 3

Big Data and its Challenges

Slide 4

Big Data and its Challenges

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information.

Managing data at this scale is very difficult.

Slide 5

Who Generates Big Data?

Have you ever wondered how Google, Facebook, or LinkedIn manage to store and use such huge volumes of data?

Today, managing such big data is becoming a problem for all of us.

Slide 6

Hadoop makes it easier to process such huge volumes of data.

How? We will answer that shortly.

Before that, let's understand what Hadoop is.

Slide 7

Hadoop and its Characteristics

Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.

It is an open-source data management technology with scale-out storage and distributed processing.
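The "simple programming model" is usually MapReduce. As a rough local analogy only (this is not Hadoop code, and the function name `word_count` is made up for illustration), the three phases of a word count can be mimicked with standard Unix tools on a single machine:

```shell
# Local analogy only: map -> shuffle/sort -> reduce with standard Unix tools.
# Hadoop runs the same three phases spread across many machines.
word_count() {
  tr -s '[:space:]' '\n' |      # map: emit one word per line
    grep -v '^$' |              # drop empty lines left by leading whitespace
    sort |                      # shuffle: bring identical words together
    uniq -c |                   # reduce: count each group of identical words
    awk '{ print $2 "\t" $1 }'  # format each result as word<TAB>count
}

printf 'big data big cluster\nhadoop data big\n' | word_count
```

On a real cluster the map and reduce steps run as parallel tasks over blocks stored in HDFS, which is what lets the same idea scale past one machine.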

Hadoop Characteristics

Flexible

Reliable

Economical

Scalable

Slide 8

Hadoop Ecosystem

Flume: import or export of unstructured or semi-structured data

Sqoop: import or export of structured data

Apache Oozie: workflow

HDFS (Hadoop Distributed File System)

Pig Latin: data analysis

Hive: data warehouse (DW) system

MapReduce framework

HBase

Other YARN frameworks (MPI, Giraph)

YARN: cluster resource management

Slide 9

Hadoop Cluster and Data Loading Techniques

Slide 10

Hadoop Terminal Commands

Checking the hdfs command:

Slide 11

Hadoop Terminal Commands

Checking the hadoop command:
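The screenshots for these checks did not survive in the transcript. One plausible way to run them, as a sketch rather than a record of what the slides showed, is to confirm that each client is on the PATH and then ask it for its version. The helper name `check_cmd` is hypothetical.

```shell
# Report whether a command is on PATH; returns non-zero when it is missing.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found on PATH"
    return 1
  fi
}

# `hdfs version` and `hadoop version` both print the installed Hadoop release.
if check_cmd hdfs; then
  hdfs version
fi

if check_cmd hadoop; then
  hadoop version
fi
```

Running `hdfs` or `hadoop` with no arguments prints the usage summary with the available subcommands, which is another quick sanity check.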

Slide 12

Hadoop Terminal Commands

Listing HDFS directories:

The sbin directory of the Hadoop installation:
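The commands for these two items are missing from the transcript. A minimal sketch of what they typically look like follows. It assumes `HADOOP_HOME` points at the installation root; the fallback path and the helper name `list_sbin` are hypothetical.

```shell
# Assumption: HADOOP_HOME points at the installation root.
# The fallback path below is hypothetical.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# Print the contents of <root>/sbin, or a notice if the directory is absent.
list_sbin() {
  if [ -d "$1/sbin" ]; then
    ls "$1/sbin"
  else
    echo "no sbin directory under $1"
  fi
}

# List the HDFS root directory; this needs the hdfs client and a running cluster.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls /
else
  echo "hdfs: not found on PATH"
fi

# sbin holds the daemon control scripts, for example start-dfs.sh and stop-dfs.sh.
list_sbin "$HADOOP_HOME"
```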

Slide 13

Data Loading in Hadoop

Data loading options:

Using Flume

Using Sqoop

Using Hadoop Copy Commands

Diagram labels: MySQL, Oracle, SQL Server (sources); HDFS (target)
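To make the first two options concrete, here is a hedged sketch of a Sqoop table import and a minimal Flume agent that writes into HDFS. Every host name, database, table, user, and path is a placeholder invented for illustration, and the `run` helper only prints each command unless `DRY_RUN=0` is set, so the snippet is safe to paste without a cluster.

```shell
# Dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# --- Sqoop: structured data from a relational database into HDFS ---
# Hypothetical connection details; replace with your own.
DB_URL="jdbc:mysql://dbhost/sales"
DB_USER="etl_user"

# -P prompts for the password; -m sets the number of parallel map tasks.
run sqoop import \
  --connect "$DB_URL" \
  --username "$DB_USER" -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  -m 1

# --- Flume: files dropped into a local directory are shipped to HDFS ---
# Agent "a1" with one spooling-directory source, one memory channel, one HDFS sink.
CONF_FILE="${TMPDIR:-/tmp}/flume-hdfs.conf"
cat > "$CONF_FILE" <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/incoming
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events
a1.sinks.k1.channel = c1
EOF

# "conf" is assumed to be Flume's configuration directory relative to the current directory.
run flume-ng agent --conf conf --conf-file "$CONF_FILE" --name a1
```

Sqoop suits one-off or scheduled bulk transfers from databases, while a Flume agent keeps running and streams new data as it arrives.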

Slide 14

Hadoop Copy Options

distcp: Distributed copy that moves data between clusters; used for backup and recovery:

hadoop distcp hdfs://<source NN> hdfs://<target NN>

put: Copies one or more files from the local file system to the destination file system. It can also read from stdin and write to the destination file system:

hdfs dfs -put weather.txt hdfs://<target Namenode>

copyFromLocal: Similar to put, except that the source is restricted to a local file reference:

hdfs dfs -copyFromLocal weather.txt hdfs://<target Namenode>
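The copy options can be tried end to end with the sketch below. The NameNode addresses and target paths are hypothetical, and the `run` helper prints each command instead of executing it unless `DRY_RUN=0` is set. `hdfs dfs` is used here; older releases also accept `hadoop dfs`, which newer releases flag as deprecated.

```shell
# Dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical NameNode addresses; 8020 is a common RPC port, but yours may differ.
SRC_NN="hdfs://nn1.example.com:8020"
DST_NN="hdfs://nn2.example.com:8020"

# put: copy a local file into HDFS.
run hdfs dfs -put weather.txt "$DST_NN/data/weather.txt"

# put with "-" as the source reads from stdin.
# In dry-run mode the piped input is simply ignored.
echo "2015-07-16,31.5" | run hdfs dfs -put - "$DST_NN/data/reading.csv"

# copyFromLocal: like put, but the source must be a local file.
run hdfs dfs -copyFromLocal weather.txt "$DST_NN/data/weather-copy.txt"

# distcp: MapReduce-based copy between two clusters.
run hadoop distcp "$SRC_NN/data" "$DST_NN/backup/data"
```

Because distcp runs as a MapReduce job, it parallelizes the copy across the cluster, which is why it is preferred over put for large cluster-to-cluster transfers.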

Slide 15

Demonstration

Hive

Slide 16

What is Expected?

In this section, we will discuss the questions on HDFS and MapReduce that are asked during interviews.

This will help you analyze the importance of the topics under study!

Slide 17

The Top 5 Interview Questions

What is the use of the NameNode in HDFS?

What is a DataNode in HDFS?

What is the JobTracker in MapReduce?

What is MapReduce?

What are the basic components of a Hadoop application?

And many more...

Slide 18

Job Trends – Hadoop

Slide 19

Why SkillSpeed?

Course Curriculum from Industry Experts

Instructor-Led Live Virtual Sessions

Lifetime Access to Course Content via LMS

100% Placement Assistance

24x7 Support

Slide 20

Course Topics

Module 1: Introduction to Big Data and Hadoop

Module 2: HDFS Internals, Hadoop Configurations and Data Loading

Module 3: Introduction to MapReduce

Module 4: Advanced MapReduce Concepts

Module 5: Introduction to Pig

Module 6: Advanced Pig and Introduction to Hive

Module 7: Advanced Hive Concepts

Module 8: Extending Hive and HBase Introduction

Module 9: Advanced HBase and Oozie Introduction

Module 10: Project Set-up Discussion

Slide 21

Corporate Partners

Slide 22

Contact us

Lines open 24/7

To know more about the course, please contact:

IND +91-90660-20904, USA 1866-607-6547 (Toll Free)

Or reach us at [email protected]
