Big time: Introducing Hadoop on Azure


Upload: yaniv-rodenski

Post on 01-Nov-2014


DESCRIPTION

Introduction to HDInsight service (aka Hadoop on Azure)

TRANSCRIPT

Page 1: Big time: Introducing Hadoop on Azure

Big Data

Page 2: Big time: Introducing Hadoop on Azure

The problem is simple

• While the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept up.

• One typical drive from 1990 could store 1,370 MB of data and had a transfer speed of 4.4 MB/s

Page 3: Big time: Introducing Hadoop on Azure

• So in 1990 you could read all the data from a full drive in around five minutes.

• Over 20 years later, one terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.
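
The figures on these two slides can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes 1 TB means 1,000,000 MB (decimal units), and the 100-drive case at the end is a hypothetical scenario, not a number from the slides.

```python
def full_read_seconds(capacity_mb: float, transfer_mb_per_s: float) -> float:
    """Time to stream an entire drive sequentially, in seconds."""
    return capacity_mb / transfer_mb_per_s

# Figures from the slides; 1 TB is taken as 1,000,000 MB (an assumption).
drive_1990 = full_read_seconds(1_370, 4.4)
drive_today = full_read_seconds(1_000_000, 100)

print(f"1990 drive: {drive_1990 / 60:.1f} minutes")
print(f"1 TB drive: {drive_today / 3600:.1f} hours")

# Hypothetical: the same 1 TB spread evenly over 100 drives read in parallel.
parallel = full_read_seconds(1_000_000 / 100, 100)
print(f"100 drives in parallel: {parallel / 60:.1f} minutes")
```

The results come out at roughly five minutes for the 1990 drive and a little under three hours for the one-terabyte drive, which matches the claims above. Splitting the data across many drives is what brings the read time back down.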

Page 4: Big time: Introducing Hadoop on Azure

Go Parallel

Page 5: Big time: Introducing Hadoop on Azure

Cloud computing changes the way applications grow

http://journals.worldnomads.com/davidsgibson/photo/22804/664941/USA/Elephant-shaped-cloud!

Page 6: Big time: Introducing Hadoop on Azure

Yaniv Rodenski
Senior Consultant, Sela Group
http://blogs.microsoft.co.il/blogs/roadan
Twitter: @YRodenski

BIG-TIME: Introducing Hadoop on Azure

David Ginzburg
Big Data infrastructure consultant
Twitter: @David_Ginzburg

Page 7: Big time: Introducing Hadoop on Azure

AGENDA

Page 8: Big time: Introducing Hadoop on Azure

Apache™ Hadoop™

Page 9: Big time: Introducing Hadoop on Azure

Apache™ Hadoop™

Page 10: Big time: Introducing Hadoop on Azure

Hadoop Distributed File System (HDFS)

HDFS Client

Page 11: Big time: Introducing Hadoop on Azure

Hadoop Distributed File System (HDFS)

HDFS Client

Page 12: Big time: Introducing Hadoop on Azure

Hadoop Distributed File System (HDFS)

HDFS Client

Page 13: Big time: Introducing Hadoop on Azure

MapReduce via WordCount

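Input:

Hello World
Hello Azure
Goodbye Cruel World

Map: (Hello, 1) (World, 1) (Hello, 1) (Azure, 1) (Goodbye, 1) (Cruel, 1) (World, 1)

Shuffle: Hello [1, 1], World [1, 1], Azure [1], Goodbye [1], Cruel [1]

Reduce: (Hello, 2) (World, 2) (Azure, 1) (Goodbye, 1) (Cruel, 1)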
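
The WordCount flow on this slide can be imitated in a single process. The following is a minimal sketch, not Hadoop code: the function names `map_phase`, `shuffle`, and `reduce_phase` are made up for illustration, and the in-memory grouping only stands in for what the framework does across machines.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

# The three input lines shown on the slide.
lines = ["Hello World", "Hello Azure", "Goodbye Cruel World"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

The map step produces seven pairs with a count of 1 each, and the reduce step leaves "Hello" and "World" with 2 and the remaining words with 1.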

Page 14: Big time: Introducing Hadoop on Azure

A new way to MapReduce

DEMO

Page 15: Big time: Introducing Hadoop on Azure

Hadoop MapReduce Processing

[Diagram labels: Input Split (x3); Merge]

Page 16: Big time: Introducing Hadoop on Azure

Hadoop MapReduce Processing

Job Client

Page 17: Big time: Introducing Hadoop on Azure

MapReduce TMI

[Diagram labels: Input Split; Partition, Sort, and spill to disk; Buffer; Fetch]

Page 18: Big time: Introducing Hadoop on Azure

MapReduce TMI

[Diagram labels: Sort; Output; Map Output (x4); Merge result (x2)]

Page 19: Big time: Introducing Hadoop on Azure

Partitioners
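
A partitioner decides which reducer receives each key. The sketch below shows hash partitioning, which is broadly how Hadoop's default HashPartitioner behaves (key hash modulo the number of reduce tasks). It is an illustration under assumptions: CRC32 stands in for Java's `hashCode()`, and the function name `partition` and the reducer count of 3 are hypothetical.

```python
import zlib

def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index for a key; equal keys always map to the same index."""
    # CRC32 stands in for Java's hashCode(). Python's built-in hash() is
    # randomized per process for strings, so it is avoided here.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

words = ["Hello", "World", "Hello", "Azure", "Goodbye", "Cruel", "World"]
num_reducers = 3  # hypothetical setting, not taken from the slides

buckets = {r: [] for r in range(num_reducers)}
for word in words:
    buckets[partition(word, num_reducers)].append(word)
print(buckets)
```

Because the reducer index depends only on the key, every occurrence of a word lands on the same reducer, which is what lets each reducer produce a complete count for its keys.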

Page 20: Big time: Introducing Hadoop on Azure

Combiners
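
A combiner pre-aggregates map output on the map side so that less data crosses the network during the shuffle. The sketch below applies the idea to WordCount in plain Python. It is not Hadoop code: `map_with_combiner` and the two input splits are hypothetical, and the approach is valid here only because summing counts is associative and commutative.

```python
from collections import Counter

def map_with_combiner(lines):
    """Map one input split and pre-aggregate its (word, 1) pairs locally."""
    combined = Counter()
    for line in lines:
        for word in line.split():
            combined[word] += 1  # combiner step: sum on the map side
    return list(combined.items())

# Two hypothetical input splits, each handled by its own map task.
split_a = ["Hello World", "Hello Azure"]
split_b = ["Goodbye Cruel World"]

shuffled = map_with_combiner(split_a) + map_with_combiner(split_b)

totals = Counter()
for word, partial in shuffled:
    totals[word] += partial  # reducer step: sum the partial counts
print(len(shuffled), dict(totals))
```

In this small example six pairs reach the reducer instead of seven, because the two "Hello" pairs from the first split are merged before the shuffle. On real data with many repeated keys the saving is far larger.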

Page 21: Big time: Introducing Hadoop on Azure

The TeraSort Use case


Page 22: Big time: Introducing Hadoop on Azure

The TeraSort Use case

Page 23: Big time: Introducing Hadoop on Azure

Beginners Pitfalls


Page 24: Big time: Introducing Hadoop on Azure

Beginners Pitfalls


Page 25: Big time: Introducing Hadoop on Azure

Distinct Values Problem Statement

http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/

Page 26: Big time: Introducing Hadoop on Azure

Distinct Values Problem Statement

http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/

Page 27: Big time: Introducing Hadoop on Azure

Distinct Values Problem Statement

http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/

Page 28: Big time: Introducing Hadoop on Azure

Distinct Values Problem Statement

http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/

Page 29: Big time: Introducing Hadoop on Azure

Administrating Hadoop in the real world

DEMO

Page 30: Big time: Introducing Hadoop on Azure

Why did Microsoft choose Hadoop?

Page 31: Big time: Introducing Hadoop on Azure

Hadoop on Azure

Page 32: Big time: Introducing Hadoop on Azure

Using hadooponazure.com

DEMO

Page 33: Big time: Introducing Hadoop on Azure

Windows Azure Compute

Azure Role

Supporting service

Application

Configuration

Page 34: Big time: Introducing Hadoop on Azure

Hadoop on Azure Roles

Azure Role

Monitoring service (RdAdmin)

Hadoop services

Configuration

Page 35: Big time: Introducing Hadoop on Azure

Hadoop MapReduce Processing

[Diagram labels: Fabric Controller; Head Node (Name Node); Worker Node (Data Node) x7]

Page 36: Big time: Introducing Hadoop on Azure

Hadoop MapReduce Processing

[Diagram labels: Fabric Controller; Head Node (Name Node); Worker Node (Data Node) x7]

Page 37: Big time: Introducing Hadoop on Azure

Hadoop MapReduce Processing

[Diagram labels: Fabric Controller; Head Node (Name Node); Worker Node (Data Node) x8]

Page 38: Big time: Introducing Hadoop on Azure

The Head Node Template


Page 39: Big time: Introducing Hadoop on Azure

The Worker Node Template

Page 40: Big time: Introducing Hadoop on Azure

Node VM Templates

               HEAD NODE      WORKER NODE
VM Template    Extra Large    Medium
Cores          8              2
Memory         14 GB          3.5 GB
HD             2 TB           489 GB

Page 41: Big time: Introducing Hadoop on Azure

Cloud Storage

Page 42: Big time: Introducing Hadoop on Azure

High Availability on Azure

[Diagram labels: Fabric Controller; Head Node (Name Node) x2; Azure Storage]

Page 43: Big time: Introducing Hadoop on Azure

Elastic MapReduce

Page 44: Big time: Introducing Hadoop on Azure

Elastic MapReduce

[Diagram labels: Storage Client; Amazon S3; Azure Storage; Head Node (Jobtracker); Worker Node (Tasktracker) x3]

Page 45: Big time: Introducing Hadoop on Azure

Elastic MapReduce

[Diagram labels: Storage Client; Amazon S3; Azure Storage; Head Node (Jobtracker) x2; Worker Node (Tasktracker) x6]

Page 46: Big time: Introducing Hadoop on Azure

Elastic MapReduce

[Diagram labels: Storage Client; Amazon S3; Azure Storage; cost ($) markers]

Page 47: Big time: Introducing Hadoop on Azure

Using Elastic MapReduce

DEMO

Page 48: Big time: Introducing Hadoop on Azure

Azure Blob Considerations

Page 49: Big time: Introducing Hadoop on Azure

Storage Size Limitations

Page 50: Big time: Introducing Hadoop on Azure

IsotopeJS

Page 51: Big time: Introducing Hadoop on Azure

Using the JavaScript interactive console

DEMO

Page 52: Big time: Introducing Hadoop on Azure

Using Hive

DEMO

Page 53: Big time: Introducing Hadoop on Azure

Summary

Page 54: Big time: Introducing Hadoop on Azure

Q & A

Page 55: Big time: Introducing Hadoop on Azure

Resources

My Blog: http://bit.ly/roadan

Apache™ Hadoop™: http://hadoop.apache.org

Hadoop on Azure: http://www.hadooponazure.com

Tom White, Hadoop: The Definitive Guide: http://shop.oreilly.com/product/9780596521981.do

Windows Azure Developer Center: http://www.windowsazure.com/en-us/develop/overview

Thanks!

Yaniv Rodenski

Twitter: @YRodenski