Hadoop on Azure 101: What is the Big Deal?

Dennis Mulder, Solution Architect – Global Windows Azure Center of Excellence, Microsoft Corporation


Post on 24-Feb-2016

DESCRIPTION

Hadoop on Azure 101: What is the Big Deal? Dennis Mulder, Solution Architect – Global Windows Azure Center of Excellence, Microsoft Corporation. Agenda: Why Big Data?; Understanding the Basics; Microsoft and Hadoop. Why Big Data? 1.8 zettabytes of information will be created in 2011.

TRANSCRIPT

Page 1: Hadoop  on Azure 101 What is the Big Deal?

Hadoop on Azure 101 What is the Big Deal?
Dennis Mulder
Solution Architect – Global Windows Azure Center of Excellence
Microsoft Corporation

Page 2: Hadoop  on Azure 101 What is the Big Deal?

Agenda

Why Big Data?
Understanding the Basics
Microsoft and Hadoop

Page 3: Hadoop  on Azure 101 What is the Big Deal?

Why Big Data?

Page 4: Hadoop  on Azure 101 What is the Big Deal?
Page 5: Hadoop  on Azure 101 What is the Big Deal?
Page 6: Hadoop  on Azure 101 What is the Big Deal?
Page 7: Hadoop  on Azure 101 What is the Big Deal?

Example Scenarios

Page 8: Hadoop  on Azure 101 What is the Big Deal?

The Potential: Solving Specific Industry Problems

• eCommerce: mining web logs (collaborative filtering, user experience optimisation…)
• Manufacturing: detecting trends and anomalies in sensor data (predicting and understanding faults)
• Capital Markets: joining market and external data (correlation detection for investment strategy identification, risk calculations…)
• Retail Banking: historical transaction mining (fraud detection, customer segmentation…)

Industry-specific data-sets leveraged to improve decision making and generate new revenue streams

Page 9: Hadoop  on Azure 101 What is the Big Deal?

Traditional E-Commerce Data Flow

[Diagram: OPERATIONAL DATA (NEW USER REGISTRY, NEW PURCHASE, NEW PRODUCT) and Logs feed an ETL step that loads some data into the Data Warehouse; the rest is Excess Data.]

Page 10: Hadoop  on Azure 101 What is the Big Deal?

New E-Commerce Big Data Flow

[Diagram: OPERATIONAL DATA (NEW USER REGISTRY, NEW PURCHASE, NEW PRODUCT) and Logs are kept as Raw Data in a “Store it All” Cluster, alongside the Data Warehouse.]

How much do views for certain products increase when our TV ads run?
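With the raw page-view logs retained, a question like this one reduces to comparing view rates while an ad is on air against the rest of the time. The following is only a rough sketch: the log records, the ad schedule, and field names such as `productId` and `minute` are invented for illustration and do not come from the presentation.

```javascript
// Hypothetical raw page-view log: one record per product view.
// `minute` is minutes since midnight; all values are invented sample data.
var pageViews = [
  { productId: "tv-42", minute: 591 },
  { productId: "tv-42", minute: 596 },
  { productId: "tv-42", minute: 600 },
  { productId: "tv-42", minute: 601 },
  { productId: "tv-42", minute: 602 },
  { productId: "tv-42", minute: 603 },
  { productId: "tv-42", minute: 604 },
  { productId: "tv-42", minute: 608 }
];

// Hypothetical ad schedule: one spot on air from minute 600 up to (not including) 605.
var adWindows = [{ start: 600, end: 605 }];

function isDuringAd(minute, windows) {
  return windows.some(function (w) {
    return minute >= w.start && minute < w.end;
  });
}

// Views per minute for each product, on air versus off air.
// `totalMinutes` is the length of the whole observation period.
function viewRates(views, windows, totalMinutes) {
  var adMinutes = windows.reduce(function (sum, w) {
    return sum + (w.end - w.start);
  }, 0);
  var counts = Object.create(null);
  views.forEach(function (v) {
    var c = counts[v.productId] || (counts[v.productId] = { during: 0, outside: 0 });
    if (isDuringAd(v.minute, windows)) {
      c.during++;
    } else {
      c.outside++;
    }
  });
  var rates = Object.create(null);
  Object.keys(counts).forEach(function (id) {
    rates[id] = {
      duringPerMinute: counts[id].during / adMinutes,
      outsidePerMinute: counts[id].outside / (totalMinutes - adMinutes)
    };
  });
  return rates;
}

// Observation period: 20 minutes (590 to 610), 5 of them on air.
var rates = viewRates(pageViews, adWindows, 20);
console.log(rates["tv-42"]);
```

On a cluster the same counting would be split into map and reduce steps over the full log; the single-process version here only shows the shape of the calculation.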

Page 11: Hadoop  on Azure 101 What is the Big Deal?

Understanding the Basics: Move the Compute to the Data

Page 12: Hadoop  on Azure 101 What is the Big Deal?

So How Does It Work?

FIRST, STORE THE DATA

[Diagram: Files are stored across four Servers.]
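The storage step can be pictured as cutting each file into fixed-size blocks and keeping copies of every block on several servers. The sketch below assumes a simple round-robin placement; the function name `placeBlocks`, the server names, and the block size and replica count are illustrative choices, and a real cluster decides placement with more care than this.

```javascript
// Split a file of `fileSize` bytes into fixed-size blocks and place
// `replication` copies of each block on distinct servers, round-robin.
// This placement policy is an assumption made for illustration only.
function placeBlocks(fileSize, blockSize, servers, replication) {
  if (replication > servers.length) {
    throw new Error("not enough servers for the requested replication");
  }
  var blockCount = Math.ceil(fileSize / blockSize);
  var placement = [];
  for (var b = 0; b < blockCount; b++) {
    var replicas = [];
    for (var r = 0; r < replication; r++) {
      // Consecutive offsets modulo the server count never repeat
      // as long as replication <= servers.length.
      replicas.push(servers[(b + r) % servers.length]);
    }
    placement.push({ block: b, servers: replicas });
  }
  return placement;
}

var MB = 1024 * 1024;
var layout = placeBlocks(
  200 * MB,                                      // hypothetical file size
  64 * MB,                                       // illustrative block size
  ["server1", "server2", "server3", "server4"],  // hypothetical server names
  3                                              // illustrative replica count
);
layout.forEach(function (entry) {
  console.log("block " + entry.block + " -> " + entry.servers.join(", "));
});
```

Because every block lives on more than one server, a later processing step can usually be scheduled on a machine that already holds the data it needs.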

Page 13: Hadoop  on Azure 101 What is the Big Deal?

SECOND, TAKE THE PROCESSING TO THE DATA

So How Does It Work?

// Map Reduce function in JavaScript (word count)
var map = function (key, value, context) {
    // Split the input line on anything that is not a letter
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            // Emit (word, 1) for every non-empty word
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    // Add up all the counts emitted for this word
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

[Diagram: the RUNTIME sends the Code to each of the four Servers.]
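To see what the word-count functions on this slide compute, they can be exercised on a single machine with a small stand-in for the cluster runtime. The `runLocally` helper below is hypothetical and written only for this illustration; it mimics the `context.write` call and the `hasNext()`/`next()` iterator that the slide's code relies on, and is not part of any Hadoop API.

```javascript
// map and reduce follow the word-count code on the slide.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical single-process stand-in for the runtime:
// run map over every line, group values by key, then run reduce per key.
function runLocally(lines, mapFn, reduceFn) {
  var grouped = Object.create(null);
  var mapContext = {
    write: function (k, v) {
      (grouped[k] = grouped[k] || []).push(v);
    }
  };
  lines.forEach(function (line, offset) {
    mapFn(offset, line, mapContext);
  });

  var results = Object.create(null);
  var reduceContext = {
    write: function (k, v) {
      results[k] = v;
    }
  };
  Object.keys(grouped).sort().forEach(function (k) {
    var vals = grouped[k];
    var pos = 0;
    var iterator = {
      hasNext: function () { return pos < vals.length; },
      next: function () { return vals[pos++]; }
    };
    reduceFn(k, iterator, reduceContext);
  });
  return results;
}

var counts = runLocally(
  ["The code goes to the data", "the data stays put"],
  map,
  reduce
);
console.log(counts);
```

On a cluster the map calls run on the servers that hold the input and the grouping happens across the network, but the contract between the two functions is the same.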

Page 14: Hadoop  on Azure 101 What is the Big Deal?

MapReduce – Workflow

Page 15: Hadoop  on Azure 101 What is the Big Deal?

Map tasks

Scenario: Get sum of sales grouped by zipCode

Blocks of the Sales file in HDFS are spread across DataNode1, DataNode2 and DataNode3. Each record is (custId, zipCode, amount):

custId  zipCode  amount
5       53705    $15
6       44313    $10
5       53705    $65
0       54235    $22
9       02115    $15
6       44313    $25
3       10025    $95
8       44313    $55
2       53705    $30
1       02115    $15
4       54235    $75
7       10025    $60

Map: each Mapper reads its blocks and emits (zipCode, amount) pairs, for example 53705 $65 or 10025 $95.

GroupBy: each Mapper groups its output into one output bucket per reduce task.
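The map side of the zip-code scenario can be sketched in a few lines. The records are the ones shown on the slide; everything else is an assumption made for illustration: one map task per block, three reduce tasks, and a toy `partition` function that picks a bucket from the characters of the zip code. The bucket assignment it produces will not necessarily match the one drawn on the slide.

```javascript
// Blocks of the Sales file; each record is [custId, zipCode, amount].
// Zip codes are strings so that leading zeros survive.
var blocks = [
  [[5, "53705", 15], [6, "44313", 10]],
  [[5, "53705", 65], [0, "54235", 22]],
  [[9, "02115", 15], [6, "44313", 25]],
  [[3, "10025", 95], [8, "44313", 55]],
  [[2, "53705", 30], [1, "02115", 15]],
  [[4, "54235", 75], [7, "10025", 60]]
];

var REDUCE_TASKS = 3; // assumed number of reduce tasks

// Hypothetical partitioner: the same zip code always lands in the same bucket.
function partition(zipCode, reduceTasks) {
  var h = 0;
  for (var i = 0; i < zipCode.length; i++) {
    h += zipCode.charCodeAt(i);
  }
  return h % reduceTasks;
}

// One map task: drop custId, emit (zipCode, amount),
// and group the output into one bucket per reduce task.
function mapTask(block, reduceTasks) {
  var buckets = [];
  for (var r = 0; r < reduceTasks; r++) {
    buckets.push([]);
  }
  block.forEach(function (record) {
    var zipCode = record[1];
    var amount = record[2];
    buckets[partition(zipCode, reduceTasks)].push([zipCode, amount]);
  });
  return buckets;
}

var mapOutputs = blocks.map(function (block) {
  return mapTask(block, REDUCE_TASKS);
});
console.log(JSON.stringify(mapOutputs));
```

The property that matters is that partitioning depends only on the key, so every pair for a given zip code ends up at the same reduce task no matter which map task produced it.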

Page 16: Hadoop  on Azure 101 What is the Big Deal?

Reduce tasks

Shuffle: the output buckets of each Mapper are sent to the Reducers, so that all pairs for a given zipCode reach the same Reducer.

Sort: each Reducer sorts its input by zipCode.

Reducer: 53705 $65, 53705 $30, 53705 $15
Reducer: 44313 $10, 44313 $25, 44313 $55, 10025 $95, 10025 $60
Reducer: 54235 $75, 54235 $22, 02115 $15, 02115 $15

SUM: each Reducer adds up the amounts per zipCode.

53705 $110
10025 $155, 44313 $90
54235 $97, 02115 $30

Done!
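The reduce side of the same scenario is a sort followed by a running sum. In the sketch below, `reducerInputs` lists the (zipCode, amount) pairs as they appear to arrive at the three Reducers on the slide; that grouping is read off the diagram as well as it can be recovered, and the `reduceTask` function is a hypothetical illustration rather than Hadoop code.

```javascript
// (zipCode, amount) pairs at each of the three reduce tasks after the shuffle.
var reducerInputs = [
  [["53705", 65], ["53705", 30], ["53705", 15]],
  [["44313", 10], ["44313", 25], ["10025", 95], ["44313", 55], ["10025", 60]],
  [["54235", 75], ["54235", 22], ["02115", 15], ["02115", 15]]
];

// One reduce task: sort by key so equal zip codes are adjacent,
// then sum the amounts of each run of equal keys.
function reduceTask(pairs) {
  var sorted = pairs.slice().sort(function (a, b) {
    return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0;
  });
  var totals = [];
  sorted.forEach(function (pair) {
    var last = totals[totals.length - 1];
    if (last && last[0] === pair[0]) {
      last[1] += pair[1];
    } else {
      totals.push([pair[0], pair[1]]);
    }
  });
  return totals;
}

var results = reducerInputs.map(reduceTask);
results.forEach(function (totals) {
  totals.forEach(function (t) {
    console.log(t[0] + " $" + t[1]);
  });
});
```

The totals match the slide: $110 for 53705, $155 for 10025, $90 for 44313, $97 for 54235 and $30 for 02115.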

Page 17: Hadoop  on Azure 101 What is the Big Deal?

Hadoop

Page 18: Hadoop  on Azure 101 What is the Big Deal?

Hadoop Architecture

Page 19: Hadoop  on Azure 101 What is the Big Deal?

Traditional RDBMS vs. MapReduce

                TRADITIONAL RDBMS          MAPREDUCE
Data Size       Gigabytes (Terabytes)      Petabytes (Exabytes)
Access          Interactive and Batch      Batch
Updates         Read / Write many times    Write once, Read many times
Structure       Static Schema              Dynamic Schema
Integrity       High (ACID)                Low
Scaling         Nonlinear                  Linear
DBA Ratio       1:40                       1:3000

Reference: Tom White’s Hadoop: The Definitive Guide
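The "Dynamic Schema" and "Write once, Read many times" rows can be illustrated by storing raw lines untouched and deciding how to interpret them only when a job reads them. This is a loose sketch: the sample lines reuse the (custId, zipCode, amount) layout from the sales scenario, and the two reader functions are hypothetical.

```javascript
// Raw lines are stored once, exactly as they arrived.
var rawLines = ["5,53705,15", "6,44313,10", "0,54235,22"];

// Hypothetical reader 1: interpret each line as a full sales record.
function readAsSale(line) {
  var f = line.split(",");
  return { custId: Number(f[0]), zipCode: f[1], amount: Number(f[2]) };
}

// Hypothetical reader 2: a later job only needs the first zip digit and the amount.
function readAsRegionAmount(line) {
  var f = line.split(",");
  return { region: f[1].charAt(0), amount: Number(f[2]) };
}

// The same stored bytes serve both jobs; no schema was fixed at write time.
var sales = rawLines.map(readAsSale);
var regionAmounts = rawLines.map(readAsRegionAmount);
console.log(sales, regionAmounts);
```

A relational table would instead fix the columns before any row is loaded, which is the "Static Schema" side of the comparison.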

Page 20: Hadoop  on Azure 101 What is the Big Deal?

The Hadoop Ecosystem

[Diagram labels: ETL Tools, BI Reporting, RDBMS]

Reference: Tom White’s Hadoop: The Definitive Guide

Page 21: Hadoop  on Azure 101 What is the Big Deal?

Microsoft and Hadoop

Page 22: Hadoop  on Azure 101 What is the Big Deal?

What does Hadoop in the Cloud mean?

[Diagram: Hadoop on Azure, with a Name Node and four Data Nodes (HDFS). Also shown: Azure Blob Storage, S3, SQL Azure and an application end point.]

On Premise Enterprise Content
• Transactional DBs
• On Prem logs
• Internal sensors

Cloud Enterprise Content
• Generated in Azure

3rd Party Content
• Azure Datamarket
• Generated/stored elsewhere
• Public content
• Delivered online

Where is HDFS?
Where is my data stored?
Azure Blob Storage vs. HDFS

Page 23: Hadoop  on Azure 101 What is the Big Deal?

Detailed Offerings

• Hive ODBC Driver & Hive Add-in for Excel
• Integration with Microsoft PowerPivot
• Hadoop based distribution for Windows Server & Azure
• Strategic Partnership with Hortonworks
• JavaScript framework for Hadoop
• RTM of Hadoop connectors for SQL Server and PDW

Page 25: Hadoop  on Azure 101 What is the Big Deal?

Demo: Deploying and Interacting With a Hadoop Cluster on Azure

Page 26: Hadoop  on Azure 101 What is the Big Deal?

Hadoop on Windows

Insights to all users by activating new types of data

Differentiation
• Integrate with Microsoft Business Intelligence
• Choice of deployment on Windows Server + Windows Azure
• Integrate with Windows Components (AD, System Center)
• Easy installation and configuration of Hadoop on Windows
• Simplified programming with .NET & JavaScript integration
• Integrate with SQL Server Data Warehousing

Page 27: Hadoop  on Azure 101 What is the Big Deal?

Summary

• Hadoop is about massive compute and massive data
• The code is brought to the data
• Map -> Split the work
• Reduce -> Combine the results
• Relational databases vs Hadoop? Wrong question - Serve different needs

Page 28: Hadoop  on Azure 101 What is the Big Deal?

Resources

http://www.hadooponazure.com/

http://hadoop.apache.org/

Page 29: Hadoop  on Azure 101 What is the Big Deal?

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
