Hadoop on Azure 101: What is the Big Deal?
Post on 24-Feb-2016
Hadoop on Azure 101: What is the Big Deal?
Dennis Mulder
Solution Architect – Global Windows Azure Center of Excellence
Microsoft Corporation
Agenda
• Why Big Data?
• Understanding the Basics
• Microsoft and Hadoop
Why Big Data?
Example Scenarios
The Potential: Solving Specific Industry Problems
• eCommerce: mining web logs: collaborative filtering, user experience optimisation…
• Manufacturing: detecting trends and anomalies in sensor data: predicting and understanding faults
• Capital Markets: joining market and external data: correlation detection for investment strategy identification, risk calculations…
• Retail Banking: historical transaction mining: fraud detection, customer segmentation…
Industry-specific data-sets leveraged to improve decision making and generate new revenue streams
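Traditional E-Commerce Data Flow
[Diagram: operational data (new user registry, new purchase, new product) flows into the Data Warehouse. From the logs, ETL moves some data into the Data Warehouse; the remainder is labelled Excess Data.]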
New E-Commerce Big Data Flow
[Diagram: operational data (new user registry, new purchase, new product) flows into the Data Warehouse. Logs flow as raw data into a "Store it All" cluster.]
How much do views for certain products increase when our TV ads run?
Understanding the Basics
Move the Compute to the Data
So How Does It Work?
FIRST, STORE THE DATA
[Diagram: files are stored across several servers.]
So How Does It Work?
SECOND, TAKE THE PROCESSING TO THE DATA

    // Map Reduce function in JavaScript
    // map: split each input line into words and emit (word, 1) for each one
    var map = function (key, value, context) {
        var words = value.split(/[^a-zA-Z]/);
        for (var i = 0; i < words.length; i++) {
            if (words[i] !== "") {
                context.write(words[i].toLowerCase(), 1);
            }
        }
    };

    // reduce: add up the counts emitted for one word
    var reduce = function (key, values, context) {
        var sum = 0;
        while (values.hasNext()) {
            sum += parseInt(values.next(), 10);
        }
        context.write(key, sum);
    };
[Diagram: the runtime sends the code to each of the servers.]
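To try the word-count functions above without a cluster, the sketch below wraps them in a tiny in-memory driver. The driver (`runLocally`, `makeIterator`) is hypothetical and is not the Hadoop on Azure runtime; it only imitates the three calls the slide's code relies on: `context.write`, `values.hasNext` and `values.next`. The `map` and `reduce` bodies are the slide's, with the stray closing brace removed.

```javascript
// The slide's word-count functions.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical helper: an iterator with the hasNext()/next() shape reduce() expects.
function makeIterator(items) {
  var index = 0;
  return {
    hasNext: function () { return index < items.length; },
    next: function () { return items[index++]; }
  };
}

// Hypothetical local driver: run map over every line, group the emitted
// pairs by key, then call reduce once per key.
function runLocally(lines) {
  var groups = Object.create(null);
  var mapContext = {
    write: function (k, v) {
      if (!(k in groups)) { groups[k] = []; }
      groups[k].push(v);
    }
  };
  lines.forEach(function (line, offset) { map(offset, line, mapContext); });

  var results = {};
  var reduceContext = { write: function (k, v) { results[k] = v; } };
  Object.keys(groups).sort().forEach(function (k) {
    reduce(k, makeIterator(groups[k]), reduceContext);
  });
  return results;
}

var counts = runLocally([
  "The code is brought to the data",
  "Move the compute to the data"
]);
console.log(counts);
```

On a real cluster the grouping step is done by the framework across machines; here it is a single in-memory object, which is enough to check the logic of the two functions.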
MapReduce – Workflow
Scenario: Get sum sales grouped by zipCode

Input: blocks of the Sales file in HDFS, spread over DataNode1, DataNode2 and DataNode3. Each record is (custId, zipCode, amount):

    custId  zipCode  amount
    5       53705    $15
    6       44313    $10
    5       53705    $65
    0       54235    $22
    9       02115    $15
    6       44313    $25
    3       10025    $95
    8       44313    $55
    2       53705    $30
    1       02115    $15
    4       54235    $75
    7       10025    $60

Map tasks: the Mappers read the blocks, emit (zipCode, amount) and group (GroupBy) their output into one output bucket per reduce task.
Shuffle: each reduce task collects its bucket from the Mappers.
Sort: each reduce task sorts its records by zipCode.
Reduce tasks: each Reducer SUMs the amounts per zipCode.

    Reducer input after Sort                     SUM
    53705 $65, 53705 $30, 53705 $15              53705 $110
    44313 $10, 44313 $25, 44313 $55              44313 $90
    10025 $95, 10025 $60                         10025 $155
    54235 $75, 54235 $22                         54235 $97
    02115 $15, 02115 $15                         02115 $30

Done!
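The sales-by-zipCode workflow can be imitated in a few lines of plain JavaScript. This is a single-process sketch, not Hadoop code: the function names (`mapRecord`, `partition`, `sumSalesByZip`) and the hash used to pick a reduce task are assumptions made for illustration, so the zip codes will not necessarily land in the same buckets as on the slide. The records are the twelve (custId, zipCode, amount) rows from the slide.

```javascript
// Records from the slide: [custId, zipCode, amount].
var sales = [
  [5, "53705", 15], [6, "44313", 10],
  [5, "53705", 65], [0, "54235", 22],
  [9, "02115", 15], [6, "44313", 25],
  [3, "10025", 95], [8, "44313", 55],
  [2, "53705", 30], [1, "02115", 15],
  [4, "54235", 75], [7, "10025", 60]
];

// Map: drop custId and emit [zipCode, amount].
function mapRecord(record) {
  return [record[1], record[2]];
}

// Hypothetical partitioner: derives a reduce task index from the key, so
// every record for one zip code ends up in the same bucket.
function partition(zipCode, numReducers) {
  var hash = 0;
  for (var i = 0; i < zipCode.length; i++) {
    hash = (hash * 31 + zipCode.charCodeAt(i)) % 1000003;
  }
  return hash % numReducers;
}

function sumSalesByZip(records, numReducers) {
  // Shuffle: one output bucket per reduce task.
  var buckets = [];
  for (var r = 0; r < numReducers; r++) { buckets.push([]); }
  records.map(mapRecord).forEach(function (pair) {
    buckets[partition(pair[0], numReducers)].push(pair);
  });

  var totals = {};
  buckets.forEach(function (bucket) {
    // Sort: bring equal zip codes together within the bucket.
    bucket.sort(function (a, b) {
      return a[0] < b[0] ? -1 : (a[0] > b[0] ? 1 : 0);
    });
    // Reduce: SUM the amounts per zip code.
    bucket.forEach(function (pair) {
      totals[pair[0]] = (totals[pair[0]] || 0) + pair[1];
    });
  });
  return totals;
}

var totals = sumSalesByZip(sales, 3);
console.log(totals);
```

Whatever number of reduce tasks is chosen, the totals match the slide (53705: 110, 44313: 90, 10025: 155, 54235: 97, 02115: 30), because the partitioner only decides where a zip code is summed, not what is summed.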
Hadoop
Hadoop Architecture
Traditional RDBMS vs. MapReduce
               TRADITIONAL RDBMS           MAPREDUCE
    Data Size  Gigabytes (Terabytes)       Petabytes (Exabytes)
    Access     Interactive and Batch       Batch
    Updates    Read / Write many times     Write once, Read many times
    Structure  Static Schema               Dynamic Schema
    Integrity  High (ACID)                 Low
    Scaling    Nonlinear                   Linear
    DBA Ratio  1:40                        1:3000
Reference: Tom White’s Hadoop: The Definitive Guide
The Hadoop Ecosystem
[Diagram labels: ETL Tools, BI Reporting, RDBMS]
Reference: Tom White’s Hadoop: The Definitive Guide
Microsoft and Hadoop
Hadoop on Azure
[Diagram: a Hadoop cluster (one Name Node, four Data Nodes) with HDFS, shown next to Azure Blob Storage and S3.]
On Premise Enterprise Content
• Transactional DBs
• On Prem logs
• Internal sensors
Cloud Enterprise Content
• Generated in Azure
3rd Party Content
• Azure Datamarket
• Generated/stored elsewhere
• Public content
• Delivered online
[Other diagram labels: Azure Blob Storage, SQL Azure, Application end point]
What does Hadoop in the Cloud mean?
• Where is HDFS?
• Where is my data stored?
• Azure Blob Storage vs. HDFS
Detailed Offerings
• Hive ODBC Driver & Hive Add-in for Excel
• Integration with Microsoft PowerPivot
• Hadoop based distribution for Windows Server & Azure
• Strategic Partnership with Hortonworks
• JavaScript framework for Hadoop
• RTM of Hadoop connectors for SQL Server and PDW
Microsoft Big Data Solution
[Diagram labels: Power View, Excel with PowerPivot, Embedded BI, Predictive Analytics; Apps: LOB, CRM, ERP; Microsoft EDW; SSAS, SSRS; Devices, Crawlers, Sensors, Bots; Hadoop On Windows Server; Hadoop On Windows Azure]
Demo: Deploying and Interacting With a Hadoop Cluster on Azure
Hadoop on Windows
Insights to all users by activating new types of data
Differentiation
• Integrate with Microsoft Business Intelligence
• Choice of deployment on Windows Server + Windows Azure
• Integrate with Windows Components (AD, System Center)
• Easy installation and configuration of Hadoop on Windows
• Simplified programming with .NET & JavaScript integration
• Integrate with SQL Server Data Warehousing
Summary
• Hadoop is about massive compute and massive data
• The code is brought to the data
• Map -> Split the work
• Reduce -> Combine the results
• Relational databases vs Hadoop? Wrong question - Serve different needs
Resources
http://www.hadooponazure.com/
http://hadoop.apache.org/
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.