eSynergy - Windows Azure: Introduction to Big Data and Hadoop

Welcome to Windows Azure by Microsoft. eSynergy Solutions was the Gold sponsor for the UK Microsoft Open Source Cloud event held in June 2012.

Upload: weareesynergy

Post on 22-Jan-2015


TRANSCRIPT

Page 1: eSynergy - Windows Azure: Introduction to big data and hadoop

Welcome to Windows Azure by Microsoft.

eSynergy Solutions was the Gold sponsor for the UK Microsoft Open Source Cloud event held in June 2012.

Page 2: eSynergy - Windows Azure: Introduction to big data and hadoop
Page 3: eSynergy - Windows Azure: Introduction to big data and hadoop

Introduction to Big Data and Hadoop

Page 4: eSynergy - Windows Azure: Introduction to big data and hadoop
Page 5: eSynergy - Windows Azure: Introduction to big data and hadoop

Defining Big Data

Volume, Velocity, Variety

Page 6: eSynergy - Windows Azure: Introduction to big data and hadoop

The world of data is changing

10x increase every five years

85% from new data types

Cheap Distributed Storage and Processing

4.3 connected devices per adult

Data explosion

By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.

– Gartner, Mark Beyer, “Information Management in the 21st Century”

Easy Accessibility of Extended Data

Page 7: eSynergy - Windows Azure: Introduction to big data and hadoop
Page 8: eSynergy - Windows Azure: Introduction to big data and hadoop

New questions are being asked by the business:

SOCIAL & WEB ANALYTICS

What’s the social sentiment for my brand or products?

LIVE DATA FEEDS

How do I optimize my fleet based on weather and traffic patterns?

ADVANCED ANALYTICS

How do I better predict future outcomes?

Page 9: eSynergy - Windows Azure: Introduction to big data and hadoop

OPERATIONAL DATA

Traditional E-Commerce Data Flow

NEW USER REGISTRY

NEW PURCHASE

NEW PRODUCT

Excess Data

Logs

ETL Some Data

Data Warehouse

Page 10: eSynergy - Windows Azure: Introduction to big data and hadoop

OPERATIONAL DATA

New E-Commerce Big Data Flow

Raw Data “Store it All” Cluster

NEW USER REGISTRY

NEW PURCHASE

NEW PRODUCT

Data Warehouse

Logs

How much do views for certain products increase when our TV ads run?

Page 11: eSynergy - Windows Azure: Introduction to big data and hadoop

Big data creates New Business Opportunities

Revenue Growth

Increases ad revenue by processing 3.5 billion events per day

Massive Volumes

Processes 464 billion rows per quarter, with average query time under 10 secs.

Business Innovation (1)

Measures and ranks online user influence by processing more than 1 billion signals per day

Cloud Connectivity

Connects across 15 social networks via the cloud for data and API access

Operational Efficiencies

Identify faults in gas turbines before they happen

GE

Near Real-Time Insight

Receive signals from turbines and compare them to normal signals and to signals recorded when a fault subsequently occurred

1. Klout Case Study: http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Klout/Data-Services-Firm-Uses-Microsoft-BI-and-Hadoop-to-Boost-Insight-into-Big-Data/710000000129

Page 13: eSynergy - Windows Azure: Introduction to big data and hadoop

Hadoop - the Basics

Page 14: eSynergy - Windows Azure: Introduction to big data and hadoop

So How Does It Work?

FIRST, STORE THE DATA

Files

Server    Server    Server    Server
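
The "store the data" step can be pictured with a small sketch. Hadoop's file system (HDFS) splits each file into fixed-size blocks and keeps several copies of every block on different servers. The function `placeBlocks`, the server names, and the round-robin placement below are illustrative assumptions, not Hadoop code; real HDFS placement is rack-aware and more involved.

```javascript
// Hypothetical helper (not a Hadoop API): split a file into fixed-size blocks
// and assign each block's replicas to servers in simple round-robin order.
// Assumes replication <= servers.length so that replicas land on distinct servers.
function placeBlocks(fileSizeMb, blockSizeMb, servers, replication) {
  var blockCount = Math.ceil(fileSizeMb / blockSizeMb);
  var placement = [];
  for (var b = 0; b < blockCount; b++) {
    var replicas = [];
    for (var r = 0; r < replication; r++) {
      replicas.push(servers[(b + r) % servers.length]);
    }
    placement.push({ block: b, servers: replicas });
  }
  return placement;
}

// Illustrative numbers: a 200 MB file, 64 MB blocks, 4 servers, 3 copies per block.
var placement = placeBlocks(200, 64, ["server1", "server2", "server3", "server4"], 3);
placement.forEach(function (p) {
  console.log("block " + p.block + " -> " + p.servers.join(", "));
});
```

Because every block lives on more than one server, losing a single machine does not lose data, and a later processing step can usually be scheduled on a server that already holds the block it needs.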

Page 15: eSynergy - Windows Azure: Introduction to big data and hadoop

So How Does It Work?

SECOND, TAKE THE PROCESSING TO THE DATA

// Map Reduce function in JavaScript

var map = function (key, value, context) {
    // Split the input line on any non-letter character.
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        // Skip the empty strings produced by consecutive separators.
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    // Sum the counts emitted for this word.
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

Server    Server    Server    Server

RUNTIME

Code
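
To see what the runtime does with the word-count functions on this slide, they can be exercised locally with a small stand-in harness. The sketch below repeats the two functions so that it runs on its own. `runLocally` and its `context` and iterator objects are hypothetical test scaffolding that only mimics the `write`, `hasNext`, and `next` calls the slide's code uses. It is not part of any Hadoop API and runs in a single process, not across servers.

```javascript
// Word-count map and reduce, as on the slide (with balanced braces).
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical local stand-in for the runtime: map, group by key, then reduce.
function runLocally(lines, mapFn, reduceFn) {
  // Map phase: collect every (key, value) pair the mapper emits, grouped by key.
  var groups = {};
  var mapContext = {
    write: function (k, v) {
      if (!Object.prototype.hasOwnProperty.call(groups, k)) {
        groups[k] = [];
      }
      groups[k].push(v);
    }
  };
  lines.forEach(function (line, offset) {
    mapFn(offset, line, mapContext);
  });

  // Reduce phase: visit keys in sorted order, handing each reducer an iterator.
  var output = {};
  var reduceContext = {
    write: function (k, v) { output[k] = v; }
  };
  Object.keys(groups).sort().forEach(function (k) {
    var vals = groups[k];
    var pos = 0;
    var iterator = {
      hasNext: function () { return pos < vals.length; },
      next: function () { return vals[pos++]; }
    };
    reduceFn(k, iterator, reduceContext);
  });
  return output;
}

var counts = runLocally(["Big data and Hadoop", "Hadoop stores big data"], map, reduce);
console.log(counts);
```

On a cluster the same two functions would be shipped to the servers that hold the data, with the grouping step performed by the framework between the map and reduce phases.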

Page 16: eSynergy - Windows Azure: Introduction to big data and hadoop

Hadoop Architecture

Page 17: eSynergy - Windows Azure: Introduction to big data and hadoop

MapReduce – Workflow

Page 18: eSynergy - Windows Azure: Introduction to big data and hadoop

The Hadoop Ecosystem

ETL Tools, BI Reporting, RDBMS

Reference: Tom White’s Hadoop: The Definitive Guide

Page 19: eSynergy - Windows Azure: Introduction to big data and hadoop

Traditional RDBMS vs. MapReduce

             TRADITIONAL RDBMS          MAPREDUCE
Data Size    Gigabytes (Terabytes)      Petabytes (Exabytes)
Access       Interactive and Batch      Batch
Updates      Read / Write many times    Write once, Read many times
Structure    Static Schema              Dynamic Schema
Integrity    High (ACID)                Low
Scaling      Nonlinear                  Linear
DBA Ratio    1:40                       1:3000

Reference: Tom White’s Hadoop: The Definitive Guide
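
The "Structure" and "Updates" rows of the comparison can be illustrated with a minimal sketch of applying a schema at read time. The log lines, field names, and the `parseEvent` helper below are made-up examples, not taken from the deck. The point is only that raw data is written once, unchanged, and each query decides how to interpret it.

```javascript
// Hypothetical raw log lines, stored exactly as they arrived (no schema enforced on write).
var rawLines = [
  "2012-06-01 view product=42",
  "2012-06-01 purchase product=42 amount=19.99",
  "2012-06-02 view product=7",
  "2012-06-02 view product=42"
];

// The "schema" lives in the reader: date, event type, then any key=value fields.
function parseEvent(line) {
  var parts = line.split(" ");
  var event = { date: parts[0], type: parts[1] };
  for (var i = 2; i < parts.length; i++) {
    var pair = parts[i].split("=");
    event[pair[0]] = pair[1];
  }
  return event;
}

// One possible question asked after the fact: how many views did each product get?
var viewsByProduct = {};
rawLines.map(parseEvent).forEach(function (event) {
  if (event.type === "view") {
    viewsByProduct[event.product] = (viewsByProduct[event.product] || 0) + 1;
  }
});

console.log(viewsByProduct);
```

A different question, such as total purchase amount per day, would reuse the same stored lines with a different reader, which is the trade-off the table summarises: flexibility and scale in exchange for the integrity guarantees of a fixed schema.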

Page 20: eSynergy - Windows Azure: Introduction to big data and hadoop

Microsoft and Hadoop

Page 21: eSynergy - Windows Azure: Introduction to big data and hadoop

Hadoop on Windows

Insights to all users by activating new types of data

Integrate with Microsoft Business Intelligence

Hive ODBC Driver & Hive Add-in for Excel

Choice of deployment on Windows Server + Windows Azure

Integrate with Windows Components (AD, System Center)

Easy installation and configuration of Hadoop on Windows

Simplified programming with .NET & JavaScript integration

Integrate with SQL Server Data Warehousing

Differentiation

Page 23: eSynergy - Windows Azure: Introduction to big data and hadoop

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Page 24: eSynergy - Windows Azure: Introduction to big data and hadoop

www.esynergy-solutions.co.uk

0207 444 4080

[email protected]