SQL, noSQL, BigData, Tables, Blobs and more… What’s a developer to do?

Post on 09-Feb-2016



TRANSCRIPT

SQL, noSQL, BigData, Tables, Blobs and more… What’s a developer to do?

David Campbell, Technical Fellow

Overview

A. Describe the Landscape & How to Decide
B. Explain “Big Data”
C. Map/Reduce Drill-Down
Answer Questions

Audience Participation…

Life Was Simple

“Forms Over Data”

Not anymore…
• Device / Cloud
• Multi-dimensional Experiences
• Social Integration
• Rapid Evolution
• Volatile Scale

A Storage Zoo…

The Result

What do Developers Want?

Rapid Development and Evolution
• Persistence Ignorance
• Schema Evolution / Dynamic Schema

Friction Free Scaling
• O(1) Management Scale
• Partition Ignorance
• HA & Resilience

Maximize Return on Available Data
• Audience Analytics
• Recommendations

?

How do we make sense of this?
• Data Model
• Consistency Model
• Cluster Model
• Query Model
• View Model

A Conceptual Model

It’s Simple – Really!

Smart Choice = Separation & Composition

Entity Framework Code First Migrations

The Cost of Consistency

[Figure: Cost ~ {friction, performance, availability, …} plotted against two scales]
System Implementation Level: Machine / Rack / Data Center / Internet
Data Model Level: Attribute / Entity / Shard / Database / Database

SQL Azure DB Federations
• ACID consistency within members (shards)
• Eventual consistency across members

[Figure: a Root with federation members M1, M2, M3, M4, M5]
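A minimal sketch of the federation idea, not the SQL Azure API: a root routes each key to the one member that owns its range, and an update that touches a single member can be applied all-or-nothing. The class names, the range boundaries and the "no negative values" rule are assumptions made up for illustration.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Member:
    """One federation member (shard); a dict stands in for its database."""
    name: str
    rows: dict = field(default_factory=dict)

    def transact(self, updates: dict) -> None:
        # All-or-nothing inside one member: stage a copy, validate, then swap.
        staged = dict(self.rows)
        for key, value in updates.items():
            if value < 0:  # hypothetical integrity rule
                raise ValueError(f"negative value for key {key}")
            staged[key] = value
        self.rows = staged


class Root:
    """Routes a key to the member that owns its range (hypothetical layout)."""

    def __init__(self, boundaries, members):
        # boundaries[i] is the first key owned by members[i + 1].
        assert len(members) == len(boundaries) + 1
        self.boundaries = boundaries
        self.members = members

    def member_for(self, key: int) -> Member:
        return self.members[bisect.bisect_right(self.boundaries, key)]


members = [Member(f"M{i}") for i in range(1, 6)]
root = Root([100, 200, 300, 400], members)

shard = root.member_for(150)           # key 150 falls in M2's range
shard.transact({150: 10, 151: 20})     # atomic within this one member
```

Work that spans two members would need one `transact` call per member, so other readers can observe the intermediate state; that gap is what "eventual consistency across members" refers to.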

Takeaway: How to Choose
• Conceptual Model Drives Smart Choices
• You can mix and match – baby & bathwater, etc.
• TNSTAAFL

You are now smarter than most bloggers on this topic!

Azure Offerings

Azure Blob Storage
• Elastic inexpensive storage

Azure Tables
• Elastic Key/Attribute storage

Azure Caching
• Elastic Key/Object cache

Azure SQL Database
• Elastic RDBMS with sharding capabilities

Explaining “Big Data”

What is “Big Data” really about?

Awash in “Ambient Data”
• Free to acquire
• Cheap to store

“Information Production”
• Turns Ambient Data into Information

Insight Generation
• Turns Information into Insights & Actions

Top Level Value Flow

Ambient Data

Information Production

Insights & Actions

Data Acquisition Cost $0

[Chart: cost scale with ticks at $0.00, $1.10, $1,000 and $1,000,000,000]

From: $1B/TB To: ~$0/TB

Data Storage Cost $0

December 1981 - $660M/TB
August 2010 - $100/TB

From: $660,000,000/TB To: $100/TB in 30 years

Source: http://www.littletechshoppe.com/ns1625/winchest.html
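As a rough check on what that drop means per year, the two data points on the slide can be turned into an average annual rate. This is only arithmetic on the slide's own figures and assumes a smooth exponential decline between the two dates, which the slide does not claim.

```python
import math

start_cost = 660_000_000.0      # $/TB, December 1981 (from the slide)
end_cost = 100.0                # $/TB, August 2010 (from the slide)
years = (2010 - 1981) - 4 / 12  # December 1981 to August 2010

# Constant yearly factor f such that start_cost * f**years == end_cost.
annual_factor = (end_cost / start_cost) ** (1 / years)
annual_decline = 1 - annual_factor
halving_time_years = math.log(2) / -math.log(annual_factor)

print(f"total reduction: {start_cost / end_cost:,.0f}x")
print(f"average annual decline: {annual_decline:.1%}")
print(f"cost halves about every {halving_time_years:.2f} years")
```

Under that assumption the cost per terabyte falls by a bit over 40% a year on average.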

The Big Dataflow…

[Figure: many Sources feed a Digital Shoebox, which feeds Information Production, along a Time axis]

Traditional Systems
• Data Warehouses / Marts
• Cubes
• …

Emergent Systems
• Deep data mining
• Machine Learning
• Near real-time prediction
• …

Standard Data Analytics Lifecycle

1. Question
2. Collect the data
3. Build a logical model
4. Build a physical model
5. Load the data
6. Tune
7. Answer the question

Often weeks to months

Lifecycle of a Question

[Flowchart nodes: Question; Worth asking again?; Make it repeatable; Bring it to production; Validation; Different Question; Not interesting]

Personal Example - GPS

[Figure: Source feeding transforms T1, T2, T3, T4, T5]

• Tree of transforms and filters
• Cleansing often happens in transformed domain
  • E.g. Where I slept each night…
• Can produce higher level information
  • [DwellAtHome],[RouteToWork],[DwellAtWork] = ‘Commute to work’
• Using higher level information:
  • Commute duration f(leavingTime)
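One way to read the [DwellAtHome],[RouteToWork],[DwellAtWork] rule is as a pattern match over a sequence of already-labelled segments. The sketch below assumes an earlier transform has produced those labels; the `Segment` type, the function name and the sample times are invented for illustration and are not from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Segment:
    label: str       # e.g. "DwellAtHome", "RouteToWork", "DwellAtWork"
    start: datetime
    end: datetime


COMMUTE_PATTERN = ["DwellAtHome", "RouteToWork", "DwellAtWork"]


def find_commutes(segments):
    """Return (leave_time, duration) for each home -> route -> work run."""
    commutes = []
    for i in range(len(segments) - 2):
        window = segments[i:i + 3]
        if [s.label for s in window] == COMMUTE_PATTERN:
            route = window[1]
            commutes.append((route.start, route.end - route.start))
    return commutes


# Made-up sample day.
day = datetime(2011, 6, 10)
leave = day + timedelta(hours=7, minutes=30)
arrive = day + timedelta(hours=8, minutes=5)
segments = [
    Segment("DwellAtHome", day, leave),
    Segment("RouteToWork", leave, arrive),
    Segment("DwellAtWork", arrive, day + timedelta(hours=17)),
]

for leave_time, duration in find_commutes(segments):
    print(leave_time.time(), duration)
```

Collecting the (leave_time, duration) pairs over many days gives the raw points for a commute-duration-versus-leaving-time plot.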

Commute Time as f(leaveTime)

Event & State Correlation

2011-06-10 06:18:26, 2011-06-10 06:16:18, 0.04
2011-06-10 06:21:18, 2011-06-09 08:27:50, 21.89
2011-06-10 06:24:37, 2011-06-09 07:43:58, 22.68
2011-06-10 06:26:48, None, 0.00
2011-06-10 06:29:37, 2011-06-09 06:53:34, 23.60
2011-06-10 06:34:41, 2011-06-09 12:00:25, 18.57
2011-06-10 06:39:52, 2011-06-09 17:44:54, 12.92
2011-06-10 06:43:18, 2011-06-09 14:28:49, 16.24

Dwell geolocation + Outlook statistics = How much email do I send from home vs. at work?
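Correlating events with state can be sketched as an interval lookup: each sent-mail timestamp (the event) is matched to the dwell interval (the state) that contains it. The intervals and timestamps below are invented sample data, and the sketch assumes the dwell intervals are sorted and do not overlap.

```python
from bisect import bisect_right
from collections import Counter
from datetime import datetime

# State: dwell intervals as (start, end, place), sorted by start.
dwells = [
    (datetime(2011, 6, 9, 0, 0), datetime(2011, 6, 9, 7, 30), "home"),
    (datetime(2011, 6, 9, 8, 10), datetime(2011, 6, 9, 17, 0), "work"),
    (datetime(2011, 6, 9, 17, 45), datetime(2011, 6, 9, 23, 59), "home"),
]

# Events: timestamps of sent emails.
sent = [
    datetime(2011, 6, 9, 6, 53),
    datetime(2011, 6, 9, 7, 44),   # in transit: falls in no dwell interval
    datetime(2011, 6, 9, 12, 0),
    datetime(2011, 6, 9, 14, 28),
    datetime(2011, 6, 9, 18, 30),
]

starts = [start for start, _, _ in dwells]


def place_at(ts):
    """Return the dwell place covering ts, or None if there is none."""
    i = bisect_right(starts, ts) - 1   # last interval starting at or before ts
    if i >= 0 and ts <= dwells[i][1]:
        return dwells[i][2]
    return None


counts = Counter(place_at(ts) for ts in sent)
print(counts)
```

Events that match no interval are counted under `None` rather than dropped, so gaps in the location data stay visible.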

What’s the deal with Hadoop and other Map/Reduce systems?
• Developer Friendly Information Production Machine
• Simple to Understand
• Simple to Develop For
• Inherently Scalable

Map / Reduce Systems

EYNTK about MapReduce on One Slide

[Figure: Map, Map, Map, Map feeding Reduce, Reduce; stages numbered 1 to 5]

1. MapReduce framework splits input up into groups of data
2. MapReduce framework calls your Map function – Map(input)
   a) Your Map function processes input and returns 0 or more (key,value) pairs
3. MapReduce framework collates keys (“Shuffle”)
4. MapReduce framework calls your Reduce function – Reduce(key, []values)
   a) Your Reduce function processes values and returns a result
5. MapReduce framework writes your result to the filesystem
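The five steps above can be mimicked in a single process to see which parts are yours and which belong to the framework. This is a toy word count, not Hadoop or the HDInsight .NET API; `run_mapreduce`, `map_fn`, `reduce_fn` and `num_splits` are names chosen for the sketch.

```python
from collections import defaultdict


def map_fn(line):
    """Your Map function: emit 0 or more (key, value) pairs per input record."""
    for word in line.lower().split():
        yield (word, 1)


def reduce_fn(key, values):
    """Your Reduce function: fold all values for one key into a result."""
    return (key, sum(values))


def run_mapreduce(records, map_fn, reduce_fn, num_splits=2):
    """Stand-in for the framework, run sequentially in one process."""
    # 1. Split the input into groups of data.
    splits = [records[i::num_splits] for i in range(num_splits)]

    # 2. Call Map on every record of every split.
    pairs = []
    for split in splits:
        for record in split:
            pairs.extend(map_fn(record))

    # 3. Shuffle: collate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # 4. Call Reduce once per key.
    results = [reduce_fn(key, values) for key, values in sorted(groups.items())]

    # 5. A real framework writes results to the filesystem; here they are returned.
    return results


print(run_mapreduce(["a b a", "b c"], map_fn, reduce_fn))
```

Only `map_fn` and `reduce_fn` would change for a different job; the splitting, shuffling and output handling stay with the framework, which is where the scaling comes from.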

HDInsight
• Hadoop on Windows {Azure, Server, Laptop}
• Hortonworks HDP distribution
• .NET Map/Reduce API
• Linq to Hive

Let’s Look at Some Code…

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
