A World of Data Raghu Ramakrishnan CTO for Data, Technical Fellow Microsoft

Post on 18-Apr-2018

Page 1: Raghu Ramakrishnan - globaltaxevent.com · Raghu Ramakrishnan CTO ... Statistical Methods for Recommender Systems ... National Institute of Standards and Technology Comprehensive

A World of Data

Raghu Ramakrishnan

CTO for Data, Technical Fellow Microsoft

Page 2:
Page 3:

Content Optimization

Agarwal et al., "Content Recommendation on Web Portals," CACM 56(6):92-101 (2013)

Key Features

Package Ranker (CORE)

Ranks packages by expected CTR based on data collected every 5 minutes

Dashboard (CORE)

Provides real-time insights into performance by package, segment, and property

Mix Management (Property)

Ensures editorial voice is maintained and user gets a variety of content

Package rotation (Property)

Tracks which stories a user has seen and rotates them after user has seen them for a certain period of time
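
As a rough illustration of the Package Ranker and Package rotation features, the sketch below ranks packages by a smoothed CTR computed from the latest window of counts and drops packages a user has already seen for too long. It is not the CORE implementation: the class and function names, the additive prior, and the 600-second rotation limit are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PackageStats:
    """Click and view counts for one package in the latest collection window."""
    package_id: str
    clicks: int
    views: int


def smoothed_ctr(stats, prior_clicks=1.0, prior_views=20.0):
    """Expected CTR with a small additive prior (hypothetical values)."""
    return (stats.clicks + prior_clicks) / (stats.views + prior_views)


def rank_packages(window, seen_seconds, max_seen_seconds=600.0):
    """Rank packages by expected CTR, rotating out over-exposed ones.

    window: list of PackageStats from the most recent interval (e.g. 5 minutes).
    seen_seconds: dict of package_id -> seconds this user has already seen it.
    """
    eligible = [s for s in window
                if seen_seconds.get(s.package_id, 0.0) < max_seen_seconds]
    return sorted(eligible, key=smoothed_ctr, reverse=True)


# Toy counts, invented for illustration.
window = [
    PackageStats("sports", clicks=30, views=1000),
    PackageStats("finance", clicks=5, views=100),
    PackageStats("weather", clicks=80, views=1000),
]
# This user has already seen "weather" for 900 seconds, so it is rotated out.
ranking = rank_packages(window, seen_seconds={"weather": 900.0})
print([s.package_id for s in ranking])
```

The prior keeps a package with very few views from jumping to the top on a handful of clicks; the rotation filter is applied before ranking so that the best remaining package is always shown.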

Key Performance Indicators

Lifts in quantitative metrics

Editorial Voice Preserved

Recommended links News Interests Top Searches

Page 4:

Estimate P(response | user, item, context)

Statistical Methods for Recommender Systems, Agarwal and Chen, CUP, 2016
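
One simple way to estimate P(response | user, item, context) is logistic regression on features built from the three inputs. The sketch below is only that, a minimal stand-in and not the models from the book: the feature function, the toy event log, and the learning rate are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def features(user, item, context):
    """Hypothetical features: bias, user-item topic match, mobile flag."""
    match = 1.0 if item["topic"] in user["interests"] else 0.0
    mobile = 1.0 if context["device"] == "mobile" else 0.0
    return [1.0, match, mobile]


def predict(weights, x):
    """Estimated P(response = 1 | user, item, context) for feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def sgd_step(weights, x, y, lr=0.1):
    """One stochastic gradient step on the logistic log-likelihood."""
    p = predict(weights, x)
    return [w + lr * (y - p) * xi for w, xi in zip(weights, x)]


# Toy log of (user, item, context, clicked) events, invented for illustration.
sports_fan = {"interests": {"sports"}}
log = [
    (sports_fan, {"topic": "sports"}, {"device": "mobile"}, 1),
    (sports_fan, {"topic": "sports"}, {"device": "desktop"}, 1),
    (sports_fan, {"topic": "sports"}, {"device": "mobile"}, 1),
    (sports_fan, {"topic": "sports"}, {"device": "desktop"}, 0),
    (sports_fan, {"topic": "finance"}, {"device": "mobile"}, 0),
    (sports_fan, {"topic": "finance"}, {"device": "desktop"}, 0),
    (sports_fan, {"topic": "finance"}, {"device": "mobile"}, 0),
    (sports_fan, {"topic": "finance"}, {"device": "desktop"}, 1),
]

weights = [0.0, 0.0, 0.0]
for _ in range(500):
    for user, item, context, clicked in log:
        weights = sgd_step(weights, features(user, item, context), clicked)

desktop = {"device": "desktop"}
p_match = predict(weights, features(sports_fan, {"topic": "sports"}, desktop))
p_other = predict(weights, features(sports_fan, {"topic": "finance"}, desktop))
print(p_match > p_other)
```

In this toy log the matching topic is clicked more often, so the fitted model assigns it the higher response probability.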

Page 5:

Modeling Overview

Offline Modeling
• Exploratory data analysis
• Regression, feature selection, collaborative filtering (factorization)
• Seed online models & explore/exploit methods at good initial points
• Reduce the set of candidate items

Online Learning
• Online regression models, time-series models
• Model the temporal dynamics
• Provide fast learning for per-item models

Explore/Exploit
• Multi-armed bandits
• Find the best way of collecting real-time user feedback (for new items)
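
The explore/exploit idea can be sketched with Thompson sampling over Bernoulli click feedback, one common multi-armed bandit scheme. The slide does not say which bandit method is used, so this is a swapped-in example: the arm names, true CTRs, and round count are invented for the simulation.

```python
import random


class BetaBernoulliArm:
    """Beta posterior over one item's click probability."""

    def __init__(self):
        self.successes = 1.0  # Beta(1, 1), a uniform prior
        self.failures = 1.0

    def sample(self, rng):
        return rng.betavariate(self.successes, self.failures)

    def update(self, clicked):
        if clicked:
            self.successes += 1.0
        else:
            self.failures += 1.0


def thompson_select(arms, rng):
    """Draw one CTR sample per arm and show the arm with the largest draw."""
    draws = {name: arm.sample(rng) for name, arm in arms.items()}
    return max(draws, key=draws.get)


def simulate(true_ctr, rounds=5000, seed=0):
    """Simulate user feedback and count how often each item is shown."""
    rng = random.Random(seed)
    arms = {name: BetaBernoulliArm() for name in true_ctr}
    pulls = {name: 0 for name in true_ctr}
    for _ in range(rounds):
        choice = thompson_select(arms, rng)
        clicked = rng.random() < true_ctr[choice]
        arms[choice].update(clicked)
        pulls[choice] += 1
    return pulls


# Hypothetical true click rates; "new_story" starts with no history.
pulls = simulate({"old_story": 0.03, "new_story": 0.10, "filler": 0.01})
print(pulls)
```

Early on every item is shown often because the posteriors are wide (exploration); as feedback accumulates, traffic concentrates on the item with the highest click rate (exploitation).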

Large amount of historical data

(user event streams)

Near real-time user feedback
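
For the Online Learning box, a minimal per-item model is a CTR estimate whose past counts are discounted at every feedback interval, seeded from an offline prior. This is a sketch under assumptions, not the models from the talk: the prior CTR, the prior weight, and the decay factor are hypothetical.

```python
class DecayedCtr:
    """Per-item CTR estimate that discounts older feedback each interval."""

    def __init__(self, prior_ctr=0.05, prior_views=100.0, decay=0.9):
        # The prior plays the role of the offline model's initial point.
        self.clicks = prior_ctr * prior_views
        self.views = prior_views
        self.decay = decay

    def update(self, clicks, views):
        """Fold in one interval of near real-time feedback."""
        self.clicks = self.decay * self.clicks + clicks
        self.views = self.decay * self.views + views

    @property
    def ctr(self):
        return self.clicks / self.views


item = DecayedCtr()
for _ in range(20):          # item is popular: about 10% CTR per interval
    item.update(clicks=100, views=1000)
ctr_while_popular = item.ctr
for _ in range(20):          # interest fades: about 2% CTR per interval
    item.update(clicks=20, views=1000)
ctr_after_fade = item.ctr
print(round(ctr_while_popular, 3), round(ctr_after_fade, 3))
```

A decay closer to 1 gives smoother but slower estimates; a smaller decay tracks temporal dynamics faster at the cost of more noise.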

Page 6:

Data to Intelligent Action

Store any data: relations

Do any analysis: SQL queries, Hive

At any speed: Batch, Hive

At any scale … elastic!

Anywhere

Page 7:

Windows

SMSG

Live Ads

CRM/Dynamics Windows Phone

Xbox Live

Office365

STB Malware Protection Microsoft Stores

STB Commerce Risk

Messenger LCA

Exchange

Yammer Skype

Bing

data managed: EBs

cluster sizes: 10s of Ks

# machines: 100s of Ks

daily I/O: >100 PBs

# internal developers: 1000s

# daily jobs: 100s of Ks

Page 8:

Azure Data Lake

“1st party = 3rd party”

Hadoop and OSS

Page 9:

Cloud

Data

Intelligence IaaS

PaaS

SaaS

Relational

Document

Data Lake

In-Memory

Operational Analytics

(Algorithms, IoT…)

Our Axes of Innovation

Page 10:

Azure

Page 11:

34 Azure regions, 2x as many as AWS

Page 12:

More certifications than any other cloud provider

Industry leader for customer advocacy and privacy protection

Unique data residency guarantees

Page 13:

+

Applications

Management

App Frameworks

Databases & Middleware

Infrastructure

Linux

Page 14:

Microsoft Contributions to OSS Apache YARN

• Amoeba and Rayon
  • Status: shipping in Apache Hadoop 2.6
• Mercury and Yaq
  • Status: Now in Apache Hadoop trunk!
• Federation
  • Status: prototype and JIRA
• Framework-level Pooling
  • Enable frameworks that want to take over resource allocation to support millisecond-level response and adaptation times
  • Status: spec

Page 15:

Hybrid Management + Security

Log analytics Automation Backup DR and data protection Security

Page 16:

[Bar chart: MQ leader quadrants. Values shown: 17, 3, 6, 0. Labels: Competitor 1, Competitor 2, Competitor 3]

Page 17:
Page 18:

Cortana Intelligence Suite

Page 19:

Data and Analytics – 3 Pillars

SQL Server 2016, Azure DB, Azure DW, SQL Server R Services

On-prem and cloud (Windows, Linux)

Cortana Intelligence Suite

Azure Big Data and Analytics: Hadoop, Data Lake, Machine learning, PowerBI, Data Factory, Streaming, Perceptual Intelligence

On-prem connectivity

Microsoft R Server

Analytics: Hadoop, Teradata

On-prem and cloud (Windows, Linux)

Page 20:

Forrester Wave: Big Data Hadoop Cloud Solutions, Q2 2016

Page 21:
Page 22:
Page 23:

Personalized Offers

Page 24:

Web Logs, Omniture logs

On-Premise SQL Server (customer and product data)

In-Store Activity with Kinect sensors

Social Data

Diagnostic streaming

Event hubs

Machine Learning

Stream Analytics

Azure DataLake

Data Factory: Move Data, Orchestrate, Schedule, and Monitor

HDInsight HDInsight Machine Learning

Azure SQL Data Warehouse

Power BI

DATA SOURCES INGEST PREPARE ANALYZE PUBLISH CONSUME

Stream Analytics

Cortana

Web/LOB Dashboards

Page 25:

Churn Prediction

Tacoma Public Schools wanted to leverage data to predict student dropout risks to increase graduation rates

Page 26:

Personalized Healthcare

• Data from sensors and devices such as blood-pressure cuffs and activity trackers

Cortana Analytics dashboard where registered nurses have a singular view of each customer’s personalized care plan

"Many of the solutions currently on the market give physicians access to raw data; that’s not as useful as actionable intelligence to help them make a diagnosis. When you start looking at tools such as ImagineCare that have intelligence built in, I think that’s a big deal for providers."

DR. ETHAN BERKE, Medical Director for Clinical Design and Innovation

Page 27:

Service Analytics

• To replace the manual threshold method of monitoring dynamic telemetry data with intelligent anomaly detection.

• To detect small trend or level changes early, so that investigations and actions can start in time to prevent potential incidents.

• To learn automatically from both historical and real-time data in order to scale the monitoring.

• SQL Azure uses Anomaly Detection models to track hundreds of different service exceptions that could not be tracked just by setting thresholds.
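
To illustrate why a fixed threshold can miss a small level change that a statistical detector catches, the sketch below runs a two-sided CUSUM over a telemetry series. CUSUM is a standard change-detection technique chosen here for illustration; the slides do not describe the Anomaly Detection models actually used, and the baseline length, drift, and alarm threshold are assumed values.

```python
from statistics import mean, stdev


def cusum_alarms(series, baseline_n=30, drift=0.5, threshold=5.0):
    """Return indices where a two-sided CUSUM on standardized values trips.

    drift and threshold are in units of the baseline standard deviation;
    all three defaults are illustrative, not tuned values.
    """
    baseline = series[:baseline_n]
    mu, sigma = mean(baseline), stdev(baseline)
    high = low = 0.0
    alarms = []
    for i, x in enumerate(series[baseline_n:], start=baseline_n):
        z = (x - mu) / sigma
        high = max(0.0, high + z - drift)   # accumulates upward shifts
        low = max(0.0, low - z - drift)     # accumulates downward shifts
        if high > threshold or low > threshold:
            alarms.append(i)
            high = low = 0.0                # restart after raising an alarm
    return alarms


# Synthetic telemetry: a steady level near 100, then a small upward shift of
# 1.5 units starting at index 50. No sample ever reaches a fixed limit of 103.
steady = [99.0, 101.0] * 25
shifted = [100.5, 102.5] * 10
series = steady + shifted

alarms = cusum_alarms(series)
print(alarms)
```

Because the detector accumulates small deviations over time, the alarm is raised a few samples after the shift begins, even though a manual limit of 103 is never crossed.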

Page 28:

Revenue Forecasting

This helps us triangulate internal forecasts and gives us more confidence in the forward looking revenue ranges we provide to Wall Street.

VANDANA GANGAWAR, Senior Director, Microsoft Central Finance Planning and Operations

Page 29:

Azure Data Lake Store

Fully managed cloud data store designed for analytics

Supports HDFS compliant analytics applications and tools

Petabyte files, unlimited account size

High throughput for analytics performance

Low latency ingestion with read as you write

AAD-based authentication, access auditing

File and folder-level ACLs, Encryption at rest

Page 30:

Azure Data Lake Analytics

An elastic analytics service built on Apache YARN that processes all data, at any size

Pay PER QUERY & Scale PER QUERY
- No need to create a cluster

No limits to SCALE

Includes U-SQL, unifying the benefits of SQL with the expressive power of C#
- In future: Hive, Spark

Optimized to work with ADL STORE

FEDERATED QUERY across Azure data sources

ENTERPRISE GRADE role-based access control and auditing

Page 31:
Page 32:

Azure HDInsight—Linux and Windows

Managed, Monitored, Supported
• Cluster customization – Install your favorite project
• Harness existing .NET & Java skills to write customer extensions
• Supports broad ecosystem of ISVs (Hadoop and Traditional)

Full Apache Hadoop
• Batch – MapReduce, PIG, Hive, Spark
• Stream Processing and Analytics – Storm, SparkStreaming
• Interactive SQL – Hive (Tez), and SparkSQL
• Table Serving – HBase
• Machine Learning – SparkML, Mahout

Page 33:

Azure HDInsight

The Best of Hadoop
Batch: MapReduce, PIG, Hive, Spark
Interactive SQL: Hive (Tez), SparkSQL
Stream Analytics: Storm, SparkStreaming
Machine Learning: SparkML, Mahout
Table Serving: HBase
Exploratory Visualization: Jupyter, Zeppelin

Interactive SQL: SQL DW
Stream Analytics: Azure Stream Analytics
Machine Learning: Azure ML
Table Serving: Azure SQL DB
Exploratory Visualization: Power BI

Page 34:

High-performance open source R plus:

Enterprise Scale & Performance

– Scales from workstations to large clusters

– Scales to large data sizes

– Growing portfolio of Parallelized algorithms

Secure, Scalable R Deployment/Operationalization

Write Once Deploy Anywhere for multiple platforms

IDE for data scientists and developers

Enterprise Class Support

DistributedR

DeployR DevelopR

ScaleR

ConnectR

Page 35:

Code Portability Across Platforms

Azure VM Azure HDI, Spark Azure ML PowerBI Office 365 …

Linux Windows

Teradata, SQL Server

Hortonworks Cloudera MapR

In the Cloud

Workstations & Servers

EDW

Hadoop

DistributedR

ScaleR

ConnectR

DevelopR

Page 36:

SQL Server 2016: Everything Built-In

The above graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Microsoft. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Consistent experience from on-premises to cloud

In-memory across all workloads

TPC-H non-clustered 10TB: SQL Server is #1, #2, and #3; Oracle is #4

[Chart: values by year, 2010 to 2015, for SQL Server, Oracle, MySQL2, and SAP HANA]

TPC-H non-clustered results as of 04/06/15, 5/04/15, 4/15/14 and 11/25/13, respectively. http://www.tpc.org/tpch/results/tpch_perf_results.asp?resulttype=noncluster

at massive scale

National Institute of Standards and Technology Comprehensive Vulnerability Database update 5/4/2015

Self-service BI per user: Microsoft $120, Tableau $480, Oracle $2,230

Page 37:

In-Database Advanced Analytics

No need to move the data

Open source R with in-memory & massive scale – multi-threading & massive parallel processing

R built-in to SQL Server

Data Scientist: Interact directly with data

Data Developer/DBA: Manage data and analytics together

Example Solutions
• Sales forecasting
• Warehouse efficiency
• Predictive maintenance
• Credit risk protection

Extensibility: R Integration, New R scripts, Analytic Library, T-SQL interface, Relational data, Microsoft Azure Marketplace

Real-time operational analytics without moving the data

End-to-end mobile BI, Advanced Analytics, Mission critical OLTP

Page 38:
Page 39:
Page 40:

…beginning with conversation, people & bots

Active Directory Bot: Securely access people in a company (+ files, topics, data) from anywhere, via conversation

Conversation channels: SMS (Twilio), Skype Consumer, Groupme, Slack, …

Intelligence: Cognitive Services
• Image: Face, Age, Gender, Emotion
• Academic Knowledge
• Language Understanding

o365 apis

Page 41: