introducing azure sql data warehouse

27
Introducing Azure SQL Data Warehouse James Serra Big Data Evangelist Microsoft

Upload: james-serra

Post on 06-Aug-2015

465 views

Category:

Technology


5 download

TRANSCRIPT

Page 1: Introducing Azure SQL Data Warehouse

Introducing Azure SQL Data WarehouseJames SerraBig Data EvangelistMicrosoft

Page 2: Introducing Azure SQL Data Warehouse

Parallel Data Warehouse

v1

Data Allegro product on Windows &

SQL. First DW appliance by

MSFT in partnership

with Dell and HP

Microsoft Acquired

Data Allegro

Company viewed as

most efficient way to bring MPP

to SQL Server world

Analytics Platform

System (APS)

Introduction of Hadoop

region within appliance and new

naming to reflect

broader Big Data

capabilities

SQL DW Service

Introduction of Azure SQL DW Service based on APS’s MPP capabilities

Fast Track Data

Warehouse Launch

DW Reference

Architectures based on

SMP DW best practices

offered with leading H/W

Partners

Parallel Data Warehouse

v2

Re-architected

Product delivering new form

factors and greatly

improved price/perfor

mance.

Microsoft & Data Warehouse

2008 20132010 201520142011

Page 3: Introducing Azure SQL Data Warehouse

5

Data sources

OLTP ERP CRM LOB

ETL

Data warehouse

BI and analytics

Increasing data volumes

1

Non-relational data

Devices

Web Sensors

Social

New data sources and types

2

Cloud-born data

3

People intensive

44

The traditional data warehouse

Page 4: Introducing Azure SQL Data Warehouse

INFRASTRUCTURE

DATA MANAGEMENT & PROCESSING

DATA ENRICHMENT AND FEDERATED QUERY

BI & ANALYTICS

Self-service CollaborationCorporate PredictiveMobile

Extract, transform, load

Single query model Data quality Master data

management

Non-relationalRelational Analytical Streaming Internal & External

Data sources

OLTP ERP CRM LOB

Non-relational data

Devices

Web Sensors

Social

Modern data warehouse defined

Page 5: Introducing Azure SQL Data Warehouse

Parallelism

• Uses many separate CPUs running in parallel to execute a single program

• Shared Nothing: Each CPU has its own memory and disk (scale-out)

• Segments communicate using high-speed network between nodes

MPP - Massively Parallel

Processing

• Multiple CPUs used to complete individual processes simultaneously

• All CPUs share the same memory, disks, and network controllers (scale-up)

• All SQL Server implementations up until now have been SMP• Mostly, the solution is housed on a shared SAN

SMP - Symmetric

Multiprocessing

Page 6: Introducing Azure SQL Data Warehouse

SQL DW Logical Architecture (overview)

“Compute” nodeBalanced storage

SQL

“Compute” nodeBalanced storage

SQL

“Compute” nodeBalanced storage

SQL

“Compute” nodeBalanced storage

SQL

DMS

DMS

DMS

DMS

Compute Node – the “worker bee” of SQL DW• Runs Azure SQL Server DB• Contains a “slice” of each database• CPU is saturated by storage

Control Node – the “brains” of the SQL DW• Also runs Azure SQL Server DB• Holds a “shell” copy of each database

• Metadata, statistics, etc• The “public face” of the appliance

Data Movement Services (DMS)• Part of the “secret sauce” of SQL DW• Moves data around as needed• Enables parallel operations among the

compute nodes (queries, loads, etc)

“Control” nodeSQL

DMS

Page 7: Introducing Azure SQL Data Warehouse

SQL DW Logical Architecture (overview)

“Compute” nodeBalanced storage

SQL“Control” nodeSQL

“Compute” nodeBalanced storage

SQL

“Compute” nodeBalanced storage

SQL

“Compute” nodeBalanced storage

SQL

DMS

DMS

DMS

DMS

DMS

1) User connects to the appliance (control node) and submits query

2) Control node query processor determines best *parallel* query plan

3) DMS distributes sub-queries to each compute node

4) Each compute node executes query on its subset of data

5) Each compute node returns a subset of the response to the control node

6) If necessary, control node does any final aggregation/computation

7) Control node returns results to user

Queries running in parallel on a subset of the data, using separate pipes effectively making the pipe larger

Page 8: Introducing Azure SQL Data Warehouse

Introducing Azure SQL DW ServiceA relational data warehouse-as-a-service, fully managed by Microsoft. Industries first elastic cloud data warehouse with enterprise-grade capabilities.Support your smallest to your largest data storage needs while handling queries up to 100x faster.

Market Leading Price & Performance

Simple billing compute & storage

Pay for what you need, when you need it with dynamic pause

Bring DW to the Cloud without rewriting

Elastic scale & performance

Scales to petabytes of data

Massively Parallel Processing

Instant-on compute scales in seconds

Query Relational / Non-Relational

Get started in minutes

Integrated with Azure ML, PowerBI & ADF

Enterprise Ready

Powered by the Cloud

Page 9: Introducing Azure SQL Data Warehouse

Rapidly deploy and gain insights

Provision Load Query

SQL DW clusterin minutes

Ship DisksAzure StorageHDInsightMigration Tools

All data typesAnalytics withPower BI + ML

Automate workflow via Azure Data Factory

Page 10: Introducing Azure SQL Data Warehouse

Real-Time Elasticity

Resize in <1 Minute

On-Demand Compute

Spin up for heavy workloads, cycle down for daily activity.

Buy time to insight based on what you need, when you need it.

Choose the combo of compute and storage that meets your needs.

From Any Size to Any Size

Page 11: Introducing Azure SQL Data Warehouse

When Paused, Pay only for Storage

Use it only when you need it – no reloading / restoring of data

Save Costs with Dynamic Pause and Resume• When paused, cloud-scale storage is min

cost.• Policy-based (i.e. Nights/weekends)• Automate via PowerShell/REST API• Data remains in place

Page 12: Introducing Azure SQL Data Warehouse

SQL DW: Building on SQL DB Foundation

Elastic, Petabyte Scale DW Optimized

99.99% uptime SLA, Geo-restore

Azure Compliance (ISO, HIPAA, EU, etc.)

True SQL Server Experience;Existing Tools Just Work

SQL DW

SQL DB

Service Tiers

Page 13: Introducing Azure SQL Data Warehouse

Data Warehouse Unit (DWU)Simply buy the query performance you need, not just hardware

Quantified by workload objectives: how fast rows are scanned,

loaded, copied

Measure of Power

Transparency

First DW service to offer compute power on demand, independent

of storageOn Demand

Scan Rate 3.36M row/sec

Loading Rate 130K row/sec

Table Copy Rate

350K row/sec

100 DWU*

Scan 1B rows*

100 DWU = 297 sec

400 DWU = 74 sec

800 DWU = 37 sec

1,600 DWU = 19 sec

* Preliminary estimates; actual results may change

Page 14: Introducing Azure SQL Data Warehouse

What is Hadoop?

Microsoft Confidential

19

Distributed, scalable system on commodity HW

Composed of a few parts:

HDFS – Distributed file system

MapReduce – Programming model

Other tools: Hive, Pig, SQOOP, HCatalog, HBase, Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie, ZooKeeper, Flume, Storm

Main players are Hortonworks, Cloudera, MapR

WARNING: Hadoop, while ideal for processing huge volumes of data, is inadequate for analyzing that data in real time (companies do batch analytics instead)

Core Services

OPERATIONAL SERVICES

DATASERVICES

HDFS

SQOOP

FLUME

NFS

LOAD & EXTRACT

WebHDFS

OOZIE

AMBARI

YARN

MAP REDUCE

HIVE &HCATALOGPIG

HBASEFALCON

Hadoop Cluster

compute&

storage . . .

. . .

. .compute

&storage

.

.

Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware

Page 15: Introducing Azure SQL Data Warehouse

Query unstructured data via Polybase/T-SQL

• Enables query capabilities across common Hadoop distributions (HDP & Cloudera) and Hadoop file formats in Azure storage.

• Allows leveraging existing SQL skills and BI tools

• Supports multiple non-relational file formats

• Improved time-to-insights & simplified ETL

Polybase for querying & managing non-relational Hadoop and relational data

SQL DW Instance

Scale out compute

Hadoop VMs /Azure StoragePolyBa

se

Page 16: Introducing Azure SQL Data Warehouse

Use cases where PolyBase simplifies using Hadoop dataBringing islands of Hadoop data together

High performance queries against Hadoop data(Predicate pushdown)

Archiving data warehouse data to Hadoop (move)(Hadoop as cold storage)

Exporting relational data to Hadoop (copy)(Hadoop as backup, analysis, on-

prem use)Importing Hadoop data into data warehouse (copy)

(Hadoop as staging area, sandbox, Data Lake)

Page 17: Introducing Azure SQL Data Warehouse

Business Services Company Before/After resultsSMP vs APS

54ximprovement loading data(48 hours vs 53 minutes)

1.4 TB/hr loading time (7 billion rows) (1.21TB in 53:20)25x, 193x,

query performance improvement (4 days and 6 hours vs 32 minutes)

Microsoft BI tools work unchanged

Double DWU’s will double performance

9.4x compression (7 billion rows) (1.7TB to 179GB)

26

Page 18: Introducing Azure SQL Data Warehouse

Stream Analytics

TransformIngest

Example overall data flow and Architecture

Web logs

Present & decide

IoT, Mobile Devices etc.

Social Data

Event Hubs HDInsight

Azure Data Factory

Azure SQL DB

Azure Blob Storage

Azure Machine Learning

(Fraud detection etc.)

Power BI

Web dashboards

Mobile devices

DW / Long-term storage

Predictive analytics

Event & data producers

Azure SQL DW

Page 19: Introducing Azure SQL Data Warehouse

Easily bring your DW to the cloud

Import/ Export

Migration Accelerator

ExpressRoute

Seamless migration from a variety of on-premises and cloud sources

Quick, consistent, and stable loading for migration

Integrated migration tools and support for all workflows

Ship massive volumes of data on physical media.

Make your migration to Azure’s simple and fully managed

Transfer with private, expedited connections to Azure.

Page 20: Introducing Azure SQL Data Warehouse

Broad SQL Server PartnerEcosystem

+ Leverage Azure ML, HDInsight, PowerBI, ADF, and more.

+ Industry’s broadest ecosystem of DW partners, including Tableau, Informatica, Attunity, and SAP.

Streamlined deployment with Azure Portal.

Deep tool integration with top partners including:

• Single-click configuration

• Optimized data movement

• Logical pushdown

Azure SQL DW

Azure ML

Azure Event Hub

Azure Stream Analytics

Azure HDInsight

Power BI

Microsoft

Page 21: Introducing Azure SQL Data Warehouse

Market-Leading Price/Performance • Best On-Demand

Price/Performance

‐ Advantages in elasticity and pause to reduce customer cost

• SQL DW start small, can grow to PB+

• Pay for performance by scaling compute against storage

100GB 1TB 2TB

Azure SQL DW

Redshift

1+PB

SQL DW

High Pe

rf

SQL DW High Capacity

SQL DW Balanced

Perf

orm

ance

Redshift Dense Compute

Redshift Dense Storage

Page 22: Introducing Azure SQL Data Warehouse

How does SQL DW differ from Redshift?

Hour to days to resize; read-only with

perf degradationElasticity True grow, shrink, or pause with

minimal downtime.

Amazon Redshift Azure SQL DW

No

Pause/Resume

Yes!

Fixed compute/storage ratioSimplicityPay for the performance you need

with independently scalable compute

and storage.

Hybrid No. AWS only. Yes. Azure and on-premises.

No support for indexes, stored procs,

SQL UDFs, partitioning, constraints

Compatibility

True SQL Support.

Page 23: Introducing Azure SQL Data Warehouse

Automatic Backup and Geo-Restore

Auto backups, every 4 hours, in Azure Storage (“disaster recovery”) and geo-replicated (“high availability”)

On-demand backups in Azure Storage, user can enable geo-replication

REST API, PowerShell or Azure Portal

Scheduled exports for long-term retention

Near-Online backup/restore based on storage snapshots

Backups retention policy:

• Auto backups, up to 35 days

• On-demand backups retained indefinitely

Recover from data deletion or alteration or disaster

Geo- replicated

Restore from backup

SQL Data Warehouse Backupssabcp01bl21

Azure Storagesabcp01bl21

Page 24: Introducing Azure SQL Data Warehouse
Page 25: Introducing Azure SQL Data Warehouse

Timeline

Feb/Mar CY15Private Preview PolyBase for Big Data

integration Enterprise-Ready TSQL First Wave of SQL Partner

Certifications

Private Preview

Public Preview GA

Summer 2015Public Preview• Dynamic Pause and Resume• Integration with Azure Service

Platform (CloudML, ADF, HDInsight, SQL-IP)

• Integration with Broad Ecosystem of SQL Partners

• Geo-restore• Service/Appliance Hybrid

Late 2015/Early 2016Compliance• ISO-Certified, PCI• HIPAA, BAA, & EU Model

Clause

Service Level Agreement• 99.99% SLA

Microsoft Confidential—Preliminary Information. Dates and capabilities subject to change. Microsoft makes no warranties, express or implied.

Page 26: Introducing Azure SQL Data Warehouse

Summary: Azure SQL DW Service

A relational data warehouse-as-a-service, fully managed by Microsoft.

Industries first elastic cloud data warehouse with enterprise-grade capabilities.

Support your smallest to your largest data storage needs while handling queries up to 100x faster.

Elastic scale & performance

Powered by the Cloud

Market Leading Price &

Performance

Scales to petabytes of data

Massively Parallel Processing

Instant-on compute scales in seconds

Query Relational / Non-Relational

Get started in minutes

Integrated with Azure ML, PowerBI & ADF

Enterprise Ready

Simple billing compute & storage

Pay for what you need, when you need it with dynamic pause

Bring DW to the Cloud without rewriting

Page 27: Introducing Azure SQL Data Warehouse

Questions?

James [email protected]