introducing azure sql data warehouse
Post on 06-Aug-2015
465 Views
Preview:
TRANSCRIPT
Introducing Azure SQL Data WarehouseJames SerraBig Data EvangelistMicrosoft
Parallel Data Warehouse
v1
Data Allegro product on Windows &
SQL. First DW appliance by
MSFT in partnership
with Dell and HP
Microsoft Acquired
Data Allegro
Company viewed as
most efficient way to bring MPP
to SQL Server world
Analytics Platform
System (APS)
Introduction of Hadoop
region within appliance and new
naming to reflect
broader Big Data
capabilities
SQL DW Service
Introduction of Azure SQL DW Service based on APS’s MPP capabilities
Fast Track Data
Warehouse Launch
DW Reference
Architectures based on
SMP DW best practices
offered with leading H/W
Partners
Parallel Data Warehouse
v2
Re-architected
Product delivering new form
factors and greatly
improved price/perfor
mance.
Microsoft & Data Warehouse
2008 20132010 201520142011
5
Data sources
OLTP ERP CRM LOB
ETL
Data warehouse
BI and analytics
Increasing data volumes
1
Non-relational data
Devices
Web Sensors
Social
New data sources and types
2
Cloud-born data
3
People intensive
44
The traditional data warehouse
INFRASTRUCTURE
DATA MANAGEMENT & PROCESSING
DATA ENRICHMENT AND FEDERATED QUERY
BI & ANALYTICS
Self-service CollaborationCorporate PredictiveMobile
Extract, transform, load
Single query model Data quality Master data
management
Non-relationalRelational Analytical Streaming Internal & External
Data sources
OLTP ERP CRM LOB
Non-relational data
Devices
Web Sensors
Social
Modern data warehouse defined
Parallelism
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using high-speed network between nodes
MPP - Massively Parallel
Processing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• All SQL Server implementations up until now have been SMP• Mostly, the solution is housed on a shared SAN
SMP - Symmetric
Multiprocessing
SQL DW Logical Architecture (overview)
“Compute” nodeBalanced storage
SQL
“Compute” nodeBalanced storage
SQL
“Compute” nodeBalanced storage
SQL
“Compute” nodeBalanced storage
SQL
DMS
DMS
DMS
DMS
Compute Node – the “worker bee” of SQL DW• Runs Azure SQL Server DB• Contains a “slice” of each database• CPU is saturated by storage
Control Node – the “brains” of the SQL DW• Also runs Azure SQL Server DB• Holds a “shell” copy of each database
• Metadata, statistics, etc• The “public face” of the appliance
Data Movement Services (DMS)• Part of the “secret sauce” of SQL DW• Moves data around as needed• Enables parallel operations among the
compute nodes (queries, loads, etc)
“Control” nodeSQL
DMS
SQL DW Logical Architecture (overview)
“Compute” nodeBalanced storage
SQL“Control” nodeSQL
“Compute” nodeBalanced storage
SQL
“Compute” nodeBalanced storage
SQL
“Compute” nodeBalanced storage
SQL
DMS
DMS
DMS
DMS
DMS
1) User connects to the appliance (control node) and submits query
2) Control node query processor determines best *parallel* query plan
3) DMS distributes sub-queries to each compute node
4) Each compute node executes query on its subset of data
5) Each compute node returns a subset of the response to the control node
6) If necessary, control node does any final aggregation/computation
7) Control node returns results to user
Queries running in parallel on a subset of the data, using separate pipes effectively making the pipe larger
Introducing Azure SQL DW ServiceA relational data warehouse-as-a-service, fully managed by Microsoft. Industries first elastic cloud data warehouse with enterprise-grade capabilities.Support your smallest to your largest data storage needs while handling queries up to 100x faster.
Market Leading Price & Performance
Simple billing compute & storage
Pay for what you need, when you need it with dynamic pause
Bring DW to the Cloud without rewriting
Elastic scale & performance
Scales to petabytes of data
Massively Parallel Processing
Instant-on compute scales in seconds
Query Relational / Non-Relational
Get started in minutes
Integrated with Azure ML, PowerBI & ADF
Enterprise Ready
Powered by the Cloud
Rapidly deploy and gain insights
Provision Load Query
SQL DW clusterin minutes
Ship DisksAzure StorageHDInsightMigration Tools
All data typesAnalytics withPower BI + ML
Automate workflow via Azure Data Factory
Real-Time Elasticity
Resize in <1 Minute
On-Demand Compute
Spin up for heavy workloads, cycle down for daily activity.
Buy time to insight based on what you need, when you need it.
Choose the combo of compute and storage that meets your needs.
From Any Size to Any Size
When Paused, Pay only for Storage
Use it only when you need it – no reloading / restoring of data
Save Costs with Dynamic Pause and Resume• When paused, cloud-scale storage is min
cost.• Policy-based (i.e. Nights/weekends)• Automate via PowerShell/REST API• Data remains in place
SQL DW: Building on SQL DB Foundation
Elastic, Petabyte Scale DW Optimized
99.99% uptime SLA, Geo-restore
Azure Compliance (ISO, HIPAA, EU, etc.)
True SQL Server Experience;Existing Tools Just Work
SQL DW
SQL DB
Service Tiers
Data Warehouse Unit (DWU)Simply buy the query performance you need, not just hardware
Quantified by workload objectives: how fast rows are scanned,
loaded, copied
Measure of Power
Transparency
First DW service to offer compute power on demand, independent
of storageOn Demand
Scan Rate 3.36M row/sec
Loading Rate 130K row/sec
Table Copy Rate
350K row/sec
100 DWU*
Scan 1B rows*
100 DWU = 297 sec
400 DWU = 74 sec
800 DWU = 37 sec
1,600 DWU = 19 sec
* Preliminary estimates; actual results may change
What is Hadoop?
Microsoft Confidential
19
Distributed, scalable system on commodity HW
Composed of a few parts:
HDFS – Distributed file system
MapReduce – Programming model
Other tools: Hive, Pig, SQOOP, HCatalog, HBase, Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie, ZooKeeper, Flume, Storm
Main players are Hortonworks, Cloudera, MapR
WARNING: Hadoop, while ideal for processing huge volumes of data, is inadequate for analyzing that data in real time (companies do batch analytics instead)
Core Services
OPERATIONAL SERVICES
DATASERVICES
HDFS
SQOOP
FLUME
NFS
LOAD & EXTRACT
WebHDFS
OOZIE
AMBARI
YARN
MAP REDUCE
HIVE &HCATALOGPIG
HBASEFALCON
Hadoop Cluster
compute&
storage . . .
. . .
. .compute
&storage
.
.
Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
Query unstructured data via Polybase/T-SQL
• Enables query capabilities across common Hadoop distributions (HDP & Cloudera) and Hadoop file formats in Azure storage.
• Allows leveraging existing SQL skills and BI tools
• Supports multiple non-relational file formats
• Improved time-to-insights & simplified ETL
Polybase for querying & managing non-relational Hadoop and relational data
SQL DW Instance
Scale out compute
Hadoop VMs /Azure StoragePolyBa
se
Use cases where PolyBase simplifies using Hadoop dataBringing islands of Hadoop data together
High performance queries against Hadoop data(Predicate pushdown)
Archiving data warehouse data to Hadoop (move)(Hadoop as cold storage)
Exporting relational data to Hadoop (copy)(Hadoop as backup, analysis, on-
prem use)Importing Hadoop data into data warehouse (copy)
(Hadoop as staging area, sandbox, Data Lake)
Business Services Company Before/After resultsSMP vs APS
54ximprovement loading data(48 hours vs 53 minutes)
1.4 TB/hr loading time (7 billion rows) (1.21TB in 53:20)25x, 193x,
query performance improvement (4 days and 6 hours vs 32 minutes)
Microsoft BI tools work unchanged
Double DWU’s will double performance
9.4x compression (7 billion rows) (1.7TB to 179GB)
26
Stream Analytics
TransformIngest
Example overall data flow and Architecture
Web logs
Present & decide
IoT, Mobile Devices etc.
Social Data
Event Hubs HDInsight
Azure Data Factory
Azure SQL DB
Azure Blob Storage
Azure Machine Learning
(Fraud detection etc.)
Power BI
Web dashboards
Mobile devices
DW / Long-term storage
Predictive analytics
Event & data producers
Azure SQL DW
Easily bring your DW to the cloud
Import/ Export
Migration Accelerator
ExpressRoute
Seamless migration from a variety of on-premises and cloud sources
Quick, consistent, and stable loading for migration
Integrated migration tools and support for all workflows
Ship massive volumes of data on physical media.
Make your migration to Azure’s simple and fully managed
Transfer with private, expedited connections to Azure.
Broad SQL Server PartnerEcosystem
+ Leverage Azure ML, HDInsight, PowerBI, ADF, and more.
+ Industry’s broadest ecosystem of DW partners, including Tableau, Informatica, Attunity, and SAP.
Streamlined deployment with Azure Portal.
Deep tool integration with top partners including:
• Single-click configuration
• Optimized data movement
• Logical pushdown
Azure SQL DW
Azure ML
Azure Event Hub
Azure Stream Analytics
Azure HDInsight
Power BI
Microsoft
Market-Leading Price/Performance • Best On-Demand
Price/Performance
‐ Advantages in elasticity and pause to reduce customer cost
• SQL DW start small, can grow to PB+
• Pay for performance by scaling compute against storage
100GB 1TB 2TB
Azure SQL DW
Redshift
1+PB
SQL DW
High Pe
rf
SQL DW High Capacity
SQL DW Balanced
Perf
orm
ance
Redshift Dense Compute
Redshift Dense Storage
How does SQL DW differ from Redshift?
Hour to days to resize; read-only with
perf degradationElasticity True grow, shrink, or pause with
minimal downtime.
Amazon Redshift Azure SQL DW
No
Pause/Resume
Yes!
Fixed compute/storage ratioSimplicityPay for the performance you need
with independently scalable compute
and storage.
Hybrid No. AWS only. Yes. Azure and on-premises.
No support for indexes, stored procs,
SQL UDFs, partitioning, constraints
Compatibility
True SQL Support.
Automatic Backup and Geo-Restore
Auto backups, every 4 hours, in Azure Storage (“disaster recovery”) and geo-replicated (“high availability”)
On-demand backups in Azure Storage, user can enable geo-replication
REST API, PowerShell or Azure Portal
Scheduled exports for long-term retention
Near-Online backup/restore based on storage snapshots
Backups retention policy:
• Auto backups, up to 35 days
• On-demand backups retained indefinitely
Recover from data deletion or alteration or disaster
Geo- replicated
Restore from backup
SQL Data Warehouse Backupssabcp01bl21
Azure Storagesabcp01bl21
Timeline
Feb/Mar CY15Private Preview PolyBase for Big Data
integration Enterprise-Ready TSQL First Wave of SQL Partner
Certifications
Private Preview
Public Preview GA
Summer 2015Public Preview• Dynamic Pause and Resume• Integration with Azure Service
Platform (CloudML, ADF, HDInsight, SQL-IP)
• Integration with Broad Ecosystem of SQL Partners
• Geo-restore• Service/Appliance Hybrid
Late 2015/Early 2016Compliance• ISO-Certified, PCI• HIPAA, BAA, & EU Model
Clause
Service Level Agreement• 99.99% SLA
Microsoft Confidential—Preliminary Information. Dates and capabilities subject to change. Microsoft makes no warranties, express or implied.
Summary: Azure SQL DW Service
A relational data warehouse-as-a-service, fully managed by Microsoft.
Industries first elastic cloud data warehouse with enterprise-grade capabilities.
Support your smallest to your largest data storage needs while handling queries up to 100x faster.
Elastic scale & performance
Powered by the Cloud
Market Leading Price &
Performance
Scales to petabytes of data
Massively Parallel Processing
Instant-on compute scales in seconds
Query Relational / Non-Relational
Get started in minutes
Integrated with Azure ML, PowerBI & ADF
Enterprise Ready
Simple billing compute & storage
Pay for what you need, when you need it with dynamic pause
Bring DW to the Cloud without rewriting
Questions?
James Serrajserra@microsoft.com
top related