azure sql dwh - meetupfiles.meetup.com/19858010/08_2016-azure sql dwh - data platform meetup.pdfwhat...

50
Azure SQL DWH Big data-as-a-service by

Upload: others

Post on 12-Jan-2020

18 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Azure SQL DWH

Big data-as-a-service by

Page 2: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

About me

4/4/2016 Azure SQL DWH

Shy Engelberg, CTO @

Email : [email protected]

Phone : 054-771-711-5

Twitter : @ShyEngelberg

2 |

Page 3: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Agenda

4/4/2016 Azure SQL DWH

SQL DWH introduction

Architecture

Creating a DWH

Loading data

Tools

Summary

3 |

Page 4: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Objectives

4/4/2016 Azure SQL DWH

Know what Azure SQL Data warehouse is

Know how Azure SQL Data warehouse works

Know how to create and connect to Azure

SQL Data warehouse

Know the basics tools and methods to get

started with developing

Identify scenarios that this solution might suit.

4 |

Page 5: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

The data warehouse fairytale

Once upon a time, data warehouse was an

appliance who required fixed combinations of

storage and compute,

often underutilizing

expensive resources.

Meaning

monstrous

hardware was

lying unused.

Page 6: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

What is Azure SQL Data warehouse

4/4/2016 Azure SQL DWH6 |

an enterprise-class, distributed database

capable of processing massive volumes of

relational and non-relational data.

It is the industry's first cloud data warehouse

that combines proven SQL capabilities with

the ability to grow, shrink, and pause in

seconds.

Now on GA! (July 14th)

Page 7: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

What is Azure SQL Data warehouse

4/4/2016 Azure SQL DWH7 |

an enterprise-class,

capable of of

and data.

It is the industry's first data warehouse

that combines proven with

the ability to in

seconds.

Page 8: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

What is Azure SQL Data warehouse

4/4/2016 Azure SQL DWH8 |

– Azure PaaS

– an MPP

– up to PBs

– a relational DB that can

query also non-relational data

– based on the product we know and

love

– use what you need, when

you need it.

Page 9: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

What is Azure SQL Data warehouse

4/4/2016 Azure SQL DWH9 |

Easily deploys in seconds.

Pay for query performance only when you

need it (or you can pause it completely)

Fully managed service, removes the hassle

of software patching, maintenance, back-ups.

Page 10: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

What is Azure SQL Data warehouse

4/4/2016 Azure SQL DWH10 |

SQL Data Warehouse uses Microsoft’s

massively parallel processing (MPP)

architecture. You pay for time-to-insight, not

hardware. (details are a few slides ahead)

Page 11: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

What is Azure SQL Data warehouse

4/4/2016 Azure SQL DWH11 |

Using PolyBase, leverage Transact-SQL to

query seamlessly across both relational data in

a relational database and non-relational data in

common Hadoop formats.

Page 12: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

What is Azure SQL Data warehouse

4/4/2016 Azure SQL DWH12 |

SQL Data Warehouse is based on the proven

relational database engine of SQL Server and

includes the features you expect, including

stored procedures, UDF’s, partitioning, indexes,

and collations.

If you already know Transact-SQL, its easy to

transfer your knowledge to SQL Data

Warehouse.

Page 13: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

What is Azure SQL Data warehouse

4/4/2016 Azure SQL DWH13 |

You can grow or shrink compute power in

minutes. Take full advantage of storage at

cloud scale, and apply query compute based

on changing performance needs.

When compute is paused, you pay only for

storage.

Page 14: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture

4/4/2016 Azure SQL DWH14 |

At its core, SQL Data Warehouse uses

Microsoft’s massive parallel processing

(MPP) architecture, originally designed to run

some of the largest on-premises enterprise

data warehouses.

Page 15: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture

4/4/2016 Azure SQL DWH15 |

At its core, SQL Data Warehouse uses

Microsoft’s

architecture, originally designed to run

some of the largest on-premises enterprise

data warehouses.

Page 16: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – MPP

The coordinated processing of a program by

multiple processors working on different parts

of the program.

Each processor has its own operating system

and memory.

Page 17: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

The MPP way

Mission-

process

a lot of

data

The SMP way

Page 18: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Scalefor better

performance

The SMP way The MPP way

Page 19: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – MPP

Page 20: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – MPP

Breaks a large queries across nodes for simultaneous processing.

Every node is “working” on a local subset of the data.

Capable of higher data ingestion rates through parallelization.

Scale horizontally by adding nodes, rather than moving to a server with more CPUs or higher storage capacity.

Unlike SMP – there is no single bottleneck.

Page 21: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – Azure SQL Data warehouse

SQL Data Warehouse independently scales

compute and storage.

This concept is what allows us the ability to

pause compute, scale performance in

seconds, and pay only for the performance

we need.

Page 22: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – Azure SQL Data warehouse

SQL Data Warehouse

.

This concept is what allows us the ability to

, and

we need.

Page 23: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – Azure SQL Data warehouse

Data management service

Control node (MPP engine)

Control DBs TempDBMaster

SQL Server

Azure blob storage

Data management service

Compute node 2

User Data

SQL

Server

Data management service

Compute node 1

User Data

SQL

Server

Page 24: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – Azure SQL Data warehouse

Data management service

Control node (MPP engine)

Control DBs TempDBMaster

SQL Server

• “Controls" the system.

• It is the front end that interacts with all

applications and connections.

• powered by SQL Database, and

connecting to it looks and feels the

same.

• Under the surface, the Control node

coordinates all of the data movement

and computation required to run parallel

queries on your distributed data.

• When you submit a TSQL query to SQL

Data Warehouse, the Control node

transforms it into separate queries that

will run on each Compute node in

parallel.

Page 25: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – Azure SQL Data warehouse

Data management service

Control node (MPP engine)

Control DBs TempDBMaster

SQL Server

Azure blob storage

Data management service

Compute node 2

User Data

SQL

Server

Data management service

Compute node 1

User Data

SQL

Server

Page 26: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – Azure SQL Data warehouse

User Data

SQL

Server

Data management service

Compute node 1

User Data

SQL

Server

SQL Databases which process your query steps and manage your data.

The Compute nodes are the workers that run the parallel queries on your data.

After processing, they pass the results back to the Control node.

To finish the

query, the

Control node

aggregates the

results and

returns the

final result.

Page 27: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – Azure SQL Data warehouse

Data management service

Control node (MPP engine)

Control DBs TempDBMaster

SQL Server

Azure blob storage

Data management service

Compute node 2

User Data

SQL

Server

Data management service

Compute node 1

User Data

SQL

Server

Page 28: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – Azure SQL Data warehouse

Data Movement Service (DMS)

is our technology for moving

data between the nodes.

DMS gives the Compute nodes

access to data they need for

joins and aggregations.

DMS is not an Azure service. It

is a Windows service that runs

alongside SQL Database on all

the nodes.

Data management service

Page 29: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – Azure SQL Data warehouse

Data management service

Control node (MPP engine)

Control DBs TempDBMaster

SQL Server

Azure blob storage

Data management service

Compute node 2

User Data

SQL

Server

Data management service

Compute node 1

User Data

SQL

Server

Page 30: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – Azure SQL Data warehouse

Azure blob storage Data is stored in Azure Storage Blobs.

When Compute nodes interact with data, they write and read directly to and from blob storage.

Since Azure storage expands transparently and limitlessly, SQL Data Warehouse can do the same.

Since compute and storage are independent, SQL Data Warehouse can automatically scale storage separately from scaling compute, and vice-versa. Azure Storage is fully fault tolerant.

Page 31: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – scaling

Since each compute node only works on a

subset of the data, if we want to scale, all we

need to do is add more compute nodes.

The “Magic” is that we can add more

compute nodes without moving

(redistributing) the data.

The scaling takes only a couple of minutes

(initializing the compute node)

Page 32: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – scaling

Changing the amount of

compute is as simple as

moving a slider to the

left or right, but can also be

scheduled using T-SQL or PShell.

Compute usage in SQL Data

Warehouse is measured

using SQL Data Warehouse Units (DWUs).

Page 33: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – data distribution

All tables are distributed.

Each distribution is like a bucket; storing a unique subset of the data.

For now, SQL DW has 60 distributions.

Each table is divided into 60 different distributions, from the moment it’s created.

When there’s only one compute node, it holds all distributions, when there’s more, the distributions are spread among them.

Page 34: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Architecture – finally

Bring all the data you want, pay only for the storage.

If you want to query your data (dahhh), pay only for the compute you need, when you need.

Classic MPP design, scaling is almost linear.

We don’t really need to know how many nodes or distributes are there under the cover – it’s a PaaS, we are guaranteed a certain amount of performance.

Page 35: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Creating a DWH

Creating is simple as 1,2,3…

DWH is defined in a “Server”

just like Azure SQL database.

Pause and scale are a button away.

DEMO

Page 36: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Creating a table – distribution

When a table is created, it is spread across

all of the distributions.

We need to choose how to distribute the

data:

Hash

Round-robin

(evenly but randomly,

default behavior)

Page 37: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Creating a table – type

The default behavior is that a table is created

with a clustered column store index.

(which makes Azure SQL DWH, a columnar

database)

Choose, what type of table during creation:

Page 38: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Creating a table – statistics

Statistics are not created automatically,

we have to create them ourselves!

Statistics are not updated automatically!

Page 39: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Connecting and creating a table

DEMO Add firewall rule

Connect using SSMS

Create a table

Create statistics

Page 40: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Loading data

SQL Data Warehouse supports many loading

methods:

SSIS

BCP

SQLBulkCopy API

Azure Data Factory

PolyBase

The “Push” methods – a query

that goes through the “control

node”, which becomes a

bottleneck. (single-client gated)

by far the fastest and most

scalable SQL Data Warehouse

loading method to date

Page 41: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Loading data – PolyBase

PolyBase is a scalable, query processing framework compatible with Transact-SQL that can be used to combine and bridge data across relational database management systems and Azure Blob Storage.

Currently PolyBase can load data from UTF-8encoded delimited text files as well as the popular Hadoop file formats RC File, ORC, and Parquet. PolyBase can load data from gzip, zlib and Snappy compressed files.

Page 42: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

A “Pull” method.

Every compute node,

has an HDFS bridge

in the DMS service.

Every bridge can

parallel connect to

external resources.

Loading data – PolyBase

Page 43: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

PolyBase data loading is not limited by the

Control node, and so as you scale out your

DWU, your data transfer throughput also

increases.

A recommended way of loading data:

1. write your source to CSV files

2. put the files in Azure Blob Storage

3. Load using PolyBase

Loading data – PolyBase

Page 44: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Loading data – PolyBase

DEMO Query using PolyBase

CTAS to load data using PolyBase

Page 45: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

This is SQL Server inside – you know it all.

Uses almost the same T-SQL.

Supports views, stored procedures, partitions

and many other known and loved features.

Built-in HADR (it’s a PaaS, remember?)

Out-of-the-box backup and restore service.

Azure SQL Data warehouse – features

Page 46: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

SSMS is not yet fully supported (but very

soon) – SSMS is supported (July 14th)!

Visual studio (SSDT) is supported

Integrates easily to Azure ML, PowerBI and

Data Factory.

Many 3rd party solutions are available:

Azure SQL Data warehouse – tools

Page 47: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Best used for data processing scenarios, not

as a data store all users can query, because:

Concurrency is very limited (like all MPPs)

Use to load data, process it, and move it to

the next step or for individual queries that

needs massive amount of processing.

Azure SQL Data warehouse – usage

Page 48: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Azure SQL Data warehouse – Summary

A relational columnar DWH.

MPP service that allows us to scale compute in

separate from storage.

We can pause the compute whenever needed.

Using the infinite power of the cloud, we can

process as much data as we want, and use as

much power as we want.

We don’t need to buy expensive hardware.

Page 49: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

Azure SQL Data warehouse – Summary

Best for:

Your data is already in Azure.

You need scheduled computing power for

data processing.

You have a lot of data, but don’t want to

spend a lot of money.

Not for concurrent querying!

Page 50: Azure SQL DWH - Meetupfiles.meetup.com/19858010/08_2016-Azure SQL DWH - data platform meetup.pdfWhat is Azure SQL Data warehouse 6 | 4/4/2016 Azure SQL DWH an enterprise-class, distributed

THANK YOU!

Questions?