user group bi

Franck Sidi Lead SQL Server & Bi – Microsoft Israel

SQL Server Parallel DWH Architecture “Aka : Madison”

Trusted, Scalable Platform Our scalability strategy

“Madison” in 2010 Q1

Agenda

Concepts and Principles

Reference Architectures “FastTrack”

Madison functional overview

Early adoption

Symmetric Multiprocessing

Single DB instance

“Shared Everything” Architecture

Server/CPU’s share

memory

disks

Can lead to resource contention as you scale

SMP

Massively Parallel Processing

Server/CPU’s have their own dedicated resources

“Shared Nothing” Architecture

“Secret Sauce” is parallelizing operations

Lightning-fast Queries, Data Loads and Updates

Linear Scalability

Problem needs to be partitionable

MPP

SMP vs MPP

SMP

HW advancements increasing ability to scale-up

Scaling is limited

High end SMP very expensive

Extremely high concurrency for some workloads

Less than 1-2 TB of data SMP will almost always be better

Full SQL Server functionality

HA must be architected in

MPP

HW advancements increasing ability to scale-up & scale-out

Scaling to 1 PB+

Scale out is relatively low cost

Relatively high concurrency for complex workloads

> 2 TB up to 1 PB

Limited SQL Server functionality

HA is built in

Agenda




Early adoption

How some solve the problem today

Big SAN

Biggest 64-core Server

Connected together!

What’s wrong with

this picture???

System out of balance

This server can consume 16 GB/Sec of IO, but the SAN can only deliver 2 GB/Sec

Even when the SAN is dedicated to the SQL Data Warehouse, which it often isn’t

Lots of disks for Random IOPS BUT

Limited controllers Limited IO bandwidth

System is typically IO bound and queries are slow

Despite significant investment in both Server and Storage

Where Does an I/O Go?

Understand potential throughput of the hardware

Each component in the path has associated speed/bandwidth

Know where the potential bottlenecks exist

Cach

e

Fron

t End

Po

rts

Co

ntro

llers/Pro

cessors

Switch

Host

Switch

PCI Bus HBA Fiber Channel Ports Array Processors Disks

Potential Performance Bottlenecks

FC

HBA

A

B

FC

HBA

A

B FC

SW

ITC

H

STORAGE

CONTROLLER A

B

A

B CA

CH

E

SE

RV

ER

CA

CH

E

SQ

L S

ER

VE

R

WIN

DO

WS

CP

U C

OR

ES

CPU Feed Rate HBA Port Rate Switch Port Rate SP Port Rate

A

B

DISK DISK

LUN

DISK DISK

LUN

SQL Server

Read Ahead Rate

LUN Read Rate Disk Feed Rate

The alternative: A balanced system

Design a server + storage configuration that can deliver all the IO bandwidth that CPUs can consume when executing a SQL Relational DW workload

Avoid sharing storage devices among servers

Avoid overinvesting in disk drives Focus on scan performance, not IOPS

Layout and manage data to maximize range scan performance and minimize fragmentation

Sequential I/O

Sequential I/O

Ideal for data warehousing

Large reads & writes

Scans on large data stores are usually read with sequential read patterns and not random read patterns

Scalable, predictable performance

Requires 1/3 or fewer drives for same performance

Random I/O

Ideal for OLTP

Small reads and writes

OLTP usually random-read centric Seek queries are a goal in OLTP query optimization

Seeks usually cause random reads

Not as predictable & scalable for data warehousing

Requires large number of drives

All databases contain both scans and seeks among with other types of reads and writes, DW workload indicate that the vast majority of reads are sequential – not all

What is Fast Track Data Warehouse?

A method for designing a cost-effective, balanced system for Data Warehouse workloads Reference hardware configurations developed in conjunction with hardware partners using this method Best practices for data layout, loading and management

Relational Database Only – Not SSAS, IS, RS

Fast Track Scope

Analysis Services

Cubes

PerformancePoint

Reporting Services

Web Analytic Tools

Dedicated SAN,

Storage Array

SharePoint Services

Microsoft Office SharePoint

Data Warehouse

Data Staging,

Bulk Loading

Presentation Layer Systems

Reference Architecture Scope (dashed)

Excel Services

Pre

sen

tati

on

Data

P

resen

tati

on

Data

Lower TCO Minimizes risk of overspending on un-balanced hardware configurations

Commodity Hardware

Choice HW platform

Implementation vendor

Reduced Risk Validated by Microsoft

Encapsulates best practices

Known performance & scalability

Benefits of Fast Track appliance model

Fast Track DW Reference Configurations

Server CPU CPU

Cores SAN Data Drive Count

Initial

Capacity*

Max

Capacity**

HP Proliant

DL 385 G6

(2) AMD Opteron Istanbul

six core 2.6 GHz

12 (3) HP MSA2312fc (24) 300GB 15k

RPM SAS

6TB 12TB

HP Proliant

DL 585 G6

(4) AMD Opteron Instanbul

six core 2.6 GHz

24 (6) HP MSA2312fc (48) 300GB 15k SAS 12TB 24TB

HP Proliant

DL 785 G6

(8) AMD Opteron Istanbul

six core 2.8 GHz

48 (12) HP MSA2312 (96) 300GB 15k SAS 24TB 48TB

Dell PowerEdge R710 (2) Intel Xeon Nehalem

quad core 2.66 GHz

8 (2) EMC AX4 (16) 300GB 15k FC 4TB 8TB

Dell Power Edge R900 (4) Intel Xeon Dunnington

six core 2.67GHz

24 (6) EMC AX4 (48) 300GB 15k FC 12TB 24TB

IBM X3650 M2 (2) Intel Xeon Nehalem

quad core 2.67 GHx

8 (2) IBM DS3400 (16) 200GB 15K FC 4TB 8TB

IBM X3850 M2 (4) Intel Xeon Dunnington

six core 2.67 GHz

24 (6) IBM DS3400 (24) 300GB 15k FC 12TB 24TB

IBM X3950 M2 (8) Intel Xeon Nehalem four

core 2.13 GHz

32 (8) IBM DS3400 (32) 300GB 15k SAS 16TB 32TB

Bull Novascale R460

E2

(2) Intel Xeon Nehalem

quad core 2.66 GHz

8 (2) EMC AX4 (16) 300GB 15k FC 4TB 8TB

Bull Novascale R480

E1

(4) Intel Xeon Dunnington

six core 2.67GHz

24 (6) EMC AX4 (48) 300GB 15k FC 12TB 24TB

* Core-balanced compressed capacity based on 300GB 15k SAS not including hot spares and log drives. Assumes 25% (of raw disk space) allocated for Temp DB. ** Represents storage array fully populated with 300GB15k SAS and use of 2.5:1 compression ratio. This includes the addition of one storage expansion tray per enclosure. 30% of this storage should be reserved for DBA operations

SMP Server

per 4-Cores

Per MSA2312 Drive Details • Each MSA can hold 12 drives, this configuration requires 11 • MSA is 2U in total (capacitor eliminates need for battery) • Each MSA SP port controls 4 LUNs • Each pair of LUNs consists of (2) 300GB 15k SAS drives RAID1

Each SP rated at 500MB/s or 1000MB/s for both SP’s

Using 300GB 15k SAS drives each LUN rated at 125MB/s each SP controls 4 LUN’s at 500MB/s or 1000MB/s per MSA DAE

Each SP port rated at 4Gb/s or 400MB/s and 1600MB/s for all 4 SP ports.

Each HBA port rated at 4Gb/s or 400MB/s and 1600MB/s for all 4 SP ports.

SWIT

CH

SP A

SP B

03 04

RAID GP02

LUN3

LUN4

01 02

RAID GP01

LUN1

LUN2

05 06

RAID GP03

LUN5

LUN6

07 08

RAID GP04

LUN7

LUN8

09 10

RAID GP05

LUN0 (Logs)

HS ONLY 8

data

disks !!!

Fast Track DWCore-Balanced Architecture

Fast Track Data Warehouse Components

Software:

• SQL Server 2008 Enterprise

• Windows Server 2008

Hardware:

• Tight specifications for servers, storage and networking

• ‘Per core’ building block

Configuration guidelines:

• Physical table structures

• Indexes

• Compression

• SQL Server settings

• Windows Server settings

• Loading

RA: Tightly Spec'd

RAs include not only hardware but best practices in:

Window OS configuration

SQL Server startup options

Database physical layout

Table types

Indexing

Statistics

Managing fragmentation

Loading procedures

Fast Track Case Study - Results

Teradata SQL Server Fast Track DW

Comparison

Loading – Subject Area 1

5:10:21 total time 51:31 total time R SQL Server 6x faster

Loading – Subject Area 2

4:36:08 total time 1:50.01 total time R SQL Server 2.5x faster

Query times – Subject Area 1

3:03 avg query time (using 9 benchmark queries)


R SQL Server 12x faster

Query times – Subject Area 2



R SQL Server 7x faster

Agenda




Early adoption

About DATAllegro…

Industry

Standard

Networking

Proprietary Appliance

Management and

MPP Database

Industry

Standard

Storage

Open Source

Database and OS

Technology Partners

Industry

Standard

Servers

http://ingres.com/index.html

http://www.linux.org/

Integration Plans Provide scale out through MPP on SQL Server and Windows

Offer ‘Appliance like’ user experience to Data Warehouse customers

Lower TCO to high end Data Warehousing

Offer integrated BI platform to small and very large Enterprises

OPEN SOURCE DATABASE & OS

Industry Standard Servers

Industry Standard Networking

Industry Standard Storage

http://ingres.com/index.html

http://www.linux.org/

MPP Additional Considerations

Principles & approach of SMP carry forward

Deeper level of complexity – High Availability

Parallelization

Inter node data movement

Modular building blocks Balanced CPU and storage

Both SMP and MPP are based on building blocks that scale by the CPU core

Adds network, storage processing and disk bandwidth for each core

Based on maximizing & sustaining true sequential I/O while minimizing disks

Generally changes balance of systems so more can be spent on CPU and SW than on storage to give better overall performance for a given budget

Building blocks can be adjusted for multiple MPP configurations – high performance, archive and extreme performance

The future of SQL Server Data Warehousing – Project "Madison"

Predictable Scale out through MPP

Customers with over 400 TB data warehouses

Commodity Hardware

Lower cost

Frequent performance improvements

Easier upgrade and maintenance

Higher customer comfort

Better compatibility

Ultra Shared Nothing

An extension of traditional shared nothing design

Push shared nothing architecture into SMP node

IO and CPU affinity within SMP nodes Eliminate contention per user query

Use full resources for each user query

Multiple physical instances of tables

Distribute large tables

Replicate small tables

Distribute AND Replicate medium tables

Re-Distribute rows “on-the-fly” when necessary

Madison Server Components

Control

Compute

Storage

Landing Zone

Backup

Management

Failover/Spare

Spare Database Server

Du

al Fib

er

Ch

an

nel

Database Servers

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

Du

al I

nfi

nib

an

d

Control Nodes

Active / Passive

SQL

System Architecture Database Servers

Du

al I

nfi

nib

an

d

Control Nodes

Active / Passive

Spare Database Server

Du

al Fib

er

Ch

an

nel

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL Client Drivers

ETL Load Interface

Corporate Backup

Solution

Data Center

Monitoring

Corporate Network Private Network

8Gbs Fiber Channel Local San

20Gbs Infiniband DMS Backbone

IPoIB Dedicated LAN

Software Architecture

SQL Server

DW Authentication

DW Configuration

DW Queue

DW Schema

Compute Nodes Compute Nodes

Compute Nodes

Landing Zone

Backup Node

Management Node

Built by DWPU Existing MS software 3rd Party

SQL Server

DMS

User Data

IIS

Nexus Query Tool

JDBC OLE-DB ODBC

Ado.Net

Admin Console

MS BI (AS, RS)

DMS

DMS

Loader Client

SQL SSIS

HPC AD

Madison Service

DSQL Core Engine

Services DMS

Manager

SQL OS

DMS

SQL OS

Control Node & Client Drivers Client connections always go through the control node

Clustered to a passive node

Processes SQL requests

Prepares execution plan

Orchestrates distributed execution

Local SQL Server to do final query plan processing / result aggregation

Will use same set of drivers used by DATAllegro Provided by DataDirect

ODBC, OLE-DB, JDBC and Ado.Net client drivers

Wire protocol (SeQuel Link)

Available drivers for 32 and 64 bits

Compute Nodes

A SQL Server 2008 instance

DB engine nodes autonomous on local data

SQL as primary interface

Each MPP node is a highly tuned SMP node with standard interfaces

Landing Zone Provides high capacity storage for data files from ETL processes

Integration services available on the landing zone

Connected to internal network

Available as sandbox for other applications and scripts that run on internal network.

Source Landing

Zone Files Data

Loader Compute

Nodes

Backup Node

Builds on SQL Server native backup/restore facility

Use VDI interface to plug into backup pipeline

Database-level backup

Coordinated backup across the nodes

Quiesce write activity to synchronize

Can only restore to another appliance with exactly the same number of distributions

Data Distribution & Replication

Compute Nodes Storage Nodes Control Node

Spare Node

Tables Are Hash

Distributed Or

Replicated

Landing Zone Node

Text

File Text

File Text

File Text

File

Date Dim D_DATE_SK D_DATE_ID D_DATE D_MONTH …

Item I_ITEM_SK I_ITEM_ID I_REC_START_DATE I_ITEM_DESC …

Store Sales Ss_sold_date_sk Ss_item_sk Ss_customer_sk Ss_cdemo_sk Ss_store_sk Ss_promo_sk Ss_quantity …

Promotion P_PROMO_SK P_PROMO_ID P_START_DATE_SK P_END_DATE_SK …

Store S_STORE_SK S_STORE_ID S_REC_START_DATE S_REC_END_DATE S_STORE_NAME …

Customer C-CUSTOMER_SK C_CUSTOMER_ID C_CURRENT_ADDR …

Customer Demographics

CD_DEMO_SK CD_GENDER CD_MARITAL_STATUS CD_EDUCATION …

Database Distributed & Replicated Tables

C I

D

CD

S

P

C I

D

CD

S

P C I

D

CD

S

P

C I

D

CD

S

P

C I

D

CD

S

P

C I

D

CD

S

P

C I

D

CD

S

P

C I

D

CD

S

P

SS

SS

SS

SS

SS

SS

SS

SS

Physical Storage Configuration – Single Node LUN 8 LUN 2 LUN 3 LUN 1

Log LUN

UserDB Log TempDB Log StageDB Log

Local Drive 1 Local Drive 2 Local Drive 3 Local Drive 4 Local Drive 5 Local Drive 6

Tem

pD

B

TempDB1.mdf TempDB2.ndf TempDB3.ndf TempDB4.ndf TempDB5.ndf TempDB6.ndf

User

Data

ba

se(s

)

FG Dist B FG Dist C FG Dist H FG Dist A

DistData1.mdf

DistData2.ndf

DistData3.ndf

DistData4.ndf

DistData5.ndf

DistData6.ndf

DistData7.ndf

DistData8.ndf

Replicated FG

ReplData1.mdf

ReplData2.ndf

ReplData3.ndf

ReplData4.ndf

ReplData5.ndf

ReplData6.ndf

ReplData7.ndf

ReplData8.ndf

Sta

gin

g

Data

ba

se

FG Stage B FG Stage C FG Stage H

Replicated FG

FG Stage A

StageData1.mdf

StageData2.ndf

ReplData1.mdf

ReplData2.ndf

ReplData3.ndf

ReplData4.ndf

ReplData5.ndf

ReplData6.ndf

ReplData7.ndf

ReplData.ndf

StageData3.ndf

StageData4.ndf

StageData5.ndf

StageData6.ndf

StageData1.ndf

StageData2.ndf

Create Table – Behind the Scenes

Microsoft Confidential

Create Table store_sales

with

distribute_on (ss_item_sk)

partition_on(ss_sold_date_sk)

cluster_on (ss_sold_date_sk)

Create Table mad_store_sales_a Create Table mad_store_sales_ … Create Table mad_store_sales_h

8K 8K

8K 8K

8K

8 Filegroups

1 Table per FG

Distribution_a

thru

Distribution_h

12 Partitions

(ss_sold_date_sk)

N-number of

Pages

Tuple

Multiple levels of redundancy:

• Leveraging MSCS for node availability

• Cluster aware services:

• SQL Server, Madison, DMS

• Leveraging MSCS for SQL Services, DMS

• 1 spare node for every 8* compute nodes

High Availability

8x1

Security and Encryption Retain DA v3 design

Authentication and authorization done by Madison server Users and Roles as first class principals Nested role capabilities Connection to SQL back-ends through high privilege account SQL nodes reside on private network

No support for integrated auth Leverages TDE to expose DB-level encryption

Supports key rotation

The Logical Data Model

Multiple databases per appliance Each user database maps to one SQL Server db per node

Tables Replicated, Distributed, Replicated + Distributed

Leverage SQL Server compression

Supports Partitioning

Supports secondary indexes

Views

Data Types Most scalar data types supported by SQL Server 2008 are supported by Madison

Main exceptions Character and binary strings limited to 8K (i.e. no BLOB support)

XML

Sql-Variant

System and CLR UDTs

Latin1_General with binary comparison only

SQL Server Data Types DAv3 Madison

bigint P P

binary

bit P

char / nchar P P

date, time P

datetime (was date in DA) P P

datetime2 P

datetimeoffset P

decimal P P

float P P

geometry / geography

hierarchyid

Int (was integer in DA) P P

money P

real P

smalldatetime P

smallint P P

smallmoney P

sql_variant

text / ntext / image

timestamp

tinyint P P

varchar / nvarchar / varbinary P P

v*(max)

uniqueidentifier

xml

Supported SQL Syntax

Aligned with ANSI SQL 92 Basic INSERT, UPDATE, DELETE, SELECT

CREATE TABLE AS SELECT

Limited analytical function support

Teradata extensions Quantile, Sample,…

Configuration and Monitoring

Madison services instrumented Logs and Performance Counters

Capture and forward SNMP alerts from devices within the appliance

Small subset of DMVs to union underlying node DMVs

Leverage HPC for monitoring

Challenge: Is it an appliance or a collection of nodes?

Manageability

Web-based main administrative user interface Based on DATAllegro manageability UI

Monitoring system health and activity

Leveraging HPC pack 2008 Systems management

Monitoring

Cluster health

Query Tools GUI Tool:

Nexus (CoffingDW)

Table & view object explorer

Interactive query execution

Command line tool: Replacement for DA-SQL

Flavor of SqlCmd

MS BI Integration Integration Services

Madison enabled as a source Data movement, lookup operations, etc.

Will add a new SSIS destination Ensure integrated high performance loads

Reporting Services

Fully supported; including parameterized queries Will customize experience for report builder and report designer

Analysis Services

Will get connectivity through OLE-DB provider Will enable both MOLAP and ROLAP storage

High Level Release Definitions

“Madison” (aka v1)

Focus on time to market Compatibility with DATAllegro v3 MS BI integration H1 2010

Closer functional alignment with SQL Server Better integration with SQL and MS ecosystem, tools and technologies

V2+ Will start

running MTPs in the

summer

Recap

Data Warehousing Reference Architectures available today!

SQL Server Fast Track

SQL Server “Madison” Built for advanced, large scale data warehouses Shared-nothing MPP architecture

Early evaluation programs starting soon All feedback welcome:

[email protected]

Thank you!