sql server warehousing (fast track 4.0 & pdw) stephen strong, matthew winter dbi331
TRANSCRIPT
Agenda
Fast Track Data Warehouse SQL Server 2012The ProblemBalanced ArchitecturePerformance GainsLessons Learnt
SQL Server Parallel Data WarehouseWhat is Parallel Data WarehousePDW Architecture & ComponentsAU3.0 EnhancementsRoadmap
Q & A
Solution
Step 1: Buy new shiny SAN arrayStep 2: Buy big serverStep 3: Install SQL Server 2012Step 4: Bulk load 30TB DW with SSISStep 5: Grant access
Users
30TB
Why that “solution” doesn’t work with DW
Infrastructure CostLarge scans of huge fact tablesAudience Survey – SAN throughput
DATA WAREHOUSING OFFERINGSWHERE FAST TRACK FITS IN
Self Build Business Data
Warehouse
Fast Track Reference
Architectures
Parallel Data Warehouse Half Rack
Parallel Data Warehouse Full Rack
Effort to Build Variable Very low Moderate Very low Very low
Capacity Variable 5TB 14TB – 95TB 80TB 500TB +
Concurrency Variable Light Light – high Very high Very high
Query Complexity
Variable Medium Medium – high Very high Very high
Architecture SMP SMP SMP MPP MPP
Form factor Software only
Appliance Reference Architecture
Appliance Appliance
Fast Track DW Reference Architectures
Software:• SQL Server 2012 Enterprise• Windows Server 2008 R2
Hardware System Design:• Tight specifications for
servers, storage and networking
• Resource Balanced & Validated
• Latest generation Servers & Storage
Database Configuration:• Workload Specific• Database Architecture• SQL Server settings• Windows Server settings• Performance Guidance
Workload specific Software & System designs that help Customers
accelerate their SQL Server 2012 projects.
Key benefits
Still Windows Server & SQL Server for admins & mgmtCan select from a range of HW vendorsEngineered by expert from HW vendor and MicrosoftPre-tested for predictable resultsFaster implementationGreat performance/price ratioLow SQL Server license count
Under the covers – the “secret” sauce
Sequential ReadsParallel reads across multiple disk setsBalanced architectureDifficult to achieve in large server / share SAN environmentDatabase layout matches hardware layoutMatching throughput for each componentLUNs mapped to Storage Processors, network paths, CPUs
Server
Windows Server 2008 R2
SQL Server 2012
HBA
Fib
er
Sw
itch
HBA
Storage Enclosure
Storage Processor 3GB/sec
Storage Processor 3GB/sec
RAID10
RAID10
RAID10
RAID10
Storage Enclosure
Storage Processor 3GB/sec
Storage Processor 3GB/sec
RAID10
RAID10
RAID10
RAID10
Storage Enclosure
Storage Processor 3GB/sec
Storage Processor 3GB/sec
RAID10
RAID10
RAID10
RAID10
CPU
CPU
Storage Enclosure
Storage Processor 3GB/sec
Storage Processor 3GB/sec
RAID10
RAID10
RAID10
RAID10
HBA
HBA
400MB/sec each
1.6GB/secper port
1.6GB/sec each6.4GB/sec total
6.4GB/sec totalPCIe 3.0
2 socket x8 core16-cores total5.8GB/sec total
Performance Enhancements over Fast Track 3.0
Take DL380 FT 3.0 hardware – 20TB solutionSQL Server 2008 R2 – 2.4GB/sec benchmark queriesSQL Server 2012 – 5GB/sec with ColumnStore IndexesNo hardware changesNo Solid State Drives10x – 100x query improvement with ColumnStore
SQL Server Configuration
Trace FlagsDB file placementTempdb: 8-12 files maxIndex light – optimise for sequential scan throughputCapacity is based on Page and ColumnStore compressionSupport for AlwaysOn Failover Clustering
Fast Track Integration
Fast TrackSQL Server
2012
ETL(SSIS)
Source Systems
Source Systems
Source Systems
Cubes(SSAS)
Excel
SSRS
SP2010
PowerView
3rd Party
Lessons Learnt
Use SQLIO for validationAim for consistent MB/sec above published #sTake care with DB file placementOn 30TB+ configuration pay attention to SP-LUN mappingsSimulate FT LUN layout in Dev environmentTake advantage of ColumnStore for extra performanceSet MAXDOP to number of physical coresUse Resource Governor to limit memory/query to 10-15%
Agenda
Fast Track Data Warehouse SQL Server 2012The ProblemBalanced ArchitecturePerformance GainsLessons Learnt
SQL Server Parallel Data WarehouseWhat is Parallel Data WarehousePDW Architecture & ComponentsAU3.0 EnhancementsRoadmap
Q & A
What is Parallel Data WarehouseScalable• Scales from 10’s to 100’s of TB of Data• Implemented as Massively Parallel Processing (MPP)
System
Standards Based• Leverages Commodity Hardware• Speaks SQL Server Language (T-SQL)
Flexible• Offers Hardware of Choice (HP/Dell)• Supports Multiple Hardware Architectures
Cost Effective• Low Price/TB Ratio
SQL Server vs. PDWSQL Server• Symmetric Multiprocessing
(SMP) System• ‘Shared Everything’
Architecture• High Contention for System
Resources (e.g. I/O, CPU)• Extremely High Concurrency
for Some Workloads• Limited Scalability (up to
95TB)• Requires Building High
Availability• Rich Functionality for OLTP
and Data Warehousing
Parallel Data Warehouse• Massively Parallel Processing (MPP)
System• ‘Shared Nothing’ Architecture• Balanced Resource Utilization (e.g.
I/O, CPU, Memory)• Higher Concurrency for Very
Complex Workloads• Excellent Scalability (up to 1PB)• Built-in High Availability and
Redundancy• Tuned for Data Warehousing
Control Rack Data Rack
PDW ArchitectureCompute Nodes Storage Nodes
Spare Compute Node
Du
al
Fib
er
Ch
an
nel
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
Du
al
Infi
nib
an
d
Control Nodes
Active / Passive
Landing Zone
Backup Node
SQL
Management Servers
Client Drivers
ETL Load Interface
Corporate Backup Solution
Support / Patching
SQL
SQL
Control Node and Client DriversClient Connections Always Go Through The Control Node
Contains No Persistent User Data
PDW “Secret Sauce”• Processes SQL requests• Prepares execution plan• Orchestrates distributed execution
Local SQL Server• Final Query Plan Processing• Result Aggregation
Drivers • TDS Protocol (SQL Native 10)• Wire Protocol (SeQuel Link)• ODBC, OLE-DB, ADO.NET and JDBC Client Drivers for both 32-bit and 64-bit
Management Node
Runs the Windows Domain Controller (Active Directory)
Used for deploying patches to all nodes in the appliance
Holds images in case a node needs reimaging
Landing ZoneProvides High-Capacity Storage for Data Files from ETL processes
Integration Services available on the Landing Zone
Connected to Internal Network
Available as Sandbox for other Applications and Scripts that Run on Internal Network
Source Landing Zone Files Data Loader Compute
Nodes
DWLOADERor
SSIS
Backup Node
Coordinated Backup across Nodes
Database-Level Backup• Full or Differential• Metadata Backup
Can Restore to a Larger Appliance
Optional Item – 1 Size per Configuration• Up to 654TB of Capacity• Orderable in Different Capacities
Compute Node
Each MPP Node is a Highly Tuned SMP Node with Standard Interfaces
Dedicated Hardware, Database, and Storage
Running SQL Server 2008 Enterprise Edition
SQL as Primary Interface
Compute Node
Enterprise-ClassDBMS
TempDBWorkspace
Dual Multi-CoreProcessors
DUAL 4Gb FC Dual InfiniBand
CPU
CPU
RAM
Compute Node Hardware
Pre-Configured for Each SQL Server Instance on Each Compute Node.• Drives Configured as RAID 1 to Avoid Appliance Failover for a Single-Drive Failure• Configured with 3 LUNs (i.e. 3 RAID 1 Pairs)
TempDB Used for the Following Purposes:• Sort-Work Area for Data Loading into Clustered Index Tables• Spill Area for Hash Joins Not Fitting into Memory• Temporary PDW Tables
Storage Node
Dual 8 GB FiberStorage Processor
Dual 8 GB FiberStorage Processor
Data/Log Drives (RAID 1) - LFF
Hot Spare
Dual Fiber Channel Controllers• Active/Active• Provides Fault Tolerance
Storage Node – Drive Array (User Databases)
LUN 9
LUN 9
LUN 9
LUN 9
LUN 9
LUN1 LUN2 LUN3 LUN4 LUN5 LUN6 LUN7 LUN8
Log
User
Data
base
Small Form Factor Drives (SFF) (24) – 8 RAID 1 groups containing a single LUNLarge Form Factor Drives (LFF) (11) – 4 RAID 1 groups containing 2 LUNs each
LUN 9
LUN1 LUN2 LUN3 LUN4 LUN5 LUN6 LUN7 LUN8
Hot Spare
Hot Spare(SFF)
SFF – 6 disk in a RAID 1 + 0 groupLFF – 2 disk in a RAID 1 group
SFF - 2 Hot SparesLFF – 1 Hot Spare
Storage Node – Physical File Layout
LUN1 LUN2 LUN3 LUN4 LUN5 LUN6 LUN7 LUN8
User
Data
base
UDB Replicated FG
UDBRepl.ndf
FG UDB_Dist_A
UDBDist.ndf
UDBRepl.ndf UDBRepl.ndf UDBRepl.ndf UDBRepl.ndf UDBRepl.ndf UDBRepl.ndf UDBRepl.ndf
FG UDB_Dist_B
UDBDist.ndf
FG UDB_Dist_C
UDBDist.ndf
FG UDB_Dist_D
UDBDist.ndf
FG UDB_Dist_E
UDBDist.ndf
FG UDB_Dist_F
UDBDist.ndf
FG UDB_Dist_G
UDBDist.ndf
FG UDB_Dist_H
UDBDist.ndf
8 Distribution Filegroups
1 Replicated Filegroup with 8 Files
1 Log Filegroup
LUN 9
LUN 9
LUN 9
LUN 9
LUN 9
Log
LUN 9
PDW Benefits – Massive Parallel Processing
Control Rack Data Rack
Compute Nodes Storage Nodes
Spare Compute Node
Du
al
Fib
er
Ch
an
nel
SQL
SQL
SQL
SQL
SQL
SQL
SQL
SQL
Du
al
Infi
nib
an
d
Control Nodes
Active / Passive
Landing Zone
Backup Node
SQL
Management Node
SQL
SQL
Query 1
Query 1 is Submitted to SQL Server via the Control Node.
?????????
Query is Executed on All 10 Nodes.
Results are Sent Back to the Client.
?
Customer SuccessesData Volume • 80 TB data warehouse analyzing data from
exchanges• Existing system based on SQL SMP farm • 2 different clusters of 6 servers each
Requirement • Linear scalability with additional hardware• Support hourly loads with SSIS – 300GB/day• BI Integration: SSRS, SSAS and PowerPivot
AU3 Feedback• SP and increased T-SQL support was great• Migrating SMP SSRS to PDW was painless• 142x for scan heavy queries & no summary tables• Enabled queries that do not run on existing
system
Reports
Dashboards
Scorecards
Portal
ETL
PDWOperationa
l DB’s
CUSTOMER EXAMPLE:Stock Exchange in the US
Customer SuccessesData Volume • Currently 11 TB data warehouse analyzing data from
transactional and clickstream sources (2 years data)• Business need to expand to 7 year data window, i.e. ~39
TB.
Requirement • Scalability - growing data volume does not affect
performance• Performance and ad-hoc analysis for interactive
querying by users• BI Integration with Microsoft BI stack - SSAS and SSRS
AU3 Feedback• SSAS cubes worked ‘out-of-box’• Performance an order of magnitude faster than existing
system (~30x on an expanded data set)
CUSTOMER EXAMPLE:Major Retailer in the
US
PDW
Nielsen OLTP system
SSAS
Click-Stream
Demo
Matthew WinterArchitectPDW & Appliance CoE
Parallel Data WarehouseLoading DataBreaking the Marketing Numbers
What’s New in Appliance Update 3.0 (AU3)
Best-in-class Manageability
10X Performance
More BI Tools,Any Language
Low Entry Cost
New in AU3: Breakthrough Performance
Support More Users with High Concurrency
Breakthrough Performance
Faster Query Performance Drives Greater Productivity
Faster Throughput for Mixed Workloads
Improvement in Data Movement in the appliance
6X
10X
Reduced cost of Upgrading from Scale Up SQL Server to MPP
New in PDW AU3: Best-in-Class Manageability
Best-in-Class Manageability
Reduced operational cost through unified management of appliances and other SQL instances
PDW Integration with
Cost
Order From Your Preferred Vendor
New in PDW AU3: Low Entry Cost
Low Entry Cost
New Lower Cost Half Rack Solution
HP
DELL
Use PDW on Data in Any Language
New in PDW AU3: More BI Tools, Any Language
More BI Tools,Any Language
Greater Support for Non-Microsoft BI & ETL Tools
International Data Support
SASHadoopInformaticaMicrostrategySAP Business Objects
In Review
Session Objectives • Scaling Microsoft SQL Server to manage the trend to analyse and report
on increasing data volumes.
Key Takeaways• The Fast Track Reference Architecture provides a cost-effective
approach to accelerate your data warehouse strategy. Providing a guide to building a reliable, scalable solution that will help you reduce your risk and save time.
• Parallel Data Warehouse is a preconfigured appliance based upon the proven SQL Server database technology. PDW is capable of scaling out as your data volumes grow, reducing the maintenance overhead to a minimum and provides the raw performance an Enterprise Data Warehouse requires.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to
be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS
PRESENTATION.