
Optimizing Your BI Semantic Model for Performance and Scale

Dave Wickert (AE7TD), dwickert@microsoft.com
Principal Program Manager, SQL Server Business Intelligence, Microsoft

My radios

Main rig: ICOM 7000, 100W HF into a 33’ Buddipole
QRP rig (on order): Elecraft KX3, 10W HF into a 17M 28’ PAR EndFedz wire thrown up into a tree
SDR ‘toy’: Funcube Dongle
D-STAR digital radio in my office: DV Dongle

Session Objectives

You will understand:
The architecture of Analysis Services in tabular mode
Optimizing processing performance
The query processing architecture

Takeaway

Factors to think through for capacity planning

BI Semantic Model: Architecture

BI Semantic Model (architecture diagram):

Data model: multidimensional and tabular
Business logic and queries: MDX and DAX
Data access: ROLAP, MOLAP, xVelocity (VertiPaq), DirectQuery
Clients: Reporting Services, Excel, PowerPivot, SharePoint Insights, third-party applications
Data sources: databases, LOB applications, files, OData feeds, cloud services

VertiPaq Design Principles

1. Performance, Performance, Performance
2. Query Performance >> Processing Performance
3. Accommodate changes without forcing reload, if possible

Encoding

Value encoding: arithmetic mapping for value <-> dataID. Great for dense value distributions; allows computation directly on dataIDs.

Hash encoding: hash table mapping for value <-> dataID. Great for sparse value distributions; requires decompression for computation.

Encoding is chosen per column; compression is applied per segment.

Encoding Example

A Sales column containing the values 1 and 2,000,000,000: hash encoding maps them to the dataIDs 1 and 2, while value encoding would have to cover the entire numeric range between them.

The encoding is defined automatically to conserve space and is discoverable through DMVs.
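As a rough sketch of the two encoding schemes above (the function names are invented for this example and are not VertiPaq internals):

```python
def value_encode(values):
    """Value encoding: dataID derived arithmetically (here: value - min).
    Works well for dense distributions; arithmetic on dataIDs is possible."""
    base = min(values)
    return base, [v - base for v in values]

def hash_encode(values):
    """Hash encoding: a dictionary maps each distinct value to a small dataID.
    Works well for sparse distributions; computation needs a lookup back."""
    dictionary = {}
    ids = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        ids.append(dictionary[v])
    return dictionary, ids

# The sparse Sales column from the example: hash encoding keeps dataIDs tiny,
# while value encoding must span the full numeric range.
sales = [1, 2_000_000_000, 1, 1]
dictionary, ids = hash_encode(sales)
print(ids)          # [0, 1, 0, 0]
base, vids = value_encode(sales)
print(max(vids))    # 1999999999
```

This is why the engine prefers value encoding only when the distribution is dense enough for the dataID range to stay small.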

VertiPaq Storage

(Diagram: tables T1(C1,C2,C3) and T2(C1,C2), each split into partitions; per column there is a dictionary, column segments, and a hierarchy; a calculated column CC1; and a relationship structure linking the tables.)

Table data is stored in segments and dictionaries, per column. Calculated columns are stored like regular columns. Hierarchies can provide quicker access for querying.

Relationship structures are created to accelerate lookups across tables and to remove hard coupling between tables.

Partitions are a group of segments, intended for data management. Any table can have partitions, defined independently of other tables. Partitions apply only to segment data.
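The per-column layout — one dictionary plus a stream of dataIDs cut into fixed-size segments — can be sketched like this (toy structures, not the real storage format):

```python
SEGMENT_ROWS = 4  # toy segment size; the real default is 8M rows

def store_column(values, segment_rows=SEGMENT_ROWS):
    """Store a column as (dictionary, list of segments of dataIDs)."""
    dictionary = {}            # one dictionary per column
    data_ids = []
    for v in values:
        data_ids.append(dictionary.setdefault(v, len(dictionary)))
    # the dataID stream is split into fixed-size segments
    segments = [data_ids[i:i + segment_rows]
                for i in range(0, len(data_ids), segment_rows)]
    return dictionary, segments

colors = ["red", "blue", "red", "red", "blue", "green"]
dictionary, segments = store_column(colors)
print(len(segments))   # 2 segments for 6 rows
print(dictionary)      # {'red': 0, 'blue': 1, 'green': 2}
```

Note the asymmetry this implies: segments belong to a partition and can be dropped with it, but the dictionary is column-wide — which is why deleting partitions can leave outdated dictionary entries (see ProcessDefrag below).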

demo

Inspecting VertiPaq Storage

Processing Architecture

What’s going on? (Network, memory, CPU)

Hands On

Processing Phases

Steady state (pipelined): read and encode data for segment N; then compress segment N while reading and encoding segment N+1; compress segment N+1; finally build calculated columns, hierarchies, and relationships.

Special case at the start: reading “stretches” until 2 × segment_size rows, so the first segment can be twice as large; this optimizes for smaller lookup tables. If more data follows, the buffer is split, segment 1 is compressed, segment 3 is read and encoded while segment 2 is compressed, and then segment 3 is compressed.
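The first-segment “stretch” described above can be sketched as follows (the function name is invented; this only models the segmenting decision, not the pipeline):

```python
def segmentize(rows, segment_size):
    """Cut rows into segments, letting the first segment stretch to
    2 * segment_size so small lookup tables stay in a single segment."""
    rows = list(rows)
    first = rows[:2 * segment_size]
    rest = rows[2 * segment_size:]
    if not rest:
        return [first]                 # table fits: one (double-width) segment
    # more data follows: split the stretched buffer into segments 1 and 2,
    # then continue with regular-sized segments
    segments = [first[:segment_size], first[segment_size:]]
    for i in range(0, len(rest), segment_size):
        segments.append(rest[i:i + segment_size])
    return segments

print(len(segmentize(range(10), segment_size=8)))   # 1 (10 rows fit in 2*8)
print(len(segmentize(range(20), segment_size=8)))   # 3 (8 + 8 + 4 rows)
```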

Processing: Memory & CPU Usage

(Diagram: network, memory, and CPU load over time for the sequence above: read and encode segments 1+2, split, compress segment 1, read and encode segment 3 while compressing segment 2, compress segment 3, then build calculated columns, hierarchies, and relationships.)

Controlling Segment Size and Compression

DefaultSegmentRowCount: number of rows in a segment
0 (default): 8M for Analysis Services, 1M for PowerPivot
Value must be a power of 2 and should be at least 1M
Larger => generally better compression, faster queries with lower overhead
Smaller => smaller working set during processing

ProcessingTimeboxSecPerMRow: time budget for compressing each million rows
-1 (default): 10 sec
Smaller is often acceptable: the greedy algorithm gives most of its gains in the beginning
Larger => almost always better compression and higher query performance
Increase for a large number of columns (roughly more than 200)
Reported in profiler and in DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
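The deck doesn’t spell out the compression algorithm the timebox governs. As a loose illustration of why spending more time can buy better compression, here is a run-length counter: time spent reorganizing a segment’s rows (sorting stands in here for a smarter search) yields fewer, longer runs for a run-length-style encoder.

```python
def rle_runs(values):
    """Number of runs a run-length encoder would store for this sequence."""
    runs = 1
    for prev, cur in zip(values, values[1:]):
        if cur != prev:
            runs += 1
    return runs

segment = ["US", "DE", "US", "DE", "US", "DE"]
print(rle_runs(segment))          # 6 runs: compresses poorly
print(rle_runs(sorted(segment)))  # 2 runs: compresses well
```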

Processing options

(Diagram: a table T1 with its data, calculated column CC1, hierarchies, and relationships; Data loads the data, Recalc rebuilds the dependent structures.)

Full: Data and Recalc
Default: Data, and Recalc if needed
Defrag: defragment the dictionaries for a table
Clear: remove data
Add: creates a new partition and merges

Incremental Processing

Typical:
Create new partitions for new data
ProcessData on new partitions, or to reload into existing tables
ProcessRecalc to rebuild calculated columns, hierarchies, and relationships

Avoid:
Multiple ProcessFull operations: each causes a Recalc, unless they run in a single transactional batch

Advanced Processing

Parallel processing: use a single-transaction batch to process tables in parallel. A single table’s partitions are not processed in parallel in SQL Server 2012.

Use ProcessDefrag periodically: deleting partitions may leave outdated values in dictionaries.

Merge partitions to reduce metadata overhead.

Error handling: RI violations are assigned a blank value; no error is raised during processing.

Server Memory Map

VertiPaqPagingMode enables use of system paging; dictionaries need to remain in memory.

(Diagram: server memory holds DB1, DB2, the formula engine, and VertiPaq caches; DB1~ is the new, not-yet-committed part of the database. VertiPaqMemoryLimit defaults to 60% and TotalMemoryLimit to 80%.)

demo

Inspecting Processing Sequence

Query Processing

Querying

Two formula engines: MDX, DAX

The FE calls into VertiPaq to retrieve data.
Query logic is pushed down to VertiPaq where possible.
A VertiPaq query executes in parallel, one core per segment.
Execution is optimized for the compressed format.
VertiPaq-level caching absorbs chatty FE/VertiPaq communication.
Events are visible in profiler.
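The “one core per segment” scan pattern can be sketched as a filter-plus-partial-aggregate evaluated per segment and then combined (a minimal sketch, not the actual engine; a thread pool stands in for per-core workers):

```python
from concurrent.futures import ThreadPoolExecutor

def scan_segment(segment):
    """Scan one segment independently: filter + partial aggregate."""
    return sum(v for v in segment if v > 0)

def parallel_scan(segments):
    """Run one worker per segment and combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        return sum(pool.map(scan_segment, segments))

segments = [[1, -2, 3], [4, 5, -6], [7, -8, 9]]
print(parallel_scan(segments))   # 29
```

Because segments never share state during the scan, the combine step is a simple merge of partial aggregates — which is what lets throughput scale with core count.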

VertiPaq Query Performance

Scans at 5B+ rows/s over 1B+ rows of data
Scans at 20B+ rows/s over 10B+ rows of data

demo

Inspecting Query Evaluation

Rich & Fast

DAX/MDX formula engine: rich; single-threaded per query; designed for expressivity.
VertiPaq query engine: simple; one core per segment; optimized for speed.

More is pushed down to VertiPaq over time.

Why raw speed counts: VertiPaq performance, performance, performance.

Amdahl’s law establishes the maximum expected improvement to an overall system when only part of the system is improved: the speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.
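In formula form, with parallel fraction p and n cores, the speedup is 1 / ((1 - p) + p / n):

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup with parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelized, 8 cores give less than 6x:
print(round(amdahl_speedup(0.95, 8), 2))          # 5.93
# and the ceiling as n grows is 1 / (1 - p) = 20x:
print(round(amdahl_speedup(0.95, 1_000_000), 1))  # 20.0
```

This is the point of the slide: the single-threaded formula engine is the sequential fraction, so making the VertiPaq scan itself extremely fast is what keeps overall query times low.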

Session Objectives and Takeaways

Session Objectives:
Understand the architecture of Analysis Services in tabular mode
Understand and optimize processing performance
Understand the query processing architecture

Takeaways:
Factors to think through for capacity planning

Related Content

DBI61-HOL: Developing a Tabular BISM Using SQL Server Data Tools

Microsoft SQL Server: Breakthrough Insights - Credible, Consistent Data

Workbook shown in talk, available here:
http://www.powerpivotblog.nl/what-is-using-all-that-memory-on-my-analysis-server-instance/bismservermemoryreport
http://www.powerpivotblog.nl/wp-content/uploads/2012/02/BISMServerMemoryReport.xlsx
http://www.powerpivotblog.nl/?s=BISMServerMemory

Resources

Connect. Share. Discuss.

http://europe.msteched.com

Learning

Microsoft Certification & Training Resources

www.microsoft.com/learning

TechNet

Resources for IT Professionals

http://microsoft.com/technet

Resources for Developers

http://microsoft.com/msdn

Evaluations

http://europe.msteched.com/sessions

Submit your evals online

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Best Practices for Tabular Models

Workspace DB for a data-driven design experience:
BI Dev Studio works against a Workspace DB hosted on Analysis Services (Process Tables/Partitions); the project is then deployed as a DB to Analysis Services.

Options for working with large datasets:
1. Point to a smaller/empty DB
2. Filter large tables to reduce rows
