
Optimizing Your BI Semantic Model for Performance and Scale

Dave Wickert (AE7TD), dwickert@microsoft.com
Principal Program Manager, SQL Server Business Intelligence, Microsoft

My radios

Main rig: ICOM 7000, 100W HF into a 33’ Buddipole
QRP rig (on order): Elecraft KX3, 10W HF into a 17M 28’ PAR EndFedz wire thrown up into a tree
SDR ‘toy’: Funcube Dongle
D-STAR digital radio in my office: DV Dongle

Session Objectives

You will understand:
The architecture of Analysis Services in tabular mode
Optimizing processing performance
The query processing architecture

Takeaway

Factors to think through for capacity planning

BI Semantic Model: Architecture

BI Semantic Model (architecture diagram):

Data model: multidimensional and tabular
Business logic and queries: MDX and DAX
Data access: ROLAP, MOLAP, xVelocity (VertiPaq), DirectQuery
Clients: Reporting Services, Excel, PowerPivot, SharePoint Insights, third-party applications
Data sources: databases, LOB applications, files, OData feeds, cloud services

VertiPaq Design Principles

1. Performance, Performance, Performance
2. Query Performance >> Processing Performance
3. Accommodate changes without forcing reload, if possible

Encoding

Value encoding: arithmetic mapping for value <-> dataID. Great for dense value distributions; allows computation directly on dataIDs.

Hash encoding: hash table mapping for value <-> dataID. Great for sparse value distributions; requires decompression for computation.

Encoding is chosen per column; compression is applied per segment.

Encoding Example

A Sales column containing the values 1 and 2,000,000,000: hash encoding maps them to the dataIDs 1 and 2, while value encoding would have to cover the entire numeric range between them.

The encoding is defined automatically to conserve space and is discoverable through DMVs.
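As a rough sketch of the two encoding schemes above (the function names are invented for this example and are not VertiPaq internals):

```python
def value_encode(values):
    """Value encoding: dataID derived arithmetically (here: value - min).
    Works well for dense distributions; arithmetic on dataIDs is possible."""
    base = min(values)
    return base, [v - base for v in values]

def hash_encode(values):
    """Hash encoding: a dictionary maps each distinct value to a small dataID.
    Works well for sparse distributions; computation needs a lookup back."""
    dictionary = {}
    ids = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        ids.append(dictionary[v])
    return dictionary, ids

# The sparse Sales column from the example: hash encoding keeps dataIDs tiny,
# while value encoding must span the full numeric range.
sales = [1, 2_000_000_000, 1, 1]
dictionary, ids = hash_encode(sales)
print(ids)          # [0, 1, 0, 0]
base, vids = value_encode(sales)
print(max(vids))    # 1999999999
```

This is why the engine prefers value encoding only when the distribution is dense enough for the dataID range to stay small.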

VertiPaq Storage

(Diagram: tables T1(C1,C2,C3) and T2(C1,C2), each split into partitions; per column there is a dictionary, column segments, and a hierarchy; a calculated column CC1; and a relationship structure linking the tables.)

Table data is stored in segments and dictionaries, per column. Calculated columns are stored like regular columns. Hierarchies can provide quicker access for querying.

Relationship structures are created to accelerate lookups across tables and to remove hard coupling between tables.

Partitions are a group of segments, intended for data management. Any table can have partitions, defined independently of other tables. Partitions apply only to segment data.
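The per-column layout — one dictionary plus a stream of dataIDs cut into fixed-size segments — can be sketched like this (toy structures, not the real storage format):

```python
SEGMENT_ROWS = 4  # toy segment size; the real default is 8M rows

def store_column(values, segment_rows=SEGMENT_ROWS):
    """Store a column as (dictionary, list of segments of dataIDs)."""
    dictionary = {}            # one dictionary per column
    data_ids = []
    for v in values:
        data_ids.append(dictionary.setdefault(v, len(dictionary)))
    # the dataID stream is split into fixed-size segments
    segments = [data_ids[i:i + segment_rows]
                for i in range(0, len(data_ids), segment_rows)]
    return dictionary, segments

colors = ["red", "blue", "red", "red", "blue", "green"]
dictionary, segments = store_column(colors)
print(len(segments))   # 2 segments for 6 rows
print(dictionary)      # {'red': 0, 'blue': 1, 'green': 2}
```

Note the asymmetry this implies: segments belong to a partition and can be dropped with it, but the dictionary is column-wide — which is why deleting partitions can leave outdated dictionary entries (see ProcessDefrag below).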

demo

Inspecting VertiPaq Storage

Processing Architecture

What’s going on? (Network, memory, CPU)

Hands On

Processing Phases

Steady state (pipelined): read and encode data for segment N; then compress segment N while reading and encoding segment N+1; compress segment N+1; finally build calculated columns, hierarchies, and relationships.

Special case at the start: reading “stretches” until 2 × segment_size rows, so the first segment can be twice as large; this optimizes for smaller lookup tables. If more data follows, the buffer is split, segment 1 is compressed, segment 3 is read and encoded while segment 2 is compressed, and then segment 3 is compressed.
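The first-segment “stretch” described above can be sketched as follows (the function name is invented; this only models the segmenting decision, not the pipeline):

```python
def segmentize(rows, segment_size):
    """Cut rows into segments, letting the first segment stretch to
    2 * segment_size so small lookup tables stay in a single segment."""
    rows = list(rows)
    first = rows[:2 * segment_size]
    rest = rows[2 * segment_size:]
    if not rest:
        return [first]                 # table fits: one (double-width) segment
    # more data follows: split the stretched buffer into segments 1 and 2,
    # then continue with regular-sized segments
    segments = [first[:segment_size], first[segment_size:]]
    for i in range(0, len(rest), segment_size):
        segments.append(rest[i:i + segment_size])
    return segments

print(len(segmentize(range(10), segment_size=8)))   # 1 (10 rows fit in 2*8)
print(len(segmentize(range(20), segment_size=8)))   # 3 (8 + 8 + 4 rows)
```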

Processing: Memory & CPU Usage

(Diagram: network, memory, and CPU load over time for the sequence above: read and encode segments 1+2, split, compress segment 1, read and encode segment 3 while compressing segment 2, compress segment 3, then build calculated columns, hierarchies, and relationships.)

Controlling Segment Size and Compression

DefaultSegmentRowCount: number of rows in a segment
0 (default): 8M for Analysis Services, 1M for PowerPivot
Value must be a power of 2 and should be at least 1M
Larger => generally better compression, faster queries with lower overhead
Smaller => smaller working set during processing

ProcessingTimeboxSecPerMRow: time budget for compressing each million rows
-1 (default): 10 sec
Smaller is often acceptable: the greedy algorithm gives most of its gains in the beginning
Larger => almost always better compression and higher query performance
Increase for a large number of columns (roughly more than 200)
Reported in profiler and in DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
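The deck doesn’t spell out the compression algorithm the timebox governs. As a loose illustration of why spending more time can buy better compression, here is a run-length counter: time spent reorganizing a segment’s rows (sorting stands in here for a smarter search) yields fewer, longer runs for a run-length-style encoder.

```python
def rle_runs(values):
    """Number of runs a run-length encoder would store for this sequence."""
    runs = 1
    for prev, cur in zip(values, values[1:]):
        if cur != prev:
            runs += 1
    return runs

segment = ["US", "DE", "US", "DE", "US", "DE"]
print(rle_runs(segment))          # 6 runs: compresses poorly
print(rle_runs(sorted(segment)))  # 2 runs: compresses well
```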

Processing options

(Diagram: a table T1 with its data, calculated column CC1, hierarchies, and relationships; Data loads the data, Recalc rebuilds the dependent structures.)

Full: Data and Recalc
Default: Data, and Recalc if needed
Defrag: defragment the dictionaries for a table
Clear: remove data
Add: creates a new partition and merges

Incremental Processing

Typical:
Create new partitions for new data
ProcessData on new partitions, or to reload into existing tables
ProcessRecalc to rebuild calculated columns, hierarchies, and relationships

Avoid:
Multiple ProcessFull operations: each causes a Recalc, unless they run in a single transactional batch

Advanced Processing

Parallel processing: use a single-transaction batch to process tables in parallel. A single table’s partitions are not processed in parallel in SQL Server 2012.

Use ProcessDefrag periodically: deleting partitions may leave outdated values in dictionaries.

Merge partitions to reduce metadata overhead.

Error handling: RI violations are assigned a blank value; no error is raised during processing.

Server Memory Map

VertiPaqPagingMode enables use of system paging; dictionaries need to remain in memory.

(Diagram: server memory holds DB1, DB2, the formula engine, and VertiPaq caches; DB1~ is the new, not-yet-committed part of the database. VertiPaqMemoryLimit defaults to 60% and TotalMemoryLimit to 80%.)

demo

Inspecting Processing Sequence

Query Processing

Querying

Two formula engines: MDX, DAX

The FE calls into VertiPaq to retrieve data.
Query logic is pushed down to VertiPaq where possible.
A VertiPaq query executes in parallel, one core per segment.
Execution is optimized for the compressed format.
VertiPaq-level caching absorbs chatty FE/VertiPaq communication.
Events are visible in profiler.
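The “one core per segment” scan pattern can be sketched as a filter-plus-partial-aggregate evaluated per segment and then combined (a minimal sketch, not the actual engine; a thread pool stands in for per-core workers):

```python
from concurrent.futures import ThreadPoolExecutor

def scan_segment(segment):
    """Scan one segment independently: filter + partial aggregate."""
    return sum(v for v in segment if v > 0)

def parallel_scan(segments):
    """Run one worker per segment and combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        return sum(pool.map(scan_segment, segments))

segments = [[1, -2, 3], [4, 5, -6], [7, -8, 9]]
print(parallel_scan(segments))   # 29
```

Because segments never share state during the scan, the combine step is a simple merge of partial aggregates — which is what lets throughput scale with core count.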

VertiPaq Query Performance

Scans at 5B+ rows/s over 1B+ rows of data
Scans at 20B+ rows/s over 10B+ rows of data

demo

Inspecting Query Evaluation

Rich & Fast

DAX/MDX formula engine: rich; single-threaded per query; designed for expressivity.
VertiPaq query engine: simple; one core per segment; optimized for speed.

More is pushed down to VertiPaq over time.

Why raw speed counts: VertiPaq performance, performance, performance.

Amdahl’s law establishes the maximum expected improvement to an overall system when only part of the system is improved: the speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program.
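In formula form, with parallel fraction p and n cores, the speedup is 1 / ((1 - p) + p / n):

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup with parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelized, 8 cores give less than 6x:
print(round(amdahl_speedup(0.95, 8), 2))          # 5.93
# and the ceiling as n grows is 1 / (1 - p) = 20x:
print(round(amdahl_speedup(0.95, 1_000_000), 1))  # 20.0
```

This is the point of the slide: the single-threaded formula engine is the sequential fraction, so making the VertiPaq scan itself extremely fast is what keeps overall query times low.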

Session Objectives and Takeaways

Session Objectives:
Understand the architecture of Analysis Services in tabular mode
Understand and optimize processing performance
Understand the query processing architecture

Takeaways:
Factors to think through for capacity planning

Related Content

DBI61-HOL: Developing a Tabular BISM Using SQL Server Data Tools

Microsoft SQL Server: Breakthrough Insights - Credible, Consistent Data

Workbook shown in talk, available here:
http://www.powerpivotblog.nl/what-is-using-all-that-memory-on-my-analysis-server-instance/bismservermemoryreport
http://www.powerpivotblog.nl/wp-content/uploads/2012/02/BISMServerMemoryReport.xlsx
http://www.powerpivotblog.nl/?s=BISMServerMemory

Resources

Connect. Share. Discuss.

http://europe.msteched.com

Learning

Microsoft Certification & Training Resources

www.microsoft.com/learning

TechNet

Resources for IT Professionals

http://microsoft.com/technet

Resources for Developers

http://microsoft.com/msdn

Evaluations

http://europe.msteched.com/sessions

Submit your evals online

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Best Practices for Tabular Models

Workspace DB for a data-driven design experience:
BI Dev Studio works against a Workspace DB hosted on Analysis Services (Process Tables/Partitions); the project is then deployed as a DB to Analysis Services.

Options for working with large datasets:
1. Point to a smaller/empty DB
2. Filter large tables to reduce rows
