Download - Agenda 10 Key SQL 2012 BI Innovations BI Semantic Model Project ‘Apollo’ Vertipaq xVelocity in SQL 2012

Agenda

10 Key SQL 2012 BI InnovationsBI Semantic ModelProject ‘Apollo’VertipaqxVelocity in SQL 2012

10 Key SQL 2012 BI Innovations

1. BI Semantic Model2. Analysis Services

Tabular Mode1. With BIDS support

3. PowerPivot “2”4. Power View (SSRS)5. Self-Service Data

Alerts (SSRS)

6. Hadoop Big Data Integration

7. xVelocity–Columnstore Indexes

8. Geospatial Indexes9. Unstructured Data

Queries10.Data Quality Services– Updated Master

Data Services

Analysis Services: Tomorrow

Build on the strengths and success of Analysis Services and expand its reach to a much broader user base

Embrace the relational data model – well understood by developers and IT Pros

Bring together the relational and multidimensional models under a single unified BI platform—best of both worlds!

Provide flexibility in the platform to suit the diverse needs of BI applications

Analysis Services VisionBI Semantic

Model

BI Semantic Model

• One Model for All User Experiences

• Visualise analysis using your favourite tools• Model data the way you like• Store analytical data however it is best done

• BISM is a concept, not a product– Can be hosted in PowerPivot or SSAS

BI Semantic ModelOne Model for All User Experiences

BI Semantic ModelData Model

Business Logic/Queries

Data Access ROLAP MOLAP xVelocityDirectQuery

MDX DAX

Multi-dimensional

Tabular

YourApps

Reporting Services& Power View Excel PowerPivot

Databases LOB Applications Files OData Feeds Cloud Services

SharePointInsights

BI Semantic ModelWhat about existing Analysis Services applications?

New applicationsNew technology options

Existing applicationsBased on Unified Dimensional Model

Existing applicationsEvery UDM becomes a BI Semantic Model

Existing applicationsBased on Unified Dimensional Model

BI Semantic Model

Data model

Business logic and queries

Data access ROLAP MOLAP VertiPaqDirectQuery

MDX DAX

Multi-dimensional

Tabular

Third-partyapplications

ReportingServices Excel PowerPivot

Databases LOB Applications Files OData Feeds Cloud Services

SharePointInsights

BI Semantic Model: Architecture

Tabular ModeScaling PowerPivot to Enterprise Needs

• Model in PowerPivot– PowerPivot as source of SSAS Tabular Models– Excel for browsing and testing in SSDT

• All new PowerPivot features:– Diagrams, Measure Grid, KPIs, Hierarchies,

Perspectives, 30+ New DAX Functions

• …and, unique to SSAS Tabular Mode:– Row-level Security, Partitions, Large Tables

(>2 billion rows), Images, Memory Paging

BI Semantic Model

Data model


Data access

SQL Server Dynamics CRM

EndUser

Model Developer

VertiPaq

DAX

Tabular

Example: Power View Over a Sales Model

BI Semantic Model

Data model


Data access

SQL Server Dynamics CRM

EndUser

Model Developer

MDX DAX

Multi-dimensional

Tabular

VertiPaq

Example: Power View Over a Sales Model

BI Semantic Model

Data model


Data access

Oracle SAP

EndUser

Model Developer

MOLAP

MDX

Multi-dimensional

Example: Excel Over a Finance Model

BI Semantic Model

Data model


Data access

Oracle SAP

EndUser

Model Developer

MOLAP

DAX MDX

Multi-dimensionalTabular

Example: Excel Over a Finance Model

Demo….

Tour of SQL Server 2012BISM

BI Semantic Model

Rich data modeling

Sophisticated business logic using MDX and DAX

Fine-grained security – row/cell level

Multi-dimensional or tabular modeling

MDX and DAX for logic and queries

Cached and passthrough storage

VertiPaq for high performance, MOLAP for mission critical scale

DirectQuery and ROLAP for real-time

State-of-the-art compression algorithms

Flexibility Richness Scalability

Data Model

Tabular

Familiar model, easier to build, faster time to solution

Advanced concepts (parent-child, many-to-many) not available natively in the model… need calculations to simulate these

Easy to wrap a model over a raw database or warehouse for reporting & analytics

Multidimensional

Sophisticated model, higher learning curve

Advanced concepts baked into the model and optimized (parent-child, many-to-many, attribute relationships, key vs. name, etc.)

Ideally suited for OLAP type apps (e.g. planning, budgeting, forecasting) that need the power of the multidimensional model

Business Logic

DAX

Based on Excel formulas and relational concepts – easy to get started

Complex solutions require steeper learning curve – row/filter context, Calculate, etc.

Calculated columns enable new scenarios, however no named sets or calc members

MDX

Based on understanding of multidimensional concepts – higher initial learning curve

Complex solutions require steeper learning curve – CurrentMember, overwrite semantics, etc.

Ideally suited for apps that need the power of multidimensional calculations – scopes, assignments, calc members

Data Access and StorageVertiPaq

In-memory column store… typical 10x compression

Brute force memory scans… high performance by default… no tuning required

Basic paging support… data volume mostly limited to physical memory

MOLAP

Disk based store… typical 3x compression

Disk scans with in-memory subcube caching… aggregation tuning required

Extensive paging support… data volumes can scale to multiple terabytes

DirectQuery

Passes through DAX queries & calculations… fully exploits backend database capabilities

No support for MDX queries… no support for data sources other than SQL Server (in Denali)

ROLAP

Passes through fact table requests… not recommended for large dimension tables

Supports most relational data sources… no support for aggregations except SQL Server indexed views

Analysis Services Architecture

Excel Services

PowerPivot for SharePoint(Analysis Services)Excel

Browser

Analysis Services

BI Semantic Model

SharePoint

Reporting Services

Third Party Apps

PowerPivot for Excel

xlsx

xlsx

BI Development Studio

Personal BI Team BI Organizational BI

Project ‘Apollo’ in SQL 2012

Apollo: A new column-oriented query accelerator

• What is Apollo?

– What does column-oriented mean?

– How does it accelerate queries?

• When to create a columnstore index

• How to use Apollo

– Creating an index

– Running queries

– Loading data

• How does Apollo relate to VertiPaq and PowerPivot?

• Apollo is the code name for new functionality that is available

in SQL Server 2012

• It will substantially accelerate common data warehouse

queries

• Adds a column store option in SQL Server database engine

– New index type in the database engine

• Advanced query processing algorithms

– New batch mode processing

What is Apollo?

When to use Apollo

• Data warehousing

– Read-mostly workloads

– Star joins

– Process large amounts of data

• Generous amount of memory

– Best performance when data fits in memory

– Graceful degradation as fact table paged from disk

– Under severe memory constraints, falls back to row-at-a-time

processing

How does Apollo speed up queries? (1)

• Stores data column-wise

• Better compression

• Uses VertiPaq compression

technology

• Less IO

…

C1 C2 C3 C5 C6C4

How does Apollo speed up queries? (2)

• Fetches only needed columns from

disk

– Less IO

– Better buffer hit rates

C1

C2

C4 C5 C6

C3

SELECT region, sum (sales) …

34

• Columnstore indexes provide an easy

way to significantly improve data

warehouse and decision support query

performance against very large data

sets

• Performance improvements for

“typical” data warehouse queries from

10x to 100x

• Ideal candidates include queries

against star schemas that use filtering,

aggregations and grouping against

very large fact tables

Improved Data Warehouse Query performance

35

• You need to execute high performance DW queries against very large data sets?– In SQL Server 2008 and SQL Server 2008 R2

• OLAP (SSAS) MDX solution• ROLAP and T-SQL + intermediate summary tables, indexed views and aggregate

tables– Inherently inflexible

– In SQL Server 2012• You can create a columnstore index on a very large fact table referencing all

columns with supporting data types– Utilizing T-SQL and core Database Engine functionality – Minimal query refactoring or intervention

• Upon creating the columnstore index, your table becomes “read only” – but you can still use partitioning to switch in and out data OR drop/rebuild indexes periodically

What Happens When…

36

• Two complimentary technologies:– Storage

• Data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row). – Columnar storage allows for less data to be accessed

when only a sub-set of columns are referenced – Data density/selectivity determines how compression

friendly a column is – example “State” / “City” / “Gender”

– Translates to improved buffer pool memory usage

– New “batch mode” execution• Data can then be processed in batches (1,000 row

blocks) versus row-by-row• Depending on filtering and other factors, a query may

also benefit by “segment elimination” - bypassing million row chunks (segments) of data, further reducing I/O

How Are These Performance Gains Achieved?

37

OrderDate

20010701

…

20010702

…

…

20010703

…

…

…

…

20010704

…

Cost

2171.29

1912.15

2171.29

413.14

333.42

1295.00

4233.14

641.22

24.95

64.32

1111.25

ProductID

310

311

312

313

314

315

316

317

318

319

320

321

• Column Store (values compressed)

ProductID

OrderDate Cost

310 20010701

2171.29

311 20010701

1912.15

312 20010702

2171.29

313 20010702

413.14

• Row Store (Heap / B-Tree)

Column vs. Row Store

data page1000

ProductID

OrderDate

Cost

314 20010701

333.42

315 20010701

1295.00

316 20010702

4233.14

317 20010702

641.22

data page1001

data page2001

data page2000

data page2002

38

• Allows processing of 1,000 row blocks as an alternative to single row-by-row operations– Enables additional algorithms that can reduce CPU

overhead significantly– Batch mode “segment” is a partition broken into million

row chunks with associated statistics used for Storage Engine filtering

• Batch mode can work to further improve query performance of a columnstore index, but this mode isn’t always chosen:– Some operations aren’t enabled for batch mode:

• E.g. outer joins to columnstore index table / joining strings / NOT IN / IN / EXISTS / scalar aggregates

– Row mode might be used if there is SQL Server memory pressure or parallelism is unavailable

– Confirm batch vs. row mode by looking at the graphical execution plan

Batch Mode

39

• Performance gains can come from a combination of:– Columnstore indexing alone + traditional row

mode in QP– Columnstore indexing + batch mode in QP– Columnstore indexing + hybrid of batch and

traditional row mode in QP

Columnstore format + batch mode Variations

40

• SSMS

• T-SQL

Creating a columnstore index

41

• Index type– Columnstore indexes are always non-clustered and non-

unique– They cannot be created on views, indexed views, sparse

columns– They cannot act as primary or foreign key constraints

• Column selection– Unlike other index types, there are no “key columns”

• Instead you choose the columns that you anticipate will be used in your queries

• Up to 1,024 columns – and the ordering in your CREATE INDEX doesn’t matter

• No concept of “INCLUDE”• No 900 byte index key size limit

• Column ordering– Use of ASC or DESC sorting not allowed – as ordering is

defined via columnstore compression algorithms

Defining the Columnstore Index

Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes

Demo..

43

• Supported data types– Char / nchar / varchar / nvarchar

• (max) types, legacy LOB types and FILESTREAM are not supported

– Decimal/numeric• Precision greater than 18 digits NOT supported

– Tinyint, smallint, int, bigint– Float/real– Bit– Money, smallmoney– Date and time data types

• Datetimeoffset with scale > 2 NOT supported

Supported Data Types

44

• Columnstore indexes cannot be used in conjunction with – Change Data Capture and Change Tracking – Filestream columns (supported columns from same table

are supported)– Page, row and vardecimal storage compression– Replication– Sparse columns

• Data type limitations– Binary / varbinary / ntext / text / image / varchar (max) /

nvarchar (max) / uniqueidentifier / rowversion / sql_variant / decimal or numeric with precesion > 18 digits / CLR types / hierarchyid / xml / datetimeoffset with scale > 2

• You can prevent a query from using the columnstore index using the IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint

Limitations

• Method 1: Disable the columnstore index

• Disable (or drop) the indexALTER INDEX my_index ON T DISABLE

• Update the table

• Rebuild the columnstore indexALTER INDEX my_index ON T REBUILD

Adding data to a table with a columnstore index

• Method 2: Use Partitioning

• Load new data into a staging table

• Build a columnstore indexCREATE NONCLUSTERED COLUMNSTORE INDEX my_index ON StagingT(OrderDate, ProductID, SaleAmount)

• Switch the partition into the tableALTER TABLE StagingT SWITCH TO T PARTITION 5

Adding data to a table with a columnstore index

• VertiPaq:

– PowerPivot for Excel

– PowerPivot for Sharepoint

– Analysis Services

– Database Engine – Apollo

• Use Apollo for relational data warehousing

– Large fact tables

– Ad hoc or reporting queries

– When you don’t need MDX

Apollo and VertiPaq

• 1 TB version of the TPC-DS database

• 1.44 billion rows in catalog_sales fact table

• 32 logical processor machine with 256 GB RAM

• Warm start

• Query

Performance example

SELECT w_city, w_state, d_year, SUM(cs_sales_price) AS cs_sales_priceFROM warehouse, catalog_sales, date_dimWHERE w_warehouse_sk = cs_warehouse_sk and cs_sold_date_sk = d_date_sk and w_state in ('SD','OH') and d_year in (2001,2002,2003)GROUP BY w_city, w_state, d_yearORDER BY d_year, w_state, w_city;

Performance example: Results

Total CPU time Elapsed time

No columnstore 502 sec 501 sec

Columnstore 31.0 sec 1.10 sec

Speedup 16X 455X

Summary: Apollo in a nutshell

Astonishing speedup for DW queriesGreat compression

Columnstore technology+

Advanced query processing

51

• SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios – 10x to 100x performance improvement

depending on the schema and query• I/O reduction and memory savings through

columnstore compressed storage• CPU reduction with batch versus row processing,

further I/O reduction if segmentation elimination occurs

– Easy to deploy and requires less management than some legacy ROLAP or OLAP methods • No need to create intermediate tables, aggregates,

pre-processing and cubes

– Interoperability with partitioning– For the best interactive end-user BI experience,

consider Analysis Services, PowerPivot and Crescent

Summary: SQL 2012 ColumStore

xVelocity in SQL 2012

No more Vertipaq, it’s now called xVelocity in-

memory technologies in SQL 2012

Thank You

Download - Agenda 10 Key SQL 2012 BI Innovations BI Semantic Model Project ‘Apollo’ Vertipaq xVelocity in SQL 2012

Top Related