Agenda
10 Key SQL 2012 BI InnovationsBI Semantic ModelProject ‘Apollo’VertipaqxVelocity in SQL 2012
10 Key SQL 2012 BI Innovations
1. BI Semantic Model2. Analysis Services
Tabular Mode1. With BIDS support
3. PowerPivot “2”4. Power View (SSRS)5. Self-Service Data
Alerts (SSRS)
6. Hadoop Big Data Integration
7. xVelocity–Columnstore Indexes
8. Geospatial Indexes9. Unstructured Data
Queries10.Data Quality Services– Updated Master
Data Services
Analysis Services: Tomorrow
Build on the strengths and success of Analysis Services and expand its reach to a much broader user base
Embrace the relational data model – well understood by developers and IT Pros
Bring together the relational and multidimensional models under a single unified BI platform—best of both worlds!
Provide flexibility in the platform to suit the diverse needs of BI applications
Analysis Services VisionBI Semantic
Model
BI Semantic Model
• One Model for All User Experiences
• Visualise analysis using your favourite tools• Model data the way you like• Store analytical data however it is best done
• BISM is a concept, not a product– Can be hosted in PowerPivot or SSAS
BI Semantic ModelOne Model for All User Experiences
BI Semantic ModelData Model
Business Logic/Queries
Data Access ROLAP MOLAP xVelocityDirectQuery
MDX DAX
Multi-dimensional
Tabular
YourApps
Reporting Services& Power View Excel PowerPivot
Databases LOB Applications Files OData Feeds Cloud Services
SharePointInsights
BI Semantic ModelWhat about existing Analysis Services applications?
New applicationsNew technology options
Existing applicationsBased on Unified Dimensional Model
Existing applicationsEvery UDM becomes a BI Semantic Model
Existing applicationsBased on Unified Dimensional Model
BI Semantic Model
Data model
Business logic and queries
Data access ROLAP MOLAP VertiPaqDirectQuery
MDX DAX
Multi-dimensional
Tabular
Third-partyapplications
ReportingServices Excel PowerPivot
Databases LOB Applications Files OData Feeds Cloud Services
SharePointInsights
BI Semantic Model: Architecture
Tabular ModeScaling PowerPivot to Enterprise Needs
• Model in PowerPivot– PowerPivot as source of SSAS Tabular Models– Excel for browsing and testing in SSDT
• All new PowerPivot features:– Diagrams, Measure Grid, KPIs, Hierarchies,
Perspectives, 30+ New DAX Functions
• …and, unique to SSAS Tabular Mode:– Row-level Security, Partitions, Large Tables
(>2 billion rows), Images, Memory Paging
BI Semantic Model
Data model
Business logic and queries
Data access
SQL Server Dynamics CRM
EndUser
Model Developer
VertiPaq
DAX
Tabular
Example: Power View Over a Sales Model
BI Semantic Model
Data model
Business logic and queries
Data access
SQL Server Dynamics CRM
EndUser
Model Developer
MDX DAX
Multi-dimensional
Tabular
VertiPaq
Example: Power View Over a Sales Model
BI Semantic Model
Data model
Business logic and queries
Data access
Oracle SAP
EndUser
Model Developer
MOLAP
MDX
Multi-dimensional
Example: Excel Over a Finance Model
BI Semantic Model
Data model
Business logic and queries
Data access
Oracle SAP
EndUser
Model Developer
MOLAP
DAX MDX
Multi-dimensionalTabular
Example: Excel Over a Finance Model
Demo….
Tour of SQL Server 2012BISM
BI Semantic Model
Rich data modeling
Sophisticated business logic using MDX and DAX
Fine-grained security – row/cell level
Multi-dimensional or tabular modeling
MDX and DAX for logic and queries
Cached and passthrough storage
VertiPaq for high performance, MOLAP for mission critical scale
DirectQuery and ROLAP for real-time
State-of-the-art compression algorithms
Flexibility Richness Scalability
Data Model
Tabular
Familiar model, easier to build, faster time to solution
Advanced concepts (parent-child, many-to-many) not available natively in the model… need calculations to simulate these
Easy to wrap a model over a raw database or warehouse for reporting & analytics
Multidimensional
Sophisticated model, higher learning curve
Advanced concepts baked into the model and optimized (parent-child, many-to-many, attribute relationships, key vs. name, etc.)
Ideally suited for OLAP type apps (e.g. planning, budgeting, forecasting) that need the power of the multidimensional model
Business Logic
DAX
Based on Excel formulas and relational concepts – easy to get started
Complex solutions require steeper learning curve – row/filter context, Calculate, etc.
Calculated columns enable new scenarios, however no named sets or calc members
MDX
Based on understanding of multidimensional concepts – higher initial learning curve
Complex solutions require steeper learning curve – CurrentMember, overwrite semantics, etc.
Ideally suited for apps that need the power of multidimensional calculations – scopes, assignments, calc members
Data Access and StorageVertiPaq
In-memory column store… typical 10x compression
Brute force memory scans… high performance by default… no tuning required
Basic paging support… data volume mostly limited to physical memory
MOLAP
Disk based store… typical 3x compression
Disk scans with in-memory subcube caching… aggregation tuning required
Extensive paging support… data volumes can scale to multiple terabytes
DirectQuery
Passes through DAX queries & calculations… fully exploits backend database capabilities
No support for MDX queries… no support for data sources other than SQL Server (in Denali)
ROLAP
Passes through fact table requests… not recommended for large dimension tables
Supports most relational data sources… no support for aggregations except SQL Server indexed views
Analysis Services Architecture
Excel Services
PowerPivot for SharePoint(Analysis Services)Excel
Browser
Analysis Services
BI Semantic Model
SharePoint
Reporting Services
Third Party Apps
PowerPivot for Excel
xlsx
xlsx
BI Development Studio
Personal BI Team BI Organizational BI
Project ‘Apollo’ in SQL 2012
Apollo: A new column-oriented query accelerator
• What is Apollo?
– What does column-oriented mean?
– How does it accelerate queries?
• When to create a columnstore index
• How to use Apollo
– Creating an index
– Running queries
– Loading data
• How does Apollo relate to VertiPaq and PowerPivot?
• Apollo is the code name for new functionality that is available
in SQL Server 2012
• It will substantially accelerate common data warehouse
queries
• Adds a column store option in SQL Server database engine
– New index type in the database engine
• Advanced query processing algorithms
– New batch mode processing
What is Apollo?
When to use Apollo
• Data warehousing
– Read-mostly workloads
– Star joins
– Process large amounts of data
• Generous amount of memory
– Best performance when data fits in memory
– Graceful degradation as fact table paged from disk
– Under severe memory constraints, falls back to row-at-a-time
processing
How does Apollo speed up queries? (1)
• Stores data column-wise
• Better compression
• Uses VertiPaq compression
technology
• Less IO
…
C1 C2 C3 C5 C6C4
How does Apollo speed up queries? (2)
• Fetches only needed columns from
disk
– Less IO
– Better buffer hit rates
C1
C2
C4 C5 C6
C3
SELECT region, sum (sales) …
34
• Columnstore indexes provide an easy
way to significantly improve data
warehouse and decision support query
performance against very large data
sets
• Performance improvements for
“typical” data warehouse queries from
10x to 100x
• Ideal candidates include queries
against star schemas that use filtering,
aggregations and grouping against
very large fact tables
Improved Data Warehouse Query performance
35
• You need to execute high performance DW queries against very large data sets?– In SQL Server 2008 and SQL Server 2008 R2
• OLAP (SSAS) MDX solution• ROLAP and T-SQL + intermediate summary tables, indexed views and aggregate
tables– Inherently inflexible
– In SQL Server 2012• You can create a columnstore index on a very large fact table referencing all
columns with supporting data types– Utilizing T-SQL and core Database Engine functionality – Minimal query refactoring or intervention
• Upon creating the columnstore index, your table becomes “read only” – but you can still use partitioning to switch in and out data OR drop/rebuild indexes periodically
What Happens When…
36
• Two complimentary technologies:– Storage
• Data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row). – Columnar storage allows for less data to be accessed
when only a sub-set of columns are referenced – Data density/selectivity determines how compression
friendly a column is – example “State” / “City” / “Gender”
– Translates to improved buffer pool memory usage
– New “batch mode” execution• Data can then be processed in batches (1,000 row
blocks) versus row-by-row• Depending on filtering and other factors, a query may
also benefit by “segment elimination” - bypassing million row chunks (segments) of data, further reducing I/O
How Are These Performance Gains Achieved?
37
OrderDate
20010701
…
20010702
…
…
20010703
…
…
…
…
20010704
…
Cost
2171.29
1912.15
2171.29
413.14
333.42
1295.00
4233.14
641.22
24.95
64.32
1111.25
ProductID
310
311
312
313
314
315
316
317
318
319
320
321
• Column Store (values compressed)
ProductID
OrderDate Cost
310 20010701
2171.29
311 20010701
1912.15
312 20010702
2171.29
313 20010702
413.14
• Row Store (Heap / B-Tree)
Column vs. Row Store
data page1000
ProductID
OrderDate
Cost
314 20010701
333.42
315 20010701
1295.00
316 20010702
4233.14
317 20010702
641.22
data page1001
data page2001
data page2000
data page2002
38
• Allows processing of 1,000 row blocks as an alternative to single row-by-row operations– Enables additional algorithms that can reduce CPU
overhead significantly– Batch mode “segment” is a partition broken into million
row chunks with associated statistics used for Storage Engine filtering
• Batch mode can work to further improve query performance of a columnstore index, but this mode isn’t always chosen:– Some operations aren’t enabled for batch mode:
• E.g. outer joins to columnstore index table / joining strings / NOT IN / IN / EXISTS / scalar aggregates
– Row mode might be used if there is SQL Server memory pressure or parallelism is unavailable
– Confirm batch vs. row mode by looking at the graphical execution plan
Batch Mode
39
• Performance gains can come from a combination of:– Columnstore indexing alone + traditional row
mode in QP– Columnstore indexing + batch mode in QP– Columnstore indexing + hybrid of batch and
traditional row mode in QP
Columnstore format + batch mode Variations
40
• SSMS
• T-SQL
Creating a columnstore index
41
• Index type– Columnstore indexes are always non-clustered and non-
unique– They cannot be created on views, indexed views, sparse
columns– They cannot act as primary or foreign key constraints
• Column selection– Unlike other index types, there are no “key columns”
• Instead you choose the columns that you anticipate will be used in your queries
• Up to 1,024 columns – and the ordering in your CREATE INDEX doesn’t matter
• No concept of “INCLUDE”• No 900 byte index key size limit
• Column ordering– Use of ASC or DESC sorting not allowed – as ordering is
defined via columnstore compression algorithms
Defining the Columnstore Index
Accelerating Data Warehouse Queries with SQL Server 2012 Columnstore Indexes
Demo..
43
• Supported data types– Char / nchar / varchar / nvarchar
• (max) types, legacy LOB types and FILESTREAM are not supported
– Decimal/numeric• Precision greater than 18 digits NOT supported
– Tinyint, smallint, int, bigint– Float/real– Bit– Money, smallmoney– Date and time data types
• Datetimeoffset with scale > 2 NOT supported
Supported Data Types
44
• Columnstore indexes cannot be used in conjunction with – Change Data Capture and Change Tracking – Filestream columns (supported columns from same table
are supported)– Page, row and vardecimal storage compression– Replication– Sparse columns
• Data type limitations– Binary / varbinary / ntext / text / image / varchar (max) /
nvarchar (max) / uniqueidentifier / rowversion / sql_variant / decimal or numeric with precesion > 18 digits / CLR types / hierarchyid / xml / datetimeoffset with scale > 2
• You can prevent a query from using the columnstore index using the IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX query hint
Limitations
• Method 1: Disable the columnstore index
• Disable (or drop) the indexALTER INDEX my_index ON T DISABLE
• Update the table
• Rebuild the columnstore indexALTER INDEX my_index ON T REBUILD
Adding data to a table with a columnstore index
• Method 2: Use Partitioning
• Load new data into a staging table
• Build a columnstore indexCREATE NONCLUSTERED COLUMNSTORE INDEX my_index ON StagingT(OrderDate, ProductID, SaleAmount)
• Switch the partition into the tableALTER TABLE StagingT SWITCH TO T PARTITION 5
Adding data to a table with a columnstore index
• VertiPaq:
– PowerPivot for Excel
– PowerPivot for Sharepoint
– Analysis Services
– Database Engine – Apollo
• Use Apollo for relational data warehousing
– Large fact tables
– Ad hoc or reporting queries
– When you don’t need MDX
Apollo and VertiPaq
• 1 TB version of the TPC-DS database
• 1.44 billion rows in catalog_sales fact table
• 32 logical processor machine with 256 GB RAM
• Warm start
• Query
Performance example
SELECT w_city, w_state, d_year, SUM(cs_sales_price) AS cs_sales_priceFROM warehouse, catalog_sales, date_dimWHERE w_warehouse_sk = cs_warehouse_sk and cs_sold_date_sk = d_date_sk and w_state in ('SD','OH') and d_year in (2001,2002,2003)GROUP BY w_city, w_state, d_yearORDER BY d_year, w_state, w_city;
Performance example: Results
Total CPU time Elapsed time
No columnstore 502 sec 501 sec
Columnstore 31.0 sec 1.10 sec
Speedup 16X 455X
Summary: Apollo in a nutshell
Astonishing speedup for DW queriesGreat compression
Columnstore technology+
Advanced query processing
51
• SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios – 10x to 100x performance improvement
depending on the schema and query• I/O reduction and memory savings through
columnstore compressed storage• CPU reduction with batch versus row processing,
further I/O reduction if segmentation elimination occurs
– Easy to deploy and requires less management than some legacy ROLAP or OLAP methods • No need to create intermediate tables, aggregates,
pre-processing and cubes
– Interoperability with partitioning– For the best interactive end-user BI experience,
consider Analysis Services, PowerPivot and Crescent
Summary: SQL 2012 ColumStore
xVelocity in SQL 2012
No more Vertipaq, it’s now called xVelocity in-
memory technologies in SQL 2012
Q & A
Thank You