ssas 2016 tabular direct query vs in memory - web viewsql server, azure sql database, oracle, and...

25
7/4/2016 Brett Powell SSAS Tabular 2016: Direct Query vs. In- Memory Summary One of the top features of SSAS 2016 is an enhanced Tabular Direct Query mode. Direct Query mode, which generates and passes client reporting queries to the data source system as SQL statements, will likely be the recommended solution in favor of the default Tabular In-Memory mode for many BI projects due primarily to its reduced administration and infrastructure costs as well as its advantages in data freshness and version control. However, SSAS Tabular In-Memory mode and the xVelocity Analytics engine may be the preferred choice for existing and new models even if the underlying data source is supported by Direct Query given potential performance advantages and modeling features. The purpose of this post to provide an overview of SSAS Tabular Direct Query and to the identify factors that will drive the Direct Query versus In-Memory decision. Please see the following two links for more detail: SSAS 2016 On Demand Webinar What's New in SSAS Direct Query Background Technically, SSAS Tabular has supported Direct Query mode since it was first released with SQL Server 2012 (1100 Compatibility Level). We must say ‘technically’ because it was very limited in terms of data sources supported, client tools supported, performance, and functionality. The only 1

Upload: phungdang

Post on 30-Jan-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

7/4/2016Brett Powell

SSAS Tabular 2016: Direct Query vs. In-MemorySummaryOne of the top features of SSAS 2016 is an enhanced Tabular Direct Query mode. Direct Query mode, which generates and passes client reporting queries to the data source system as SQL statements, will likely be the recommended solution in favor of the default Tabular In-Memory mode for many BI projects due primarily to its reduced administration and infrastructure costs as well as its advantages in data freshness and version control. However, SSAS Tabular In-Memory mode and the xVelocity Analytics engine may be the preferred choice for existing and new models even if the underlying data source is supported by Direct Query given potential performance advantages and modeling features. The purpose of this post to provide an overview of SSAS Tabular Direct Query and to the identify factors that will drive the Direct Query versus In-Memory decision.

Please see the following two links for more detail:

SSAS 2016 On Demand WebinarWhat's New in SSAS

Direct Query BackgroundTechnically, SSAS Tabular has supported Direct Query mode since it was first released with SQL Server 2012 (1100 Compatibility Level). We must say ‘technically’ because it was very limited in terms of data sources supported, client tools supported, performance, and functionality. The only source system supported was SQL Server, MDX-based client tools (ie Excel pivot tables) were not supported, and even then a combination of DAX query ‘chattiness’ and less than optimal SQL query plans resulted in poor performance. Additionally, you couldn’t define security roles in the model nor could you create calculated columns. There were two hybrid modes available in which one or more tables could be In-Memory while the other(s) were DirectQuery but this carried its own complexities such as differences in data freshness and reporting tools supported.

Note: For more information on Direct Query in SSAS Tabular 2012/2014 see links identified in the References section.

SSAS 2016 Direct Query addresses the following modern BI/DW realities and customer needs:

1

Page 2: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

Data platforms from other vendors such as Teradata and Oracle comprise a significant market share in addition to SQL Server and PDW/APS appliances

Modern relational data sources and structures can now better handle analytical workloads in addition to transactional processing. o Parallel query processing, columnar data structures, and in-memory data make dedicated SSAS infrastructure and optimizations

less necessary

Organizations that have invested in data warehouse technologies and supporting ETL and Master Data Management processes prefer to utilize the data warehouse directly; additional data movement, infrastructure, and administration is undesirable

Real-time or near real-time data access is increasingly a business requirement. A nightly or even hourly refresh job to load SSAS servers fails to answer the question “What’s happening right now?”

Support for a variety of reporting and analysis tools including MDX-based clients such as Excel are necessary for broad deployment and adoption of Direct Query models

2

Page 3: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

7/4/2016Brett Powell

SSAS Tabular 2016 Matrix: In-Memory Mode versus Direct QueryThe table below summarizes the primary factors and considerations in choosing the query mode of a new SSAS 2016 (1200 Compatibility) project as well as whether migrating an existing In-Memory model to Direct Query might make sense. Detail on each quadrant is provided below the table as well as a high level FAQ but please note that this is a Part I post – I’ll be adding more low level technical detail in the weeks that follow.

SSAS Tabular 2016 In-Memory versus Direct QueryAdvantages Disadvantages

In-Memory(Default)

xVelocity Engine Performanceo Fully In-Memoryo Columnar Store, 7-10X compressiono Data Cache and Segment Elimination

All DAX Functions Available and Optimized Calculated Tables More data sources supported

SSAS Server Hardware Requirements o RAM, NUMA, CPU Clock

Partition Design & Data Processing Required Administration, Monitoring, Security Roles Data Freshness Version Control (vs. Data Source) xVelocity knowledge required

Direct Query

No partitions and processing required Avoid SSAS Server Hardware Cost Data as fresh as source system Scale with source data warehouse

o Not limited by RAM or NUMA Data Source Security as an option

o Define RLS defined via DAX in the model Consistency (less version control issues)

Some DAX functions not optimized Potentially lower performance

o Dependent on data source No Calculated Tables Import from PBIX/XLSX with Power Query Limited MDX Support

o No Hierarchieso No Calculated Members

3

Page 4: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

7/4/2016Brett Powell

In-Memory: AdvantagesThere are very good reasons why this is the default setting in SSAS Tabular, why Vertipaq/xVelocity In-Memory is informally referred to as ‘The engine of the devil’, and why Power Pivot and Power BI Desktop models leverage this engine as well. If you were architecting a new BI/analytics engine with maximum query read performance as a top priority, you’d ultimately conclude that a columnar, in-memory, and highly compressed data store is preferable to disk-based, row-oriented and uncompressed data store. A columnar structure alone can greatly reduce the IO cost for most BI queries that only require a few columns, memory is increasingly less expensive and inherently fast, and the xVelocity Analytics Engine (Vertipaq) adds state of the art compression algorithms to reduce the amount of memory stored and queried along with ‘column segment elimination’ and data cache features to further improve query performance. All of this contributes to rapid query response times and speed-of-thought exploratory slicing of models for most deployments.

With In-Memory mode, you have the full arsenal of DAX functions at your disposal for measures, queries, and in defining row level security roles. This allows for very rich, complex analytics to implement business logic/rules and as a model designer/author you can remain focused on DAX data types and semantics – you don’t have to be mindful of differences between DAX and the relational data source as you do with Direct Query mode. For example, several common DAX functions and many of the 50 new DAX functions are not optimized for Direct Query mode and thus may present performance problems requiring an alternative design.

In terms of modeling functionality, you have everything SSAS 2016 Tabular (1200) offers (variables, bi-directional filtering, display folders, translations, etc) including the new calculated tables feature. Calculated tables are exclusive to in-memory mode and can help simplify models and/or aid query performance in many scenarios such as using a table expression at processing time to persist a small table in memory to support certain metrics. Additionally, the views used by the In-Memory tables and partitions often encapsulate some level of integration or business logic not persisted in the source system. It’s not recommended but the option to extend or customize a Tabular solution well beyond its source system(s) is common and since this ETL-like enhancement takes place during model processing query times are unaffected.

Finally, from a data source standpoint, you face very few limitations with in-memory mode. All common databases are supported including IBM DB2 and Informix and an in-memory model can leverage multiple data sources for its tables (though it can’t merge them as with Power Query/Query Editor yet) ranging from APS to a text or Excel file. Direct Query models (understandably) don’t have this flexibility or robust data source support.

4

Page 5: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

7/4/2016Brett Powell

In-Memory: DisadvantagesUnfortunately, there’s a price, sometimes a very significant price, in deploying SSAS Tabular In-Memory mode models – particularly for larger scale deployments. To begin with, as an in-memory engine you need large amounts of RAM such that all Tabular models remain in memory (and above paging and cache clearing thresholds) as well as to support data processing operations. To deploy large Tabular databases, which I’ll roughly define as 750 million plus row fact tables, BI servers with 800GB to 2TB+ of RAM are not uncommon. It’s also recommended to obtain CPUs with high clock rates to support single threaded formula engine evaluations, large CPU cache sizes, and as fast of RAM as possible– 2133Mhz DDR4 or faster. These hardware specifications generally require new, dedicated Tabular BI servers as the specs for existing VM host servers generally don’t align with these needs. Additionally, to support many concurrent users and queries multiple Tabular-dedicated BI query servers are often required with a network load balancer configuration. Procuring and configuring this infrastructure is a significant undertaking for many IT/BI organizations and greater concern, particularly for organizations that have invested in a data warehouse appliance and/or relational data warehouse architecture, is the extra layer and data movement to manage associated with this infrastructure.

Note: Yes, I’m aware of Azure cloud options and yes, this could well provide infrastructure cost and time savings but as of today the prospect of moving the data warehouse to the cloud is still just a future consideration for most organizations. On-Premise and Hybrid deployments (e.g. Power BI via Enterprise Gateway) is much more the norm.

Even if the price of the hardware isn’t an issue, an in-memory mode model’s scalability is limited by A) the cardinality of dimensions and B) SSAS Tabular lacks NUMA awareness. If you have a customer, product, or other dimension with 1.5M+ rows the columnar data structure of in-memory mode may struggle to deliver performance for queries involving these relationships that don’t contain selective filters on the given dimension. The cardinality issue is somewhat mitigated (more detail later) in SSAS 2016 but for large dimensions (due to slowly changing dimension processes, junk dimension designs, etc) cardinality is still a performance concern for In-Memory mode. If you deploy an in-memory mode model to a multi-socket motherboard you will undoubtedly encounter performance issues/limitations as Tabular is not NUMA aware as of SSAS 2016 GA. Therefore, the maximum amount of RAM supported on a single CPU socket still represents a bottleneck for in-memory models.

Note: Details (and work arounds) of the high grain dimensions and NUMA is outside the scope of this document.

There’s also very significant management and administrative overhead costs with large-scale SSAS in-memory models. In effect, Tabular in-memory models represent duplicated and isolated datasets from their data sources. New security roles have to be applied and maintained within the SSAS project. Fact table partitions have to be carefully designed and maintained to balance many factors (availability, compression, resource usage, etc). Processing jobs in the form of PowerShell scripts, SSIS packages, SQL Agent or other tools have to be developed, scheduled and administered to keep the in-memory models updated.

5

Page 6: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

Once deployed In-Memory mode model tuning and monitoring has its own nuances that require a deeper knowledge of the underlying engine (e.g. ‘materialization’) and DAX programming language – the IT/BI team cannot directly leverage their relational database and SQL knowledge. Given all of this it’s not uncommon for BI development resources to be significantly engaged on management and administration tasks or for BI/IT teams to consider hiring a dedicated SSAS database administrator.

Note: In the event that historical data warehouse data is updated, cleansed, or augmented (not uncommon, particularly for newer environments) it’s then necessary to re-process the SSAS Tabular In-Memory partitions and/or tables.

Beyond the high infrastructure and administration costs, which alone may strongly sway decisions toward Direct Query, there are also analytical limitations and issues with In-Memory mode. Data freshness is limited by the need to load the data from the source system and re-calculate all relationships and hierarchies. Therefore, given the timing gap between data source update and the Tabular processing job, In-Memory mode models are always, on some minimal level at least, representing outdated data to users. Additionally, many reports in an organization may source the relational database and thus, depending on timing, produce a different result than the in-memory Tabular model.

6

Page 7: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

7/4/2016Brett Powell

Direct Query: AdvantagesThe top benefit of Direct Query is the elimination of the extra layer of data and data movement without sacrificing the rich semantic layer exposed to users and client tools (KPIs, Hierarchies (in DAX clients), metric folders). In short, you can use Analysis Services for analytics and leave the data storage and admin/management aspects, optionally the security as well, for the source system. Like magic, SSAS will convert an incoming DAX or MDX query, such as from a Power BI dashboard or an Excel pivot table, respectively, into an efficient SQL statement and return the results from the source. The DAX queries submitted to SSAS 2016 are dramatically less ‘chatty’ or verbose than prior versions (for both Direct Query and In-Memory) and Direct Query mode has improved logic in generating more performance SQL code.

Direct Query benefits from ‘Super DAX’ (insert link) features such as measure fusion and join elimination. For additional detail on Direct Query performance I’d recommend the free ‘What’s new with SSAS webinar?’ (here) and the overall ‘what’s new’ (insert link)

The scalability issues with dimension granularity and NUMA are also addressed via Direct Query. For example, a DAX query involving a large customer or products dimension (1.5M+ rows) with limited filtering might be parallelized across multiple nodes in a direct query scenario (e.g. Teradata, PDW/APS). Additionally, the source system for Direct Query is often already configured for large scale databases and/or can be adjusted to meet greater query demands and processing workloads via a variety of options including Columnar and in-memory storage options of their own.

Direct Query also eliminates the data freshness and version control issues with In-Memory mode. A non-SSAS based report will tie to the SSAS-based report provided equal queries and execution times. If the source system supports real-time or near real-time visibility then this data can be exposed via SSAS.

7

Page 8: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

7/4/2016Brett Powell

Direct Query: DisadvantagesThe main downside of Direct Query is the potential loss of performance relative to the xVelocity In-Memory data store. In-Memory mode is architected from the ground up with BI in mind whereas the Direct Query source system may have other priorities or less advanced features. For example, SQL Server 2008 is supported as a Direct Query data source and the non-clustered/secondary column store index wasn’t even available for that release. Additionally, many organizations don’t have well-defined and de-normalized schemas to support reporting. In many environments the product ‘dimension’ may not be a single table but rather represent a very complex SQL statement involving many tables and views – and other organizations’ source hardware isn’t well suited for analytical workloads. The In-Memory model often benefits in terms of performance from these joins, filters and derived columns already being resolved and the results loaded/compressed but a Direct Query model would have to navigate this logic at query time.

A related and important consideration is the potential loss of modeling flexibility with Direct Query. The new calculated tables feature is not available to Direct Query models and, cautiously assuming Power Query is integrated into SSDT later this year, then it would be much easier to import/migrate Power Pivot and Power BI Desktop files to SSAS Tabular In-Memory mode as these files often contain some level of query transformation. Although all DAX functions are technically supported by Direct Query not all of these are ‘optimized’ for Direct Query per the Restrictions section and link. The ‘non-optimized’ functions, some of which are common and important to models, cannot be used whatsoever in Direct Query calculated columns.

Note: In a future post or document I’ll go deeper into the DAX to Direct Query limitations.

From a reporting and analysis standpoint if MS Excel plays a primary role as the client reporting tool there are significant limitations with Direct Query models to be aware of. The model’s user defined hierarchies will not be visible in the field list rendering the model much less user-friendly (to Excel clients, not PBI). Additionally, any ‘Calculated Members’ you might define on top of an SSAS connection such as with OLAP PivotTable Extensions (see link) to extend a model are not supported as well.

8

Page 9: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

7/4/2016Brett Powell

Restrictions on Direct QuerySee: Direct Query Restrictions

Data Sources:o Only a single relational database per Direct Query model

SQL Server, Azure SQL Database, Oracle, and Teradatao Stored procedures cannot be specified in a SQL statement to define tables when using Data Import Wizard

“In short, all DAX functions are supported for Tabular 1200 CL DQ models”o However, not all functions optimized for tabular 1200 dQ modelso Not all functions are supported for all formula typeso We can put DAX functions into two camps: optimized and non-optimized

See: DAX Limitations

I intend to analyze and discuss these limitations in more detail in a future document or blog post.

Calculated Tables are not supported in Direct Query modelso You do have calculated columns, however

DAX Formulas:o DAX formulas containing elements that cannot be converted into SQL syntax will return validation errorso This is mostly limited to certain DAX functions.

Implicit measures are supported MDX Client (e.g. Excel) Limitations

o No session-scope statements such as calculated members Can use query-scope constructs such as WITH

o No User Defined Hierarchies (model hierarchies will not be exposed) o No native SQL queries

9

Page 10: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

The Decision: In Memory or Direct Query?

The short answer is A) it depends on many factors (e.g. What’s most important to you? What’s your current state?) and B) testing – significant testing and analysis is needed for any model that’s more than a pure pilot/POC.

At a very high level my current thought with Direct Query is to prove/discover that In-Memory mode is necessary for a given project for the reasons identified. If the BI/Analytical needs can be met by Direct Query without major changes to the source system or extensive work-arounds within SSAS, then Direct Query can deliver a simplified BI architecture and provide fresher data access. If Direct Query mode is an option given source system support and available source system resources for the query workload, the essential question with In-Memory (Default) mode becomes:

“Does the performance and analytical features of the xVelocity storage engine and the DAX query engine, respectively, justify the hardware and administration costs as well as data freshness limitations?”

10

Page 11: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

Note: It could be argued “What about SSAS Multidimensional - couldn’t we use ROLAP Multidimensional instead of Tabular Direct Query?”. It’s far outside the scope of this document to go into Multidimensional versus Tabular so I’ll just say for now that Tabular and the DAX programming language is more aligned with MSBI roadmap/investment and its close relationship with Power BI and inherent simplicity and speed makes Tabular the clear choice for any new SSAS project. Given SSAS 2016 enhancements it may also make sense to migrate existing Multidimensional models to SSAS Tabular.

11

Page 12: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

7/4/2016Brett Powell

Decision Process

1. You can start by determining if your environment/Data Source is supported by Direct Query mode:

2. If the data source is supported, deploy direct query POC model and start high level testing of essential reporting queries and scenarios.

3. If Step 2 is successful, dive deeper into the complex analytics the model will need to support (or might) and analyze the workload on the source system. For example

12

Page 13: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

7/4/2016Brett Powell

Other Considerations Is an existing In-Memory Model Slow?

o Is this due to something with the In-Memory mode itself OR is the model implemented incorrectly? It might be a bit of both.

o You may have 1-2M+ distinct values

Get Going (expose the data we have to users) with minimal cost or time investmento Just point DQ model to source tables with essential views and joins and deployo Just allow the business to look at the data (sums, average), relationshipo (Could think of Power Pivot/Power BI Desktop/Tabular as POC tool – this takes that to a higher level – can get up and running

even faster)

How familiar is the BI team with DAX?o If new to DAX and only going to use basic functions then Direct Queryo If advanced DAX knowledge is present or needed then this favors In-Memory mode

Level of experience with the xVelocity In-Memory Analytics Engine and Partition Processing?o Does your BI team understand the structure and organization of the Vertipaq engine and how to build and manage processing

jobs?

Does your model need the non-optimized DAX functions?o Which ones (RANKX, TOPN, CONTAINS, LOOKUPVALUE?)o How does your Direct Query model perform with these?

Is it an option to tweak your data source to better support

Have we already invested in fast, scalable data source?o SQL Server, PDW or APS (Teradata, Oracle, others

Should we optimize or enhance our data warehouse source or invest in In-Memory BI Servers (1TB of RAM)?o Example: Build or enhance data warehouse source objects like a new Fact table or a columnar data structure available at source

13

Page 14: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

Direct Query FAQs: Are KPIs supported?

o Yes Are Perspectives supported?

o Yes Are the new DAX Variables supported?

o Yes Are the new bi-directional relationships supported?

o Yes Are the new Calculated Tables supported?

o No, but you can write calculated columns How do I implement a change to the model?

o Simple, just re-deploy from SSDT and run a process re-calc operation Are cube functions from Excel supported?

o Yes! Can we switch from an in-memory model to a direct query model and vice versa should circumstances change?

o Will there be differences in result of queries due to different data types between relational source and Vertipaq engine?o Will there be errors in functions or comparisons (e.g. DAX forgiving with different data types)o Is there something we would be converting or using in DAX that isn’t optimized for DQ

Are Translations supported?o (Model – Translations)o Do you see summarize columns

Are reporting properties (e.g. Default Field Sets) supported?o Ye

14

Page 15: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

Links/References:

What’s New in SSAS?https://blogs.msdn.microsoft.com/analysisservices/2016/03/25/on-demand-webinar-whats-new-for-sql-server-2016-analysis-services-available-now/

What’s New Overview:https://msdn.microsoft.com/en-us/library/bb522628.aspx

Super DAXhttps://blogs.msdn.microsoft.com/analysisservices/2015/09/02/whats-new-in-microsoft-sql-server-analysis-services-tabular-models-in-sql-server-2016-ctp-2-3/

Direct Queryhttps://msdn.microsoft.com/en-us/library/hh230898.aspx

DAX Compatibilityhttps://msdn.microsoft.com/library/mt723603(SQL.130).aspx

Direct Query and Hybrid Mode for SSAS Tabular 2012-2014http://www.mssqlgirl.com/slide-deck-directquery-vs-vertipaq-for-pass-dwbi-vc.htmlhttp://www.mssqlgirl.com/hybrid-mode-in-tabular-bi-semantic-model-part-1.htmlhttp://www.mssqlgirl.com/hybrid-mode-in-tabular-bi-semantic-model-part-2.html

15

Page 16: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

Direct Query Test Project Images

16

Page 17: SSAS 2016 Tabular Direct Query vs In Memory - Web viewSQL Server, Azure SQL Database, Oracle, and Teradata. Stored procedures cannot be specified in a SQL statement to define tables

From DAX Query to SQL Statement

17