
SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Using and optimizing DirectQuery in #powerbi and #ssas #tabular

More than one year ago, Alberto Ferrari and I started to work on DirectQuery, exploring the new implementation that appeared in Analysis Services 2016 for Tabular models, which uses the same engine as Power BI. The previous version of DirectQuery, introduced in SSAS Tabular 2012, was very slow and suffered from performance issues even with small models. We never saw it adopted in a production environment: it was very hard to optimize, and there were too many restrictions on the data model (for example, no MDX queries meant no PivotTable in Excel as a client, and no time intelligence meant complex DAX code to write even for trivial calculations).

For these reasons, when we worked on Analysis Services 2016 and Power BI, we wanted to push the new version of DirectQuery, which removes many limitations of the first implementation, to discover its limits and where it could be used in a real production environment. The result of our efforts is now available in a white paper published by Microsoft: DirectQuery in Analysis Services 2016. The document contains a detailed step-by-step guide to implementing DirectQuery in your model, and then investigates how DirectQuery transforms a DAX or MDX query into one or more queries in SQL. If you want a quick answer to the question “should we use DirectQuery?”, then the answer is… it depends!
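As a rough illustration of the kind of translation the paper analyzes, consider a DAX query that sums Sales[Amount] by Product[Color]. DirectQuery pushes the aggregation down to the relational source as a single SQL query along these lines (the schema here is hypothetical, and the SQL actually generated by the engine is more verbose, typically wrapping each table in a subselect):

```sql
-- Simplified sketch of the SQL a DirectQuery model might send for a
-- DAX query grouping SUM(Sales[Amount]) by Product[Color].
-- Table and column names are hypothetical.
SELECT
    p.[Color],
    SUM(s.[Amount]) AS [Amount]
FROM [dbo].[Sales] AS s
INNER JOIN [dbo].[Product] AS p
    ON s.[ProductKey] = p.[ProductKey]
GROUP BY p.[Color];
```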

You have to set your expectations at the right level. You can use DirectQuery successfully under certain conditions (database size, frequency of refresh, capacity and performance of your RDBMS server), but you certainly have to test the particular workload you want to apply to it. The best suggestion is: try it and measure it. The whitepaper provides many hints about what you should try and what you should expect. The specific measurements we made could be different today, because new releases of this technology arrive often, and we can expect more improvements going forward. But you can certainly start spending time on tests now, and once you understand how it works, you can figure out the scenarios where you might want to adopt it.
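For example, a low-tech way to measure the relational side is to capture the SQL statements that DirectQuery sends (with SQL Server Profiler or Extended Events) and replay them in SQL Server Management Studio with timing enabled; the statement below is only a hypothetical placeholder:

```sql
-- Replay a captured DirectQuery statement with timing enabled;
-- CPU and elapsed times appear in the Messages tab of SSMS.
SET STATISTICS TIME ON;

-- Paste the captured DirectQuery statement here (placeholder below).
SELECT SUM([Amount]) AS [Amount]
FROM [dbo].[Sales];

SET STATISTICS TIME OFF;
```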

The white paper also applies to Power BI: the engine is the same, we will probably see many improvements very soon, and the version of the engine you are using may already be much better than the one we used when writing the whitepaper. However, the basic concepts are the same, and you can see and measure the improvements by repeating the tests we describe in the document.

Off the record, and in a very informal way, I would like to highlight that you should approach DirectQuery with the right attitude. Don’t expect a magic box, and don’t assume it will be bad: it could surprise you either way! Most important, think about why you might want to use this technology.

The right reasons for using DirectQuery are:

  • You need real-time queries. The processing window creates latency that you cannot accept. Good reason. Just ask yourself whether your customers are really ready for real-time queries in a dashboard. You might discover they are not.
  • You have a small database to query, and it changes often.
  • You have a large amount of data, too big to fit in memory, and you *don’t* need queries returning a result in less than 2 seconds.

There are also two very wrong reasons for choosing DirectQuery:

  • Saving money. Considering hardware + software + maintenance, I think the in-memory Tabular model has a lower cost for the same level of performance.
  • Improving query performance. On the same hardware, it’s very hard. On different hardware, maybe… but see the previous point about cost. Also consider network latency (you run the RDBMS on another server, right?).

I don’t want to scare you. I just want to set the right level of expectations. With that in mind, you can implement successful projects using DirectQuery today.

Published Wednesday, April 19, 2017 1:07 PM by Marco Russo (SQLBI)


Comments

jerome said:

Another reason is having multiple Tabular models connected to the same data, when you don't want to spend hours processing the same data over and over and filling your memory with redundant, rarely used data.

Today we have to set up too many models because there is no "shared table" feature in Tabular. In OLAP we can have one dimension used in multiple cubes, and we can also have linked cubes, which is not possible in Tabular.

Another limitation in Tabular: we can't hide tables or columns from some users; they can access the full model all the time, so we have to create multiple models to implement good security.

All of these duplicated models consume a lot of RAM for the same information, so DirectQuery mode simplifies this: the data is stored only once, at the database level.

Part of the added value of DirectQuery is having access to more details and more attributes: you can have thousands of columns that cost you no RAM. (This saved me a huge amount of rework.)

Finally... with SQL Server 2016 and vNext, which offer clustered columnstore indexes on in-memory tables, you have a really fast database that comes close to the performance of Tabular. And using views that UNION ALL an in-memory table and a standard table, you can implement a "multiple partitions" strategy: live (or recent) data in memory and historical data on disk, the same way we can set up partitions in an OLAP cube (MOLAP and ROLAP partitions).
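A minimal sketch of that pattern, assuming SQL Server 2016 or later with In-Memory OLTP enabled; all table names here are hypothetical:

```sql
-- Hot data: memory-optimized table with a clustered columnstore index.
CREATE TABLE dbo.SalesHot (
    SaleID   bigint NOT NULL PRIMARY KEY NONCLUSTERED,
    SaleDate date   NOT NULL,
    Amount   money  NOT NULL,
    INDEX ccix CLUSTERED COLUMNSTORE
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- Cold data: disk-based table with a clustered columnstore index.
CREATE TABLE dbo.SalesCold (
    SaleID   bigint NOT NULL,
    SaleDate date   NOT NULL,
    Amount   money  NOT NULL,
    INDEX ccix CLUSTERED COLUMNSTORE
);

-- The DirectQuery model reads from this view instead of the base tables.
CREATE VIEW dbo.Sales AS
SELECT SaleID, SaleDate, Amount FROM dbo.SalesHot
UNION ALL
SELECT SaleID, SaleDate, Amount FROM dbo.SalesCold;
```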

So is DirectQuery mode really that bad, when it comes with so many features that fill the gaps we have in Tabular today?

Today I start by creating DirectQuery models first, and build processed (in-memory) models only if I have the time to process the data and performance is really important.

In the end, we have the choice, and that is what matters most!

April 19, 2017 6:52 AM

Marco Russo (SQLBI) said:

Yes, I agree that choices are important, and I see your concern about duplicated tables when you have multiple models. SSAS 2017 has object-level security, which will solve one of the issues you raised. A columnstore index is not as fast as VertiPaq, but if your database is small enough, the performance provided by SQL Server could be enough for your needs.

May 4, 2017 10:55 PM


About Marco Russo (SQLBI)

Marco Russo is a consultant, writer, and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distributing resources useful to BI developers, such as Integration Services components, Analysis Services models, tools, and technical information. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, and MCSE+I.
