
SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

  • First steps with Scheduled Data Refresh for Power Query #powerbi #powerquery

    Just a few days before my session about Power Query at TechEd 2014, Microsoft released a new update that enables the scheduled data refresh of a Power Pivot workbook containing Power Query transformations.

    This is very good news, because it enables the data refresh of a number of different data sources. Even though the number of providers supported by this release is limited (only SQL Server and Oracle), you can use a SQL Server database as a bridge to access other data sources through views based on Linked Server connections.

    If you want to use this feature, first of all read carefully the Scheduled Data Refresh for Power Query blog post on the MSDN web site. It guides you through the steps required to enable the data source connection through the Data Management Gateway. As you will see, you need to create the data source connections corresponding to the databases your Power Query queries use, so you might skip the data source configuration if the corresponding databases are already enabled in the Power BI admin center. However, I suggest going through the steps described in that blog post at least the first time, because if the same database is accessed through two different drivers, it needs two different data sources. With that in mind, here are a number of notes that might help you avoid certain issues.

    • Power Query uses the .NET Framework Data Provider for SQL Server and the Oracle Data Provider for .NET, whereas Power Pivot by default creates a SQL Server connection using the SQL Server Native Client 11.0 (SQLNCLI11).
      • Even if you already created a data source for a SQL Server database that you refresh in a Power Pivot workbook, you have to create another data source for the same SQL Server database for Power Query, because the two use different drivers.
      • You might consolidate these data sources into one by changing the data provider in the advanced options of the Power Pivot configuration, but I am not sure this is a good idea. I would keep two versions of the data source, one for each provider, in case I use the same database in both connections.
    • Power Query creates one connection string in Excel for each query you create. The connection string contains the entire transformation, and when you copy it into the New Data Source page of the Power BI admin center, the internal query is analyzed to extract the required connections to SQL Server. If these connections are already configured as Power BI data sources, you don’t need to do anything else. I suggest iterating over all your queries following this step, until you are confident of the internals and you are sure the required data sources are already available.
      • Even if you create a single query in the M language that accesses different databases, the referenced connections will all be found, and each database will have a separate data source configuration in Power BI. I was worried that loading multiple tables from different databases on the same server would produce a single data source granting access to all the databases on that server, but luckily this does not happen and security is preserved!
    • I spotted an issue with certain DateTimeZone functions (DateTimeZone.FixedLocalNow, DateTimeZone.FixedUtcNow, DateTimeZone.LocalNow, and DateTimeZone.UtcNow) that do not seem to work with scheduled data refresh. You can read more about this issue in this thread on the Power Query MSDN forum. I found a workaround using the Table.Buffer function: by stopping query folding, the expression is not translated into SQL but evaluated directly by the Power Query engine (a minimal sketch follows this list). However, I hope this will be fixed soon.
    • A Power Query transformation that contains only a script, without accessing any data source, is currently not refreshed. This would be useful for generating a Date table; I opened another thread about this issue on the forum, and I hope there will be news on that, too.
      • In the same thread you will find another tip: literals in the form #literal, such as #table, are mis-analyzed by scheduled refresh, but at least for this issue there are workarounds available until Microsoft fixes it.
    • You can use SQL Server views based on linked servers to overcome the limitation of the providers currently supported by the Data Management Gateway (which is the component used by scheduled data refresh).
    • Now that it is possible to publish SSIS packages as OData Feed Sources, you can expose a SQL Server view to Power BI and, by accessing it from Power Pivot or Power Query, execute SSIS packages at refresh time. If the package does not take too long to execute (a long-running package would time out the connection), this is a smart way to run some small “corporate ETL” in sync with the data refresh on Power BI, without relying on synchronized schedules (which are always one more thing to maintain). This further extends the range of providers you can use with scheduled data refresh.
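
    As a side note on the Table.Buffer workaround mentioned above, here is a minimal M sketch (the server, database, and table names are hypothetical) that stops query folding before using one of the affected DateTimeZone functions:

    let
        // Hypothetical source: any SQL Server query that would normally fold
        Source = Sql.Database("myserver", "mydb"),
        Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
        // Table.Buffer stops query folding: the steps after this one are
        // evaluated by the Power Query engine instead of being translated to SQL
        BufferedSales = Table.Buffer(Sales),
        WithLoadTime = Table.AddColumn(BufferedSales, "LoadedAt", each DateTimeZone.UtcNow())
    in
        WithLoadTime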

    I would like to get more detailed errors when something goes wrong and scheduled data refresh stops, but this is a good start.

  • LINQ to DAX project on CodePlex #dax #tabular #ssas

    Since its release, I've seen a number of scenarios where Analysis Services Tabular is the analytical engine of the reporting section in a larger system. In these situations, at least part of the queries sent to Analysis Services are DAX queries generated by code (as a consequence of user interaction or for other reasons).

    Since DAX knowledge is not very common among developers, a LINQ to DAX query provider is more than welcome to simplify DAX code generation. Dealogic, which was among the first companies I helped with Tabular adoption a few years ago, invested time in a first version of the LINQ to DAX provider, which is now available on CodePlex and open to contributions.

    I looked at the features available and at the DAX code generated, and it already looks very interesting. You can generate good DAX queries without knowing DAX (which I suggest studying anyway!), and LINQ to DAX does a good job of splitting conditions between CALCULATETABLE, SUMMARIZE, ADDCOLUMNS, and FILTER, in the spirit of the illustration below. Kudos to György Farkas for his effort!
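
    To give an idea of the query shape involved, here is a hypothetical hand-written example of such a combination (not actual output of the provider; the table names are illustrative):

    EVALUATE
    CALCULATETABLE (
        ADDCOLUMNS (
            SUMMARIZE ( Sales, Product[Category] ),
            "Total Quantity", CALCULATE ( SUM ( Sales[Quantity] ) )
        ),
        FILTER ( ALL ( 'Date'[Year] ), 'Date'[Year] = 2014 )
    )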

  • How to handle fact tables with different granularities in #dax #powerpivot #tabular

    A common question I receive from Excel users learning Power Pivot is how to handle tables that have different granularities. Actually, this is not the terminology they use: the concept of “table granularity” is used mostly by Kimball practitioners, who immediately recognize this scenario as a “two fact tables with different granularities” pattern. In Power Pivot this situation is often the source of many troubles for Excel users, mostly because it is not clear how to apply correct data modeling.

    Moreover, even those coming from a Multidimensional background may not know how to handle relationships between fact tables and a dimension at different granularities. In Multidimensional you can define the dimension relationship at any (attribute) hierarchical level, but this feature is not available in Tabular. In reality, we have two options; for example, when you have data at the product category level and you want to join the product dimension, you can:

    • Conform the relationship at the dimension granularity level (product category), hiding the measures coming from the fact table when the value is not valid (product name)
    • Import the fact table without defining a relationship in the data model, and simulate the relationship (at the product category level) using a DAX expression that applies the corresponding filter at query time (see the sketch after this list)
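
    As a minimal sketch of the second option, assume a hypothetical Budget table that has one row per product category and no relationship with the Product table:

    Budget Amount :=
    CALCULATE (
        SUM ( Budget[Amount] ),
        FILTER (
            Budget,
            CONTAINS (
                VALUES ( Product[Category] ),
                Product[Category],
                Budget[Category]
            )
        )
    )

    For each Budget row, CONTAINS checks whether its category is among the categories visible in the current filter context, which simulates a relationship at the category level.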

    I wrote an article about handling different granularities on the www.daxpatterns.com website, describing these two options in more detail and providing practical examples. I think both techniques are useful: simulating the relationship in DAX is more flexible for many reasons, but there could be scenarios where the data volume suggests using the approach based on a physical relationship, creating a dummy value in the dimension. As always, I would use the simpler approach unless you think performance is not good enough, and only at that point evaluate which pattern performs better.

    As a side comment: I don’t know which approach is “simpler”, because simulating a relationship requires a more verbose DAX formula, whereas the relationship based on a dummy item requires some work at the ETL level and pollutes the dimension table with items that are not strictly required (with the relevant exception of a Date table, where you might use an existing day as the dummy element). But the budget scenario will be the subject of a dedicated pattern very soon…

  • Upcoming conference speeches and workshops #ssas #tabular #dax #powerpivot

    Between May and July, Alberto and I will speak at several conferences, and I think it could be useful to write a single blog post with a recap:

    We will also deliver several courses:

    See you around the world! 

  • The ISEMPTY function in #dax #powerpivot #tabular

    Microsoft silently added the ISEMPTY function to the DAX language in Analysis Services build 11.00.3368 (SQL Server 2012 SP1 CU4). This function is particularly important in DAXMD (when you use DAX to query a Multidimensional model), because it produces a much better execution plan in OLAP than the alternatives based on COUNTROWS.

    There is an advantage in using it in Tabular/Power Pivot models too, even if there is an issue using it in Power Pivot. You can upgrade Power Pivot for Excel 2010 (you can download the new version as part of the cumulative update released for SQL Server), but you cannot upgrade Power Pivot in Excel 2013 the same way: that happens only through Office updates, and up to now this feature has not been added to the released versions of Excel 2013 (at least until version 15.0.4605.1003). The funny thing is that, if you have Power Pivot for SharePoint, you can have a server able to use new features (such as the ISEMPTY function) but you are not able to create an Excel 2013 file using them!

    Last week I wrote an article on SQLBI that describes the available syntaxes you can use to check for an empty table condition in DAX. You will find a few code examples there, and a minimal sketch follows below. I hope that Microsoft will soon release an upgrade for Power Pivot in Excel 2013, too.
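
    As a minimal sketch (the table and measure names are hypothetical), these two measures return the same result, but the first one uses the new function and gets the better execution plan:

    Has Sales (ISEMPTY) :=
    NOT ISEMPTY ( Sales )          -- requires build 11.00.3368 or later

    Has Sales (COUNTROWS) :=
    COUNTROWS ( Sales ) > 0        -- works on older builds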

  • ABC Analysis in #dax: complete pattern and other links #powerpivot #tabular

    I recently published the ABC Classification article on www.daxpatterns.com, which is a more structured and organized recap of what I already described on this blog a few years ago (see ABC Analysis in PowerPivot). The pattern describes how to implement the classification through calculated columns, so we consider it a specialization of the Static Segmentation pattern; a condensed sketch follows below. You can also implement it as a measure, as a Dynamic Segmentation, and Gerhard Brueckl already described such an implementation on his blog. I am not sure about creating a pattern for the dynamic version, because of the performance issues that could arise even with a few thousand items to classify.
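
    As a condensed sketch of the calculated-column approach (the Products and Sales names are hypothetical, and the 70%/90% thresholds are illustrative), on the Products table you can define:

    Products[ProductSales] =
    CALCULATE ( SUM ( Sales[Amount] ) )

    Products[CumulatedSales] =
    CALCULATE (
        SUM ( Sales[Amount] ),
        FILTER (
            ALL ( Products ),
            Products[ProductSales] >= EARLIER ( Products[ProductSales] )
        )
    )

    Products[ABC Class] =
    SWITCH (
        TRUE (),
        Products[CumulatedSales]
            <= 0.70 * CALCULATE ( SUM ( Sales[Amount] ), ALL ( Products ) ), "A",
        Products[CumulatedSales]
            <= 0.90 * CALCULATE ( SUM ( Sales[Amount] ), ALL ( Products ) ), "B",
        "C"
    )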

    Any feedback on this is welcome; we already have other patterns in the works, but we can always change the prioritization based on comments!

  • Create Excel Dashboards working on Excel for iPad #excel #ipad #dashboard

    I recently tried Excel for iPad, opening several workbooks. The results are pretty good, but I found that it wasn’t possible to display certain workbooks. For example, opening a workbook that contains many CUBEVALUE formulas, I should have seen this result:

    [Screenshot: dashboard rendered correctly]

    However, sometimes I got an error saying that the workbook could not be updated (the screenshot is in Italian because the test was made on an iPad configured with that language):

    [Screenshot: error message stating the workbook cannot be updated]

    What happened? Thanks to Dan Parish (Microsoft), I realized that opening the workbook triggered an automatic refresh, which stumbled over the CUBEVALUE functions I used extensively. If this happens, you have two possible workarounds:

    1. Disable automatic refresh
    2. Replace CUBEVALUE with GETPIVOTDATA

    The reason the conditional formatting goes away is that conditional formatting doesn’t work on error cells, and neither do charts. If the automatic recalc starts when the workbook is opened, every CUBE* function fails because it can’t fetch any data (external data is not supported in this version of Excel). At this point I wondered why the automatic recalc was happening, because I hadn’t enabled such a condition in the Excel workbook. Dan’s explanation: “The reason it wasn’t happening for me is that all the Excels, across all platforms, have very complicated logic to determine when they need to recalculate. If you saved this from a previous version of Excel (earlier than 2013), or one of several other things occurred, we’ll recalculate it on open. In some cases where we feel it’s safe not to, however, we don’t, as a performance improvement. That’s what I was seeing.”

    I am not sure the different locale settings were the reason why the automatic refresh was happening. However, when I set the calculation mode to Manual (Formulas ribbon –> Calculation Options –> Manual), the problem of the automatic refresh went away and I was able to open the workbook. If you don’t want to rely on disabling automatic refresh (after all, changing the setting to Manual affects usability when you open the workbook in Excel on Windows), another approach is to rely on GETPIVOTDATA functions instead of CUBE*. The conversion can be time consuming, but in reality it’s not so different from using CUBE* functions if you design the dashboard this way from the start. This will be a topic for a future, longer article. But there is an interesting advantage to using GETPIVOTDATA if you use Power Pivot or Tabular: performance is better than with CUBE* functions (the opposite is true for OLAP cubes).

  • Create Custom Time Intelligence Calculations in #dax #powerpivot #tabular

    The recent Time Patterns article published on www.daxpatterns.com contains many DAX formulas that I hope will be useful to anyone interested in implementing time-related calculations in DAX without relying on the time intelligence functions (a minimal example follows the list below). There are several reasons for doing that:

    • Custom calendar: if you have special requirements for your calendar (such as week-based and ISO 8601 calendars), you cannot use the standard DAX time intelligence functions.
    • DirectQuery: if you enable DirectQuery, time intelligence functions are not supported.
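
    For example, here is a minimal year-to-date sketch without time intelligence functions, assuming a hypothetical 'Date' table with Year and Date columns and a [Sales Amount] measure:

    Sales YTD :=
    CALCULATE (
        [Sales Amount],
        FILTER (
            ALL ( 'Date' ),
            'Date'[Year] = MAX ( 'Date'[Year] )
                && 'Date'[Date] <= MAX ( 'Date'[Date] )
        )
    )

    Because the filter uses only plain table functions and comparisons, the same shape works with custom calendars (replace Year with a week-based year column) and in DirectQuery mode.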

    I chose to use a standard month calendar for the complete pattern, because it’s a more complete example of the calculations required. In fact, the ISO calendar has simpler requirements for comparisons over different periods, and I also have another example for that in the Week-Based Time Intelligence in DAX article published on SQLBI more than one year ago.

    As usual, feedback is welcome!

  • Calculate Distinct Count in a Group By operation in Power Query #powerquery #powerbi

    The current version of Power Query does not have a user interface to create a Distinct Count calculation in a Group By operation. However, you can do this in “M” with a simple edit of the code generated by the Power Query window.

    Consider the following table in Excel:

    [Screenshot: source Sales table in Excel]

    You want to obtain a table containing the number of distinct products bought by each customer. You create a query starting from the table:

    [Screenshot: creating a query from the table]

    You keep in the query only the columns required for the group by and for the distinct count calculation, removing the others. For example, select the Product and Customer columns, right-click, and choose Remove Other Columns.

    [Screenshot: Remove Other Columns applied to the query]

    Select the Customer column and click the Group By transformation. You see a dialog box that, by default, creates a column counting the rows of each group.

    [Screenshot: Group By dialog box]

    This query counts how many transactions have been made by each customer, and you don’t have a way to apply a distinct count calculation. At this point, simply change the query from this:

    let
        Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
        RemovedOtherColumns = Table.SelectColumns(Source,{"Product", "Customer"}),
        GroupedRows = Table.Group(RemovedOtherColumns, {"Customer"}, {{"Count", each Table.RowCount(_), type number}})
    in
        GroupedRows

    To this:

    let
        Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
        RemovedOtherColumns = Table.SelectColumns(Source,{"Product", "Customer"}),
        GroupedRows = Table.Group(RemovedOtherColumns, {"Customer"}, {{"Count", each Table.RowCount(Table.Distinct(_)), type number}})
    in
        GroupedRows

    The Table.RowCount function counts how many rows exist in the group. By wrapping the group table in Table.Distinct before counting it, duplicate rows are removed, so the count returns the correct number of distinct products.

    [Screenshot: result with the distinct count per customer]

    I hope the Power Query team will implement a distinct count option in the user interface. In the meantime, you can apply this easy workaround.

  • Optimize DISTINCTCOUNT in #dax with SQL Server 2012 SP1 CU 9 #ssas #tabular

    If you use DISTINCTCOUNT measures in DAX, you know performance is usually great, but you might also have observed that performance slows down when the resulting number is high (depending on other conditions, it starts degrading when the result is between 1 and 2 million).
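
    A typical measure affected by this behavior is a plain distinct count over a high-cardinality column (the names here are hypothetical):

    Unique Customers := DISTINCTCOUNT ( Sales[CustomerKey] )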

    If you have seen that, there is good news. Microsoft fixed this issue (KB2927844) in SQL Server 2012 SP1 Cumulative Update 9, and the performance improvement is amazing. With this fix, I have queries that previously ran in 15 seconds (cold cache) now running in less than 5 seconds. So if you have Tabular databases with a column containing more than 1 million distinct values, you should probably test this update. It is also available for Power Pivot for Excel 2010, but not for Excel 2013 (as far as I know, Power Pivot for Excel 2013 updates are included in Excel updates). You can request SP1 CU9 here: http://support.microsoft.com/kb/2931078.

    Please consider that the build of Analysis Services 2012 that fixes this issue is 11.0.3412 (so a later build should not require this hotfix – a useful note for readers coming here in the future, when newer builds are available).

    UPDATE 2014-07-22: for the following major release, Analysis Services 2014, the fix was released after RTM. You need build 12.00.2342 (or a later one), which is available in Cumulative Update 1 for the RTM version.

  • Common Statistical #DAX Patterns for #powerpivot and #tabular

    DAX includes several statistical functions, such as average, variance, and standard deviation. Other common algorithms require some DAX code, so we published an article about common Statistical Patterns on www.daxpatterns.com.

    I think the median and percentile implementations are the most interesting patterns, because performance might be very different depending on the implementation. I am sure that a native DAX implementation of such algorithms would be much better, but in the meantime you can just copy and paste the formulas presented in the article!
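
    To give an idea of the kind of formula involved, here is a minimal sketch of one statistic that has no native DAX function (not necessarily one covered by the article; the Data table and Value column are hypothetical, and the values must be positive):

    Geometric Mean :=
    EXP ( AVERAGEX ( Data, LN ( Data[Value] ) ) )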

  • How to implement classification in #DAX #powerpivot #ssas #tabular

    In the last two weeks we published two new patterns on www.daxpatterns.com:

    These two patterns offer solutions to the general problem of classifying an item by the value of a measure or of a column in your Power Pivot or Tabular data model. For example, you might create groups of products based on price or on sales volume. The difference between the two techniques is that static segmentation applies the classification using calculated columns (so it is computed in advance, at refresh time, and is not affected by filter selections in queries), whereas dynamic segmentation performs the classification at query time (so it is slower, but it considers the filters applied to queries). A minimal sketch of the static technique follows.
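
    This sketch assumes hypothetical Products and Segments tables, where Segments holds the non-overlapping price boundaries of each class and has no relationship with Products:

    Products[Price Segment] =
    CALCULATE (
        VALUES ( Segments[SegmentName] ),
        FILTER (
            Segments,
            Products[UnitPrice] >= Segments[MinPrice]
                && Products[UnitPrice] < Segments[MaxPrice]
        )
    )

    The FILTER argument is evaluated in the row context of the calculated column, so each product picks the single segment whose price range contains its unit price.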

    In my experience, many people want to use the dynamic approach, but they often realize later that static segmentation was the right choice, not just for performance but mainly for ease of use.

  • Amsterdam PASS UG Meeting on March 18 #dax #tabular #powerpivot

    I will be in Amsterdam for the Advanced DAX Workshop on March 17-19, 2014 (hint: there are still a few seats available if you want to do a last-minute registration), and on the evening of March 18 I will speak at a PASS Nederland UG meeting (between 18:30 and 21:00) that you can attend for free by registering here.

    Here are the two topics I will present at the UG meeting, in two relatively short 45-minute sessions:

    DAX Patterns

    Do you know that great feeling when you are struggling to find a formula, spending hours writing nonsense calculations until a light turns on in your brain, your fingers move rapidly on the keyboard and, after a quick debug, DAX starts to compute exactly what you wanted? This session shows some of these scenarios, spending some time looking at the pattern of each one, discussing the formula and its challenges and, at the end, writing the formula. Scenarios include custom calendars, budget patterns, and related distinct counts. A medium knowledge of the DAX language will let you get the best out of the session.

    DAX from the Field: Real-World Case Studies

    In this session, we will dive into lessons from the field, where real customers are using DAX to solve complex problems well beyond the Adventure Works scenarios. How do you make a database fit in memory if it doesn’t fit? How do you handle billions of rows with a complex calculation? What tools can you use to benchmark and choose the right hardware? How do you scale up performance on both small and large databases? What are the common mistakes in DAX formulas that might cause performance bottlenecks? These are just a few questions we will answer by looking at best practices that are working for real customers. In this session, you will learn efficient DAX solutions and how far you can push the limits of the system.

  • LASTDATE vs. MAX? CALCULATETABLE vs. FILTER? It depends! #dax #powerpivot #tabular

    A few days ago I published the article FILTER vs CALCULATETABLE: optimization using cardinality estimation, where I try to explain why the sentence “CALCULATETABLE is better than FILTER” is not always true. In reality, CALCULATETABLE might internally use FILTER for every logical expression you use as a filter argument. What really matters is the cardinality of the table iterated by the FILTER, regardless of whether it is an explicit statement or an implicit one generated automatically by CALCULATETABLE.

    In addition to the article, there is a digression related to the use of time intelligence functions, which return a table and not a scalar value. These functions (such as DATESBETWEEN and LASTDATE) might seem better than FILTER, but this is not necessarily true.

    For example, consider this statement:

    CALCULATE (
        SUM ( Movements[Quantity] ),
        FILTER (
            ALL ( 'Date'[Date] ),
            'Date'[Date] <= MAX ( 'Date'[Date] )
        )
    )

    Can we avoid the FILTER statement by using DATESBETWEEN? Yes, we can replace the filter with the following expression:

    CALCULATE (
        SUM ( Movements[Quantity] ),
        DATESBETWEEN (
            'Date'[Date],
            BLANK (),
            MAX ( 'Date'[Date] )
        )
    )

    Is this faster? No. DATESBETWEEN is executed by the formula engine, so it’s not better than FILTER. But there is more. You might wonder why I’m using MAX instead of LASTDATE. Well, in the FILTER example there was a semantic reason: I would have obtained a different result. LASTDATE returns a table, not a scalar value, even if it is a table containing only one row, which can be converted into a scalar value. More importantly, LASTDATE performs a context transition, which would transform the row context produced by the FILTER iteration into a filter context, hiding the existing filter context that I wanted to consider in my original expression. In DATESBETWEEN I don’t have this issue, so I can write it using LASTDATE and obtain the same result:

    CALCULATE (
        SUM ( Movements[Quantity] ),
        DATESBETWEEN (
            'Date'[Date],
            BLANK (),
            LASTDATE ( 'Date'[Date] )
        )
    )

    But this is not free: the LASTDATE function produces a more expensive execution plan in this case. Consider using LASTDATE only as a filter argument of CALCULATE/CALCULATETABLE, such as:

    CALCULATE (
        SUM ( Movements[Quantity] ),
        LASTDATE ( 'Date'[Date] )
    )

    At the end of the day, a filter argument in a CALCULATE function has to be a table (of values in one column, or of rows in a table), so using a table expression as a filter argument is fine: a table is expected there, and there are no context transitions. But think twice before using LASTDATE where a scalar value is expected; using MAX is a smarter choice.

  • Expert Cube Development new edition now available! #ssas #multidimensional

    The new edition of the advanced OLAP book, now called “Expert Cube Development with SSAS Multidimensional Models”, is now available. The previous edition was titled “Expert Cube Development with Microsoft SQL Server 2008 Analysis Services”, and the biggest issue of the book was… the title! In fact, there haven’t been major changes in Multidimensional since that release, even though there have been three new releases of SQL Server (2008 R2, 2012, and 2014). For this reason we removed the product version from the title. In terms of content, don’t expect any particular change: we only added a small appendix about support for DAX queries, available with Analysis Services 2012 SP1 CU4 and later versions.

    I would like to highlight that, as Chris said:

    • If you already have the first edition, you’re probably not interested, because you will not find new content here (just bug fixes and new screenshots)
    • The book is about SSAS Multidimensional models; if you are interested in Tabular, we have another excellent book on that topic!
    • This is an advanced book: if you are a beginner with Multidimensional, wait until you are more proficient before starting this book. The few negative reviews we received were from readers who tried to use this book to learn Multidimensional or as a step-by-step guide. We’d like to set the right expectations, so you don’t buy a book you don’t need.

    If you want to buy it, here are a few useful links:

    Happy reading!
