
SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

  • How to handle fact tables with different granularities in #dax #powerpivot #tabular

    A common question I receive from Excel users learning Power Pivot is how to handle tables that have different granularities. In reality, this is not the terminology they use: the concept of “table granularity” is used mostly by Kimball practitioners, who immediately recognize this scenario as the “two fact tables with different granularities” pattern. In Power Pivot this situation causes many troubles for Excel users, mostly because it is not clear how to apply data modeling correctly.

    Moreover, even people coming from a Multidimensional background do not know how to handle relationships between fact tables and a dimension at different granularities. In Multidimensional you can define the dimension relationship at any (attribute) hierarchical level, but this feature seems unavailable in Tabular. In reality, we have two options; for example, when you have data at the product category level and you want to join the product dimension:

    • Conform the relationship at the dimension granularity level (product category), hiding the measures coming from the fact table when the value is not valid (product name)
    • Import the fact table without defining a relationship in the data model, and simulate the relationship (at the product category level) using a DAX expression that applies a corresponding filter at query time (see the sketch after this list)
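
    For example, here is a minimal sketch of the second option, assuming a hypothetical Budget table storing data at the category level and a Product dimension with a Category column (the table and column names are illustrative, not taken from the article):

    Budget Amount :=
    CALCULATE (
        SUM ( Budget[Amount] ),
        FILTER (
            Budget,
            CONTAINS (
                VALUES ( 'Product'[Category] ),
                'Product'[Category],
                Budget[Category]
            )
        )
    )
    -- CONTAINS returns TRUE when the category of the current Budget row is among
    -- the categories visible in the filter context, simulating the relationship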

    I wrote an article about handling different granularities on the www.daxpatterns.com website, describing these two options in more detail and providing practical examples. I think that both techniques are useful: simulating the relationship in DAX is more flexible for many reasons, but there could be scenarios where the data volume suggests using an approach based on a physical relationship, creating a dummy value in the dimension. As always, I would use the simpler approach unless you think that performance is not good enough, and only at that point would I evaluate which pattern performs better.

    As a side comment: I don’t know which approach is “simpler”, because simulating a relationship requires a more verbose DAX formula, whereas the relationship based on a dummy item requires some work at the ETL level and pollutes the dimension table with items that are not strictly required (with the relevant exception of a Date table, where you might use an existing day as a dummy element). But the budget scenario will be the subject of a dedicated pattern very soon…

  • Upcoming conference speeches and workshops #ssas #tabular #dax #powerpivot

    Between May and July, Alberto and I will speak at several conferences, and I think it could be useful to write a single blog post with a recap:

    We will also deliver several courses:

    See you around the world! 

  • The ISEMPTY function in #dax #powerpivot #tabular

    Microsoft silently added the ISEMPTY function to the DAX language in Analysis Services build 11.00.3368 (SQL Server 2012 SP1 CU4). This function is particularly important in DAXMD (when you use DAX to query a Multidimensional model), because it produces a much better execution plan in OLAP than the alternatives based on COUNTROWS.
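
    As a minimal sketch of the difference, assuming a Sales table in the data model (the measure names are illustrative):

    Sales Is Empty := ISEMPTY ( Sales )
    -- equivalent check based on COUNTROWS, which typically produces
    -- a worse execution plan in DAXMD:
    Sales Is Empty Alt := COUNTROWS ( Sales ) = 0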

    There is an advantage in using it in Tabular/Power Pivot models, too, even if there is an issue using it in Power Pivot. You can upgrade Power Pivot for Excel 2010 (you can download the new version of Power Pivot for Excel 2010 as part of the cumulative update released for SQL Server), but you cannot upgrade Power Pivot for Excel 2013, which is updated only through Office updates; up to now this feature has not been added to the released versions of Excel 2013 (at least until version 15.0.4605.1003). The funny thing is that, if you have Power Pivot for SharePoint, you can have a server that is able to use new features (such as the ISEMPTY function), but you are not able to create an Excel 2013 file using them!

    Last week I wrote an article on SQLBI that describes the available syntaxes you can use to check the empty table condition in DAX. You will find a few code examples there. I hope that Microsoft will soon release an upgrade for Power Pivot in Excel 2013, too.

  • ABC Analysis in #dax: complete pattern and other links #powerpivot #tabular

    I recently published the ABC Classification article on www.daxpatterns.com, which is a more structured and organized recap of what I already described in this blog a few years ago (see ABC Analysis in PowerPivot). The pattern describes how to implement the classification through calculated columns, so we consider it a specialization of the Static Segmentation pattern. You can also implement it as a measure, obtaining a Dynamic Segmentation, and Gerhard Brueckl already described such an implementation in his blog. I am not sure about creating a pattern for the dynamic version, because of the performance issues that could arise even with a few thousand items to classify.
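
    As a flavor of the static approach, here is a minimal sketch based on calculated columns, assuming hypothetical Products and Sales tables (the names and the 70%/90% thresholds are illustrative, not taken from the article):

    Products[ProductSales] =
    CALCULATE ( SUM ( Sales[Amount] ) )

    Products[CumulatedSales] =
    CALCULATE (
        SUM ( Products[ProductSales] ),
        FILTER (
            ALL ( Products ),
            Products[ProductSales] >= EARLIER ( Products[ProductSales] )
        )
    )

    -- classify by the cumulated share of total sales
    Products[ABC Class] =
    SWITCH (
        TRUE (),
        Products[CumulatedSales] <= 0.70 * SUM ( Products[ProductSales] ), "A",
        Products[CumulatedSales] <= 0.90 * SUM ( Products[ProductSales] ), "B",
        "C"
    )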

    Any feedback on this is welcome; we already have other patterns in the works, but we can always change priorities based on comments!

  • Create Excel Dashboards working on Excel for iPad #excel #ipad #dashboard

    I recently tried Excel for iPad, opening several workbooks. The results are pretty good, but I found that it wasn’t possible to display certain workbooks. For example, opening a workbook that contains many CUBEVALUE formulas, I should have seen this result:

    [Screenshot: Dashboard-Ok]

    However, sometimes I got an error saying that the workbook cannot be updated (the screenshot is in Italian because the test was made on an iPad configured with that language):

    [Screenshot: Dashboard-fail]

    What happened? Thanks to Dan Parish (Microsoft), I realized that the problem was that an automatic refresh was happening when opening the workbook, and it stumbled on the CUBEVALUE functions I used extensively. If this happens, you have two possible workarounds:

    1. Disable automatic refresh
    2. Replace CUBEVALUE with GETPIVOTDATA

    The reason why conditional formatting goes away is that conditional formatting doesn’t work on error cells, and neither do charts. If the automatic recalc starts when opening the workbook, every CUBE* function fails because it can’t fetch any data (external data is not supported in Excel for iPad). At this point I wondered why the automatic recalc was happening, because I hadn’t enabled such a setting in the Excel workbook. However, Dan’s explanation is: “The reason it wasn’t happening for me is because Excel (all the Excels across all platforms) also has very complicated logic to determine when it needs to recalculate. If you saved this from a previous version of Excel (earlier than 2013), or one of several other things occurred, we’ll recalculate it on open. In some cases where we feel it’s safe not to, however, we don’t, as a performance improvement. That’s what I was seeing.”

    I am not sure that the different locale settings were the reason why the automatic refresh was happening. However, when I set the calculation mode to Manual (Formulas ribbon –> Calculation Options –> Manual), the problem of automatic refresh went away and I was able to open the workbook. If you don’t want to rely on disabling automatic refresh (after all, changing the setting to Manual affects usability when you open the workbook from Excel on Windows), another approach is relying on GETPIVOTDATA functions instead of CUBE*. The operation can be time consuming, but in reality it’s not so different from using CUBE* functions if you design the dashboard this way from the start. This will be a topic for a future, longer article. But there is an interesting advantage in using GETPIVOTDATA if you use Power Pivot or Tabular: performance is better than with CUBE* functions (the opposite is true for OLAP cubes).

  • Create Custom Time Intelligence Calculations in #dax #powerpivot #tabular

    The recent Time Patterns article published on www.daxpatterns.com contains many DAX formulas that I hope will be useful to anyone interested in implementing time-related calculations in DAX without relying on the Time Intelligence functions (a minimal example follows the list below). There are several reasons for doing that:

    • Custom Calendar: if you have special requirements for your calendar (such as week-based and ISO 8601 calendars), you cannot use standard DAX time intelligence functions.
    • DirectQuery: if you enable DirectQuery, time intelligence functions are not supported.
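
    For example, here is a minimal sketch of a year-to-date measure written without time intelligence functions, assuming a Date table with Year and Date columns and a hypothetical [Sales Amount] measure (the names are illustrative):

    Sales YTD :=
    CALCULATE (
        [Sales Amount],
        FILTER (
            ALL ( 'Date' ),
            'Date'[Year] = MAX ( 'Date'[Year] )
                && 'Date'[Date] <= MAX ( 'Date'[Date] )
        )
    )
    -- the FILTER replaces the filter on Date with all the dates of the last
    -- visible year up to the last visible date, emulating TOTALYTD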

    I chose to use a standard month-based calendar for the complete pattern, because it’s a more complete example of the calculations required. In fact, the ISO calendar has simpler requirements for comparisons over different periods, and I published another example of that in the Week-Based Time Intelligence in DAX article on SQLBI more than one year ago.

    As usual, feedback is welcome!

  • Calculate Distinct Count in a Group By operation in Power Query #powerquery #powerbi

    The current version of Power Query does not have a user interface to create a Distinct Count calculation in a Group By operation. However, you can do this in “M” with a simple edit of the code generated by the Power Query window.

    Consider the following table in Excel:

    [Screenshot: DistinctPowerQuery_01]

    You want to obtain a table containing the number of distinct products bought by every customer. You create a query starting from the table:

    [Screenshot: DistinctPowerQuery_02]

    You keep in the query only the columns required for the group by and the distinct count calculation, removing the others. For example, select the Product and Customer columns, right-click, and choose Remove Other Columns.

    [Screenshot: DistinctPowerQuery_03]

    Select the Customer column and click the Group By transformation. You see a dialog box that by default creates a column counting the rows.

    [Screenshot: DistinctPowerQuery_04]

    This query counts how many transactions have been made by each customer, and you don’t have a way to apply a distinct count calculation. At this point, simply change the query from this:

    let
        Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
        RemovedOtherColumns = Table.SelectColumns(Source,{"Product", "Customer"}),
        GroupedRows = Table.Group(RemovedOtherColumns, {"Customer"}, {{"Count", each Table.RowCount(_), type number}})
    in
        GroupedRows

    To this:

    let
        Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
        RemovedOtherColumns = Table.SelectColumns(Source,{"Product", "Customer"}),
        GroupedRows = Table.Group(RemovedOtherColumns, {"Customer"}, {{"Count", each Table.RowCount(Table.Distinct(_)), type number}})
    in
        GroupedRows

    The Table.RowCount function counts how many rows exist in the group. By wrapping the group table in Table.Distinct, you remove duplicate rows before counting, so the query returns the number of distinct products for each customer.

    [Screenshot: DistinctPowerQuery_05]

    I hope the Power Query team will implement a distinct count option in the user interface. In the meantime, you can apply this easy workaround.

  • Optimize DISTINCTCOUNT in #dax with SQL Server 2012 SP1 CU 9 #ssas #tabular

    If you use DISTINCTCOUNT measures in DAX, you know performance is usually great, but you might also have observed that performance slows down when the resulting number is high (depending on other conditions, it starts decreasing when the result is between 1 and 2 million).
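
    For context, the measures affected are plain distinct counts over high-cardinality columns; a minimal sketch, assuming a hypothetical Sales table with a CustomerKey column:

    Unique Customers := DISTINCTCOUNT ( Sales[CustomerKey] )
    -- queries where the result reaches millions of distinct values
    -- are the ones that benefit most from the fix described below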

    If you have seen that, there is good news. Microsoft fixed this issue (KB2927844) in SQL Server 2012 SP1 Cumulative Update 9. The performance improvement is amazing: with this fix, I have queries previously running in 15 seconds (cold cache) now running in less than 5 seconds. So if you have Tabular databases with a column containing more than 1 million distinct values, it is probably worth testing this update. It is also available for Power Pivot for Excel 2010, but not for Excel 2013 (as far as I know, Power Pivot for Excel 2013 updates are included in Excel updates). You can request SP1 CU9 here: http://support.microsoft.com/kb/2931078.

    Please consider that the build of Analysis Services 2012 that fixes this issue is 11.0.3412 (so later builds should not require this hotfix; a useful note for readers coming here in the future, when newer builds are available).

    UPDATE 2014-07-22: for the following major release, Analysis Services 2014, the fix was released after RTM. You need build 12.00.2342 (or a later one), which is available in Cumulative Update 1 for RTM.

  • Common Statistical #DAX Patterns for #powerpivot and #tabular

    DAX includes several statistical functions, such as average, variance, and standard deviation. Other common algorithms require some DAX code, so we published an article about common Statistical Patterns on www.daxpatterns.com, including:

    I think that the Median and Percentile implementations are the most interesting patterns, because performance might be very different depending on the implementation. I am sure that a native DAX implementation of such algorithms would be much better, but in the meantime you can just copy and paste the formulas presented in the article!
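
    To give an idea of the technique, here is a simplified sketch of a median measure (it returns the lower median when the number of rows is even; the Data table and Value column are illustrative names, and the version in the article is more complete):

    Median of Value :=
    MINX (
        FILTER (
            VALUES ( Data[Value] ),
            -- count how many rows have a value not greater than the candidate value
            COUNTROWS ( FILTER ( Data, Data[Value] <= EARLIER ( Data[Value] ) ) )
                >= COUNTROWS ( Data ) / 2
        ),
        Data[Value]
    )
    -- the smallest value reached by at least half of the rows is the (lower) median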

  • How to implement classification in #DAX #powerpivot #ssas #tabular

    In the last two weeks we published two new patterns on www.daxpatterns.com:

    These two patterns offer solutions to the general problem of classifying an item by the value of a measure or of a column in your Power Pivot or Tabular data model. For example, you might create groups of products based on price or on sales volume. The difference between the two techniques is that static segmentation applies the classification using calculated columns (so it is computed in advance, at refresh time, and is not affected by filter selections in queries), whereas dynamic segmentation performs the classification at query time (so it is slower, but it considers the filters applied to queries).
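
    As a minimal sketch of the static technique, assuming a hypothetical PriceRanges configuration table (with MinPrice, MaxPrice, and ClassName columns) that has no relationship with the Products table (all names are illustrative):

    Products[Price Class] =
    CALCULATE (
        VALUES ( PriceRanges[ClassName] ),
        FILTER (
            PriceRanges,
            Products[UnitPrice] >= PriceRanges[MinPrice]
                && Products[UnitPrice] < PriceRanges[MaxPrice]
        )
    )
    -- the calculated column retrieves the single range containing the product price;
    -- the ranges must not overlap, otherwise VALUES fails with an error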

    In my experience, many people want to use the dynamic approach, but they often realize later that static segmentation was the right choice, not just for performance but mainly for ease of use.

  • Amsterdam PASS UG Meeting on March 18 #dax #tabular #powerpivot

    I will be in Amsterdam for the Advanced DAX Workshop on March 17-19, 2014 (hint: there are still a few seats available if you want to do a last-minute registration), and on the evening of March 18 I will speak at a PASS Nederland UG meeting (between 18:30 and 21:00) that you can attend for free by registering here.

    Here are the two topics I will present at the UG meeting, in two relatively short 45-minute sessions:

    DAX Patterns

    Do you know that great feeling when you are struggling to find a formula, spending hours writing nonsense calculations until a light turns on in your brain, your fingers move rapidly over the keyboard and, after a quick debug, DAX starts to compute exactly what you wanted? This session shows some of these scenarios, spending some time looking at the pattern of each one, discussing the formula and its challenges and, at the end, writing the formula. Scenarios include custom calendars, budget patterns, and related distinct counts. A medium knowledge of the DAX language will let you get the best out of the session.

    DAX from the Field: Real-World Case Studies

    In this session, we will dive into lessons from the field, where real customers are using DAX to solve complex problems well beyond the Adventure Works scenarios. How do you make a database fit in memory if it doesn’t fit? How do you handle billions of rows with a complex calculation? What tools can you use to benchmark and choose the right hardware? How do you scale up performance on both small and large databases? What are the common mistakes in DAX formulas that might cause performance bottlenecks? These are just a few questions we will answer by looking at best practices that are working for real customers. In this session, you will learn efficient DAX solutions and how far you can push the limits of the system.

  • LASTDATE vs. MAX? CALCULATETABLE vs. FILTER? It depends! #dax #powerpivot #tabular

    A few days ago I published the article FILTER vs CALCULATETABLE: optimization using cardinality estimation, where I try to explain why the sentence “CALCULATETABLE is better than FILTER” is not always true. In reality, CALCULATETABLE might internally use FILTER for every logical expression you use as a filter argument. What really matters is the cardinality of the table iterated by the FILTER, regardless of whether it’s an explicit statement or an implicit one generated automatically by CALCULATETABLE.

    In addition to the article, there is a digression related to the use of time intelligence functions, which return a table and not a scalar value. These functions (such as DATESBETWEEN and LASTDATE) might seem better than FILTER, but this is not necessarily true.

    For example, consider this statement:

    CALCULATE (
        SUM ( Movements[Quantity] ),
        FILTER (
            ALL ( 'Date'[Date] ),
            'Date'[Date] <= MAX ( 'Date'[Date] )
        )
    )

    Can we avoid the FILTER statement by using DATESBETWEEN? Yes, we can replace the filter with the following expression:

    CALCULATE (
        SUM ( Movements[Quantity] ),
        DATESBETWEEN (
            'Date'[Date],
            BLANK (),
            MAX ( 'Date'[Date] )
        )
    )

    Is this faster? No. DATESBETWEEN is executed by the formula engine; it’s not better than FILTER. But there is more. You might wonder why I’m using MAX instead of LASTDATE. Well, in the FILTER example there was a semantic reason: I would have obtained a different result. LASTDATE returns a table, not a scalar value, even if it is a table containing only one row, which can be converted into a scalar value. More important, LASTDATE performs a context transition, which would transform the row context produced by the FILTER iteration into a filter context, hiding the existing filter context that I wanted to consider in my original expression. In DATESBETWEEN I don’t have this issue, so I can write it using LASTDATE, obtaining the same result:

    CALCULATE (
        SUM ( Movements[Quantity] ),
        DATESBETWEEN (
            'Date'[Date],
            BLANK (),
            LASTDATE ( 'Date'[Date] )
        )
    )

    But this is not free. The LASTDATE function produces a more expensive execution plan in this case. Consider using LASTDATE only as a filter argument of CALCULATE/CALCULATETABLE, such as:

    CALCULATE (
        SUM ( Movements[Quantity] ),
        LASTDATE ( 'Date'[Date] )
    )

    At the end of the day, a filter argument in a CALCULATE function has to be a table (of values in one column, or of rows in a table), so using a table expression in a filter argument is fine, because in this case a table is expected and there are no context transitions. But think twice before using LASTDATE where a scalar value is expected: using MAX is a smarter choice.

  • Expert Cube Development new edition now available! #ssas #multidimensional

    The new edition of the advanced OLAP book, now called “Expert Cube Development with SSAS Multidimensional Models”, is now available. The previous edition was titled “Expert Cube Development with Microsoft SQL Server 2008 Analysis Services”, and the biggest issue of the book was… the title! In fact, there haven’t been major changes in Multidimensional since that release, even though there have been three new releases of SQL Server (2008 R2, 2012 and 2014). For this reason we removed the product version from the title. In terms of content, don’t expect any particular change. We only added a small appendix about support for DAX queries, available in Analysis Services 2012 SP1 CU4 and later versions.

    I would like to highlight that, as Chris said:

    • If you already have the first edition, you’re probably not interested, because you will not find new content here (just bug fixes and new screenshots)
    • The book is about SSAS Multidimensional models; if you are interested in Tabular, we have another excellent book on that topic!
    • This is an advanced book; if you are a beginner with Multidimensional, wait until you are more proficient before starting this book. The few negative reviews we received were from readers who tried to use this book to learn Multidimensional or as a step-by-step guide. We’d like to set the right expectations, so you avoid buying a book you don’t need.

    If you want to buy it, here are a few useful links:

    Happy reading!

  • Implement Parameters using Slicers in #powerpivot #dax #tabular

    Apparently, you cannot pass an argument to a DAX measure. In reality, you can interact with a slicer whose only purpose is to represent a parameter used by a DAX formula. You just create a table in the Power Pivot or Tabular data model, without creating any relationship with other tables in the same data model, and read the slicer selection in the measure (see the sketch below). This technique is similar to the tool dimension you can implement in Multidimensional, but it is simpler and somewhat more flexible. We described the DAX technique in the Parameter Table pattern, published on www.daxpatterns.com.
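
    Here is a minimal sketch of the idea, assuming a hypothetical disconnected Discounts table whose DiscountPct column is exposed through a slicer (the names are illustrative, not the ones used in the article):

    Discounted Sales :=
    IF (
        HASONEVALUE ( Discounts[DiscountPct] ),
        SUM ( Sales[Amount] ) * ( 1 - VALUES ( Discounts[DiscountPct] ) ),
        SUM ( Sales[Amount] )  -- no unambiguous selection: ignore the parameter
    )
    -- Discounts has no relationships: the slicer selection reaches the measure
    -- only through VALUES, which acts as the parameter value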

    In the article we provided several examples, including how to implement cascading parameters using two slicers, presenting only valid combinations of parameters. If you miss the DateTool dimension in Tabular, you will also see how to implement a Period table in DAX using the Parameter Table pattern, which is a good technique for injecting arguments and selecting algorithms in simulation models.

  • Connecting to #powerpivot from an external program (such as #Tableau)

    Many people have asked me how to connect to Power Pivot from an external program, without publishing the workbook on SharePoint or on Analysis Services Tabular. I always said it is not possible (for both technical and licensing reasons), but someone observed that Tableau is able to extract data from a Power Pivot data model by connecting directly to the xlsx file. I wanted to investigate how they solved the limitations that exist.

    From a technical point of view, you have to install the Tableau add-in for Power Pivot in Excel (it’s available for both 32 and 64 bit). Then you connect using Tableau Desktop, selecting the Microsoft Power Pivot connection. You choose a local Excel file and click Connect. The list of perspectives appears in a combo box.

    [Screenshot: connect]

    You click OK and you can navigate into the Power Pivot data model. But what’s happening? Tableau runs an Excel instance (probably by using the Excel object model) and then connects through the Tableau add-in for Power Pivot that you installed before. Probably this add-in acts as a bridge between the process running Excel and the process running Tableau. This solves the technical problem, and it would be interesting to know how to use the same add-in from other programs without having to write the same add-in again. I know many ISVs that would love to do that!

    But before starting your own Visual Studio project to do the same (after all, writing such a connector shouldn’t be rocket science), consider the license agreement (EULA) of Office. It says: “Except for the permitted use described under "Remote Access" below, this license is for direct use of the software only through the input mechanisms of the licensed computer, such as a keyboard, mouse, or touchscreen. It does not give permission for installation of the software on a server or for use by or through other computers or devices connected to the server over an internal or external network.” It seems we are in a gray area here. The access to Excel is not direct, but at the same time it is not made from another computer, and technically you are using the keyboard, mouse and/or touchscreen when you are using Tableau Desktop.

    This is certainly an unsupported scenario (and if the background Excel process hangs for any reason, you have to kill it in Task Manager). But if the licensing allows it, or if Microsoft tolerates it, probably many companies writing software (I have a long list of requests I received…) could be interested in doing the same.

    I would love to hear some official word on this topic…
