
SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

  • A strange behavior of AutoExist for MDX on Tabular models #powerpivot #ssas #tabular

    Alberto Ferrari wrote an interesting article about a strange behavior of AutoExist in normalized data models. I always say that a star schema is the best practice in Power Pivot and Tabular data modeling. The issue described by Alberto is another good reason to avoid snowflake schemas.

    I think that an example is better than many words. Consider this simple measure working in a star schema where all product attributes (such as Category and Subcategory) are in the same denormalized DimProduct table:

    SalesOfBikes := CALCULATE ( [Sales], DimProduct[Category] = "Bikes" )

    If you have a snowflake schema with DimProduct, DimProductSubcategory and DimProductCategory tables, you have to write a much longer and more complex DAX formula to obtain the same result:

    SalesOfBikes :=
    CALCULATE (
        [Sales],
        DimProductCategory[EnglishProductCategoryName] = "Bikes",
        CALCULATETABLE (
            DimProductCategory,
            FILTER (
                ALL ( DimProductSubcategory ),
                IF (
                    ISFILTERED ( DimProductSubcategory[EnglishProductSubcategoryName] ),
                    CONTAINS (
                        VALUES ( DimProductSubcategory ),
                        DimProductSubcategory[ProductSubcategoryKey], DimProductSubcategory[ProductSubcategoryKey]
                    ),
                    TRUE
                )
            )
        )
    )

    This seems crazy, and actually it is…

    The reasons are interesting and well described in the AutoExist and Normalization article on SQLBI.

  • Questions and Answers about SSAS Tabular Models #ssas #tabular

    I recently delivered the online session Create your first SSAS Tabular Model at 24HOP PASS Summit, the recording of which is now available here. I received many questions and did not have enough time to answer them all, so I am answering them now in this blog post.

    How do you prevent a user from aggregating certain measures where the result would be invalid (example: unit margin %)?
    In Tabular you do not have the notion of “aggregation”. Every measure evaluates a DAX expression in a particular filter context. Imagine this as a SELECT statement in SQL that considers only the rows filtered by a WHERE condition. You can “remove” the visualization of a value in a measure by using DAX functions such as ISFILTERED, ISCROSSFILTERED and HASONEVALUE. Since you do not have the ability to “intersect” expressions in different dimensions as you can do in MDX using tool dimensions, you do not have the problem of invalidating certain combinations (such as “Unit” and “Margin %”).
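
    For example, here is a minimal sketch (using hypothetical [Unit Margin] and [Unit Price] measures and a DimProduct table) that shows a unit margin percentage only when a single product is in the filter context:

    -- Hypothetical measures: blank out the result whenever more than one product is selected
    Unit Margin % :=
    IF (
        HASONEVALUE ( DimProduct[ProductKey] ),
        DIVIDE ( [Unit Margin], [Unit Price] )
    )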

    How do you incrementally refresh this new Tabular model that was created?
    You can add data to an existing partition, create new partitions or reprocess existing partitions.

    Can you connect to an SSRS report from Power View to drill down to the actual data rows?
    Currently this is not available. In general, you cannot add links to external URLs in the current version of Power View.

    Can you create those views within the model instead of in the database, like you can with cube creation?
    No, in Tabular you do not have the notion of a “Data Source View” like the one you have in Multidimensional. By the way, I do not consider embedding queries in a DSV in Multidimensional a best practice. If you create views on the relational database, you simplify troubleshooting in case of data quality issues reported by end users (any DBA can check the views used, even without any knowledge of BI development tools).

    Can many-to-many relationships work in a Tabular model?
    In this version, you cannot create a many-to-many relationship directly in the data model, but you can apply many-to-many relationships in DAX formulas. The good news is that performance is usually better than in equivalent models in Multidimensional.
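
    As a minimal sketch of the classic pattern (assuming a hypothetical BridgeAccountCustomer table related to both DimAccount and DimCustomer, and a [Sales] measure on a fact table related to DimAccount):

    -- Filtering DimCustomer filters the bridge table; SUMMARIZE returns the account keys
    -- visible in that selection, which CALCULATE then applies as a filter on DimAccount,
    -- reaching the fact table
    SalesM2M :=
    CALCULATE (
        [Sales],
        SUMMARIZE ( BridgeAccountCustomer, DimAccount[AccountKey] )
    )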

    Any changes for Tabular model between 2012 and 2014 versions of SSAS?
    No, there are no changes in Analysis Services between 2012 and 2014. SQL Server 2014 is a release that added new features only to the relational engine of SQL Server.

    Can you give a few examples of benefits over using multi-dimensional cubes?
    Tabular is easier to use, it is usually faster and it requires no maintenance (Multidimensional requires maintenance of aggregations as data volume grows and data distribution changes). Multidimensional has features not available in Tabular (e.g. custom rollup formulas, MDX Script, dynamic formatting for measures).

    How does the role-playing dimension concept work in a Tabular model? Is there an equivalent in Tabular of role-playing dimensions in Multidimensional?
    Role-playing dimensions are usually a bad idea in Multidimensional, because you cannot rename hierarchies, attributes, and member names. This results in confusing pivot tables when you browse the data, considering that the only difference is the dimension name, which is not directly visible in the pivot table itself.
    Tabular does not support role-playing dimensions, but you can overcome that limitation by using DAX, enabling inactive relationships for specific calculations (you can define multiple relationships between the same tables in Tabular). However, if you want to offer navigation in different role-playing dimensions to the user, the best practice for both Multidimensional and Tabular is to import the same table multiple times, renaming data and metadata.

    Can you install regular SSAS and the Tabular model on the same server?
    Yes, you can install several instances of SSAS on the same server. You run the setup multiple times and, during each setup, choose whether the instance runs in Multidimensional or Tabular mode.

    Can you use a server based Tabular model for O365 Power BI sheets? (on premise data – SharePoint cloud spreadsheets)
    Not yet – at the moment (September 2014) you can only publish a Power Pivot workbook and then refresh it getting on-premise data through the Data Management Gateway.

    Can I create a Tabular Model using a query as the source?
    Yes, but remember that this is not a best practice. When creating SSAS Tabular or Multidimensional models, the best practice is to get data from SQL views, without modifying the query on the SSAS side. In this way, the content of a Tabular table will match the result of a view in SQL. This simplifies maintenance and support operations. Any DBA can check the result of a SQL view without having to open an Analysis Services project just to figure out where the actual data comes from.

    How can I manage access to the data by level (for example: the first group has access to the year 2014, but the second group has access to all periods)?
    Role-based security allows you to create row-level security for each table in the data model. Thus, you can filter the rows of the tables you want to hide from a certain group of users. You define filter conditions using logical DAX expressions that are evaluated when each user connects to the SSAS Tabular model.
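
    For example, assuming a hypothetical 'Date' table with a Calendar Year column, the role for the first group could define a row filter on the Date table like the following (the second group simply has no filter on that table):

    -- Row filter (a logical DAX expression) defined on the Date table for the first group
    ='Date'[Calendar Year] = 2014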

    Does the M language have something to do with SQL Server, or is it used only in Excel?
    The “M” language is used only by Power Query, which is available only in Excel and in the Data Management Gateway today. The “M” language can translate transformations into SQL, but you cannot obtain an “M” version of a SQL query.

    What happens if the source data of a Tabular model is not a star schema?
    A star schema is the best data model for Tabular. You can have more complex data models, but keep in mind that more tables and relationships might cause slower performance at query time.

    How do you handle multiple dates in a fact table that you want to attach to a date table?
    This question is similar to the role-playing dimension one. You can import the Date table multiple times (possibly renaming columns and content to reflect the “role” of the dimension), or you can create multiple relationships between the Date dimension and the fact table, activating a specific relationship for each measure through the USERELATIONSHIP function in a CALCULATE statement.
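
    For example, assuming a hypothetical inactive relationship between FactInternetSales[ShipDateKey] and DimDate[DateKey], a measure can activate it like this:

    -- Uses the inactive Ship Date relationship instead of the active Order Date one
    Sales by Ship Date :=
    CALCULATE (
        [Sales],
        USERELATIONSHIP ( FactInternetSales[ShipDateKey], DimDate[DateKey] )
    )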

  • Advanced DAX Workshops in the upcoming months – #dax #tabular #powerpivot

    The adoption of DAX is slowly growing, and we can see that from comments to articles, blog posts, and web site traffic (including DAX Formatter usage). We had great feedback on our latest white papers (Using Tabular Models in a Large-scale Commercial Solution and SSAS Tabular as Analytical Engine), and we see that this reflects a broader adoption. For this reason, the demand for more advanced DAX content is also growing, and we have planned a number of courses for the near future.

    Alberto Ferrari and I will travel a lot in the upcoming months to deliver many Advanced DAX workshops, but we also have some more introductory training coming soon.

    This is the list of upcoming Advanced DAX Workshops (only in Europe for now, hopefully the US is coming next…):

    We will have two pre-conferences at PASS Summit (3-4 November, 2014 – Seattle, WA):

    If you want to learn SSAS Tabular starting from scratch, you have these other options to attend the SSAS Tabular Workshop:

    We are also working on other content for the coming months, and remember that we can deliver onsite training if your team is too large and/or too far from one of the public classroom sites. I look forward to seeing new things announced in the next weeks/months, and I am pretty sure DAX will grow in popularity (despite the lack of a good editor!).

  • Simulate the SQL WHERE condition in DAX – #dax #powerpivot #ssas

    I recently wrote a short article about filtering data in DAX. In the last month I’ve seen a growing number of people with a SQL background moving to Tabular without any previous experience of MDX. In my opinion, learning DAX is easier if you try to match its features to SQL rather than to MDX. The reality is that DAX is somewhere in the middle, because joins between tables can be implicit in a query (like in MDX), but you can also create filters that resemble “virtual relationships” (like in SQL with an explicit JOIN).
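
    As a minimal example of the idea (using a hypothetical Sales table with SalesAmount and Color columns), a SQL WHERE condition maps naturally to a CALCULATE filter:

    -- SQL: SELECT SUM ( SalesAmount ) FROM Sales WHERE Color = 'Red'
    RedSales :=
    CALCULATE (
        SUM ( Sales[SalesAmount] ),
        Sales[Color] = "Red"
    )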

    For this reason, I recently restarted writing articles in the “From SQL to DAX” series, and I already published

    I am already working on other articles to extend the series – feedback and suggestions for future topics are welcome!

  • MDX Studio refresh

    If you work with MDX, using MDX Studio is a must, but it has not been updated for years. The community is great, though, and Ihor Bobak has been able to apply two fixes: performance counters now work with named instances, and there is a single version regardless of the client libraries, so you have only one executable working with all available versions of Analysis Services (2005, 2008, 2008 R2, 2012, and 2014).

    You can download this refreshed version of MDX Studio from http://sql.bi/mdxstudio (we also have a nice shortened path now!).

  • Embed Analysis Services Tabular in your service or application #ssas #tabular

    Since 2012 I have seen many companies adopting SQL Server Analysis Services Tabular as the analytical engine for their product or service. I think that this is still a fraction of the companies that might make the same choice. Why? Because…

    The features already existing in Tabular might be enough to justify its adoption as an analytical engine.

    This short statement needs more explanation, so I wrote a longer article titled SSAS Tabular as Analytical Engine, which is also available as a downloadable PDF that can be read offline and shared by mail. There are many other companies that are looking for an analytical engine for their applications. I hope that the experiences I tried to share will help these companies better evaluate whether Tabular could be a good fit for their needs or not.

    Have you made this choice? Have you considered (and maybe adopted) other products in one of the described scenarios?
    Feedback is welcome!

  • Possible SWITCH Optimization in DAX – #powerpivot #dax #tabular

    In one of the Advanced DAX Workshops I taught this year, I had an interesting discussion about how to optimize a SWITCH statement (which is frequently used to check a slicer, as in the Parameter Table pattern).

    Let’s start with the problem. What happens when you have a statement such as the following?

    Sales :=
        SWITCH (
            VALUES ( Period[Period] ),
            "Current", [Internet Total Sales],
            "MTD", [MTD Sales],
            "QTD", [QTD Sales],
            "YTD", [YTD Sales],
             BLANK ()
        )

    The SWITCH statement is in reality just syntactic sugar for a nested IF statement. When you place such a measure in a pivot table, the IF branches are evaluated for every cell of the pivot table. In order to optimize performance, the DAX engine usually does not compute cell by cell, but tries to compute the values in bulk mode. However, if a measure contains an IF statement, every cell might have a different execution path, so the current implementation might evaluate all the possible IF branches in bulk mode, so that for every cell the result of one of the branches is already available in a pre-calculated dataset.
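
    To make the equivalence explicit, the SWITCH above is just a shorter way of writing a nested IF like the following (the same logic, only rewritten):

    Sales :=
        IF (
            VALUES ( Period[Period] ) = "Current", [Internet Total Sales],
            IF (
                VALUES ( Period[Period] ) = "MTD", [MTD Sales],
                IF (
                    VALUES ( Period[Period] ) = "QTD", [QTD Sales],
                    IF (
                        VALUES ( Period[Period] ) = "YTD", [YTD Sales],
                        BLANK ()
                    )
                )
            )
        )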

    The price for that could be high. If you consider the previous Sales measure, the YTD Sales measure could be evaluated for all the cells where it’s not required, even when YTD is not selected at all in the pivot table. The actual optimization made by the DAX engine could be different in every build, and I expect newer builds of Tabular and Power Pivot to be better than older ones. However, we still don’t live in an ideal world, so it could be better to try to help the engine find a better execution plan.

    One student (Niek de Wit) proposed this approach:

    Selection :=
    IF (
        HASONEVALUE ( Period[Period] ),
        VALUES ( Period[Period] )
    )

    Sales :=
    CALCULATE (
        [Internet Total Sales],
        FILTER (
            VALUES ( 'Internet Sales'[Order Quantity] ),
            'Internet Sales'[Order Quantity]
                = IF (
                    [Selection] = "Current",
                    'Internet Sales'[Order Quantity],
                    -1
                )
        )
    )

        + CALCULATE (
            [MTD Sales],
            FILTER (
                VALUES ( 'Internet Sales'[Order Quantity] ),
                'Internet Sales'[Order Quantity]
                    = IF (
                        [Selection] = "MTD",
                        'Internet Sales'[Order Quantity],
                        -1
                    )
            )
        )
        + CALCULATE (
            [QTD Sales],
            FILTER (
                VALUES ( 'Internet Sales'[Order Quantity] ),
                'Internet Sales'[Order Quantity]
                    = IF (
                        [Selection] = "QTD",
                        'Internet Sales'[Order Quantity],
                        -1
                    )
            )
        )
        + CALCULATE (
            [YTD Sales],
            FILTER (
                VALUES ( 'Internet Sales'[Order Quantity] ),
                'Internet Sales'[Order Quantity]
                    = IF (
                        [Selection] = "YTD",
                        'Internet Sales'[Order Quantity],
                        -1
                    )
            )
        )

    At first sight, you might think it’s impossible that this approach could be faster. However, if you examine what happens with the profiler, there is a different story. Every branch of the original IF is now a separate CALCULATE statement, which applies a filter that does not execute the required measure calculation if the result of the FILTER is empty. I used the ‘Internet Sales’[Order Quantity] column in this example just because in Adventure Works it has only one value (every row has 1): in the real world, you should use a column that has a very low number of distinct values, or a column that has the same value for every row (so it will be compressed very well!). Because the value –1 is never used in this column, the IF comparison in the filter discards all the values iterated in the filter if the selection does not match the desired value.

    I hope to have time in the future to write a longer article about this optimization technique, but in the meantime I’ve seen that this optimization has been useful in many other implementations. Please share your feedback if you find scenarios (in either Power Pivot or Tabular) where you obtain performance improvements using this technique!

  • New training on Power Pivot with recorded video courses

    Alberto Ferrari and I started delivering training on Power Pivot in 2010, initially in classrooms and then also online. We also recorded videos for Project Botticelli, where you can find content about Microsoft tools and services for Business Intelligence. In recent months, we produced a recorded video course for people who want to learn Power Pivot without attending a scheduled course.

    We split the entire Power Pivot training course into three editions, offering the more introductory modules at a lower price:

    • Beginner: introduces Power Pivot to any user who knows Excel and wants to create reports with larger and more complex data structures than a single table.
    • Intermediate: improves skills on Power Pivot for Excel, introducing the DAX language and important features such as CALCULATE and Time Intelligence functions.
    • Advanced: includes in-depth coverage of the DAX language, which is required for writing complex calculations, and other advanced features of both Excel and Power Pivot.

    There are also two bundles that include two or three editions at a lower price.

    Most important, we have a special 40% launch discount on all published video courses using the coupon SQLBI-FRNDS-14, valid until August 31, 2014. Just follow the link to see a more complete description of the available editions and their discounted prices. Regular prices start at $29, which means that you can start training for less than $18 using the special promotion.

    P.S.: we recently launched a new responsive version of the SQLBI web site, and now we also have a page dedicated to all the available videos of our sessions at conferences around the world. You can find more than 30 hours of free videos here: http://www.sqlbi.com/tv.

  • DIVIDE vs division operator in #dax

    Alberto Ferrari wrote an interesting article about DIVIDE performance in DAX. This new function was introduced in SQL Server Analysis Services 2012 SP1, so it is also available in Excel 2013 (which still doesn’t have other features/fixes introduced by subsequent Cumulative Updates…). The idea is that instead of writing:

    IF ( Sales[Quantity] <> 0, Sales[Amount] / Sales[Quantity], BLANK () )

    you can write:

    DIVIDE ( Sales[Amount], Sales[Quantity] )

    There is an optional third argument in DIVIDE that defines the result when the denominator (the second argument) is zero; its default value is BLANK, so I omitted it in my example.
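
    For example, if you prefer to return zero instead of a blank when the denominator is zero, you can write:

    DIVIDE ( Sales[Amount], Sales[Quantity], 0 )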

    Using DIVIDE is very important, especially when you use a measure in MDX (for example, in an Excel PivotTable), because it raises the chance that the non-empty evaluation of the result happens in bulk mode instead of cell by cell. However, from a DAX point of view, you might find it’s better to use the standard division operator and remove the IF statement. I suggest you read Alberto’s article, because you will find that an expression applying a filter using FILTER is faster than one using CALCULATE, which goes against any rule of thumb you might have read until now!

    Again, this is not always true and depends on many conditions – trying to simplify, we might say that for a simple calculation the query plan generated by FILTER could be more efficient – but, as usual, it depends, and 90% of the time using FILTER instead of CALCULATE produces slower performance. Do not take anything for granted, and always check the query plan when performance is your first concern!

  • Issues with July 2014 Office Update and Click-to-Run #excel #powerbi

    An interesting experience with Office Click-to-Run is trusting it to install updates the night before a conference where you are a speaker. I’m writing this blog post while Alberto Ferrari is delivering his part of the preconference training day at SQL Bits about how to create a complete solution in Power BI, but I think it’s important to share.

    Symptoms:

    • Excel crashes when you click Save As
    • Word crashes when you click Save As
    • Outlook crashes when you click File > Office Account
    • PowerPoint crashes when you click Save As

    In a word: Office no longer works well and crashes often.

    Yesterday I installed the latest update of Office using the Click-to-Run distribution: it’s not Windows Update, so I cannot check in Windows Update which updates are installed. You have to check the version you installed, and this could be hard if the window that should show the number crashes the application.

    However, after some investigation:

    • I took a look at the page that shows the versions released for click-to-run: http://support.microsoft.com/gp/office-2013-click-to-run
    • I had the version 15.0.4631.1002 – July 2014 update (http://support.microsoft.com/KB/2980001)
    • I wanted to revert back to the previous version (15.0.4623.1003 – June 2014 update)
    • I followed instructions described here: https://community.office365.com/en-us/f/172/t/251109.aspx
      • Open an administrative command prompt, then run one of the next two commands based on your version:
        • For an Office installation in a 32-bit version of Windows:
          cd %programfiles%\Microsoft Office 15\ClientX86
        • For an Office installation in a 64-bit version of Windows:
          cd %programfiles%\Microsoft Office 15\ClientX64
      • Then run the following command:
        officec2rclient.exe /update user updatetoversion=15.0.4623.1003

    I think that this procedure could be useful in the future in case any similar issue happens again (hopefully not…).

    Lesson learned: never install *any* update the day before speaking at a conference. Not only Windows Updates, but also Office Click-to-Run updates…

  • Use of RANKX with decimal numbers in DAX #powerpivot #ssas #tabular

    Using decimal numbers in Power Pivot and Tabular might produce small rounding differences in certain calculations. This is nothing new when you work with floating point, as many programmers know. The implementation of RANKX might suffer from a behavior that produces wrong results when the measure used for the ranking returns a decimal value.

    For example, consider the following model, where there are three names (A, B, C), each one with a value resulting from the sum of rows in the fact table, and a ranking calculated using the following Pos measure:

    Pos :=
    IF (
        HASONEVALUE ( Sample[Name] ),
        RANKX (
            ALL ( Sample[Name] ),
            CALCULATE ( SUM ( Sample[Value] ) )
        )
    )

    [Figure: PivotTable showing names A, B, and C with Pos values from 1 to 3]

    In this case, everything works fine and the Pos has values from 1 to 3. However, when you select only one name, you might see a wrong number. In the following example, the Pos value is higher than the number of available names.

    [Figure: PivotTable with a single name selected, where Pos is higher than the number of available names]

    It is not easy to find a reproducible case; usually the rounding error results from complex calculations. The purpose of the previous example is only to describe the symptoms that you might experience.

    Under the covers, RANKX calculates the value of the measure for each element of the list of names, and then it searches that table for the result of the expression in the current filter context. If there is any rounding error in this operation… the match does not happen (or it might happen at the wrong index, even if this is less likely) and you see a wrong Pos number as a result.

    Hopefully, a fix to this behavior will be released sooner or later. In the meantime, there are two possible workarounds:

    1. Cast the expression to currency using the CURRENCY function, so that the values compared are of the Currency data type, which is not subject to the described issue (see the sketch after this list).
    2. Store the original value in a column of the Currency data type, so that the result is still a currency and the match works well.
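
    For example, here is the first workaround applied to the Pos measure shown above (a sketch; the only change is the CURRENCY cast around the ranking expression):

    Pos :=
    IF (
        HASONEVALUE ( Sample[Name] ),
        RANKX (
            ALL ( Sample[Name] ),
            CURRENCY ( CALCULATE ( SUM ( Sample[Value] ) ) )
        )
    )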

    By using either one of the workarounds, you will see the correct result:

    [Figure: PivotTable showing the correct Pos values after applying the workaround]

    The first approach (casting the result) might have a minimal impact on query performance. Whenever possible, I would prefer storing the values in a Currency column, so that no measure will suffer from this issue.

    In any case, be careful about the data type of the expressions used in a RANKX function.

  • Learn #tabular and #dax at PASS Summit 2014

    Over the last months I’ve seen increasing adoption of Analysis Services Tabular (and I’m writing a longer article about a particular area of adoption that was probably not expected – more on that in a few weeks). This year at PASS Summit 2014 there are plenty of opportunities to learn (or improve your skills on) Analysis Services Tabular:

    • Mon, Nov 03, 2014: Data Modeling in SSAS Tabular – pre-conference seminar by Marco Russo (yes, myself)
      • This one-day seminar introduces Tabular models using Visual Studio. The only prerequisite is knowledge of the SQL language. Previous experience in Analysis Services Multidimensional or other OLAP / analytical tools is welcome but not required. If you have already created projects in Tabular, you will understand how the Tabular engine works and how to create optimal data models. Compression efficiency is important, and this impacts the way you model tables and relationships. DAX is not included in this day, because it is covered by Alberto in another seminar the following day.
    • Tue, Nov 04, 2014: From 0 to DAX – pre-conference seminar by Alberto Ferrari
      • Whether you use Tabular or Power Pivot, and even if you started your Tabular experience the day before, this seminar introduces you to the DAX language syntax and the important concepts (filter context, evaluation context) you have to know. This knowledge enables you to write the formulas you need, without the “try and see” approach that can be very confusing in DAX. Have you ever written a DAX formula that didn’t return what you expected? This is the right seminar for you.
    • General sessions (75 minutes):
      • Advanced Modeling with Analysis Services Tabular (Alberto Ferrari)
        • This session is about how to overcome “limitations” in Tabular data modeling: virtual relationships, point-in-time balances without snapshots, dynamic currency conversion, measures of active events, surveys, and basket analysis.
      • DAX Patterns (Marco Russo)
        • I will explain some of the patterns available on the DAX Patterns web site. Yes, you can read the articles and use the patterns, but this session has the goal of explaining how these patterns work, and not only how to use them.
      • Working with Time Functions in DAX (Michael Antonovich)
        • I don’t know Michael, but if you have never used Time Intelligence functions in DAX, this is a topic that you have to study. When you feel ready, you can also discover how to rewrite Time Intelligence functions with Time Patterns!
      • Load Testing Analysis Services (Bob Duffy)
        • Bob wrote interesting articles about the performance of Analysis Services, like this one about partitioning in Tabular, and I will certainly attend this session (I hope it will not overlap with mine!)
    • Even if not strictly related to Tabular, I made a personal selection of sessions that a BI Developer engaged with Analysis Services Tabular and/or Multidimensional should see:

    This is going to be a very interesting PASS Summit. I’ve seen a rich session portfolio also for ETL, DWH, SSIS, SSRS, and, of course, SQL Server! Please let me know if I missed some important session for the BI developer audience!

    SIDE NOTE: BI sessions at PASS Summit – I’ve seen comments about moving BI-related sessions to PASS BA Analytics, giving more sessions to SQL Server. I’m not sure it would be a good idea for PASS. Today, PASS BA Analytics is not a mature conference; I would like to see more advanced sessions for BI developers there, but the point is that it only attracts a few hundred attendees, whereas at PASS Summit 35-40% of the audience is made up of people working in the Business Intelligence arena. Dropping all the BI sessions would probably mean cutting a large part of the conference budget. I’ve seen PASS grow well, and the SQL Saturday initiative is incredible. I understand that nobody would take the risk of damaging the main source of revenue of the organization that makes all this possible. Thus, I see a future for PASS BA Analytics as the conference for emerging tools and technologies (a sort of “Build” conference for BI – it’s not that today, but it’s a direction I would like to see), whereas PASS Summit is the conference for established and released tools and products (a sort of “TechEd” conference for BI & SQL DBA & DEV).

  • Calculate the rolling average for 12 months in #DAX and a nice IF optimization

    Alberto published the Rolling 12 Months Average in DAX article on SQLBI a few days ago, which includes interesting considerations about how to avoid the pitfall of touching the boundaries of the Date table, which could result in a calculation error.

    More interesting for the geeks among us is the optimization of the measure to avoid the IF statement. As you may already know if you watched some of our recent events or courses, using an IF statement in a measure might affect performance, especially (but not only) when you query a Tabular or Power Pivot model from MDX (i.e. from a PivotTable in Excel). In this article, instead of using:

    Avg12M := IF ( [Sales] <> 0, <expression> )

    the formula is

    Avg12M := DIVIDE ( [Sales], [Sales] ) * <expression>

    As you can see, the only purpose of DIVIDE here is to return 1 if the value to check is different from 0, and BLANK if it is 0 (so that the whole product is also BLANK). The query plan generated by this expression is much faster than the IF one, and this technique can be used in many other similar scenarios.

  • Calculate New, Returning, Lost, and Recovered Customers in #dax

    Calculating the number of new and returning customers is a recurring question. I would say this is a “classical” Business Intelligence problem, very common in marketing departments. I have worked on these problems with many customers, with both small and large datasets, and I wrote the DAX Pattern “New and Returning Customers” showing how to calculate:

    • New Customers: customers who never made any purchase before
    • Returning Customers: customers who bought something in the past
    • Lost Customers: customers who bought something in the past but did not buy in the last N days
    • Recovered Customers: previously “lost customers” who made a new purchase

    This is not a brand new topic; you can find many other blog posts about it (Chris Webb, Javier Guillén, Gerhard Brueckl, David Hager, Rob Collie), so my goal was to show very generic formulas that are generally the best solution in terms of performance. This makes the formulas less readable, such as the following:

    [Returning Customers] :=
    COUNTROWS (
        CALCULATETABLE (
            VALUES ( <customer_key_column> ),
            VALUES ( <customer_key_column> ),
            FILTER (
                ALL ( <date_column> ),
                <date_column> < MIN ( <date_column> )
            )
        )
    )

    As you see, using CALCULATETABLE ( VALUES ( table[column] ), VALUES ( table[column] ), … ) seems useless. Why count the rows returned by VALUES and also pass it as a filter argument? This is a not-so-intuitive behavior of CALCULATE. The first argument is an expression that will be evaluated in a modified filter context. The third argument is a FILTER on the date column, which extends the range of dates considered, including all the past sales transactions. At this point, the first VALUES would return any customer who made a purchase in the past, but the second argument only considers those who made a purchase in the current selection of time. The final result is an AND condition between two sets of customers (the intersection of the two sets), which is faster than trying to calculate the number of past transactions of each customer who made a purchase in the current selection of time and filtering out those that result in zero transactions.

    In general, I prefer using more readable DAX formulas, also in DAX Patterns, optimizing them only when necessary. But in this case the performance impact might be important (visible to the user) even with a few thousand customers. As usual, any feedback on the New and Returning Customers pattern is very welcome!

  • Basket Analysis with #dax in #powerpivot and #ssas #tabular

    A few days ago I published a new article on the DAX Patterns web site describing how to implement Basket Analysis in DAX. This is a very classical topic, and it is also covered in The Many-to-Many Revolution white paper. It has also been discussed in several blog posts, listed here in historical order:

    As usual, in DAX Patterns we try to present the required DAX formulas in a way that is easy to adapt to specific models. We also try to show a good implementation from a performance point of view. Further optimizations are always possible in DAX. However, in order to keep the model simple to adapt in different scenarios, we avoid presenting optimizations that would require particular assumptions or restrictions on the data model.

    I hope you will find the Basket Analysis pattern useful. Even if you do not need it today, reading the DAX formula is a good exercise to check your knowledge of evaluation contexts in DAX. For example, describing how the following expression works is not a trivial task!

    [Orders with Both Products] :=
    CALCULATE (
        DISTINCTCOUNT ( Sales[SalesOrderNumber] ),
        CALCULATETABLE (
            SUMMARIZE ( Sales, Sales[SalesOrderNumber] ),
            ALL ( Product ),
            USERELATIONSHIP ( Sales[ProductCode], 'Filter Product'[Filter ProductCode] )
        )
    )

    The good news is that you can use the patterns even if you do not really understand all the details of the DAX formulas you are using!
    Any feedback on this new pattern is very welcome.
