
SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

  • Calculate New, Returning, Lost, and Recovered Customers in #dax

    Calculating the number of new and returning customers is a recurring question. I would say it is a “classical” Business Intelligence problem, very common in marketing departments. I have worked on these problems with many customers, on small and large datasets, and I wrote the DAX Pattern “New and Returning Customers” showing how to calculate:

    • New Customers: customers who never made any purchase before
    • Returning Customers: customers who bought something in the past
    • Lost Customers: customers who bought something in the past but did not buy in the last N days
    • Recovered Customers: previously “lost” customers who made a new purchase

    This is not a brand-new topic; you can find many other blog posts about it (Chris Webb, Javier Guillén, Gerhard Brueckl, David Hager, Rob Collie), so my goal was to show very generic formulas that are generally the best solution in terms of performance. This makes the formulas less readable, such as the following:

    [Returning Customers] :=
    COUNTROWS (
        CALCULATETABLE (
            -- Customers visible after the filter context is modified
            -- by the two filter arguments below
            VALUES ( <customer_key_column> ),
            -- Keep only customers who bought in the current selection
            VALUES ( <customer_key_column> ),
            -- Extend the date range to all dates before the selection
            FILTER (
                ALL ( <date_column> ),
                <date_column> < MIN ( <date_column> )
            )
        )
    )

    As you see, using CALCULATETABLE ( VALUES ( table[column] ), VALUES ( table[column] ), … ) seems useless. Why count the rows returned by VALUES and also pass the same VALUES as a filter argument? This is a not-so-intuitive behavior of CALCULATE. The first argument is an expression that will be evaluated in a modified filter context. The third argument is a FILTER over the date column, which extends the range of dates considered to include all the past sales transactions. At this point, the first VALUES would return any customer who made a purchase in the past, but the second argument restricts the result to customers who made a purchase in the current selection of time. The final result is an AND condition between two sets of customers (the intersection of the two sets), which is faster than iterating each customer who made a purchase in the current selection, calculating the number of past transactions of each one, and keeping only those having at least one past transaction.
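    For comparison, a more readable (but typically slower) formulation of the same measure iterates the customers in the current selection and keeps those having at least one sale before the first visible date. This is only a sketch: Sales[CustomerKey] and 'Date'[Date] are hypothetical names to replace with the columns of your model.

    [Returning Customers (readable)] :=
    COUNTROWS (
        FILTER (
            -- Sales and 'Date' are placeholder table names
            VALUES ( Sales[CustomerKey] ),
            -- For each customer in the current selection, count the sales
            -- over all the dates preceding the current selection
            CALCULATE (
                COUNTROWS ( Sales ),
                FILTER (
                    ALL ( 'Date'[Date] ),
                    'Date'[Date] < MIN ( 'Date'[Date] )
                )
            ) > 0
        )
    )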

    In general, I prefer using more readable DAX formulas such as the sketch above, also in DAX Patterns, optimizing them only when necessary. But in this case performance might be important (visible to the user) even with just a few thousand customers. As usual, any feedback on the New and Returning Customers pattern will be very welcome!

  • Basket Analysis with #dax in #powerpivot and #ssas #tabular

    A few days ago I published a new article on the DAX Patterns web site describing how to implement Basket Analysis in DAX. This is a very classical topic, also covered in the many-to-many revolution white paper, and it has been discussed in several blog posts over the years.

    As usual, in DAX Patterns we try to present the required DAX formulas in a way that is easy to adapt to specific models. We also try to show a good implementation from a performance point of view. Further optimizations are always possible in DAX. However, in order to keep the model simple to adapt in different scenarios, we avoid presenting optimizations that would require particular assumptions or restrictions on the data model.

    I hope you will find the Basket Analysis pattern useful. Even if you do not need it today, reading the DAX formula is a good exercise to check your knowledge of evaluation contexts in DAX. For example, describing how the following expression works is not a trivial task!

    [Orders with Both Products] :=
    CALCULATE (
        -- Count the orders that survive the filter below
        DISTINCTCOUNT ( Sales[SalesOrderNumber] ),
        CALCULATETABLE (
            -- Orders containing the product selected in 'Filter Product'
            SUMMARIZE ( Sales, Sales[SalesOrderNumber] ),
            -- Ignore the selection made on the regular Product dimension
            ALL ( Product ),
            -- Activate the inactive relationship to the duplicated dimension
            USERELATIONSHIP ( Sales[ProductCode], 'Filter Product'[Filter ProductCode] )
        )
    )

    The good news is that you can use the patterns even if you do not really understand all the details of the DAX formulas you are using!
    Any feedback on this new pattern is very welcome.

  • White Paper on Analysis Services Tabular Large-scale Solution #ssas #tabular

    Since the first beta of Analysis Services 2012, I have worked with many companies designing and implementing solutions based on Analysis Services Tabular. I am glad that Microsoft published a white paper about a case study based on one of these scenarios: An Analysis Services Case Study: Using Tabular Models in a Large-scale Commercial Solution. Alberto Ferrari is the author of the white paper, and many people contributed to it. The final result is a very technical document based on a case study, which provides a level of detail that I don’t often see in other case studies (which are usually more marketing-oriented).

    This white paper has the following structure:

    • Requirements (data model, capacity planning, client tool)
    • Options considered (SQL Server Columnstore Indexes, SSAS Multidimensional, SSAS Tabular)
    • Data Model optimizations (memory compression, query performance, scalability)
    • Partitioning and Processing strategy for near real-time latency
    • Hardware selection (NUMA analysis, Azure VM tests)
    • Scalability tests (estimation of maximum users per node)

    If you are in charge of evaluating Tabular as an analytical engine, or if you have to design a solution based on Tabular, this white paper is a must-read. But even if you just want to increase your knowledge of Analysis Services, you will find a lot of useful technical information. That said, my favorite quote of the document is the following one, funny but true:

    […] After several trials, the clear winner was a video gaming machine that one guy on the team used at home. That computer outperformed any available server, running twice as fast as the server-class machines we had in house.

    At that point, it was clear that the criteria for choosing the server would have to be expanded a bit, simply because it would have been impossible to convince the boss to build a cluster of gaming machines and trust it to serve our customers.  But, honestly, if a business has the flexibility to buy gaming machines (assuming the machines can handle capacity) – do this.

    Owen Graupman, inContact

    I want to write a longer discussion about how companies are adopting Tabular in scenarios where it is the hidden engine of a more complex solution (and not the classical “BI system”), because it is more frequent than you might expect (and has several advantages over many alternative approaches).

  • The updated Survey pattern for Power Pivot and Tabular #powerpivot #tabular #ssas #dax

    One of the first models I created for the many-to-many revolution white paper was the Survey one. At the time, it was in Analysis Services Multidimensional, and then we implemented it in Analysis Services Tabular and in Power Pivot, using the DAX language.

    I recently reviewed the data model and published it in the Survey article on the DAX Patterns site. The Survey pattern is the foundation for others, such as Basket Analysis, and it is widely used in many different business scenarios. I was particularly happy to learn it has been used to perform data analysis for cancer research!

    In this article I did some maintenance on the DAX formulas, checking that proper error handling is part of the formulas, and highlighting some differences in slicer behavior between Excel 2010 and Excel 2013, which could be particularly important for the Survey scenario. As usual, we provide sample workbooks for both Excel 2010 and Excel 2013, and we use DAX Formatter to make the DAX code easier to read. Any feedback will be appreciated!

  • Write DAX queries in Report Builder #ssrs #dax #ssas #tabular

    If you use Report Builder with Reporting Services, you can use DAX queries even if the editor for the Analysis Services provider does not support DAX syntax. In fact, the DMX editor available in the Visual Studio editor of Reporting Services (see a previous post on that) is not available in Report Builder. However, as Sagar Salvi commented in this Microsoft Connect entry, you can use DAX query text in the query of a Dataset by using the OLE DB provider instead of the Analysis Services one. I think it’s a good idea to show the steps required.

    First, create a Data Source using the OLE DB connection type, and provide in the connection string the provider (Provider), the server name (Data Source), and the database name (Initial Catalog), such as:
    Provider=MSOLAP;Data Source=SERVERNAME\TABULAR;Initial Catalog=AdventureWorks Tabular Model SQL 2012


    Then, create a Dataset using the data source previously defined, select the Text query type, and write the DAX code in the Query pane:

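    For example, the following minimal query is a sketch of the DAX you could write there. It assumes a model similar to the Adventure Works Tabular model, with a 'Date'[Calendar Year] column and an 'Internet Sales'[Sales Amount] column; replace the names with those of your own model:

    EVALUATE
    ADDCOLUMNS (
        -- Names from the Adventure Works Tabular sample; adapt as needed
        VALUES ( 'Date'[Calendar Year] ),
        "Sales Amount", CALCULATE ( SUM ( 'Internet Sales'[Sales Amount] ) )
    )
    ORDER BY 'Date'[Calendar Year]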

    You can also use the Query Designer window, which doesn’t provide any particular help in writing the DAX query, but at least shows a preview of the result of the query execution.


    I hope DAX will get better editors in the future… in the meantime, remember you can use DAX Studio to write and test your DAX queries, and DAX Formatter to improve their readability!

    If you want to learn the DAX query language, I suggest watching my video Data Analysis Expressions as a Query Language on Project Botticelli!

  • Tips for adapting Date table to Power View forecasting #powerview #powerbi

    During the keynote of the PASS Business Analytics Conference, Amir Netz presented the new forecasting capabilities of Power View for Office 365. I tried the new feature right away (it was available immediately, a welcome surprise for a Microsoft announcement of a new release) and I ran into several issues trying to use existing data models.

    The forecasting feature has a few requirements that are not compatible with the “best practices” commonly used for a calendar table before this announcement. For example, if you have a Year-Month-Day hierarchy and you want to display a line chart aggregating data at the month level, you typically use a column containing month and year as a string (e.g. May 2014) sorted by a numeric column (such as 201405). Such a column cannot be used on the x-axis of a line chart for forecasting, because forecasting requires a date or numeric column. There are other requirements as well, so I wrote the article Prepare Data for Power View Forecasting in Power BI on SQLBI, describing how to create columns that can be used with the new forecasting capabilities in Power View for Office 365.
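    As a minimal sketch of the idea (assuming a 'Date'[Date] column in your calendar table; the names are just examples), you can add a calculated column that converts each date to the first day of its month, obtaining a real date column that a forecasting line chart can use on the x-axis:

    -- Calculated column in the calendar table: first day of the month
    -- 'Date'[Date] is a placeholder for your date column
    = DATE ( YEAR ( 'Date'[Date] ), MONTH ( 'Date'[Date] ), 1 )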

  • Power Query in Modern Corporate BI–Copenhagen, June 3, 2014–#powerquery

    I will be in Copenhagen to deliver the SSAS Tabular Workshop on June 2-4, 2014 (a few seats are still available, but hurry up!).

    In the same week I will speak at an evening community event, MsBIP møde nr. 21, delivering the Power Query in Modern Corporate BI session that I also presented at TechEd North America 2014 last week. It’s not just a session about Power Query: there is a broader scope related to Corporate BI vs. Self-Service BI, which is open to many considerations. I think that the two worlds can (and should) collaborate, instead of fighting against each other, especially when there is an existing investment in Corporate BI. I hope to meet many of you there!

  • Implement Budget Allocation in DAX for Power Pivot and Tabular #powerpivot #tabular #ssas #dax

    Comparing sales and budget, or costs and budget, is a very common operation. However, the tables containing the budget and the data to compare it with often have different granularities. There are two ways to handle this: you can limit the comparison to the granularity that is common to the two tables, or you can allocate the budget where it is not defined.

    For example, if you have a budget defined by quarter and category, you might want to allocate it by month and product. In this way, you can do the comparison as if you had a more granular definition of the budget, without actually having to do the manual job of allocating data (usually in an Excel worksheet!).

    If you want to do budget allocation in DAX, you can use the Budget Patterns we published on DAX Patterns. If you come from an MDX/OLAP background, at first you might find it hard to work without the attribute hierarchies that help you propagate the budget values to lower hierarchical levels. However, I think that once you get used to DAX, you will find the behavior very predictable and easy to “debug”, even for more complex allocation formulas. You just have to be careful in writing the DAX formula, but the pattern should help you design the right data model, without creating physical relationships to the budget table!
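    For example, the following measure is a simplified sketch of a proportional allocation. It assumes a hypothetical Budget table defined at the quarter level (Budget[QuarterKey] and Budget[Amount]) and a 'Date' table exposing the same 'Date'[QuarterKey] column, with no physical relationship between the two tables; the budget of each quarter is spread over the selected period proportionally to the number of days:

    [Allocated Budget] :=
    SUMX (
        -- Budget and 'Date' are hypothetical tables sharing a QuarterKey column
        Budget,
        Budget[Amount]
            * DIVIDE (
                -- Days of the budget quarter visible in the current selection
                CALCULATE (
                    COUNTROWS ( 'Date' ),
                    FILTER ( 'Date', 'Date'[QuarterKey] = Budget[QuarterKey] )
                ),
                -- Total number of days of the budget quarter
                CALCULATE (
                    COUNTROWS ( 'Date' ),
                    FILTER ( ALL ( 'Date' ), 'Date'[QuarterKey] = Budget[QuarterKey] )
                )
            )
    )

    The real pattern handles additional dimensions and granularities; this sketch only shows the core allocation logic.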

    This pattern is also based on the Handling Different Granularities scenario I discussed a couple of weeks ago.

  • Meet me at TechEd 2014 – where and when #msteched

    If you are attending TechEd North America in Houston this week, stop me and say hello! I am always happy to meet blog readers, and of course if you have a question or a topic to discuss, try to join me at the BI booth in the expo area. Here is a list of where and when you can find me (thanks to Kasper for the idea):

    • Tuesday, May 13, 10:45am-12:30pm at Microsoft booth in expo area (Data Platform and Business Intelligence: Data Platform)
    • Tuesday, May 13, 2:15pm-4:00pm at Microsoft booth in expo area (Datacenter & Infrastructure Management: Application Solutions)
    • Tuesday, May 13, 6:30pm-8:30pm at Ask the Experts
    • Wednesday, May 14, 3:15pm-4:30pm in room 330 - DBI-B323 Power Query in Modern Corporate BI
    • Thursday, May 15, 8:30am-9:45am in room 330 - DBI-B322 Improving Power Pivot Data Models for Microsoft Power BI
    • Thursday, May 15, 12:30pm-3:15pm at Microsoft booth in expo area (Datacenter & Infrastructure Management: Application Solutions)

    If you are not attending TechEd, remember you will be able to see most of the recordings on Channel 9.

  • SSAS Tabular from the Trenches in London on June 11, 2014 #ssas #tabular

    I will be in London to teach the Advanced DAX Workshop on June 11-13, 2014 (if you are interested, there are still seats available).

    During my stay, I will also deliver an evening community session titled SSAS Tabular from the Trenches on June 11, 2014 at 6:30pm. The London Business Analytics Group organized this free event; you can find a complete description of the content on a dedicated page, where you can also register (for free!). In a few words, I will share several experiences of SSAS Tabular adoption in different scenarios, trying to help those who are still deciding whether to adopt Tabular (or not). Questions and open discussions are always welcome at these events!

  • First steps with Scheduled Data Refresh for Power Query #powerbi #powerquery

    Just a few days before my session about Power Query at TechEd 2014, Microsoft released a new update that enables the scheduled data refresh of a Power Pivot workbook containing Power Query transformations.

    This is very good news, because it enables the data refresh of a number of different data sources. Even if the number of providers supported by this release is limited (only SQL Server and Oracle), you can use a SQL Server database as a bridge to access different data sources through views based on Linked Server connections.

    If you want to use this feature, first of all read carefully the Scheduled Data Refresh for Power Query blog post on the MSDN web site. It guides you through the steps required to enable the data source connection through the Data Management Gateway. As you will see, you need to create the data source connections corresponding to the databases your Power Query transformations use. You might skip the data source configuration if the corresponding databases are already enabled in the Power BI admin center. However, I suggest going through the steps described in that blog post at the beginning, because if the same database is accessed through two different drivers, it needs two different data sources. For this reason, I have a number of notes that might help you avoid certain issues.

    • Power Query uses the .NET Framework Data Provider for SQL Server and Oracle Data Provider for .NET, whereas Power Pivot by default creates a SQL Server connection using the SQL Server Native Client 11.0 (SQLNCLI11).
      • Even if you already created a data source for a SQL Server database you refresh in a Power Pivot workbook, you have to create another data source for the same SQL Server database for Power Query, because you use two different drivers.
      • You might consolidate these data sources into only one by changing the data provider in the advanced options of a Power Pivot configuration, but I am not sure this is a good idea. I would keep the two versions of the data source, one for each provider, in case I use the same database in both connections.
    • Power Query creates one connection string in Excel for each query you create. The connection string contains the entire transformation, and when you copy it into the New Data Source page in the Power BI admin center, the internal query is analyzed to extract the required connections to SQL Server. If these connections are already configured as Power BI data sources, you don’t need to do anything else. I suggest going through this step for all the queries you have, until you are confident with the internals and you are sure the required data sources are already available.
      • Even if you create a single query in the M language accessing different databases, the referenced connections will be found and each database will have a separate data source configuration in Power BI. I was worried that loading multiple tables from different databases on the same server would produce a single data source granting access to all the databases on that server, but luckily this does not happen and security is preserved!
    • I spotted an issue using certain DateTimeZone functions (DateTimeZone.FixedLocalNow, DateTimeZone.FixedUtcNow, DateTimeZone.LocalNow, and DateTimeZone.UtcNow) that do not seem to work with scheduled data refresh. You can read more about this issue in this thread on the Power Query MSDN forum. I found a workaround using the Table.Buffer function: by stopping query folding, the expression is not translated into SQL but is evaluated directly by the Power Query engine. However, I hope this will be fixed soon.
    • A Power Query transformation that contains only a script, without accessing any data source, is currently not refreshed. This would be useful for generating a Date table. I opened another thread about this issue on the forum, and I hope there will be news on that, too.
      • In the same thread you will find another tip: literals in the form #literal, such as #table, are mis-analyzed by scheduled refresh, but at least for this issue workarounds are available until Microsoft fixes it.
    • You can use SQL Server views based on linked servers to overcome the limitation of providers currently supported by Data Management Gateway (which is the component used by scheduled data refresh).
    • Now that it is possible to publish SSIS packages as OData Feed Sources, you can expose a SQL Server view to Power BI and, by accessing it from Power Pivot or Power Query, execute SSIS packages at refresh time. If the package does not take too long to execute (a long run would time out the connection), this is a smart way to run some small “corporate ETL” in sync with the data refresh on Power BI, without relying on synchronized schedules (which are always one more thing to maintain). This further extends the range of providers you can use with scheduled data refresh.

    I would like to get more detailed errors when something goes wrong and scheduled data refresh stops, but this is a good start.

  • LINQ to DAX project on CodePlex #dax #tabular #ssas

    Since its release, I've seen a number of scenarios where Analysis Services Tabular is the analytical engine of the reporting section in a larger system. In these conditions, at least a part of the queries sent to Analysis Services are DAX queries generated by code (as a consequence of user interaction or for other reasons).

    Since DAX knowledge is not very common among developers, having a LINQ to DAX query provider is more than welcome to simplify DAX code generation. Dealogic, which is among the first companies I helped with Tabular adoption a few years ago, invested time in a first version of the LINQ to DAX provider that is now available on CodePlex and open to contributions.

    I looked at the available features and at the generated DAX code, and it already looks very interesting. You can generate good DAX queries without knowing DAX (which I suggest studying anyway!), and LINQ to DAX does a good job of splitting conditions between CALCULATETABLE, SUMMARIZE, ADDCOLUMNS, and FILTER. Kudos to György Farkas for his effort!
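    To give an idea, a query produced by such a provider is shaped roughly like the following sketch (the table and column names are hypothetical, and this is my illustration of the query shape rather than actual output copied from the provider):

    EVALUATE
    CALCULATETABLE (
        ADDCOLUMNS (
            -- Hypothetical model: Sales related to Customer and 'Date'
            SUMMARIZE ( Sales, Customer[CustomerName] ),
            "TotalAmount", CALCULATE ( SUM ( Sales[Amount] ) )
        ),
        FILTER ( ALL ( 'Date'[Year] ), 'Date'[Year] = 2013 )
    )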

  • How to handle fact tables with different granularities in #dax #powerpivot #tabular

    A common question I receive from Excel users learning Power Pivot is how to handle tables that have different granularities. In reality, this is not the terminology they use: the concept of “table granularity” is used mostly by Kimball practitioners, who immediately identify this scenario as a “two fact tables with different granularities” pattern. In Power Pivot this situation is often the source of many troubles for Excel users, mostly because it is not clear how to correctly apply data modeling.

    Moreover, even users coming from a Multidimensional background may not know how to handle relationships between fact tables and a dimension at different granularities. In Multidimensional you can define the dimension relationship at any (attribute) hierarchical level, but this feature is not available in Tabular. In reality, we have two options, for example when you have data at the product category level and you want to join the product dimension:

    • Conform the relationship at the dimension granularity level (product category), hiding the measures coming from the fact table when the value is not valid (product name)
    • Import the fact table without defining a relationship in the data model, and simulate the relationship (at the product category level) using a DAX expression that applies a corresponding filter at query time (see the sketch after this list)
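    As a minimal sketch of the second option, assume a hypothetical CategorySales fact table and a Product dimension that share a CategoryKey column, with no physical relationship between the two tables. The measure filters the fact table by the category keys visible through the Product dimension:

    [Category Sales Amount] :=
    CALCULATE (
        -- CategorySales and Product are placeholder table names
        SUM ( CategorySales[Amount] ),
        FILTER (
            ALL ( CategorySales[CategoryKey] ),
            -- Keep only the category keys visible in the Product dimension
            CONTAINS (
                VALUES ( Product[CategoryKey] ),
                Product[CategoryKey],
                CategorySales[CategoryKey]
            )
        )
    )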

    I wrote an article about handling different granularities on the www.daxpatterns.com website, describing these two options in more detail and providing practical examples. I think that both techniques are useful: simulating the relationship in DAX is more flexible for many reasons, but there could be scenarios where the data volume suggests using an approach based on a physical relationship, creating a dummy value in the dimension. As always, I would use the simpler approach unless you think that performance is not good enough, and only at that point evaluate which pattern performs better.

    As a side comment: I don’t know which approach is “simpler”, because simulating a relationship requires a more verbose DAX formula, whereas the relationship based on a dummy item requires some work at the ETL level and pollutes the dimension table with items that are not strictly required (with the relevant exception of a Date table, where you might use an existing day as the dummy element). But the budget scenario will be the subject of a dedicated pattern very soon…

  • Upcoming conference speeches and workshops #ssas #tabular #dax #powerpivot

    Between May and July, Alberto and I will speak at several conferences, and I think it could be useful to write a single blog post with a recap:

    We will also deliver several courses:

    See you around the world! 

  • The ISEMPTY function in #dax #powerpivot #tabular

    Microsoft silently added the ISEMPTY function to the DAX language in Analysis Services build 11.00.3368 (SQL Server 2012 SP1 CU4). This function is particularly important in DAXMD (when you use DAX to query a Multidimensional model), because it produces a much better execution plan in OLAP than the alternatives based on COUNTROWS.
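    As a minimal sketch of the difference (assuming hypothetical Customer and Sales tables connected by a relationship), the following two measures are equivalent, but the first one expresses the emptiness test in a way the engine can optimize better:

    -- Test based on ISEMPTY (preferred when available)
    -- Customer and Sales are placeholder table names
    [Customers with Sales] :=
    COUNTROWS (
        FILTER (
            Customer,
            NOT ISEMPTY ( RELATEDTABLE ( Sales ) )
        )
    )

    -- Equivalent test based on COUNTROWS (often a worse plan, especially in DAXMD)
    [Customers with Sales COUNTROWS] :=
    COUNTROWS (
        FILTER (
            Customer,
            COUNTROWS ( RELATEDTABLE ( Sales ) ) > 0
        )
    )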

    There is an advantage in using it in Tabular/Power Pivot models, too, even if there is an issue using it in Power Pivot. You can upgrade Power Pivot for Excel 2010 (the new version is available as part of the cumulative update released for SQL Server), but you cannot upgrade it in Excel 2013, because there Power Pivot is updated only through Office updates, and up to now this feature has not been added to the released versions of Excel 2013 (at least up to version 15.0.4605.1003). The funny thing is that, if you have Power Pivot for SharePoint, you can have a server able to use new features (such as the ISEMPTY function), but you are not able to create an Excel 2013 file using them!

    Last week I wrote an article on SQLBI that describes the syntaxes available to check an empty-table condition in DAX. You will find a few code examples there. I hope that Microsoft will soon release an upgrade for Power Pivot in Excel 2013, too.
