THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

  • Connecting to #powerpivot from an external program (such as #Tableau)

    Many people requested me how to connect to Power Pivot from an external program, without publishing the workbook on SharePoint or on Analysis Services Tabular. I always said it is not possible (for both technical and licensing reasons), but someone observed that Tableau is able to extract data from a Power Pivot data model connecting directly to the xlsx file. I wanted to investigate how they solved the limitations that exists.

    From a technical point of view, you have to install a Tableau Add-In for Power Pivot Excel (it’s available for both 32 and 64 bit). Then, you connect using the Tableau Desktop software selecting the Microsoft Power Pivot connection. You choose a local Excel file and click Connect. The list of perspective appears in a combo box.

    connect

    You click ok and you navigate into the Power Pivot data model. But what’s happening? Tableau runs an Excel instance (probably by using Excel object model) and then connects through the Tableau Add-In for Power Pivot that you installed before. Probably this add-in acts as a bridge between the process running Excel and the process running Tableau. This solve the technical problem, and it would be interesting to know how to use the same add-in from other programs without having to write the same add-in again. I know many ISVs that would love to do that!

    But before starting your project in Visual Studio to do the same (after all, it shouldn’t be rocket science writing such a connector), consider the license agreement (EULA) of Office. It says that “Except for the permitted use described under "Remote Access" below, this license is for direct use of the software only through the input mechanisms of the licensed computer, such as a keyboard, mouse, or touchscreen. It does not give permission for installation of the software on a server or for use by or through other computers or devices connected to the server over an internal or external network.”. It seems we are in gray area here. The access to Excel is not direct. But at the same time, it is not made on another computer, and technically you are using keyboard, mouse and/or touchscreen when you are using Tableau Desktop.

    This is certainly an unsupported scenario (and if the background Excel process hangs for any reason, you have to kill it in Task Manager). But if the licensing allows that, or if Microsoft tolerate this, probably many companies writing software (I have a long list of requests I received…) could be interested in doing the same.

    I would love to hear some official word on this topic…

  • How to pass a #DAX query to DAX Formatter

    In its first two months, DAX Formatter served 3,500 requests and I see the daily trend slowly raising. If you have observed carefully the first articles published on DAX Patterns, you might have seen that you can click the link”Code beautified with DAX Formatter”.

    image

    When you click that link, you open the DAX Formatter page copying the query/formula shown in the box. The good news is that you can implement the same behavior in your articles/blogs/posts by using a GET or POST call.

    The easiest way is passing the query into the URL of a GET command:

    http://www.daxformatter.com/?fx=FORMULA&r=REGION

    The &fx argument is the dax code you want to format. The &r argument is optional and can be US (the default), UK or Other. Using Other you use the semicolon ( ; ) as a list separator, and comma ( , ) as decimal point, instead of the , and . settings used for US and UK. Here are two examples of the same query formatted with the two settings.

    http://www.daxformatter.com/?fx=EVALUATE%20calculatetable(Customers,Customer[Occupation]="Student")&r=US

    http://www.daxformatter.com/?fx=EVALUATE%20calculatetable(Customers;Customer[Occupation]="Student")&r=EU

    Using the URL might have different limits for its length, depending on the browser. We can consider 2000 characters as a practical limit. You can overcome this limitation by using a POST command. Here is an example of a simple html form that pass the content of a textbox as the query to format:

    <form action="http://www.daxformatter.com" method="post">
            <input type="hidden" name="r" value="US" />
            <textarea name="fx">EVALUATE calculatetable(Customers,Customer[Occupation]="Student")</textarea>
            <input type="submit" />
    </form>

    I have also received many feedback about many possible improvements of DAX Formatter – we’ll work on it, you just have to wait… but thanks for the support and appreciation!

    UPDATE Feb 27, 2014 

    You can now use the URL syntax with the additional arguments:

    embed=1 : request only the HTML formatted code

    font=n : optional - set the font size

    For example:  

    http://www.daxformatter.com/?embed=1&fx=EVALUATE%20calculatetable(Customers,Customer[Occupation]="Student")&r=US 

    http://www.daxformatter.com/?embed=1&font=22&fx=EVALUATE%20calculatetable(Customers,Customer[Occupation]="Student")&r=US

  • Distinct Count calculation on dimension attribute in #dax #powerpivot #tabular

    Creating Distinct Count calculations in DAX is pretty easy when the target of the operation is a column in the “fact table” of a star schema. When you want to apply the DISTINCTCOUNT function to a dimension attribute (such as Customer Country, Product Model, Employee Department, and so on), you need to apply one of the techniques described in the Related Distinct Count pattern we published on www.daxpatterns.com.

    Technically, this is an implementation of the many-to-many relationship calculation in DAX, but you can safely ignore the complexity behind the many-to-many patterns: just copy the DAX formula corresponding to the common scenarios:

    • Distinct Count on Attribute in Star Schema
    • Distinct Count on Attribute in Snowflake Schema
    • Distinct Count on Slowly Changing Dimension (SCD) Type 2

    We will continue publishing new patterns in the next weeks – let us know your feedback about the DAX Patterns website!

  • The Cumulative Total #dax pattern

    The first pattern published on www.daxpatterns.com is the Cumulative Total. Another common name of this calculation is Running Total, but the reason why we named the pattern in this way is that we identify all those scenarios in which you want to obtain a value, at a certain date, that corresponds to the result of a number of transactions executed in the past. Usually, this scenario is implemented using snapshot tables in the classical Kimball modeling. With a columnar table, you can afford avoiding the snapshot table even with a relatively large dataset.

    You might want to implement the Cumulative Total pattern to reduce the volume of data stored in memory, transforming snapshot tables into dynamic calculation in DAX. The examples shown in the article represent an implementation of the Inventory Valuation at any point in time. Remember, I am not saying snapshot tables can be avoided in Tabular, but you have to consider the alternative, especially when the size of snapshot table is an order of magnitude (or more) larger than the original transactions table. Do some benchmark and find the better solution for your model!

  • DAX Patterns website official launch! #dax #powerpivot #tabular

    I’m very proud to announce the official launch of the DAX Patterns website!

    http://www.daxpatterns.com

    I and Alberto Ferrari worked on this idea for a really long time. Many business scenarios can be solved in Power Pivot using always the same models, with just a few minor changes. This is especially true for the DAX formulas required. Sometime, you adapt the data model to the requirements and to the DAX needs. But many many times, you end up copying an old project, or an existing blog post or article, and try to adapt the old formula to the new data model. Sometime, you might lose time adapting details that were specific to a certain solution and do not fit well in your model. We just tried to put some order, giving names to common templates and creating more generic versions that can be easily adapted to specific requirements.

    Finding the name for this project has been the easiest part: DAX Patterns is simple, clear and direct. I don’t know if it is so intuitive for an Excel user as it is for a programmer, but we loved this name since the first moment and we never thought to anything else.

    The hard (and long) part is doing all the material job. Defining a format for each pattern, creating a web site, thinking about how to make the patterns easy to find. We based the search on the notion of “use case”. This should simplify web search.

    Every pattern has this format:

    • Basic Pattern Example: a very short and simplified application of the pattern in an easy example. You can understand the idea reading this short part, so you quickly realize whether the pattern might be good for your needs or not.
    • Use Cases: a list of practical examples in which you can apply the pattern. You can also search patterns by use cases in the website.
    • Complete Pattern: a complete and more detailed description of the patterns, with consideration about possible different implementations, performance, maintainability.
    • More Pattern Examples: other implementations of the pattern, sometimes required to show possible applications in a practical way (use cases are just descriptions).

    We already prepared many patterns that we will publish in the next weeks and months. This is still a work in progress, we don’t have all the patterns ready, but we reached a point where we have enough material to schedule for publishing while we are completing the job. Well, in reality I think we will never “complete” this web site, but we have a list of patterns we want to complete before publishing a book that will contain all of them, so that you will be able to have them also offline. In any case, we will look for new patterns based on feedback we gather from customers, students, readers. If you think there is a scenario you would like to be covered, write us a comment. I already have a list, but prioritization depends also on the feedback we get.

    You will find only a couple of patterns now, but you will see the list growing weekly, and I will mention new patterns on my blog when they will be available. If you find a pattern interesting, feel free to mention/link it from your blog, newsletter, forum posts, tweets, and so on. This will help other people finding the pattern when they look for a solution to a problem that the pattern can solve. And, of course, let us know if you have any feedback to improve DAX Patterns!

  • How to obtain last process date from #SSAS #Tabular

    I wrote an article about how to get the Last Process Date in SSAS Tabular. This is a common request and there are many ways to obtain this information. You can use the metadata provided by XMLA or DMV, just as you can do in SSAS Multidimensional. Or you can use a calculated column that contains the DAX expression NOW() – this might seem an expensive approach, but remember that a column that has the same value for all the rows is absolutely not expensive, because it will be completely compressed in memory!.

    I suggest taking a look at the examples in the article, because it is simple to confuse the date of last schema update with the date of last data update, because sometime the schema is updated in an unexpected way…

  • Scheduled Data Refresh in Power BI #powerbi #powerpivot

    Now that Power BI is finally available (I still use terms as “released” for software and “in production” for a service…) I made some test on the new Scheduled Refresh option that you have for your models (other info here.

    First of all, the Refresh feature within the Excel workbook no longer works, at least when you are using Data Management Gateway – you need to schedule a refresh for that, and I think this is a good idea. The refresh time could be very slow and waiting 30 seconds or more without being able to interact with the workbook wasn’t good from the user experience point of view. Well done.

    Now, when you schedule the refresh, you can set a frequency that can be daily or weekly. No intra-day updates, unfortunately. I think you might workaround this by creating several copies of the same workbook and then defining a different daily schedule for each copy, at different hours. I think that a more frequent update could be a good idea for certain businesses. Maybe with some limitation, but if the workbook is small I don’t see the real reason for such a limitation (considering that it could incentivize bad behaviors like the workaround I just described).

    You can also request a manual refresh – the strange thing is that you can do that only after you defined a scheduled refresh (maybe I don’t need that, so I might create an unnecessary schedule just for a single manual refresh).

    SNAGHTML2f2d425

    I observed that when you request a manual refresh, in reality you just put a request in an internal queue somewhere on the cloud. After you request a refresh, you cannot request a second one, because one refresh is running.

    image

    At this point, you see the status Running in the History tab:

    image

    In reality, chances are that you are just waiting but nothing is running. In fact, I have seen my workbook refreshed after 10 minutes, but history log says that execution time was only 28 seconds. For this reason I assume that there is a queue that handles these requests on the cloud and manual refresh can take so long to finish. However, the second execution required less than 2 minutes, so I am not sure it was because I refreshed a workbook on my tenant for the first time (maybe some deployment was involved) or I was just unlucky and my queue was particularly slow at the previous request.

    image

    I look forward to see more data sources supported by scheduled data refresh, and in particular Power Query would be an important step for making Power BI adoption even larger.

  • Natural and Unnatural Hierarchies in #powerpivot and #tabular

    Last week I wrote an article about Natural Hierarchies in Power Pivot and Tabular where I show why it is important creating natural hierarchies for performance reasons. I’d like to spend a few words about the reasons why this is so important, so you might read the article in advance before moving forward.

    I had to investigate on a performance issue in a Tabular model of a customer. Initially, I wasn’t able to reproduce the issue creating a similar query in a test environment, because I didn’t focused my attention on the presence of the hierarchy, and I was generating similar queries without the hierarchy at all. Once I realized that the problem was present only in MDX and not in DAX, I started analyzing the query created by Excel and I found a number of workarounds teaching users how to avoid dangerous combinations of attributes on the rows of a pivot table. After more investigations and some help from people in MS (thanks Marius!) I realized that the problem was the use of unnatural hierarchies. I was aware of the problem (I always explain to use natural hierarchies whenever possible), but I was not aware of the order of magnitude of the problem in certain conditions, as I described in the article.

    So, I think it is interesting to highlight how to detect the problem when you just look at the Profiler. If the data model uses unnatural hierarchies, you will find in MDX code that certain members have names such as:

    [Date].[Calendar].[Year].&[2007].&[1]
    [Date].[Calendar].[Year].&[2007].&[1].&[1]
    [Date].[Calendar].[Year].&[2007].&[1].&[1].&[March]

    These names, when you use a natural hierarchy, will be:

    [Date].[Calendar].[Semester].&[H1-2007]

    [Date].[Calendar].[Quarter].&[Q1-2007]

    [Date].[Calendar].[Month].&[March 2007]

    The reason why performance might be slow and the query starts consuming memory is because of the Non Empty operation performed by the engine in order to eliminate tuples that are not included in the result. It is important to know that this does not happen for any query, and is very sensitive to the number of combinations resulting by the Cartesian product of all the attributes included in an axis of the MDX query. For example, if you use the Country name instead of the City in the example that I included in the article, also the unnatural hierarchy works in a reasonable time.

    Lesson learned: always use natural hierarchies also in Power Pivot and SSAS Tabular!

  • Distinct Count in DAX Query Plans #dax #tabular

    In the last months I and Alberto Ferrari worked on many consulting engagements about designing Tabular models or optimizing existing ones. One of the strength of the Tabular model compared to Multidimensional is the performance of distinct count measures. Now, being faster doesn’t necessarily mean getting an immediate answer, and several times we had to work on performance improvements for this type of measure. The matter is very complex and we are still not ready to publish a set of guidelines about how to optimize distinct count measures in Tabular.

    However, during our tests we made interesting discoveries about different query plans you can obtain for distinct count calculations, which can be made using different techniques in DAX. First, I want to warn you that there are no silver bullets. Each technique might be faster in certain conditions but slower in other, so a big mistake is thinking that you found a better solution than a DISTINCTCOUNT function call in DAX and you apply the alternative way to any measure in any model. As I said, don’t do that.

    Now, if you want to know more and want to discover how to read DAX query plans and how to measure the best option for your specific model, then this new paper written by Alberto Ferrari is for you:

    Understanding Distinct Count in DAX Query Plans

    We already made much more discoveries, but we need to investigate more in order to provide common guidelines. It would be very wrong publishing some best practices that only work in certain conditions, and worsen performance in many other. But if you have queries running in too many seconds (or minutes), please contact me: I’d like to gather more use cases on this topic!

  • Using Added Columns in #DAX Queries

    I had an issue in a query that reminded me a few details naming columns created by ADDCOLUMNS.

    When you specify a simple name in ADDCOLUMNS, you can use it later, for example in a SUMMARIZE or in a FILTER expression, such as in the following example (see the Value column):

    EVALUATE
    CALCULATETABLE (
        ADDCOLUMNS (
            ADDCOLUMNS (
                VALUES ( 'Date'[Date] ),
                "Value", [Internet Total Sales]
            ),
            "Frequency", COUNTROWS (
                FILTER (
                    VALUES ( 'Date'[Date] ),
                    [Internet Total Sales] <= [Value]
                )
            )
        ),
        'Date'[Calendar Year] = 2006,
        'Date'[Month] = 7
    )

    Naming an added column in this way makes it accessible only through its name, using the “measure” syntax, which doesn’t have a table name before the name of the column (please, remind that this is not a best practice – always use the table name before the column name when you reference a column and always omit the table name when you reference a measure!).

    So, a better way to write the previous expression could be the following one:

    EVALUATE
    CALCULATETABLE (
        ADDCOLUMNS (
            ADDCOLUMNS (
                VALUES ( 'Date'[Date] ),
                "’Date’[Value]", [Internet Total Sales]
            ),
            "Frequency", COUNTROWS (
                FILTER (
                    VALUES ( 'Date'[Date] ),
                    [Internet Total Sales] <= ‘Date’[Value]
                )
            )
        ),
        'Date'[Calendar Year] = 2006,
        'Date'[Month] = 7
    )

    Until now, there is nothing really new. It could be just a good reminder. Now, for some reason a few days ago I wrote this:

    EVALUATE
    CALCULATETABLE (
        ADDCOLUMNS (
            ADDCOLUMNS (
                VALUES ( 'Date'[Date] ),
                "[Value]", [Internet Total Sales]
            ),
            "Frequency", COUNTROWS (
                FILTER (
                    VALUES ( 'Date'[Date] ),
                    [Internet Total Sales] <= [Value]
                )
            )
        ),
        'Date'[Calendar Year] = 2006,
        'Date'[Month] = 7
    )

    In this case executing the query you receive an error that says that the Value column is not found. The problem is that I used "[Value]" instead of  "Value". So the lesson is: be careful when you name a column, don’t use square brackets unless you want to include the table name, which is a good idea so you will not confuse its semantic with a measure when you reference such a column later in your expression.

  • Training for DAX, Tabular and Power Pivot in 2014 #powerpivot #dax

    I hadn’t time in the last two months for new blog posts, but with some good reasons! I’ve been busy putting DAX Formatter online and writing some other stuffs that I will be happy to share very soon, aside from daily engagements with customers, of course. But the waiting is almost over, and in the meantime I’d like to share the schedule for upcoming training about Tabular, Power Pivot and DAX.

    • Advanced DAX Workshop: this is the most advanced course about DAX available today. It is aimed at Advanced Power Pivot users and Analysis Services developers who want to master the DAX language and improve their skills in performance optimization. The course includes hands-on lab sessions with exercises for creating queries, solving business problems and locating performance bottlenecks in DAX. Only me or Alberto Ferrari deliver this workshop and at the moment we have this classrooms scheduled in Europe (hopefully we’ll have news about US later…):
    • SSAS Tabular Workshop: we extended the length of this workshop in classrooms to 3 days, so that now it includes also hands-on-labs. This course is suggested to BI developer who still don’t have hands-on experience on Tabular models. Previous experience in Multidimensional is not required, if you know Multidimensional you will simply have more questions that we’ll be happy to answer! The workshop is delivered only be me or Alberto Ferrari and at the moment we scheduled classrooms only in Europe, but we also have an online edition optimized for America’s Time Zones:
    • Power Pivot Workshop: this workshop is delivered online or in classroom at training centers around the world. It is aimed at Excel users who want to learn Power Pivot. I and Alberto Ferrari provides the online workshops, which are optimized for different time zones:

    I will write another blog post in a few weeks about my speeches in conferences and communities events – but other blog posts will be about technical content, I promise! I’m just working on a couple of articles… stay tuned!

  • Power Query now imports relationships #powerquery #powerpivot

    The December release of Power Query preview (version 2.9.3502.261) introduced an interesting feature: when you import multiple tables in the data model, relationships existing between tables in the data source are automatically detected and added to the data model as well. This is similar to the behavior you have in Power Pivot, but there is an important difference. When you import one or more tables in a data model where you already imported other tables before, the relationships are detected also between tables already imported and new tables selected for import. As you might know, Power Pivot detects relationships only between tables imported at once, not between tables imported and other tables already in the data model, even when they comes from the same data source.

    The detection in Power Query doesn’t work between tables imported in data model directly by Power Pivot. It only works between tables that have been imported using Power Query.

    I didn’t test the performance (I can imagine there is some price to pay using Power Query instead of a direct connection), but it’s clear that in the long term Power Query should be THE tool to import data in Power Pivot, or better in Excel, removing the overriding with existing functionalities that do similar things. I really like the progress that Power Query is doing, and I hope that the best is yet to come…

  • Format your #dax code with DAX Formatter

    When you write a DAX formula that is not trivial, you end up having a complex expression that is not really readable. I think that the same problem affect in part all Excel users, but in Excel you can easily split a formula in many steps, using different cells. This is not always possible in Tabular, because moving a sub-expression in a separate measure might change the semantic, and this operation is possible only for scalar values and not for function tables. So we need to write readable code.

    One year ago, we published the Rules for DAX Code Formatting. Today, I’m prod to announce the availability of DAX Formatter, a free service that automatically format the code following these rules. We designed the service to be easy to use. Type the DAX expression (or paste it) and press Format. That’s it.

    http://www.daxformatter.com/

    For those of you reading offline or too lazy to click on a link, here is how it works. Suppose you have this DAX query (but you can format also expressions for measures and calculated columns):

    EVALUATE ADDCOLUMNS(FILTER(CROSSJOIN(
    SUMMARIZE(Customer,Customer[Customer],Customer[Customer.Key0]),
    VALUES('Date'[Date.Key0])),NOT ISBLANK([Internet Sales Amount])),
    "Sales",[Internet Sales Amount])
    ORDER BY 'Date'[Date.Key0] DESC,Customer[Customer] ASC

    You paste it into DAX Formatter and you obtain this result:

    EVALUATE
    ADDCOLUMNS (
        FILTER (
            CROSSJOIN (
                SUMMARIZE (
                    Customer,
                    Customer[Customer],
                    Customer[Customer.Key0]
                ),
                VALUES ( 'Date'[Date.Key0] )
            ),
            NOT ISBLANK ( [Internet Sales Amount] )
        ),
        "Sales", [Internet Sales Amount]
    )
    ORDER BY 'Date'[Date.Key0] DESC, Customer[Customer] ASC

    Which is also more readable in the web site thanks to the layout available (you can choose between dark and light color scheme):

    SNAGHTML609884

     

    SNAGHTML60d8e8

    And if you use non-US locale, just select the separators for you.
    Please, Submit any bug using the Bug Report button.
    Currently you can format any DAX query and expression in the form:

    • EVALUATE …
    • Table[measure] := …
    • [measure] := …
    • Table[column] = …
    • = …
    • measure := …

    Feedback are welcome – enjoy!

  • Videos about #DAX on Project Botticelli

    More than one year ago, I and Alberto started recording videos for Project Botticelli, and now we have a set of videos about DAX that you can watch online. There are a few videos free, and others are available in the monthly subscription. If you are interested, use this 20% discount code before the end of the December 2013: SQLBI20HOLS2013

    The modules available now are the following:

    1. DAX in Action! Free!
    2. Calculated Columns vs. Measures Free!
    3. Introduction to DAX
    4. Evaluation Context
    5. CALCULATE Function
    6. Querying with DAX
    7. What Is Time Intelligence? Free!
    8. Time Intelligence in Depth
    9. Many-to-many Relationships in DAX Free!
    10. DAX Patterns: Banding, New vs Old, Many-to-many
    11. How to Calculate with Hierarchies in DAX? Free!
    12. Hierarchies in DAX

    If you work with DAX… well, there will be a nice surprise in a few days, certainly before Christmas – stay tuned!

    You can find on Project Botticelli other series of videos about BI in Excel (by Carmel Gunn), MDX (by Chris Webb) and Data Mining (by Rafal Lukawiecki).

  • Updates about Multidimensional vs Tabular #ssas #msbi

    I recently read the blog post from James Serra Tabular model: Not ready for prime time? (read also the comments because there are discussions about a few points raised by James) and the following post from Christian Wade Multidimensional or Tabular. In the last 2 years I worked with many companies adopting Tabular in different scenarios and I agree with some of the points expressed by James in his post (especially about missing features in Tabular if compared to Multidimensional), but I strongly disagree in others.

    In general, Tabular is a good choice for a new project when:

    • the development team does not have a good knowledge of Multidimensional and MDX (DAX is faster to learn, not so easy as it is sold by MS, but definitely easier than MDX)
    • you don’t need calculations based on hierarchies (common in certain financial applications, but not so common as it could seem)
    • there are important calculations based on distinct count measures
    • there are complex calculations based on many-to-many relationships

    Until now, I never suggested to migrate an existing Multidimensional model to a Tabular one. There should be very important reasons for that, such as performance issues in distinct count and many-to-many relationships that cannot be easily solved by optimizing the Multidimensional model, but I still never encountered this scenario.

    I would say that in 80% of the new projects, you might use either Multidimensional or Tabular and the real difference is the time-to-market depending on the skills of the development team. So it’s not strange that who is used to Multidimensional is not moving to Tabular, not getting a particular benefit from the new model unless specific requirements exist. The recent DAXMD feature that allows using SharePoint Power View on Multidimensional is a really important one, even if I’d like having also Excel Power View enabled for this scenario (this should be just a question of time).

    Another scenario in which I’m seeing a growing adoption of Tabular is in companies that creates models for their product/service and do that by using XMLA or Tabular AMO 2012. I am used to call them ISVs, even if those providing services cannot be really defined in this way. These companies are facing the multitenancy challenge with Tabular and even if this is a niche market, I see some potential here, because adopting Tabular seems a much more natural choice than Multidimensional in those scenario where an analytical engine has to be embedded to deliver one of the features of a larger product/service delivered to customers.

    I’d like to see other feedbacks in the comments: tell your story of choosing between Tabular and Multidimensional in a BI project you started with SQL Server 2012, thanks!

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement