SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus
I recently wrote a short article on SQLBI describing how to create a DAX measure that searches for items matching all the items selected in a slicer, instead of any of them. For example, you might want to create a measure that displays how many customers bought at least one product in each of the categories selected in a slicer. You cannot change the behavior of the slicer itself, which applies a filter to the calculation, but using DAX you can iterate the selected items and make sure that each one exists in another set of items (such as the products bought by a customer). I think it could be an interesting technique for other applications as well (I would be happy to read your ideas in the comments!). You can read the complete article on SQLBI.
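The core of the technique can be sketched as follows. The Sales and Product table and column names here are hypothetical, for illustration only, and this is not the actual code from the article:

```dax
-- Sketch: count customers that bought at least one product in EVERY
-- category selected in the slicer (table/column names are illustrative)
Customers Buying All Selected Categories :=
COUNTROWS (
    FILTER (
        VALUES ( Sales[CustomerKey] ),
        -- no selected category may be missing from this customer's sales
        ISEMPTY (
            FILTER (
                VALUES ( Product[Category] ),
                -- context transition: sales of this customer and category
                ISEMPTY ( CALCULATETABLE ( Sales ) )
            )
        )
    )
)
```

The inner FILTER returns the selected categories in which a customer has no sales; a customer qualifies only when that set is empty.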
David DeWitt and Rimma Nehme delivered the day 2 keynote at PASS Summit 2015, talking about the Internet of Things. Many PASS attendees know David for delivering some of the best keynotes ever: technical, without vaporware and marketing stuff. So this is a good time to discuss the real state of the Internet of Things (IoT) these days.
It’s easy to talk about IoT, but it’s hard to enter this complex and heterogeneous world. There are many types of devices, and a simple categorization is consumer vs. industrial, which have very different requirements and features (not to mention cost, power, and standards used). The key to connecting IoT devices is the cloud, but communication happens in two directions: from device to cloud, and from cloud to device. This requires different technologies and often different devices, because many of them have only a single role (sending data or receiving commands, sensors or actuators), although certain devices might do both.
As you can imagine, this is a real source of Big Data. I often see Big Data used to manage data that is generated in a structured relational database, which seems like nonsense. But with the volume of data generated by IoT, this technology makes perfect sense. Of course, Azure has many technologies that help you manage this amount of data, but since the topic of this blog is Business Intelligence, I’m more interested in what happens when you want to analyze the data.
Here, a few technologies with a long history (if you know their former names) come into play. Data Mining (also known as Machine Learning these days) can be fundamental to making predictions based on previous behaviors. David is great at providing simple examples to explain a concept: a boiler has a pressure sensor, and you have to open a valve before the boiler explodes. You can train an algorithm to predict boiler failure, or you can provide built-in intelligence in the algorithm, with a predefined pressure limit at which to open the valve. This requires real-time stream analysis; relying on the cloud for it would generate too much traffic and would also have higher latency (dangerous for this type of application).
Here comes a new “definition”: fog computing, also known as edge computing. The idea is not to move the data to the computation, but to move the computation to the data. However, IoT is a database problem, and it is not managed this way today. And this is the main point of the keynote: proposing a PolyBase for IoT that includes:
- Declarative language: today IoT is based on imperative languages, whereas the goal is to introduce a declarative language, such as IoT-SQL (imagine adding WINDOW and ACTION clauses to a classical SQL query, so that it can act on a range of time, triggering an action when certain conditions happen)
- Complex object modeling: define a standard structure to identify IoT locations hierarchically (imagine an object model with an API to navigate hierarchies of objects, traversing paths and the like, similar to many MDX statements we know well)
- Scalable metadata management: a simple abstraction (metadata, statistics, access privileges) unified to access different devices; metadata includes a collection of standard and extended attributes
- Discrete &amp; continuous queries: different query types, such as ExecuteOnce (like a standard SQL query), ExecuteForever (a continuous flow of responses from the device), and ExecuteAction (like ExecuteForever plus an action to execute under defined conditions)
- Multi-purpose queries: here is the smart idea. With the process defined at a higher level, the decision to run the work in the cloud or at the edge (in the fog) is made by the query optimizer, creating a real query plan that distributes the actual work to different parts of the system depending on the request
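To make the declarative-language idea above more concrete, here is how I imagine an IoT-SQL query could look. This syntax is purely illustrative, based on my notes from the keynote, and is not an actual product feature:

```sql
-- Hypothetical IoT-SQL, combining WINDOW (a time range to evaluate)
-- and ACTION (what to trigger when the condition is met)
SELECT AVG ( pressure ) AS AvgPressure
FROM BoilerSensors
WHERE plant = 'Plant-01'
WINDOW 30 SECONDS
ACTION OpenValve ( boiler_id ) WHEN AVG ( pressure ) > 90
```

Depending on the query plan, a query like this could be evaluated in the cloud or pushed down to the edge device itself.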
This approach is really ambitious, but an important part of it is the idea of embedding security in the system. If you think about the future of IoT, security is of paramount importance. I don’t know whether this will be the future of IoT, but this speech raised points that have to be faced, sooner rather than later.
You will be able to watch this keynote soon on PASStv on demand.
I’m sitting at the bloggers’ table at the Day 1 keynote of PASS Summit 2015, and I’ll try to recap the important news for the BI community.
- SQL Server Analysis Services 2016 CTP3 will be available in a few days here, but Microsoft has already announced what is new in this release:
- Previously announced features (most of them already described in this blog and/or available in Power BI Desktop):
- Extended Events for SSAS available in SSMS
- Parallel partition processing
- New DAX functions
- DirectQuery over SQL Server 2014
- New DAX functions and syntax (including variables)
- New SQL Server 2016 compatibility level: it’s no longer based on XMLA (which was created for Multidimensional) and we have a new Tabular Object Model (TOM). No problem for existing models if they keep the 1103 compatibility level, but once you migrate to the new model, you have to use the new API (and you also get access to the new features). The API for TOM is available in JSON, just as in the Power BI APIs.
- JSON-based Tabular scripting language: while there is still a single file for the entire model, the code is more readable and (most important) it’s easier to manage in a version control system, possibly merging changes applied by different developers to the same data model
- Bi-directional cross filtering: just as in Power BI Desktop, with an equivalent diagram view layout
- MDX support for DirectQuery: this is *BIG NEWS*. The implication is that you will be able to use Excel pivot tables (which generate MDX queries) with DirectQuery. But wait, there is more about DirectQuery…
- Oracle, Teradata, and APS supported in DirectQuery: while I’m still confident that an in-memory database is very useful when you have a certain amount of data, or for certain calculations, the ability to create real-time dashboards connecting your relational database to a thin semantic layer is wonderful. Just be careful: you have to use compatibility level 1103 for this feature in CTP3 (this will certainly change before RTM).
- SSDT for Visual Studio 2015 available: this was necessary and expected. The good news is that you don’t need to download 1 GB of files if you already have VS2015. The important news here is better performance of SSDT when you design a data model (but I want to test it… again, it’s easy to improve what we have today), and finally we’ll be able to use the Visual Studio Configuration Manager to create test and production deployment configurations. Really, really needed!
- Improved DAX formula editing: now, it’s easy to do something better than what we had in the past! I’m really demanding here, and I hope there will be a way to integrate DAX Formatter (I will investigate that this week). By the way, the new development environment has formula fixup: you rename a measure and an automatic refactoring happens, just as in Excel 2016 and Power BI Desktop.
- DBCC for Analysis Services: you can check the consistency of your tabular and multidimensional models.
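To give an idea of the JSON-based scripting language mentioned above, here is a minimal sketch of how a Tabular model definition might look in the new format. The exact property names here are my assumption based on early information, and may differ in the final release:

```json
{
  "name": "SalesModel",
  "compatibilityLevel": 1200,
  "model": {
    "tables": [
      {
        "name": "Sales",
        "columns": [
          { "name": "Amount", "dataType": "decimal", "sourceColumn": "Amount" }
        ],
        "measures": [
          { "name": "Total Sales", "expression": "SUM ( Sales[Amount] )" }
        ]
      }
    ]
  }
}
```

A text format like this is much friendlier to diff and merge tools than the XMLA-based model files of compatibility level 1103.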
The keynote was more about SQL Server 2016 as a platform, on-premises and in the cloud. SQL Server now ranks better than Oracle in the Gartner quadrant, in the TPC-H benchmark (for data warehouses), and for security (it is the database with the lowest number of vulnerabilities in the last 5 years). This would have been a joke 10 years ago. There are also many improvements, such as integration with R, and many security, scalability, reliability, and performance improvements (the in-memory technology in SQL Server keeps improving).
You can watch PASS keynote on PASStv: http://www.sqlpass.org/summit/2015/PASStv/Keynote.aspx
That’s all for today’s keynote. More news later this week as I watch the sessions, I hope!
I don’t usually blog about news, unless there is a relevant impact on future directions. Last week we saw a huge number of announcements for Power BI: Custom Visuals are now available to anyone through an online gallery, you can also import them in Power BI Desktop, and there are a number of improvements in the Power BI Service and a new version of the Personal Gateway.
I am sure we will see a number of announcements this week at PASS Summit 2015. If you look at the public schedule of the sessions, you can see that we will finally know more about Analysis Services and Reporting Services, which are expected to show new features in SQL Server 2016. We will finally talk about Corporate BI again, even if we should talk about Hybrid BI these days.
The keynote will start at 8:15 AM Pacific Time on Wednesday, October 28, 2015. I will cover announcements made during the keynote on Twitter (@marcorus) and in more detail on this blog.
If you are in Seattle this week, Alberto and I will present these sessions:
I hope Darren will be able to release an update to DAX Studio, which is one of the tools I will use in my session. I already have an early build to show in the session, just in case, but the goal is to make it broadly available before the session.
UPDATE 2015-10-26: DAX Studio 2.3.0 is available!
If you use Analysis Services Tabular, you should dedicate a good amount of time to hardware selection. Unfortunately, throwing money at an expensive hardware configuration could be a very bad idea, resulting in your $1,500 desktop running faster than your expensive $100,000 server. Moreover, if you use virtualization you have to be very careful with certain settings that might affect performance in a very bad way. When I say this, I’m describing queries running up to 2-3 times slower than in optimal conditions. So, why spend time to gain 10% when you have a bigger issue to solve?
I described the main best practices in the article Optimize Hardware Settings for Analysis Services Tabular on SQLBI. This is the result of helping many companies detect hardware bottlenecks and plan the right hardware configuration. My experience says that the time you spend correctly allocating the budget has a huge return on investment. Usually you cannot change the CPU or the RAM of a brand-new server, so this step is critical. The next step is to check that the hardware configuration is correct. It’s incredible how many times I discovered that the BIOS settings of an expensive server were the reason for slow performance, so now it’s my first priority when I see a benchmark with suspicious numbers (compared to the tech specs of the CPU).
Now, a common objection I hear is that “we have a standardized hardware and virtualization platform”. I completely understand that, but I like to remind people that the goal is a better return on investment, and standardization has the ultimate goal of reducing costs. So we start by evaluating the cost of a solution that is compliant with the standards but allocates different hardware to a specific workload. The result is spending less (on hardware and licenses) and getting more (performance).
I’d like to hear your stories about that – write your experience in the comments!
Last week we launched a new page on the SQLBI website: DAX Puzzle (you can also use www.daxpuzzle.com).
The idea is very simple: we describe a scenario, we ask you to solve a problem in that scenario, and we might provide some hints to help you find the solution. You can download a sample file (in Power BI Desktop, which is freely available to anyone – but we might consider Excel too, please provide your feedback) and spend some time finding the solution. When you are done, or if you are curious and don’t have enough time, you can go to the page with the solution, read our description, and download the file with the solved problem.
There are no prizes. It’s just a workout for your mind. But it’s a good way to check whether you have something to learn in DAX. For every puzzle, we also provide links to the particular sections of our Definitive Guide to DAX that describe the topic related to the puzzle in more depth. Yes, I admit it’s also a marketing initiative, but it’s fun if you like DAX!
The first puzzle we published is about USERELATIONSHIP. We already received several comments, and I suggest you use the scenario page only for comments about the question, whereas the solution page is the right place to discuss alternative solutions. There are interesting conversations about the performance of different approaches, and I would like to point out that this first puzzle is not about performance. In fact, the fastest solution doesn’t use USERELATIONSHIP at all, and requires the new GROUPBY function. Now you have another reason to read not only the solution, but also all the comments!
The next puzzle will be published in a few days… subscribe to our newsletter to be notified about new puzzles!
I am so happy to announce that The Definitive Guide to DAX is finally available!
Alberto Ferrari and I spent one year writing this book, and several years collecting the knowledge necessary to do that. The complete title is The Definitive Guide to DAX: Business intelligence with Microsoft Excel, SQL Server Analysis Services, and Power BI. You can imagine why we like to shorten it! However, the complete title gives you an important hint: this book covers the new DAX syntax of Excel 2016, Power BI Desktop, and Analysis Services 2016. For example, we covered all the table functions useful for calculated tables, a feature released in Power BI Desktop after we completed the writing. This was an additional challenge, but our goal was to publish a book dedicated to the DAX language, independent from the product and completely up to date.
But everything has a cost. It took us a huge amount of time to reach the depth and completeness we wanted in this book. And it will take you weeks, if not months, to read it cover to cover. Yes, I know, you no longer read technical books this way. You open them at the right chapter and get the content you need; you copy the pattern; you get the good hint. I do that at least once a week. But you will only be able to use this book that way once you have a solid understanding of DAX. At the beginning, my suggestion is to start from chapter 1, even if you are an experienced DAX developer.
What if you are a DAX beginner? This book will be your guide, but you might consider a more introductory book to start with (you can find other books from us and from Rob Collie, depending on the product you use and the writing style you prefer). This is particularly important because we don’t spend a single line in the book discussing the user interface. We wrote a book about the DAX language, so you have to already know the UI of a product that uses this language. Today, the list includes Excel (2010/2013/2016), Analysis Services (2012/2014/2016), and Power BI Desktop.
Why am I so excited about this book? After all, I have written many books (this should be the 10th in English, and I wrote another three books in Italian). Well, first of all, a few months after completing the writing, Alberto and I would not add or modify anything in this book. As you will read in the introduction, we made no compromises. We thought the size would be 450-500 pages, but the result is 530 pages of content (plus indexes, table of contents, and so on). Is it the perfect book? No, I am pretty sure we will discover some errors and things to clarify and fix. It always happens. But we set the bar very high this time, and we are very satisfied with the final result. Only the reviews will tell us whether our perception is right, but we know this is the best result possible today. We had technical reviewers who helped us so much in getting the point of view of the reader, and I would like to mention the incredible job done by Gerhard Brueckl. Believe me, if you have written a technical book, your worst nightmare is the technical reviewer who reviews too much, so that you spend more time explaining why you were right instead of fixing the content. Well, Gerhard had the skill and the ability to highlight the right things. Thanks Gerhard, you deserve a public mention!
After this self-celebration, let me spend a few paragraphs on the content. We use this book as companion content for our courses Mastering DAX *and* Optimizing DAX. During the courses we have hands-on labs and a lot of interaction, but we constantly refer to the book for more detailed information about specific functions and behaviors. Thus, if you attend these courses, you will find it easier to read the book. But you will not be able to skip it! Here is the table of contents, with some comments:
- Foreword: three of the authors of the DAX language and the VertiPaq engine wrote the foreword of our book: Marius Dumitru, Cristian Petculescu, and Jeffrey Wang.
- Introduction: read the introduction before buying the book. You will understand whether it is the book for you or not.
- Chapter 1: What is DAX?
- Chapter 2: Introducing DAX
- Chapter 3: Using basic table functions
- Chapter 4: Understanding evaluation contexts
- Chapter 5: Understanding CALCULATE and CALCULATETABLE
- Chapter 6: DAX examples
- Chapter 7: Time intelligence calculations
- Chapter 8: Statistical functions
- Chapter 9: Advanced table functions
- Chapter 10: Advanced evaluation context
- Chapter 11: Handling hierarchies
- Chapter 12: Advanced relationships
- Chapter 13: The VertiPaq engine
- Chapter 14: Optimizing data models
- Chapter 15: Analyzing DAX query plans
- Chapter 16: Optimizing DAX
Topics in chapters 1 to 12 are covered in our Mastering DAX workshop. We organized the content so that you can read the chapters one after the other. The content is very dense; at the beginning we use simpler examples, but we never repeat the same concepts, so if you skip one chapter you might miss some knowledge needed to fully understand the following topics. Even in chapter 6, which tries to consolidate the previous content with practical examples, you will find something new in terms of ways you can use DAX.
Topics in chapters 13 to 16 are covered in our Optimizing DAX workshop. Please, don’t jump to this part if you haven’t read the previous chapters first. For attendees of the course, we suggest completing the self-assessment of the prerequisites, and you can try to do the same for the book. If you are not ready, you will simply see a huge amount of numbers, without understanding how to connect the dots. You need a solid and deep knowledge of how evaluation contexts work in DAX before doing any optimization.
My personal estimate is that if you dedicate one week to every chapter, you will be able to complete the learning in 4 months. Read the book, absorb the content, practice. You might be faster at the beginning if you already know DAX. But be careful: you have never read anywhere what we describe in chapter 10 (we rewrote that chapter 3 times… but that is another story), and it is of paramount importance to really “understand” DAX. You have hardly ever seen the complete description of all the DAX table functions found in chapter 9. You will not find an extensive use of variables, but the VAR / RETURN syntax is described early in the book, and you will see it used more and more with the advent of Excel 2016 / Power BI Desktop / SSAS 2016.
Finally, the goal of the book is not to give you patterns and best practices, but to teach you how DAX works, how to write good code, and how to measure performance, find the bottlenecks, and possibly optimize it. As I always say, do not trust any best practice when it comes to DAX optimization. Don’t trust blogs, articles, books. Don’t trust my writings, either. Simply measure and evaluate case by case. And the reason is the first answer a consultant gives to any question: it depends!
If you want to order the book on Amazon, here is a quick reference to the links for all the available regional Amazon sites:
- North America
- South America
Happy reading!
This is one of the few non-technical posts of this blog. Just skip it if you want to quickly come back to 100% BI related topics.
Last Saturday we ran SQL Saturday 454 in Turin. I was part of the organization, and actually I was one of the promoters of this event, running in the same city just a few months after SQL Saturday 400. The reason for that was an idea we had a few months ago: running a SQL Saturday very close to Milan, the city hosting Expo 2015 until October 31, 2015. In our plans, we should have been able to attract a large number of foreign attendees interested in combining a weekend in Italy: one day in Turin for SQL Saturday, and one day in Milan for Expo 2015. The initial target was more than double the attendees of a “regular” SQL Saturday in Italy, reaching 250 people and maybe even 300. After all, everyone was looking forward to visiting Expo 2015, right?
Unfortunately, I was wrong.
Part of my job is reading through the numbers. It took me just a few hours after opening a survey through our SQLBI newsletter and other social media to realize that Expo 2015 was not the worldwide attraction we initially assumed. Our ambitious goal was completely unreachable, and this was clear to me before anyone else accepted it. So we downsized the venue, but we still wanted to run the best event we could. After all, it was still the SQL Saturday closest to Expo 2015. And we kept the event in English. We asked all the speakers to deliver their sessions in English, regardless of the fact that 90% of attendees would be Italian.
Now, if you have never visited Italy, you might not be aware of the lack of English skills among the majority of the population. You might think that people working in IT should have English skills on their CV by default. While this is true for reading technical documents, it is not entirely true for listening and speaking. From this point of view, the situation varies greatly between European countries. Smaller countries have better English skills. My guess is that this is because movies there are not dubbed (many have just subtitles), whereas larger countries (Germany, France, Spain, and Italy) tend to distribute only the dubbed version of movies, keeping the original version for a limited number of cinemas in large cities. This fact alone makes a big difference in listening and speaking skills. I don’t have any study demonstrating this correlation; it’s just my experience as a frequent traveler.
I wanted to write this disclaimer to describe another challenge we had for SQL Saturday 454. We were at risk of not having enough foreign attendees (a certainty for me) and of not having a good number of Italian attendees, frightened by the fact that all the sessions would be in English. In the past, we had only a few sessions in English, but a complete conference in a foreign language without simultaneous translation was an unprecedented experiment. However, I was confident this would stop some, but not many, of the interested attendees.
At this point, you might be curious to know whether the event was a success or a failure. Well, in terms of numbers, we reached our predicted (downsized) target. It was an event slightly larger than the average in Italy and, ignoring our initial unreachable dreams of glory, it was a success. But what impressed me was something unexpected.
There is a number of IT professionals in Italy who can attend an event, follow all the sessions, engage the speakers, ask questions, and keep the conversation going without the language barrier I was used to seeing a few years ago. I was wrong again, but this time in a pleasant way.
The economic turmoil of recent years has been very tough on this country. I have a privileged position and a particular point of view, clearly seeing the issues that limit the competitiveness of companies and professionals in the global market, especially in IT. The language barrier is one of the many issues I see. Lack of self-investment in education is another. And the list does not end here. I am an optimist by nature, but I am also realistic in any forecast. People around me know I don’t predict anything good for Italy in the short and medium term. However, even if I still don’t have data supporting it, I feel something has been changing.
I have a new hope.
There are people willing to spend a sunny Saturday in Italy attending a conference in English, and they are able not only to listen, but to interact in a foreign language. I am sure nobody (myself included) would have bet anything on that ten years ago. For one day, I felt at home in my city, doing my job. If you attended SQL Saturday 454 in Turin, I would like to thank you. You made my day.
In the September release of Power BI Desktop, Microsoft introduced a new important feature: calculated tables.
Chris Webb wrote a quick introduction to this feature, and Jason Thomas published a longer post about when to use calculated tables.
The reason for the excitement about this feature is that it adds an important tool to the data modeling capabilities of DAX-based tools (even if, at the moment, only Power BI Desktop offers this feature, I guess that at least Analysis Services 2016 will provide the same capability). Using calculated tables you can materialize the result of a DAX table expression in the data model, also adding relationships to it. Without this tool, you would have to read the data from outside Analysis Services and then push it back, which wouldn't be possible in Power BI. I implemented similar techniques in the past by using SQL Server linked servers, materializing the result of a DAX query in a SQL Server table, and then importing that table again into the data model. Thanks to calculated tables, today I wouldn't do this roundtrip, and I would save processing time and reduce data model complexity.
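As a minimal example of the syntax (with hypothetical table names, not taken from the articles mentioned above), a calculated table is defined by assigning a DAX table expression to a new table name:

```dax
-- Hypothetical calculated table: one row for each customer/category
-- combination that exists in Sales, usable as a bridge table
CustomerCategories =
SUMMARIZE (
    Sales,
    Sales[CustomerKey],
    Product[Category]
)
```

The result is materialized in the model and can participate in relationships, just like an imported table.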
Alberto Ferrari wrote an article describing a good use case for calculated tables. The article presents an implementation of a transition matrix between customer categories, evaluated automatically based on other measures (for example, revenue). I suggest reading Transition Matrix Using Calculated Tables and then trying to implement the same intermediate table with other techniques (ETL, SQL, ...). You will discover that calculated tables help you write cleaner and faster code for a transition matrix pattern.
Today (September 30, 2015) is the last day to submit an entry to the Power BI Best Visual contest. Daniele Perilli (who has the skills to design and implement a UI) and I spent hours thinking about something that would be challenging and useful at the same time. Daniele published a couple of components (Bullet Chart and Card with States) that were useful for understanding the interfaces required to implement a Power BI visual component. But the “big thing” that required a huge amount of time was something else.
We wanted a component to color the areas of a diagram, a floor plan, a flow chart, and of course a map. From this idea, Daniele developed (and published today – what a rush!) the Synoptic Panel component for Power BI.
The easiest way to see it is to watch the video. However, an additional description can help. Let’s consider a couple of scenarios. For a brick-and-mortar shop, you can color the areas corresponding to categories (and subcategories) of products, using either color saturation or a three-state logic (red-yellow-green, but you can customize these colors, too).
But what if you are in the airline industry? No problem, it’s just another bitmap.
Wait a minute, how do you map your data to the graphics? How can you start from a bitmap and define the areas that you want to relate to airplane seats or product categories and subcategories? We don’t have coordinates like latitude and longitude, right?
Well, you can simply go to http://synoptic.design, import a bitmap, and design your areas, straight in the browser: no download, no setup, no fees required. Each area has a name that you will use to connect the data in your data model. Yes, you read that right: you will not have to change your data model to use the Synoptic Panel. For example, here you draw the seating areas of an airplane:
And with some patience you can map all the areas of a shop, too:
In the right panel you have the coordinates, which you can modify manually, and the editor also has a grid to help you with alignment (a snap-to-grid feature is also available).
Once you have finished, you export the area definitions to a JSON file that you have to save at a publicly accessible URL, so that it can be read by the component (we will add the capability to store this information in the database, too – yes, dynamic areas will be available as well).
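Just to give an idea of the exported file, an area definition might look something like this; the property names here are my own illustration, not necessarily the actual schema used by the component:

```json
[
  { "name": "Economy 1-10",  "x": 120, "y": 80,  "width": 200, "height": 60 },
  { "name": "Economy 11-20", "x": 120, "y": 150, "width": 200, "height": 60 }
]
```

Each name is the key that the Synoptic Panel matches against the category values coming from your data model.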
At this point, in Power BI you insert the component; specify the URL of the bitmap, the URL of the JSON file with the areas, the category, the measure to display, and the measure to use for the color (as saturation or color state); customize the colors; and your data is now live in a beautiful custom visualization.
Thanks Daniele for your wonderful job!
If you use SSAS Tabular, this is very important news!
Microsoft released a very important update for Analysis Services 2012 that provides performance improvements to pivot tables connected to an Analysis Services Tabular model: it is SQL Server 2012 SP2 Cumulative Update 8.
Microsoft discussed some of these improvements in this blog post: Performance problems on high cardinality column in tabular model
UPDATE 2015-09-22: I fixed this post in the following part.
In a previous version of this post, I wrongly reported that this update also fixed the unnatural hierarchies problem, described in the article Natural Hierarchies in Power Pivot and Tabular. In reality, only Power Pivot for Excel 2016 and SQL Server Analysis Services 2016 fixed that issue, which is still present in previous versions of Analysis Services (2012/2014) and Power Pivot for Excel (2010/2013).
Almost 3 years ago I wrote an article with the rules for DAX code formatting. If you quickly look at the article, you might think that it is all about readability of the code, and this is fundamentally true. But there are two rules that have a particular importance for performance, too:
- Never use table names for measures
- Always use table names for column reference
  - Even when you define a calculated column within a table
Well, it is not that writing or omitting the table name has a direct impact on performance, but without these rules you can easily miss an important bottleneck in your formula. Let me clarify with an example. If I read this:
= [A] + SUMX ( Fact, Fact[SalesAmount] )
I would say that SalesAmount is a column of the Fact table, and the SUMX iteration will not perform a context transition. But if I read this:
= [A] + SUMX ( Fact, [SalesAmount] )
I would start to worry about the number of rows in the Fact table, because each one will invoke a context transition: the measure SalesAmount is evaluated for each row of the Fact table, creating a different filter context for each evaluation.
This simple detail makes a huge difference in performance. A context transition is fast, but doing it millions of times requires time.
Wait a minute: are you asking yourself what a context transition is and why a measure reference generates it? No problem: read the article Understanding Context Transition for a quick recap of the topic (and if you want to dig deeper, preorder The Definitive Guide to DAX, available in October 2015!).
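To make the difference concrete, here is a minimal sketch. The model is hypothetical (a Fact table with Quantity and Price columns); the only point is where the context transition happens:

```dax
-- Hypothetical measure defined on a Fact table
SalesAmount := SUMX ( Fact, Fact[Quantity] * Fact[Price] )

-- Slow: [SalesAmount] is a measure reference, so every row of Fact
-- triggers a context transition and a separate measure evaluation
Slow Total := SUMX ( Fact, [SalesAmount] )

-- Fast: Fact[SalesAmount] is a column reference, evaluated directly
-- in the row context with no context transition (this assumes a
-- SalesAmount calculated column exists in the Fact table)
Fast Total := SUMX ( Fact, Fact[SalesAmount] )
```

Following the formatting rules, the table name (or its absence) tells the reader immediately which of the two cases is in front of them.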
The growing adoption of Power BI Desktop is bringing new users to the DAX language. At the same time, there are a few new features, such as bidirectional filter propagation, that add new concepts to existing knowledge. For this reason, in the last few weeks we published two articles describing important basic concepts and clarifying the behavior of filter context propagation with the bidirectional filter.
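As a quick taste of the new concepts, bidirectional filter propagation can also be activated locally within a single measure by using the CROSSFILTER function, available in the new engines. The table and column names here are hypothetical:

```dax
-- Count customers letting the filter propagate from Sales back
-- to Customer for this measure only, regardless of the direction
-- set on the relationship in the model
Customers With Sales :=
CALCULATE (
    COUNTROWS ( Customer ),
    CROSSFILTER ( Sales[CustomerKey], Customer[CustomerKey], BOTH )
)
```

The articles mentioned above discuss when setting the direction in the model is preferable to handling it expression by expression.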
A more complete description is included in our new book, The Definitive Guide to DAX, which will be available in October 2015.
The August 2015 update of Power BI Desktop added two important features for existing Excel and Analysis Services users:
In case you didn't try it before, Power BI Desktop can connect to Analysis Services Tabular (the connection to Multidimensional will arrive later, but Microsoft is working on it). It is interesting to note that Power BI Desktop sends different DAX queries to Analysis Services 2012/2014 and Analysis Services 2016. The latter has better performance, thanks to the many new DAX functions and other improvements in the query engine. Thus, especially with complex reports, consider a test using the latest available CTP of SQL Server 2016 (at the moment of writing, CTP 2.3 is the latest available, but consider that new versions might be released every month).
The other important news is that you can import into Power BI Desktop an existing data model created in Power Pivot for Excel. In reality, you also import Power Query scripts and Power View reports. I found some minor issues when I imported linked tables, but overall the experience was very good. After you import the data model, you can refresh it within Power BI Desktop. If you used Excel linked tables, you get a Power Query script that reads the same data from the original Excel files when you refresh the data model.
The opposite is not possible, so you cannot import into Power Pivot for Excel a data model created in Power BI Desktop. Since a real pivot table is not present in Power BI today, it would be very useful to be able to connect an Excel pivot table to an existing Power BI data model. If you would like to have this feature integrated and supported, please vote for the Ability to connect Excel to Power BI Data Model and create Pivot/Charts suggestion on the Power BI support web site.
Now, as it is described in the proposal, there are two ways to obtain this feature:
- Connect the pivot table to the model hosted on powerbi.com: this would be similar to connecting to a model hosted in SharePoint. I guess that the only existing barrier to implementing this feature is authentication; in fact, such a feature is not available in SharePoint Online either. Of course, it would be more than welcome.
- Connect the pivot table to a local PBIX file: this is a completely different story, and it would be part of the scenarios I described in the Power BI Designer API feature request a few months ago (with around 1,400 votes it is the fifth most requested feature). In this case, the implementation might be realized in two ways: by integrating the Power BI engine within Excel, or by connecting Excel to Power BI Desktop. The former is unlikely to happen, because Power Pivot for Excel is already that engine, and I think the release cycles of the two products will always be too different to enable this scenario. The latter is simpler, and is actually already possible, although completely unsupported. It would be nice if Microsoft simply enabled support for it.
At this point you might be curious about how to connect an Excel pivot table to Power BI Desktop. Well, let me start with an important note.
DISCLAIMER: the following technique is completely unsupported. You should not rely on it for production use, and you should not provide it to end users who might rely on it for their job. Use it at your own risk, and don't blame me or Microsoft if something does not work as expected. I suggest using this just to quickly test measures and models created in Power BI Desktop with a pivot table.
Well, now if you want to experiment, this is the procedure:
- Open Power BI Desktop and load the data model you want to use
- Open DAX Studio and connect to Power BI Desktop model
- In the lower right corner of DAX Studio, you will find a string such as "localhost:99999", where 99999 is a number that is different every time you open a model in Power BI Desktop (even the same model gets a different number each time you open it). Remember this number
- Open Excel (2007, 2010, 2013 - you can use any version) and connect to Analysis Services (in Excel 2013 go to Data / Get External Data / From Other Sources / From Analysis Services), specifying the previous string "localhost:99999" as the server name (using the right number instead of 99999) and using Windows Authentication
- At this point you will see a strange database name and a cube named Model. Click Finish and enjoy your data using a pivot table, a pivot chart, or a Power View report (though why you would use the latter in this scenario, I don't know...)
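For reference, what Excel builds under the cover is a regular MSOLAP connection. A hypothetical connection string looks like the following; the port number changes at every session, and the catalog is a randomly generated database name:

```
Provider=MSOLAP;Data Source=localhost:99999;Initial Catalog=<random database name>
```

This is why the connection breaks every time Power BI Desktop is restarted: both the port and the catalog name are regenerated.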
I will save you time by describing the problems you will have using this approach:
- If you close the Power BI Desktop window, the connection will be lost and the pivot table will no longer respond to user input.
- If you save an Excel file created with this connection, the next time you open it you have to update the connection, using the server name with the new port number (if you try to refresh the pivot table, you get an error and you can change the server name in the dialog box that appears).
- This feature might be turned off by Microsoft at any moment (in any future update of Power BI Desktop).
That said, I use this technique to test the correctness of measures in a Power BI Desktop data model, because a pivot table is faster than the other available UI elements for navigating data and examining a large number of values. But I never thought for a second of providing such a way to navigate data to an end user: I would like this to be supported by Microsoft before doing so. Thus, if you think the same, vote for Microsoft to support it.
After many years of helping companies around the world create small and large data models with SQL Server Analysis Services Tabular, I've seen a common performance issue that is underestimated at design time. The VertiPaq engine in SSAS Tabular is amazingly fast: you can have billions of rows in a table and query performance is incredible. However, in certain conditions, queries over tables with just a few million rows are very slow. Why is that?
Sometimes the problem is caused by DAX expressions that can be optimized. But if the problem is in the storage engine (something you can measure easily with DAX Studio), then you might be in bigger trouble. However, if you are not materializing too much (and this is a topic for another post, or for the Optimizing DAX course), chances are that you are paying the price of expensive relationships in your data model.
The rule of thumb is very simple: a relationship using a column that has more than 10 million unique values will likely be slow (hopefully this will improve in future versions of Analysis Services – this information is correct for SSAS 2012/2014). You might observe slower performance already with 1 million unique values in the column defining the relationship. As a consequence, if you have a star schema with a large dimension, you have to consider some specific optimizations (watch my session at Microsoft Ignite to get some hints about that).
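If you want to check whether a relationship in your model falls in this range, you can measure the cardinality of the column defining it with a simple DAX query, for example in DAX Studio. The Sales/Customer names here are just an example:

```dax
EVALUATE
ROW (
    "Sales[CustomerKey] cardinality", DISTINCTCOUNT ( Sales[CustomerKey] ),
    "Customer rows", COUNTROWS ( Customer )
)
```

If the cardinality of the key column on the many side is in the millions, the relationship is a candidate for the optimizations discussed here.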
If you want to know more, read my article on SQLBI about the Costs of Relationships in DAX, with a more complete discussion of the problem and a few measures of the timings involved.