I received the communication from the PASS Program Committee that none of my proposals has been accepted at PASS Summit 2013. Probably this year there were too many proposals, and I made the mistake of not using all the submission slots available per speaker. In fact, I proposed just one pre-conference day and two regular sessions:
- From 0 to DAX (preconference)
DAX is the new language for creating Business Intelligence solutions in the Microsoft stack. You can use it to query and program both Microsoft PowerPivot for Excel 2013 and the new SQL Server Analysis Services Tabular model. Learning the basics of DAX is very simple, but mastering it is a different story, because it requires a mindset that takes some study to acquire, both for BI professionals and information workers.
During this one-day workshop, we will guide you through the process of learning the DAX language, with many examples and scenarios that help you understand the important concepts of DAX and how to leverage them in your everyday work.
The seminar is useful for several audiences: BI developers who want to learn the DAX language, and information workers interested in exploiting the many interesting features of PowerPivot for Excel 2013.
- Optimizing Data Models in Tabular & PowerPivot (regular session)
Is your new Tabular solution performing at its best? Are you using the best practices to reduce memory footprint, increase query speed, and get the best out of the new engine? In this session, we will look at several techniques that can really make the difference in a Tabular solution. Distinct count reduction, join optimizations, condition consolidation, pros and cons of normalized data models, and selection of the columns to store in the database are some of the topics covered in this session, which will highlight not only the best practices, but also practices that were best in Multidimensional and became worst practices in Tabular. As we often say in class: if you know Multidimensional, you need to forget it in order to let the new concepts of Tabular shape your model.
- Time Intelligence with DAX (regular session)
Time Intelligence is probably the most used feature of any analytical solution. Computing Year To Date, Month To Date, and Same Period Previous Year is quite easy in DAX. However, DAX formulas become harder to write as soon as the time intelligence requests involve working-day evaluation, complex custom calendar shapes, seasonal pattern recognition, ISO week handling, or 4-4-5 fiscal calendars.
In this session, we start by showing how to compute classical time intelligence using the built-in DAX functions. Then, we look at more complex time intelligence formulas that require thinking outside the box, using advanced data modeling and querying techniques to produce interesting and useful results.
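To give an idea of the starting point, here is a minimal sketch of the classic calculations, assuming a hypothetical Sales table and a related Date table marked as a date table (table and column names are illustrative, not part of the session material):

Sales Amount := SUM ( Sales[SalesAmount] )

-- Year-to-date total over the related Date table
Sales YTD := CALCULATE ( [Sales Amount], DATESYTD ( 'Date'[Date] ) )

-- Same period in the previous year, for year-over-year comparisons
Sales PY := CALCULATE ( [Sales Amount], SAMEPERIODPREVIOUSYEAR ( 'Date'[Date] ) )

The harder scenarios mentioned above (custom calendars, ISO weeks, 4-4-5 fiscal calendars) typically require building the date filters manually with CALCULATE and FILTER instead of relying on these built-in functions.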
I have been informed that this year a new selection process has been put in place. Abstract reviews and speaker reviews were made by two separate teams, and a final judgment was made by combining the two ratings. In particular, the abstracts team did not know the speaker's name for each session. It seems a smart way to handle 850 submissions in a transparent way. Unfortunately, I wasn't aware of this new process.
In recent years, Alberto Ferrari and I had too many sessions selected, so the board decided to assign us only a few of them, and the selected ones weren't always the best ones (because it wasn't possible for us to establish an order of preference in case more submissions were selected). For this reason, this year we limited our submissions to the sessions that were most likely to be interesting, considering the current demand in the market, the questions we usually receive, and the need to avoid presenting something already shown the previous year. So we made a small selection of sessions, avoiding submitting the maximum number of sessions per speaker. But this was the wrong choice, because under the new evaluation system it would certainly have been better to propose more sessions, and not only in the area in which we are most specialized.
I am not saying I proposed sessions that were better than the ones selected. I have seen the proposals here, and there are very interesting sessions (I still don't know which ones have been selected at the moment of writing). I would simply have proposed at least two more regular sessions, more advanced and focused on specific use cases. Being part of a board that selects sessions for other conferences (like this one), I know that every selection process has some drawbacks, and the best thing you can do is communicate very clearly how the selection process will work, so that everybody is able to help the board by producing the best proposals. I thought that proposing a lower number of sessions would have been better, but I was wrong.
Well, I have been a speaker at PASS Summit for the last four years in a row (2009, 2010, 2011, 2012), so it's probably time to take some rest from speaking!
Alberto Ferrari and I will moderate the following Birds of a Feather session at TechEd North America 2013:
TITLE: How Far Can You Push Tabular?
SPEAKER: A. Ferrari; M. Russo
TIMESLOT: June 5, 2013 at 1:30 pm
Abstract: Tabular is the new engine in SQL Server Analysis Services. It is an in-memory columnar database capable of unprecedented performance, but… how far can we go with an in-memory database? Will you be able to load your data warehouse in memory to obtain the best performance? What should you care about when building the DWH? What does "fast" mean? What about real-time data? Can you query many billions of rows really fast? In this BOF we will show some theory and share some of our experience, but we welcome your questions and experience, to build a round table of experts discussing this fascinating topic.
I really look forward to this event – Alberto and I have some experience using Tabular in many different scenarios, but every time we meet new people at conferences we discover that there are new challenges ahead. This session will be the place to discuss together where the limits are today. Much farther out than many people realize, in my opinion!
In June, Alberto and I will deliver a pre-conference seminar at both TechEd North America (New Orleans, LA) and TechEd Europe (Madrid, Spain).
This day is a very good quickstart for those of you who haven't yet finished one of our books, or who missed one of our workshops about Tabular or PowerPivot. If you are planning to go to TechEd, you might also consider attending a full day about DAX by following the From 0 to DAX one-day seminar. Here are the links:
- TechEd North America – From 0 to DAX Pre-Conference Seminar (New Orleans, LA - June 2, 2013)
- TechEd Europe – From 0 to DAX Pre-Conference Seminar (Madrid, Spain - June 24, 2013)
And in case you are underestimating the importance of DAX in your future BI projects... read this blog post from Dandy Weyn – his privileged point of view inside Microsoft highlights how important DAX is today and how pervasive it will be in the future!
I published an article describing how to group sales transactions by the age of the customer at the moment of the transaction, using PowerPivot or Analysis Services. The same pattern can be used for any kind of banding operation; this specific case is also useful for reusing the formula that computes the exact age of the customer for each transaction.
An interesting point is related to performance optimization. The technique is based on adding a calculated column to a table that might contain millions of transactions. This is less expensive than adding a column that contains a foreign key plus a relationship between a table containing the group definitions and the transactions table. Every relationship is expensive and generates additional structures (you can see more files in the Analysis Services database, too). Adding one or two columns that have a low number of distinct values (10-15) usually has a lower memory cost than creating a relationship with another table. The article also contains PowerPivot examples for both Excel 2010 and Excel 2013.
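As a rough sketch of the idea (not the exact formulas of the article), assuming a hypothetical Sales table related to a Customer table containing a BirthDate column, the two calculated columns could look like this:

-- Age of the customer at the moment of the transaction
Customer Age = INT ( YEARFRAC ( RELATED ( Customer[BirthDate] ), Sales[OrderDate] ) )

-- Banding column with only a handful of distinct values
Age Band =
SWITCH (
    TRUE (),
    Sales[Customer Age] < 25, "Under 25",
    Sales[Customer Age] < 35, "25-34",
    Sales[Customer Age] < 50, "35-49",
    "50 and over"
)

Because Age Band has very few distinct values, its dictionary is tiny, which is exactly why this approach is cheaper than adding a relationship to a separate banding table.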
If only I could decouple attribute visualization from the physical structure, I would put these "degenerate dimensions" in a separate folder; as it is, such attributes are listed among the attributes belonging to the fact table, which might not be so clear when presenting data. However, I understand that such a decoupling could make life very hard for DAX clients (although for MDX it would probably not be a big issue).
I'm so happy that DAX Studio finally supports Excel 2013! As Darren Gosbell described in his blog, this release has a few internal changes that will better support future enhancements. I will port the code that captures the query plan of a query to this new release, but unfortunately it will require some weeks, because I'm traveling a lot these days.
If you write DAX formulas and queries for PowerPivot or Analysis Services Tabular, DAX Studio is a must-have tool: do you really want to live without a DAX editor? There are a lot of possible improvements, and I hope other contributors will lend their help to this CodePlex project.
Microsoft released the Workbook Size Optimizer for Excel, the first version of an Excel add-in for Excel 2013 that inspects the data model and suggests possible optimizations. Fundamentally, it tries to apply the best practices described in a white paper I mentioned a few weeks ago, removing useless columns and changing granularity where that could reduce the overall memory cost of a table.
There are different setups available on the download page, depending on the operating system (Windows 7 or Windows 8) and on the Office version (32-bit or 64-bit). Once installed, you have a new tab in the Excel ribbon, called Workbook Size Optimizer, showing a single button that starts a wizard.
I tried to run the optimizer on a workbook where I had imported several tables from the Adventure Works data warehouse sample database. The first page shows some information about the workbook size and offers the choice between automatic detection and manual selection of rules. The latter is an option you can also request later, so I started with the default.
After a short analysis, I received three smart suggestions (considering the model I have). We might question whether removing UnitCost is really a smart thing, because it could be required in order to perform calculations, and rounding the value might not be correct for our analysis.
Since I requested to apply some changes, I have the option of changing which rules to apply. This corresponds to the choice you get if you select "Let me choose the rules myself" on the first screen of the wizard.
I kept all the rules, and after clicking Next I had to wait several seconds for the optimization process to complete. The result page shows some information about the outcome of the job.
This is a good starting point. Don't blindly trust any suggestion, and carefully consider which rules to apply in order to avoid losing data that is important for your analysis. Moreover, you might know your data model better than a wizard does, and consider deleting many columns that are useless for your analysis but are not identified by the wizard. My article Checklist for Memory Optimizations in PowerPivot and Tabular Models contains several best practices that you can apply to your data model.
One year after the release of SQL Server 2012, I see growing demand for DAX. There are two reasons for that: a higher number of PowerPivot users have started to build more complex data models, and SSAS Tabular is starting to be adopted by a larger number of companies, with and without previous experience with former versions of Analysis Services.
For this reason, we decided to offer the first public edition of our Advanced DAX Workshop, a DAX training course aimed at advanced PowerPivot users and Analysis Services developers who want to master the DAX language. Up to now, we offered this course only in private classes, because of the limited demand, but now there is enough interest and adoption to justify an open class.
The goal of this DAX training is to learn how to write DAX expressions for measures and calculated columns, write DAX queries for reporting needs, read DAX query plans, and optimize DAX formulas. The course is a three-day workshop that includes many hands-on lab sessions, with exercises that will guide you through the more advanced DAX concepts, enabling you to master the writing of DAX code.
The course will be held in London on May 13-15, 2013. There are direct flights from a huge number of countries and cities, also outside of Europe. We do not expect to deliver another edition of this course for another 5-6 months, so don't miss the chance to attend this intensive DAX master course. I will be the teacher for this edition, and Chris Webb will assist me with the organization through Technitrain. So don't wait: the early-bird discount expires in a few days; register now and join us in London!
During a PowerPivot Workshop course we received an interesting question from a student: “Can I use LASTNONBLANK (and FIRSTNONBLANK) with a column which is not a date column?”
The reason is that we introduce LASTNONBLANK in the Advanced Time Intelligence module, because its typical use case is on a date column. However, you can use these functions on any column, which raises the question of what happens at that point. The sort order used depends on the data type of the column: if it is a text column, the alphabetical sort order is the reference order; if it is a number, the numeric order is used.
What happens if a column has the "Sort By Column" property set to another column? That sort order is *not considered* by the LASTNONBLANK and FIRSTNONBLANK functions. Even if a PivotTable shows you data sorted according to the Sort By Column property, any DAX formula ignores that sort order. Thus, be careful when writing DAX queries if you have to make assumptions about the sort order of a column with DAX functions that rely on it, such as LASTNONBLANK and FIRSTNONBLANK.
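As a minimal sketch with hypothetical table and column names, a LASTNONBLANK measure over a text column might look like this; it returns the alphabetically last product name that has sales in the current filter context, regardless of any Sort By Column setting:

Last Product With Sales :=
LASTNONBLANK (
    'Product'[ProductName],
    CALCULATE ( SUM ( Sales[SalesAmount] ) )
)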
On April 23, I will present DAX in Action in London and Cardiff at a SQLLunch event.
How is it possible I will be in two places at the same time?
This will be a remote presentation delivered in two locations, where you can have lunch while watching the session.
What is this session about? This is the session description:
Tabular is the new SSAS modeling experience, and DAX is the new language to use to create BI solutions with Tabular. How does it compare with MDX and Multidimensional? In this session, which is mostly based on demos, we will build a complex BI solution from scratch, starting from simple analyses and moving through complex scenarios, showing how you can leverage the tremendous speed of DAX to create complex solutions on simple data models, focusing on the differences between building the same solution in MDX and in DAX.
These free events are organized by the UK SQL Server User Group. If you are interested, you can register using the following links.
London : http://sqlserverfaq.com/events/534/SQLLunch-All-stuff-no-fluff-Marco-Russo.aspx
Cardiff : http://sqlserverfaq.com/events/535/SQLLunch-All-stuff-no-fluff-Marco-Russo.aspx
The PASS Business Analytics Conference starts today in Chicago. Over the next two days, there will be two keynotes. The famous Steven Levitt, author of Freakonomics, will be on stage on Friday, and tomorrow (Thursday) we will see Kamal Hathi and Amir Netz. I will deliver two talks at the conference, which I already described here, but before them I will cover the keynotes on my blog and on my Twitter account (@marcorus).
I do not expect detailed coverage of the technical details of the products we know and love. I look forward to a conversation about the goals of data analytics: the why, the how, and what tools are available. Advances in tools and software are important, because they enable scenarios that were simply not possible before, but the directions for the road ahead are not written in stone. I will write my thoughts here during the conference – stay tuned!
Microsoft published an interesting article about how to optimize memory consumption of a PowerPivot Data Model in Excel 2013. All these suggestions are also valid for SSAS Tabular. I also wrote an article Checklist for Memory Optimizations in PowerPivot and Tabular Models with a summary of the best practices.
The short list of things to do is very valuable:
- Remove columns that are not necessary for analysis:
  - The identity column (PK) of a fact table.
  - Timestamps, GUIDs, and other information useful for auditing and replication, but with no value for analysis.
- If a column has too many distinct values and cannot be removed (e.g. a transaction ID in a fact table used for drillthrough), consider splitting the column into multiple distinct parts (see the sketch after this list):
  - Each of the parts will have a small number of unique values, and the combined total will be smaller than the original unified column.
  - Always separate date and time into two columns, instead of keeping the original datetime.
  - In many cases, you also need the distinct parts as slicers in your reports. When appropriate, you can create hierarchies from parts like Hours, Minutes, and Seconds.
  - Keep only the granularity you really need.
- Normalize columns, keeping only those with the lower number of distinct values:
  - For example, if you have quantity, price, and total line amount, import quantity and price and calculate the total line amount as SUMX( Sales, Sales[quantity] * Sales[price] ) instead of importing line amount and using SUM( Sales[line amount] ).
- Reduce the precision of numbers to reduce distinct values (e.g. round to integer if decimal values are not relevant).
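As a rough illustration of the column-splitting idea mentioned above (the table, column, and split factor are purely hypothetical), two calculated columns like these preserve the information of a high-cardinality integer ID while each part has far fewer distinct values:

-- Split a high-cardinality integer ID into two low-cardinality parts;
-- the original value is Sales[TransactionID High] * 10000 + Sales[TransactionID Low]
TransactionID High = INT ( Sales[TransactionID] / 10000 )
TransactionID Low = MOD ( Sales[TransactionID], 10000 )

In a real model, you would perform the split in the source query rather than in calculated columns, so that the original high-cardinality column is never imported at all.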
The reason these optimizations work is that VertiPaq compresses data at the column level, creating a dictionary for each column and storing, for each row, only the number of bits required for the index into the dictionary. More details are available in the article Optimizing High Cardinality Columns in VertiPaq that I wrote a few months ago, and in the SSAS 2012 Tabular book.
A useful macro to analyze memory consumption and quickly identify the most expensive tables and columns in a PowerPivot workbook is available on Kasper de Jonge's blog: What is eating up my memory the PowerPivot / Excel edition. There is also a version for a Tabular database in his post What is using all that memory on my Analysis server instance.
Our new book about PowerPivot 2013 is finally available in a printed edition, too!
The title is Microsoft Excel 2013: Building Data Models with PowerPivot, and it is a partial rewrite of the previous book about PowerPivot for Excel 2010. In the previous book, the target audience included advanced Excel users and BI developers, because at that time there was no way to get the same engine in Analysis Services. But 30 months have elapsed, a new version of Analysis Services has been released, and in this new book we focus mainly on Excel users. For this reason, we wrote a comprehensive book covering all the features of PowerPivot, but most importantly, we tried to convey concepts of data modeling that might be pretty obvious to a DBA or a BI developer, but are completely new to an Excel user who never had the ability to create a data model with more than one table.
This book focuses on Excel 2013, so we included specific features of this release related to PowerPivot, such as writing DAX queries and linked-back tables, and features unique to Excel 2013, such as Power View. However, all of the PowerPivot features (about 85% of the book) also apply to PowerPivot for Excel 2010 in its latest release (SQL Server 2012 SP1 PowerPivot for Microsoft Excel 2010), so you can safely use this book for both versions of Excel.
You can download the first chapter of the book from the book page on the SQLBI web site. And if you want to attend a training course in a classroom or online, look at the complete list of available courses on the PowerPivot Workshop web site. The next online courses are scheduled for April 22-24, 2013 and June 17-19, 2013 (subsequent online workshops run every other month).
Here are the links to directly order the book on Amazon around the world:
And here is the list of chapters:
- Chapter 1 Introduction to PowerPivot
- Chapter 2 Using the unique features of PowerPivot
- Chapter 3 Introducing DAX
- Chapter 4 Understanding data models
- Chapter 5 Publishing to SharePoint
- Chapter 6 Loading data
- Chapter 7 Understanding evaluation contexts
- Chapter 8 Understanding CALCULATE
- Chapter 9 Using Hierarchies
- Chapter 10 Using Power View
- Chapter 11 Shaping the Reports
- Chapter 12 Performing Date Calculations in DAX
- Chapter 13 Using Advanced DAX
- Chapter 14 Using DAX as a Query Language
- Chapter 15 Automating Operations Using VBA
- Chapter 16 Comparing Excel and SQL Server Analysis Services
This book should help you start with PowerPivot from the very beginning, and you will probably use only the first chapters at that point. Over time, you will move on to the following chapters and learn more advanced techniques. This is not a book you can digest in a couple of days (after all, it is 500 pages long!); it will be your companion for several months, until you master PowerPivot!
I've been interviewed by Greg Low in SQL Down Under show 58, and this is *not* an April fool! We talked for one hour about Tabular, Multidimensional, data warehousing, and just a little bit about music (you can discover which music genre I usually listen to…).
You can listen to the interview on the SQL Down Under Show 58 page (it is in MP3 format), and if you like it there are many other past shows available. The podcast is also available on iTunes, and you can listen to other episodes on the SQL Down Under page on iTunes Preview.
If you are using the Microsoft BI stack and you haven't started learning DAX yet, you should not wait any longer. One option is to start with one of our books, or you can attend one of our workshops about Tabular or PowerPivot. But if you are planning to go to SQLBits or TechEd, you might also consider attending a full day about DAX by following the From 0 to DAX one-day seminar. Here are the links:
- SQLBits – From 0 to DAX Training Day (Nottingham, UK - May 2, 2013)
- TechEd North America – From 0 to DAX Pre-Conference Seminar (New Orleans, LA - June 2, 2013)
- TechEd Europe – From 0 to DAX Pre-Conference Seminar (Madrid, Spain - June 24, 2013)
Take a look at the early-bird expiration dates – you can save a good portion of your budget by registering by March 22, 2013 for either TechEd conference. For SQLBits there is a discounted price until April 7, 2013.
Even if I'm not blogging much these weeks, I can assure you we are working on more DAX content that we'll publish in the coming months. Stay tuned!
I recently received a question about the memory required to run a Process Defrag on a Tabular model in Analysis Services. Process Defrag is useful when you run incremental processing of a table frequently, or when some of the values in the dictionary are no longer used in the table, for example if you process the same partition multiple times and/or remove partitions from a table. Cathy Dumas wrote an interesting blog post about the savings you can obtain by running Process Defrag.
I did some investigation and have been assured that data is not completely uncompressed in this process, even if some re-encoding happens. As a rough estimate, you need free space equivalent to the size of the table you are going to defrag (which is already in memory), plus a buffer for transient data structures, but data is handled in compressed form, without requiring larger memory buffers for uncompressed raw data. So, if all the partitions of your table and the column dictionaries require 100 MB of RAM, then you need another 100 MB of free RAM in order to execute a Process Defrag.
Here is a more detailed description I received from Akshai Mirchandani:
In addition to the master copies, it requires enough memory for a new dictionary, and the final compressed data + some small temporary buffer space for the transient data structures (no big buffers like normal processing).
It is essentially going to read each column value, insert it into a new dictionary, get a new DataID back from the dictionary insert, and append that DataID to the current segment. It doesn’t need to do VertiPaq (compression) again, and it doesn’t keep the uncompressed data in buffers like the data processing algorithm does.
The important point here is that the analysis of the segment data to find the best compression strategy no longer needs to be performed – and that is typically the most expensive step of compression (the VERTIPAQ_STATE column in DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS shows whether this was done).
This is good news if you are concerned about the memory required to perform this operation.