
Microsoft OLAP by Mosha Pasumansky

Performance of running sum calculations in SP2

Since I started the series of blogs about AS specific changes in SP2, a few people have asked me to explain in more detail some of the things covered in the SP2 list of changes. The most popular request so far has been to explain the performance optimizations for running sum calculations. I wanted to delay that, because I am writing an article specifically about running sums, but I decided not to wait until it is ready, for a few reasons. First, running sum can mean many different things - there are certainly many techniques for computing it. While I believe in my technique, other people believe in theirs, so the term "running sum calculations" is somewhat vague, and the SP2 change helps some methods but not others. Second, the actual change in SP2 is in fact much broader than the narrow area of running sums. It is just easiest to demonstrate with some variation of a running sum, but as we will see, it helps in many other scenarios as well. Lastly, the article I am writing is getting somewhat out of control, and I don't know when (or whether) I will finish it, so instead I can focus on something smaller.

For the purpose of this discussion, we will consider a scenario where we need to sum up values from the very beginning of time up to the current moment. This scenario comes up a lot in inventory applications, where we keep product movements (deltas) in the fact table and need to sum them up in order to get the inventory levels (although a better design there is to take periodic snapshots and only sum up from the most recent snapshot - but that is a subject for another discussion), or in financial reports, where the balance from one year rolls into the next year, etc. So that the reader can replicate the results, we will use an example based on the Adventure Works cube. Normally we would create a utility attribute (or utility dimension), but for simplicity I will just use the Week of Year attribute as if it were a utility attribute and put calculated members on it. The simplest and most straightforward expression for such a running sum would be

Aggregate([Ship Date].[Date].[Date].MEMBERS.Item(0):[Ship Date].[Date].CurrentMember)

Note that this expression only works correctly when the granularity in the [Ship Date] dimension is on the Date attribute. There are various ways to ensure that this is always true, the simplest and most natural one being to set the aggregation function of the involved measures to LastChild. But again, let's ignore these details and just write our test queries to always have granularity on Date. The query that we are going to test this expression on should touch a lot of cells to make the measurements interesting. We will pick the following one:

WITH MEMBER [Ship Date].[Week of Year].RSum AS
  Aggregate([Ship Date].[Date].[Date].MEMBERS.Item(0):[Ship Date].[Date].CurrentMember)
SELECT
  [Ship Date].[Calendar].[Month].&[2003]&[8].children ON 0
 ,[Product].[Product Categories].[Subcategory].MEMBERS
  *[Customer].[Country].[Country].MEMBERS ON 1
FROM [Adventure Works]
WHERE ([Internet Sales Amount], RSum)

There are 6882 cells in the cellset, but for each cell we aggregate anywhere from 762 to 792 cells, depending on the current member on Date, so in total about 5.3 million cells are touched by this query. Running it on SP1 results in a running time of 1 minute and 20 seconds. This is certainly slow - the throughput is only about 66,000 cells per second. So let's check out the promised improvements in SP2. Running the same query on SP2 now returns in 869 milliseconds, or 6.6 million cells per second. This is about 100 times faster than SP1, and that is certainly fast.
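
For the curious, here is a quick back-of-the-envelope check of these figures in Python - just a sketch built from the numbers quoted above, nothing more:

# Sanity check of the cell counts and timings quoted above.
cells_in_cellset = 6882            # cells returned by the query
avg_range_size = (762 + 792) / 2   # average size of the Aggregate range per cell
total_touched = cells_in_cellset * avg_range_size
print(total_touched / 1e6)         # ~5.3 (million cells touched)

sp1_seconds = 80.0                 # 1 minute 20 seconds on SP1
sp2_seconds = 0.869                # 869 milliseconds on SP2
print(total_touched / sp1_seconds) # ~66-67 thousand cells per second on SP1
print(sp1_seconds / sp2_seconds)   # ~92x speedup, i.e. about 100 times faster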

So let's talk about the nature of the change. In one of my earlier blogs, we discussed how the query optimizer can use bulk evaluators if the MDX function supports them. Up until SP2, the Aggregate function (and also Sum, etc.) could produce a bulk evaluator, but only under the condition that the set being aggregated was independent of the query space. I.e., Aggregate over a constant set, like { January, February, March }, would always go through the fast code, because no matter what the user puts on the axes, the set is always the same. In our example, though, the set is not constant. It is a range set where the left side of the range is constant (always the first date in the dimension), but the right side of the range depends on the current member of the Date attribute. The query optimizer would then look into the query space, and if it saw that the Date attribute was constant in the query (i.e. not used at all, or only used in the WHERE clause), then Date.CurrentMember would be constant as well, and the bulk evaluator could work. In such scenarios Aggregate works extremely fast even in AS2005 RTM and SP1. However, in our example we use [2003].August.Children on COLUMNS, so Date.CurrentMember is not constant anymore.

In SP2, the bulk evaluator of Aggregate and related functions was significantly improved. It still cannot do magic in the most generic case, but it became much more intelligent than before. In fact, it realizes that there is no need to recompute every cell from scratch: the next day needs exactly the same values as the previous day, plus just one more cell value. By smartly choosing the order in which cells are computed, Aggregate's bulk evaluator reduces an O(N^2) problem to an O(N) problem, and this gives a huge performance improvement.
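
To make the algorithmic point concrete, here is a minimal conceptual sketch in Python - an illustration of the idea only, not the actual server code: the naive plan re-aggregates the whole range for every cell, while the smarter plan walks the cells in date order and carries the previous total forward.

# Naive plan: re-aggregate the full range for every cell - 1 + 2 + ... + N
# underlying cell touches, i.e. O(N^2).
def running_sum_naive(values):
    return [sum(values[:i + 1]) for i in range(len(values))]

# Improved plan: compute cells in date order and reuse the previous day's
# total - each underlying cell is touched only once, i.e. O(N).
def running_sum_incremental(values):
    result, total = [], 0
    for v in values:
        total += v
        result.append(total)
    return result

daily_deltas = [5, -2, 7, 0, 3]   # toy "product movement" facts
assert running_sum_naive(daily_deltas) == running_sum_incremental(daily_deltas)
print(running_sum_incremental(daily_deltas))   # [5, 3, 10, 10, 13]

Both versions produce the same running sums; the second one just touches each underlying cell once, which is exactly the kind of reuse the improved bulk evaluator gets by choosing the evaluation order.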

P.S. If there are other items on the SP2 list which you are particularly interested in - let me know, and I will try to cover them in the following blogs as well, although I don't promise anything :)

Published Friday, November 17, 2006 9:53 PM by mosha