THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Inside OLAP Engine: SE cache sharing

After publishing the excerpt from my upcoming whitepaper on AS caching, I received a lot of feedback, both in blog comments and in email. I want to thank everybody who took time to read and assess that piece. Honestly, I was a little bit afraid, because I know that I have tendency to overcomplicate and confuse people when I explain something, but so far the feedback indicated that the technical level was OK, and everybody felt comfortable with the content. Several people asked if they can get preview access to the whitepaper. I, of course, would've done it, but the biggest and most important part of it - namely how to write cache friendly calculations - is not yet written. Everything else is an interesting knowledge, but that part are concrete steps that people can do in order to improve performance by writing their calculations in such a way that they take most benefit of the caching system. While I have a good idea about what I should put there, it is somewhat more difficult material, and I struggle with writing it up. However, I thought that I could share another excerpt here. It seems that readers of my blog are hardcode AS users, and releasing pieces in the blog would be mutually beneficial - readers may find this material useful and I will get free peer review. So in a sense, publishing this staff in the blog is my public "Beta" of the selected material from the whitepaper. This week I decided to publish portion about SE cache sharing. One more note before I start. Some people expressed hope that the acronyms and special terms that are used in the excerpts to be explained. And indeed, the whitepaper will explain all of them. For the purpose of this (and previous piece), SE means Storage Engine - component that deals with processed raw data before calculations apply. FE means Formula Engine - component which deals with all the calculations.

3. Cache sharing

Sharing the caches across multiple users is one of the most obvious benefits with doing all the computations centrally on the server. If first user has to wait for the data in his report to be computed, all the other users who open same or similar reports will experience much faster report load times. However, it is not always safe to share the caches across users, especially in the presence of security. In this chapter we will carefully review the conditions under which caches can and cannot be shared.

3.1 SE caches sharing

The rule for SE caches sharing is very simple: SE caches are always shared across all the users. This seems to be good to be true, but it is actually true! We will show how it is possible to deal with security and still share the cache. First of all, cell security is not applicable to SE, since it is done as post operation – i.e. SE caches may contain cells which are cell secured, but if such cells are needed by MDX expression, they will be marked as errors. (For more details about order of operations in the cube see my other article "Default members, MDX Scripts, Security, KPIs and Perspectives"). To understand how SE can handle shared caches in the presence of non trivial dimension security, we need to consider two impacts of dimension security:

  • Restricting the secured members
  • Applying visual totals

3.1.1 SE caches and restricting secured members

This part is fairly simple. It is similar to the way that datacache filtering (described in 2.1.2) works. The request to the datacache registry ignores dimension security restrictions, but then the attribute filtering is applied to the result datacache.

3.1.2 SE caches and dimension security visual totals

Making SE caching to account for dimension security visual totals is a little bit more complex. Indeed, it seems almost impossible to share caches, when for the same cell some users should see one value and other users see another value. Yet it is possible – and we are going to see how. But since it is somewhat tricky, it is best to illustrate this by example and doing some more tracing in profiler. We will again use stripped down version of Foodmart, with Sales cube containing only Customer dimension. There is one role with AttributePermission set on [State Province] attribute, allowing only WA and CA states, i.e. AllowedSet = { [Customer].[State Province].[WA], [Customer].[State Province].[CA] }, and we will enable VisualTotals=true on it. Now, we would send the following query

SELECT [Customer].[Country].[USA] ON 0 from Sales

First, we send it from the Administrator account, such that no security applies to it, and we get the real total for USA, and second time we will send it from the account which is member of the restricted role, and therefore it will have visual total value for USA (i.e. total of WA and CA only). We are still most interested in the Query Subcube event, but to get the information which demonstrates visual totals handling, we will need to use another variant of that event called Query Subcube Verbose. Here is how the picture looks in profiler for it:

The information in Query Subcube Verbose event is a superset of what is available in Query Subcube event, but requires more skills to understand. We will limit discussion here only to the forms which are applicable for this discussion. First query (from connection 9) is query by Administrator. The interesting information is from dimension Customer of course. Unlike with Query Subcube event we don’t see bitmap of granularities but rather some symbolic representation of both slices and granularities. The string "2 0 0 0 0" shows slices on every attribute. Number 2 is internal data id corresponding to USA (in fact when we look into individual attributes later, it translates data ids and shows [Country]:[USA]). Data id is really internal thing, and should not be attributed any special meaning. You can even get data ids of attribute members using MDX function DataId, but this is internal undocumented function, so again, you cannot really rely on it. What is important here is that Country has slice of USA, and all other attributes don’t have neither slice nor granularity. If some attribute had granularity, but not a slice, then it would show as * instead of 0. For example

SELECT [Customer].[Customers].USA.Children ON 0 FROM Sales

Will show "2 * 0 0 0" to represent that [State Province] attribute has granularity, but no slice. If the slice is not a single member but set of members, it will show in Query Subcube Verbose event as +, and it is not possible to see the members of the set. For example

SELECT { [Customer].[City].[Redmond], [Customer].[City].[Seattle] } ON 0 FROM Sales

Will show "0 0 + 0 0" to indicate that City attribute has granularity and slice contains a set (the set of {Redmond, Seattle} that is).

Now let’s go back to our Administrator query. It gives "2 0 0 0 0" for the subcube, and it says that this wasn’t answered from the cache, since the cache was initially empty. Now we connect as a user of the restricted role (for example we can do it by specifying connection string parameter "EffectiveUserName=REDMOND\billg" to emulate Bill Gates – and, of course, only server administrator can do it). The query is the same, but let’s look more carefully into what we get for the subcube. It says "2 – 0 0 0". This is pretty interesting. We see that the only difference from the Administrator query is that [State Province] attribute has "-" sign instead of 0. In the AS jargon, "-" means slice below granularity. Let’s explain slowly what it means. First for the "below granularity" part. Obviously, in our query we don’t have granularity on [State Province]. This can be easily verified by looking at the bitmap in Query Subcube event too. However, even though we don’t have granularity, this attribute has a slice. The slice comes from the dimension security definitions – it is { WA, CA } set. Therefore we end up with slice without granularity. Having slice in the subcube causes data to be filtered, and not having granularity causes data to be aggregated. Therefore we end up with value for USA which is sum of only WA and CA – exactly what we need for visual totals functionality. Note, that this request too wasn’t served from cache. And this is right – we get different data, and this data couldn’t have been derived from the real total of USA, therefore we needed to go to the storage. And the way cache system knows to distinguish between real total for USA and visual total for USA is by comparing the slice on [State Province] attribute. Of course, there could be different slices on this attribute for different roles. All of them will show as "-" in the profiler, but internally AS knows how to distinguish between them. The concept of slice below granularity is an important one for AS. It is used not just for dimension security visual totals, but also for features such as subselects, sets in WHERE clause, multiselect over distinct count, many-to-many dimensions and semiadditive measures and for other features too. We will talk about this more in the chapter about cache lifetime. However, we need to emphasize again, that such datacaches are inserted into datacache registry and are shared across all users. As a proof, we can now start another Administrator session, and send the following query

SELECT [Customer].[Country].[USA] ON 0 FROM
(SELECT { [Customer].[State Province].[WA], [Customer].[State Province].[CA] } ON 0 FROM Sales)

We will see the following in the trace:

I.e. this query also generated slice below granularity subcube "2 – 0 0 0", but this time it was found in cache, because it was previously requested by dimension security visual totals and cached.

The conclusion is that when calculations are not involved and there is only SE data to worry about – the data will be shared across all users unconditionally. This is great news since it has very significant performance impact in heavy multiuser setups.

When there are calculations, the picture is more complicated and calculation designer has implicit control over how wide the caches will be shared. We are going to dive into details in the next chapter about FE cache sharing, as well as in the section about cache lifetime.

Published Thursday, January 12, 2006 1:16 AM by mosha
Filed under:
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement