THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Inside OLAP Engine: Cache Prefetching

I am writing a whitepaper about Analysis Services 2005 Cache Manager, and it gets longer and longer. I already spent few days on it, and I am still not even halfway done. So I decided to publish an excerpt from this whitepaper (chapter 2.1.5) and gauge whether I am getting it at the right technical level or whether I am going too deep or too shallow. The timely comments will be appreciated.

As we learned by now, the FE query optimizer decomposes the MDX query into multiple single granularity subcubes. Depending on which cell or group of cells are being computed at the moment, FE will choose the appropriate subcube to send request for data to SE. However, since the query optimizer has the global picture of the entire query, it can tell which cells will be requested in the future. This knowledge is used by the Cache Manager to reorder the subcube requests in order to take advantage of datacache filtering and aggregation described in the previous chapter. This is called cache prefetching Cache prefetching is best illustrated by example. Let’s consider the following MDX query:

SELECT Descendants(USA, City, SELF_AND_BEFORE) ON 0 FROM Sales

I.e. this query asks for USA, all the states in USA and all the cities in USA. If we were to send requests in the order in which cells appear in the query result, then the first request will be for USA only – since there is nothing in the cache that can satisfy it, it will go to the storage (MOLAP or ROLAP). Next, there will be states in USA, and again cache cannot satisfy it, so again it will go to the storage. And at the end it has to get all the cities in USA, which again will miss the cache and hit the storage. Essentially, the pattern of cache/data access here looks the same as for the following three single grain queries:

SELECT Customers.USA ON 0 FROM Sales
SELECT Descendants(Customers.USA, Customers.[State Province], SELF) ON 0 FROM Sales
SELECT Descendants(Customers.USA, Customers.[City], SELF) ON 0 FROM Sales

It is very useful to be able to trace cache access patterns, and it is possible to do with Profiler. Below is a screenshot of the Profiler after executing the above three queries:

The interesting event to watch here is "Query Subcube" – this event is generated every time FE needs to interact with SE. The TextData column contains the bitmap which represents subcube granularity (remember – FE/SE interface happens through single grain subcube). When the bit is 1 – it means that corresponding attribute has granularity, and when the bit is 0 – it means that corresponding attribute doesn’t have granularity. The order of attributes corresponds to the order of dimensions and attributes inside measure group. In our simple cube – there are only 5 attributes in the order Country, [State Province], City, Customer and Gender. The EventSubclass column tells us whether or not we hit the cache. Now, when we execute

SELECT Customers.USA ON 0 FROM Sales

The subcube granularity is 10000 – i.e. we only have granularity on countries and nothing else. There is nothing else in the cache, so obviously we don’t hit it. Next query

SELECT Descendants(Customers.USA, Customers.[State Province], SELF) ON 0 FROM Sales

Has granularity on both country and state – therefore subcube grain is 11000, but we still miss the cache. Finally, the last query

SELECT Descendants(Customers.USA, Customers.[City], SELF) ON 0 FROM Sales

Has grain on all three attributes – country, state and city – 11100, and we miss the cache again, because cities cannot be derived from states or country. So total 3 cache misses and 0 cache hits.

However, our original query contained all three levels of the hierarchy, and the query optimizer knows that three SE subcube requests will have to happen to fully satisfy the entire query. Therefore it decides to reorder the requests to take advantage of the cache system. Let’s take a look at Profiler screenshot for this query

Here we see cache prefetching in action. First "Query Subcube" don’t go for country granularity, even though USA is the first cell in the query result. Cache Manager inserts request to the country,state,city granularity instead – 11100. This first request misses the cache (the cube was reprocessed to force cache cleanup). Obviously, we need to go to storage at least once. However, all the subsequent requests for 10000 and 11000 are going to benefit, since they can be aggregated from cache of 11100 – and we see that EvenSubcube value for them shows that indeed cache was hit.

Watching Profiler carefully, can often reveal important information about cache behavior and therefore about query performance.

Published Thursday, January 5, 2006 4:02 PM by mosha
Filed under:
Anonymous comments are disabled
Privacy Statement