THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alberto Ferrari

Dimensional modeling with Ranged Dimensions

A ranged dimension is a dimension that is used to have a discrete view of a continuous measure.

A good example of this is the analysis of amount sold per order. In AdventureWorks we have, for each line of an order, the amount and quantity sold. We would like to divide orders in three groups (HIGH, MEDIUM, LOW) based on the total of the order. This is a very frequent kind of analysis that is used to determine how much money customers are willing to spend in a single order and may be very useful to check whether there is any kind of relationship between the characteristics of a customer and his/her medium spending capability.

In SQL it is very easy to get this analysis:

WITH
    OrderTotals AS (
        SELECT 
            SalesOrderId = SalesOrderId, 
            OrderValue   = SUM (OrderQty * UnitPrice)
        FROM 
            Sales.SalesOrderDetail
        GROUP BY
            SalesOrderId),
    OrdersRanged AS (
        SELECT
            SalesOrderId,
            OrderValue,
            OrderRange = CASE
                WHEN OrderValue <= 1000  THEN 'LOW'
                WHEN OrderValue <= 10000 THEN 'MEDIUM'
                ELSE                          'HIGH'
            END
        FROM
            OrderTotals)
SELECT
    OrderRange  = OrderRange, 
    OrderNumber = COUNT (*),
    OrderValue  = SUM (OrderValue)
FROM
    OrdersRanged
GROUP BY
    OrderRange

 

The query, executed on AdventureWorks, leads to this result:

OrderRange OrderNumber OrderValue
HIGH 1,827 75,745,964.3559
MEDIUM 12,574 31,075,706.5634
LOW 17,064 3,552,218.3941

This SQL query is correct but what we really want to do is to create a new OrderRange dimension in our BI solution to let the user further analyze the characteristics of the orders. The granularity of this attribute is at the Order level and so we have to add an ID_ValueRange column in the fact table of orders.

If we define the Dim OrderRange dimension we will get something like it:

  • ID_OrderRange Primary Key, usually INT IDENTITY
  • MinimunValue Left value for the range
  • MaximunValue Right value for the range
  • OrderRange Textual description of the range

At the end of the processing, our relational database will look like this:

image

This solution is good when we have a very clear definition of what the ranges are during design time. This often does not happen in the real world. More likely, we will let the user define ranges through a table in the configuration database and we will define several attributes and hierarchies in the dimension to analyze how orders are ranged. All these attributes and hierarchies should be derived from the configuration database and processed during the ETL phase.

Ranges are dimensions that have a high variation rate over time and – normally – they can be defined only after the data is available because, before then, users do not have a clear understanding of what kind of ranges can be useful or not.

This leads to a very annoying problem i.e. the ranges can be determined only at the cube process time and not at ETL time. In other words we are not able to compute the ID_OrderRange key used to join the dimension to the fact table because the ranges change over time due to the user change of mind. Please note that in this case it is the dimension that changes its meaning, not the facts.

Computing the ranges during cube process time will lead us to define queries for the fact table that are complicated and uselessly consume processor time. So the challenge is to avoid this situation by means of changing the dimensional model.

The solution that we recommend is to define ranged dimensions with a different surrogate key. Even if we are not able to define the exact ranges, we can normally define the maximum granularity that will be used to define ranges. In the example the granularity can be 1,000, but it can be more cleverly set to 100 in order to gain some flexibility at the lowest level of the range. Supposing that 100,000 is the maximum value of an order (at least from the DSS analyst point of view), we will have a maximum range of values that starts at 100 and ends at 100,000 with a step of 100, leading to a range of only 1,000 different values. This will be the Dim OrderRange dimension.

Once we have defined the granularity we can redefine the Dim_OrderRange assigning to ID_OrderRange a simple INTEGER field that represents the value of FLOOR (OrderRange / 100). The same value, computed for each column of the order fact table, will be the ID used to join the fact table to the dimension.

The dimension will contain all the values from 0 to the maximum order value divided by the granularity and so it will be larger but still contain only a few thousand rows. In our example the table switched from 3 values to 1,000 but, from SSAS point of view, it is still a very small dimension. Moreover, as the distinct values of the attributes are still three, the aggregation will work very well leading to optimum performances.

The situation, with ranged dimensions, will be this:

image

Using the ranged dimension pattern you will be able to change all the attributes of the ranged dimension without having to worry about changes. The fact table will remain valid unless you decide to change the granularity.

Granularity is not subject to changes because it can be normally set to a value clearly defined by the user and there is very rarely the need of updating it.

Clearly ranged dimensions are a variation of standard dimension because their surrogate key has a clear meaning while the surrogate keys, in Kimball’s methodology, should not have any meaning at all. Nevertheless, we believe that in specific situation (and ranged dimension are among those) the “no meaning at all” constraint can be relaxed in order to get maximum flexibility in the final solution the customer will use.

Another very common situation where ranged dimension are useful is in the ranging of time duration. When handling time you can easily set the granularity to days, months, years or seconds, depending on the minimum and maximum values that the duration will have. Once done it you can compute the duration using the correct granularity and define any ranged dimension on it.

Published Thursday, September 06, 2007 9:55 AM by AlbertoFerrari

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

SQLBI - Marco Russo said:

Alberto posted one of the patterns we use designing a dimensional model (this one is about the discretization

September 6, 2007 4:48 AM
 

Sqlgoof said:

This is a nice approach when customers are unsure of their "ranges", but i find normally they have very clear ideas on what they are. Still, good for exploratory BI.

September 10, 2007 1:39 PM
 

ponzie said:

We use something very similar.

What we additionally do is have multiple ranges defined in the ranged dimension but all keyed with the same surrogate key.

Thus we might have the value 9 in the fact keyed to a row with our first range attribute set to '£0 to £10' and to a second range attribute to '£5 to £10'.

Its early days yet, but we think this will allow us to use the ranged dimension in several roles aand support several ranges.  If we decide we want different ranges defined, we add an extra column to the ranges dimension.

The main limitation is getting the granularity reasonable at the start.  Changing the grain would require rebuilding the facts to pick up the revised surrogate keys.

We also have applied the same technique to durations.

February 14, 2008 3:40 AM

Leave a Comment

(required) 
(required) 
Submit

About AlbertoFerrari

Alberto Ferrari is a Business Intelligence consultant. He his interests lie in two main areas: BI development lifecycle methodologies and performance tuning of ETL and SQL code. His main activities are with SSIS and SSAS for the banking, manufacturing and statistical sectors. He is also a speaker in international conferences like European PASS Conference and PASS Summit.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement