
Linchi Shea

Checking out SQL Server via empirical data points

Performance impact: Cartesian product comes to the rescue

Although the Cartesian product is an essential concept in relational database theory, in practice it is often treated as a dirty word, something associated with bad performance and therefore to be avoided. But there are cases where a Cartesian product is highly desirable. This post shows an example query (drawn from the real world, although presented here with artificial data) that can be significantly sped up with a manually created Cartesian product.

Here is the query in question:

select id_server_config,
       id_thread_name,
       id_tran_type,
       id_tran_status,
       id_users,
       id_cpus,
       COUNT(*) as cnt
  from tran_stats t1
  join Dim_server_config t2 on t1.server_config = t2.server_config
  join Dim_thread_name   t3 on t1.thread_name   = t3.thread_name
  join Dim_tran_type     t4 on t1.tran_type     = t4.tran_type
  join Dim_tran_status   t5 on t1.tran_status   = t5.tran_status
  join Dim_users         t6 on t1.users         = t6.users
  join Dim_cpus          t7 on t1.cpus          = t7.cpus
 group by id_server_config,
          id_thread_name,
          id_tran_type,
          id_tran_status,
          id_users,
          id_cpus

This query joins a large table called tran_stats with six small lookup tables (or dimension tables). Each dimension table holds dimension IDs and their corresponding values. The large table, which stores transaction statistics, is in effect a fact table, except that it stores the actual dimension values instead of the dimension IDs. There is no index on any of these six dimension columns in the large tran_stats table. The following is a summary of the cardinalities of these tables:

Table Name           Row Count      Comment
tran_stats           200,000,000    The table is not very wide; its total size is about 36GB
Dim_server_config    9              Corresponding to the 9 distinct values in the tran_stats.server_config column
Dim_thread_name      100            Corresponding to the 100 distinct values in the tran_stats.thread_name column
Dim_tran_type        3              Corresponding to the 3 distinct values in the tran_stats.tran_type column
Dim_tran_status      3              Corresponding to the 3 distinct values in the tran_stats.tran_status column
Dim_users            5              Corresponding to the 5 distinct values in the tran_stats.users column
Dim_cpus             8              Corresponding to the 8 distinct values in the tran_stats.cpus column
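
For reference, here is a minimal sketch of what the two kinds of tables might look like. The column names come from the query above and from the discussion below; all the data types are my assumptions, so consult the attached DDL scripts for the actual definitions.

-- Sketch only; data types are assumed, the real DDL is in the attached scripts.
create table Dim_server_config
(
    id_server_config int         not null,   -- dimension id
    server_config    varchar(50) not null    -- dimension value
);
-- The other five Dim_* tables follow the same id/value pattern.

create table tran_stats
(
    tran_begin    datetime    not null,      -- clustered index key (mentioned in the comments below)
    server_config varchar(50) not null,      -- dimension values, not ids
    thread_name   varchar(50) not null,
    tran_type     varchar(50) not null,
    tran_status   varchar(50) not null,
    users         int         not null,
    cpus          int         not null
    -- other columns omitted
);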

In this post, I’ll remove query parallelism from consideration. I’ll cover how parallelism contributes to the discussion in a follow-up post. So for now let’s assume that max degree of parallelism has been set to 1.
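
For anyone following along with the attached scripts, the server-wide setting can be changed with sp_configure. This snippet is not from the original scripts; it just shows the standard way to do it:

-- Server-wide setting ('max degree of parallelism' is an advanced option):
exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'max degree of parallelism', 1;
reconfigure;

Alternatively, appending OPTION (MAXDOP 1) to an individual query limits just that query to a single scheduler without touching the server-wide setting.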

When I run the above query on my test server, it comes back in about 740 seconds (or just over 12 minutes) with all the data already cached. The query plan is a series of hash matches shaped like a straight line going from the lower right corner to the upper left corner:

[Figure: query plan showing a cascade of hash match joins]

Question: Can we do better than this?

Answer: Yes! One approach is to create and materialize the Cartesian product of all the dimension tables first, and then join the large table with this Cartesian product.

On the test server, the following Cartesian product query takes about 1 second and populates #tmp with 324,000 rows (that is 9*100*3*3*5*8).

select * into #tmp
  from Dim_server_config,
       Dim_thread_name,
       Dim_tran_type,
       Dim_tran_status,
       Dim_users,
       Dim_cpus
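
The comma-separated FROM list is the old-style syntax for a cross join. If you prefer the explicit ANSI form, the same Cartesian product can be written as follows; the result is identical, only the syntax differs:

select *
  into #tmp
  from Dim_server_config
 cross join Dim_thread_name
 cross join Dim_tran_type
 cross join Dim_tran_status
 cross join Dim_users
 cross join Dim_cpus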

Then, the following join takes about 248 seconds to complete.

select id_server_config,
       id_thread_name,
       id_tran_type,
       id_tran_status,
       id_users,
       id_cpus,
       COUNT(*) as cnt
  from tran_stats t1
  join #tmp t2
    on t1.server_config = t2.server_config and
       t1.thread_name   = t2.thread_name   and
       t1.tran_type     = t2.tran_type     and
       t1.tran_status   = t2.tran_status   and
       t1.users         = t2.users         and
       t1.cpus          = t2.cpus
 group by id_server_config,
          id_thread_name,
          id_tran_type,
          id_tran_status,
          id_users,
          id_cpus

The query plan is a single hash match:

[Figure: query plan with a single hash match join]

To help drive home the performance difference between these two queries, let me re-present their elapsed times in the following bar chart:

[Figure: bar chart comparing the elapsed times of the multi-join query (~740 seconds) and the single-join query (~248 seconds)]

In this particular scenario, the materialized Cartesian product approach is roughly three times faster: about 249 seconds in total (248 seconds for the join plus about 1 second to populate #tmp) versus about 740 seconds for the query that joins directly with each of the lookup tables.

I have uploaded the DDL scripts and the script for loading the data into the large table and into each of the dimension tables. You can download and run the attached scripts and check out the performance difference for yourself.

Published Thursday, December 15, 2011 2:11 AM by Linchi Shea

Attachment(s): cartesian_product_DDL_load_scripts.zip


Comments

 

GrumpyOldDBA said:

Very interesting. When tuning, I often use temp tables to replace parts of joins to achieve similar performance improvements - this is something I'd not considered and I will certainly look into it. Just to explain: although I'm not working on a DW, many queries I attempt to tune contain over 16 joins, and many of these are "business rules", small tables joining against much larger tables - your approach may well work wonders - cool!

December 15, 2011 8:11 AM
 

Whipper Snapper said:

Minor complaint.  Your chart doesn't compare the total elapsed times, which should be 740 versus 249 (248+1) to include populating the temp table.

Cool concept though and brings into question when a star-schema is appropriate or not.  It could be interesting to see where the star-schema makes more sense with lots of WHERE criteria on dimension fields, which should be appropriately indexed.

December 15, 2011 8:51 AM
 

Wiseman82 said:

Interesting idea. :-)  The point of the article was to demonstrate the use of a cartesian product, but my instinct from a performance point of view would be to aggregate the data in the fact table first then join to the dimension tables (assuming I'm not allowed to make any schema modifications).

This approach seems to work better for me on a small scale test (I didn't have the patience to load the full data set and I was running it on a slow client PC):

select server_config,
       thread_name,
       tran_type,
       tran_status,
       users,
       cpus,
       COUNT(*) as cnt
  into #T
  from tran_stats t1
 group by server_config,
          thread_name,
          tran_type,
          tran_status,
          users,
          cpus

select id_server_config,
       id_thread_name,
       id_tran_type,
       id_tran_status,
       id_users,
       id_cpus,
       SUM(cnt) as cnt
  from #T t1
  join Dim_server_config t2 on t1.server_config = t2.server_config
  join Dim_thread_name   t3 on t1.thread_name   = t3.thread_name
  join Dim_tran_type     t4 on t1.tran_type     = t4.tran_type
  join Dim_tran_status   t5 on t1.tran_status   = t5.tran_status
  join Dim_users         t6 on t1.users         = t6.users
  join Dim_cpus          t7 on t1.cpus          = t7.cpus
 group by id_server_config,
          id_thread_name,
          id_tran_type,
          id_tran_status,
          id_users,
          id_cpus

drop table #T

Cartesian products can be useful.  They are usually a problem when they are unintentional...

December 15, 2011 11:35 AM
 

Jason Kohlhoff said:

Linchi,

I've looked at your create table script. None of your tables has a clustered index (PK), and your tran_stats table does not have any nonclustered indexes on its foreign key columns.

I agree that a Cartesian product can actually be useful (i.e. the "Report Writer's Magic Formula"), but all that you've successfully proven here is that heap (non-indexed) tables are a lot slower than indexed tables because of all the resulting table scans and hash matches.

December 15, 2011 2:50 PM
 

Linchi Shea said:

Hi Jason Kohlhoff;

The large table has a clustered index. The following statement is in one of the scripts:

create clustered index cix_tran_stats on tran_stats(tran_begin)

Correct, none of the dimension columns has an index. Note that the selectivities on these columns are far too low for any index on them to be useful. The real-world scenario that triggered this sample query is a very similar query that translates the dimension values in the large table into their IDs.

BTW, I actually tested with the large table being a heap, and that did not change the observation.

December 15, 2011 3:40 PM
 

Linchi Shea said:

For the specific scenario discussed in my previous post, we observed that using an intermediate

December 15, 2011 5:14 PM
 

Linchi Shea said:

In commenting on my previous post, Greg Linwood and GrumpyOldDBA raised questions about various implications

December 16, 2011 12:24 PM
 

Alain Krikilion said:

Really interesting post.

But I have some questions.

1) Can creating a clustered index on the temp table speed the query up even more (see the sketch below)? I can't test it because I don't have the necessary hardware for it.

2) Can using a table variable instead of a temp table speed up the query? (I guess that, as always, "it depends" on the size of the resulting table.)

3) Can using a CTE instead of a temp table speed up the query?
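
For reference, the index in question 1 could be created on #tmp right after it is populated, keyed on the six join columns. This is only a sketch of what that would look like; whether it actually helps has not been measured here.

-- Sketch: cluster #tmp on the six columns used in the join (untested here).
create clustered index cix_tmp
    on #tmp (server_config, thread_name, tran_type, tran_status, users, cpus);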

December 21, 2011 3:02 AM
 

Linchi Shea said:

There are a lot of questions on hyperthreading, but not a lot of answers.  There is no shortage

January 5, 2012 12:36 AM
