Although Cartesian product as a concept is essential to the relational database theory, it is often a dirty phrase – something that is associated with bad performance and therefore should be avoided. But there are cases where a Cartesian product is highly desirable. This post shows you an example query—from the real world although presented here with artificial data—that can be significantly optimized with a manually created Cartesian product.
Here is the query in question:
select id_server_config,
id_thread_name,
id_tran_type,
id_tran_status,
id_users,
id_cpus,
COUNT(*) as cnt
from tran_stats t1
join Dim_server_config t2 on t1.server_config = t2.server_config
join Dim_thread_name t3 on t1.thread_name = t3.thread_name
join Dim_tran_type t4 on t1.tran_type = t4.tran_type
join Dim_tran_status t5 on t1.tran_status = t5.tran_status
join Dim_users t6 on t1.users = t6.users
join Dim_cpus t7 on t1.cpus = t7.cpus
group by id_server_config,
id_thread_name,
id_tran_type,
id_tran_status,
id_users,
id_cpus
This query joins a large table called tran_stats with six small lookup tables (or dimension tables). The dimension tables have dimension id’s and their corresponding values. The large table—which stores transaction statistics—is sort of a fact table but include the actual dimension values instead of dimension id’s. There is no index on any of these six dimension columns in the large tran_stats table. The following is a summary of the cardinalities of these tables:
|
Table Name |
Row Count |
Comment |
|
tran_stats |
200,000,000 |
The table is not very wide, and the total size of the table is about 36GB |
|
Dim_server_config |
9 |
Corresponding to the 9 distinctive values in the tran_stats.server_config column |
|
Dim_thread_name |
100 |
Corresponding to the 100 distinctive values in the tran_stats.thread_name column |
|
Dim_tran_type |
3 |
Corresponding to the 3 distinctive values in the tran_stats.tran_type column |
|
Dim_tran_status |
3 |
Corresponding to the 3 distinctive values in the tran_stats.tran_status column |
|
Dim_users |
5 |
Corresponding to the 5 distinctive values in the tran_stats.users column |
|
Dim_cpus |
8 |
Corresponding to the 3 distinctive values in the tran_stats.cpus column |
In this post, I’ll remove query parallelism from consideration. I’ll cover how parallelism contributes to the discussion in a follow-up post. So for now let’s assume that max degree of parallelism has been set to 1.
When I run the above query on my test server, it comes back in about 740 seconds (or just over 12 minutes) with all the data already cached. The query plan is a series of hash matches shaped like a straight line going from the lower right corner to the upper left corner:
Question: Can we do better than this?
Answer: Yes! An approach is to create and materialize the Cartesian product of all the dimension first, and then join the large table with this Cartesian product.
On the test server, the following Cartesian product query takes about 1 second and populates #tmp with 324,000 rows (that is 9*100*3*3*5*8).
select * into #tmp
from Dim_server_config,
Dim_thread_name,
Dim_tran_type,
Dim_tran_status,
Dim_users,
Dim_cpus
Then, the following join takes about 248 seconds to complete.
select id_server_config,
id_thread_name,
id_tran_type,
id_tran_status,
id_users,
id_cpus,
COUNT(*) as cnt
from tran_stats t1 join #tmp t2
on t1.server_config = t2.server_config and
t1.thread_name = t2.thread_name and
t1.tran_type = t2.tran_type and
t1.tran_status = t2.tran_status and
t1.users = t2.users and
t1.cpus = t2.cpus
group by id_server_config,
id_thread_name,
id_tran_type,
id_tran_status,
id_users,
id_cpus
The query plan is a single hash match:
To help drive home the performance difference between these two queries, let me re-present their elapsed times in the following bar chart:
In this particular scenario, using a materialized Cartesian product results in a query that is three times faster than the query that directly joins with each of the lookup tables.
I have uploaded the DDL scripts and the script for loading the data into the large table and into each of the dimension tables. You can download and run the attached scripts and check out the performance difference for yourself.