SQLblog.com - The SQL Server blog spot on the web

Joe Chang

TPC-H and Columnstore (Update)

Earlier I had commented on the TPC-H results published in April of this year for SQL Server 2014 using clustered columnstore storage, noting that two of the 22 TPC-H queries did not perform well on columnstore. I had speculated on the reason without investigation (I should have learned by now not to do this): perhaps the row-store result benefited from the date correlation optimization. Thomas suggested otherwise (see below), pointing out that columnstore has an alternative mechanism of greater general usefulness in keeping min/max values for each column segment, and citing the join to CUSTOMER as a more likely explanation, evident in the query plan (which is why one should always provide the plan).
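For reference, the date correlation optimization mentioned above is a database-level option; enabling it looks like this (the database name here is a placeholder):

```sql
-- Enable date correlation optimization (database name is a placeholder).
-- SQL Server then maintains correlation statistics between datetime
-- columns of two tables linked by a foreign key, such as
-- ORDERS.O_ORDERDATE and LINEITEM.L_SHIPDATE, allowing the optimizer
-- to infer an implied date range predicate on the joined table.
ALTER DATABASE tpch SET DATE_CORRELATION_OPTIMIZATION ON;
```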

Thomas Kejser Comments
I am not sure your theory is correct in the case of Q10. It is noteworthy that the column store requires that the join with CUSTOMER is performed before the sort on revenue. The row store on the other hand can do a loop join (so can the column store, but that is not the plan you get, it seems).

This must mean that the sort buffer is significantly larger for the column store (as reflected in the plan estimates) - which in turn can cause a rather significant memory consumption. It is also noteworthy that the column store does not seem to push the return flag predicate into the storage engine.

With columnstore segments storing the min/max of all values contained in each column, it is unclear if the date correlation provides any benefit that isn't already gained from the segment header.

Another odd thing about the column store plan of Q10 is that the join of LINEITEM/ORDERS is hashed, while the probe happens on CUSTOMER. Unless the predicate on RETURNFLAG is very selective (I don't recall), this is the wrong way around and may cause further spilling.

This is easy enough to test. The first strategy is to remove the CUSTOMER and NATION tables from the query, making it a pure ORDERS-LINEITEM query. Below is the test version of Query 10.

SELECT TOP 20 O_CUSTKEY, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE
FROM ORDERS, LINEITEM
WHERE L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE>= '1993-10-01' AND
O_ORDERDATE < dateadd(mm, 3, cast('1993-10-01' as date)) AND
L_RETURNFLAG = 'R'
GROUP BY O_CUSTKEY
ORDER BY REVENUE DESC

The execution plan for the test query is below with row-store. (Top operation to the left not shown for compactness).

[Row-store execution plan for the test query]

The execution plan for the test query is below with column-store. (Top operation also not shown).

[Columnstore execution plan for the test query]

The two plans are essentially the same, with the difference being that the column-store plan applies the ReturnFlag condition as a Filter operation instead of as a predicate in the LINEITEM access. I suppose this is because each column is stored separately, or perhaps this is just the way the column-store plan is shown. About 25% of rows in LINEITEM meet the ReturnFlag = R condition.
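The CPU and elapsed times quoted below can be collected with the standard session timing setting:

```sql
-- Report per-statement CPU time and elapsed time for the session;
-- one way to collect numbers like those quoted in this post.
SET STATISTICS TIME ON;
-- ... run the query under test ...
SET STATISTICS TIME OFF;
```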

On my 1 socket 4 core, HT enabled test system, at Scale Factor 10 (SF10), the SQL Server query execution statistics for the original version of Q10 are:
Row Store CPU time =  5876 ms, elapsed time =  814ms
Col Store  CPU time = 10826 ms, elapsed time = 1758ms

This is somewhat in line with the three official TPC-H reports at SF 1000, 3000 and 10000 (1, 3 and 10 TB), albeit compared against different systems running SQL Server 2012 or earlier.

For just the core ORDERS-LINEITEM query:
Row Store CPU time = 5248 ms, elapsed time = 769ms
Col Store   CPU time = 4030 ms, elapsed time = 565ms

So it is clear that TK had the correct explanation for the poor columnstore performance relative to row store in the case of Q10. The counter-test for my original suggestion is to explicitly apply the date range on LINEITEM discovered by the date correlation optimization in row-store: L_SHIPDATE between 1993-09-20 and 1994-06-17. As pointed out earlier, the actual ship date range is 1993-10-02 to 1994-05-02. This further improved the columnstore result to
Col Store   CPU time = 2686 ms, elapsed time = 403ms
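For reference, the explicit date range test is the ORDERS-LINEITEM query above with the ship date predicate added; a sketch:

```sql
SELECT TOP 20 O_CUSTKEY, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE
FROM ORDERS, LINEITEM
WHERE L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE >= '1993-10-01' AND
O_ORDERDATE < dateadd(mm, 3, cast('1993-10-01' as date)) AND
L_RETURNFLAG = 'R' AND
-- the range the date correlation optimization finds in row-store
L_SHIPDATE BETWEEN '1993-09-20' AND '1994-06-17'
GROUP BY O_CUSTKEY
ORDER BY REVENUE DESC
```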

This is a small improvement over the existing columnstore min/max feature. My thinking is that the row-store date correlation feature is not particularly useful in real-world databases with highly irregular date correlation, and that if such date correlation did exist, the analyst should spell it out rather than depend on a database engine feature. I am tempted to speculate that it might be better to partition on join columns instead of date range, but perhaps I should not do so without investigation; unless, of course, this prompts someone else to do the investigation.
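The columnstore min/max metadata referred to above can be inspected per segment; a sketch against the TPC-H schema (note that min_data_id and max_data_id are encoded data IDs, not raw column values):

```sql
-- Per-segment row counts and min/max data IDs for the LINEITEM
-- columnstore; segments whose [min, max] range excludes a predicate's
-- values can be skipped entirely (segment elimination).
SELECT p.object_id, s.column_id, s.segment_id,
       s.row_count, s.min_data_id, s.max_data_id
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
  ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID('LINEITEM')
ORDER BY s.column_id, s.segment_id;
```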

Now that we know where the problem occurred in Q10, we can attempt to rewrite the query to avoid the error, as shown below.

SELECT C_CUSTKEY, C_NAME, REVENUE, C_ACCTBAL, N_NAME, C_ADDRESS, C_PHONE, C_COMMENT
FROM (
 SELECT TOP 20 O_CUSTKEY, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE
 FROM ORDERS, LINEITEM
 WHERE L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE>= '1993-10-01' AND
 O_ORDERDATE < dateadd(mm, 3, cast('1993-10-01' as date)) AND
 L_RETURNFLAG = 'R'
 GROUP BY O_CUSTKEY
 ORDER BY REVENUE DESC
) x
JOIN CUSTOMER ON C_CUSTKEY = O_CUSTKEY
JOIN NATION ON C_NATIONKEY = N_NATIONKEY
ORDER BY REVENUE DESC

The alternate query improved both the row and column-store query plans by pushing the joins to CUSTOMER and NATION past the TOP clause. The row-store plan is:

[Row-store execution plan for the rewritten query]

The column-store plan is below.

[Columnstore execution plan for the rewritten query]

The impact is minimal in the row-store plan because the reduction in index seeks into CUSTOMER (from 115 to 20) is small relative to the overall query. For the columnstore plan, performance retains most of the gain achieved in the ORDERS-LINEITEM-only test.
Col Store   CPU time = 4218 ms, elapsed time = 598ms

On my test system, Q4 is 3 times faster with columnstore than with row-store, so I do not know why the published reports have it as comparable or slower.

ps
Columnstore is very powerful, but the query optimizer is not as mature for columnstore as it is for row store, so pay attention to the query plan.
This is an update to my previous post on this topic; it is not about updateable columnstore (which is, in fact, updateable).

Published Saturday, July 05, 2014 4:37 PM by jchang

Comments

 

Thomas Kejser said:

Hi Joe

Great follow up investigation. I have a few more comments to attach to the results:

First, I think the column store might have an implementation problem with the filter on RETURNFLAG. I would expect this to be pushed down into the column store scan. Perhaps it is worth trying a filter on an integer column instead, or playing with the statistics, to see if you can get the benefit of the filter.

Second, I suspect the reason your Q4 is faster is because you may have a clustered row-store index on CUSTOMER, while the TPC-H test requires only one index (a column store). This may be a compromise they had to make to gain performance in other queries. This is one of the quarrels I have with TPC-H - it does not represent what you would do in real life.

Third, with RETURNFLAG being quite selective (25% is good for a column store), I am wondering if pre-sorting the data (on ORDERDATE, RETURNFLAG) before building the column store might yield better predicate filtering.

Joe, if you have a test system with this data, I would like to run a small experiment together with you. I am most curious to know how big a difference we could make if we compared the standard MS benchmark result with our own hand-tuned (query hints, indexes, etc.) variant of the same. If you have time to do this together, please do look me up on Skype.

July 6, 2014 6:14 AM
 

jchang said:

by my calculation, a 3X improvement in one of the 22 TPC-H queries (plus 2 refresh statements) contributes a 4.7% improvement on the Power test, which scales as X^(1/24): 3^(1/24) is about 1.047.

If two queries can be improved by 3X each (Q10 and Q4) then that's a 9.6% gain on Power.
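The Power-metric arithmetic above can be checked with a few lines (the 24 items are the 22 queries plus the 2 refresh functions, combined as a geometric mean):

```python
# The TPC-H Power metric is a geometric mean over 24 timed items
# (22 queries + 2 refresh functions), so speeding one item up by a
# factor f improves the overall metric by f ** (1/24).

def power_gain(speedups, items=24):
    """Overall geometric-mean gain from a list of per-item speedup factors."""
    gain = 1.0
    for f in speedups:
        gain *= f ** (1.0 / items)
    return gain

print(f"{(power_gain([3.0]) - 1) * 100:.1f}%")       # one query 3X faster -> 4.7%
print(f"{(power_gain([3.0, 3.0]) - 1) * 100:.1f}%")  # two queries 3X faster -> 9.6%
```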

July 6, 2014 8:52 PM
 

jchang said:

note to companies publishing TPC-H results on SQL Server: it might be wise to take up TK's offer. It would be hard to find better quality advice. I would be interested too. It would also be helpful if you provided the actual sqlplan files, if not publicly, then just to us.

July 7, 2014 8:40 AM

About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine
