Three TPC-H benchmark results were published in April of this year at the SQL Server 2014 launch,
each making use of the new updateable columnstore feature.
SQL Server 2012 had a non-updateable columnstore that required the base table to also exist in rowstore form.
It was not used in the one published TPC-H benchmark result on SQL Server 2012,
presumably because the benchmark includes two refresh stored procedures, one inserting rows and the other deleting rows.
It is possible that the TPC-H rules do not allow a view to union two tables,
and perhaps not a delete via the partitioning feature either
(meaning the delete range must match the partition boundaries).
Another possibility is that the SQL Server 2012 columnstore was considered to be a multi-column index,
which is prohibited, reflecting the benchmark's principle of ad hoc queries.
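For reference, a delete via the partitioning feature would look something like the sketch below (the staging table name is hypothetical, and this only works when the delete range aligns exactly with partition boundaries):

```sql
-- Switch the target partition out to an empty, identically partitioned
-- staging table (a metadata-only operation), then discard the rows
ALTER TABLE dbo.LINEITEM SWITCH PARTITION 1 TO dbo.LINEITEM_STAGE PARTITION 1;
TRUNCATE TABLE dbo.LINEITEM_STAGE;
```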
SQL Server Columnstore
First a few quick words on SQL Server columnstore.
Columnstore is not actually an index of the b-tree index form.
The MSDN article Columnstore Indexes Described
states that a columnstore index is "a technology for storing, retrieving and managing data by using a columnar data format,
called a columnstore."
In SQL Server 2012, it is called a nonclustered columnstore index, not because it is nonclustered in the b-tree sense,
but because the base table must also exist in traditional rowstore form.
In SQL Server 2014, there is a clustered columnstore index, not because data is stored in order of an index key
(there is no key), but because there is no rowstore table at all, just the columnstore.
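As a rough illustration of the naming distinction (the index names and column list here are hypothetical):

```sql
-- SQL Server 2012: a nonclustered columnstore index over an existing
-- rowstore table; the table becomes read-only while the index exists
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_LineItem
ON dbo.LINEITEM (L_ORDERKEY, L_SHIPDATE, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT);

-- SQL Server 2014: the clustered columnstore IS the table; no key,
-- no column list, no separate rowstore copy, and it is updateable
CREATE CLUSTERED COLUMNSTORE INDEX CCI_LineItem ON dbo.LINEITEM;
```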
TPC-H Details: Date Columns and Query SARGs
The full details of the TPC-H decision support benchmark are described on the TPC website.
A few of those details are relevant to the use of columnstore.
The largest table in TPC-H is LINEITEM, which has 3 date columns, ShipDate, ReceiptDate and CommitDate,
and is clustered on ShipDate.
The second largest table is ORDERS, clustered on the one date column OrderDate.
These two tables are joined on OrderKey.
There is correlation between values in the date columns in these two tables,
some natural, and others based on reasonable business conditions.
ShipDate must obviously be greater than OrderDate, and per the benchmark specification
built into the data generator, it is no more than 121 days greater than OrderDate.
CommitDate is between -89 and 91 days of ShipDate.
ReceiptDate is between 1 to 30 days after ShipDate.
The date values range from Jan 1992 to Dec 1998.
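To make the correlations concrete, here is a quick sketch (with a hypothetical OrderDate predicate range) of the ShipDate window they imply:

```sql
-- Hypothetical OrderDate predicate range; per the correlations above,
-- ShipDate is bounded even without any explicit ShipDate predicate
DECLARE @lo date = '1993-10-01', @hi date = '1994-01-01';
SELECT DATEADD(day, 1, @lo)   AS ShipDate_lower,  -- ShipDate > OrderDate
       DATEADD(day, 121, @hi) AS ShipDate_upper;  -- ShipDate <= OrderDate + 121 days
-- returns 1993-10-02 and 1994-05-02
```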
There are 22 Select queries in the TPC-H benchmark, along with the 2 refresh stored procedures.
Many of the Select queries specify a date range on one of the date columns,
or a lower bound on one date column and an upper bound on a different column.
Ideally, for queries that target rows in a limited date range,
we would like upper and lower bounds on the cluster keys
of both the ORDERS and LINEITEM tables, OrderDate and ShipDate respectively.
However the TPC-H rules do not permit re-writing the query SARGs
based on inferable knowledge.
That said, apparently the rules do not preclude the query optimizer
from discovering such knowledge.
One of the other RDBMSs was probably first to do this,
and Microsoft followed suit in order to be competitive in the TPC-H benchmark
with the Date Correlation Optimization feature in SQL Server 2005.
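Date correlation optimization is a database-level setting, enabled roughly as follows (the database name here is hypothetical):

```sql
-- Enables maintenance of correlation statistics between datetime columns
-- of two tables linked by a foreign key (e.g. ORDERS and LINEITEM)
ALTER DATABASE tpch1000 SET DATE_CORRELATION_OPTIMIZATION ON;

-- Verify the current setting
SELECT name, is_date_correlation_on
FROM sys.databases
WHERE name = 'tpch1000';
```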
Personally, I am not aware of any production server using this feature.
Realistically, any organization that was having query performance issues
related to date range bounds would probably have directed the
analyst to apply appropriate date bounds on the cluster key.
This is most probably a benchmark specific optimization feature.
Presumably the date correlation optimization statistics do not exist when using clustered columnstore,
because there is no underlying rowstore table with foreign key relations.
The date correlation statistics do exist when using rowstore tables with foreign keys,
and presumably can still be used with nonclustered columnstore indexes.
TPC-H on SQL Server 2014 with Columnstore
That said, let us now look at the 3 new SQL Server 2014 TPC-H results published
making use of the new clustered columnstore indexes.
One is from IBM at Scale Factor 1000 (1TB)
and two from HP at 3TB and 10TB respectively.
The new results are compared to prior results with traditional rowstore
on previous versions of SQL Server and previous generation server systems.
Because Columnstore is not really an index in the b-tree sense,
and given that queries frequently involve date ranges,
it is presumed to be important to use partitioning with Columnstore.
The three new TPC-H reports on SQL Server 2014 partition both ORDERS and LINEITEM
by OrderDate and ShipDate respectively (the cluster key in previous versions)
with a partition interval of 1 week (7 years x 52 weeks per year = 364 partitions).
Perhaps of interest, the scripts show that a partitioned rowstore index is first built
before building the partitioned clustered columnstore index.
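A minimal sketch of what such a weekly partitioning scheme might look like (names are hypothetical, and only the first few of the 364 boundary values are shown):

```sql
CREATE PARTITION FUNCTION pfWeekly (date)
AS RANGE RIGHT FOR VALUES ('1992-01-08', '1992-01-15', '1992-01-22' /* ...364 weekly boundaries... */);

CREATE PARTITION SCHEME psWeekly AS PARTITION pfWeekly ALL TO ([PRIMARY]);

-- Per the published scripts: build the partitioned rowstore index first...
CREATE CLUSTERED INDEX CI_LineItem ON dbo.LINEITEM (L_SHIPDATE)
ON psWeekly (L_SHIPDATE);

-- ...then replace it in place with the partitioned clustered columnstore
CREATE CLUSTERED COLUMNSTORE INDEX CI_LineItem ON dbo.LINEITEM
WITH (DROP_EXISTING = ON) ON psWeekly (L_SHIPDATE);
```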
TPC-H at SF 1000 (1TB)
The new TPC-H 1TB result on SQL Server 2014 using columnstore
is compared with 3 previous results on SQL Server 2008 R2.
There is a difference in memory configurations between the four systems below.
For SQL Server 2014 with clustered columnstore indexes,
the TPC-H SF1000 total database size is just under 430GB, so with 1TB memory,
the benchmark is running entirely in memory after the initial data load,
with the exception of hash operation spills to tempdb.
For rowstore, the TPC-H SF1000 total database size is 1420GB,
so the two systems with 2TB memory are mostly running with data in memory,
again except for the initial load and spills to tempdb.
There is definitely disk IO for data in the Cisco system at 1TB physical memory.
The performance impact is noticeable, but probably not as severe as
one might think given how people talk about memory.
The reason is that all of these systems make proper use of massively parallel
IO channels to storage capable of 10GB/s-plus table scan rates,
and many also use SSD storage capable of more random IOPS than
SQL Server can consume even at very high degrees of parallelism.
The new SQL Server 2014 result is 2.36X higher on composite score (QphH)
and 3X higher in the Power test than previous versions with conventional rowstore.
The 22 individual query run times from the Power test at 1TB are shown below.
The SQL Server 2014 result is the left item in the legend, labeled 4x60 (sockets x cores), here and in the succeeding charts as well.
Query 1, a single table aggregation, is more than 10 times faster on SQL Server 2014 using
columnstore on 60 cores (Ivy-Bridge, 2.8GHz)
than 2008R2 using rowstore on 80 cores (Westmere-EX).
Per TPC-H benchmark procedure, the test is presumably run immediately after data load and index creation.
The second largest speed-up is Query 16, joining 3 tables, at 6.4X.
Query 10 is 40% slower with columnstore, and Query 4 is about the same between columnstore and rowstore.
These two queries are examined near the end of this section on TPC-H.
Notice that in the 3 SQL Server 2008R2 results, Query 2 becomes slower
as the degree of parallelism increases from 40 cores (threads unspecified) to 80 cores/80 threads
and then to 80 cores/160 threads.
Elsewhere I had commented that SQL Server really needs a graduated approach to parallelism
instead of the all or nothing approach.
TPC-H at SF 3000 (3TB)
The new TPC-H 3TB result on SQL Server 2014 is compared with a previous result on SQL Server 2008R2.
Here, the difference in memory is a significant contributor.
The SQL Server 2014 system has more memory than the columnstore database,
while the 2008R2 system has much less memory than the rowstore database at SF 3000 (4.5TB).
I am supposing that the reason the HP 2010 report only configured 512GB
(128 x 4GB priced $29,440 in 2010)
was that there would not be a significant performance improvement for the
TPC-H 3TB result at either 1TB or even 2TB memory in relation to the higher price
(128 x 16GB priced $115K in 2011).
Several of the TPC-H queries involve nearly full table scans of the large tables.
If there is not sufficient memory for the entire database,
then the next objective is to have sufficient memory to reduce the spill to disk in hash operations.
HP may have elected for the better price-performance,
or perhaps someone just wanted to make a point,
the point being that it is important for the SQL Server engine to function well
when heavy IO is required.
In the SQL Server 2014 result, the system has 3TB memory (96 x 32GB priced $96K in 2014)
which is sufficient to hold the entire data set for TPC-H 3TB in columnstore.
The overall composite score is 2.8X higher with columnstore and 3.4X higher on the Power test.
The 22 individual query run times from the Power test at 3TB are shown below.
The largest gain with columnstore is Query 19 at 19.7X.
Queries 4 and 10 show degradation,
similar to the case at SF1000.
TPC-H at SF 10000 (10TB)
The new TPC-H 10TB result on SQL Server 2014 is compared with a previous result on SQL Server 2012.
Strangely, the supporting documentation for the HP 2013 report on SQL Server 2012
is missing, so there is no indication as to whether the nonclustered columnstore was used.
I am guessing that columnstore was not used, because the results are
in line with expectations for rowstore.
The full data size in columnstore at SF 10000 should be 5TB,
and 14TB in rowstore,
so there should have been heavy disk IO in both results.
The overall composite score is 2.55X higher with columnstore and 3.1X higher on the Power test.
The 22 individual query run times from the Power test at 10TB are shown below.
The largest gain with columnstore is Query 6 at 23.2X.
Queries 4 and 10 show degradation,
as in the two previous cases.
TPC-H Query 10
Below is Query 10, which is consistently slower with columnstore relative to rowstore.
/* TPC_H Query 10 - Returned Item Reporting */
SELECT TOP 20 C_CUSTKEY, C_NAME,
 SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE, C_ACCTBAL,
 N_NAME, C_ADDRESS, C_PHONE, C_COMMENT
FROM CUSTOMER, ORDERS, LINEITEM, NATION
WHERE C_CUSTKEY = O_CUSTKEY AND
 L_ORDERKEY = O_ORDERKEY AND
 O_ORDERDATE >= '1993-10-01' AND
 O_ORDERDATE < '1994-01-01' AND
 L_RETURNFLAG = 'R'
 AND C_NATIONKEY = N_NATIONKEY
GROUP BY C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE, N_NAME, C_ADDRESS, C_COMMENT
ORDER BY REVENUE DESC
The execution plan for rowstore at SF 10 is shown below.
The execution plan for columnstore at SF 10 is shown below.
Below are the details on ORDERS and LINEITEM from the rowstore plan.
Notice that there are seek predicates on LINEITEM
for the range 1993-09-20 to 1994-06-17.
The tight inferable range would be 1993-10-02 to 1994-05-02, that is,
1 day after the OrderDate lower bound and 121 days after the
OrderDate upper bound.
Below are the details on ORDERS and LINEITEM from the columnstore plan.
In columnstore, every operation is a scan.
There is a predicate on the ORDERS table
but not on the LINEITEM table.
Presumably the storage engine must scan the entire set of LINEITEM partitions,
while only scanning the ORDERS partitions encompassing the SARG date range.
I am thinking the reason is that with date correlation on conventional row storage,
the SQL Server query optimizer knows that the range of LINEITEM ShipDate values is also restricted,
from 1 day after the lower bound on OrderDate
to 121 days after the upper bound on OrderDate.
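If such a rewrite were allowed, the inferable restriction could be written out as explicit predicates on LINEITEM, along the lines of the sketch below (TPC-H rules prohibit this; it is shown only to illustrate what the optimizer is inferring):

```sql
/* Query 10 with the inferable ShipDate bounds made explicit */
SELECT TOP 20 C_CUSTKEY, C_NAME,
 SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE, C_ACCTBAL,
 N_NAME, C_ADDRESS, C_PHONE, C_COMMENT
FROM CUSTOMER, ORDERS, LINEITEM, NATION
WHERE C_CUSTKEY = O_CUSTKEY
 AND L_ORDERKEY = O_ORDERKEY
 AND O_ORDERDATE >= '1993-10-01' AND O_ORDERDATE < '1994-01-01'
 AND L_SHIPDATE >= '1993-10-02'  -- OrderDate lower bound + 1 day
 AND L_SHIPDATE <= '1994-05-02'  -- OrderDate upper bound + 121 days
 AND L_RETURNFLAG = 'R'
 AND C_NATIONKEY = N_NATIONKEY
GROUP BY C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE, N_NAME, C_ADDRESS, C_COMMENT
ORDER BY REVENUE DESC
```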
TPC-H Query 4
TPC-H Query 4, below, is slower with columnstore than with row storage in the 3TB and 10TB results. I am thinking that the reason is the same.
/* TPC_H Query 4 - Order Priority Checking */
SELECT O_ORDERPRIORITY, COUNT(*) AS ORDER_COUNT
FROM ORDERS
WHERE O_ORDERDATE >= '1993-07-01'
 AND O_ORDERDATE < '1993-10-01'
 AND EXISTS (SELECT * FROM LINEITEM
  WHERE L_ORDERKEY = O_ORDERKEY
  AND L_COMMITDATE < L_RECEIPTDATE)
GROUP BY O_ORDERPRIORITY
ORDER BY O_ORDERPRIORITY
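Applying the same inference chain by hand (ShipDate at most 121 days after OrderDate, ReceiptDate 1 to 30 days after ShipDate), the correlated subquery could in principle carry explicit bounds such as the fragment below; again, TPC-H rules prohibit this rewrite, and the exact offsets are my own back-of-envelope reading of the data generator rules:

```sql
-- hypothetical added predicates inside the EXISTS subquery
AND L_RECEIPTDATE >= DATEADD(day, 2, '1993-07-01')   -- OrderDate lower bound + 1 (ship) + 1 (receipt)
AND L_RECEIPTDATE <  DATEADD(day, 151, '1993-10-01') -- OrderDate upper bound + 121 + 30
```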
TPCH Query Plans
I have posted the TPC-H reference queries and execution plans
at SF1 on SQL Server 2005.
The parent page, TPCH Interim,
has links for the SF1000 query plans with and without parallelism.
TPC-H Columnstore Summary
As with every other new feature, Columnstore is a really interesting technology.
But think hard about what is really happening,
experiment, and remember to collect good execution statistics and plans
prior to making changes,
then collect the new execution statistics and plans after the change.
One reason I like to look at official TPC-H benchmark results over "real-world" reports
is that the benchmark system is properly configured for both the before and after results.
There is a significant difference in the data size involved for each query between
rowstore and columnstore.
If the reference system has a poor storage system
(and how often have we seen this? it is practically guaranteed when the SAN vendor assisted in the configuration),
then it is possible to produce almost any performance ratio.
The charts below show the progression of performance over time for selected TPC-E results
spanning processor generations from Core 2 through Ivy Bridge at 2, 4 and 8 sockets.
For the 2-socket systems, West-1 is from the first set of TPC-E results reported for Westmere X5680 with HDD storage
and West-2 is the later X5690 report with SSD storage. Both are 6-core Westmere-EP processors.
The West-3 is the E7-2870 10-core (Westmere-EX) on SSD storage.
For the 4-socket systems, West-1 is on HDD storage and West-2 on SSD, both on SQL Server 2008 R2 with 1TB memory.
West-3 is on Windows/SQL Server 2012 with 2TB memory and SSD storage.
The same data is shown below with reverse organization
showing scaling with sockets for each of the processor architectures.
Nehalem: 2-socket 4-core, 2.93GHz (11.72 core x GHz/socket);
4 & 8-socket 8-core, 2.26GHz (18.08 core x GHz/socket)
Westmere: 2-socket 6-core, 3.46GHz (20.76 core x GHz/socket);
4 & 8-socket 10-core, 2.4GHz (24 core x GHz/socket)
Sandy Bridge: 2-socket 8-core, 2.9GHz (23.2 core x GHz/socket);
4-socket 8-core, 2.7GHz (21.6 core x GHz/socket)
Ivy Bridge: 2-socket 12-core, 2.7GHz (32.4 core x GHz/socket);
4 & 8-socket 15-core, 2.8GHz (42 core x GHz/socket)