
Joe Chang

Large Query Performance from SQL Server 2000 to 2008, 32 & 64-bit

I had been meaning to do a somewhat comprehensive review of SQL Server performance from version 2000 to 2008, both 32 and 64-bit, on data warehouse type queries, with an in-depth examination of scaling in parallel execution plans.


For now, I can provide a short summary. The test platform is a Dell PowerEdge 2900 with two quad-core Xeon E5330 2.66GHz processors and 24GB memory. The operating system is Windows Server 2008 64-bit for both the 32 and 64-bit SQL Server versions. Technically, SQL Server 2000 is not supported on it, but this is just a performance comparison, not a production environment. The database is generated using the TPC-H dbgen kit at scale factor 10, meaning the Lineitem table is approximately 10GB and the entire database approximately 17GB, which fits entirely in memory. There was some tempdb activity, which is spread across ten 15K drives.


All tests are run twice, the first pass to load data into memory and pre-compile the execution plan for the second; all results shown are for the second run. For SQL Server 2008, the tables use the new date data type in place of datetime, and the queries are modified to avoid conversion anomalies as noted below. Below is the total (sum) CPU time in milliseconds to execute the 22 queries in sequence for max degree of parallelism 1, 2, 4, and 8.


                  DOP 1     DOP 2     DOP 4     DOP 8
2000 RTM        534,912   663,848   656,232   697,794
2000 bld 2187   514,881   589,245   657,543   770,272
2005 RTM 32     463,526   444,479   456,567   498,623
2005 SP2 32     464,478   403,668   413,685   452,134
2005 RTM 64     379,363   377,570   394,962   474,200
2005 SP2 64     370,206   327,149   345,155   436,491
2008 RTM        375,136   324,264   343,250   410,220


Duration in milliseconds to run the 22 queries, by max DOP.


                  DOP 1     DOP 2     DOP 4     DOP 8
2000 RTM        553,900   293,411   191,552   149,568
2000 bld 2187   566,333   276,085   188,497   164,677
2005 RTM 32     480,839   237,933   134,644    84,721
2005 SP2 32     483,842   214,804   119,525    72,515
2005 RTM 64     379,563   194,199   107,409    65,094
2005 SP2 64     370,374   166,579    94,844    59,388
2008 RTM        375,135   171,390    94,028    56,795
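The author does not state his measurement harness; one simple way to collect per-query CPU and elapsed time like the figures above, on any of these versions (a sketch, not necessarily his method), is:

```sql
-- Report parse/compile and execution CPU time and elapsed time
-- in the Messages output for every statement that follows.
SET STATISTICS TIME ON;
-- run the TPC-H query here, twice; record the second run
SET STATISTICS TIME OFF;
```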


On SQL Server 2000 build 2187, notice that CPU increases from 514.8 seconds at Max DOP 1 to 589.2 at Max DOP 2, and so on up to Max DOP 8. This is expected, because there is overhead to employing a parallel execution plan, and the overhead increases with the number of threads involved. Between SQL Server 2000 RTM and build 2187, there was a sharp jump in the CPU required at Max DOP 8. I will disregard this, as there were significant changes and code fixes between the two builds concerning correctness of parallel execution plan results. Still, there is an overall performance gain from Max DOP 4 to 8. Several years ago, I mentioned that SQL Server 2000 performance is very problematic beyond Max DOP 4. That was before multi-core processors, when there were at most 4 cores per NUMA node. So the more correct interpretation is that SQL Server 2000 is very problematic on NUMA systems. An earlier look at SQL Server 2005 RTM showed no such problems on NUMA.


In SQL Server 2005 and 2008, there is actually a decrease in CPU going from Max DOP 1 to 2. This is mostly attributable to the bitmap filter in hash operations. Some queries show a significant drop in CPU from DOP 1 to 2, others no change, and some an increase. From DOP 2 to 4 there is a slight increase in CPU, and a more significant one going from DOP 4 to 8. This might indicate that DOP 2 and 4 are very good for overall efficiency, benefiting from bitmap filters in hash join operations without incurring excessive parallelism overhead. (This is unrelated to the recommendation of Max DOP 4 on Itanium systems, which is based on cores per NUMA node.) Unrestricted parallelism on the 8-core system yields the best single-stream completion times, although this should really be tested on 16 or more cores before setting any rules.
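For anyone wanting to hold an individual query at one of these DOP levels without changing the server-wide setting, the standard MAXDOP query hint applies (a sketch; the table and columns are from the TPC-H schema used here):

```sql
-- Cap this one query at DOP 4, overriding the instance-wide
-- 'max degree of parallelism' setting for its execution only.
SELECT   L_RETURNFLAG, SUM(L_EXTENDEDPRICE) AS REVENUE
FROM     LINEITEM
GROUP BY L_RETURNFLAG
OPTION   (MAXDOP 4)
```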


In the transition from SQL Server 2000 to 2005 RTM, both 32-bit, the duration performance gain is a modest 15% for non-parallel plans and a very substantial 49% at Max DOP 8. From SQL Server 2005 32-bit to 64-bit, both RTM builds, the performance gain was a solid 20%. The CPU efficiency improvement was a little less, which suggests the tempdb configuration affects the results: even though the entire data set fits in memory, a large query with intermediate results is more likely to spool to tempdb in 32-bit than in 64-bit. From SQL Server 2005 64-bit RTM to Service Pack 2, an additional 10% was realized at DOP 2 and higher.


SQL Server 2008 RTM is marginally better than SQL 2005 SP2. There is significant variation from query to query, so improvements should be expected over time, hopefully correcting the query plans that are slower while maintaining the advantage of the plans that are better. One of the big disasters in 2008 parallel execution plans occurs on Query 5, Local Supplier Volume. The query is:






 AND O_ORDERDATE >= '1994-01-01'
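The query text did not survive intact above; only the date predicate remains. For reference, TPC-H Q5 as published in the benchmark spec (with the validation substitution values, region 'ASIA' and start date 1994-01-01), written as T-SQL:

```sql
-- TPC-H Q5, Local Supplier Volume, per the published spec
-- (validation substitutions: R_NAME = 'ASIA', start date 1994-01-01).
SELECT N_NAME,
       SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE
FROM   CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION
WHERE  C_CUSTKEY = O_CUSTKEY
  AND  L_ORDERKEY = O_ORDERKEY
  AND  L_SUPPKEY = S_SUPPKEY
  AND  C_NATIONKEY = S_NATIONKEY
  AND  S_NATIONKEY = N_NATIONKEY
  AND  N_REGIONKEY = R_REGIONKEY
  AND  R_NAME = 'ASIA'
  AND  O_ORDERDATE >= '1994-01-01'
  AND  O_ORDERDATE < DATEADD(YY, 1, '1994-01-01')
GROUP BY N_NAME
ORDER BY REVENUE DESC
```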




The MaxDOP 1 plan is essentially:




  AND O_ORDERDATE >= '1994-01-01'



At MaxDOP 1, the actual CPU is 21,965 ms for the original query. At MaxDOP 2, the CPU is 28,721 ms for the original, versus 13,323 ms for the forced query. So this one query added 15.4 CPU-seconds to the 324.3 CPU-second total for the 22 queries, close to 5%, and about 8.0 seconds of duration.

Query 8 was also bad news on the parallel plans, with about 5 CPU-seconds lost on the MaxDOP 2 parallel plan compared with a forced parallel plan modeled on the non-parallel plan. One might think that MS should have caught these anomalies. I think the reason they do not is that MS does not look at SF 1-30 TPC-H results. The minimum for publication is 100GB, and that will probably increase to 300GB soon, because 30GB is not a real data warehouse. I do think MS should look very carefully at SF 1-30: these queries are at the onset of eligibility for parallelism, while the really big queries at SF 100 and higher are less likely to encounter plan problems. While not strictly data warehouses, most transactional databases I have seen do not remotely resemble TPC-C or TPC-E; I would say most have TPC-H SF 1-10 sized queries mixed in with smaller transactions. So a bad execution plan can be really bad news.
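The post does not show how the "forced" variants were written. One plausible sketch, my assumption rather than the author's stated method, is to pin the written join order with query hints while still allowing DOP 2:

```sql
-- Hypothetical sketch: FORCE ORDER keeps the joins in the order written,
-- modeling the parallel plan on the shape of the serial plan,
-- while MAXDOP 2 still permits a parallel execution.
SELECT   O.O_ORDERPRIORITY, COUNT(*) AS ORDER_COUNT
FROM     ORDERS O
JOIN     LINEITEM L ON L.L_ORDERKEY = O.O_ORDERKEY
WHERE    O.O_ORDERDATE >= '1994-01-01'
GROUP BY O.O_ORDERPRIORITY
OPTION   (FORCE ORDER, MAXDOP 2)
```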


I am sufficiently satisfied that SQL Server 2008 has a very powerful engine and a decent optimizer. However, I have complained in the past about the rigid assumption that all query costs factor in IO time, the use of a fixed random-to-sequential IO performance model (320 IOPS to 10.5MB/sec), and out-of-balance IO-to-CPU cost ratios. If a proper calibration of the true cost formulas were done, there would probably be fewer silly mistakes resulting in goofy execution plans. Given that many people do not know how to diagnose this type of problem, a simple side-by-side test of 2000 or 2005 against 2008 can run into this issue, leading to a decision to stay with 2000/2005 when a few simple adjustments would have corrected the 2008 results.


SQL Server Settings

Generally I follow the HP TPC-H publications on optimization settings, particularly -E and -T834. Neither changed results by more than 1% either way. I have also looked at -T2301 in the past, finding no apparent difference. I really would like MS to provide more details on T2301: are there set points below which it has no effect?
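-E and -T834 are startup parameters (T834, large-page buffer pool allocation, only takes effect at service start). Whether they are currently active can at least be verified at runtime:

```sql
-- List the global status of trace flags 834 and 2301.
DBCC TRACESTATUS (834, 2301, -1)
```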


SQL Server 2008 new Date data type changes

The three datetime columns in the LineItem table from 2005 become date columns, for an apparent savings of 12 bytes. The 2005 TPC-H SF10 database was 13.77GB (rather, millions of KB) of data and 3.68GB of indexes, for a total of 17.46GB. In 2008, using the date data type in place of datetime, the size is 12.77GB data and 2.96GB index, for a total of 15.74GB. The average bytes per row of LineItem drops from 169 to 153, because one of the datetime/date columns was the cluster key.

Normally, a simple reduction in column width, not row count, does not improve performance unless it impacts the fit in memory. I always try to exclude this factor, because one can generate any difference in performance by adjusting the amount of disk IO.
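The post does not include the DDL; as a sketch using the standard TPC-H column names, the 2008 schema change amounts to:

```sql
-- Hypothetical sketch: narrow the three LINEITEM date columns from
-- datetime (8 bytes) to date (3 bytes). Note the post says one of these
-- columns was part of the cluster key, which would require rebuilding
-- the clustered index rather than a simple ALTER.
ALTER TABLE LINEITEM ALTER COLUMN L_SHIPDATE    date NOT NULL;
ALTER TABLE LINEITEM ALTER COLUMN L_COMMITDATE  date NOT NULL;
ALTER TABLE LINEITEM ALTER COLUMN L_RECEIPTDATE date NOT NULL;
```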

The original TPC-H queries may have a SARG of the form

  AND O_ORDERDATE >= '1994-01-01'

  AND O_ORDERDATE < DATEADD(YY, 1, '1994-01-01')

Even before SQL 2008, the date functions would return a datetime or smalldatetime result as appropriate. In SQL 2008, the natural extension would be to return a date type when the comparison is with a date column. I made this request on Connect and was told to bugger off. So SQL 2008 will convert the column to datetime to equate with the function, losing the benefit of a proper SARG. Anyone upgrading to SQL 2008 with the date type and not changing code as below may get a nasty surprise.

  AND O_ORDERDATE >= '1994-01-01'
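Spelled out (the upper-bound predicate below is my completion of the truncated example, replacing the DATEADD expression with its literal value so both bounds compare directly as date):

```sql
-- DATEADD returns datetime, which forces the date column to be
-- converted and defeats the SARG; plain literals avoid the conversion.
SELECT COUNT(*)
FROM   ORDERS
WHERE  O_ORDERDATE >= '1994-01-01'
  AND  O_ORDERDATE <  '1995-01-01'
```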


Little things like this can cause people to refuse to budge from SQL 2000, which really needs to be retired.


Duration in milliseconds for SQL Server 2008 64-bit, by query and max DOP

         DOP 1     DOP 2     DOP 4     DOP 8
Q1      50,013    26,317    12,591     7,159
Q2         504       268       150       107
Q3      16,296     5,186     3,158     1,902
Q4      19,232     5,288     3,452     2,340
Q5      21,648    16,150     8,371     5,120
Q6       1,845       929       496       312
Q7      17,397     4,369     2,388     1,376
Q8       5,734     6,765     3,628     1,849
Q9      48,361    22,034    11,335     6,372
Q10     15,281     5,822     3,595     2,425
Q11      4,423     1,238       657       600
Q12      9,363     4,828     4,356     2,365
Q13     21,699    11,310     5,751     2,967
Q14      2,146     1,033       547       334
Q15      1,368       970       521       249
Q16      6,599     3,615     2,018     1,848
Q17      1,243       521       294       213
Q18     50,909    27,945    15,439     9,365
Q19      2,096     1,093       607       378
Q20        841       430       255       165
Q21     69,191    22,064    12,826     8,337
Q22      8,946     3,213     1,592     1,010
Total  375,135   171,390    94,028    56,795

My apologies: Linchi posted SQL 2005 64-bit results, so my duration results for SQL Server 2005 64-bit SP2 (no CU) are below.

         DOP 1     DOP 2     DOP 4     DOP 8
Q1      64,761    32,553    16,428     8,344
Q2         504       295       158       106
Q3      14,733     4,782     3,003     1,989
Q4      17,506     5,338     3,747     2,519
Q5      19,716     7,376     4,654     3,159
Q6       1,609       893       471       309
Q7      15,855     5,472     3,306     2,403
Q8       5,225     2,391     1,333     2,147
Q9      44,611    23,291    12,213     7,222
Q10     13,989     6,384     3,934     2,754
Q11      4,093     1,192       669       495
Q12      8,166     4,497     4,022     1,714
Q13     25,830    13,566     7,521     4,260
Q14      2,060     1,020       526       352
Q15      1,358     1,931     1,139       235
Q16      6,476     3,476     2,429     1,215
Q17      1,012       524       291       199
Q18     46,954    26,156    13,896     9,209
Q19      2,133     1,172       623       450
Q20        830       446       253       172
Q21     64,231    20,850    12,536     9,087
Q22      8,722     2,972     1,692     1,049
Total  370,374   166,579    94,844    59,388

Published Sunday, August 24, 2008 2:25 PM by jchang




Jason said:


Any chance you can rerun the x64 tests with trace flag 834 turned on? I have been meaning to test but I haven't had a chance yet.


August 17, 2008 6:47 PM

jchang said:

I added additional notes to the end. Normally I run with -E -T834; there was no meaningful difference compared with the default. I did not run with -T834 only. The difference was about 1% either way, depending on the DOP. Lock pages in memory was already set. Apparently the 64-bit OS is aggressive about paging out SQL memory without it. It is possible -T834 will matter more when you have really big memory, like 128GB+.

August 17, 2008 11:54 PM

grumpy old dba said:

I'm very grateful - solid comparisons between versions of SQL Server are very time consuming to produce, and coming from yourself adds a certain weight when presented to clients. Thanks.

August 18, 2008 3:11 AM


Anonymous said:


Great post.  Can you give a bit more information on the following sentence:

"For SQL Server 2008, the tables use the new Date data type in place of Date Time, and queries are modified to avoid conversion anomalies."

Two questions:

 A) How much impact on index size did the modification from date to datetime produce?  Even if everything is in memory, reducing the footprint there could create quite a bit of an advantage for 2008.  Is this really a fair comparison given that change?

 B) How were the queries modified?  Just to fix date/datetime-related issues?  Or did you make some other changes as well?

August 18, 2008 10:57 AM

jchang said:

clarifications made above regarding Adam's questions

August 18, 2008 5:23 PM

Guti said:

Very comprehensive post, to get a rough idea of the performance improvements across SQL Server versions.

Glad to confirm that the x64 versions are considerably faster than their x86 counterparts.


Guti

August 19, 2008 5:37 AM

Linchi Shea said:

Joe Chang posted some interesting results using the TPC-H queries with the scale factor set to 10. I

August 24, 2008 1:35 PM

Leonardo said:


Very interesting blog.

I found it because I am looking for an answer for one project.

The project will manage a database of 20,000,000 records, yes, twenty million records.

But I am not sure about my recommendation.

I have some questions.

Is SQL Server capable of managing a simple query like

Select name, sum(value)

From mytable

Group by name

on a table with 20,000,000 records???!!!

I generally use that new technology called LINQ to manage all kinds of collections in my projects.

Do you know if it would be possible to use LINQ against SQL Server to retrieve the information with that workaround?

Thanks in advance,


December 10, 2008 11:38 AM

jchang said:

of course SQL Server can handle it, but do you really want to return 20M rows frequently? (I am assuming there are 20M names, and some table that lists transactions for each name/record.) I do not think that is very smart, and it will probably swamp the web/app server. LINQ is a way of programming(?), with local storage capability(?), and is not meant to be a multi-user data engine(?)

December 12, 2008 11:39 AM

Denis Gobo said:

Wow, it has been already a year since I wrote A year in review, The 21 + 1 best blog posts on SQLBlog

December 31, 2008 10:38 AM

Acronyms said:

Thank you for simple and very visual comparison!

February 14, 2009 2:18 AM

Soni said:


 Currently I am facing a problem. My SQL Server 2005 database has some tables with more than 10M records. Every day there is a transaction volume of around 50,000 records with many concurrent users. We did partitioning with the current 3 days of data in the current partition and the rest in the other partition, but it still doesn't help, as due to the heavy data the queries run too slow.

Can you throw some light on what the solution to this problem would be? We rarely need data beyond 3 days, and the other partition also contains data only up till 3 months; the rest is deleted as it's not required.

Please give some solution.

September 3, 2009 11:49 PM

jchang said:

go find the SQL Exec Stats tool on my site

run it at the end of the day when the plan cache should be fully populated,

this is a matter of query efficiency

10M records is a small database by modern standards

September 4, 2009 12:03 PM

Roberto Santoro said:

Great article! I was wondering if you were planning to update the stats now that SP1 for 2008 has come out.

Thanks anyway,

this really made my project much simpler,


November 5, 2009 11:05 AM

Jame said:

I have one question. My company currently uses SQL Server 2000 Standard Edition and Enterprise Edition on Windows Server 2003. But now we want to upgrade to Windows Server 2008 64-bit. Can I run SQL Server 2000 32-bit on Windows Server 2008 64-bit?

respect all your ideas.

thanks alot

February 7, 2010 10:22 PM

jchang said:

i recall that SQL 2000 is not supported on W2K8, and may refuse to install

it is possible to hack the registry and make S2K run

but not recommended for production

I suggest putting S2K 32-bit on W2K3 64-bit

but move to S2K5/8 soon

February 8, 2010 9:16 PM

Jame said:

Thanks a lot, Joe.

February 9, 2010 2:17 AM

Jame said:

Hi Joe,

I need your idea now. I saw in my SQL server log file.

--Using dynamic lock allocation. [2500] Lock Blocks,[5000] lock owner blocks.

What does it mean?

Thanks in advance.

March 16, 2010 5:39 AM

kelvin said:

hi... I'm a newbie in SQL 2000 and I want to know if there is a possibility to get data faster from 2 million rows....

I really, really need your help because these 2 million rows will be used frequently....

Thanks in advance... ^^

June 30, 2010 5:39 AM

kelvin said:

oh, I forgot to mention that I already tried to select all the data and it takes around 2 minutes, but when I put an order by... it becomes 12 minutes...

Thanks once again.. ^^

June 30, 2010 5:42 AM

jchang said:

first, you need to figure out if 2 min is how long it takes for the SQL Server to process the 2M rows, or for the client to receive them

second, the 12 min is probably because you have bad tempdb IO performance,

I suggest you go to,

it's better set up for this type of help

also, comment on a more recent blog

July 1, 2010 8:48 PM

Ekrem said:

Hi all;

I've got a question. I am writing a program which can show information about processes, and I want it to work on lots of different computers. My question is: how can I get information about processes in SQL Server 2000, 2005, and 2008?

Do I have to change my query?

My query is for 2008 R2 here:

select top 20

loginame = convert(varchar(128),loginame),

memusage, hostname = convert(varchar(128),hostname),

program_name = convert(varchar(128),program_name),     hostprocess = convert(varchar(10),hostprocess)

from master.dbo.sysprocesses (nolock)

order by spid

Thanks a lot...

July 4, 2011 4:23 AM

snreddy said:

Good information. Nice post; the way you have written it is appreciable.

July 17, 2011 6:25 AM



About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine
