THE SQL Server Blog Spot on the Web

Joe Chang

About 64-bit

Back in the 1997 time frame, I gave a presentation projecting when 64-bit operating systems and applications should become pervasive in the high volume platforms. My best estimate was that this would be some time in 2003 or 2004. This meant that hardware platforms should be ready in 2001 or 2002 to allow for reasonable software availability, beginning with the operating system and a few key applications. And yet here we are in 2008, with most people still running predominantly 32-bit environments. There is even reluctance to run 32-bit applications on 64-bit operating systems to facilitate the transition to full 64-bit.

First, what do we mean by 64-bit? It used to mean any of: the size of the internal registers, or the width of the external data or address buses. Recently, the term is used mostly for the size of the virtual address space (VAS, sometimes called the linear address space), which requires registers of the same size (or larger) for an efficient implementation. The data bus width long ago went to 64-bit, or 128-bit in the case of Itanium. Going forward, the bus width is no longer relevant because off-chip communications are transitioning to point-to-point links whose width is not related to the internal architecture.

Just like in the time of the 16 to 32-bit transition, people ask whether, or even presume that, 64-bit is faster than 32-bit. Technically, this is neither true nor false by itself. It is somewhat complicated, as are most things in life. When applications outgrow the current VAS architecture, a series of convoluted procedures is employed to get around the limited VAS size. In the next larger VAS, these elaborate mechanisms are no longer necessary, which allows simpler operation and possibly better performance. That is, until the next transition is required.

If the entire program resides in L1 cache and if there are no differences in the instruction set architecture (ISA), there should be no difference between a 32-bit and a 64-bit program. Now a 64-bit program has 64-bit pointers, just as 32-bit programs have 32-bit pointers. The size of the pointer is usually tied to the size of the VAS. For the AMD64 ISA (Intel uses the term Intel 64, and IA-64 for Itanium), integer operations can use 32 or 64-bit operands. The Itanium ISA packs three instructions in a 128-bit word. In any case, even if most operations are 32-bit, the 64-bit program's memory working set is larger because of the 64-bit size of pointers. This means that the 64-bit program can have more L1 and L2 cache misses. Depending on the latency differences between L1, L2 (L3 in some cases) and any off-die cache or main memory, there will be a performance penalty from the delays in accessing the next instruction. The penalty should be smaller for L1 misses caught by L2 on processors that have a very low latency L2, but becomes more significant for on-die L3, if applicable, and much more so for off-die cache or memory accesses.
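
To put a rough number on the pointer size effect, here is a minimal sketch. The node layout (two pointers plus a 4-byte payload), the alignment rule and the 32KB L1 data cache size are all assumptions chosen for illustration, not measurements from any particular processor or program.

```python
def padded_node_size(pointer_bytes, n_pointers=2, payload_bytes=4):
    """Size of a hypothetical list node, rounded up to pointer alignment."""
    raw = n_pointers * pointer_bytes + payload_bytes
    align = pointer_bytes  # assume the node is aligned to the pointer size
    return -(-raw // align) * align  # ceiling division, then scale back up

L1_BYTES = 32 * 1024  # assumed L1 data cache size

for bits in (32, 64):
    size = padded_node_size(bits // 8)
    print(f"{bits}-bit: {size} bytes per node, {L1_BYTES // size} nodes fit in L1")
```

Under these assumptions the node doubles in size, so only half as many nodes stay resident in the same cache. Real programs mix pointers with other data, so the actual growth in working set is usually smaller.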

Now, the AMD64 ISA is not a simple extension of the 32-bit X86 (Intel IA-32) ISA to handle 64-bit linear addressing. The 32-bit X86 ISA has 8 (somewhat) general purpose registers, a liability inherited from the old 8086 16-bit X86 ISA. The 8086 was not meant to be a high performance microprocessor, having 29K transistors on a 3 micron process. The slightly later Motorola 68000, with 70K transistors on a comparable 3.5 micron process, had 16 (8 data, 8 address) registers. The later Intel 80386 had 276K transistors on 1.5 micron, so 130K transistors might have been possible on a mature 3 micron process. RISC microprocessors in the early 1980s went with 32 general purpose registers. Itanium, intended for 1998-99 introduction, implemented 128 general purpose registers.

The AMD64 ISA features 16 general purpose registers in 64-bit mode, which is a significant innovation for an extension of the X86 ISA. This has important performance implications. The 8 register architecture of X86 resulted in binary code with a sizeable portion consisting of instructions that copy contents back and forth between registers and memory to free up registers for the immediate task. In the 1990s, it was thought that the X86 ISA incurred a 15-20% performance penalty relative to an otherwise comparable RISC microprocessor with 32 registers, just from the superfluous instructions that a higher number of registers would avoid.

An examination of an open source data compression algorithm with a short code sequence showed that about 10% of the instructions were register to memory copies of a temporary nature. The same code compiled for AMD64 showed no register to memory copies for temporary storage. When compiled for Itanium with optimization, entire loops were unrolled entirely in the available registers. The general idea is that more registers allow for complex code with relatively few avoidable register to memory temporary copies, balanced against the need to avoid slowing register access with rarely used registers.
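
The effect of register count on temporary copies can be illustrated with a toy model. The sketch below loosely follows the linear-scan register allocation idea: values are described by hypothetical (start, end) live ranges, and when no register is free, the value that lives longest is sent to memory. It is not how any particular compiler works, and it ignores that not all 8 X86 registers are freely usable.

```python
def count_spills(intervals, num_registers):
    """Count values that must live in memory, given (start, end) live ranges."""
    active = []  # end points of values currently holding a register
    spills = 0
    for start, end in sorted(intervals):
        # Release registers whose values are dead before this one starts.
        active = [e for e in active if e >= start]
        if len(active) < num_registers:
            active.append(end)
        else:
            spills += 1
            # Spill whichever value lives longest: an active one or the new one.
            longest = max(active)
            if longest > end:
                active.remove(longest)
                active.append(end)
    return spills

# Hypothetical inner loop with 12 temporaries live at the same time.
temporaries = [(0, 10)] * 12
print("8 registers: ", count_spills(temporaries, 8), "spills")
print("16 registers:", count_spills(temporaries, 16), "spills")
```

With 12 simultaneously live temporaries, the 8 register case has to push 4 of them to memory, while the 16 register case keeps everything in registers.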

The actual performance characteristics of the compression code on 32-bit and 64-bit showed the 32-bit version to be about 1-2% faster. There was no difference in performance between the 32-bit program on a 32-bit or a 64-bit OS. It is possible that the temporary memory copies were handled by the L1 cache. Another explanation might have to do with the Intel Pentium 4 NetBurst architecture. NetBurst has 96(?) actual registers even though only 8 are visible to the ISA. A register renaming scheme is implemented along with out-of-order execution. So it is possible that the temporary register to memory copies never occur, or otherwise cause no penalty.

All of the above discussion supports the premise that 64-bit should not show a significant performance difference, better or worse, unless it avoids the complicated contortions that 32-bit code would go through to stay within the 32-bit VAS.

The Windows operating system requires address space for certain memory structures such as the non-paged pool, the paged pool, and System PTEs. This is discussed in detail in various Microsoft documents. A number of environments have shown that the OS can be constrained due to the 32-bit size limits.

Now SQL Server 32-bit can use more than 4GB of physical memory for data buffers. This involves PAE on the 32-bit OS and the AWE feature. The AWE memory cannot be used for the stored procedure cache or other functions. Personally, I have never thought that procedure caches should be allowed to become very large. A very large procedure cache is usually an indication that something has gone horribly wrong with the design of the application (not the SQL Server engine). This can happen when the application does not use stored procedures, or uses dynamic SQL within the stored procedure, defeating the purpose of the stored procedure, or when someone decided it would be a good idea to have hundreds or thousands of databases. In such cases, the procedure cache consists almost entirely of plans that will not be used again, and plans that do need to be reused may get evicted.

In a well designed transaction processing application, the vast majority of physical memory is used for data buffers. The penalty for using AWE memory instead of memory within the VAS appears to be small, probably between 5 and 10%. Still, a particular application that uses address space outside of the 8KB pages managed by the SQL engine may eventually cause severe VAS fragmentation after extensive uptime (or sooner), such that large contiguous spaces are no longer available. This can cause severe performance degradation, or it could cause the internal garbage collector to activate, which appears to be a complete system lockup. These events are not easily revealed in a simple performance test.

It is in large query performance that the benefits of a full 64-bit SQL Server and Windows operating system are more directly observed. A large query will need address space (and memory) to handle the intermediate results, particularly for hash and sort operations. This needs to be directly addressable, and not in the AWE region. If the intermediate results are too large, they are spooled out to the temp database. In the 32-bit version, the size of the address space that can be allocated for this purpose is limited. In the 64-bit version, far more can be kept in memory before the temp database is required.

A quick performance test can be done using the TPC-H data generator and the tables, indexes and queries from recently published reports. The scale factor (SF) 1 database is used as a control. This database has a Line Item table with just less than 1GB of data, and total data plus indexes of 1.7GB, which fits within the 3GB 32-bit address space (or nearly 4GB for 32-bit SQL Server on 64-bit Windows). AWE memory is not enabled. Performance counters indicate no disk activity, and nearly no activity to the temp database. The second test is the SF 10 database, with a total size of 17GB.
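
The sizing argument can be checked with simple arithmetic. This sketch uses only the figures quoted above; the function name and the labels are made up for illustration, and it ignores that in practice only part of the address space is actually available to the buffer pool.

```python
GB = 1024 ** 3

# User-mode address space figures quoted in the text.
ADDRESS_SPACE = {
    "32-bit SQL Server, 32-bit Windows (3GB)": 3 * GB,
    "32-bit SQL Server, 64-bit Windows (nearly 4GB)": 4 * GB,
}

# Total data plus index sizes quoted in the text.
DATASETS = {"SF 1": 1.7 * GB, "SF 10": 17 * GB}

def fits(data_bytes, vas_bytes):
    """True if the whole data set could sit inside the address space."""
    return data_bytes < vas_bytes

for vas_name, vas_bytes in ADDRESS_SPACE.items():
    for ds_name, ds_bytes in DATASETS.items():
        verdict = "fits" if fits(ds_bytes, vas_bytes) else "does not fit"
        print(f"{ds_name} {verdict} in {vas_name}")
```

The SF 1 data set fits in either 32-bit configuration, which is why it serves as a control. The SF 10 data set fits in neither.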

Surprisingly, at SF 1, SQL Server 2005 64-bit showed about a 10% performance improvement in both CPU usage and runtime duration (based on the sum of the 22 queries, not the geometric mean used for TPC-H reporting requirements) over SQL Server 2005 32-bit. At SF 10, the improvement was about 20% in both CPU and duration. There was very little disk activity to data in both the 32-bit and 64-bit test runs. There was a moderate level of temp database activity in the 32-bit test and almost none in the 64-bit test.
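
The choice between the sum and the geometric mean matters for how an improvement is reported. The sketch below uses made-up per-query durations (they are not my test results) to show that the sum is dominated by the longest queries, while the geometric mean weights every query equally.

```python
import math

def total(times):
    """Sum of query durations."""
    return sum(times)

def geometric_mean(times):
    """Geometric mean of query durations, computed via logarithms."""
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Hypothetical per-query durations in seconds (NOT measured results).
run_32bit = [1.0, 2.0, 40.0]
run_64bit = [1.0, 2.0, 30.0]  # only the longest query got faster

sum_gain = 1 - total(run_64bit) / total(run_32bit)
geo_gain = 1 - geometric_mean(run_64bit) / geometric_mean(run_32bit)

print(f"improvement by sum:            {sum_gain:.1%}")
print(f"improvement by geometric mean: {geo_gain:.1%}")
```

In this made-up case, speeding up only the longest query shows a much larger gain by sum than by geometric mean, so the two figures should not be compared directly.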

I have just started SQL Server 2008 performance testing. When I have properly vetted the results, I will discuss them in the usual light detail.

Published Friday, March 28, 2008 4:16 PM by jchang

Comments

 

Linchi Shea said:

Joe;

can you leave some space between the paragraphs?

March 28, 2008 3:42 PM
 

jchang said:

hmm, previously when i had a line between paragraphs, this site would display 2 lines, i will try a few things next time, but don't expect me to get it right

by the way, the small 2M read cache really applies to SANs, I recall a long time ago a discussion on special tricks to avoid hot-spotting on a single, which meant a stripe strategy that kills sequential performance,

March 28, 2008 4:00 PM
 

Paul Nielsen said:

Joe, just edit and re-publish the post, then delete this comment.

March 28, 2008 10:47 PM
 

Greg Linwood said:

My input buffer is also too small to consume this post in a single operation. Line breaks would indeed make this apparently very interesting post a lot more readable joe!

One thing I did notice before my buffer ran out of space is that your post seemed to be focussing on the Windows platform, but other platforms were doing 64 bit long ago of course.. I remember reading internal memos about DEC's 128 bit Alpha project plans back in the early-mid '90s. I don't think they ever got terribly far though as they were embroiled in so many fabrication problems with the 64 bit Alpha products at that time. I wonder where DEC would have been today if their transition to 64 bit Alpha was smoother, maybe we'd be working on 128 bit platforms already?

Do you know whether anyone else is even thinking about 128 bit platforms yet?

March 29, 2008 5:29 AM
 

jchang said:

Linchi, happy now? this is why I am strictly database engine, nobody pays me to make it look pretty

Paul, I think I will keep your comment, there is no erasing ones track on the internet!

Greg: Yes, I am fully focused on Windows, I have done some analysis of other processor architectures in the past, but then the looming onset of Itanium killed all the RISC except IBM, who were very smart, looked at what Intel was doing on Itanium, figured out where Intel would not go, that is very high SMP scaling, because of the system IO complexity, and then went there. In actuality, both Opteron and the Pentium 4/M/Core 2 would have killed the RISC processors in the 1-4 socket sub-$20K space.

My own views on Alpha, the architecture was much admired, but people seemed to imagine it could do astonishing performance when in fact it was just very good and fully competitive. I think the real un-appreciated achievement of the Alpha architecture was its simplicity, it could be designed by a relatively small team and have world class performance. The Intel X86 processors required massive design teams to make the old ISA run in the big leagues. Serious spoon bending was happening behind the scenes.

As for why the Alpha platform failed, I will leave that to the people who were there. My opinion was that in the mid/late-90s, it was Sun that got the platform strategy right. SPARC was probably the weakest of the RISC processors, well maybe MIPS too, but the platform strategy was to have a solution at every price point, from 1P sub-$3K, to 2P, 4P, 6P, 14P, and $1M at 64P. To a degree, Sun got lucky in acquiring the Cray spin-off and picking up the E10K at just the time when the market wanted such a thing. Neither Alpha, HP nor IBM had such a range of platforms at the time. HP and IBM have since rectified this. SGI did also have range, but was viewed more as an HPC company. Sun also showed leadership in the Unix OS space, so many new age developers preferred Sun over the other grandpa Unix flavors. Of course, when the internet bubble crashed, Sun got burned more than HP/IBM, who had more brick and mortar customers.

On 128-bit. Someone once commented that we seemed to be consuming 1 bit of VAS per year. This seems reasonable, 2000 was about the point when mid-range RISC outgrew 32-bit, I think 32-bit RISC workstations started around. So we should outgrow 64-bit in the 2030 time frame. It is not clear to me that the next step is 128-bit, the penalty of such wide integers is too much. I think the next step might be 96

March 29, 2008 12:30 PM
 

Dave Markle said:

You know, I think we'll certainly see CPUs or ISAs being *branded* as 128 bit, but we'll never see what I consider to be a true 128 bit CPU in our lifetimes. We certainly won't see flat, 128 bit address spaces... People like to throw out the famous "640k quote" when I say this, but nobody has a use for more than 18 quintillion bytes of addressable space.

What's really depressing to me is the slowness to adopt 64-bit on the desktop.  Given the fact that anyone can have 4GB for $107 (!!!) on their machine (http://www.crucial.com/store/partspecs.aspx?IMODULE=CT2KIT25664AA800), this is darn near inexcusable -- that's a full gig of memory wasted for most people -- especially as a developer.  

I wonder when the last 32-bit SQL Server is going to be released.  I'll bet it ends with SS 2008.  I'd also be unpleasantly surprised to see if Windows 7 gets released for a 32-bit platform nowadays, too...

March 29, 2008 8:36 PM
 

aspiringgeek said:

Having been in the industry only ten years (hence my sobriquet), I am marvelling at the history all y'all have at the tips of your fingers.  Joe & Dave, I've copied-&-pasted this post in a Task flagged for a decade hence.  It'll be interesting to compare your architectural prognostications to reality.

March 29, 2008 11:12 PM
 

Scott R. said:

Responding to David Markle's comment on the last 32-bit SQL Server:

Saying that SQL 2008 may be the last 32-bit version is both possible and consistent with the fact that Exchange has already dropped 32-bit versions with last year's Exchange 2007.  However, I suspect there are other factors that make the SQL Server situation different than Exchange, and may actually keep 32-bit SQL Server versions around for longer (past SQL 2008):

-  Desktop development environments that also host a SQL Server instance (and are slower to move to 64-bit, as previously mentioned) - not the case with Exchange deployments.  My guess is that some development desktops will convert over to 64-bit, but not all for some time.

-  SQL Express is 32-bit only today (possibly part of the throttling mechanisms it has - along with 1 GB memory and 4 GB DB size - for being a free version)

-  SQL Server Compact Edition will need to run on a variety of OS versions for some time, including older 32-bit OSs

I do agree the primary versions / editions that IT groups should be deploying are Enterprise / Standard / Developer Editions for x64 (or Itanium for those that choose that path) - assuming that IT groups finally get over the hesitation to do deploy x64 OS and DB instances and more fully exploit the great x64 64-bit capable servers they are already buying (some are, but many are not).

It will be interesting to see if Microsoft forces this issue in the next post-SQL 2008 version or waits longer.  It is probably less a technical decision and more of a marketing and market positioning issue for Microsoft.

Scott R.

March 30, 2008 4:57 PM
 

Dave Markle said:

Scott:

That is a good point -- though I expect SS 2008 to be the last 32-bit (and Windows 7 might ? be the last 32-bit OS) version of the product, who knows...  It's probably going to happen that Windows 7's core will be built for a 32-bit x86 platform, but that platform will probably be a cell phone or other embedded application -- you might not be able to buy Windows 7 as a PC operating system.  By the time Windows 7 comes out, 4GB of memory will probably cost something like $75 or $50...  Same goes with SQL Server...  Again, who knows?  Maybe SQL Server Mobile will go away in 2 or 3 versions, to be unified with the full SQL Server product...  This is going to be interesting.    

March 30, 2008 10:21 PM
 

jchang said:

Exchange went 64-only because they had a pressing need. customers want to consolidate mail servers. the cpu power to support this is there, but Exchange burns VAS. Way back in Exchange 5.5 days, there was an effort to connect 30K+ users to Exchange. Each connection consumed 30-40K VAS. between that, the internal plumbing, there was barely 1GB left for buffering data. MS reset the Exchange benchmark for 2000 drastically lowering the number of users.

On the desktop/notebook. If I could get a 64-bit driver for my Verizon/Novatel EVDO card, I'd have ditched 32-bit last year. Right now, I have 32-bit XP on one disk, and I swap in another that has W2K8-64. I also just got a Dell Vostro that can hold 2 internal disks. Unfortunately I did not buy the mounting bracket for the second disk and I stripped the screw holding the main disk in, so that's stuck on Vista-32 for now.

March 31, 2008 9:31 AM
 

Welsi from peru said:

Hi man, thanks for your blog, it is very interesting.

I have a question for you, hope you can help.

Is SQL Server 2000 64-bit available for all 64-bit processors? Xeon, for example?

SQL Server 2000 64-bit software is available only for 64-bit IA operating systems? What is IA?

I called Microsoft, but they told me that this version is not available. Do you know of any place to get it?

I would be eternally grateful for the information, thanks

Welsi Tuesta from Peru.

wtuesta@iticsa.com

May 4, 2009 11:21 AM
 

0xC000005 said:

The 10% performance improvement for 64-bit occurs because the 64-bit architecture has 16 registers rather than 8 for the 32-bit architecture. The compiler will generate code where less swapping of operands between registers and memory occurs.

October 18, 2011 8:49 AM
 

jchang said:

the 16 versus 8 GP registers is the most reasonable guess, but it is a difficult matter to prove. When working on compression code, an examination of the assembly definitely showed that the 64-bit code with 16 registers had no register to memory copy with a later memory back to register, but the performance was exactly the same. We have to consider that the assembly is only the X86/x64 instruction set architecture representation. The actual micro-architecture of modern microprocessors is vastly different from the X64 (Intel 64 in Intel speak) ISA. Specifically, even though the X86 ISA has 8 GP registers, and X64 has 16, the actual CPU since Pentium Pro has 36 and P4+ is something like over 100? So even though the assembly may show r-m-r, this may not actually occur.

October 18, 2011 11:48 PM


About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine
