THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web

Linchi Shea

Performance Impact of Disk Misalignment

Just google for Windows disk alignment best practice, and you will find thousands of articles, whitepapers, and posts, all preaching the practice of aligning disk partitions on the 64K boundary. For instance, one of the EMC recommendations prescribes a disk alignment value of 64K for the host file systems when deploying SQL Server 2005. Microsoft states that "misalignment can defeat system optimization of I/O operations designed to avoid crossing track boundaries."

But I must admit that I wasn't sure how much of it was based on solid, first-hand, current data and how much of it was due to the sheer number of times it had been recommended, giving it a life of its own. Perhaps, since everybody else was recommending it, it had to be true: one of those pieces of best-practice folklore, urban legends, or myths. My attempts to scour the web for related empirical evidence always came up empty-handed. I'd be happy if it turned out that my failure to find empirical evidence was simply a result of my inadequate Google search skills. Regardless, I'd like to see some real data points.

That was the motivation, and here's the test design.

Disk I/O Test Tool: sqlio.exe (from Microsoft)
Performance Measure: I/Os per second
Tool to Set Partition Boundary: diskpar.exe (from Microsoft)
Test File Size: 80GB
Test Drive: The test file was on a 100GB LUN presented from a SAN
I/O Block Size: 8KB
I/O Types: Random Reads and Random Writes
I/O Queue Depths: 4
Threads to Issue I/Os: 1, 2, 4, 8, 16, 20, 24, 28, 32, 36
Disk Partition Offsets: 63.5KB (misaligned), 64KB (recommended alignment), 64.5KB (misaligned)
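
To make the three offsets concrete, here is a minimal Python sketch of the usual argument for alignment. It rests on assumptions that are not part of the test design: that the storage works in 64KB units (stripe or logical track), that each 8KB I/O starts on an 8KB boundary within the partition, and that the function name crossing_fraction is purely illustrative. It only counts how many I/Os would straddle a 64KB boundary on disk; it says nothing about caching or any other behavior of the SAN.

```python
KB = 1024

def crossing_fraction(partition_offset, io_size=8 * KB, unit=64 * KB):
    """Fraction of I/Os that straddle a `unit` boundary on disk.

    Assumes every I/O starts at a multiple of io_size within the partition
    and that io_size divides unit, so the pattern repeats every `unit` bytes.
    """
    slots = unit // io_size
    crossings = 0
    for k in range(slots):
        # Position of this I/O relative to the start of its 64KB unit on disk.
        start = (partition_offset + k * io_size) % unit
        if start + io_size > unit:
            crossings += 1
    return crossings / slots

for offset_kb in (63.5, 64, 64.5):
    frac = crossing_fraction(int(offset_kb * KB))
    print(f"{offset_kb}KB offset: {frac:.1%} of 8KB I/Os cross a 64KB boundary")
```

Under these assumptions, one in eight 8KB I/Os crosses a boundary at either misaligned offset and none does at the 64KB offset. The model treats both misaligned offsets as equally bad, so it cannot by itself account for any measured difference between them.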

The choice to focus on 8K random reads/writes was partly arbitrary, and partly due to the difficulty of referencing multiple charts in a blog post at this site. More importantly, as will become evident, the results of the 8K random reads/writes are enough for us to see the wisdom of aligning disk partitions on the 64K boundary, so including additional test scenarios would yield diminishing returns.

The following two charts summarize the test results.

The first chart is a bit difficult to decipher. For 8K random reads, aligning the partition on the 64K boundary didn't produce the best result. This remained consistent in multiple runs of the same tests. Misalignment on the 63.5K boundary, on the other hand, was the worst performer, though it is debatable whether the performance difference among the three boundary settings is really significant.

The second chart shows the results of 8K random writes. Unlike in the case of 8K random reads, the performance difference in the second chart is unmistakable, especially between the 64K alignment and the 63.5K alignment (or misalignment). The former outperformed the latter by a whopping ~34%. Also note that the 64K alignment was consistently the best performer for 8K random writes.

Thus, based on these data points alone, it is advisable to align your disk partitions on the 64K boundary.


Published Thursday, February 01, 2007 8:04 AM by Linchi Shea
Attachment(s): diskAlign.gif


Comments

 

Jonathan B. said:

Linchi,

Thank you so much for going through this case scenario to put a data-point based in reality out there on the net.  Like you said, there is scant hard evidence out there on this topic which makes it hard to know if it is worth going through the hassle of doing.

On a related topic, I wonder if you have thoughts on the proper usage of sqlio and what a user should expect in their results.  I am personally having trouble making heads or tails of the results I am getting.  I can try different configs to get the best value for my system, but that still doesn't help me know if I am in the ballpark of where I should be with my hardware.

For example, if I try sqlio on a RAID0 array with 5 drives, I should get much better results than on a single drive, right?  But then again, that difference depends on whether I am doing sequential or random access, right?  Anyhow, I've used tools like HD Tach and h2benchw, which give me great lower-level profiles of my hardware, but those don't seem to relate too well to what I get from sqlio.

I hope my frustration sparks someone to post a nice blog on sqlio and how to use it other than just part of the routine.

Thanks Again!

FYI, here's a thread that has another person's results comparing use of track-aligned partitions and its impact on sqlio benchmarking.  It is not quite as detailed, but it provides some additional support for aligning partitions to the beginning of the track.

http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=10495

February 2, 2007 5:07 PM
 

Linchi Shea said:

Jonathan;

Good that you raised the issues of using sqlio.exe and conducting storage I/O tests in general. I happen to have a few more blog entries planned on the very subject. Interpreting sqlio.exe results is really no issue. The key is how you design your controlled disk I/O tests for the results to be meaningful. I'll leave the details for the upcoming blogs.

Linchi

February 2, 2007 8:10 PM
 

Chuck Boyce said:

Good stuff, man!

February 3, 2007 12:14 PM
 

Jonathan B. said:

While you're at it, what about NTFS cluster sizes and their impact on performance?  What about matching the RAID stripe size to the NTFS cluster size or some multiple thereof?  Do these have any predictable impact on performance from a theoretical perspective?

I've read around that changing the stripe size to a bigger value or a lesser value, depending on the usage profile, would increase performance.  However, from my low-level testing with h2benchw, I've found that DELL's PERC4 and the PowerVault 220S with Seagate's 10K.7 drives do best with a 64K stripe.  This was true even for extreme tests that should have favored other stripe sizes (both smaller and larger).  The point I'm making here is that the optimal RAID stripe size is very heavily dependent on the controller's optimal configuration and not necessarily on the usage profile.

So, any messing around with stripe size should be weighed against hard measurement of the capabilities of one's own hardware.  h2benchw makes this determination relatively easy; at least it did in my case.  However, that type of performance test only does so at the pre-partition level, with no clusters involved.  Hopefully I can make similar judgments at the OS filesystem (post-partition) level using tests like the ones you provided here with sqlio.exe.

Thanks,

Jonathan

February 5, 2007 10:12 AM
 

JasonM said:

I tested many different combinations with SQLIO, looking for the best one. A 64k NT allocation unit, a 128k offset, and a 64k cluster (HP Smart Array setting) was the best overall setting in my tests, with a 10%+ gain on sequential writes (log traffic). I was focused on finding the best setting for mixed I/O patterns, so there may be better settings for, say, a strictly backup drive. This was a few years back, so the hardware and the OS may have changed.

February 6, 2007 9:40 AM
 

Linchi Shea said:

Since we use SAN exclusively, there is not much we can do with RAID stripe size. Conceptually, NTFS allocation unit size should not have a huge impact on SQL Server performance except when additional space is being allocated to a database. Once space is allocated, SQL Server manages the space and the NTFS allocation unit should not matter. But as with most other things in storage, I have learned to keep a completely open mind. There could be factors unknown to me that change the whole equation and render my conceptual reasoning totally invalid.

Linchi

February 7, 2007 10:43 AM
 

Scott R. said:

I enjoyed reading your article and test results.  I especially appreciate your empirical evidence to substantiate what is largely hearsay evidence in other articles (“… up to 20% improvement with alignment”, “… as much as 30% improvement with alignment”, etc. – but no documented proof).  I can buy into the value of disk volume alignment, based on the descriptions of the phenomena and proposed solutions.  I have an easier time selling it to colleagues with the evidence and conditions you and others presented above.

The tests in your article use unaligned partition / volume offsets of 63.5 KB and 64.5 KB, but do not use the most common unaligned partition / volume offset: the default starting offset of 63 sectors (32,256 bytes or 31.5 KB).  Any interest in retesting to include that most common unaligned partition / volume offset size?

You didn't mention the storage frame vendor / model used in the tests, possibly to avoid the appearance of recommending a brand or model.  However, it would be useful to know the vendor-recommended alignment value (I assume the stated 64 KB alignment value is vendor recommended for the storage frame model used in the tests, and not drawn from other conclusions) and characteristics that determine the chosen alignment value (logical track size, stripe size, other storage frame caching issues, etc.), which is often mentioned in the vendor recommendation for disk volume alignment value.

I have observed that different storage vendors and storage frame models have different recommended alignment values - some at 32 KB, some at 64 KB, some higher or dependent on other factors (like configurable stripe size).  These vendor recommendations appear to be stated as minimum values ("not less than...") and most often as binary-multiple values (32 KB, 64 KB, etc.), based on the assumption that storage frame caching and logical track lengths are often managed as binary-multiple sized values.  Variation of alignment values between storage frame models can make the choice of an appropriate alignment value more challenging, and possibly result in choosing inappropriate values.  I have recently chosen to use a large-enough binary-multiple value of 1 MB (1,048,576 bytes or 2,048 sector offset), which also happens to be an aligned value for all smaller binary-multiple values such as 512 KB, 256 KB, 128 KB, 64 KB, 32 KB, etc.  This single alignment value may serve to simplify the disk volume alignment process across multiple storage frame models (possibly with different vendor recommendations).  Any comments on this approach, or interest in retesting with 1 MB alignment?

Keep up the good work!

February 21, 2007 10:27 PM
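
The arithmetic in Scott R.'s comment above can be checked with a short Python sketch. The sector size of 512 bytes and the byte counts come from the comment itself; the helper name is_aligned and the list of boundaries are illustrative choices, not anything prescribed by a vendor.

```python
KB = 1024
SECTOR = 512  # bytes per sector, as stated in the comment

def is_aligned(offset_bytes, boundary_bytes):
    """True if the offset is a whole multiple of the boundary."""
    return offset_bytes % boundary_bytes == 0

one_mb_offset = 2048 * SECTOR   # 1,048,576 bytes
default_offset = 63 * SECTOR    # 32,256 bytes (31.5 KB)

for boundary_kb in (32, 64, 128, 256, 512):
    boundary = boundary_kb * KB
    print(
        f"{boundary_kb:>3} KB boundary: "
        f"1 MB offset aligned={is_aligned(one_mb_offset, boundary)}, "
        f"63-sector offset aligned={is_aligned(default_offset, boundary)}"
    )
```

The 1 MB offset divides evenly by every listed boundary, while the default 63-sector offset divides evenly by none of them. Whether a larger-than-recommended offset performs as well on a given storage frame is a separate question that only a test can answer.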
 

Linchi Shea said:

Scott;

Thanks for the comments! I have actually tested the 63K offset and various offsets around 32K. I chose to exclude the results to limit the size of this blog entry. I wanted each blog entry to address a specific topic that would not require a lengthy and convoluted discussion.

But briefly, the 63K offset didn't perform as well as the 64K offset, so the recommendation of aligning on the 64K offset holds. The results for the various offsets around 32K were less clear cut, though in most cases the 32K offset was among the offsets that produced better results. It's just that I could not categorically conclude that the 32K offset was the way to go.

It's good news that the results of the offsets around 32K do not materially impact the main point of this blog entry, which is to present some data points in support of the 64K offset. No other offsets I have tested produced significantly better results. Well, I have not tested the 1MB offset. To be honest, an offset of that size was never considered when I was thinking about testing the performance impact of various offsets.

Yes, I deliberately chose not to mention the vendor or the SAN model on which the tests were performed. And yes, the vendor recommended alignment offset was 64K. True, it may have added value to identify them, and I wish I could. But then even if I identified the vendor and the SAN model, I could not possibly include all the configuration details in this little blog. And these configuration details often matter. So regardless, it remains that you should perform your own tests on your own system and use whatever I have reported here only as a reference.

February 22, 2007 10:21 PM
 

Ian Posner said:

My experience has been that a 64k offset with a 64k cluster size and large stripes of 256KB works best, provided you've got a controller with a large amount of battery-backed write-back cache. Also note that there is an undocumented SQL Server switch (/E) that allocates 4 x 64k extents to a single file in a filegroup before switching to another -- therefore allowing you to lay out 4 contiguous extents on a 256KB stripe.

March 16, 2007 6:03 AM
 

McTavish said:

Hi guys, I’m seriously out of my depth here and just researching why Microsoft decided on a 1MB – 2048 sector – offset for the first partition created on a hard drive by Vista. They say it’s for future large-sector hard drive support, but would it need to be 1MB just for that? http://support.microsoft.com/kb/923332

Scott R.'s comment that 1MB “happens to be an aligned value for all smaller binary-multiple values” has pushed me towards thinking that there is more to the 1MB decision than large-sector support, and perhaps it has to do with the subject being discussed here. A couple of replies in a TechNet chat session suggest it does.

“We have also changed the default alignment for partitions on disks so that they are cache aligned”

http://blogs.technet.com/filecab/pages/448274.aspx

“This change results in better performance, by aligning the volume offset with the cache line size.”

I’ve been googling for days and can’t find anything substantial from Microsoft. Only two other KB pages mention the new offset, but they are no help.

http://support.microsoft.com/kb/931854

http://support.microsoft.com/kb/931854

One Microsoft forum thread on the subject:

http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=994065&SiteID=17

The replies from Hale Landis in this forum thread are the most interesting.

http://forums.storagereview.net/index.php?showtopic=25416

Cheers

McTavish.

May 26, 2007 12:50 PM
 

Impedance Mismatch said:

Before installing SQL Server, it is a good idea to verify that the sector alignment of your storage

July 17, 2007 10:47 AM
 

Linchi Shea said:

In SQL Server related storage literature, it is almost universally recommended that disk partitions be

July 19, 2007 1:52 PM
 

Roust_m said:

Hi guys,

I could find a number of papers telling how to align a partition, but could not find a simple way of checking if the partitions are aligned on an existing server.  Can you guys help?

Thanks.

July 19, 2007 7:38 PM
 

Neil Hambly said:

It depends on the Windows OS version.

Use DiskPar or DiskPart to determine the offsets of the partitions.

For example, on a W2K3 OS I would use diskpart with the align=64 option to configure the RAID partition.

July 24, 2007 7:09 AM
 

Scott R. said:

Responding to Roust_m's question and Neil Hambly's reply regarding how to determine the offset for current partitions (to see if they are aligned or not, and to what degree - 32 KB, 64 KB, etc.):

-  Neil's reply that DiskPar or DiskPart can be used is correct with conditions.  DiskPar shows the exact offset in place for a given partition (shown in bytes, I believe).  But be aware that the answer from DiskPart is not always correct.  It appears that DiskPart shows the offset for a partition in KB, but the value is rounded up to the nearest KB value.  Since offsets are expressed in whole sectors (512 bytes or 0.5 KB), it is possible to have an actual offset value at n.5 KB but DiskPart will incorrectly show the offset value as n+1 KB.  The best example of this phenomenon is with an unaligned partition - 63 sector offset / 32,256 bytes / 31.5 KB is displayed by DiskPart as 32 KB.  This display implies alignment to 32 KB or smaller multiples, but is incorrect.

-  I find a more reliable utility to use for displaying alignment offsets for current partitions is DiskExt (free from Sysinternals - http://download.sysinternals.com/Files/DiskExt.zip).  It will show you the correct offset in bytes, with no rounding or misrepresentation.  The added benefit of DiskExt is that you can request the display for all drives (no parameter) or for a specific drive by drive letter.  The display from DiskExt is shown with the drive letter for each requested drive and its corresponding alignment offset.  I find this easier than having to find the Windows disk # for a given drive letter.

-  Also note that DiskExt works for mounted volumes as well as drive letter volumes.  DiskPar and DiskPart may also work with mounted volumes (after you manually translate the mounted volume path to a Windows disk # - as with drive letters).

Scott R.

December 19, 2007 5:49 PM
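
The rounding pitfall Scott R. describes above can be illustrated with a small Python sketch. It simply assumes, as the comment reports, that the offset is displayed in whole KB rounded up; this behavior is not verified here, and the function names displayed_kb and truly_aligned are hypothetical.

```python
import math

KB = 1024

def displayed_kb(offset_bytes):
    """Offset in KB, rounded up to a whole KB (the display behavior the comment reports)."""
    return math.ceil(offset_bytes / KB)

def truly_aligned(offset_bytes, boundary_kb):
    """Alignment check on the exact byte offset."""
    return offset_bytes % (boundary_kb * KB) == 0

offset = 63 * 512  # default 63-sector offset = 32,256 bytes = 31.5 KB

shown = displayed_kb(offset)
print(f"actual offset: {offset} bytes ({offset / KB} KB)")
print(f"displayed as:  {shown} KB")
print(f"displayed value suggests 32 KB alignment: {shown % 32 == 0}")
print(f"exact bytes are 32 KB aligned:            {truly_aligned(offset, 32)}")
```

The sketch shows why an alignment check is safer when done on the byte offset than on a rounded KB figure.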
 

Mark Sullivan said:

We run an HP EVA 8000 SAN.  The EVA has its own mechanism for dealing with the 64k offset.  The EVA best practices guide states not to use it.

February 13, 2008 1:23 PM


About Linchi Shea

Checking out SQL Server via empirical data points