THE SQL Server Blog Spot on the Web

Linchi Shea

Checking out SQL Server via empirical data points

Performance Impact of Disk Misalignment

Just google for Windows disk alignment best practices, and you will find thousands of articles, whitepapers, and posts, all preaching the practice of aligning disk partitions on the 64K boundary. For instance, one of the EMC recommendations prescribes a disk alignment value of 64K for the host file systems when deploying SQL Server 2005. Microsoft states that "misalignment can defeat system optimization of I/O operations designed to avoid crossing track boundaries."

But I must admit that I wasn't sure how much of it was based on solid, current, first-hand data and how much of it was due to the sheer number of times it had been recommended, giving it a life of its own. Perhaps, since everybody else was recommending it, it had to be true: one of those pieces of best-practice folklore, urban legends, or myths. My attempts to scour the web for related empirical evidence always came up empty-handed. I'd be happy if it turned out that my failure to find empirical evidence was simply the result of my inadequate Google search skills. Regardless, I'd like to see some real data points.

That was the motivation, and here's the test design.

Disk I/O Test Tool: sqlio.exe (from Microsoft)
Performance Measure: I/Os per second
Tool to Set Partition Boundary: diskpar.exe (from Microsoft)
Test File Size: 80GB
Test Drive: The test file was on a 100GB LUN presented from a SAN
I/O Block Size: 8KB
I/O Types: Random Reads and Random Writes
I/O Queue Depths: 4
Threads to Issue I/Os: 1, 2, 4, 8, 16, 20, 24, 28, 32, 36
Disk Partition Offsets: 63.5KB (misaligned), 64KB (recommended alignment), 64.5KB (misaligned)
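
To make the size of the test design concrete, here is a minimal Python sketch that enumerates every combination of partition offset, I/O type, and thread count listed above. The function name build_matrix and the dictionary keys are illustrative only; they are not part of sqlio.exe or of the original test scripts, which are not shown in this post.

```python
from itertools import product

# Values taken from the test design above.
OFFSETS_KB = [63.5, 64.0, 64.5]
IO_TYPES = ["random read", "random write"]
THREADS = [1, 2, 4, 8, 16, 20, 24, 28, 32, 36]
BLOCK_KB = 8
QUEUE_DEPTH = 4


def build_matrix():
    """Return one dict per test run: every offset x I/O type x thread count."""
    return [
        {
            "offset_kb": offset,
            "io_type": io_type,
            "threads": threads,
            "block_kb": BLOCK_KB,
            "queue_depth": QUEUE_DEPTH,
        }
        for offset, io_type, threads in product(OFFSETS_KB, IO_TYPES, THREADS)
    ]


runs = build_matrix()
print(len(runs), "runs in total")
for run in runs[:3]:
    print(run)
```

Three offsets, two I/O types, and ten thread counts give 60 runs, each of which would be one point on the charts.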

The choice to focus on 8K random reads/writes was partly arbitrary, and partly due to the difficulty of referencing multiple charts in a blog post at this site. More importantly, as will become evident, the results of the 8K random reads/writes are enough to show the wisdom of aligning disk partitions on the 64K boundary, so including additional test scenarios would have yielded diminishing returns.

The following two charts summarize the test results.

The first chart is a bit difficult to decipher. For 8K random reads, aligning the partition on the 64K boundary didn't produce the best result. This remained consistent in multiple runs of the same tests. Misalignment on the 63.5K boundary, on the other hand, was the worst performer, though it is debatable whether the performance difference among the three boundary settings is really significant.

The second chart shows the results of 8K random writes. Unlike in the case of 8K random reads, the performance difference in the second chart is unmistakable, especially between the 64K alignment and the 63.5K alignment (or misalignment). The former outperformed the latter by a whopping ~34%. Also note that the 64K alignment was consistently the best performer for 8K random writes.
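
One plausible mechanism behind a gap like this, in line with the Microsoft statement quoted earlier about crossing track boundaries, is that a misaligned partition makes some I/Os straddle two stripe units. The Python sketch below is illustrative only. It assumes a 64 KB stripe unit on the storage side and 8 KB I/Os that start at multiples of 8 KB from the start of the partition; neither assumption is confirmed for the SAN used in these tests, and crossing_fraction is a hypothetical helper.

```python
KB = 1024


def crossing_fraction(partition_offset, io_size=8 * KB, stripe_unit=64 * KB):
    """Fraction of I/Os that straddle a stripe-unit boundary.

    Assumes I/Os start at multiples of io_size from the partition start,
    stripe units start at multiples of stripe_unit from the disk start,
    and stripe_unit is a multiple of io_size (all hypothetical here).
    """
    slots = stripe_unit // io_size  # the pattern repeats every stripe unit
    crossings = 0
    for k in range(slots):
        start = (partition_offset + k * io_size) % stripe_unit
        if start + io_size > stripe_unit:
            crossings += 1
    return crossings / slots


for offset_kb in (63.5, 64.0, 64.5):
    fraction = crossing_fraction(int(offset_kb * KB))
    print(f"{offset_kb} KB offset: {fraction:.3f} of 8 KB I/Os cross a boundary")
```

Under those assumptions, one in eight 8 KB I/Os straddles a boundary at either misaligned offset, and none do at the 64 KB offset. This does not by itself account for the size of the gap measured above.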

Thus, based on these data points alone, it is advisable to align your disk partitions on the 64K boundary.

Published Thursday, February 01, 2007 8:04 AM by Linchi Shea

Attachment(s): diskAlign.gif

Comments

 

Jonathan B. said:

Linchi,

Thank you so much for going through this case scenario to put a data-point based in reality out there on the net.  Like you said, there is scant hard evidence out there on this topic which makes it hard to know if it is worth going through the hassle of doing.

On a related topic, I wonder if you have thoughts on the proper usage of sqlio and what a user should expect in their results.  I am personally having trouble making heads or tails of the results I am getting.  I can try different configs to get the best value for my system, but that still doesn't help me know if I am in the ballpark of where I should be with my hardware.

For example, if I try sqlio on a RAID0 array with 5 drives, I should get much better results than on a single drive, right?  But then again, that difference depends on whether I am doing sequential or random access, right?  Anyhow, I've used tools like HD Tach and h2benchw, which give me great lower-level profiles of my hardware, but those don't seem to relate too well to what I get from sqlio.

I hope my frustration sparks someone to post a nice blog on sqlio and how to use it other than just part of the routine.

Thanks Again!

FYI, here's a thread that has another person's results comparing use of track-aligned partitions and its impact on sqlio benchmarking.  It is not quite as detailed, but it provides some additional support for aligning partitions to the beginning of the track.

http://www.sql-server-performance.com/forum/topic.asp?TOPIC_ID=10495

February 2, 2007 5:07 PM
 

Linchi Shea said:

Jonathan;

Good that you raised the issues of using sqlio.exe and conducting storage I/O tests in general. I happen to have a few more blog entries planned on the very subject. Interpreting sqlio.exe results is really no issue. The key is how you design your controlled disk I/O tests for the results to be meaningful. I'll leave the details for the upcoming blogs.

Linchi

February 2, 2007 8:10 PM
 

Chuck Boyce said:

Good stuff, man!

February 3, 2007 12:14 PM
 

Jonathan B. said:

While you're at it, what about NTFS cluster sizes and their impact on performance?  What about matching the RAID stripe size to the NTFS cluster size or some multiple thereof?  Do these have any predictable impact on performance from a theoretical perspective?

I've read around that changing the stripe size to a bigger value or a lesser value, depending on the usage profile, would increase performance.  However, from my low-level testing with h2benchw, I've found that DELL's PERC4 and the PowerVault 220S with Seagate's 10K.7 drives do best with a 64K stripe.  This was true even for extreme tests that should have favored other stripe sizes (both smaller and larger).  The point I'm making here is that the optimal RAID stripe size is very heavily dependent on the controller's optimal configuration and not necessarily on the usage profile.

So, any messing around with stripe size should be weighed against hard measurements of the capabilities of one's own hardware.  h2benchw makes this determination relatively easy; at least it did in my case.  However, that type of performance test only does so at a pre-partition level with no clusters involved.  Hopefully I can make similar judgments at the OS filesystem (post-partition) level using tests like the one you provided here using sqlio.exe.

Thanks,

Jonathan

February 5, 2007 10:12 AM
 

JasonM said:

I tested many different combinations with SQLIO looking for the best combination. A 64k NT allocation unit, 128k offset, and 64k cluster (HP Smart Array setting) was the best overall setting in my tests, with a 10%+ gain on sequential writes (log traffic). I was focused on finding the best setting for mixed I/O patterns, so there may be better settings for, say, a strictly backup drive. This was a few years back, so hardware and the OS may have changed.

February 6, 2007 9:40 AM
 

Linchi Shea said:

Since we use SAN exclusively, there is not much we can do with RAID stripe size. Conceptually, NTFS allocation unit size should not have a huge impact on SQL Server performance except when additional space is being allocated to a database. Once space is allocated, SQL Server manages the space and NTFS allocation unit should not matter. But as with most other things in storage, I have learned to keep a completely open mind. There could be factors unknown to me that change the whole equation and render my conceptual reasoning totally invalid.

Linchi

February 7, 2007 10:43 AM
 

Scott R. said:

I enjoyed reading your article and test results.  I especially appreciate your empirical evidence to substantiate what is largely hearsay evidence in other articles (“… up to 20% improvement with alignment”, “… as much as 30% improvement with alignment”, etc. – but no documented proof).  I can buy into the value of disk volume alignment, based on the descriptions of the phenomena and proposed solutions.  I have an easier time selling it to colleagues with the evidence and conditions you and others presented above.

The tests in your article use unaligned partition / volume offsets of 63.5 KB and 64.5 KB, but do not use the most common unaligned partition / volume offset: the default starting offset of 63 sectors (32,256 bytes or 31.5 KB).  Any interest in retesting to include that most common unaligned partition / volume offset size?

You didn't mention the storage frame vendor / model used in the tests, possibly to avoid the appearance of recommending a brand or model.  However, it would be useful to know the vendor-recommended alignment value (I assume the stated 64 KB alignment value is vendor recommended for the storage frame model used in the tests, and not drawn from other conclusions) and characteristics that determine the chosen alignment value (logical track size, stripe size, other storage frame caching issues, etc.), which is often mentioned in the vendor recommendation for disk volume alignment value.

I have observed that different storage vendors and storage frame models have different recommended alignment values - some at 32 KB, some at 64 KB, some higher or dependent on other factors (like configurable stripe size).  These vendor recommendations appear to be stated as minimum values ("not less than...") and most often as binary-multiple values (32 KB, 64 KB, etc.), based on the assumption that storage frame caching and logical track lengths are often managed as binary-multiple sized values.  Variation of alignment values between storage frame models can make the choice of an appropriate alignment value more challenging, and possibly result in choosing inappropriate values.  I have recently chosen to use a large-enough binary-multiple value of 1 MB (1,048,576 bytes or 2,048 sector offset), which also happens to be an aligned value for all smaller binary-multiple values such as 512 KB, 256 KB, 128 KB, 64 KB, 32 KB, etc.  This single alignment value may serve to simplify the disk volume alignment process across multiple storage frame models (possibly with different vendor recommendations).  Any comments on this approach, or interest in retesting with 1 MB alignment?
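
The sector arithmetic in this comment is easy to verify. The following Python sketch assumes 512-byte sectors, as stated elsewhere in this thread, and uses a hypothetical helper, aligned_boundaries, to list which binary-multiple boundaries evenly divide a given offset.

```python
SECTOR_BYTES = 512  # sector size assumed throughout this discussion


def offset_bytes(sectors):
    """Convert a starting offset in sectors to bytes."""
    return sectors * SECTOR_BYTES


def aligned_boundaries(offset, candidates_kb=(32, 64, 128, 256, 512, 1024)):
    """Return the candidate boundaries (in KB) that evenly divide the offset."""
    return [kb for kb in candidates_kb if offset % (kb * 1024) == 0]


default_offset = offset_bytes(63)    # the common unaligned default
one_mb_offset = offset_bytes(2048)   # the 1 MB offset proposed above

print(default_offset, default_offset / 1024, aligned_boundaries(default_offset))
print(one_mb_offset, one_mb_offset / 1024, aligned_boundaries(one_mb_offset))
```

The 63-sector offset works out to 32,256 bytes (31.5 KB) and matches none of the candidate boundaries, while the 2,048-sector offset is 1,048,576 bytes and matches all of them.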

Keep up the good work!

February 21, 2007 10:27 PM
 

Linchi Shea said:

Scott;

Thanks for the comments! I have actually tested the 63K offset and various offsets around 32K. I chose to exclude the results to limit the size of this blog entry. I wanted each blog entry to address a specific topic that would not require a lengthy and convoluted discussion.

But briefly, the 63K offset didn't perform as well as the 64K offset, so the recommendation of aligning on the 64K offset holds. The results for the various offsets around 32K were less clear cut, though in most cases the 32K offset was among the offsets that produced better results. It's just that I could not categorically conclude that the 32K offset was the way to go.

It's good news that the results of the offsets around 32K do not materially impact the main point of this blog entry, which is to present some data points in support of the 64K offset. No other offsets I have tested produced significantly better results. Well, I have not tested the 1MB offset. To be honest, an offset of that size was never considered when I was thinking about testing the performance impact of various offsets.

Yes, I deliberately chose not to mention the vendor or the SAN model on which the tests were performed. And yes, the vendor recommended alignment offset was 64K. True, it may have added value to identify them, and I wish I could. But then even if I identified the vendor and the SAN model, I could not possibly include all the configuration details in this little blog. And these configuration details often matter. So regardless, it remains that you should perform your own tests on your own system and use whatever I have reported here only as a reference.

February 22, 2007 10:21 PM
 

Ian Posner said:

My experience has been that a 64k offset with a 64k cluster size and large stripes of 256KB works best, provided you've got a controller with a large amount of battery-backed write-back cache. Also note that there is an undocumented SQL Server switch (/E) that allocates 4 x 64k extents to a single file in a filegroup before switching to another, therefore allowing you to lay out 4 contiguous extents on a 256KB stripe.

March 16, 2007 6:03 AM
 

McTavish said:

Hi guys, I’m seriously out of my depth here and just researching why Microsoft decided on a 1MB – 2048 sector – offset for the first partition created on a hard drive by Vista. They say it’s for future large-sector hard drive support, but would it need to be 1MB just for that? http://support.microsoft.com/kb/923332

Scott R.'s comment that 1MB “happens to be an aligned value for all smaller binary-multiple values” has pushed me towards thinking that there is more to the 1MB decision than large-sector support and perhaps it has to do with the subject being discussed here. A couple of replies in a TechNet chat session suggest it does.

“We have also changed the default alignment for partitions on disks so that they are cache aligned”

http://blogs.technet.com/filecab/pages/448274.aspx

“This change results in better performance, by aligning the volume offset with the cache line size.”

I’ve been googling for days and can’t find anything substantial from Microsoft. Only two other KB pages mention the new offset, but are no help.

http://support.microsoft.com/kb/931854

http://support.microsoft.com/kb/931854

One Microsoft forum thread on the subject:

http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=994065&SiteID=17

The replies from Hale Landis in this forum thread are the most interesting.

http://forums.storagereview.net/index.php?showtopic=25416

Cheers

McTavish.

May 26, 2007 12:50 PM
 

Linchi Shea said:

In SQL Server related storage literature, it is almost universally recommended that disk partitions be

July 19, 2007 1:52 PM
 

Roust_m said:

Hi guys,

I could find a number of papers telling how to align a partition, but could not find a simple way of checking if the partitions are aligned on an existing server.  Can you guys help?

Thanks.

July 19, 2007 7:38 PM
 

Neil Hambly said:

It depends on the Windows OS version.

Use DiskPar or DiskPart to determine the offsets of the partitions.

For example, on a W2K3 OS I would use DiskPart with the align=64 option to configure the RAID partition.

July 24, 2007 7:09 AM
 

Scott R. said:

Responding to Roust_m's question and Neil Hambly's reply regarding how to determine the offset for current partitions (to see if they are aligned or not, and to what degree - 32 KB, 64 KB, etc.):

-  Neil's reply that DiskPar or DiskPart can be used is correct, with conditions.  DiskPar shows the exact offset in place for a given partition (shown in bytes, I believe).  But be aware that the answer from DiskPart is not always correct.  It appears that DiskPart shows the offset for a partition in KB, but the value is rounded up to the nearest KB value.  Since offsets are expressed in whole sectors (512 bytes or 0.5 KB), it is possible to have an actual offset value of n.5 KB that DiskPart will incorrectly show as n+1 KB.  The best example of this phenomenon is the common unaligned partition: a 63-sector offset (32,256 bytes or 31.5 KB) is displayed by DiskPart as 32 KB.  This display implies alignment to 32 KB or smaller multiples, but is incorrect.

-  I find that a more reliable utility for displaying alignment offsets for current partitions is DiskExt (free from Sysinternals - http://download.sysinternals.com/Files/DiskExt.zip).  It will show you the correct offset in bytes, with no rounding or misrepresentation.  The added benefit of DiskExt is that you can request the display for all drives (no parameter) or for a specific drive by drive letter.  The display from DiskExt shows the drive letter for each requested drive and its corresponding alignment offset.  I find this easier than having to find the Windows disk # for a given drive letter.

-  Also note that DiskExt works for mounted volumes as well as drive letter volumes.  DiskPar and DiskPart may also work with mounted volumes (after you manually translate the mounted volume path to a Windows disk # - as with drive letters).
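
The rounding problem described in the first point above can be sketched in a few lines of Python. The function displayed_kb only mimics the round-up-to-whole-KB behavior as described in this comment; it is not DiskPart's actual code, and is_aligned is a hypothetical helper.

```python
import math


def displayed_kb(offset):
    """Mimic the display described above: bytes shown as whole KB, rounded up."""
    return math.ceil(offset / 1024)


def is_aligned(offset, boundary_kb):
    """True if the byte offset is an exact multiple of the boundary."""
    return offset % (boundary_kb * 1024) == 0


offset = 63 * 512  # 63 sectors of 512 bytes

print("actual bytes:", offset)
print("displayed KB:", displayed_kb(offset))
print("aligned to 32 KB (actual bytes):", is_aligned(offset, 32))
print("aligned to 32 KB (displayed KB):", is_aligned(displayed_kb(offset) * 1024, 32))
```

Checking the exact byte offset reports the partition as unaligned, while checking the rounded figure wrongly reports it as aligned, which is why a byte-accurate tool is preferable.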

Scott R.

December 19, 2007 5:49 PM
 

Mark Sullivan said:

We run an HP EVA 8000 SAN.  The EVA has its own mechanism for dealing with the 64k offset.  The EVA best practices guide states not to use it.

February 13, 2008 1:23 PM
 

Sql Servers said:

I was also going to mention about NTFS cluster sizes?

September 27, 2008 12:07 PM
 

Kevin Kline said:

Kevin talks about the little known but very important issue of disk alignment partitioning.

October 8, 2008 4:01 PM
 

aspiringgeek said:

Nice job, Linchi!  I've referred a lot of my customers to this post. Some of our customers in *real-life* testing are seeing 30% - 40% improvements.  Until Windows Server 2003 is deprecated & all existing partitions are re-built, partition alignment will remain a relevant technology.

To see a comprehensive characterization of disk partition alignment--including explicit implementation details--see the following:

http://blogs.msdn.com/jimmymay/archive/2008/10/14/disk-partition-alignment-for-sql-server-slide-deck.aspx

Keep up the great work, Linchi.

October 20, 2008 1:41 PM
 

aspiringgeek said:

@Roust_m:  I'm sympathetic. When I first dived into the rabbit hole that is the black art of disk optimization & discovered disk partition alignment, not only could I not find what you seek, I even had trouble finding reliable implementation details.  See my deck for the answers you seek (the URL is in the previous comment).  Beware--it's a different ballgame for basic disks (well-documented) vs. dynamic disks (the truth is out there, yet elusive).  I'll be posting more about dynamic disks.  Stay tuned.

@Mark Sullivan: I am concerned about HP's claims.  I don't yet have definitive information.  Here's what I do know:  HP states that starting with the 5.xxx versions of StorageWorks Enterprise Virtual Array (EVA) XCS Controller Software, the need for partition alignment is eliminated, adding that explicit alignment “neither enhances nor detracts from EVA sequential performance”.  Ref: HP StorageWorks 4x00/6x00/8x00 Enterprise Virtual Array Configuration Best Practices white paper http://h71028.www7.hp.com/ERC/downloads/4AA0-2787ENW.pdf, p. 27, August 2007.

HP’s claims are intriguing, but corroborating data is lacking or in dispute.  The statement explicitly cites sequential IOPs, but fails to address random IOPs--optimal performance of which is important for OLTP databases.  I have a quote from an internal resource which I'm not authorized to share which lends credence to my claim.  I'll let you know if I get an opportunity to do tests.

@McTavish:  The reason 1MB was chosen as the starting partition offset for Windows Server 2008 is that there are two important correlations which must be satisfied for optimal performance:

 Partition_Offset ÷ Stripe_Unit_Size

 Stripe_Unit_Size ÷ File_Allocation_Unit_Size

Of the two, the first is by far the most important.  Windows can't reliably determine stripe unit size, but 1MB aligns with all current & putative future stripe unit sizes, i.e., 64KB, 128KB, 256KB, 512KB, 1024KB.  For this reason, the 1MB starting partition offset default for Windows Server 2008 strikes me as an excellent choice.
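
The two correlations above can be checked mechanically. The Python sketch below assumes that a correlation is "satisfied" when the division yields a whole number; that reading is an assumption, since the comment lists the ratios without stating the criterion. The function check_alignment, the 64 KB allocation unit in the loop, and the 4 KB allocation unit in the last line are all illustrative.

```python
KB = 1024


def check_alignment(partition_offset, stripe_unit, allocation_unit):
    """Report whether each of the two ratios comes out to a whole number."""
    return {
        "offset/stripe_unit": partition_offset % stripe_unit == 0,
        "stripe_unit/allocation_unit": stripe_unit % allocation_unit == 0,
    }


# A 1 MB offset against each stripe unit size named above.
for stripe_kb in (64, 128, 256, 512, 1024):
    print(stripe_kb, "KB stripe unit:",
          check_alignment(1024 * KB, stripe_kb * KB, 64 * KB))

# The 63-sector default offset against a 64 KB stripe unit.
print("63-sector offset:", check_alignment(63 * 512, 64 * KB, 4 * KB))
```

With a 1 MB offset, both checks pass for every stripe unit size listed, whereas the 63-sector default fails the first and more important one.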

October 20, 2008 3:45 PM
 

Linchi Shea said:

There were discussions on disk misalignment on this site. See my previous post on “ Performance Impact

November 24, 2008 7:01 PM
 

Smitha R said:

Hi Linchi,

Thanks for your tests - they provide direction to less experienced folks.  I'm testing IO performance for a new server whose data and log files will be placed on a SAN.  I ran SQLIO tests after the SAN folks set it up and then again after performing disk alignment partitioning (recreated the partitions using a 64K offset and also formatted the partitions using 64K allocation units).  This time around, I find that sequential reads and writes improved (8K, 64K and 128 K).  Random 8K writes don't show much difference.  Random 8K reads also improved.  But 64K random reads now show decreased performance.  Any idea why?  Thanks!

December 5, 2008 10:51 AM
 

Linchi Shea said:

Smitha R;

In the case of 64K random reads, what's the decrease in performance?

December 22, 2008 1:17 PM
 

xyvyx said:

I was testing alignment on an existing 4-disk RAID10 volume and ran into some odd performance numbers when expanding the array to use 6 disks.  This was on an EMC CX3 SAN using 146GB drives. The first time around, using the 64k offset worked fine.  When we rebuilt the array using 6 disks instead, running the identical SQLIO tests showed performance worse than or equivalent to the original unaligned tests.  When I changed this to use MS's recommended 1024k offset, the numbers came back into the expected range (better than my prior 4-disk aligned tests).

October 5, 2009 10:40 AM
 

Balaji said:

Hi Linchi,

I am looking for usage based costing/reporting tool on SQL Server?

Can you please suggest any tools/articles/whitepapers?

Thanks a lot,

BalajiPalekar@yahoo.com

December 27, 2009 11:24 PM
 

NM said:

I have a SQL cluster where all volumes are accessed via mount points. The volumes are correctly offset; however, the mount points are not. Is that a problem?

January 11, 2010 6:06 AM
 

William said:

Great blog Linchi. @Ian Posner So you're saying that even if you have a stripe size of 256KB, and a cluster size of 64KB, it will not waste disk space?

Any thoughts on this Linchi?

July 2, 2011 2:58 PM
