THE SQL Server Blog Spot on the Web


Linchi Shea

Checking out SQL Server via empirical data points

Performance Impact: file fragmentation and SAN – Part III

256KB Sequential Reads

 

In my two previous posts (1, 2), I highlighted the fact that while file fragmentation had a huge adverse performance impact on directly attached storage (DAS), it did not have much, if any, impact on the drive presented from a high end enterprise class disk array. That observation was derived from running disk I/O tests with 1KB sequential writes.

 

What about other disk I/O workloads? In this post, let's look at the test results from running 256KB sequential reads on the same DAS and SAN drives.

 

To see the behavior under even more extreme conditions, I also ran the tests with the test file fragmented into more than 60,000 fragments. Each fragment was 128KB in size. So I checked three test scenarios:

  1. The 10GB file was created on a freshly formatted empty drive, thus without any fragmentation at all.
  2. The freshly formatted drive was first filled to capacity with 2MB files, and then some of these 2MB files were randomly deleted to make sufficient room for the 10GB test file. In this case, the 10GB test file was fragmented into more than 3,000 non-contiguous fragments.
  3. The freshly formatted drive was first filled to capacity with 128KB files, and then some of these 128KB files were randomly deleted to make sufficient room for the 10GB test file. In this case, the 10GB test file was fragmented into more than 60,000 non-contiguous fragments.
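
The fill-then-delete preparation used in scenarios 2 and 3 can be sketched as follows. This is only an illustration in Python, not the tooling used for these tests: the function names (fill_and_punch, max_fragments), the file naming, and the fixed random seed are all hypothetical. Where the file system actually places the test file afterwards is its own decision, so the fragment count this produces is not guaranteed.

```python
import os
import random

KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def max_fragments(test_file_size, filler_size):
    """Number of filler-sized holes needed to hold the test file.

    This is the fragment count if every freed hole is isolated and is
    filled by exactly one piece of the test file. Adjacent freed holes
    merge into larger free runs, so the real count is usually lower.
    """
    return -(-test_file_size // filler_size)  # ceiling division


def fill_and_punch(directory, filler_size, filler_count, bytes_to_free, seed=0):
    """Write filler_count files of filler_size bytes, then delete randomly
    chosen ones until at least bytes_to_free bytes have been released.

    Returns the number of filler files deleted.
    """
    os.makedirs(directory, exist_ok=True)
    block = b"\0" * filler_size
    paths = []
    for i in range(filler_count):
        path = os.path.join(directory, f"filler_{i:07d}.dat")
        with open(path, "wb") as f:
            f.write(block)
        paths.append(path)

    to_delete = max_fragments(bytes_to_free, filler_size)
    if to_delete > len(paths):
        raise ValueError("not enough filler files to free the requested space")

    # Shuffle so the freed holes are scattered across the drive.
    random.Random(seed).shuffle(paths)
    for path in paths[:to_delete]:
        os.remove(path)
    return to_delete


# Hole counts for a 10GB test file under the two filler sizes.
print(max_fragments(10 * GB, 2 * MB))    # 5120
print(max_fragments(10 * GB, 128 * KB))  # 81920
```

Because randomly deleted files are sometimes neighbors, the observed counts (more than 3,000 and more than 60,000) sitting below these ceilings of 5,120 and 81,920 is what one would expect.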

 

The following chart shows the results of many repeated tests, applying 256KB sequential reads at various load levels.

Again, as with the 1KB sequential write tests, severe file fragmentation clearly had no impact on the disk I/O performance of 256KB sequential reads on this drive presented from a high-end enterprise-class Fibre Channel disk array.
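
For readers who want to try a comparable measurement, here is a minimal Python sketch of a 256KB sequential read pass over a file. It is not the tool used to produce the results above, and the names (sequential_read, throughput_mb_per_sec) are made up for illustration. It issues one read at a time and goes through the operating system's file cache, whereas a dedicated disk I/O benchmark would normally bypass the OS cache and drive several load levels, so treat its numbers as rough.

```python
import time

BLOCK_SIZE = 256 * 1024  # 256KB per read request


def sequential_read(path, block_size=BLOCK_SIZE):
    """Read the file front to back in block_size requests.

    Returns (bytes_read, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffering only; the operating
    # system's file cache is still in play.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed


def throughput_mb_per_sec(bytes_read, elapsed_seconds):
    """Convert a (bytes, seconds) measurement to MB/s."""
    if elapsed_seconds <= 0:
        return float("inf")
    return bytes_read / (1024 * 1024) / elapsed_seconds
```

Running sequential_read against the same test file in the unfragmented and fragmented scenarios, several times each, gives a crude version of the comparison in the chart.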

 

Note that the disk array had a cache that was much larger than 10GB. You may say that’s cheating and not a fair comparison with directly attached storage. Well, maybe, but that’s how high-end disk arrays work.

Published Wednesday, December 10, 2008 12:55 PM by Linchi Shea

Attachment(s): 256KBReads.gif


Comments

 

Brent Ozar said:

You're cheating!  :-)

About the cache - was this test the only thing running on the SAN at the time?  I wouldn't want to use that as a good test result if it was, since it would have all of the cache and throughput just dedicated to that one test.  Would it be fair to put the SAN under load, like it would be under typical production conditions, or reduce the amount of cache available?

It's like saying one person has all the space they need if they take a commuter train.  Well, sure they do, until the other people get on board....

December 10, 2008 12:29 PM
 

Linchi Shea said:

Brent;

These tests were repeated numerous times at different times in the hope of randomizing out the uneven load on the disk array. That is really as much as I could do. At one time, I had a chance to get the SAN engineer to restrict the cache size, but I didn't have a chance to repeat this particular test.

But it's a fact that these disk arrays have a lot of cache. If it helps, it helps.

A more interesting question is, on a lower end disk array with much less cache, would I see similar behavior?

December 10, 2008 12:48 PM
 

Rudyx said:

Interesting ... however, as it relates to SQL Server, a more realistic result could possibly be achieved by still using the 128/256 KB chunks of data initially written, then having reads of 64 KB with writes of 8 KB occurring, and performing this test on an unfragmented 10 GB file and then on a fragmented 10 GB file. After all, SQL Server reads extents (64 KB) and writes pages (8 KB). The parameters explained pertain more directly to SQL Server, and the before and after would give you the apples-to-apples comparison to truly determine accurate results.

December 13, 2008 11:05 PM
 

Erik Nielsen said:

I am not very surprised, but this is the first time I have seen it documented. Please write something about CPU use on the server and the Windows space used for administering a file with 60.000 extents.

December 14, 2008 7:05 AM
 

Greg Wilkerson said:

Interesting post.  This is pretty misleading, though.  How can you say that SAN file fragmentation has no effect on performance when, by your own admission, the drives never get touched (the data is residing in cache)? True, SANs have huge stores of cache. The rubber hits the road when the cache HAS to be flushed to make room for more data.  Only then will you experience the impact of file fragmentation.  The cache sizes present in SANs make this type of comparison very difficult to conduct.  If you were to restrict the SAN cache to the same size as the direct attached cache, you could make a proper comparison.  Your article is about file fragmentation and performance, not "How File Fragmentation can be Mitigated by Huge Controller Caches"

December 14, 2008 10:05 AM
 

Linchi Shea said:

Greg;

> not "How File Fragmentation can be Mitigated by Huge Controller Caches"

There is no controller cache. The cache is inside the disk array. Note that some disk arrays have more cache than others. They are not all equal. I'm not trying to paint anything with a broad stroke. As mentioned, the results here are only pertinent to the system under test. You can get in trouble trying to extrapolate too much. On the other hand, restricting disk array cache artificially may not be a good test either, because it is what it is, and the cache is always there.

It's a valid argument regarding the scenario where the disk array is loaded from different hosts and the cache is constrained.

Also note that cache is only one of the reasons we may see much less impact of file fragmentation.

In the end, the story is more complex than this post may suggest. I'll address that in a future post.

December 14, 2008 11:43 AM
 

Linchi Shea said:

Lies, damned lies, and statistics! If you have read my three previous posts ( 1 , 2 , 3 ), you may walk

December 22, 2008 11:40 AM
