
Linchi Shea

Checking out SQL Server via empirical data points

Performance impact: file fragmentation and SAN – Part IV

Lies, damned lies, and statistics!

 

If you have read my three previous posts (1, 2, 3), you may have walked away with the impression that, on a drive presented from a high-end enterprise-class disk array, Windows file fragmentation does not have a significant performance impact. And I’ve given you empirical data—oh yeah, statistics—to support that impression.

 

But that is not the whole story! No, I didn’t lie to you. The numbers I presented were solid. It’s just that the story is not yet finished.

 

In those previous posts on the performance impact of file fragmentation, I presented I/O throughput data as the evidence. The arguments were valid, especially since we did see file fragmentation degrade I/O throughput on directly attached storage. But I/O throughput is only one I/O performance metric, and it is not enough to look at throughput alone.

 

Let me start with an analogy. Suppose we have an eight-lane superhighway going from New York City to Los Angeles. As we pump (okay, drive) more cars from NYC to LA, we take measurements at a checkpoint in LA to find out how many cars cross that checkpoint every hour, i.e. we are measuring the throughput of the superhighway. Now, instead of building the eight-lane superhighway straight from NYC to LA, suppose we have it take a detour through Boston. At that same checkpoint in LA, we again measure the throughput. Everything else being equal, we should get the same throughput.

 

However, for a given car, the trip from NYC to LA would take a lot longer on this detoured highway.
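
To put rough numbers on the analogy (all figures below are invented purely for illustration), Little's law says the number of cars on the road equals throughput times trip time. So the detoured highway can sustain the identical throughput at the checkpoint as long as more cars are kept in flight; what changes is how long each individual car travels. A minimal Python sketch:

    # All numbers are made up purely to illustrate the analogy.
    throughput = 6000          # cars/hour measured at the LA checkpoint
    direct_hours = 42          # assumed NYC -> LA trip time, direct route
    detour_hours = 46          # assumed trip time via the Boston detour

    for route, hours in [("direct", direct_hours), ("via Boston", detour_hours)]:
        # Little's law: cars on the road = arrival rate x trip time, so
        # sustaining the same throughput on a longer route simply requires
        # more cars in flight; the checkpoint observer sees no change.
        print(f"{route:>10}: {throughput} cars/hr at the checkpoint, "
              f"{hours} hr per car, {throughput * hours} cars on the road")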

 

An I/O path is similar to a superhighway. While its throughput is an important measure, how long it takes for an I/O request to complete—its latency or response time—is also an important measure. The question is: will file fragmentation take your I/O traffic on a detour?

 

Indeed, empirical test data show that when a file is severely fragmented, the maximum I/O latency of large sequential reads and writes (e.g. 256KB reads and writes) can suffer significantly. The following chart shows the impact of file fragmentation on the maximum I/O latency. The data were obtained from the same tests whose throughput numbers were reported in Part III of this series.
[Chart: max_latency.gif]
Clearly, when the test file was fragmented into numerous disconnected 128KB pieces, some of the 256KB reads suffered terrible latency degradation. And if your applications happen to be issuing these I/Os, you would most likely experience a performance degradation, regardless of whether the I/O path can maintain the same I/O throughput.
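
If you want to reproduce the flavor of this experiment yourself, here is a minimal sketch in Python. To be clear, this is not the harness behind the published numbers: the file names are placeholders, the fragmentation trick is best-effort, and the operating system's file cache will mask true disk latency unless you use unbuffered I/O (FILE_FLAG_NO_BUFFERING on Windows), which the sketch omits for brevity:

    import os
    import time

    PIECE = 128 * 1024    # 128KB fragment size, the boundary case in these tests
    CHUNK = 256 * 1024    # 256KB sequential reads, as in the article

    def make_fragmented(path, filler, pieces):
        # Crudely fragment `path` by interleaving its appends with a second
        # file, so consecutive 128KB pieces land in disconnected extents.
        # Not guaranteed: modern filesystems may still allocate contiguously.
        buf = b"\0" * PIECE
        with open(path, "wb") as f, open(filler, "wb") as g:
            for _ in range(pieces):
                f.write(buf); f.flush(); os.fsync(f.fileno())
                g.write(buf); g.flush(); os.fsync(g.fileno())

    def read_latencies(path):
        # Read the file front to back in 256KB chunks, timing each read.
        latencies = []
        with open(path, "rb", buffering=0) as f:
            while True:
                start = time.perf_counter()
                data = f.read(CHUNK)
                if not data:
                    break
                latencies.append(time.perf_counter() - start)
                if len(data) < CHUNK:
                    break
        return latencies

    make_fragmented("test.dat", "filler.dat", pieces=2048)   # ~256MB test file
    lats = read_latencies("test.dat")
    print(f"reads: {len(lats)}  "
          f"avg: {sum(lats) / len(lats) * 1000:.3f} ms  "
          f"max: {max(lats) * 1000:.3f} ms")

Note that it reports both the average and the maximum: as discussed in the comments below, the two can tell very different stories.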

 

Valid statistics may seem to add force to an argument, and that is precisely what makes it so easy to mislead: if the whole story is not told, technically everything is valid and nobody is lying. By the way, this is a trick often employed by vendors, who tend to conveniently ignore the bad news, or intentionally bury it under statistics about the good news. In my book, that is lies, damned lies, and statistics.

 

Published Monday, December 22, 2008 12:34 PM by Linchi Shea

Attachment(s): max_latency.gif


Comments

 

Critic said:

There's a flaw in the analogy in the article. The way bandwidth is measured for I/O, the round trip is the metric that matters most. So in the highway example, that would be the time for a pool of cars to make a round trip from NYC to LA and back, and the detour through Boston would have a huge impact on the final number.

December 22, 2008 1:25 PM
 

Linchi Shea said:

If you are measuring the time of a round trip, you are measuring the response time (or latency). And if that's what you are measuring, the detour would have a huge impact. I don't think we are in disagreement.

December 22, 2008 2:24 PM
 

Jason Massie said:

From the DBA's perspective, it would be really hard to create a data file or log file with lots of 128K or 2MB fragments. However, the default autogrow setting for a database created through the GUI is 1MB. If you had to guess, how would that behave on an enterprise SAN?

December 22, 2008 3:39 PM
 

Linchi Shea said:

128K fragments are a bit extreme, and were chosen on purpose to help explore the boundary conditions. If I had to guess from the results of the 128K-fragment and 2MB-fragment tests, I'd say you will probably see very little degradation, if any. I have more results to post on this topic, which will hopefully make it clearer.
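
In the meantime, the usual mitigation for that 1MB default is to presize the files and set a larger fixed autogrow increment, so each growth produces fewer, bigger pieces on disk. A minimal sketch, with a hypothetical server, database, and logical file name (assumes Python with pyodbc and an ODBC driver installed):

    import pyodbc

    # Hypothetical connection string and names; adjust for your server.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "Trusted_Connection=yes;",
        autocommit=True,  # ALTER DATABASE is not allowed inside a transaction
    )
    cur = conn.cursor()

    # Presize the data file and use a larger fixed growth increment so
    # that autogrow events produce fewer, bigger extents on disk.
    # (SIZE must be at least the file's current size.)
    cur.execute("""
        ALTER DATABASE MyDb
        MODIFY FILE (NAME = N'MyDb_Data', SIZE = 10240MB, FILEGROWTH = 256MB)
    """)
    conn.close()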

December 22, 2008 4:00 PM
 

James Luetkehoelter said:

A great post, Linchi. I think that file fragmentation presents potential issues across a number of counters. Great analogy for throughput as well.

December 22, 2008 8:04 PM
 

Grnpy Old DBA said:

Most SAN vendors will tell you that file fragmentation does not matter on a SAN. It's very easy to get fragmentation: if you load a number of databases onto a drive where you eventually need over 50% of the capacity, then as you create and load the databases they will fragment, because the files will not position themselves sequentially. Add database growth and attempts to defrag the drive, and very soon you get fragmented mdf files. With lots of free space this doesn't happen, but most SAN vendors aim for 80% or better utilisation, so you rarely have that much free space. Proving there is a performance hit is another matter. I always like your posts because you seem to get to do so much that most of us probably can't - keep up the good work.

December 23, 2008 8:17 AM
 

Linchi Shea said:

Thanks, Grnpy Old DBA! I should have highlighted that the latency numbers in the chart are the maximum latency in each test. The average latency numbers were actually about the same in all three test scenarios.

The other thing is that a drive presented from a SAN (disk array) is not a drive in terms of traditional disk geometry. I don't believe its usage (i.e. how full it is) has as much of an impact on performance as it would on a traditional disk drive.

December 23, 2008 11:37 AM
 

Bernd Eckenfels said:

You were measuring in-cache transactions in the earlier tests, so why does fragmentation still affect you so badly?

BTW: the bigger problem with your tests is that you are a single application connected to the storage array, and your data fits in cache. If you add multiple other applications, you only get a fraction of the cache and also increase the time you have to wait for a seek (as opposed to read-aheads).

Regards,

Bernd

December 23, 2008 1:45 PM

