Whether you are load testing SQL Server or evaluating the I/O performance of a storage subsystem, you need to make sure that you cover the entire spectrum of load levels--from light and moderate all the way to heavy and saturated--before you draw any conclusions. This is especially important when you are doing performance comparisons. Relying on a few isolated data points can easily lead you to erroneous conclusions.
How do you know your test has covered the entire spectrum of the load levels?
When you initially design your test, you won't know for sure, and you'll have to experiment a bit first. If, for instance, you are testing a storage subsystem, you can typically control the load level by controlling the number of threads that issue the I/O requests, or by controlling the number of outstanding async I/Os (i.e. the I/O queue depth). Sometimes, you may have to use a combination of both. For instance, sqlio.exe doesn't allow you to specify more than 256 threads. If you fix the queue depth at 1, the maximum of 256 threads may not be able to saturate a particular I/O system. In this case, you need to use a larger queue depth.
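To make that combination concrete, here is a minimal Python sketch that enumerates thread-count/queue-depth pairs for a sweep and builds the corresponding command line. The specific sqlio.exe switches (`-kW`, `-t`, `-o`, `-s`, `-b`) and the test file name are illustrative assumptions on my part; verify them against your copy of the sqlio documentation before running anything.

```python
from itertools import product

def build_sweep(thread_counts=(1, 2, 4, 8, 16, 32, 64, 128, 256),
                queue_depths=(1, 2, 4, 8)):
    """Pair thread counts with queue depths so the combined sweep can
    drive more outstanding I/O than 256 threads at queue depth 1."""
    return list(product(thread_counts, queue_depths))

def sqlio_command(threads, queue_depth):
    # Illustrative sqlio.exe invocation (8K writes for 30 seconds);
    # the switches and file name are assumptions, not verified syntax.
    return f"sqlio.exe -kW -t{threads} -o{queue_depth} -s30 -b8 testfile.dat"
```

Each (threads, queue depth) pair is one load level; running all of them gives you data points across the whole spectrum rather than a few isolated ones.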
Let's say you are studying the I/O performance behavior of a storage subsystem. If you have no idea about its I/O performance, what levels of load should you start your test with? Well, there is no single answer that always works, but a reasonably good start is to probe the storage subsystem with 1 through 128 threads, each thread maintaining a small queue depth (say 2). Once you have completed the I/O test with this range of load levels, plot the results on a chart where the X-axis represents the load level (e.g. the number of threads) and the Y-axis represents the I/O performance (e.g. I/Os per second or MBs per second).
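That probing loop can be sketched as follows. Doubling the thread count at each step is my own shortcut to keep the number of runs manageable, and `measure` is a hypothetical callback standing in for whatever actually runs one test and reports throughput in MB/s.

```python
def probe_storage(measure, max_threads=128, queue_depth=2):
    """Sweep the load level from 1 thread up to max_threads at a fixed,
    small queue depth; return the X (threads) and Y (MB/s) series for
    the throughput-vs-load chart."""
    levels = []
    t = 1
    while t < max_threads:
        levels.append(t)
        t *= 2  # doubling is a convenience choice, not a requirement
    levels.append(max_threads)
    return levels, [measure(t, queue_depth) for t in levels]
```

The two returned lists map directly onto the chart described above: load level on the X-axis, throughput on the Y-axis.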
Ideally, such a chart should exhibit the following:
- As you start to add load, the I/O performance increases sharply.
- Then, when the load level reaches a certain point, the rate of the performance gain starts to drop though the performance itself still increases.
- And finally, when the load level is pushed further, the performance levels off; adding more load only increases response time (i.e. I/O latency) with no performance improvement, or even degrades performance.
The area where the rate of the performance gain starts to diminish is the knee of the curve, which is also the area of maximum curvature. As an illustration, let's look at the three charts below. They show how three different I/O paths respond to 8K sequential writes (Figure 1), 8K random reads (Figure 2), and 8K sequential reads (Figure 3), respectively, under various levels of load. In all three cases, the I/O queue depth is set to 1, and the load level is controlled by the number of threads issuing I/O requests, ranging from one thread to 40 threads.
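If you want to locate the knee programmatically rather than by eye, one simple heuristic (my own choice, not something from the charts) is to normalize both axes to [0, 1] and take the point farthest above the chord joining the first and last points, which serves as a proxy for the point of maximum curvature.

```python
def find_knee(xs, ys):
    """Return the X value at the knee of a saturating throughput curve.
    Assumes xs is increasing and ys grows then flattens, as in the
    ideal chart described above."""
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    nx = [(x - x0) / (x1 - x0) for x in xs]
    ny = [(y - y0) / (y1 - y0) for y in ys]
    # Vertical gap above the (0,0)-(1,1) chord; largest gap = knee.
    gaps = [b - a for a, b in zip(nx, ny)]
    return xs[gaps.index(max(gaps))]
```

On a synthetic saturating curve this picks out the load level where the marginal gain starts to collapse, which is exactly the region the eye identifies on the charts.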
In Figure 1, the range of load levels is not sufficient to find the true throughput of the I/O path. As you can see, the throughput in terms of MB per second is not leveling off toward the maximum load. In fact, it is safe to predict that the throughput will continue to move upward beyond 40 threads. At best, you can state that, given the trend, the I/O path can sustain more than 70MB per second for 8K sequential writes. As it turns out, I could actually push about 100MB per second through this particular I/O path by applying even more load.
Figure 2, on the other hand, shows that the same range of load happens to be sufficient to reveal the comprehensive performance behavior of the I/O path in response to 8K random reads. (Well, to be truly comprehensive, you would also need to include other performance parameters such as the I/O latency distribution.) The chart shows both the knee of the curve and the trend that heavier load will not yield any more throughput. And if you have the I/O latency data, you'll see that more load only contributes to longer I/O latency. With the data points in Figure 2, it's relatively safe to conclude that the I/O path can do just slightly less than 200MB per second for 8K random reads.
Figure 3 highlights the point that (1) you need to find the knee of the curve, and (2) you should see the throughput level off. In Figure 3, you may have the data points for the knee of the curve, but the throughput is not leveling off toward the maximum load. You'd be mistaken to conclude that the maximum throughput of the I/O path is about 140MB per second for 8K sequential reads, when in fact the I/O path can do as much as 180MB per second.
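A quick sanity check against this pitfall can be automated. The sketch below treats the curve as leveled off only if the throughput gained over the last few data points is a small fraction of the peak; the window size and tolerance are arbitrary knobs of my own choosing, not values taken from the figures.

```python
def has_leveled_off(ys, tail=3, tolerance=0.05):
    """True if the throughput gained across the last `tail` points is
    under `tolerance` of the peak value, i.e. the curve has flattened
    and the load range is wide enough to quote a maximum."""
    return (ys[-1] - ys[-tail]) / max(ys) < tolerance
```

If this returns False at the top of your load range, as it would for the data in Figures 1 and 3, extend the sweep before quoting a maximum throughput.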