People often rely on rules of thumb to tell whether there is a problem. DBAs are no exception. For instance, we are told that if the I/O latency (i.e. Avg. Disk sec/Read or Avg. Disk sec/Write) is greater 20 milliseconds, it's bad. In addition, if the current disk queue length is significant, that adds to the evidence of the I/O subsystem being stressed.
The following is a snapshot of several perfmon counters on a working system. The values are not transient, but represent the counter values for a sustained time period. And the data file of the database is on the G drive.
Workload A:
| LogicalDisk |
G: |
| Avg. Disk sec/Read |
0.039 |
| Avg. Disksec/Write |
0.000 |
| Current Disk Queue Length |
18 |
| Disk Read Bytes/sec |
3471785.878 |
| Disk Reads/sec |
423.802 |
| Disk Write Bytes/sec |
11468.854 |
| Disk Writes/sec |
1.400 |
| |
|
| SQLServer:Wait Statistics |
Waits started per second |
| Log write waits |
1429 |
| Page IO latch waits |
2002 |
Workload B:
| LogicalDisk |
G: |
| Avg. Disk sec/Read |
0.049 |
| Avg. Disksec/Write |
0.006 |
| Current Disk Queue Length |
42 |
| Disk Read Bytes/sec |
6319289.797 |
| Disk Reads/sec |
770.998 |
| Disk Write Bytes/sec |
221183.335 |
| Disk Writes/sec |
27.000 |
| |
|
| SQLServer:Wait Statistics |
Waits started per second |
| Log write waits |
2420 |
| Page IO latch waits |
3062 |
Workload C:
| LogicalDisk |
G: |
| Avg. Disk sec/Read |
0.114 |
| Avg. Disksec/Write |
0.083 |
| Current Disk Queue Length |
188 |
| Disk Read Bytes/sec |
1226183.614 |
| Disk Reads/sec |
149.406 |
| Disk Write Bytes/sec |
589826.358 |
| Disk Writes/sec |
69.400 |
| |
|
| SQLServer:Wait Statistics |
Waits started per second |
| Log write waits |
5869 |
| Page IO latch waits |
7368 |
They all appear to suggest that the I/O subsystem is over stressed, though some workloads are obviously heavier than others. If we apply the I/O latency rule of thumb mentioned at the beginning of this post, our I/O subsystem is indeed in a pretty bad shape. The latencies of 39ms, 49ms, and 114ms are excessive for 8K reads. But do we have any real problem here? Well, you shouldn't rely on these perfmon metrics alone to answer that question. Rather, whether or not you have a real problem depends on whether or not you are getting enough of your business transactions done (i.e. the transaction throughput) or get each of your business transaction done fast enough (i.e. the transaction response time).
It turns out that the above three workloads correspond to the levels of load when the SQL Server database is subject to 20 users, 50 users, and 200 users with a business transaction performance profile shown in the following charts.
As far as the business transactions are concerned, Workload A represents relatively light load level, Workload C represents the workload that has saturated the SQL Server instance, and Workload B is somewhere in between. Notice Workload A in particular. Even though its read I/O latency is rather significant in general, the database still has a lot of room for more load and for higher transaction throughput while the transaction response time is well within the acceptable range.
The simple point is that when you are examining the SQL Server performance using perfmon counters and wait stats, don't forget to check the performance metrics of the business transactions, especially their throughput and response time, whichever is important to the business.