THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Kevin Kline

Understanding SQLIOSIM Output

One of the things that’s great about my job is the fact that I get to meet people who are a lot smarter than me.  One such person I count among my friends is Jimmy May, a Microsoftee from the Indianapolis area.  Jimmy has been using SQLIOSIM for a while, but has discovered that its reporting leaves a bit to be desired.  Jimmy and I were discussing the reporting characteristics of SQLIOSIM and he thought it’d be useful to share his findings with the public.  Thanks for sharing, Jimmy!

 

~~~~~

 

One thing which threw me off at first is the all-important “Target I/O” parameter.  As I understand it—be clear that I’m paraphrasing here—SQLIOSim tries to hammer the system trying to get to that threshold.  In fact, a well-tuned system with lots of capacity will resist high values.  Note my comment below loosely comparing it to disk latency.

 

 

SQLIOSim Sources

KB: http://support.microsoft.com/kb/231619/en-us

SQL Server Storage Engine blog: http://blogs.msdn.com/sqlserverstorageengine/archive/2006/10/06/SQLIOSim-available-for-download.aspx

PSS SQL Server Engineers:  http://blogs.msdn.com/psssql/archive/2007/04/10/sqliosim-outstanding-i-o-warnings.aspx

 

SQLIOSim Amalgamation of Undocumented Info

Characterization of Output:

Amalgamated from Bob Dorr & George Reynya

 

Target IO Duration (ms)

A threshold--SQLIOSim tries to hammer the system yet tries to stay under that threshold

If the average I/O duration exceeds the target I/O duration, the SQLIOSim utility throttles the number of outstanding I/O operations to decrease the load and to improve I/O completion time.

Pasted from <http://support.microsoft.com/kb/231619>

 

Running Average IO Duration (ms)

Low values are good

[Bob Dorr] Characterized as "Response time for the stress test"

[Bob Dorr] Target IO is 100ms; since ideal I/O is 4-8ms, anything higher is not a preferred outcome

[JWM] Though strictly speaking this isn't "disk latency" (Avg. Disk sec/[Read|Write|Transfer], acceptable  values are similar:

Data files:  0-5ms or so are excellent

Log files:  0-2ms are acceptable

Warnings for I/Os exceeding 15s (à la SP4 diags)

Algorithm is designed to mitigate big swings, esp. because of one very large I/O (similar to SQL Server Checkpoint logic)

Used to control the number of outstanding I/O requests

 

IO request blocks

High numbers are good

The more requests a system can handle, the more I/Os are issued by the tool

Characterized as "Concurrent IOs"

Capped by MaxIO; not uncommon to have 100 or more, esp. for Enterprise Edition for which read-aheads & such are attempted

 

Accumulators:  Reads, Scatter Reads, Writes, Gather Writes

High numbers are good

Simple Accumulators

Each time an operation is posted, the category for the specific request type is incremented

 

Total IO Time (ms)

Desirable numbers are relative

Useful for comparing drives, especially on the same system. 

If the values for the Accumulators are similar for two drives, but one drive has significantly less Total IO Time,

then the latter is more performant because it was able to service the same I/O in less time.

Characterized as "Accumulated total duration time for the I/O in ms"

The same duration is used to track the "Running Average IO Duration (ms)"

Sum of accumulators * running average = ~Running Average IO Duration

 

Number of times IO throttled

Low numbers are better

Number of times a request was removed because the duration was exceeded

 

Additional Documentation

Additional information will be based on demand & will be added to KB article & Wiki

 

Architecture

SQLIOSim is built with lightweight UMS/SQLOS-like layer to better simulate SQL Server, incl. logical CPUs

Output

Output Example #1:

Data

********** Final Summary for file S:\sqliosim.mdx **********

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 17, Number of times IO throttled = 9006, IO request blocks = 33

Reads = 75414, Scatter Reads = 88286, Writes = 3295, Gather Writes = 54845, Total IO Time (ms) = 44936446

********** Final Summary for file T:\sqliosim.mdx **********

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 0, Number of times IO throttled = 4135, IO request blocks = 106

Reads = 67091, Scatter Reads = 81249, Writes = 2501, Gather Writes = 53652, Total IO Time (ms) = 13033268

Interpretation

[George Reynya] The running average of 17ms for S: does not look all that bad as a target as it means that over the life of the test you maintained 17ms response time for the stress test.  Your test shows that the number of concurrent IOs has dropped to 33 which is not so good.  Especially when you compare the same basic work load against the T: drive--0ms w/ 106 I/O requests.   I would look at the S drive IO path a bit to determine the differences.

[JWM] Note:  The trial lasted 300s.  The interpretation is consistent with physical layout:  Disks were SAS DAS 73GB 15K RAID 10; S: was comprised of 6 disks, T: of 8 disks.

[JWM] Note:  George's comment regarding 17ms vs. 33ms was relative; in fact, a neither value would not be acceptable to me—I’m shooting for <5ms for my data drives.

Output Example #2:

Data

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 121, Number of times IO throttled = 1167, IO request blocks = 13

Interpretation

[George Reynya]  Customer disk system exceeded Target IO Duration by 21ms for this file and had to throttle 1167 times in an attempt to keep the IO within the target.    With the number of IOs we do for a file this is a bit concerning.  We tried to throttle several times and they could not keep the average below the 100ms target.

Output Example #3:

Data

04/27/07 14:24:55 ********** Final Summary for file M:\sqliosim.mdx ********** 

04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 15, Number of times IO throttled = 382, IO request blocks = 108

04/27/07 14:24:55 Reads = 187508, Scatter Reads = 259268, Writes = 15286, Gather Writes = 130498, Total IO Time (ms) = 5460993

04/27/07 14:24:55 DRIVE LEVEL: Sector size = 512, Cylinders = 163177, Media type = 12, Sectors per track = 63, Tracks per Cylinders = 255

04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No

04/27/07 14:24:55 DRIVE LEVEL: Read count = 886653, Read time = 3333820, Write count = 616951, Write time = 6606954, Idle time = 170616, Bytes read = 74679347200, Bytes written = 74403336192, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK 

04/27/07 14:24:55 ********** Final Summary for file M:\sqliosim2.mdx **********

04/27/07 14:24:55 File Attributes: Compression = No, Encryption = No, Sparse = No

04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 16, Number of times IO throttled = 762, IO request blocks = 44

04/27/07 14:24:55 Reads = 183061, Scatter Reads = 256624, Writes = 15038, Gather Writes = 132071, Total IO Time (ms) = 5890665

04/27/07 14:24:55 DRIVE LEVEL: Sector size = 512, Cylinders = 163177, Media type = 12, Sectors per track = 63, Tracks per Cylinders = 255

04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No

04/27/07 14:24:55 DRIVE LEVEL: Read count = 886717, Read time = 3333859, Write count = 616967, Write time = 6606962, Idle time = 170630, Bytes read = 74679609344, Bytes written = 74403405824, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK 

04/27/07 14:24:55 ********** Final Summary for file M:\sqliosim.ldx **********

04/27/07 14:24:55 File Attributes: Compression = No, Encryption = No, Sparse = No

04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 0, Number of times IO throttled = 0, IO request blocks = 8

04/27/07 14:24:55 Reads = 0, Scatter Reads = 0, Writes = 316141, Gather Writes = 0, Total IO Time (ms) = 1346919

04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No

04/27/07 14:24:55 DRIVE LEVEL: Read count = 886653, Read time = 3333820, Write count = 616962, Write time = 6606959, Idle time = 170623, Bytes read = 74679347200, Bytes written = 74403381248, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK 

Stress Test Attempt Complete

Interpretation

[JWM]  This test was done against what was thought by the customer to be a well-architected disk I/O subsystem (EMC Clariion CX700, lots of 146GB 15K disks, RAID 5 (RAID5--when will  they learn?)).  The system is intended to support a system from which very high performance is required.

The results of this test belie this assumption.  I would want Average IO Duration to be less than 5ms, no more than 10ms; however both data files have values exceeding this:  15ms & 16ms, respectively.

The IO Request Blocks aren't nearly as high as I'd expect, especially compared to the 8 disk RAID 10 drive T: in Output Example #1.

The far less respectable values for M:\sqliosim2.mdx compared to the first file will prompt further investigation.

Note also that I/O was throttled 382 & 762 times, respectively.

The Average IO Duration for the log file is excellent.  I don't have enough experience interpreting output for log files, so I don't know whether to be concerned about the log number of IO Request Blocks.

 

 


Published Thursday, June 28, 2007 11:30 AM by KKline
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Jason Haley said:

June 29, 2007 9:40 AM
 

Linchi Shea said:

For stress testing, it is fine to use SQLIOSIM to see whether the disk subsystem may crap out under load. But here we seem to be talking about getting some type of I/O benchmarking data out of the SQLIOSIM output. For I/O benchmarks, wouldn't it be better to use a tool such as IOMeter or sqlio which allow you to control the I/O parameters better? By I/O parameters, I mean reads vs. writes, sequential vs. random, block size, queue depth, and so on.

June 30, 2007 11:16 AM
 

Jimmy May said:

I agree, Linchi.  Kevin & I corresponded several times before he posted this.  Unfortunately, the following statement did not make it into the posted version:

SQLIOSim:  What it's for—& what it's not for

SQLIOSim is a correctness and stress tool, not a performance measurement tool.  Use SQLIOSim to verify your I/O subsystem is functioning correctly under heavy loads.  If you want to measure throughput use the SQLIO utility.

-- SQL Server Storage Engine blog

July 9, 2007 7:32 PM
 

KKline said:

Great feedback, guys!  Thanks,

-Kev

July 13, 2007 1:09 PM
 

PSS SQL Server Engineers said:

The following is from a lengthy conversation I had with Kevin Kline.&#160; I believe he will be posting

November 12, 2008 12:31 PM
 

How It Works: SQLIOSim - Running Average, Target Duration, Discarded Buffers … | MS Tech News said:

November 12, 2008 1:13 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About KKline

Kevin Kline is Technical Strategy Manager for the SQL Server business unit at Quest Software. Kevin was the original architect and dev manager for many of Quest's SQL Server tools. Prior to Quest, Kevin worked as an enterprise DBA for a variety of large corporations and government agencies.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement