
The Rambling DBA: Jonathan Kehayias

The random ramblings and rantings of a frazzled SQL Server DBA

Parsing SQLIO Output to Excel Charts using Regex in PowerShell

Today Joe Webb (Blog|Twitter) blogged about The Power of Regex in Powershell, and in his post he shows how to parse the SQL Server Error Log for events of interest.  At the end of his blog post Joe asked about other places where Regular Expressions have been useful in PowerShell, so I thought I'd blog my script for parsing SQLIO output using Regex in PowerShell to populate an Excel worksheet and build charts based on the results automatically.

If you've never used SQLIO, Brent Ozar (Blog|Twitter) has an article on SQLServerPedia titled SAN Performance Tuning with SQLIO that includes a lot of information as well as a video tutorial on its use.  I have been using SQLIO quite a bit this year, mainly to gather performance information for systems before and after reconfiguring their storage, so I can show the business the percentage impact of changes like migrating from RAID 5 to RAID 10 or aligning the disk partitions used for database and log file storage.  I have also used it to benchmark new SAN configurations, including the SAN configuration of our Virtual Machine environment, to show team members why I have recommended that a number of our physical servers be migrated onto Virtual Machines.

I generally use a param.txt file similar to:

G:\MSSQL\testfile1.dat 4 0x0 32768

which creates a 32 GB test file (the fields are the test file path, the number of threads to use, a thread affinity mask, and the file size in MB).  I then have a DOS batch file that runs the actual SQLIO tests using a script like:

sqlio -kW -t16 -s10 -o8 -fsequential -b8 -BH -LS -Fparam.txt

sqlio -kW -t8 -s360 -o8 -frandom -b8 -BH -LS G:\MSSQL\testfile1.dat > post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txt

sqlio -kR -t8 -s360 -o8 -frandom -b8 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txt

sqlio -kW -t8 -s360 -o8 -fsequential -b8 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txt

sqlio -kR -t8 -s360 -o8 -fsequential -b8 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
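For reference, the switches in these commands break down as follows (my summary; the authoritative list is the readme that ships with SQLIO):

-kW / -kR : write test or read test
-t<n> : number of threads
-s<n> : test duration in seconds
-o<n> : outstanding (pending) I/Os per thread
-frandom / -fsequential : access pattern
-b<n> : I/O block size in KB
-BH : buffer using the hardware disk cache, but not the file cache
-LS : capture latency timings using the system counter
-Fparam.txt : take the target file specification from param.txt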

This runs the gamut of tests using 8, 32, 64, 128, and 256 KB block sizes with 8 threads and 8 outstanding I/Os per thread, after an initial short sequential write that builds the test file.  You can certainly expand the test set much further than this and run a very exhaustive testing regimen; I am by no means saying that my tests above are the best fit, but they have sufficiently met my needs for the tuning I have been doing.  Once the tests have been run, the post.txt file will contain an output entry similar to the following for every test that was run.

sqlio v1.5.SG
using system counter for latency timings, 14318180 counts per second
16 threads writing for 60 secs to file d:\MSSQL\testfile1.dat
    using 8KB random IOs
    enabling multiple I/Os per thread with 8 outstanding
    buffering set to use hardware disk cache (but not file cache)
using current size: 2048 MB for file: d:\MSSQL\testfile1.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 18527.91
MBs/sec:   144.74
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 6
Max_Latency(ms): 4900
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 78  6  8  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2
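(Reading the histogram vertically: in this sample, 78% of the I/Os landed in the 0 ms bucket, while 2% took 24 ms or longer.)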

Linchi Shea (Blog) wrote a blog post three years ago that showed how to Parse the sqlio.exe Output using Perl.  I found it while searching for a smarter way to parse these long output files than doing it manually, but I don't have Perl installed anywhere, and I really felt that I should be able to do this natively using the tools readily available to me on my Windows 7 laptop, so I looked to PowerShell for a solution.

Reading the file data in is accomplished using Get-Content, which returns a collection of strings.  At first I tried casting this to a [string] object directly, but I soon found that the resulting string was missing the carriage returns and line feeds that existed in the original file.  This initially posed a couple of problems for parsing the output, but doing a [string]::Join operation with [Environment]::NewLine as the separator returned an appropriately CRLF-delimited string.

$filedata = [string]::Join([Environment]::NewLine,(Get-Content $FileName))
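To see why the join matters, here is the cast that loses the line breaks (an illustration of the problem, not part of the final script):

$flattened = [string](Get-Content $FileName)   # joins the lines with spaces, so the CRLFs are gone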

Now that I had my file data loaded into the $filedata variable, the next step was to split this string on the “sqlio v1.5.SG” header that is output at the beginning of each test run.  Since PowerShell is .NET based, this is easily accomplished using the Split() method of System.String, which is the type of the $filedata object.

$Results = $filedata.Split( [String[]]"sqlio v1.5.SG", [StringSplitOptions]::RemoveEmptyEntries )
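Each element of $Results now holds the full text of a single test run, so a quick sanity check (illustrative) is:

$Results.Count   # should equal the number of sqlio commands in the batch file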

At first I started to write a foreach loop over this collection (let's face it, old habits die hard), but instead I made use of piping, which allows a collection like the one returned by the Split() method to be worked on in a single operation.  Piping in PowerShell is similar to a manufacturing process: an object gets transformed and then passed on until the ultimate result is achieved.  Once an object is piped, it becomes accessible through the $_ variable, allowing the object to be transformed along the pipeline.  By piping the result of the Split(), I was able to write a select that uses Regex to pull the information of interest out of each test run.
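As a trivial illustration of $_ before the real select below (my example, not from the script):

1..3 | % { $_ * 2 }   # each piped value appears as $_ and yields 2, 4, 6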

     select @{Name="Threads"; Expression={[int]([regex]::Match($_, "(\d+)?\sthreads\s(reading|writing)").Groups[1].Value)}},`
                @{Name="Operation"; Expression={switch ([regex]::Match($_, "(\d+)?\sthreads\s(reading|writing)").Groups[2].Value)
                                                {
                                                    "reading" {"Read"}
                                                    "writing" {"Write"}
                                                }    }},`
                @{Name="Duration"; Expression={[int]([regex]::Match($_, "for\s(\d+)?\ssecs").Groups[1].Value)}},`
                @{Name="IOSize"; Expression={[int]([regex]::Match($_, "\tusing\s(\d+)?KB\s(sequential|random)").Groups[1].Value)}},`
                @{Name="IOType"; Expression={switch ([regex]::Match($_, "\tusing\s(\d+)?KB\s(sequential|random)").Groups[2].Value)
                                                {
                                                    "random" {"Random"}
                                                    "sequential" {"Sequential"}
                                                }  }},`
                @{Name="PendingIO"; Expression={[int]([regex]::Match($_, "with\s(\d+)?\soutstanding").Groups[1].Value)}},`
                @{Name="FileSize"; Expression={[int]([regex]::Match($_, "\s(\d+)?\sMB\sfor\sfile").Groups[1].Value)}},`
                @{Name="IOPS"; Expression={[decimal]([regex]::Match($_, "IOs\/sec\:\s+(\d+\.\d+)?").Groups[1].Value)}},`
                @{Name="MBs_Sec"; Expression={[decimal]([regex]::Match($_, "MBs\/sec\:\s+(\d+\.\d+)?").Groups[1].Value)}},`
                @{Name="MinLat_ms"; Expression={[int]([regex]::Match($_, "Min.{0,}?\:\s(\d+)?").Groups[1].Value)}},`
                @{Name="AvgLat_ms"; Expression={[int]([regex]::Match($_, "Avg.{0,}?\:\s(\d+)?").Groups[1].Value)}},`
                @{Name="MaxLat_ms"; Expression={[int]([regex]::Match($_, "Max.{0,}?\:\s(\d+)?").Groups[1].Value)}}`

The select returns new objects with properties named Threads, Operation, Duration, IOSize, IOType, PendingIO, FileSize, IOPS, MBs_Sec, MinLat_ms, AvgLat_ms, and MaxLat_ms.  This alone is sufficient to proceed to creating an Excel file from the information, but I wanted the information sorted the same way every time I ran this, primarily because I sometimes change how I run my tests, and I like consistency in the output.  Since I am so thorough, I fixed this by piping the output of the select to the Sort-Object cmdlet, sorting by IOSize, IOType, Operation, and Threads.
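In code, that sort is a one-liner (a sketch; assigning back to $Results is my assumption about how the attached script chains the steps together):

$Results = $Results | Sort-Object IOSize, IOType, Operation, Threads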

Getting the data into Excel was actually fairly simple to do, thanks to Buck Woody (Blog|Twitter) and SQL Rockstar Tom LaRock (Blog|Twitter).  Buck wrote a blog post titled Create an Excel Graph of your Big Tables – With Powershell, and Tom wrote a Simple Talk article, Monitoring SQL Server Virtual Log File Fragmentation, both of which use the Interop Excel objects to create an Excel workbook from PowerShell, populate the workbook with data, and build charts from that data.  My code for the Excel integration is based entirely on their examples.

$Excel = New-Object -ComObject Excel.Application    # requires Excel to be installed
$Excel.Visible = $true
$WorkBook = $Excel.WorkBooks.Add()
$WorkBook.WorkSheets.Item(1).Name = "RawData"
$WorkBook.WorkSheets.Item(3).Delete()               # remove the two extra default worksheets
$WorkBook.WorkSheets.Item(2).Delete()
$WorkSheet = $WorkBook.WorkSheets.Item("RawData")
$x = 2                                              # row 1 holds the headers; data starts at row 2
$WorkSheet.Cells.Item(1,1) = "Threads"
$WorkSheet.Cells.Item(1,2) = "Operation"
$WorkSheet.Cells.Item(1,3) = "Duration"
$WorkSheet.Cells.Item(1,4) = "IOSize"
$WorkSheet.Cells.Item(1,5) = "IOType"
$WorkSheet.Cells.Item(1,6) = "PendingIO"
$WorkSheet.Cells.Item(1,7) = "FileSize"
$WorkSheet.Cells.Item(1,8) = "IOPS"
$WorkSheet.Cells.Item(1,9) = "MBs/Sec"
$WorkSheet.Cells.Item(1,10) = "Min_Lat(ms)"
$WorkSheet.Cells.Item(1,11) = "Avg_Lat(ms)"
$WorkSheet.Cells.Item(1,12) = "Max_Lat(ms)"
$WorkSheet.Cells.Item(1,13) = "Caption"

$Results | % {    # loop over the parsed results, writing one worksheet row per test
    $WorkSheet.Cells.Item($x,1) = $_.Threads
    $WorkSheet.Cells.Item($x,2) = $_.Operation
    $WorkSheet.Cells.Item($x,3) = $_.Duration
    $WorkSheet.Cells.Item($x,4) = $_.IOSize
    $WorkSheet.Cells.Item($x,5) = $_.IOType
    $WorkSheet.Cells.Item($x,6) = $_.PendingIO
    $WorkSheet.Cells.Item($x,7) = $_.FileSize
    $WorkSheet.Cells.Item($x,8) = $_.IOPS
    $WorkSheet.Cells.Item($x,9) = $_.MBs_Sec
    $WorkSheet.Cells.Item($x,10) = $_.MinLat_ms
    $WorkSheet.Cells.Item($x,11) = $_.AvgLat_ms
    $WorkSheet.Cells.Item($x,12) = $_.MaxLat_ms
    $WorkSheet.Cells.Item($x,13) = [string]$_.IOSize + "KB " + [string]$_.IOType + " " + `
                                [string]$_.Operation + " " + [string]$_.Threads + `
                                " Threads " + [string]$_.PendingIO + " pending"
    $x++}

$WorkBook.Charts.Add() | Out-Null
$Chart = $WorkBook.ActiveChart
# SetSourceData and xValues are deliberately set twice; see the note below
$Chart.SetSourceData($WorkSheet.Range("H1:H$x"))
$Chart.SeriesCollection(1).xValues = $WorkSheet.Range("M2:M$x")
$Chart.SetSourceData($WorkSheet.Range("H1:H$x"))
$Chart.SeriesCollection(1).xValues = $WorkSheet.Range("M2:M$x")
$Chart.Name = "IOPS"

$WorkBook.Charts.Add() | Out-Null
$WorkBook.ActiveChart.SetSourceData($WorkSheet.Range("I1:I$x"))
$Chart = $WorkBook.ActiveChart
$Chart.SeriesCollection(1).xValues = $WorkSheet.Range("M2:M$x")
$Chart.Name = "MBs Sec"

I don't know why, but the first chart doesn't refresh properly unless I call SetSourceData() and set the SeriesCollection xValues property twice.  Since the code block that loads the data into the worksheet tracks the rows with the $x variable, the Range for each chart is always row 1 to $x.  Column H holds the IOPS values, column I holds the MBs/Sec values, and column M holds the captions used as the x-axis labels.  The only thing missing at this point is a parameter declaration at the head of the script file to require an input parameter, $FileName, which is passed to Get-Content to read the information.

param(
    [Parameter(Mandatory=$true)]
    [ValidateNotNullOrEmpty()]
    [string]
    $FileName
)
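With that param block at the top of the file, the finished script can be run from a PowerShell prompt (a usage sketch; Parse-SQLIOResults.ps1 is the file name used in the comments below):

.\Parse-SQLIOResults.ps1 -FileName .\post.txt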

The end result is a set of charts showing your IOPS and MBs Sec results from the test run.

[Chart: IOPS for each test configuration]

[Chart: MBs/Sec for each test configuration]

The complete script is attached to this blog post.  If you use it, let me know your thoughts.

Published Tuesday, May 25, 2010 10:12 PM by Jonathan Kehayias

Attachment(s): SQLIOResults.zip

Comments

 

James Luetkehoelter said:

Nice!

May 26, 2010 4:11 PM
 

kameshwara_a said:

good one

June 11, 2010 6:48 AM
 

Noel said:

Thanks Jonathan. This makes the work of deciphering the results a LOT easier. very much appreciated.

July 26, 2010 3:28 PM
 

shishok said:

useful

August 5, 2010 3:04 AM
 

Manu said:

Hi Jonathan...Thanks for making parsing SQLIO task easier...I am bit new to powershell... and can't get this powershell script to run properly..Can you please include few instructions to run this script as well (possibly with screenshots)...Thanks and much appreciate your efforts

August 22, 2010 8:22 PM
 

Theo Ekelmans said:

Thx for sharing :)

September 28, 2010 8:12 AM
 

Frank said:

Great!

Thank you for sharing :)

November 3, 2010 2:24 AM
 

Alex Platonov said:

Thank you so much Jonathan. I am working at a client where I needed to prove to some skeptics that RAID10 aligned with 64K cluster size is better than a misaligned 3 spindle RAID 5 array with default 4K cluster size. The DBA they have on staff was asking me to "prove" that my "opinion" is correct. Having your tool handy to parse SQLIO and display results in a graph that anybody can understand saved me a day of work.

November 11, 2010 10:01 AM
 

Henry Lee said:

Oh my, how incredibly useful! Thank you so much for posting this code!!

November 12, 2010 1:47 PM
 

Marc said:

Hi,

Once has me stumped: where does this "using current size: 2048 MB" come from ?

If I run it on my desktop it shows "using current size: 8 MB " whereas when I run it on my servers it shows : "using current size: 24576 MB "

Any idea ?

December 7, 2010 8:10 AM
 

Rob said:

Has anyone gotten this to run successfully?  I have the .ps1 and results.txt in the same folder, but I am always getting an error:

The term 'Parse-SQLIOResults.ps1' is not recognized as a cmdlet, function, operable program, or script file. Verify the term and try again.

At line:1 char:51

+ $result = ls e_test.txt | %{Parse-SQLIOResults.ps1  <<<< -file $_}

PS C:\Documents and Settings\pc1>

December 15, 2010 11:24 AM
 

Rob said:

How do you get this to work? I have the results text file from SQLIO and the SQLIOResults.ps1 file in the same directory, but I am not sure what I need to input from the powershell command-line to get it to run properly.  Any suggestions?

December 15, 2010 1:49 PM
 

Jonathan Kehayias said:

I load the ps1 file in PowerGui, and click the Execute button.  It pops up and asks for the FileName, and then it opens Excel and builds the results and charts.  I am not sure what you are trying to do with your script exactly.  If you want to run it command line just do:

powershell -Noninteractive -Noprofile -File ".\Parse-SQLIOResults.ps1" -FileName "sqlioresults.txt"

December 15, 2010 2:37 PM