THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Louis Davidson

Read/Write Ratio versus Read/Write Ratio?

So, Jason Massie (http://twitter.com/statisticsio) had a blog post (http://statisticsio.com/Home/tabid/36/articleType/ArticleView/articleId/324/Is-8020-a-90rsquos-Estimate.aspx) that Kendal Van Dyke (http://twitter.com/sqldba) alerted me to at SQL Saturday in Atlanta .  It does this by taking the number times an index was used for a read operation versus update operations using sys.dm_db_index_usage_stats.  Coincidentally, I was working on a chapter in a book about DMVs and working on a section on this very subject.  Initially I did my calculations solely it in a much different manner altogether just thinking about the amount of data moved to and from the physical file using sys.dm_io_virtual_file_stats.

After reading his post, I added not only a look using the index usage stats DMV, but also using sys.dm_io_virtual_file_stats as well.  My findings on my server was actually pretty interesting.  I found that while in operations my read/write ratio could be 50/50 at times, when looking at magnitude of data moved in and out of the server the ratio could be 95/5 in favor of data read.  This was clearly a clue that something isn’t quite right with our indexing or our data utilization (this is a very active server, mostly with batch/webservice OLTP operations through CRM!)

In the book, I do more breakdown of drives/objects, etc, but at a high level, I find that this is a pretty interesting comparison to look at how the read/write ratio.  For this blog entry I am just including the the following three queries, which I did go back and change to exclude log files based on some of the very interesting discussion in the comments.

Note too that sys.dm_io_virtual_file_stats could include things like backups and sys.dm_db_index_usage_stats could include maintenance too so it is highly expected that to get the most useful numbers that you employ some technique to capture numbers periodically and you exclude certain times of day when you might be doing non-standard work.  What this means to you may be different to every company.  For example, you may only want to know the read/write characteristics during steady working hours, and not overnight when nightly processes are being run.  Or you might be tuning the backup window and doing backups are you main concern.  These three queries could open your mind to lots of different ideas of how to use these DMV objects to tune your system.

--uses a like comparison to only include databases you desire
DECLARE @databaseName SYSNAME, @excludeLogFilesFlag bit
SET @databaseName = '%' --'%' gives all databases
SET @excludeLogFilesFlag = 1 --excludes
 log files by default because they would not be considered in the sys.dm_db_index_usage_stats numbers

SELECT 'Ratio bases on magnitude of data - sys.dm_io_virtual_file_stats'
SELECT  CAST(SUM(num_of_bytes_read) AS DECIMAL)
        / ( CAST(SUM(num_of_bytes_written) AS DECIMAL)
            + CAST(SUM(num_of_bytes_read) AS DECIMAL) )
                                   As RatioOfReads,
        CAST(SUM(num_of_bytes_written) AS DECIMAL)
        / ( CAST(SUM(num_of_bytes_written) AS DECIMAL)
            + CAST(SUM(num_of_bytes_read) AS DECIMAL) )
                                   AS RatioOfWrites,
        SUM(num_of_bytes_read) as TotalBytesRead,
         SUM(num_of_bytes_written) as TotalBytesWritten
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS divfs
         join sys.master_files mf
            on mf.database_id = divfs.database_id
               and mf.file_id = divfs.file_id
WHERE   DB_NAME(divfs.database_id) LIKE @databaseName
  and   (mf.type_desc <> 'LOG' or @excludeLogFilesFlag = 0)

SELECT 'Ratio bases on numbers of operations-sys.dm_io_virtual_file_stats'
SELECT  CAST(SUM(num_of_reads) AS DECIMAL)
        / ( CAST(SUM(num_of_writes) AS DECIMAL)
            + CAST(SUM(num_of_reads) AS DECIMAL) )
                                   As RatioOfReads,

        CAST(SUM(num_of_writes) AS DECIMAL)
        / ( CAST(SUM(num_of_reads) AS DECIMAL)
            + CAST(SUM(num_of_writes) AS DECIMAL) )
                                   AS RatioOfWrites,
        SUM(num_of_reads) as TotalReadOperations,
         SUM(num_of_writes) as TotalWriteOperations
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS divfs
         join sys.master_files mf
            on mf.database_id = divfs.database_id
               and mf.file_id = divfs.file_id
WHERE   DB_NAME(divfs.database_id) LIKE @databaseName
  and   (mf.type_desc <> 'LOG' or @excludeLogFilesFlag = 0)

SELECT 'Ratio bases on numbers of operations - sys.dm_db_index_usage_stats'
SELECT  case when (SUM(user_updates + user_seeks
                    + user_scans + user_lookups) = 0)
               then NULL
             else (
        CAST(SUM(user_seeks + user_scans + user_lookups)
                                               AS DECIMAL)
        / CAST(SUM(user_updates + user_seeks
                    + user_scans + user_lookups)
                                              AS DECIMAL)
                  ) end
                                           AS RatioOfReads,

        case when (SUM(user_updates + user_seeks
                    + user_scans + user_lookups) = 0)
                 then NULL
             else (

        CAST(SUM(user_updates) AS DECIMAL)
        / CAST(SUM(user_updates + user_seeks
                   + user_scans + user_lookups) AS DECIMAL)
                  ) end
                                          AS RatioOfWrites,
       SUM(user_updates + user_seeks
                    + user_scans + user_lookups) as
                                       TotalReadOperations,
       SUM(user_updates) as TotalWriteOperations
FROM    sys.dm_db_index_usage_stats AS ddius
WHERE   DB_NAME(database_id) LIKE @databaseName

Published Saturday, June 20, 2009 2:44 PM by drsql

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

jchang said:

the read/write ratio should only apply to data, on the assumption we are interested in RAID5 vs 10 impact.

Log writes are sequential so RAID 5 is actually better, assuming logs are on its own disk.

Also, the problem with using dm_io_virtual_file_stats is that it is hard to exclude backups,

a perfmon log is better, or take a differential reading for business hours

June 20, 2009 9:39 PM
 

drsql said:

Definite thanks for the comment.  Backups are not something I had really thought about with this DMV.

Definitely an extremely valid point. In the book I am careful to make note on most DMV's that since they restart only at reboot that you need to be careful to take readings at various points of time and correlate them to processes that might elevate the counts for various reasons, backups being one of them.

I do feel it is interesting to see the difference between the two values because it can show that 1 read operation can require a gigabyte of data from the physical disk, which can have horrible effects to you performance.

Plus, if you are doing backups during actual business hours, it could be interesting in any case.

I have been one in all of my writing/speaking/quacking that I am NOT a hardware expert and I refuse to even try to keep up with how much things change. Being able to interrogate the system to get a feel for what the system is doing in any rate is certainly a good thing.

June 21, 2009 3:23 PM
 

Linchi Shea said:

> Log writes are sequential so RAID 5 is actually better, assuming logs are on its own disk.

Can you expand on this, Joe?

June 21, 2009 6:46 PM
 

jchang said:

1. my recollection is that the maxtransfersize limit in the backup command is 4M, but most people never change the default of 1M.

2. you could interogate the msdb database for the database backups, I recall there were values for exactly how many bytes were read (a backup must read from disk, even if the db is in memory). now you just need to figure out how long SQL Server has been up, if you know this, let me know, i need it too.

3. please make the adjustment to your script to exclude logs, just do file_id <> 2, and not worry about the people with multiple log files, or sum the log file separate from data.

I do not think it should be necessary to be a hardware expert, but it is important to know how to interpret disk numbers. other wise those san guys will tell you their mutli-million dollar san is so wonderful, it must be your problem

June 21, 2009 8:33 PM
 

drsql said:

Just to make certain, why do I want to exclude logs?  Is this to make the different approaches equivalent?  I mean, I will add that to exclude the logs as a parameter, but it seems to me that when you are looking at it from a hardware basis (which is certainly why you would use the sys.dm_io_virtual_file_stats over the index usage stats dmv), I do care about log writes as well.

If I just want to know the ratio of reads to write actions, excluding the log file would make sense for sure. I will work on changing that in the next day or two.  Thanks for the input.

June 21, 2009 9:20 PM
 

jchang said:

the reason read/write ratio is important from the storage perspective is the RAID 5 versus RAID 10 overhead. So for small (8KB for example) (kind of) random IO, in RAID 10, a write requires 2 IO, a write to each part of the mirror, while in RAID 5, a write requires reading the original data, reading the original parity, writing the new data, and writing the new parity.

This is why RAID 10 is strongly preferred for OLTP databases that exhibit lots of random write IO.

So the read/write ratio is a metric for whether RAID 10 is necessary or is RAID 5 ok.

However, the above RAID rule does not apply to large block writes. While the log writes are small, because it is sequential, the RAID controller can accumulate a bunch, and write entire stripes.

So the log writes should be excluded from read/write ratio calculations. Never mind, the log should be on its own disk, hence not part of the data: RAID 5 or 10 discussion.

Over course, however strong your arguments might be for the data being on RAID 10, the san engineer will probably ignore the silly dba and do as the san vendor taught him, make it all RAID 5. If performance is bad, it must be the apps fault, because the useless metrics he looks at says everything on the san is wonderful. of course 300ms disk latency is not a problem!

June 22, 2009 12:44 AM
 

Linchi Shea said:

Joe;

In my tests, when a workload is log write constrained, log on RAID1 consistently performed better than log on RAID5 regardless of the underlying configuration of the drive.

June 22, 2009 1:08 AM
 

Linchi Shea said:

> However, the above RAID rule does not apply to large block writes. While the log writes are small, because it is sequential, the RAID controller can accumulate a bunch, and write entire stripes.

As far as I observed in my tests, log writes on RAID1 (or 10) consitently outperformed log writes on RAID5 by a very significantly margin.

June 22, 2009 12:12 PM
 

jchang said:

thanks Linchi, I assume the RAID 5 log on a dedicated set of disks? with dedicated write cache enabled?

then the "Never mind" part applies, just make it RAID 1 or 10, but it still should be excluded from the data files R5/10 discussion.

June 22, 2009 1:32 PM
 

Linchi Shea said:

I have actually run quite many tests on this subject on drives presented from disk arrays of different types and configurations.

June 22, 2009 2:35 PM
 

Uri Dimant said:

Louis/Joe/

Can you clarify, should the below two queries return the same (almost) databases for to writes ( I know the second one does have a log files included , but even If I do include the Logs in the first one I get very very different results)

WITH cte

AS

(

SELECT

  DB_NAME(IVFS.database_id) AS db,

  CASE WHEN MF.type = 1 THEN 'log' ELSE 'data' END AS file_type,

  SUM(IVFS.num_of_bytes_read)as reads,

 SUM(IVFS.num_of_bytes_written) AS writes,

  SUM(IVFS.io_stall) AS io_stall

FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS IVFS

  JOIN sys.master_files AS MF

    ON IVFS.database_id = MF.database_id

    AND IVFS.file_id = MF.file_id

GROUP BY DB_NAME(IVFS.database_id), MF.type

) SELECT * FROM cte WHERE file_type='data'

ORDER BY writes DESC

SELECT

sd.name As DatabaseName,

CAST(SUM(user_seeks+user_scans+user_lookups) AS decimal) /

CAST(SUM(user_updates) + SUM(user_seeks+user_scans+user_lookups) AS decimal)

* 100 AS ReadPercent,

CAST(SUM(user_updates) AS decimal) /

CAST(SUM(user_updates) + SUM(user_seeks + user_scans + user_lookups) AS decimal)

* 100 AS WriteRatio

FROM sys.dm_db_index_usage_stats dbius

Join sys.databases sd

On (sd.database_id = dbius.database_id)

Group By sd.name

Order By WriteRatio  DESC

June 23, 2009 8:10 AM
 

RowlandG said:

File IO Measured.

Data exceeds what SAN delivers.

Must use DAS.

June 23, 2009 3:55 PM
 

drsql said:

Uri, this is a very good point. I think what we are looking at are two different things altogether.  The sys.dm_io_virtual_file_stats view measures writes to the file.  My assumption is that this would technically count the number of operations where the lazy writer pushed out changes to the file, even if it was just status of a row (like to mark the page as dirty).  It wouldn't be any specific operation that caused it to occur, other than either idle time or needing to free up cache space.

On the other hand sys.dm_db_index_usage_stats might not actually require that the file was written to yet (the log would have been written to) but that the table was changed. A read wouldn't necessarily be a read to the file, but it could simply be from cache.

Hmm, I think that again I have to rethink a bit about this topic, and check my text to make sure I am not claiming that the different queries are comparing apples to oranges.

June 24, 2009 10:44 PM
 

StackOverflow Update: 560M Pageviews A Month, 25 Servers, And It’s All About Performance | ??????????????? said:

September 19, 2014 3:13 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Links to my other sites

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement