Merrill Aldrich

Using Historical Perf Counter Data For Storage Planning

Lately I'm faced with a fairly ambitious data center move, and at the same time with an initiative to consolidate sprawling SQL Servers onto centralized clusters. It's a chunk of work, but these two notions have fit together pretty well: as long as we're moving SQL services and touching everything, it seems to be easier to make the consolidation argument to application owners/stakeholders and get that work done at the same time. So it's looking like consolidation and relocation in the same effort. Hopefully it'll be a win squared.

But to come to the point: this collection of projects has led to the purchase of a sizeable new disk array (one chunk of the new storage is even solid state - yee haw!) and that means serious performance and capacity planning. I currently use Spotlight on SQL Server Enterprise to monitor and collect stats about our servers, and it has a nice, automated repository for historical trending baked right in.

What's that? You don't collect performance stats from your servers over time? Stop burning valuable minutes reading this silly blog and go get (or even build, if you must) a tool! Now! BinGoogle, in no particular order: Idera, Quest, SQL Sentry, Red Gate, and work from there. Sheesh!

I happen to have Spotlight available out of the whole group of monitoring products, so I thought I'd share some techniques I have used to project storage requirements from a pile of existing servers onto a new, consolidated set of clusters. The same techniques work for other calculations, like CPU use and so on; I just happen to be laser-focused on storage at the moment. They should also work, with minor variation, against any other tool that collects this perf counter data.

The Spotlight repository I have set up is a SQL database that ends up containing hourly performance counter samples for about 30 days. The list of available counters is quite comprehensive, but for this exercise I need the ones the SAN engineers will need to design the LUN layout on our new array. That is:

IO operations per second, per disk

Bandwidth required (quantity of data read and written per disk, rather than number of operations)

Also handy, Spotlight provides Disk Queue values for past time periods, so you can get some idea of which existing disks in your environment are busy, or maybe too busy. We're also able to cross-reference this with the performance counters from our existing SAN arrays, as a double-check.

Chunking the Spotlight Repository Data

The Spotlight repository schema is fairly easy to understand, especially if you take a peek at some of the provided reporting procs that come with the product. In essence, there's a table full of performance counter samples, and then ancillary tables that list your servers ("monitored objects") and friendly names of counters ("statistic_names," "statistic_keys"). A series of joins will provide a simple output of time, server, and counter value. I went ahead and encapsulated this in a view, so that I could use it for a variety of specific queries:

CREATE VIEW [dbo].[reportingPerfData] 
AS
SELECT
a.timecollected,
ds.datasource_name,
mo.monitored_object_name,
sc.statistic_class_name,
sn.statistic_name,
sk.statistic_key_value,
a.raw_value
FROM spotlight_perfdata a
INNER JOIN spotlight_stat_classes sc
ON a.statistic_class_id = sc.statistic_class_id
INNER JOIN spotlight_stat_names sn
ON a.statistic_name_id = sn.statistic_name_id
INNER JOIN spotlight_datasources ds
ON sc.datasource_id = ds.datasource_id
INNER JOIN spotlight_monitored_objects mo
ON a.monitored_object_id = mo.monitored_object_id
INNER JOIN spotlight_stat_keys sk
ON a.statistic_key_id = sk.statistic_key_id
GO

The only tricky bit I found is that some of this is built on the EAV model (labeled rows in a generic table with a variant value column, instead of separate, clearly named columns). EAV just implies the first thing you have to write is a pivot to get the values you need into appropriate columns, cast to the right types. I wish Quest hadn't set it up this way, but it's a small peeve, and just having the data at all is great.
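Incidentally, once the view exists it's also an easy way to discover exactly which counters your repository holds. A quick sketch, using only the columns defined in the view above (the names that come back will depend on what Spotlight is collecting in your environment):

```sql
-- List every distinct counter the repository holds, by data source and class
SELECT DISTINCT
    datasource_name,
    statistic_class_name,
    statistic_name
FROM dbo.reportingPerfData
ORDER BY datasource_name,
    statistic_class_name,
    statistic_name;
```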

I'm looking for disk stats, so I have a second query that is built on the first, but limits results to disk performance, and pivots the counter values into separate columns. Again, defining this in a view facilitates reuse:

CREATE VIEW dbo.reportingdiskperfdata 
AS
SELECT
[Year],
[Month],
[Day],
daypart,
[Hour],
[Minute],
timecollected,
[server],
drive,
SUM(pctbusy) AS pctbusy,
SUM(readspersec) AS readspersec,
SUM(writespersec) AS writespersec,
SUM(mbyteswrittenpersec) AS mbyteswrittenpersec,
SUM(mbytesreadpersec) AS mbytesreadpersec,
SUM(iopersec) AS iopersec,
SUM(queuelength) AS queuelength
FROM (SELECT YEAR(timecollected) AS [Year],
MONTH(timecollected) AS [Month],
DAY(timecollected) AS [Day],
CASE
WHEN 0 <= DATEPART(HOUR,timecollected)
AND
DATEPART(HOUR,timecollected) < 6
THEN 1
WHEN 6 <= DATEPART(HOUR,timecollected)
AND
DATEPART(HOUR,timecollected) < 12
THEN 2
WHEN 12 <= DATEPART(HOUR,timecollected)
AND
DATEPART(HOUR,timecollected) < 18
THEN 3
WHEN 18 <= DATEPART(HOUR,timecollected)
AND
DATEPART(HOUR,timecollected) < 24
THEN 4
END AS daypart,
DATEPART(HOUR,timecollected) AS [Hour],
DATEPART(MINUTE,timecollected) AS [Minute],
timecollected,
monitored_object_name AS [server],
statistic_name,
CAST(statistic_key_value AS VARCHAR(100)) AS drive,
CASE
WHEN statistic_name = 'pctbusy'
THEN CAST(raw_value AS FLOAT)
ELSE 0.0
END AS pctbusy,
CASE
WHEN statistic_name = 'readspersec'
THEN CAST(raw_value AS FLOAT)
ELSE 0.0
END AS readspersec,
CASE
WHEN statistic_name = 'writespersec'
THEN CAST(raw_value AS FLOAT)
ELSE 0.0
END AS writespersec,
CASE
WHEN statistic_name = 'byteswrittenpersec'
THEN CAST(raw_value AS FLOAT) / (1024 * 1024)
ELSE 0.0
END AS mbyteswrittenpersec,
CASE
WHEN statistic_name = 'bytesreadpersec'
THEN CAST(raw_value AS FLOAT) / (1024 * 1024)
ELSE 0.0
END AS mbytesreadpersec,
CASE
WHEN statistic_name = 'iopersec'
THEN CAST(raw_value AS FLOAT)
ELSE 0.0
END AS iopersec,
CASE
WHEN statistic_name = 'queuelength'
THEN CAST(raw_value AS FLOAT)
ELSE 0.0
END AS queuelength
FROM dbo.reportingperfdata
WHERE datasource_name = 'windows'
AND statistic_class_name = 'logicaldisks'
AND statistic_name IN ('readspersec',
'pctbusy',
'writespersec',
'byteswrittenpersec',
'bytesreadpersec',
'iopersec',
'queuelength')
AND
timecollected >= DATEADD(DAY,-30,(SELECT MAX(timecollected)
FROM dbo.spotlight_perfdata))
) AS pivotperfvals
GROUP BY [Year],
[Month],
[Day],
daypart,
[Hour],
[Minute],
timecollected,
[server],
drive

GO

If you look closely, there's a second transformation besides the pivot: the timecollected value is expanded into a hierarchy of year / month / day / hour / minute, plus four time periods per day ("daypart," meaning early morning, morning, afternoon, evening), along the lines of an Analysis Services / BI hierarchy. This is just to facilitate reporting later with an Excel Pivot Chart.
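As an aside, the four-branch CASE for daypart could be collapsed to simple integer arithmetic, since T-SQL integer division truncates. A hedged equivalent, assuming DATEPART(HOUR, ...) always returns 0 through 23:

```sql
-- daypart 1..4 from the hour: 0-5 -> 1, 6-11 -> 2, 12-17 -> 3, 18-23 -> 4
SELECT DATEPART(HOUR, timecollected) / 6 + 1 AS daypart
FROM dbo.reportingPerfData;
```

The verbose CASE in the view arguably documents the four periods more clearly, though, so either form is defensible.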

With these two views we are almost to the point where we've made information out of our data.

From History to Plans

I am doing relocation, but also consolidation. That means that in many cases two or more disks from the performance history I have will need to be added together to form a projection of how busy a new, consolidated LUN will be, and then what performance will be required of the disk array for that LUN.

In order to get to the final numbers, I next made a small mapping table right in the Spotlight repository database, and used it to map all the existing disks from all the existing servers onto the planned layout of our new SQL Server clusters.

CREATE TABLE dbo.diskmigrationplan ( 
existingserver VARCHAR(128),
existingdrive VARCHAR(128),
newserver VARCHAR(128),
newdrive VARCHAR(128))

INSERT INTO dbo.diskmigrationplan
(existingserver,
existingdrive,
newserver,
newdrive)
SELECT DISTINCT [server],
drive,
[server],
drive
FROM dbo.reportingdiskperfdata
WHERE drive NOT IN ('c:','d:','q:','x:')

GO

This is just a simple, four-column table that lists the existing server name and existing disk, and the planned server name and planned disk. Its purpose is to allow creation of a query that can sum the performance counters, time slice by time slice, for the separate disks that will be consolidated, to estimate the demand on each new consolidated disk. Changing the mapping changes the plan for consolidation, so I can fine-tune the old-to-new server plans.
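For example, folding two existing drives onto one planned LUN is just an in-place edit of the mapping rows. The server names here are made up purely for illustration:

```sql
-- Hypothetical edit: the e: drives of two old servers will be
-- consolidated onto the e: drive of a planned new cluster
UPDATE dbo.diskmigrationplan
SET    newserver = 'NEWCLUSTER1',
       newdrive  = 'e:'
WHERE  existingserver IN ('OLDSERVER1', 'OLDSERVER2')
  AND  existingdrive = 'e:';
```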

Once I have that mapping table filled in, then edited to relate existing disks to new disks, I can get to a query that will show the projected demand in the new server environment:

-- Sum the counter values from old servers/drives to new servers/drives 
CREATE VIEW reportingdiskperfprojection
AS
SELECT
[Year],
[Month],
[Day],
daypart,
[Hour],
[Minute],
timecollected,
drive = mp.newserver + ' ' + mp.newdrive,
pctbusy = SUM(pctbusy),
readspersec = SUM(readspersec),
writespersec = SUM(writespersec),
mbyteswrittenpersec = SUM(mbyteswrittenpersec),
mbytesreadpersec = SUM(mbytesreadpersec),
iopersec = SUM(iopersec),
queuelength = SUM(queuelength)
FROM dbo.reportingdiskperfdata pd
INNER JOIN dbo.diskmigrationplan mp
ON mp.existingserver = pd.[server]
AND mp.existingdrive = pd.drive
GROUP BY [Year],
[Month],
[Day],
daypart,
[Hour],
[Minute],
timecollected,
mp.newserver + ' ' + mp.newdrive

GO

Chart it Out

I've been able to use the output from this last view in two reporting techniques. In order to make a friendly, detailed picture of the performance profile of each disk, I plugged Excel 2007's Pivot Chart straight into the view. This allows me to examine each disk and look at a graph of each counter over the whole time span, to get an idea of when the disk is busy, and how peak usage compares to average usage, etc. It's as simple as clicking the Pivot Chart button, linking the data source for the chart up to the view in SQL Server, then pulling the data columns into the standard pivot table rows / columns / values areas.

Aggregate

Lastly, while the charts showing each disk are handy, it's pretty time consuming to plow through each disk and jot down estimates. Instead, I thought a final bit of SQL work could give me an overview of target numbers for all the disks. The trick here is how to aggregate the results. I don't care so much when a disk is idle, so periods of low use just don't really matter - but they do have to be excluded. I also don't care about spikes or outliers on the high end of the spectrum. One thing I took away from the "engineering lite" classes I had in Architecture school: when you design a building air conditioning system, you don't design around the very hottest days that might happen (that'd be a waste of money). You design around the average of most of the hot days, so that the system keeps the building cool practically all the time, but you haven't wasted money with extra capacity just for the one hottest day in five years.

In order to do this, I need to take the consolidated data for each planned disk (that is, add up the perf counters for disks that will land on consolidated servers) and then compute the average and max values for the normal busy periods. With row_number() ranking, this is pretty easy to do: I have a query rank the 720 available samples of a counter for the whole month in descending order, then from that list I discard the top 5 or so to eliminate spikes or outliers, and then average together the next 100 samples. I'm sure there's a statistics rule and a term for this, like "difference of mean sum square deviation over moving time average," but here I have to confess to knowing practically nothing about statistics beyond what I put in layman's terms above :-).

The query for one such aggregation:

SELECT   rankedioperseccounters.drive, 
AVG(iopersec) AS avgiopersec,
MAX(iopersec) AS maxiopersec
FROM (SELECT timecollected,
drive,
[rank] = Row_number() OVER(PARTITION BY drive ORDER BY iopersec DESC),
iopersec
FROM dbo.reportingdiskperfprojection) AS rankedioperseccounters
WHERE [rank] BETWEEN 5 AND 105
GROUP BY drive

Then, a beastly looking query can do them all at once. This is just that same query repeated for each metric:

-- For each drive: avg and max of the top 101 of 720 counter values, clipping the top 4 spikes
SELECT iopersec.drive,
avgiopersec,
maxiopersec,
avgmbytesreadpersec,
maxmbytesreadpersec,
avgmbyteswrittenpersec,
maxmbyteswrittenpersec,
avgpctbusy,
maxpctbusy,
avgqueuelength,
maxqueuelength
FROM (SELECT rankedioperseccounters.drive,
AVG(iopersec) AS avgiopersec,
MAX(iopersec) AS maxiopersec
FROM (SELECT timecollected,
drive,
[rank] = Row_number()
OVER(PARTITION BY drive ORDER BY iopersec DESC),
iopersec
FROM dbo.reportingdiskperfprojection) AS rankedioperseccounters
WHERE [rank] BETWEEN 5 AND 105
GROUP BY drive
) AS iopersec
LEFT JOIN (SELECT rankedmbytesreadperseccounters.drive,
AVG(mbytesreadpersec) AS avgmbytesreadpersec,
MAX(mbytesreadpersec) AS maxmbytesreadpersec
FROM (SELECT timecollected,
drive,
[rank] = Row_number()
OVER(PARTITION BY drive ORDER BY mbytesreadpersec DESC),
mbytesreadpersec
FROM dbo.reportingdiskperfprojection) AS rankedmbytesreadperseccounters
WHERE [rank] BETWEEN 5 AND 105
GROUP BY drive
) AS mbytesreadpersec
ON iopersec.drive = mbytesreadpersec.drive
LEFT JOIN (SELECT rankedmbyteswrittenperseccounters.drive,
AVG(mbyteswrittenpersec) AS avgmbyteswrittenpersec,
MAX(mbyteswrittenpersec) AS maxmbyteswrittenpersec
FROM (SELECT timecollected,
drive,
[rank] = Row_number()
OVER(PARTITION BY drive ORDER BY mbyteswrittenpersec DESC),
mbyteswrittenpersec
FROM dbo.reportingdiskperfprojection) AS rankedmbyteswrittenperseccounters
WHERE [rank] BETWEEN 5 AND 105
GROUP BY drive
) AS mbyteswrittenpersec
ON iopersec.drive = mbyteswrittenpersec.drive
LEFT JOIN (SELECT rankedpctbusycounters.drive,
AVG(pctbusy) AS avgpctbusy,
MAX(pctbusy) AS maxpctbusy
FROM (SELECT timecollected,
drive,
[rank] = Row_number()
OVER(PARTITION BY drive ORDER BY pctbusy DESC),
pctbusy
FROM dbo.reportingdiskperfprojection) AS rankedpctbusycounters
WHERE [rank] BETWEEN 5 AND 105
GROUP BY drive
) AS pctbusy
ON iopersec.drive = pctbusy.drive
LEFT JOIN (SELECT rankedqueuelengthcounters.drive,
AVG(queuelength) AS avgqueuelength,
MAX(queuelength) AS maxqueuelength
FROM (SELECT timecollected,
drive,
[rank] = Row_number()
OVER(PARTITION BY drive ORDER BY queuelength DESC),
queuelength
FROM dbo.reportingdiskperfprojection) AS rankedqueuelengthcounters
WHERE [rank] BETWEEN 5 AND 105
GROUP BY drive
) AS queuelength
ON iopersec.drive = queuelength.drive
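The same result can be had with less repetition (if arguably less readability) by ranking each metric in a single pass over the projection view, one ROW_NUMBER() window per metric, and then aggregating conditionally, since AVG and MAX ignore the NULLs produced by a CASE with no ELSE. A sketch covering two of the metrics; the remaining ones follow the identical pattern:

```sql
-- One-pass alternative: rank each metric independently per drive, then
-- average/max only the rows whose rank for that metric falls in 5-105
SELECT drive,
       AVG(CASE WHEN iorank BETWEEN 5 AND 105 THEN iopersec END) AS avgiopersec,
       MAX(CASE WHEN iorank BETWEEN 5 AND 105 THEN iopersec END) AS maxiopersec,
       AVG(CASE WHEN busyrank BETWEEN 5 AND 105 THEN pctbusy END) AS avgpctbusy,
       MAX(CASE WHEN busyrank BETWEEN 5 AND 105 THEN pctbusy END) AS maxpctbusy
FROM (SELECT drive,
             iopersec,
             pctbusy,
             ROW_NUMBER() OVER(PARTITION BY drive ORDER BY iopersec DESC) AS iorank,
             ROW_NUMBER() OVER(PARTITION BY drive ORDER BY pctbusy DESC) AS busyrank
      FROM dbo.reportingdiskperfprojection) AS ranked
GROUP BY drive;
```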

Voila! Real figures for moving a large collection of existing sprawl to a few, hopefully tidy clusters. Wish me luck!

Published Thursday, October 29, 2009 5:42 PM by merrillaldrich
