We're in the early stages of implementing System Center Operations Manager 2007 R2, and from what I've see so far it looks really good. I am still interested to see the depth of performance counter information that it'll collect and store, but haven't been able to really dig into that just yet.
There is one issue I am seeing and I don't know if others have come across this (could not find much online about it either): computing a database file free space alert rule is a little complicated, and it still seems to be wrong in some cases with mount points. The basic idea for a free space is this:
For each database file, examine the setting for Autogrow
- If Autogrow is OFF then alert when the remaining space IN the file is below some threshold, because if you fill the file, you'll have a problem.
- If Autogrow is ON then alert when the remaining space IN the file PLUS the remaining space on the disk where the file is located falls below some threshold.
(If we want to be precise, the ultimate calculation should also compare the disk's free space to the next growth interval set for the file, to ensure that a growth event, if triggered, would succeed.)
As far as I can see, this works correctly in Operations Manager unless the files in question are on a disk that is attached via a mount point. I'm getting a lot of false alarms on this rule, and they all seem to be coming from files on mount points. If I had to make an educated guess, I would say that the rule may be using the free space of the root of the disk (meaning the drive-letter mounted disk) instead of the real disk where the files are.
To repro this, one would configure a SQL Server where:
- There's a SMALL disk mounted as a drive letter (say, E:\)
- There's a LARGE disk mounted using a mount point to a directory on the small disk (say, E:\SQLData)
- Make a database with a data file on the large disk
- Fill it until it is about to autogrow, such that there is plenty of room on the large disk, but not the small one
- See if SCOM complains that there's not enough disk space
We use this type of disk layout as a standard practice, so I am seeing a LOT of false alarms.
Has anyone else hit this issue? Is my guess correct?
Update 12/29/2010: Gotcha!
Caught red handed? Here’s the query it looks like SCOM is using, which – a little speculation here – makes me say, “hmmm.” It does indeed appear that the SCOM MP assumes the left-most character of the file path is the drive letter is the disk, which is incorrect:
SELECT size / 128.0 as fileSize,
FILEPROPERTY(name, 'SpaceUsed') / 128.0 as fileUsed,
CASE WHEN max_size = -1 OR max_size = 268435456 THEN -1 ELSE max_size / 128 END as fileMaxSize,
CASE WHEN growth = 0 THEN 0 ELSE 1 END as IsAutoGrow,
is_percent_growth as isPercentGrowth,
growth as fileGrowth,
SUBSTRING(physical_name,1,1) as drive -- Wrong-o, my friend. Wrong. O.
…
FROM sys.database_files
WHERE type = 1 and is_read_only = 0
[The comment is my embellishment :-)]. What it should be doing is comparing the file path to the mount points and realizing one might have a disk like E:\SQLData.
This actually makes two bugs with SCOM around mount points – the installer that creates SCOM’s own databases makes the same error, and complained that we didn’t have enough space on our data disk for its data. We were able to work around that one by claiming in the installer that we just wanted a 200 MB DB, then growing the database files after installation.
Other than this issue, I’m quite pleased with SCOM. Our implementation is still ongoing, but it’s going well.