I got some positive feedback on the last post, All Those Extra Bits We Need for DR, so I wanted to put up another technique I’m using to coordinate DR servers – monitors for missing objects. I’ve got two techniques for this, each with pros and cons, and to be honest I haven’t really decided which makes the most sense. The basic idea is to report about what objects on a primary server (jobs, logins, etc.) are missing from a failover partner, so that a DBA can investigate and take some action. For the examples, I’m going to use SQL Agent jobs as the tracked objects, but this should work for just about any object type that has unique names.
Framework
In order to implement this solution, we need a small admin database as indicated in the previous post. In that database, I’ll create two tables to track missing objects, one that simply records the date a server was checked, and a second related table that will show a list of missing jobs, if any, at that time. Finally, there’s a view that makes it simple to get just the last state, which can be incorporated into a monitoring job or alert:
CREATE TABLE [dbo].[DR_JobValidation] (
[asOfDate] [datetime] NOT NULL,
CONSTRAINT [PK_DR_JobValidation]
PRIMARY KEY CLUSTERED ( [asOfDate] ASC )
)
GO
CREATE TABLE [dbo].[DR_JobValidation_MissingJobs] (
[asOfDate] [datetime] NOT NULL,
[missingJobName] [varchar](255) NOT NULL
)
GO
ALTER TABLE [dbo].[DR_JobValidation_MissingJobs]
WITH CHECK ADD CONSTRAINT [FK_DR_JobValidation_MissingJobs_DR_JobValidation]
FOREIGN KEY([asOfDate])
REFERENCES [dbo].[DR_JobValidation] ([asOfDate])
ALTER TABLE [dbo].[DR_JobValidation_MissingJobs]
CHECK CONSTRAINT [FK_DR_JobValidation_MissingJobs_DR_JobValidation]
GO
CREATE VIEW [dbo].[DR_MissingJobs] AS
SELECT asOfDate,
missingJobName
FROM dbo.DR_JobValidation_MissingJobs
WHERE asOfDate = ( SELECT MAX(asOfDate)
FROM dbo.DR_JobValidation
)
GO
This format for storing the data allows some basic tracking about when items were out of sync and when they were fixed. The amount of history to store is up for grabs.
If the view is empty, then it’s implied that no jobs are missing at the DR server as of the last time it was checked; if the view returns rows, then those are the jobs that need investigation.
Setup complete, now on to two possible methods for gathering the data.
Linked Server Solution
Disclaimer: I’m going to lead this section with the fact that I don’t really like linked servers. Linked servers do function, but in my world they have caused quite a lot of bad habits to proliferate, and the bad habits have in turn led to administrative headaches. In a nutshell, I carefully avoid them because:
- Linked servers can encourage the use of literal server names or four-part names in code. To me that’s definitely a code smell; the sweet spot for T-SQL identifiers is the two-part name. Specific, but not too specific. Both servers and individual databases should have their names abstracted in code whenever possible. Yes, I suppose aliases are possible as a workaround.
- For practical purposes, linked servers demand Kerberos (SSMS running on my workstation, to the server, to another server should be SOP for administration if using linked servers). Kerberos is OK, but often either hasn’t been set up, or remains mysterious, in many smaller shops. Mapping logins to avoid Kerberos opens a whole nasty can of security worms.
- Linked servers in the hands of users can create the illusion that there’s “one big server” where you can perform cross-server queries and joins at will. The users typically don’t understand that query optimization is normally working for them, much less how ugly optimization looks across a linked server. So, my personal nightmare in Linked-server-land takes this form:
SELECT a.col1, a.col2, b.col1, b.col2
FROM literalServerName.somedatabase.dbo.hugeTable1 ht1
INNER JOIN anotherLiteralServerName.somedatabase.dbo.hugeTable2 ht2
ON ht1.foo = ht2.bar
-- user’s perms escalated to sa on both machines, via ham-fisted linked server logins
- On the other hand, when not abused, linked servers can also be … useful. So I’m breaking my own rule here, because this is just a simple administrative task.
So, we can definitely pop over to the DR server and check the MSDB database for a list of jobs using a linked server. One trick to keeping the literal-server-name issue at bay is to create the linked server objects with an aliased name, not the literal server name of the other SQL Server. This allows the code to be consistent on any server, because the linked server object has the same name no matter what it connects to. If using the GUI, it means to avoid that tempting “SQL Server” radio button, and instead specify the SQL Native Client as the provider, and then a consistent name such as “FailoverPartner” as a server name alias:
EXEC master.dbo.sp_addlinkedserver @server = N'FAILOVERPARTNER',
@srvproduct = N'sql_server', @provider = N'SQLNCLI10',
@datasrc = N'<theOtherSQLServer>'
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname = N'FAILOVERPARTNER',
@useself = N'True', @locallogin = NULL, @rmtuser = NULL,
@rmtpassword = NULL
Once the linked server is in place (and perhaps some permissions adjustment), we can make a simple stored proc to go and compare the job lists on the two servers:
CREATE PROCEDURE [dbo].[recordMissingDRAgentJobs] AS
BEGIN
SET XACT_ABORT ON ;
DECLARE @runtime datetime ;
SET @runtime = GETDATE() ;
BEGIN TRAN
INSERT INTO dbo.DR_JobValidation
( asOfDate )
VALUES ( @runtime ) ;
INSERT INTO dbo.DR_JobValidation_MissingJobs
( asOfDate,
missingJobName
)
SELECT @runtime,
myJobs.name
FROM msdb.dbo.sysjobs myJobs
WHERE NOT EXISTS ( SELECT 1
FROM FAILOVERPARTNER.msdb.dbo.sysjobs DRJobs
WHERE DRJobs.name = myJobs.name ) ;
IF ( @@error != 0 )
BEGIN
ROLLBACK ;
RAISERROR( 'Failed to validate job list at DR server', 11, 1 ) ;
RETURN ;
END
COMMIT ;
END
GO
Into a SQL Agent job goes a call to that proc, and a step to delete old rows from the tables.
Plus: very simple T-SQL code
Minus: requirement to set up and maintain linked server objects on every instance
PowerShell Solution
If you like PowerShell (and I loooves me some @POSH, despite being a rank amateur :-) there’s a no-linked-server method that works equally well. Here’s a fairly straightforward script that makes two SMO-based connections to two servers and compares the job lists. The results are then bulk-copied into the primary server’s admin database:
param( $sqlServerName = $(throw 'Parameter $sqlServerName is required'), `
$DRSqlServerName = $(throw 'Parameter $DRSqlServerName is required')
)
# Name of the database where the output from this script will be stored
# by default use the a prefix and the server name, but replace \ with _
# in case of named instance:
$adminDBName = 'DBAdmin_' + ( $sqlServerName -replace '\\', '_' )
$execTimeTable = 'dbo.DR_JobValidation'
$missingJobsTable = 'dbo.DR_JobValidation_MissingJobs'
$scriptTime = (get-date)
# Create data tables in which to stage the results. One just records the execution time
# for this script, the other lists missing job names, if any
$execTimeDT = New-Object System.Data.DataTable
$execTimeDT.Columns.Add( ( New-Object System.Data.DataColumn asOfDate, ([DateTime]) ) )
$newRow = $execTimeDT.NewRow()
$newRow["AsOfDate"] = $scriptTime
$execTimeDT.Rows.Add( $newRow )
$jobsDT = New-Object System.Data.DataTable
$jobsDT.Columns.Add( ( New-Object System.Data.DataColumn asOfDate, ([DateTime]) ) )
$jobsDT.Columns.Add( ( New-Object System.Data.DataColumn missingJobName, ([String]) ) )
# We're going to do a hash join to compare the lists of jobs from the two servers.
# Why? For fun, of course!
# Create an instance of SMO.Server connected to the DR SQL server
$DRsrv = New-Object Microsoft.SqlServer.Management.SMO.Server $DRSqlServerName
# Fetch the list of jobs currently set up on the DR server into a set
$DRJobList = New-Object System.Collections.Generic.HashSet[System.String]
$DRsrv.JobServer.Jobs | ForEach {
$DRJobList.add( $_.name )
}
# Create an instance of SMO.Server connected to the primary SQL server
$srv = New-Object Microsoft.SqlServer.Management.SMO.Server $sqlServerName
# For each job on the primary, check for the equivalent in the DR server set
$srv.JobServer.Jobs | ForEach {
if( !$DRJobList.contains( $_.name ) ) {
$newRow = $jobsDT.NewRow()
$newRow["AsOfDate"] = $scriptTime
$newRow["missingJobName"] = $_.name
$jobsDT.Rows.Add( $newRow )
}
}
# That gives us a "not exists" result set, using a hash join -- the job names in
# $jobsDT are the ones that exist on the primary server but not on the DR server
# Write the results out to the admin database
$destCon = "Data Source=$sqlServerName;Initial Catalog=$adminDBName;Integrated Security=True;"
$bulkCopier = New-Object Data.SqlClient.SqlBulkCopy $destCon
$bulkCopier.DestinationTableName = $execTimeTable
$bulkCopier.WriteToServer( $execTimeDT )
$bulkCopier.DestinationTableName = $missingJobsTable
$bulkCopier.WriteToServer( $jobsDT )
Plus: No server object, opportunity to hone PowerShell mad skillz
Minus: Requirement to hone PowerShell mad skillz
Important Note: If using the PowerShell-flavor method, pay close attention to the permissions of the account under which the script runs. It will need permissions for SMO to enumerate all the jobs on both servers, but in some permissions scenarios it might enumerate only some of the jobs and not all, despite reporting success. In that case jobs could be falsely reported as “missing,” when in reality they are present on the other server -- they were just missed in the listing of jobs due to permissions under which the script runs.