THE SQL Server Blog Spot on the Web

Linchi Shea

Checking out SQL Server via empirical data points

SQL Server puzzle: Why are the SQL Agent jobs executing twice?

I recently ran into a rather freakish case in which many SQL Agent jobs on a SQL Server 2005 instance were reported to run twice at their scheduled times. The second run took place either at almost exactly the same time as the first or only a second or two later.


Upon further examination, I could confirm the following:


  • The jobs did indeed run twice. No hallucination here! It was not just that the job history showed two entries for each run; the jobs actually ran twice as often as their schedules would normally allow.
  • A SQL trace revealed that the related queries (e.g. queries checking the job status, retrieving the job info, and updating the job history) all came from the same server locally. So it was not the case that msdb had been copied somewhere else while still pointing back to this server. Nor was it the case that jobs on some other server had schedules identical to the jobs on this server and were kicking off the jobs on this server.
  • Stopping SQL Agent did not stop all the SQL Agent traffic seen in the SQL trace. Nor did it prevent the jobs from being executed, though they were no longer being executed twice.
  • For a long-running job whose output file was specified, the 2nd run would often fail because it could not get hold of the output file, which was further evidence that a 2nd attempt was indeed made to run the job at the scheduled time.
  • Not all the scheduled jobs ran twice. The jobs that did not run twice appeared to be the ones whose durations were extremely short.
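
The first observation above, the same job starting twice within a second or two, can be checked mechanically once the start times are in hand. The following is a minimal sketch rather than the method used in this investigation. It assumes the start times have already been collected as (job name, start time) pairs, for example from the job history in msdb. The job names, the sample timestamps, and the 5-second window are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_double_runs(runs, window_seconds=5):
    """Return (job_name, first_start, second_start) for every pair of
    consecutive runs of the same job that started within window_seconds."""
    by_job = defaultdict(list)
    for job_name, started_at in runs:
        by_job[job_name].append(started_at)

    window = timedelta(seconds=window_seconds)
    suspects = []
    for job_name, starts in by_job.items():
        starts.sort()
        # Compare each start time with the next one for the same job.
        for earlier, later in zip(starts, starts[1:]):
            if later - earlier <= window:
                suspects.append((job_name, earlier, later))
    return suspects


# Hypothetical sample data: one job started twice a second apart,
# another job ran once per day as scheduled.
sample_runs = [
    ("NightlyBackup", datetime(2010, 8, 25, 1, 0, 0)),
    ("NightlyBackup", datetime(2010, 8, 25, 1, 0, 1)),
    ("IndexMaintenance", datetime(2010, 8, 25, 2, 0, 0)),
    ("IndexMaintenance", datetime(2010, 8, 26, 2, 0, 0)),
]

for job_name, first, second in find_double_runs(sample_runs):
    print(job_name, first, second)
```

A check like this only flags suspicious pairs of start times. It says nothing about why the second run happened.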


In addition, I googled for any reports of similar behaviors out there in the community, and did find a few reported cases. But none of them reported the root cause.


As I mentioned at the beginning, this is a rather freakish case and I don’t expect you to run into it. But it is still interesting from a troubleshooting perspective, and that’s why I think it’s worth sharing the story here.


Now, with the information given above, can you guess what may have caused this behavior? I’ve included the root cause at the bottom of this post. Note that the same observed behavior may have other root causes, and what I encountered may be just one of them. Take a minute to think about it before you scroll to the bottom.















In this particular case, the SQL Server instance was a single instance running in a two-node cluster, and after poking around I eventually discovered that the SQL Agent service was still running on the inactive node (say node B). In other words, both nodes had the SQL Agent service running at the same time for the same SQL Server instance! Furthermore, it looked as though the SQL Agent service was never stopped on node B when the SQL instance was failed over from node B to node A.


This was not supposed to happen at all, and I had never seen this happen before. When failing over from node B to node A, the cluster service is supposed to ensure that SQL Server and SQL Agent are stopped on node B; that is how the server cluster is designed to function. Note that you can’t just start SQL Agent on node B (outside of the cluster) if SQL Server has been failed over to node A, because the SQL Agent service depends on the SQL Server service, and the SQL Server service cannot run on node B when all the system databases are on node A.


I don’t know the deeper root cause of why SQL Agent was able to continue to run on node B. But as soon as I killed the SQL Agent process from the OS on node B, the jobs stopped being executed twice.
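
The check that exposed the problem, whether the SQL Agent service is running on more than one node at once, can also be sketched in code. This is an illustration under stated assumptions, not the procedure used here. It assumes the service status of each node has been captured as text in the layout printed by the Windows `sc query` command. The node names and sample text are hypothetical, and `SQLSERVERAGENT` is assumed to be the service name (a named instance would use a different one).

```python
import re


def service_state(sc_output):
    """Extract the state name (e.g. 'RUNNING' or 'STOPPED') from text in the
    layout of `sc query` output. Returns None if no STATE line is found."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def nodes_running_agent(outputs_by_node):
    """Return the sorted names of nodes whose captured output reports RUNNING."""
    return sorted(
        node
        for node, text in outputs_by_node.items()
        if service_state(text) == "RUNNING"
    )


# Hypothetical captured output, e.g. from running a status query against each
# node; the text below only imitates the assumed `sc query` layout.
RUNNING_SAMPLE = """
SERVICE_NAME: SQLSERVERAGENT
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
"""

STOPPED_SAMPLE = """
SERVICE_NAME: SQLSERVERAGENT
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 1  STOPPED
"""

running = nodes_running_agent({"nodeA": RUNNING_SAMPLE, "nodeB": RUNNING_SAMPLE})
if len(running) > 1:
    print("SQL Agent is running on more than one node:", ", ".join(running))
```

In a healthy cluster, only the active node should report the service as running, so more than one name in the result is the warning sign.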

Published Wednesday, August 25, 2010 6:13 PM by Linchi Shea



AaronBertrand said:

Whoa, that is weird.  I have never seen this happen either.  Was this symptom noticed shortly after applying a service pack or cumulative update?  Just wondering if the cluster got confused when one of the nodes was being updated...

August 25, 2010 9:32 PM

Linchi Shea said:

Hi Aaron;

No, there was no patch update on the cluster. I have a hunch about what might have caused it, but just haven't got time to test and verify. Definitely, the cluster was confused for whatever reason.

August 26, 2010 12:47 AM

Texdanny said:

Yes, that has happened to us before. We tried everything, and in the end we had to delete the job and recreate it from scratch, and now all is well.  It made us look pretty bad, and we are still troubleshooting to prevent a recurrence.

August 28, 2010 9:54 PM

NebraskaPaul said:

We had the same issue.  I never could nail down the root cause, but did the same thing as TexDanny.  Deleted the job, recreated from scratch and all was well.

August 30, 2010 10:23 AM

Check said:

This may be over three and a half years old, but this just solved an ongoing mystery for me too.  Thanks a ton!

March 18, 2014 3:14 PM

Lucas Benevides (DBA Cabuloso) said:

I use SQL Server 2008 R2 with a failover cluster and this happened to me. It took me two whole days to find the cause, and I found it thanks to this post. It is quite a huge BUG. We have also worked with clusters for years and had never seen anything like this.


April 15, 2014 2:10 PM

Valentino Vranken said:

Just wanted to mention that a similar phenomenon occurs when two identical jobs are running in two different instances on the same server. Lots of fun guaranteed when a) you're not aware of that second instance and b) the job is launching SSIS master packages which launch loads of other packages...

The result was duplicate package execution logs in the SSIS log table but the job history system tables only showed one execution. And weird phantom errors which occurred because those packages were not supposed to run twice at the same time. Felt like going crazy until someone mentioned "wait a minute, there's this other instance"... Doh.

Thanks for putting me on the right track with this post before all my hair was gone!

March 18, 2015 9:40 AM

Jon Fox said:

I just had the same situation as Valentino above (except the SQL instances were on separate servers rather than the same) and this post helped me uncover the issue. Thanks!!!

March 1, 2016 3:23 PM

Vikas said:

Can you please confirm the process for stopping the SQL Agent on the other node or instance, as I can't find a way to do this?

February 2, 2017 3:49 AM