THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

Killing a SQL Server thread? Don’t!

Sometimes, when you kill a session (i.e. a spid) in a SQL Server instance, the spid just refuses to go away not because it’s doing a rollback. Perhaps, it’s stuck on a certain dependency on something external to SQL Server or it’s just simply stuck for some decipherable reasons. And the spid may hang around for as long as the instance is online and will only go away when the instance is restarted.

 

In more than a few occasions, I have heard people wondering if they can just kill the thread of the SQL Server process from the OS, that is, the OS thread corresponding to the SQL Server spid.

 

That is a very bad idea! Killing such an OS thread will likely crash the SQL Server instance.

 

You can easily demonstrate the harm of doing this with Process Explorer. You’ll need to run Process Explorer locally on the server of your test SQL Server instance, and needless to say that this should not be done on a production instance.

 

First, connect to the test SQL Server instance, and open a New Query window in SQL Server Management Studio. Note the spid for the connection, and run the following query (replacing <spid> with the spid for the connection) to find the kpid value in the resultset for the spid:

 

select * from master..sysprocesses where spid = <spid>

 

This kpid is the id the OS assigns to the thread that is currently bound to the worker executing a task or idling on the spid.

 

Then, fire up Process Explorer and locate the OS process for the SQL Server instance. If you have multiple SQL Server instances running on the server, you can mouse over each sqlservr.exe to identify the right instance.

 

Double click on the process, and select the Thread tab. This will give you a list of all the OS threads for the SQL Server instances. Highlight the Thread ID corresponding to the kpid found above, and click on the Kill button to terminate the thread.

 

After you have killed the thread from Process Explorer, you may still be able to use the SQL Server instance for a while. You’ll for sure see stack dump error messages in the errorlog, and you may see error messages such as the following:

 

2010-02-04 14:45:31.03 Server      Potential image corruption/hotpatch detected. This may be a sign of a hardware problem or caused by presence of CLR/jitted images on the stack. Check SQLDUMPER_ERRORLOG.log for details.

 

The session related to the kpid value would be dead. But the SQL Server instance itself may or may not continue to function. If you repeat the above steps to kill the threads for the other user sessions, eventually this will crash the SQL Server instance, i.e. the SQL Server instance will become unresponsive. After that, the only way to get the instance back online is to restart it. Most likely, you'll have to first kill the corresponding process from the OS because the instance probably won't respond to any of the normal restart methods.

 

 

 

Published Thursday, February 04, 2010 4:09 PM by Linchi Shea
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

Much more fun than killing a thread is to pause a thread. This can wreak all kinds of havoc, and no error message will be thrown :-)

February 4, 2010 4:20 PM
 

Linchi Shea said:

Adam;

You are ahead of me :-) I was going to post a challenge asking people whether they can identify what may be the problem using the usual SQL Server troubleshooting tools (e.g. DMVs, perfmon, profiler, whatever) when a thread corresponding to a spid is suspended from the OS.

February 4, 2010 4:42 PM
 

Adam Machanic said:

Actually I don't know how to identify the issue except by noticing that a session appears to be totally hung... Sounds like a good challenge.

The sad part is why I know this: I was working on a project where we had a certain query against a linked server that would, on a regular basis, "hang", and consume an entire CPU, even if left for days. The SPID in question could not be killed using KILL, and the business users were getting annoyed with the constant service restarts needed to bring things back to normal. So I decided to try suspending the thread, and for about 10 minutes I thought it had worked wonderfully, until I noticed that there was a huge backlog of sessions sitting there waiting on log buffer waits...

BTW, the fix for the linked server query turned out to be re-writing it so that it didn't use multiple remote query iterators. Not sure why that worked, but PSS was unable to get to the bottom of it and that is the one solution (found via trial and error) that made the issue go away.

February 4, 2010 5:02 PM
 

Mike Walsh said:

Good post and a good warning (though I wonder how many people are now knowledgeable about how to kill a thread that weren't before ;-) He really meant don't do it :) :)  )

Adam - what was the linked server product? We used to have that problem quite often with DB2 on an AS400, it was with certain query patterns and conditions. Ended up upgrading the DB2 provider to the version that comes with SQL 2008 (it was a 2005 environment and this fix came from the Host Integration Services team)... Fixed the problem but it was quite annoying that first time when the SELECT query was stuck in killed/rollback for 2 days before deciding to restart...

Linchi - I really look forward to that follow up post because I can't think of any way to tell and I'd like to see the answer. I have a few thoughts in my mind and maybe I'll try and test them out but all conjecture at this point.

February 4, 2010 11:09 PM
 

fediori@gmail.com said:

Have you tried to :

1) First kill the spid in SQLSERVER

2) If the spid refuses to go away, then kill the thread kpdi in Process Explorer

Many years ago I was working with Oracle 7 on Unix, it was standard practice : first kill session in db, if is not enough then kill process in OS.

Please, if you can , try it up and let me know.. I have no possibility to do this kind of tests

Regards

March 5, 2010 8:58 AM
 

Linchi Shea said:

Fediori@gmail.com;

It doesn't matter whether you kill the spid in SQL Server first or not. Killing the thread from Process Explorer will most likely crash SQL Server.

April 29, 2010 12:05 AM
 

David BAFFALEUF said:

Hi Linchi,

Interesting post. And thanks Adam for the feedback. I guess by joining sys.dm_os_threads and sys.dm_os_workers you can see whether quantum_used and tasks_processed_count are still going up for this particular user thread. I'd be interesting in anyone being able to see what is the last_wait_type on the worker, my ProcessExplorer crashes when trying to load information from the threads.

David B.

May 31, 2010 9:47 AM
 

Tarun said:

KILL statement writes on SQL log what spid was killed and the time stamp.

Is there way to get more information like the SQL statement that was killed by using the SPID and timestamp from any DMVs or other tables/views/fn?

June 2, 2010 5:29 PM
 

Tarun said:

My environment is SQL Server 2005.

June 2, 2010 5:30 PM
 

retracement said:

good post.

Adam might play with your suggestion sometime next time I have a few minutes to spare because it sounds like it could be mildly amusing. Apart from being a diagnostic headache, I'd be surprised if pausing the thread would cause instability to SOL (as long as it did actually get resumed) but will have to see for myself. I'm a little bit disappointed that killing the spid thread is so dangerous (although can see why it would be bad practice), and this behavior suggests to me that either :-

The OS thread is not entirely equivalent the SQL spid

-or-

The Engine has a weakness that perhaps it shouldn't.

I know the SQLOS adds a layer of complication so perhaps I'm being too simplistic.

June 7, 2010 10:33 AM
 

HerbertDBA said:

Linchi and Tarun both having the same issue.

Some things never change.

Herb

August 26, 2010 2:51 PM
 

James Anderson said:

Came across this thread as everynow and again I find I have a process that gets stuck and eventually ends up with a service restart, this happens about 3-4 times a year. I know why it happens but its not easy to code around without taking a hefty performance hit so we've implemented a failover system instead.

Anyway thought I'd run a small experiment with process explorer in a test lab...

5/10 times the SQL server immediately crashed, became unresponsive when trying to restart it in SQL configuration manager. I had to reboot the OS.

3/10 times the spid did get cancelled but within approximately 20-30 seconds the SQL server slowed and became unresponsive to queries. However it was responsive to a service restart from the SQL configuration tab.

2/10 times it actually worked without any noticeable consequences.

So 80% of the time it resulted in a service restart of my test SQL server (SQL 2012, windows 2012), not great odds if anyone was ever thinking about doing it to get round a service restart.

James

October 16, 2014 11:20 AM
 

Best proceed to transform user rankings but murdering a server | Zeolla said:

November 21, 2014 11:28 PM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement