THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Sarah Henwood

Connection Failures, SQL 2005 Appears Unresponsive

Here is a situation one of my client's recently encountered on SQL Server 2005.  It was incredibly painful and a challenge to determine what was happening, plus it's an issue that is supposed to be fixed in SP2.  So wanted to share the details of the symptoms in case you experience it too.

 My client is on a SP2+ build.  They recently conducted a successfull migration from SQL Server 2000 to 2005.  They have a very high heavy rate of OLTP activity.  Things were chugging along just fine for a couple of hours after migration, and then all of a sudden web servers started throwing connectivity errors.  A local connection to the server could not even be made.  After establishing a connection using the DAC, first thing checked was the state of the SPID's on SQL - I fully expected to see excessive waits due to blocking or some other type of wait.  However, in this case it was strange - everything looked as I would expect for a normal functioning system - a small handful of SPID's were waiting on something with short waits you would expect to see at any period in time, but nothing excessive or abnormal looking (no high cpu, no indication of any runaway or expensive queries etc.)  The majority of SPID had a zero waittype and none were waiting on working thread.  This is a very key symptom of the issue we encountered - that SPID do not look problematic and you do not have a blocking or other wait type of bottleneck.  There also are no network connectivity issues or anything else that is typical of this type of symptom. 

We then issued a DBCC FREESYSTEMCACHE ('TokenAndPermUserStore')  and then immediately, the issue cleared and connections could be made to SQL and processing continue.  (Some notice resolution with DBCC FREEPROCCACHE - it is because freeing the procedure cache also dumps this store.  For this issue, you only need to dump the tokenandpermuserstore.)  I was very surprised since my client was already on SP2.  So it appears there still is a different flavor of this issue out there with some varying symptoms.  (RE:  http://support.microsoft.com/kb/927396)

 My client has a support case open with Microsoft and is still being investigated and debugged.  In the meantime, the workaround was to monitor the size of the TokenAndPermUserStore and free it when it got over 40MB.  Initially we started with a size of 100MB but still sporadically encountered the problem.  (I suspect it may be more related to number of entries, but size is what we were able to work around with.)  Below is a query you can use that will report the size and number of objects in the store if you want to track and trend.  You can add in additional logic if you need to free the cache if it is over a certain size or number of objects.

  SELECT SUM(single_pages_kb + multi_pages_kb) as CacheSize, entries_count,entries_in_use_count
  FROM sys.dm_os_memory_clerks
  WHERE name = 'TokenAndPermUserStore'

 Hope this helps!  And I will post with any new updates.


Published Monday, July 23, 2007 6:30 AM by sarahhen

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Jason Haley said:

July 24, 2007 8:26 AM
 

Matt Stanford said:

We're experiencing the exact same issue and we are able to mitigate it using DBCC FREESYSTEMCACHE ('TokenAndPermUserStore').  Were you able to get anywhere with Microsoft?

October 4, 2007 2:59 PM
 

DT said:

We are experiencing somewhat similar problems.  We have recently upgraded to a SQL 2005 box, and eventually the database becomes unresponsive.  Our sproc and application code on SQL 2000 worked fine.

We however relieve the pressure by running "dbcc freeproccache".  this is not the optimal solution however.  We thought it might have to do with high load excessive temp table usage, and we removed temp tables out of our sprocs, but no luck.

We do have 128GB of RAM installed on the machine.  Could that be a problem?  We are tempted to remove RAM out of the machine so that in theory SQL server can pump query plans out of the cache at a faster rate.  I am wondering if is possible that as the number of query plans increase, SQL server takes longer to find an open memory to write to?

Has anyone tried removing memory of their server to see what happens?

November 28, 2007 11:32 AM

Leave a Comment

(required) 
(optional)
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement