Recently I had to take care of the most interesting corruption case I’ve even seen, so decided to share this experience with you. We’re talking about small accounting program which keeps its data in SQL Server Express – in this particular case in SQL Server 2005. The customer called today and sent me following error screen (nice screenshot – taken with cellular phone camera
):

Upon connecting to the server I’ve immediately noticed dumps with the same error. Here is entire error message:
A time-out occurred while waiting for buffer latch -- type 2, bp 04268450, page 1:804, stat 0xc00009, database id: 5, allocation unit Id: 72057594108248064, task 0x00A186B8 : 0, waittime 300, flags 0x1a, owning task 0x00A0A4D8. Not continuing to wait.
This article was extremely helpful in analyzing this dump with WinDbg tool – flow and somewhat cryptic commands described there easily pinpointed the guilty thread and its call stack:
sqlservr!LatchBase::AcquireInternal
sqlservr!BUF::AcquireLatch
sqlservr!BPool::Get
sqlservr!PageRef::Fix
sqlservr!IndexPageManager::GetPageForLinkModification
sqlservr!RemoveBTreePageIfUnchangedInternal
sqlservr!RemoveBTreePageIfUnchanged
sqlservr!CleanVersionsOnBTreePage
sqlservr!IndexDataSetSession::CleanupVersionsOnPage
sqlservr!GhostExorciser::CleanupPage
sqlservr!TaskGhostCleanup::ProcessTskPkt
sqlservr!GhostRecordCleanupTask
sqlservr!CGhostCleanupTask::ProcessTskPkt
sqlservr!TaskReqPktTimer::ExecuteTask
sqlservr!OnDemandTaskContext::ProcessTskPkt
sqlservr!SystemTaskContext::ExecuteFunc
sqlservr!SystemTaskEntryPoint
sqlservr!OnDemandTaskContext::FuncEntryPoint
sqlservr!SOS_Task::Param::Execute
sqlservr!SOS_Scheduler::RunTask
sqlservr!SOS_Scheduler::ProcessTasks
sqlservr!SchedulerManager::WorkerEntryPoint
sqlservr!SystemThread::RunWorker
sqlservr!SystemThreadDispatcher::ProcessWorker
sqlservr!SchedulerManager::ThreadEntryPoint
msvcr80!_callthreadstartex
msvcr80!_threadstartex
Highlighted part of the call stack indicates that Ghost Cleanup process caused this failure. Actually, I had a similar encounter with stuck ghost cleanup several years ago. The easiest way to verify that it is still stuck is to execute DBCC CHECKDB command. Indeed CHECKDB appeared to be blocked by ghost cleanup session.
What’s next? You can’t kill system session. Indeed you can’t by you can start SQL Server without it – using trace flag 661 as startup parameter (don’t forget to remove flag and restart service again if you use this flag). So, after restarting service, rebuilding index in question, removing trace flag and restarting service again, I’ve already thought that I’ve earned my beer database is fixed.
Not so fast. Now DBCC CHECKDB succeeds to complete but results are very very red. Including interesting messages like:
Msg 8992, Level 16, State 1, Line 1
Check Catalog Msg 3853, State 1: Attribute (object_id=1575689407) of row (object_id=1575689407,column_id=1) in sys.columns does not have a matching row (object_id=1575689407) in sys.objects.
Msg 8992, Level 16, State 1, Line 1
Check Catalog Msg 3853, State 1: Attribute (object_id=1575689407) of row (object_id=1575689407,column_id=2) in sys.columns does not have a matching row (object_id=1575689407) in sys.objects.
Msg 8992, Level 16, State 1, Line 1
Check Catalog Msg 3853, State 1: Attribute (object_id=1575689407) of row (object_id=1575689407,column_id=3) in sys.columns does not have a matching row (object_id=1575689407) in sys.objects.
Msg 8992, Level 16, State 1, Line 1
Check Catalog Msg 3853, State 1: Attribute (object_id=1575689407) of row (object_id=1575689407,index_id=0) in sys.indexes does not have a matching row (object_id=1575689407) in sys.objects.
Msg 8992, Level 16, State 1, Line 1
Check Catalog Msg 3855, State 1: Attribute (data_space_id=1) exists without a row (object_id=1575689407,index_id=0) in sys.indexes.
Nice, uh? After first wave of shock has passed, I checked and found out that indeed object with that id doesn’t exist. So it seems that all we need to do is to delete 3 rows from sys.columns and 1 row from sys.indexes. Ah, but those sys.something objects are views, aren’t they? And what are the real objects? The way to find real – internal – tables and columns is via execution plans:

So actually we need to delete rows from sys.syscolpars and from sys.sysidxstats (notice that column names are also different). How do we do it? Let’s try DAC (Dedicated Admin Connection)? No way – Express Edition doesn’t support DAC. Unless… unless we use trace flag 7806 as startup parameter.
Restart server again, connect using DAC, try to delete rows… Oops,
Msg 259, Level 16, State 1, Line 1
Ad hoc updates to system catalogs are not allowed.
But for this we have Paul Randal’s instructions. So: “ sqlservr.exe -sInstanceName -m -T661 –T7806 “, then “ sqlcmd -S.\InstanceName /A ” and finally…
1: DELETE FROM sys.syscolpars WHERE id = 1575689407 AND number = 0
2: GO
3: DELETE FROM sys.sysidxstats WHERE id = 1575689407 AND indid = 0
4: GO
Now stop the server, remove all trace flags, start server in a normal way and verify that DBCC CHECKDB returns nothing. Bingo! And… well deserved my own home-brewed Scottish Ale!
