THE SQL Server Blog Spot on the Web

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Windows XP Crash: Lessons Learned

Yesterday morning I had to deal with a non-bootable Windows XP machine. Every time it turned on, it would get to the Windows XP splash screen, sit there for a while, then flash a BSOD and restart -- the BSOD flashed just long enough to see that the screen was blue, and maybe the words "dump" or "kernel" if you looked fast enough. But not long enough to get any real data.

Nothing new had been installed on the machine, and it had booted fine the night before. Typical bit-rot situation. Very annoying.

This wasn't the first time I've ever had to deal with XP spontaneously deciding not to boot... I've had this happen on numerous occasions. And here's what usually happens: I throw in the XP CD ROM, boot it, and try to get it to launch auto-repair mode from the install screen. But nine times out of ten, that option doesn't show up. I'm not sure what makes that option show up or not, but apparently on my computers it just doesn't.

So at this point, I usually just shrug and re-install XP. I specify a new computer name and a new default user name so that none of the documents will be overwritten, and resign myself to a few days of re-installing all of the apps I use. One side benefit of this is that I now use very few apps!

But yesterday, that wasn't an option. It wasn't my computer, and documents on the computer needed to be completed by mid-afternoon for a major deadline. Ugh...

I booted off the CD ROM, and as usual the repair option didn't bother coming up. So I started Googling for some solutions... And found a few web pages with advice on how to get that option to come up.

Turns out, you need to use the Recovery Console for that. I'd booted into it a few times in the past, but never bothered learning how to use it...

Lesson 1: To boot into the recovery console, you need the administrator password. Oops. I hadn't written it down when I installed XP on this machine. I have about 10 different passwords I generally use, but you're only allowed 3 tries per boot. And each boot cycle takes a LOT longer than it needs to -- who knew there were so many different disk drivers required to start up Windows? (Anyone who's recently booted off the CD ROM knows exactly what I'm talking about.) On the fourth try, 30 minutes or so into this exercise, I finally figured out what the password was.

Into the recovery console I went, and after trying several different "tricks" from various web pages, and rebooting a bunch of times, the option still didn't appear. So I kept searching. Finally, when I was about ready to just re-install and try to very quickly get the needed documents back in shape for the deadline, I found this excellent, utterly-lifesaving article by Charlie White. Following his sage advice I went back into the Recovery Console where I backed up the registry, restored a recovery version, booted into XP (and strangely, after rebooting I had to use a different administrator password! I'm not sure why), recovered to a point from a few days ago, and rebooted the machine back into XP, good as it had been before the crash.

Lesson 2: If you search the Web, you will probably find someone who knows more about the subject than you do, and who can save you a lot of time. Re-installing was the way I knew to fix the problem; this is due solely to the fact that I've never bothered actually searching for a better solution before!

Great stuff. But why did it crash to begin with? I was unable to find any log entries or other diagnostic data, but I figured I should run a disk scan to check for issues (CHKDSK /R). And that made the problem instantly apparent. The scan reported a single bad cluster in the SOFTWARE registry file.

Lesson 3: If your hard disk starts making clicking noises, that means that bad things are about to happen! Turns out, this disk had been clicking for about a week before the crash.

... And were there any data backups? Of course not, this is a home computer! I still don't know what to do about that; another computer decided to crash this morning, so it's becoming painfully apparent that my house is cursed -- and I need a backup solution.

Lesson 4: As if I haven't learned this one about a million times before... Backup is key! But I need a real-time solution of some sort. I believe Microsoft is working on some sort of "data integrity server" (?) -- I'm not sure if that will be suitable for home networks or only targeted at enterprise users, though.
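In the meantime, even a crude scheduled copy would beat having nothing. Here is a minimal sketch in Python of the idea, not a real backup product: it walks a source folder and copies any file that is new or looks changed into a backup folder. The function name `mirror_changed`, the folder layout, and the "same size and not noticeably newer" test are all my own assumptions for illustration; it does not handle deletions, open or locked files, or versioning.

```python
import shutil
from pathlib import Path


def mirror_changed(src: Path, dst: Path) -> list:
    """Copy new or changed files from src into dst; return the relative paths copied."""
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        if target.exists():
            s, t = path.stat(), target.stat()
            # Assumption: a file with the same size whose source is not more than
            # 2 seconds newer counts as already backed up (the slack allows for
            # file systems with coarse timestamps).
            if s.st_size == t.st_size and s.st_mtime - t.st_mtime <= 2:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also copies the modification time
        copied.append(rel.as_posix())
    return copied
```

Run something like this from a scheduled task against a second physical disk or a network share, since a copy on the same clicking drive would not have helped here.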

Finally, I'd like to whine about the registry a bit. After this disaster, it's clear to me that a monolithic solution like the registry is just begging for problems. A single faulty cluster brought down the entire machine! I'm hoping we'll see some kind of solution for these types of issues in Longhorn.

Anyway, I'm now typing on the fixed computer, waiting for the other computer to finish its scan (two faulty clusters already found)... I hope everyone else is having a more productive weekend!


Published Wednesday, July 12, 2006 10:43 PM by Adam Machanic

Comments

 

k_nitin_r said:

I don't really know why, but the bad cluster issue is worse than it was in the old days. I have a 386 notebook PC (from AST... I think Samsung merged with or purchased it) that ran without bad sectors for about 7 years. A 486 desktop (assembled) that didn't have any bad sectors till I got rid of it. A Pentium-I Compaq Deskpro (2000 series) still working without any bad sectors after a little under a decade of service, though the monitor doesn't work any more. An IBM Thinkpad Pentium-II (600E) without bad sectors but with a BIOS that got corrupted last year (...and so has been put out of service). A Compaq Presario Pentium-III notebook (1800 series) with a hard disk that didn't have bad sectors during the 2 years that I had it... then a screen hinge broke and so I got rid of it.

Another Compaq Presario Pentium-IV notebook (Presario 2132) ran for 2 years before it started pausing for an extremely long time before reporting a read failure. A disk scan for bad sectors was so slow (it stopped for almost half a minute on each bad sector) that I didn't bother waiting for it to finish. The disk then developed intermittent failures, at which point I had to get it out, shake it (totally un-geekly), and put it back in. It ran in that intermittent state for a month before I got rid of it and replaced the Toshiba hard disk (I always thought Toshiba was good for notebook hard drives) with a new Hitachi hard disk, and it's still running perfectly after a year of use. I think the disk may have failed because I have power fluctuations, and power cuts too, in the area where I've been using this one. I run it using the AC adapter, as batteries usually stop working in 9-14 months for me.

I got an IBM Thinkpad (T60 series) last month and needless to say, there aren't any issues with it.

For me, having a hard disk fail before 5 years is something to be dissatisfied about since it doesn't usually happen to me, but a classmate had a hard disk failure on her Dell notebook after about 4 months, followed by a DVD drive failure 2 months later. It was a low-end notebook and she doesn't really care since it's just for the times when her USB disk isn't large enough to carry data back home to her desktop.

I run most of my computers 24x7, except for the notebooks that are on standby/hibernate/off when I carry them around (during software installations, I leave them on but leave them out of the case to avoid overheating). My PCs before the Compaq Presario 2132 (the one with the failed hard disk) were always within a temperature range of 18-30 C (room temperature, not CPU or system temperature), and were plugged into a 'clean' power supply wall socket (i.e. no fluctuations, spikes or sags).

When my hard drive failure occurred, I even thought about doing away with hard drives entirely. I downloaded a copy of Knoppix Linux and was on the verge of buying a USB flash disk (I still don't have one of these since my first one failed on me)...

For removable storage, I stopped using floppies and CD-RWs and relied entirely on CD-Rs and my old Nokia Communicator 9210i (for file transfer via IrDA - the serial cable caused frequent disconnects so I never used it).

August 3, 2006 11:03 PM
 

MK14 said:

solution: get a Mac.

April 1, 2010 10:55 AM
 

Adam Machanic said:

Right, because Macs never have hardware failures. Dream on.

April 1, 2010 11:14 PM

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.
