THE SQL Server Blog Spot on the Web


The Rambling DBA: Jonathan Kehayias

The random ramblings and rantings of a frazzled SQL Server DBA

A Major Lesson to Learn from 2009 – Don’t Just Backup, Test Recovery

This year is almost over, and it’s sad and unfortunate that I can write a blog post about this topic, but there have been so many major examples this year of data loss from missing, damaged, or otherwise unusable backups.

The year started out with a rollover from last year’s JournalSpace.com disaster, which resulted in the complete loss of all of the blogs that existed on the site and their associated postings.  This was hotly discussed at the beginning of the year, when the owners of JournalSpace announced that all attempts at recovery had failed and that they were not interested in trying to resurrect the site from scratch.

JournalSpace Drama: All Data Lost Without Backup, Company Deadpooled
Brent Ozar - Why Backup? Ask JournalSpace

Not too long after this, at the end of January, the social bookmarking site Ma.gnolia suffered data loss and corruption that ultimately led to its demise.

Ma.gnolia Suffers Major Data Loss, Site Taken Offline
VOD-cast explaining the catastrophic nature of the data loss
Lessons Startups (and Users) can Learn From Ma.gnolia’s Crash
Brent Ozar - Backup Fail: Ma.gnolia goes under

Just as it became apparent that the Ma.gnolia crash had caused unrecoverable data loss that ended the site’s first implementation, it was announced that online backup company Carbonite had filed suit against its suppliers for the loss of customer data stored in its cloud-based backup solution in 2007:

Online Backup Company Carbonite Loses Customers’ Data, Blames And Sues Suppliers (Updated)
Brent Ozar - Another backup failure: Carbonite
Brent Ozar - More On the Carbonite Backup Failures

Things were quiet for the most part over the summer, but then the Sidekick data loss issue occurred in October:

T-Mobile Sidekick Disaster: Danger’s Servers Crashed, And They Don’t Have A Backup

Then, most recently, online personality Jeff Atwood suffered a crash of his hosted virtual machines and, without a backup, significant data loss, which he documented on his blog after getting it back online:

International Backup Awareness Day

A number of these were covered by Brent Ozar on his blog, but even Brent wasn’t able to escape the year without his own tale of failed backups in his virtual testing environment, though Brent’s scenario resulted in no actual data loss:

Brent Ozar - Bad News, Good News, Worse News

However, based on the events that transpired, Brent was able to put together a number of excellent blog posts that covered the topic of backups:

Brent Ozar - Why Are You Backing Up?
Brent Ozar - Adding Reliability to Your Infrastructure
Brent Ozar - Mirrors aren’t backups

The major takeaway from all of these problems is that you don’t have a backup until you have tested recovering from it, and there is no substitute for an actual, “cold” backup of your environment from which you can recover to a known point in time.  Keep in mind that this test doesn’t have to be run on equivalent hardware; a cheap and easy way to test your backups is to restore them to a low-cost commodity server or, in the worst case, a desktop that has sufficient disk space to accept the restore.  At least with this approach you know that your backup will be good when you need it the most.
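To make the "restore it somewhere cheap and check it" idea concrete, here is a minimal sketch in Python. It uses SQLite rather than SQL Server, only because SQLite ships with Python and keeps the example self-contained; on SQL Server the equivalent steps would be a RESTORE DATABASE to the scratch machine followed by DBCC CHECKDB. The function names, file names, and the "expected tables" check are assumptions made for illustration, not anything taken from the incidents above.

```python
import os
import sqlite3


def make_backup(source_path, backup_path):
    """Copy a live SQLite database to backup_path using the online backup API."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def verify_restore(backup_path, scratch_dir, expected_tables):
    """Restore the backup into scratch_dir and report whether it is usable.

    Returns True only if the restore succeeds, the integrity check passes,
    and every table named in expected_tables exists in the restored copy.
    """
    restored_path = os.path.join(scratch_dir, "restored.db")
    backup = sqlite3.connect(backup_path)
    restored = sqlite3.connect(restored_path)
    try:
        # The "restore": copy the backup onto the scratch location.
        backup.backup(restored)
        # PRAGMA integrity_check yields a single row containing "ok"
        # when no corruption is found.
        status = restored.execute("PRAGMA integrity_check").fetchone()[0]
        tables = {
            row[0]
            for row in restored.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        return status == "ok" and set(expected_tables) <= tables
    except sqlite3.Error:
        # Any failure to read or restore the backup means it is not usable.
        return False
    finally:
        restored.close()
        backup.close()
```

The point of the sketch is that the check runs against the restored copy, not against the backup file or the live database: a backup that merely exists, or that was written without error, has not yet proven it can be recovered.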

As a DBA, my number one priority is ensuring that I have good, valid backups that support my business SLAs.  Every Monday morning I manually run a report that checks my servers’ backup information using Multi-Server Query in SSMS 2008 to ensure that the weekend’s backups completed successfully.  If I find a problem, it becomes my immediate focus until the report returns valid information.  After that, I generally look into why I wasn’t notified of the failed backup, if one actually occurred, because I have automated alerting set up that should trigger an email notification to me if backups fail on my servers.  To date I have been lucky in that I work for an employer that understands the importance of backups.  However, I have done consulting work for a handful of customers where backups just aren’t a priority, despite having had the risk explained to them.
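The logic behind a Monday-morning check like this can be sketched in a few lines of Python. This is not the actual report, which isn’t shown here; it is an illustration that assumes the backup history has already been collected from each server (on SQL Server that history is recorded in msdb.dbo.backupset) into rows of server name, database name, and last backup finish time. The function name, row layout, and 24-hour threshold are all hypothetical.

```python
from datetime import datetime, timedelta


def find_stale_backups(rows, now, max_age=timedelta(hours=24)):
    """Return one message per database whose latest backup is missing or too old.

    rows: iterable of (server, database, last_backup_finish) tuples, where
          last_backup_finish is a datetime, or None if no backup is on record.
    now:  the time the report is run.
    """
    problems = []
    for server, database, finished in rows:
        if finished is None:
            problems.append(f"{server}/{database}: no backup on record")
        elif now - finished > max_age:
            problems.append(
                f"{server}/{database}: last backup finished "
                f"{finished:%Y-%m-%d %H:%M}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical weekend history, checked on a Monday morning.
    history = [
        ("SQL01", "Sales", datetime(2009, 12, 20, 23, 0)),
        ("SQL01", "HR", datetime(2009, 12, 18, 23, 0)),
        ("SQL02", "Web", None),
    ]
    for line in find_stale_backups(history, now=datetime(2009, 12, 21, 8, 0)):
        print(line)
```

An empty result is the only acceptable outcome; anything else is the problem that becomes the immediate focus. Treating "no backup on record" as a failure matters as much as the age check, since a database that was never added to the backup job will not trigger a failed-job alert.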

Hopefully you’ve learned from the mistakes of this year’s disasters; I know that I have.

Published Wednesday, December 23, 2009 1:25 AM by Jonathan Kehayias

Comments

 

Uri Dimant said:

Hello Jonathan

Having backups is fine, but I hope you also test them by restoring the database :-))

December 23, 2009 2:41 AM
 

ArnieRowland said:

December 23, 2009 3:11 AM
 

Brent Ozar said:

I totally shouldn't giggle when I read backup horror stories, but I do, and that's why I keep posting 'em on my blog.  It's amazing to me that it keeps happening.

December 23, 2009 10:40 AM