This year is almost over, and it’s sad and unfortunate that I can write a blog post about this topic, but there have been so many major examples of data loss from missing, damaged, or otherwise unusable backups this year.
The year started out with a rollover from last year’s JournalSpace.com disaster, which resulted in the complete loss of all of the blogs that existed on the site and their associated postings. This was hotly discussed at the beginning of the year, when the owners of JournalSpace announced that all attempts at recovery had failed and that they were not interested in trying to resurrect the site from scratch.
JournalSpace Drama: All Data Lost Without Backup, Company Deadpooled
Brent Ozar - Why Backup? Ask JournalSpace
Not too long after this, at the end of January, the social bookmarking site Ma.gnolia suffered data loss and corruption that ultimately led to its own demise.
Ma.gnolia Suffers Major Data Loss, Site Taken Offline
VOD-cast explaining the catastrophic nature of the data loss
Lessons Startups (and Users) can Learn From Ma.gnolia’s Crash
Brent Ozar - Backup Fail: Ma.gnolia goes under
Just as it became apparent that the Ma.gnolia crash had caused unrecoverable data loss that ended the site’s first implementation, it was announced that online backup company Carbonite had filed suit against its suppliers for the loss of customer data stored in its cloud-based backup solution in 2007:
Online Backup Company Carbonite Loses Customers’ Data, Blames And Sues Suppliers (Updated)
Brent Ozar - Another backup failure: Carbonite
Brent Ozar - More On the Carbonite Backup Failures
Things were quiet for the most part over the summer, but then the Sidekick data loss issue occurred in October:
T-Mobile Sidekick Disaster: Danger’s Servers Crashed, And They Don’t Have A Backup
Then, most recently, online personality Jeff Atwood suffered a crash of his hosted virtual machines and significant data loss without a backup, which he documented on his blog after getting it back online:
International Backup Awareness Day
A number of these were covered by Brent Ozar on his blog, but even Brent wasn’t able to escape the year without his own tale of failed backups in his virtual testing environment, though Brent’s scenario resulted in no actual data loss:
Brent Ozar - Bad News, Good News, Worse News
However, based on the events that transpired, Brent was able to put together a number of excellent blog posts that covered the topic of backups:
Brent Ozar - Why Are You Backing Up?
Brent Ozar - Adding Reliability to Your Infrastructure
Brent Ozar - Mirrors aren’t backups
The major takeaway from all of these problems is that you don’t have a backup until you have tested the recovery of it, and there is no substitute for having an actual, “cold” backup of your environment from which you can recover to a known point in time. Keep in mind that testing this backup doesn’t have to happen on equivalent hardware. A cheap and easy way to test your backups is to restore them to a low-cost commodity server or, in the worst case, a desktop that has sufficient disk space to accept the restore. At least with this approach you know that your backup will be good when you need it the most.
As a DBA, my number one priority is ensuring that I have good, valid backups that support my business SLAs. I manually run a report every Monday morning that checks my servers’ backup information using Multi-Server Query in SSMS 2008 to ensure that my weekend backups completed successfully. If I find a problem, it becomes my immediate focus until the report returns valid information. After that, I generally look into why I wasn’t notified of the failed backup, if one actually occurred, because I have automated alerting set up that should send me an email notification if backups fail on my servers. To date I have been lucky in that I work for an employer that understands the importance of backups. However, I have done consulting work for a handful of customers where backups just aren’t a priority, despite having had the risk explained to them.
Hopefully you’ve learned from the mistakes of this year’s disasters; I know that I have.
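To make the idea of a weekly backup check concrete, here is a minimal Python sketch. It is an illustration only, not the Multi-Server Query report described above. It assumes the backup history has already been collected from each server into simple (server, database, finish time) rows, for example from the backup history that SQL Server keeps in msdb.dbo.backupset. The function name, the 72-hour threshold, and the sample server and database names are all hypothetical.

```python
from datetime import datetime, timedelta


def find_stale_backups(rows, now, max_age=timedelta(hours=72)):
    """Return sorted (server, database) pairs with no recent backup.

    rows    -- iterable of (server, database, finished) tuples, where
               finished is a datetime, or None if no backup was recorded
               (this row shape is an assumption for the sketch)
    now     -- the time the check is run
    max_age -- the oldest acceptable backup age (hypothetical default)
    """
    latest = {}
    for server, database, finished in rows:
        key = (server, database)
        previous = latest.get(key)
        if finished is not None and (previous is None or finished > previous):
            # Keep only the newest backup seen for each database.
            latest[key] = finished
        else:
            # Remember the database even if it has never been backed up.
            latest.setdefault(key, previous)

    stale = [
        key
        for key, finished in latest.items()
        if finished is None or now - finished > max_age
    ]
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical sample data for a Monday morning check.
    monday = datetime(2009, 12, 14, 8, 0)
    history = [
        ("SQL01", "Sales", datetime(2009, 12, 13, 23, 0)),
        ("SQL01", "HR", datetime(2009, 12, 10, 23, 0)),
        ("SQL02", "Orders", None),
    ]
    for server, database in find_stale_backups(history, monday):
        print(f"No recent backup: {server}.{database}")
```

A check like this only confirms that a backup was recorded as finished. It says nothing about whether the backup can actually be restored, which still has to be tested separately.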