THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

The Rambling DBA: Jonathan Kehayias

The random ramblings and rantings of frazzled SQL Server DBA

There is no such thing as a “Small Change” to a production database

It seems like every week I get hit up with some kind of “Its a view only for this specific report,” or “It is a simple change that the vendor would have made if they had access” type of request related to SQL Server.  If you’ve never dealt with me, let me start off this blog post by saying that I am a staunch advocate of Change Control Processes for production systems, I don’t care how simple you might think the change might actually be.  Personally, I submit change requests for even the most simplistic tasks like creating a new index in the database or even changing the frequency of DBCC CHECKDB commands on production servers.  Why do I do things like this? Basically put; because you never know!

Early on in my career with SQL Server, I learned the impact that so called “Small Changes” to a SQL Server database could have in a production environment.  As a Business Analyst I had elevated access to a production system, which I really had no business having, that I did a significant amount of work on.  One of my primary job functions was to prevent bills from going out to customers that contained errors on them, so I created a table in the database to exempt accounts from the billing process by inserting the account primary key into the table, which was then checked during Billing Nomination to exclude those accounts until the associated problem was fixed.  So that I had a full history of when an account was added and removed from the table and by who, a audit table was added with auditing triggers for INSERT/UPDATE/DELETE on the exception table. 

At some point, someone decided to alter this table and add an additional column to it to make it so they didn’t have to join to the accounts table to get the LDC (local distribution company) account number for troubleshooting problems with the accounts.  When this was done, it broke reports that I had written for our CTO at the time because now the column name for the LDC account number existed in two of the tables, so being young, dumb, thinking that this column wasn’t needed, I mean really how hard is it to write a JOIN, I dropped it from the table.  This is where the walls came crashing down.

Being relatively new to SQL there were a lot of things I didn’t know, like for example, SELECT * is dangerous to use, especially in auditing triggers when the table definitions don’t match.  When i dropped the column from the base table, I didn’t change the audit table to match, causing the columns to mismatch and the trigger to fail.  Now if the only thing that actually used this table was me, stupid assumption on my part, this wouldn’t have been a big deal, but the new account creation process had been recently changed to add new accounts to this table to create a billing hold during the account setup period.  This is actually what broke, and it broke very fast causing a cascading effect of problems across multiple systems.  It took the DBA and application developers a bit of time to trace this down and when they did I got a phone call asking what I had changed.

Such a small change, such a big mess.  The impacts of this solidified for me that there is no such thing as a small change to a production database very early in my career.  It took multiple people nearly two hours to cleanup the residual effects of this small change, primarily because a new account manager had to manually reenter the failed new accounts into the system and let them process again.  I’d like to say that I never made a change outside of change control processes after this again, but I’d be lying if I did.  However, I can say that I haven’t made untested adhoc changes like that ever again, and it has been over two years since I have made any undocumented change to a system.  (Well at least intentionally, I have made mistakes and been connected to the wrong server and done seriously bad stuff like dropping a 80GB table which is why I use SSMSToolPack to color code my servers now.)

Published Tuesday, August 03, 2010 10:50 PM by Jonathan Kehayias

Comments

 

Adam Machanic said:

Yes, yes, yes, yes, yes, yes, and yes!

Couldn't agree more. Just had a debate about this the other day, as a matter of fact... only a few hours after some stuff had broken due to lack of a properly designed change control process. And the person STILL insisted that there are "low-risk" changes. They'll never learn.

August 3, 2010 10:32 PM
 

Sankar Reddy said:

You are spot on and have to appreciate your honesty. Even though I am a one man team (unfortunately) in our organization (20 odd employees), I try to get all db changes reviewed by either our Principal Architect or by our developer (also operations), even though they don't always understand the scope of the change. This process has worked great for us and so far we don't have any screw-ups. And SSMSToolsPack is a life saver. Can't appreciate it enough though.

August 4, 2010 1:51 AM
 

Vikas Rajput said:

Am too in strict agreement on this. Also, want to share one good practice I saw on one account.

We had a change management practice for one of our client which was immensely successful. For each change, be it from Dev or Admin, it was scanned by Infra+DBA Team (i.e. my team). We had all right to turn down change if it didnt accompany: (a). Change Document (carrying change reasons and impact of not having it in place), (b). Dev envt test validation or logs, and (c). roll back script.

I believe the best reason why it worked for us was we had Account Leadership backing us on this. We (Infra) were incident gatekeepers, and we got this agreement in place as a pre-requisite for our role. Not to say that change related incidents were nullified, but they were very very less (uptill through Production).

August 4, 2010 1:57 AM
 

Dave J said:

Absolutely!  Although, even then, things can go wrong.  Just last week we finally implemented a release of data warehouse bug fixes and enhancements that had been through a pretty stringent testing routine.  The release was successful and we all breathed a sigh of relief as other work was dependent on those fixes.

And yet, yesterday, when I got back to this other work, I discovered that the bugs were still there.  It turns out that the one area in which our testing is somewhat less than desirable is in making sure that the code that has been implemented is the code that was meant to be implemented, rather than the existing code being re-implemented.

I was not happy.

August 4, 2010 8:23 AM
 

Ranga Narasimhan said:

In one of the places I worked, we used to have a XYZ_Rollback folder for a XYZ_DBChange folder for any production change....If things did not work well, even after a good QA test, we can always quickly rollback. Every change script, should have a rollback script.

August 4, 2010 10:39 AM
 

Marshall Sutherland said:

I wish I could say that was the worse mess we ever had to clean up around here, Jon, :-).

I started color-coding with SSMSToolKit after you mentioned it a few months ago. Very handy! That is the feature that made me get it.

August 4, 2010 1:13 PM
 

Armando Prato said:

Rhetorically speaking, isn't it just common sense?  Yet, a lot of shops just continue to add changes straight to prod without a process in place.  Lunacy.

August 4, 2010 1:38 PM
 

Nimit Parikh said:

Strongly agreed with you although sometime I do feel that its little headache but at the end of day my production environment is more secured and less unknown outages to troubleshoot with, so can't complain either. :)

August 4, 2010 7:20 PM
 

Oscar Zamora said:

Happened to me early in my career while managing an Oracle Production Database. I dropped a table named "Temp" and hell broke loose. Online orders were not able to be fulfilled because of that change.

I needed to restore it from a backup to get us up and running. But seriously, who would have thought that "Temp" was a critical table?

October 22, 2010 12:20 PM
Anonymous comments are disabled

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement