THE SQL Server Blog Spot on the Web


Buck Woody

Carpe Datum!

Did that change really fix the problem?

When we’re heads-down on a problem, it’s sometimes far too easy to relax the method we should follow for troubleshooting. We’re supposed to gather as much information as possible, freeze the system as much as possible, and then develop a plan for the steps that will correct the problem. Then we’re supposed to make a change, test the change, and solidify the change if it corrects the error.

Most of us have learned to research what really happened and what changed, and then search the documentation, the web, forums, and other sources for possible fixes.

We also know that “freezing” the system in our case often means ensuring there is a full backup before we change anything. If you don’t follow this rule, don’t worry: you will. The first time it all goes wrong on your watch and you have no way to put it back, you’ll start with this step.

It’s from here on out that the tyranny of the now sometimes gets the best of us. We may create a plan in our head, thinking “this is a simple problem,” but an hour later we’re not sure what we’ve touched. Settings have been changed (and most importantly, not changed back), code has been altered, processes have been run, and at some point the problem goes away. We breathe a sigh of relief and go on about our day.

But which “fix” fixed the problem? Or which combination of them? Sure, it might not matter right now (the problem is gone), but at some point it will come back to haunt us.

So the lesson for today (and I always point these diatribes at myself as well) is that we should always follow the scientific method, no matter how “trivial” the problem: hypothesis, test, control. The “control” part is where we change the setting back to what it was before we changed it!
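To make the “hypothesis, test, control” loop concrete, here is a minimal Python sketch. It is an illustration, not code from this post, and it assumes two things: each candidate fix can be expressed as a pair of apply/revert actions, and you have some repeatable check that tells you whether the problem is still there. Each change is tried on its own. A change that doesn’t help is rolled back before the next one is tried, and a change that seems to help is reverted once more as the control, to confirm the problem really comes back without it. The names `Change`, `find_fix`, and `setting_change`, and the settings in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Change:
    """One candidate fix (hypothetical): how to apply it and how to undo it."""
    name: str
    apply: Callable[[], None]
    revert: Callable[[], None]


def find_fix(changes: List[Change], problem_present: Callable[[], bool]) -> Optional[str]:
    """Try one change at a time; return the name of the change proven to fix it.

    Returns None if no change was shown to be the fix. In every None case,
    all changes have been reverted, so the system is back where it started.
    """
    for change in changes:
        change.apply()                 # hypothesis: this change fixes it
        if problem_present():          # test
            change.revert()            # didn't help: put it back before the next try
            continue
        change.revert()                # control: undo the change...
        if problem_present():          # ...and confirm the problem comes back
            change.apply()             # cause and effect confirmed, so keep the fix
            return change.name
        return None                    # problem went away on its own; this change wasn't it
    return None


def setting_change(settings: Dict[str, int], key: str, new_value: int) -> Change:
    """Build a Change that sets settings[key] and can restore the old value."""
    old_value = settings[key]

    def apply() -> None:
        settings[key] = new_value

    def revert() -> None:
        settings[key] = old_value

    return Change(f"{key}={new_value}", apply, revert)


# Usage with a simulated system: the "problem" shows up whenever setting_b is 1.
settings = {"setting_a": 1, "setting_b": 1}
candidates = [
    setting_change(settings, "setting_a", 2),
    setting_change(settings, "setting_b", 2),
]
print(find_fix(candidates, lambda: settings["setting_b"] == 1))  # setting_b=2
print(settings)  # {'setting_a': 1, 'setting_b': 2}
```

Note that the sketch stops as soon as the problem disappears without any change being responsible. An intermittent fault like that calls for more investigation, not more changes, and at that point you still know exactly what state the system is in.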

Published Tuesday, March 2, 2010 6:31 AM by BuckWoody



Marty said:

I agree with the sentiment and it's good advice. I would add, though, that the analysis, test and solution approach adopted needs to be appropriate for the situation. In other words, sometimes it's better to try something (methodically) and fix the problem quickly rather than meticulously analyse a problem for hours or days while the world falls down around you.

I'll illustrate my point with an example:

One of our major applications started to experience stability problems after we virtualised it; the app was on Windows 2000. We were advised that this wasn’t ideal, but given time pressure it would be virtualised “as is”. OK, cool. Then it started to go down intermittently. To me, the quick option to try was to provision a new Windows 2003 virtual server and migrate the app, which wasn’t a big job in this case.

Instead, it was decided that we should try to find the problem. It took 3 or 4 weeks of the business screaming before it was decided just to try a new Windows 2003 server… and hey presto, it worked…

March 2, 2010 3:44 PM

Barry said:

"So the lesson for today ... is that we should always follow the scientific method"

I don't know about this one.

Sure, it's easy to express this thought, but in the heat of the moment, when your boss, your customers, and who knows who else are shouting at you about an issue or an outage, it's not always practical to explain the "scientific method" to them.

In crisis situations, sometimes the shotgun approach is the correct option.

March 4, 2010 2:25 AM

Jason Strate said:

I definitely agree with using the scientific method. Too often I've seen people start working on a problem, get it fixed, and have no idea what fixed it.

As a consultant, I find clients always want to know what broke and why. Taking the extra few minutes to change only one thing at a time, rather than a few dozen things at once, has always helped me answer those questions. It has also made it easy to undo a "fix" that has other unintended consequences.

April 1, 2010 11:26 AM