For the last two weeks, I have been on Active Duty with the Army, completing the last phase of BNCOC (Basic Non-Commissioned Officers Course) for my MOS (Military Occupational Specialty). While attending this course, a number of things stood out to me that have practical application in the civilian sector as well as in the military. One of these is the necessity and purpose behind Standard Operating Procedures, or, as we refer to them, SOPs.
In the Army we have official doctrines, often in the form of Army Regulations, which provide guidance, rules, and policies for tasks. Standard Operating Procedures are generally created whenever a topic is not covered by a standing policy, or to provide clarification of existing policies. SOPs already exist in many civilian companies, and they provide a number of benefits that are important in managing SQL Server in your own environment.
An SOP isn’t Standard.
Despite their name, SOPs are usually not standard. In the Army they exist at various levels and are often different from unit to unit, platoon to platoon, and even squad to squad. An example of an SOP that varies widely in the Army is squad-level Military Operations in Urban Terrain (MOUT), which is generally outlined in Army Field Manual 3-06 (FM 3-06). The core fundamentals behind clearing a building or room are the same from unit to unit, since they are prescribed in the FM. Room clearing is generally performed using a four-man fire team, which is known as a stack. However, there are multiple accepted methods of signaling inside a stack that the team is ready to move into a room. Which method a specific team uses is prescribed in their SOP, and is based on what the team is comfortable with from their experiences and training together. Two teams in the same building may use different signaling inside their respective stacks while performing clearing procedures.
This flexibility can be of benefit with SQL Server as well, especially in larger businesses where multiple DBA teams exist to support different aspects of the business. What works for one DBA team may not work for another, based on the requirements of the systems they support, their past experiences, and even the manager of the team. Multiple SOPs may exist for different occurrences of the same classification of problem. For example, an SOP for how to deal with a failed SQL Agent Job probably wouldn’t be very effective if it were applied globally to all failed Jobs in a complex environment. For one job the SOP may be to rerun the job, while for a different job it might provide the detailed steps required to troubleshoot why that job specifically failed.
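To make the per-job idea concrete, here is a minimal sketch in Python of a lookup that returns a different SOP depending on which job failed, with a fallback for jobs nobody has written an SOP for yet. Everything in it is an assumption for illustration only: the job names, the steps, and the `Sop` and `sop_for_failed_job` names are hypothetical, and nothing here talks to SQL Server or SQL Agent.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Sop:
    """A written procedure: a title plus the ordered steps to follow."""
    title: str
    steps: Tuple[str, ...]


# Fallback for any failed job that has no SOP of its own (hypothetical steps).
DEFAULT_SOP = Sop(
    title="Unclassified job failure",
    steps=(
        "Record the failure in the team log",
        "Escalate to the on-call DBA",
    ),
)

# Hypothetical job names mapped to the SOP that applies to each one.
JOB_SOPS: Dict[str, Sop] = {
    "NightlyIndexMaintenance": Sop(
        title="Rerun the job",
        steps=(
            "Rerun the job once",
            "Escalate if it fails a second time",
        ),
    ),
    "ProdToTestRefresh": Sop(
        title="Troubleshoot before rerunning",
        steps=(
            "Read the job history for the failing step",
            "Confirm the production backup file exists and is readable",
            "Fix the cause, then rerun from the failed step",
        ),
    ),
}


def sop_for_failed_job(job_name: str) -> Sop:
    """Return the SOP for this job, or the default when none is defined."""
    return JOB_SOPS.get(job_name, DEFAULT_SOP)


if __name__ == "__main__":
    sop = sop_for_failed_job("ProdToTestRefresh")
    print(sop.title)
    for number, step in enumerate(sop.steps, start=1):
        print(f"{number}. {step}")
```

The point of the fallback is the same as in the text: one global rule for every failed job is too blunt, but a job without its own SOP should still get a defined response rather than none.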
An SOP provides a defined set of procedures for handling a specific task, or a defined reaction to a specified event.
One of the key reasons that the military and civilian businesses have SOPs in place is to define how to handle a specific task in detail. This ensures that the task can be accomplished in a repeatable, consistent manner with a predictable outcome, regardless of who is actually performing the task. Most Army schools have a Barracks SOP that defines the rules for living in the barracks while attending school. From school to school the SOP is slightly different, but the basic layout usually ends up being similar. While it might seem ridiculous how detailed this particular SOP gets, the purpose is to establish a uniform method of setting up the display so that everything is “dress right dress.” The other reason for having SOPs in place is to ensure a predictable response to a specific event, like a failed backup, a full transaction log, a failed CHECKDB, a cluster/mirroring failover, or any other event or problem you can think of in SQL Server.
In my last job, I had a number of SOPs written, though I didn’t call them SOPs; they were included in my run book as troubleshooting measures for our Oracle DBA when I was out of the office. They covered common items like failed backups, refreshing test databases from production backups, deploying changes into QA and Production, and a few other items that had popped up while I was out and that I later got around to covering. This provided a level of continuity in the event that I was unable to log in to our systems while I was out of the office, and made it so I could actually take a vacation without being tied to my laptop and cell phone.
An SOP should be written down, but doesn’t necessarily have to be.
In most circumstances an SOP should exist in a written form that has been reviewed and agreed to or approved by management. This ensures that the SOP covers the information its purpose requires, is durable for future reference, and is available for new team members to read. However, there are scenarios, like the squad-level signal for entering a room, where the SOP generally isn’t written and doesn’t need review. If a team member changes, the team dynamic changes, and the signal that works for the team and is defined in their SOP may change with it.
When it comes to SQL Server, most if not all SOPs should exist in a written form, including the common agreements between team members about who will perform which operations, when, and how. The reason I say this is that unwritten SOPs are subject to miscommunication and to arguments and debates when things go wrong. In room clearing operations, following the unwritten SOP can mean the difference between life and death, and the team practices execution to the point that the operation is second nature and the team acts as a single unit. This is rarely the case with SQL Server. I am a firm believer in covering your bases (aka CYA), and a written procedure helps do that.
Over the next few weeks, I will try to blog a few of the Standard Operating Procedures that I either have in place, or plan to implement in my own job. I have already listed a few here, but I have ideas for a number of other ones that I need to document and put into place.