Now that I’ve got your attention, do I really mean that? Of course, I do in the context of T-SQL performance tuning and to a very large extent.
Many in the SQL Server community have long preached a set-oriented approach to improving database query performance, and it has been a prevalent view. It was appropriately so some years ago when people versed in the ISAM-style databases such as dBase, FoxPro, and Access started to invade SQL Server. But for this view to remain prevalent today, it would be more misleading than constructive in T-SQL performance tuning. My personal experience is that, having been bombarded by the set superiority propaganda and ridiculed and shamed to death for having used cursors and other so-called procedural constructs in T-SQL code, most T-SQL developers have gotten the message and gotten comfortable with the set oriented approach (or the spirit of it) – whatever that may be! Some may have gotten way too comfortable and have deployed a huge collection of very complex queries into production.
I know that may be a controversial observation. But it is my personal experience of late. Most of my recent performance tuning experience has not been to turn procedural code into set-oriented code, but rather to dismantle complex set oriented code and pepper it with procedural constructs for the ultimate goal of better performance. And that has mostly been successful! Have I recently gotten rid of procedures in favor of sets and thus improved performance? Yes, absolutely. But the percentage of that is rather small, I must admit. Now, perhaps I live in a highly skewed sample space, and I can only speak about my own experience. As they say, your mileage may vary!
However, there is something much more fundamental and common than an argument between sets and procedures. And that is the realization that all computer programs are fundamentally loops and performance tuning is none other than optimizing these loops. That sounds like a platitude and may trigger no useful action. It is certainly not when it’s forgotten and I contend that a lot of people have forgotten about this platitude, or forgotten to keep it in mind when doing performance tuning.
So if all that boils down to is loops, what does it mean for T-SQL performance tuning? It implies the following:
- First, T-SQL loops may be explicit or may be implicit. When you use a while loop, you are looping through a construct of your choice (e.g. servers, databases, tables, and so on). Or when you iterate through a cursor, you are looping, most likely, through some kind of resultset, i.e. looping through some rows. These loops are explicit. But even if you are a set-solution purist, you are still doing a lot looping. Just take a look at any of your query plans, and you see loops everywhere, particularly in any of the join processors. These loops are less explicit, and you certainly have not constructed them explicitly. Also, when you apply an otherwise harmless looking scalar function to some of the columns in your T-SQL resultset, you have introduced yet another set of loops, which can be very expensive in some scenarios.
- Loops may be seen at different levels. At the very low level, if you look at the machine code, all programs, including SQL Server code, are translated into billions of loops spinning the computer processors. But this doesn’t help performance tuning T-SQL code. Identifying and checking out the loops immediately introduced and used by the query optimizer is useful. Most naturally, we can identify loops in the T-SQL code using the T-SQL constructs. In addition, some loops are pushed outside of T-SQL and SQL Server to the app level and may not be visible when you inspect T-SQL code. But they are as relevant to your SQL Server performance as any loops inside your T-SQL code.
- Of course, just identifying the loops is not enough. Performance tuning means that you need to look for ways to reduce the total cost of all the loops:
- There are often unnecessary loops that SQL Server may have to go through for a given task. These loops may be introduced for a variety of reasons. Maybe, they used to be useful, but are no longer so. Maybe, they were introduced as a side effect of some other changes. Whatever the reason for their introduction, eliminating them means better performance. In other words, you can improve performance by reducing the number of loops.
- Not all loops are equal. Some loops are far more expensive than others, and some are unnecessarily more expensive that others. For instance, you may be doing something in an inner loop that can be done in an outer loop. By moving that work to the outer loop, you have made the loops less expensive. Note that I’m not talking about just explicit loops. Depending on how they are written, two equivalent queries (in terms of their resultsets) can exercise loops that are dramatically different in how expensive they are. In other words, you can improve performance by cutting down the fat in each loop.
- If the goal of performance tuning is to complete a task in a given time interval, i.e. if the elapsed time is a critical factor, perhaps you can improving performance by running some loops in parallel, or running some loops outside of the time interval by, for instance, pre-processing some loops.
What I have described so far is a rather generic perspective, or a general way of looking at a problem space. But my own experience is that it is a constructive and productive perspective, nonetheless. I hope to demonstrate that experience in some future blog posts with concrete examples in which conscious effort to look for, eliminate, reduce, or parallelize loops borne significant fruit in improving T-SQL application performance. And that effort is certainly not set oriented!