THE SQL Server Blog Spot on the Web


Andrew Kelly

Sometimes you just have to break it up

I will be the first to admit that I am no Itzik Ben-Gan when it comes to writing complex SQL statements, and I wonder at times how people come up with some of the SELECTs that I see. Sometimes they are so complex or convoluted that I simply have a hard time figuring out the intent or the logic behind them. But as a performance consultant I find myself looking at these on a regular basis when they end up scanning very large tables or indexes. Many developers have a hard time splitting up a query into what I think are more manageable pieces, especially if a temp table is involved. Let's face it, temp tables are evil, aren't they? :)  Well, that's a whole other discussion which we won't get into now, but sometimes they can be your friend.

For instance, today I took a rather unwieldy query in a stored procedure for which the optimizer simply couldn't come up with an efficient plan the way it was written. One of the issues was that the application needed a total row count for paging purposes, so it was running the same query twice: once for the count and once to get the TOP xx rows. In a nutshell, I changed the query to use a UNION, which allowed the optimizer to focus better on the SARGs for each part of the query, and placed the resulting rows into a temp table. This gave me the ability to get a count and retrieve the rows using TOP xx without running the query twice.

As you can see from the picture, the savings were dramatic. The reads went from over 3 million to just under 5 thousand, and the duration went from over 23 seconds to 150 milliseconds. So the bottom line is: don't be afraid to make the query more maintainable and break it up at times. You might be surprised what you can accomplish.
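The actual query wasn't published, so here is only a minimal sketch of the pattern described above, with hypothetical table, column, and variable names (dbo.Orders, #Results, @Since, and so on are illustrative, not from the original procedure):

```sql
-- Split the predicates into two SARGable branches with a UNION and
-- materialize the combined result once into a temp table.
DECLARE @Since datetime = '20090101';

SELECT o.OrderID, o.CustomerID, o.OrderDate
INTO   #Results
FROM   dbo.Orders AS o
WHERE  o.Status = 'Open'              -- first branch: seek on Status
UNION
SELECT o.OrderID, o.CustomerID, o.OrderDate
FROM   dbo.Orders AS o
WHERE  o.EscalatedDate >= @Since;     -- second branch: seek on EscalatedDate

-- Total count for paging, without re-running the base query
SELECT COUNT(*) AS TotalRows FROM #Results;

-- One page of rows from the already-materialized result
SELECT TOP (50) OrderID, CustomerID, OrderDate
FROM   #Results
ORDER  BY OrderDate DESC;

DROP TABLE #Results;
```

The point of the UNION here is that each branch has a single, simple predicate the optimizer can match to an index, instead of one OR-laden WHERE clause that forces a scan; the temp table then pays for both the count and the page fetch with a single execution of the expensive part.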

 

[Image: before/after query statistics, showing reads dropping from over 3 million to under 5 thousand and duration from over 23 seconds to 150 milliseconds]

Published Tuesday, April 14, 2009 8:19 PM by Andrew Kelly


Comments

 

Jack Corbett said:

Nice post, Andrew.  Steve Jones had an editorial about this on SQLServerCentral recently.  I've seen breaking things out like you have in this case regularly improve performance.

April 14, 2009 9:53 PM
 

AaronBertrand said:

Like cursors, we're all trained to avoid them.  For those who need the #temp table as a temporary holding ground (procedural programmers, mostly), I think it is essential that they learn how to get the result in fewer steps (ideally one), but also that they not be afraid to experiment with alternative approaches like this when they hit a performance issue with the "big honking query".  Like you, Andrew, I have also used #temp tables to solve the same kind of issue.

April 14, 2009 10:20 PM
 

Adam Machanic said:

I've also used temp tables to improve performance, but on the flip side, in my last engagement I was dealing with a lot of code that used way too many temp tables, and I ended up putting it back together into bigger, more monolithic queries, which helped a lot.  I guess it's kind of a Zen thing :-)

April 14, 2009 10:47 PM
 

Rob Farley said:

I've gone both ways, even this week (with similar ratios of improvement, too). I introduced a new CTE which I was hoping would be nicely optimised. It wasn't, so I dumped the results out to a temporary table, which then meant that it worked nicely. If I could have marked the CTE as instantiated (or something), then I wouldn't have needed the temporary table.

I see temporary tables over-used, but at the end of the day, they help most people get their logic right. CTEs are a great replacement for temporary tables in most scenarios, but definitely not all.
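Since SQL Server offers no hint to force a CTE to be materialized, the workaround Rob describes can only be done explicitly. A minimal sketch, with hypothetical names (dbo.Sales, #Totals are illustrative):

```sql
-- A CTE referenced more than once may be re-evaluated on each reference:
--   WITH Totals AS
--     (SELECT CustomerID, SUM(Amount) AS Total
--      FROM dbo.Sales GROUP BY CustomerID)
--   ... Totals joined/used several times ...
--
-- Materializing it once into a temp table runs the aggregation a single
-- time, and the temp table gets its own statistics for later plans:
SELECT CustomerID, SUM(Amount) AS Total
INTO   #Totals
FROM   dbo.Sales
GROUP  BY CustomerID;

-- Subsequent statements reference #Totals instead of repeating the CTE.
SELECT t.CustomerID, t.Total
FROM   #Totals AS t
WHERE  t.Total > 1000;

DROP TABLE #Totals;
```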

April 14, 2009 11:30 PM
 

Venkat said:

Modularization is a concept which, nowadays, has taken a backseat as coding has evolved and put more wrappers into the hands of developers.

April 15, 2009 12:12 PM
 

Paul White said:

Hey Andrew,

Simplification is a great thing, generally speaking.  Analyzing a 'honking great query' is never fun - especially when there isn't any documentation to help understand what it is trying to achieve in the first place.

Also, the awesome monolithic query that runs superbly today is tomorrow's headache when statistics or indexes change - or one of a hundred different things which can cause the QO to come up with a suboptimal plan.

I do try to code with the QO in mind.  Yes there are hints and plan guides and so on, but oftentimes the best query is the one which is readily understandable by both humans and optimizers!

Of course, one has to consider all the pros and cons carefully when deciding to break up a query.  Nevertheless, I think it is good to at least stop and think a little once a statement reaches a certain level of complexity, or when the plan chosen starts to look 'fragile' or includes operators which can be indicative of QO 'distress' :c)

I have often meant to sit down and seriously test whether eager/lazy spools are generally more efficient than breaking the query and inserting some explicit temporary storage.  

I suspect that the answer is 'it depends' as usual, though I might expect the case against index spools to be rather more clear cut (except in specialized cases like recursive CTEs maybe).

Paul

April 16, 2009 5:41 AM
