THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

Recursive SQL queries: How do they work?

Recursive SQL queries are built with recursive common table expressions (CTEs). You can look at the execution semantics of a recursive CTE from two different, but equivalent, perspectives.

 

We can call the first one the UNION ALL view. The following pseudo code that illustrates how a recursive CTE executes is taken from the SQL Server 2005 Books Online:

 

  1. Split the CTE expression into anchor and recursive members.
  2. Run the anchor member(s) creating the first invocation or base result set (T0).
  3. Run the recursive member(s) with Ti as an input and Ti+1 as an output.
  4. Repeat step 3 until an empty set is returned.
  5. Return the result set. This is a UNION ALL of T0 to Tn.

This is an intuitive, and—some may even argue—is very much a set-based view of recursive CTEs because it is described with query resultsets and the UNION ALL operator.

 

The second approach can be labeled as the stack view. The following pseudo code defines the execution semantics of a recursive CTE:

 

  1. Run the anchor member and push the resultset in a stack.
  2. Let a cursor point to the first row from the bottom of the stack.
  3. Use the current row to evaluate the recursive member, and push the resultset into the stack.
  4. Move the cursor one row up in the stack.
  5. Repeat Step 3 until the cursor is moved beyond the top of the stack.

If you don’t like the word ‘cursor’, replace it with the word ‘pointer’ J.  As far as I know, the stack view is much less talked about in the SQL Server community. But it's always good to have different perspectives on the same thing.

 

I personally find the stack view less intuitive, though it is actually closer to how a recursive CTE is implemented. When I think about solving a problem with a recursive CTE, I always think with the UNION ALL view. However, one of the DB2 gurus I used to work with insists that the stack view suits him just fine, and he finds it intuitive enough that it is how he thinks when approaching a problem with recursive CTEs.

 

For you folks reading this post, most of you probably think in terms of the UNION ALL of the resultsets when working with recursive CTEs. But I could be wrong.

Published Thursday, April 16, 2009 1:36 AM by Linchi Shea

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Paul White said:

Yes - its UNION ALL for me, though I have on occasion thought of the anchor as creating a number of 'threads' (just conceptually!) where each thread is one row that then executes against the recursive part, if you see what I mean.  On occasion, it has helped me recognize situations where recursion might be useful, which I would not have spotted otherwise.

A developer friend insists that he can think 'recursively' naturally - making it easy to recognize recursion opportunities...sadly I lack that particular gene :c)

April 16, 2009 5:47 AM
 

Adam Machanic said:

Hi Linchi,

Great post!

The key thing most important to me in all of this is understanding that each row of the anchor part (and subsequently, each row of the recursive part) will cause the recursive part to run once.  This is cause for the massive inefficiency associated with recursive CTEs... I hope that they can be converted to use a more "set-based" approach in the future.

April 16, 2009 10:36 AM
 

Brian Tkatch said:

Yep, it's UNION ALL for me. Because it's a set. The stack view seems for suitable for a non-set coder.

April 20, 2009 9:42 AM
 

Adam Machanic said:

Brian:

Linchi's point, as far as I can tell, is that there is no set with recursive CTEs.  It's an illusion of a set.  You can verify that yourself by looking at the query plan, statistics, or creatively applying ROW_NUMBER in the recursive part (Google a bit and you should be able to find an example I've posted on a few forums).  

April 20, 2009 10:28 AM
 

Linchi Shea said:

Set is an illusion with recursive CTEs. But then, we must be careful about at which level we are talking about set-based solutions. One could argue that DBMS vendors are free to implement recursive CTEs and any other SQL constructs however they want internally as long as the defined meaning of the construct is maintained. For instance, a SELECT statement may be internally implemented with walking a pointer through a memory data structure, and no set-base purist would give a fuss about that, right?

With respct to recursive CTEs, the question is: should we expect the UNION ALL pseudo code given in BOL to be the actual implementation of a SQL Server recursive CTE? The answer is no (a yes would contradict the set-based view I reckon). Recursive CTEs are valued for their declarative nature, not for their being a super-construct that will be pre-processed (or expanded or transformed) into a composition of some other SQL constructs. Being declarative is being set-oriented, one could argue.

The other question, of course, is: is the stack-based implementation of recursive CTEs, as described above, always the best for performance? The answer here is not that clear. My impression is that the vendors who have implemented recursive CTEs seem to have reached a consensus that the stack approach is the way to implement recursive CTEs. There may be research to support this.

My limited experience led me to believe that even with the stack approach, there are variations (such as pruning the intermediate 'results' or the stack size more aggressively or earlier in the process) that could have resulted in better performance.

Personally, I believe the performance problems of recursive CTEs all seem to stem from the fact that there is no way to materialize prior CTEs (or cache and control the results of prior CTEs) when their materialization actually makes sense semantically.

April 21, 2009 12:27 PM
 

Adam Machanic said:

Linchi: Assuming that there is at least one equijoin between the anchor and the recursive part, I don't see why the current row-by-row spool approach is necessary.  Why is it possible to build a spool, but not to optimize a bit further and build a hash table for each intermediate set, based on the join predicate?  

April 21, 2009 5:05 PM
 

Linchi Shea said:

Adam;

I'd love to see that. But note that if an intermediate set is 'materialized' (with a hash or whatever algorithm), the execution semantics of the recursive CTE would be different from how a recursive CTE is currently implemented because the materialized intermediate result can become stale. This is not necessarily bad because in many situations the nature of the problem or data can guarantee the correctness of the final result regardless whether an intermediate result is stale or not. In these cases, what you described would be a great welcome performance wise.

April 21, 2009 8:14 PM
 

Adam Machanic said:

Hi Linchi,

OK, then limit that solution to when the user is running SNAPSHOT or RCSI?  Works for me :-)

April 21, 2009 8:32 PM
 

Neal Pointer said:

This sounds like it could run very slowly in general. Does it?

April 22, 2009 8:41 AM
 

Adam Machanic said:

Neal: Yep.  It's a beast.

April 22, 2009 4:40 PM
 

Adam Machanic said:

After weeks of putting it off, I finally found the time and spent the last day and a half judging the

May 31, 2009 4:58 PM
 

sujit said:

nice one

June 23, 2009 8:33 AM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement