
Enjoy Another Sandwich -- Kent Tegels

Yummy slices of SQL Server between slices of .NET and XML

T-SQL statements with multiple Common Table Expressions: Yes, we can.

Yesterday I was mulling over the SQL Server tweets and read one by Michelle Ufford (aka SqlFool). She was answering another developer's question about Common Table Expressions: is it possible to have multiple CTEs in a single query?

Yes and No.

No in the sense that you cannot nest CTEs. That is, a CTE cannot be written such that its results feed into another CTE, which in turn feeds the query in question.
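To make that concrete, here is the shape T-SQL rejects (names invented for illustration) -- a WITH clause cannot appear inside a CTE's definition:

-- NOT legal T-SQL: the inner WITH is a syntax error
-- with outer_cte as (
--     with inner_cte as (select 1 as n)
--     select n from inner_cte
-- )
-- select * from outer_cte;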

Yes in the sense that you can define multiple CTEs as "peers." That is, you can have as many CTEs as you like -- each separated by a comma -- all feeding the same query.
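A minimal sketch of that peer syntax (CTE names invented for illustration):

with first_cte as (
    select 1 as n
), second_cte as (
    select 2 as n
)
select f.n as n1, s.n as n2
from first_cte as f
cross join second_cte as s;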

This is more than just syntax and semantic sugar. Remember that one of the use cases for a CTE is to write an executable query where you can re-use a resolved result set over and over again without rewriting its query each time as sub-queries. There may be cases where you need to do that with more than one feeding query too.
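For instance, a single CTE can be referenced more than once by the statement it feeds. A throwaway sketch (names invented for illustration):

with totals as (
    select 1 as t
)
select a.t as t1, b.t as t2
from totals as a
cross join totals as b;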

Here's a fuller example. Sure, there are other ways to write this, but it demonstrates the use of peered CTEs. Let us say we have a table that represents students' scores on an exam. Each time the class runs, a new generation of the test is issued. Ideally, the scores should be statistically similar for each class, assuming a normal distribution of students. You decide to see if that's really the case: you would like to know how many scores from the later test generations fall between the average score less one standard deviation from the first generation and the average score plus one standard deviation from the second.

Let us start by generating some test data in a temporary table:

-- 1,000 rows of random scores, spread across five test generations
create table #t (
    id int not null primary key clustered,
    score decimal(5,2) not null,
    generation tinyint not null
);
declare @score float = 1.0;
declare @group tinyint = 0;
declare @index smallint = 1000;
while @index > 0 begin
    set @score = 100 * RAND();       -- random score between 0 and 100
    set @group = RAND() * 5 + 1;     -- truncates to a generation from 1 to 5
    insert into #t values (1000 - @index, @score, @group);
    set @index -= 1;
end;
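Optionally, a quick check confirms how the rows were spread across the generations:

select generation, COUNT(*) as students, AVG(score) as mean_score
from #t
group by generation
order by generation;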

You've heard about CTEs and wonder, "Might they help in this case?" That really is an interesting question. Here is one way to solve the problem using CTEs:

-- find the means and standard deviations for the first two
-- generations of data (shows the use of peer CTEs)
-- on my test machine, total sub-tree cost is 0.02304924
with g0(a,s) as (
    select AVG(score), STDEVP(score) from #t where generation = 1
), g1(a,s) as (
    select AVG(score), STDEVP(score) from #t where generation = 2
)
-- count the records that match the criteria
select COUNT(score), AVG(score), STDEVP(score)
from #t, g0, g1
where generation > 1
group by g0.a, g0.s, g1.a, g1.s
having AVG(score) between g0.a - g0.s and g1.a + g1.s;

And another using sub-queries:

-- the same question answered with subqueries instead
-- on my test machine, total sub-tree cost is 0.0197593
select COUNT(score), AVG(score), STDEVP(score)
from #t
where generation > 1
having AVG(score) between (
    select AVG(score) - STDEVP(score) from #t where generation = 1
) and (
    select AVG(score) + STDEVP(score) from #t where generation = 2
);

So here's where this gets interesting: which of these two queries has the lower query cost, and why? If the hairs on the back of your neck started rising when you read the where clause of the CTE-based query... good, they should have. What we have done there is create a Cartesian product, and most DBAs and developers would cringe at that. After all, aren't you bringing as many rows into scope as the product of the sizes of the base table and the two result sets from the CTEs? Yes, you are. But even so, 1000 x 1 x 1 is still just one thousand.

The problem is that the query engine does not really grok that. If you execute the two queries sequentially with "show actual plan" turned on, you'll see that the CTE-driven plan has a considerably higher cost than the sub-query plan. Why? You have probably already guessed: it is the expense of doing the Cartesian product.

The actual plan shows us this nicely:

Click here to see the picture. For some reason, inserting the IMG tag here causes IE to freak out. Yuck!
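If you would rather see the Cartesian product spelled out in the query text itself, here is the same CTE query with explicit CROSS JOIN syntax; the behavior (and plan) should be identical, since only the join style changes:

with g0(a,s) as (
    select AVG(score), STDEVP(score) from #t where generation = 1
), g1(a,s) as (
    select AVG(score), STDEVP(score) from #t where generation = 2
)
select COUNT(score), AVG(score), STDEVP(score)
from #t
cross join g0
cross join g1
where generation > 1
group by g0.a, g0.s, g1.a, g1.s
having AVG(score) between g0.a - g0.s and g1.a + g1.s;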

1: Yes, yes, I know, I know. Statistically this is not really meaningful; you should be using Student's t-test, comparing each group individually. It is just an example used here to help people understand the peer-CTE concept. Before sending me a whiny email, statboy, just keep that in mind.

Published Wednesday, May 06, 2009 11:53 AM by ktegels

Comments

 

Michelle Ufford said:

Thanks, Kent!  I appreciate the blog post and clarification.  :)

May 6, 2009 12:37 PM
 

Adam Machanic said:

Kent,

You can in fact "nest" them--not in the way you're talking about, but subsequent "peers" can reference previous ones:

;with
a as (select 'a' as x),
b as (select * from a)
select * from b

... this is quite powerful as you can quickly build up complex expressions that with derived tables would require lots of code repetition.

May 6, 2009 1:40 PM
 

Linchi Shea said:

> This is more than just syntax and semantic sugar.

Non-recursive CTEs are syntactic sugar because you can re-write them with derived tables, for instance, although this doesn't mean syntactic sugar has no value. If it saves some typing or makes the query more readable, it's good to have.

> Remember that one of the use cases for a CTE is to write an executable query where you can re-use a resolved result set over and over again without rewriting its query each time as sub-queries.

I don't know exactly what you meant by 'resolved result set'. But a CTE will not be 'resolved into a resultset', and have that same resultset used again and again in the latter part of the same query statement. Rather, it'll be resolved wherever it's used. If it's used multiple times, it'll be 'resolved' multiple times.

May 6, 2009 2:48 PM
 

ktegels said:

Michelle: you're welcome.

Adam: true, thanks. Have you ever used that in a practical way? If so, how was performance?

Linchi, thanks for pointing that out; here's a little bit of confirmation. It's a bit messed up that the QO doesn't recognize it's using the same query part again and avoid this, IMHO. However, in a sense, it really goes to prove the point: even though you can use a CTE to do this sort of thing, you probably shouldn't.

with g0(a,s) as (
    select AVG(score), STDEVP(score) from #t where generation = 1
)
-- count the records that match the criteria
select COUNT(score), AVG(score), STDEVP(score), g0.a, g0.s from #t, g0
where generation > 1
group by g0.a, g0.s
having AVG(score) between (g0.a - (1.0 * g0.s)) and (g0.a + (1.0 * g0.s))
union all
select COUNT(score), AVG(score), STDEVP(score), g0.a, g0.s from #t, g0
where generation > 1
group by g0.a, g0.s
having AVG(score) between (g0.a - (0.01 * g0.s)) and (g0.a + (0.01 * g0.s))

May 6, 2009 5:07 PM
 

Adam Machanic said:

Kent: Yes, I use it all the time on my current project--relational DW with a lot of complex reporting requirements.  "Stacking" CTEs is great for readability; no perf issues that wouldn't be there with derived tables, views, or any other similar construct.

May 6, 2009 7:46 PM
 

Alex Kuznetsov said:

Adam,

Are you saying that CTEs may perform better than inline TVFs?

I prefer views or inline TVFs - I can test them separately.

May 6, 2009 8:14 PM
 

Rob Farley said:

Alex/Kent,

Have you seen Itzik's table of numbers?

with
N0 as (select 1 as n union all select 1),
N1 as (select 1 as n from N0 a, N0 b),
N2 as (select 1 as n from N1 a, N1 b),
N3 as (select 1 as n from N2 a, N2 b),
N4 as (select 1 as n from N3 a, N3 b),
N5 as (select 1 as n from N4 a, N4 b),
nums as (select row_number() over (order by (select 1)) as num from N5 a, N5 b)
select num
from nums
where num <= 1000000;

This returns very quickly... I can't imagine TVFs would do better. And I'd hate to try to do this using derived tables.

Rob

May 7, 2009 11:46 PM
 

Adam Machanic said:

Rob: I've been using that technique on and off, and have had some problems with the QO coming up with crazy plans.  I've decided to stick with the numbers table for most cases.

May 8, 2009 10:05 AM
 

Adam Machanic said:

Alex: Inline TVFs and CTEs should perform exactly the same.  I don't think you can "prefer" one over the other.  I have a stored proc I recently finished that is just one query, with eight stacked CTEs.  I would not create eight TVFs and the stored procedure -- management nightmare!

May 8, 2009 10:07 AM
 

Alexander Kuznetsov said:

Rob,

When I wrapped your query in an inline TVF, I got the same performance, as I expected. I'm with Adam here - I stopped using this approach and use a regular table instead. Here is the TVF:

CREATE FUNCTION dbo.GetNumbers()
RETURNS TABLE
AS RETURN(
WITH
N0 as (select 1 as n union all select 1),
N1 as (select 1 as n from N0 a, N0 b),
N2 as (select 1 as n from N1 a, N1 b),
N3 as (select 1 as n from N2 a, N2 b),
N4 as (select 1 as n from N3 a, N3 b),
N5 as (select 1 as n from N4 a, N4 b),
nums as (select row_number() over (order by (select 1)) as num from N5 a, N5 b)
select num
from nums
where num <= 1000000);
GO
SET STATISTICS TIME ON
SET STATISTICS IO ON
GO
SELECT num INTO #t FROM dbo.GetNumbers()

SQL Server parse and compile time:
  CPU time = 71 ms, elapsed time = 71 ms.
SQL Server Execution Times:
  CPU time = 390 ms, elapsed time = 394 ms.

Here are results from your query, also inserting into a temp table:

SQL Server parse and compile time:
  CPU time = 63 ms, elapsed time = 69 ms.
SQL Server Execution Times:
  CPU time = 422 ms, elapsed time = 416 ms.

(1000000 row(s) affected)

May 8, 2009 10:49 AM

About ktegels

Kent Tegels passed away on July 31, 2010. Kent was an Adjunct Professor at Colorado Technical University and a member of the technical staff at PluralSight. He was recognized by Microsoft with Most Valuable Professional (MVP) status in SQL Server for his community involvement with SQL Server and .NET. Kent held Microsoft Certifications in Database Administration and Systems Engineering, and contributed to several books on data access programming and .NET. He was a well known industry speaker, and resided in Sioux Falls, South Dakota.