THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Merrill Aldrich

Busting a persistent myth: views are executed "before" enclosing queries

To all who are SQL savvy, this is old news, but I wanted to put this tidbit up for you to pass along, as a concise proof, to others that might subscribe to this performance myth. On forums and Stack Overflow, it seems I constantly see this misinformation about views: the commenter/poster will argue that a view is expensive because it has to be evaluated "before" other code that uses the view. It may be natural for some devs, who are used to OO or procedural code that is not automatically run-time cost-optimized, to assume that a view would have to be processed first, and then the results fed to the enclosing query logic -- but if you face a performance issue, it's important to understand how this works.

This is a fundamental misconception, and can be disproven pretty easily. In fact, view code is typically "expanded" into enclosing queries that use the view, just like a derived table, and the whole query gets optimized as one unit. Indexed views aside, a view is most like a stored derived table definition, with a name. If a derived table could be optimized in the context of an outer query, then a view typically can too.

Proof

-- Make a table with some test data, big enough for illustration purposes
 
CREATE TABLE dbo.BigHairyTable ( 
    IndexedColumn int PRIMARY KEY CLUSTERED,
    NonIndexedColumn int
);
 
WITH Units ( nbr ) AS ( 
    SELECT 0 
    UNION SELECT 1 
    UNION SELECT 2
    UNION SELECT 3 
    UNION SELECT 4
    UNION SELECT 5
    UNION SELECT 6
    UNION SELECT 7
    UNION SELECT 8
    UNION SELECT 9 )
INSERT dbo.BigHairyTable ( IndexedColumn, NonIndexedColumn )
    SELECT 
          u1000000.nbr * 1000000 
        + u100000.nbr * 100000 
        + u10000.nbr * 10000 
        + U1000.nbr * 1000
        + u100.nbr * 100 
        + u10.nbr * 10 
        + u1.nbr 
        + 1 AS IndexedColumn,
        checksum( newid() ) as NonIndexedColumn
    FROM Units u1,
         Units u10, 
         Units u100, 
         Units u1000, 
         Units u10000,
         Units u100000, 
         Units u1000000 
 
-- http://www.projectdmx.com/tsql/tblnumbers.aspx (Anith Larson?)
 
GO

Next, run and examine execution plans for a "cheap" and an "expensive" query (hence the non-indexed column in the sample data):

-- Compare the execution plans for searching on the indexed vs. non-indexed columns
 
SELECT * FROM dbo.BigHairyTable 
WHERE IndexedColumn BETWEEN 1000 AND 1099
 
SELECT * FROM dbo.BigHairyTable 
WHERE NonIndexedColumn BETWEEN 500000000 AND 500050000
GO

Clearly, the first is cheap because it can seek a few rows from the index (cost = 0.00328 on my laptop) and the second is expensive because it cannot. I get a parallel clustered index scan of the whole table, cost = 25.5, or something around 8000 times more expensive, for the second query.

Next, let's wrap a view around that expensive query:

-- Make a view that is "expensive" because it accesses by the non-indexed column
CREATE VIEW dbo.ExpensiveView AS
    SELECT IndexedColumn, NonIndexedColumn 
    FROM dbo.BigHairyTable 
    WHERE NonIndexedColumn BETWEEN 500000000 AND 500050000
GO
 

Here's the clutch: if the view has to be evaluated before an enclosing query, then these two queries should both be very expensive:

-- Compare the cost of these queries:
SELECT * FROM dbo.ExpensiveView
 
SELECT * FROM dbo.ExpensiveView
WHERE IndexedColumn BETWEEN 1000 AND 1099
GO

But, they are not. The second query is very cheap. It, in fact, has about the same execution plan and cost as our very first sample, picking rows using the clustered index. If the view had to be evaluated "first," then this performance difference could not be true.

What really happens is, approximately, this:

-- This is what is "really" happening (approximately)
 
SELECT * FROM  --dbo.ExpensiveView
(   SELECT IndexedColumn, NonIndexedColumn 
    FROM dbo.BigHairyTable 
    WHERE NonIndexedColumn BETWEEN 500000000 AND 500050000
) as ExpensiveView
WHERE IndexedColumn BETWEEN 1000 AND 1099
 
-- The derived table factors out of this query according to optimization rules,
-- and this query is logically equivalent
SELECT IndexedColumn, NonIndexedColumn 
FROM dbo.BigHairyTable 
WHERE NonIndexedColumn BETWEEN 500000000 AND 500050000
AND IndexedColumn BETWEEN 1000 AND 1099
GO

Exceptions

"So, smart guy," you say, "how come that doesn't work with MY view? Mmm?"

OK, so immediately someone will come back with an example where a view remains expensive to execute, and seems to do painful row-by-row operations. This can happen because of the specific content of the view, but not because of the mere fact that a view exists. The optimizer's rules can only rewrite queries that are precisely, logically, algebraically equivalent (and only a subset of those). So it is possible to create a view that has, for example, a function buried inside that operates on a value from a column, where the optimizer cannot factor that operation out of the enclosing query logic, and ends up performing the operation on every row. But, you can test this yourself, easily, by doing the expansion shown above: copy the query text from the view, paste it into your "outer" query as a derived table, and look at execution plans to see where the hang-up is.

It's also possible that a whole pile of nested views won't optimize fully because the optimizer "gives up." Query optimization is expensive, and the search for the best possible query does get pruned in some ways by the optimizer, so that we don't wait all day for query optimization. At some point the system has to get down to business and find your data.

A heap of complex nested views can, then, present a performance problem because they won't optimize fully. Just know, as you troubleshoot that scenario, that at least it's not because views execute "first."

Published Thursday, February 11, 2010 10:49 AM by merrillaldrich
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Harsh said:

Quite true. View is expanded before preparing execution plan and hence from query optimizer's perspective it is same as one big query. Another misconception is that view executes faster than normal query. This misconception stems from the thinking that view is precompiled and its query plan is cached which is not true either. View is a logical concept using which you can logically breakup query in smaller chunks for reducing complexity.

February 11, 2010 4:46 PM
 

Bob Probst said:

@Harsh:  I've heard that bit about Views being faster too.  A quick look at a query plan can bust both these myths.  The objects used in a view get optimized into the query plan just like any subquery might.

February 12, 2010 8:52 AM
 

Boston Web Developer said:

Hey:  

How do indexed views fit into this then?  They must be preprocessed in order to build the indexes, right?  I suppose the view can still be dynamically executed, if the index is kept up-to-date separately.

A good test might be to make a third column in your test data, and add it to your view with an index, and then check your execution plan.    

February 16, 2010 6:32 PM
 

merrillaldrich said:

Hi BWD - An indexed view differs from a "normal" view in some important ways:

1. It is "materialized" -- the data in the indexed view is stored on disk, all the time, rather like a table -- but a table that is constantly and automatically kept up to date by the server engine whenever the base tables change. Indexes for such views are not built at run time, they are built when the view is created, and then they are maintained with all changes to the underlying data, just like an index on a table.

2. SQL Server optimizes for indexed views differently in different editions, because it is viewed (pardon the pun) as an enterprise/scalability feature. In Enterprise Edition, the optimizer will automatically elect to use an indexed view's index(es) to fetch data if it seems useful. In lower editions, you can only manually direct the server to use an idexed view with a query hint (NOEXPAND) and they will not be used automatically.

So, an indexed view is *roughly* like having an auxilliary table that the server engine can maintain automatically, from base tables. In EE, they act transparently as another possible access path to data, in lower editions, they can only be manually queried by invoking a hint that overrides normal view expansion.

See also http://msdn.microsoft.com/en-us/library/dd171921.aspx

February 16, 2010 7:32 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement