THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Louis Davidson

ANY and ALL, Two Keywords I Had Never Taken the Time to Understand

As I am starting to prepare for an upcoming blog series on database designs and associated query patterns (stay tuned!) I was thinking about what query keywords I don't really know well enough. ANY and ALL were two keywords that immediately popped into my mind that I had never really used (and I probably should have.)

ANY and ALL are used as options on scalar comparison operators like =, !=, >, etc to allow them to be used with multi-row sub-queries.

You know if you do something like the following:

select *
from Sales.SalesOrderHeader
where SalesOrderId = ( select SalesOrderId
                       from Sales.SalesOrderDetail
                       where SalesOrderId in (43659, 43660, 43661))

You will get the following error:

Msg 512, Level 16, State 1, Line 11
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

However, you can change this to be:

select *
from Sales.SalesOrderHeader
where SalesOrderId = ANY (  select SalesOrderId
                            from Sales.SalesOrderDetail
                            where SalesOrderId in (43659, 43660, 43661))

And now each value in the left input will be checked against the set on the right.

Admittedly, this isn't that useful, as = ANY is basically equivalent to IN, as in the following statement. Heck, I did use IN in the example to get the 3 rows for the test for a reason:

select *
from Sales.SalesOrderHeader
where SalesOrderId in (43659, 43660, 43661)

One big difference between IN and = ANY, is that it only works with subqueries so this will not work

select *
from Sales.SalesOrderHeader
where SalesOrderId = ANY (43659, 43660, 43661)

Will result in:

Msg 102, Level 15, State 1, Line 3
Incorrect syntax near '43659'.

The next thing you might think (I did) is that if = ANY is the same as IN, then != ANY must be the same as NOT in, right? No, because the != ANY is the operator, and you have to think about what “Does not equal ANY” means. Actually the following query will return every row in the table:

select *
from Sales.SalesOrderHeader
where SalesOrderId != ANY (select SalesOrderId
                           from Sales.SalesOrderDetail
                           where SalesOrderId in 43659, 43660, 43661))

Why? What the “!=ANY” operator says is to return a row that the row values doesn't match any subquery row value, so if even one row doesn't match, then it will be successful. So take 43659. It does match 43659, but it doesn't match the other 2 rows. Or as this page in Technet puts it, NOT IN says != value1 and != value2 and != value3, where != ANY is != value1 or != value2 or != value3.

In this case, you could use the ALL keyword, which says to check the value against every value returned, and the value returned must match all values (Not tremendously interesting with an equal operator, but essential to understanding the operators), but if any row matches, it doesn't match all of them. When no rows are returned by the subquery, it returns true. So the following (with the subquery negated with the 1=2 will return all rows in the table:

select *
from Sales.SalesOrderHeader
where SalesOrderId = ALL (select SalesOrderId                
                          from Sales.SalesOrderDetail
                          where SalesOrderId in (43659, 43660, 43661)
                            and 1=2)

But, if this is the case, then != ALL should return = rows, right? Wrong.

select *
from Sales.SalesOrderHeader
where SalesOrderId != ALL (select SalesOrderId
                           from Sales.SalesOrderDetail
                           where SalesOrderId in (43659, 43660, 43661)
                             and /* again */ 1=2)

Both return the exact same set of rows. The problem (and why I could wrap my head around these operators) is that the operator is truly = ALL and != ALL. So = ALL says that you must match ALL values, but != ALL means that you must be different than all individual values. So in the following, it will return all rows in the table except the three values from the subquery:

select *
from Sales.SalesOrderHeader
where SalesOrderId != ALL (select SalesOrderId
                           from Sales.SalesOrderDetail
                           where SalesOrderId in (43659, 43660, 43661))

What I particularly like about the ANY and ALL keywords, is that they are named in a very straightforward manner, once you get the idea of what they do, but as noted, it wasn’t necessarily intuitive to start with for me. On a practical side, what if we want to compare one value to all of the values in a related set. In my example (using ye olde AdventureWorks database), I want to see what orders do not have any sales order items where the UnitPrice is less than 500. A classic way of making this comparison would be to use an aggregate on the salesOrderItems to get the max UnitPrice for all items and use it as a filter:

select *
from Sales.SalesOrderHeader
where SalesOrderId in (    select SalesOrderId
                           from Sales.SalesOrderDetail
                           group by SalesOrderId
                           having max(UnitPrice) <= 500)

It is a technique I have used many times over. But, really, what would be easier would be to check each item against the scalar value, rather than doing an aggregate. Using the >= ALL operator, we can do this in a direct method. Each row is compared to the value on the left side of the operator.

select *
from Sales.SalesOrderHeader
where 500 >= ALL (    select UnitPrice
                      from Sales.SalesOrderDetail
                      where SalesOrderHeader.SalesOrderId = SalesOrderDetail.SalesOrderId)

Using AdventureWorks2014 as it is, freshly downloaded, the performance for both queries is very much evenly matched, based on the plan:

clip_image002

And Statistics IO and Time are very much similar as well. In my tries, the IN version took a few milliseconds more than than the >= ALL version. But note that the >= ALL query suggested an index. This is the index (with a slightly better name!):

CREATE NONCLUSTERED INDEX unitPrice_include_salesOrderId
                      ON [Sales].[SalesOrderDetail] ([UnitPrice]) INCLUDE ([SalesOrderID])

From there, you will see a fairly large (though not necessarily, amazing) improvement:

clip_image004

The CPU and reads were quite a bit better, with the >= ALL version needing:

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahea

Table 'SalesOrderHeader'. Scan count 1, logical reads 689, physical reads 0,

Table 'SalesOrderDetail'. Scan count 1, logical reads 96, physical reads 0, r

SQL Server Execution Times:

CPU time = 78 ms, elapsed time = 318 ms.

While the IN version needed:

Table 'SalesOrderHeader'. Scan count 1, logical reads 689, physical reads 0,

Table 'SalesOrderDetail'. Scan count 1, logical reads 1246, physical reads 0,

SQL Server Execution Times:

CPU time = 125 ms, elapsed time = 361 ms.

The elapsed times were similar, but there was an improvement over pretty much equal as well.

To test to make sure you have the correct answer we can use the following query to look at the values that are returned. It takes the basic query and looks at the data for each of the salesOrderDetail rows for a salesOrder:

select SalesOrderId, max(UnitPrice) as MaxUnitPrice, min(UnitPrice) as MinUnitPrice
from sales.SalesOrderDetail
where salesOrderId = ANY ( select SalesOrderId
                           from Sales.SalesOrderHeader
                           where 500 >= ALL (select UnitPrice
                                             from Sales.SalesOrderDetail
                                             where SalesOrderHeader.SalesOrderId =
                                                         SalesOrderDetail.SalesOrderId)

                                                      )
group by salesOrderId
order by MaxUnitPrice desc, MinUnitPrice desc

Which will return a result set like:

SalesOrderId MaxUnitPrice          MinUnitPrice
------------ --------------------- ---------------------
46641        469.794               469.794
46934        469.794               469.794
46963        469.794               469.794
47022        469.794               469.794
47048        469.794               469.794

73273          2.29                  2.29
73040          2.29                  2.29
51782         1.374                  1.374
53564         1.374                  1.374
65214         1.374                  1.374

Which you can see, all of the max values are <  500 (So 500 is greater than all of the values).

I haven't done a tremendous amount of performance testing (as you can tell), but it is clear to me that ANY and ALL have the propensity to be of some value in queries on occasion (and not just to get a blog entry posted so I don't vanish from the SQLBlog role again this month.

Published Saturday, June 13, 2015 12:21 AM by drsql

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

No Comments

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Links to my other sites

Archives

Privacy Statement