THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

John Paul Cook

Aggregating Aggregates and UNION vs. UNION ALL

Sometimes you need to aggregate the results of multiple queries. There are several ways to accomplish this (a CTE is particularly elegant). I decided on the following solution because it provides instruction in aggregates, aliases, and when to use UNION vs. UNION ALL.

For this example, the COUNT aggregate function is used to find the number of rows in each of several tables. To find the total number of rows for all of the tables, you need to SUM the results of all of the COUNTs. In other words, the result sets from all of the queries need to become rows in a single table. To get the desired grand total, use the SUM aggregate on this single table.

The following two queries return the total number of customers and the total number of employees:

select COUNT(*) from Sales.Customer
select COUNT(*) from HumanResources.Employee

-----------
19820

(1 row(s) affected)

-----------
290

(1 row(s) affected)

Because the queries are identical in the number of columns and data types returned, they can be combined using the UNION operator. Simplistic and naïve testing suggests that this may work:

select COUNT(*) from Sales.Customer
UNION
select COUNT(*) from HumanResources.Employee


-----------
290
19820

(2 row(s) affected)

UNION is not the correct operator to use as this test case reveals:

select COUNT(*) from HumanResources.Employee
UNION
select COUNT(*) from HumanResources.Employee

-----------
290

(1 row(s) affected)

UNION contains an implied DISTINCT operator that causes identical rows to disappear. This can lead to miscounts when SUM is used later. The correct operator to use is UNION ALL:

select COUNT(*) from HumanResources.Employee
UNION ALL
select COUNT(*) from HumanResources.Employee

-----------
290
290

(2 row(s) affected)

Now we know that we need to build on the following query:


select
COUNT(*) from Sales.Customer
UNION ALL
select COUNT(*) from HumanResources.Employee

We need to SUM the results of our UNION ALL query, but SUM requires a column name. That means aliases (nrows) must be used.

select COUNT(*) as nrows from Sales.Customer
UNION ALL
select COUNT(*) as nrows from HumanResources.Employee

nrows
-----------
19820
290

(2 row(s) affected)

To query from our new result set, we once again must use an alias (counts) for our tabular result set to make the SELECT * FROM work:

select * from
(select COUNT(*) as nrows from Sales.Customer
UNION ALL
select COUNT(*) as nrows from HumanResources.Employee) as counts

nrows
-----------
19820
290

(2 row(s) affected)

To finish up, replace the wildcard with the SUM aggregate. Another alias (GrandTotal) is used (but not required) to provide a descriptive column header.

select SUM(nrows) as GrandTotal from
(select COUNT(*) as nrows from Sales.Customer
UNION ALL
select COUNT(*) as nrows from HumanResources.Employee) as counts

GrandTotal
-----------
20110

(1 row(s) affected)


Published Tuesday, September 23, 2008 11:39 AM by John Paul Cook

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Leave a Comment

(required) 
(optional)
(required) 
Submit

About John Paul Cook

SQL Server developer and Microsoft MVP for Virtual Machines based in Houston, Texas.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement