<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'T-SQL Tuesday' and 'aggregation'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=T-SQL+Tuesday,aggregation&amp;orTags=0</link><description>Search results matching tags 'T-SQL Tuesday' and 'aggregation'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>T-SQL Tuesday #5: My First Cube</title><link>http://sqlblog.com/blogs/kalen_delaney/archive/2010/04/13/tsql-tuesday-5-my-first-cube.aspx</link><pubDate>Tue, 13 Apr 2010 19:00:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:24253</guid><dc:creator>Kalen Delaney</dc:creator><description>&lt;P&gt;&lt;EM&gt;It's time for the fifth &lt;/EM&gt;&lt;A href="http://sqlvariant.com/wordpress/index.php/2010/04/t-sql-tuesday-005-reporting/"&gt;T-SQL Tuesday&lt;/A&gt;&lt;EM&gt;, managed this time by Aaron Nelson of SQLVariations&lt;/EM&gt;&lt;EM&gt;. Once again, the deadline came up just too quickly, and I'm on the road this week, so my entry will not be too long. Aaron's &lt;/EM&gt;&lt;EM&gt;topic is reporting and in keeping with my past posts, this contribution will include a history lesson.&amp;nbsp; &lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Since I first learned SQL, I've always thought of aggregation as a way of producing simple reports. Summary information was frequently all that was needed on an ongoing basis to see what was going on in an organization. Clients would want straightforward summaries of sum of sales per region or average orders per day or per month, or number of products available per category.&amp;nbsp; Once a TSQL programmer was fluent with the GROUP BY operator, these simple reports could be generated very easily. &lt;/P&gt;
&lt;P&gt;But what if we wanted just a bit more? What if we to see a summarization of total sales amount by region and salesperson, and also just by region? &lt;/P&gt;
&lt;P&gt;Way back in the earliest versions of Transact-SQL before we had any 3rd party report writing products, any results we wanted had to be generated purely through T-SQL. We could use a clause called COMPUTE BY that allowed us to get aggregated values for any column in the SELECT list, and we could have multiple COMPUTE BY clauses for a single SELECT.&amp;nbsp; This functionality gave us some ability to have multiple aggregations in a single query, but it had some drawbacks. First, we couldn't just return the aggregate information… we always had to return the detail rows also. Second, the rows that contained the aggregates ONLY contained the requested aggregates, which were based on a single column in the SELECT list; they did not include a value for the nonaggregated columns. This meant that that the result set was non-relational… the detail rows contained different columns than the COMPUTE BY rows. In fact, it was by understanding COMPUTE BY all those years ago that I learned exactly what was meant by relational result set. [COMPUTE BY is actually still in the product, even thought it produces non-relational results. However, the documentation for SQL Server 2008 indicates that it will finally be removed in the &lt;EM&gt;&lt;STRONG&gt;next&lt;/STRONG&gt;&lt;/EM&gt; version. ]&lt;/P&gt;
&lt;P&gt;Having an interest in using aggregation for reporting, I was very interested in two new options to GROUP BY that were introduced in SQL Server 6.5. There were the CUBE and ROLLUP extensions. Their addition to the SQL language were described in a paper by Jim Gray in 1997, which you can read here:&lt;/P&gt;
&lt;P&gt;&lt;A title=http://arxiv.org/ftp/cs/papers/0701/0701155.pdf href="http://arxiv.org/ftp/cs/papers/0701/0701155.pdf"&gt;http://arxiv.org/ftp/cs/papers/0701/0701155.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;This&amp;nbsp; paper was a real eye-opener when I first read it 13 years ago. And it does an awesome job of explaining why a CUBE is called a CUBE.&amp;nbsp; I'm reproducing one figure from this paper here, that shows the genesis of the cube, when grouping by three different 'dimensions'. The example in the paper is a car dealership, and the grouping values are year, make and color of vehicle.&amp;nbsp; &lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/kalen_delaney/image_3E3DB06D.png"&gt;&lt;IMG style="BORDER-RIGHT-WIDTH:0px;DISPLAY:inline;BORDER-TOP-WIDTH:0px;BORDER-BOTTOM-WIDTH:0px;BORDER-LEFT-WIDTH:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/kalen_delaney/image_thumb_1AC10C08.png" width=194 height=244&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Although I have not begun to explore analysis services or reporting services in any depth at all, I find that a thorough understanding of what is a cube and why we should care allows me to at least understand why these services are such powerful additions to the SQL Server product line.&lt;/P&gt;
&lt;P&gt;&lt;FONT color=#ff00ff size=4&gt;~Kalen&lt;/FONT&gt;&lt;/P&gt;</description></item><item><title>T-SQL Tuesday #002: Non-aggregated columns in a GROUP BY query</title><link>http://sqlblog.com/blogs/kalen_delaney/archive/2010/01/11/non-aggregated-columns-in-a-group-by-query.aspx</link><pubDate>Tue, 12 Jan 2010 00:00:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:20880</guid><dc:creator>Kalen Delaney</dc:creator><description>&lt;P&gt;&lt;EM&gt;It's time for the second &lt;/EM&gt;&lt;A href="http://sqlblog.com/blogs/adam_machanic/archive/2010/01/04/invitation-for-t-sql-tuesday-002-a-puzzling-situation.aspx"&gt;T-SQL Tuesday&lt;/A&gt;&lt;EM&gt;, once again managed by&lt;/EM&gt; &lt;EM&gt;Adam Machanic. Since my first T-SQL Tuesday post dealt with my long years with the SQL Server product, I decided to continue in that vein.&amp;nbsp; Adam asked for us to write something about a puzzle we had to deal with, involving TSQL. I decided to write about one of the most puzzling behaviors of the TSQL language, that fortunately went away many versions ago….&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;When I started working with Sybase, over 22 years ago, I was hired as a Technical Support Analyst. The company was new and small, and didn't have a lot of internal training procedures in place, so I was basically just given the manuals to read for a couple of weeks before being put on the phones with customers. &lt;/P&gt;
&lt;P&gt;My only knowledge of SQL and relational database systems was from a one semester course I had taken at UC Berkeley, taught by Michael Stonebraker. But, at least that was something, better than nothing.&amp;nbsp; I remember reading the TSQL manual, practically cover to cover, and encountering a very strange feature of queries using GROUP BY. I didn't completely understand it, but it seemed very strange. But I just kept reading, because I would have to face my first customer call… very soon. I remember that very first call, because I actually was able to provide some support. The caller was puzzled by the behavior he was getting from a certain TSQL query, the problem description sounded very much like something like the mysterious GROUP BY behavior I had read about just a day or two before.&amp;nbsp; And I was able to solve his problem that day, on my very first call. &lt;/P&gt;
&lt;P&gt;As good as my memory for SQL Server trivia is, I don't remember the exact query that the customer was calling about. (I do remember the customer though, as we went on to work together on many more interesting problems, and I have recently reconnected with him on LinkedIn.) Although I don't remember the exact query, I remember the behavior. It had to do with a TSQL 'extension' to the standard, that allowed you to use columns in a SELECT list containing aggregates, that were not aggregates and not part of the GROUP BY. Whew… &lt;/P&gt;
&lt;P&gt;So let's see what I am talking about. I'm going to use the &lt;EM&gt;pubs&lt;/EM&gt; sample database, which is the only one we had 22 years ago, and all sample queries in all the documentation were written to use it. I know that data very well.&amp;nbsp; There is a table called &lt;EM&gt;titles&lt;/EM&gt;, and each book in the &lt;EM&gt;titles&lt;/EM&gt; table has an attribute called &lt;EM&gt;type&lt;/EM&gt;, as well as &lt;EM&gt;title&lt;/EM&gt;, &lt;EM&gt;price&lt;/EM&gt;, and a few others. The following query is typical of the use of aggregation and GROUP BY. It takes all the rows with the same value for &lt;EM&gt;type&lt;/EM&gt;, and produces one row of output for each distinct value of &lt;EM&gt;type&lt;/EM&gt;. The average is the average per set of rows, one value per group of rows with the same &lt;EM&gt;type&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;&lt;FONT size=1 face="Courier New"&gt;USE pubs; &lt;BR&gt;GO &lt;BR&gt;SELECT type, AVG(price) as avg_price &lt;BR&gt;FROM titles &lt;BR&gt;GROUP BY type;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT size=1 face="Courier New"&gt;RESULTS: &lt;BR&gt;type&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; avg_price &lt;BR&gt;------------ --------------------- &lt;BR&gt;business&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 13.73 &lt;BR&gt;mod_cook&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 11.49 &lt;BR&gt;popular_comp 21.475 &lt;BR&gt;psychology&amp;nbsp;&amp;nbsp; 13.504 &lt;BR&gt;trad_cook&amp;nbsp;&amp;nbsp;&amp;nbsp; 15.9633 &lt;BR&gt;UNDECIDED&amp;nbsp;&amp;nbsp;&amp;nbsp; NULL&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Way back in SQL Server 3 and 4 (and 4.2 and 4.21) we could add additional columns to the SELECT,&amp;nbsp; so I could add the &lt;EM&gt;title&lt;/EM&gt; column, or the first 10 characters of the &lt;EM&gt;title&lt;/EM&gt;, to the query.&amp;nbsp; We would get the results shown after the query. &lt;/P&gt;
&lt;P&gt;&lt;FONT size=1 face="Courier New"&gt;SELECT &lt;BR&gt;&amp;nbsp; SUBSTRING(title, 1, 10) as name, &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; type, avg(price) as avg_price, &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; count(*) as number &lt;BR&gt;FROM titles &lt;BR&gt;WHERE&amp;nbsp; type like '%cook%' &lt;BR&gt;GROUP BY type&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT size=1 face="Courier New"&gt;RESULTS: &lt;BR&gt;name&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; type&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; avg_price&amp;nbsp;&amp;nbsp; number&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;---------- ------------ ----------&amp;nbsp; ------ &lt;BR&gt;Silicon Va mod_cook&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 11.49&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;The Gourme mod_cook&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 11.49&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;Onions, Le trad_cook&amp;nbsp;&amp;nbsp;&amp;nbsp; 15.96&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;Fifty Year trad_cook&amp;nbsp;&amp;nbsp;&amp;nbsp; 15.96&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;Sushi, Any trad_cook&amp;nbsp;&amp;nbsp;&amp;nbsp; 15.96&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;So first, you might wonder what it means to add the extra column. Once the rows are grouped by &lt;EM&gt;type&lt;/EM&gt;, adding some additional column basically undoes the grouping. Now I get individual data values instead of aggregates (summaries).&amp;nbsp; Except that the &lt;EM&gt;avg_price&lt;/EM&gt; and &lt;EM&gt;number&lt;/EM&gt; do seem to be summaries. It might almost start to make sense, until we add an additional WHERE filter:&lt;/P&gt;
&lt;P&gt;&lt;FONT size=1 face="courier new"&gt;SELECT &lt;BR&gt;SUBSTRING(title, 1, 10) as name, &lt;BR&gt;type, avg(price) as avg_price, &lt;BR&gt;count(*) as number &lt;BR&gt;FROM titles &lt;BR&gt;WHERE&amp;nbsp; type like '%cook%' &lt;BR&gt;&amp;nbsp; AND title LIKE 'S%' &lt;BR&gt;GROUP BY type&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT size=1 face="courier new"&gt;RESULTS: &lt;BR&gt;name&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; type&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; avg_price&amp;nbsp;&amp;nbsp; number&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;---------- ------------ ----------- ----------- &lt;BR&gt;Silicon Va mod_cook&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 19.99&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;The Gourme mod_cook&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 19.99&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;Onions, Le trad_cook&amp;nbsp;&amp;nbsp;&amp;nbsp; 14.99&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;Fifty Year trad_cook&amp;nbsp;&amp;nbsp;&amp;nbsp; 14.99&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;Sushi, Any trad_cook&amp;nbsp;&amp;nbsp;&amp;nbsp; 14.99&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/FONT&gt;&amp;nbsp;&amp;nbsp; &lt;/P&gt;
&lt;P&gt;Both the average and the count changed when we added the additional condition, but the data that comes back does not all satisfy the condition… there are titles that do NOT start with 'S'! So what's going on?&lt;/P&gt;
&lt;P&gt;It turned out that once we broke the ANSI rules, and added a column to the SELECT list that was not part of an aggregate function and was not contained in the GROUP BY, then our WHERE clause changes its meaning. In this case, the WHERE was only used to determine what rows were used to compute the aggregates. This is why the values for &lt;EM&gt;avg_price&lt;/EM&gt; and &lt;EM&gt;number&lt;/EM&gt; changed with the addition of the second condition. If we wanted to control which rows come back, we needed to specify the condition in a HAVING clause. &lt;/P&gt;
&lt;P&gt;&lt;FONT size=1 face="courier new"&gt;SELECT &lt;BR&gt;SUBSTRING(title, 1, 10) as name, &lt;BR&gt;type, avg(price) as avg_price, &lt;BR&gt;count(*) as number &lt;BR&gt;FROM titles &lt;BR&gt;WHERE&amp;nbsp; type like '%cook%' &lt;BR&gt;GROUP BY type &lt;BR&gt;HAVING title LIKE 'S%'&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT size=1 face="courier new"&gt;RESULTS: &lt;BR&gt;name&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; type&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; avg_price&amp;nbsp;&amp;nbsp; number&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;---------- ------------ ------------ ------- &lt;BR&gt;Silicon Va mod_cook&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 11.49&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;Sushi, Any trad_cook&amp;nbsp;&amp;nbsp;&amp;nbsp; 15.96&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;This sort of seems like the results I might expect. If this behavior is puzzling to you, imagine trying to explain it to customers and students new to SQL. It was actually my good friend Tibor Karaszi who ran all these queries for me on a SQL Server 4.21 that he had available, so I could verify the results. After being totally baffled by the behavior, he said: “&lt;I&gt;I’m glad we don’t have to handle such topics in the newsgroups anymore!”&lt;/I&gt;&lt;/P&gt;
&lt;P&gt;Even though we no longer can add unaggregated, ungrouped columns to the SELECT list of an aggregate query, that doesn't mean the puzzles are all gone. One problem that I continued to encounter when teaching beginning TSQL classes, was WHY this query isn't legal:&lt;/P&gt;
&lt;P&gt;&lt;FONT size=1 face="Courier New"&gt;SELECT title, type, count(*) as number &lt;BR&gt;FROM titles;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;SQL Server 2008 gives you a somewhat helpful error message:&lt;/P&gt;
&lt;P&gt;&lt;FONT size=1 face="courier new"&gt;Msg 8120, Level 16, State 1, Line 1 &lt;BR&gt;Column 'titles.title' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Just a few versions ago, the message only reported that the titles.title column was invalid, with no explanation. In Books Online, the explanation was found in the section on GROUP BY. But if someone worte the preceding query, they didn't even use a GROUP BY and may have no idea there was any relationship of this query to grouping operations. It made for a very puzzling time, to try to figure out this query didn't work as expected.&lt;/P&gt;
&lt;P&gt;Hopefully now there are fewer puzzling behaviors… but we'll just have to wait and see what everyone else blogs about.&lt;/P&gt;
&lt;P&gt;Have fun (and THANKS Tibor!)&lt;/P&gt;
&lt;P&gt;&lt;FONT color=#ff00ff size=4&gt;~Kalen&lt;/FONT&gt; &lt;/P&gt;
&lt;P&gt;(posted Monday evening on the US West Coast, but already Tuesday UTC time.)&lt;/P&gt;</description></item></channel></rss>