THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Running sums yet again: SQLCLR saves the day!

Back again!  Fourth post for the month of February, making this my best posting month in, well, months.  Expect this trend to continue.

After yesterday's post on running sums and the evils of cursors, Jamie Thompson came up with a faster solution than the curser I posted.  Alas, Jamie's solution uses an undocumented form of UPDATE syntax, and I am really not comfortable using it.  So I set out to find still another solution.  As promised at the end of my last post, SQLCLR is where I looked.  And my instinct proved correct.

Jamie's solution runs in 4 seconds on my laptop.  Compared to 14 seconds for the cursor I posted, that's a great enhancement.  But I knew that we could do better still, and without undocumented syntax and temp tables.

The answer?  A SQLCLR stored procedure.  Same logic as the cursor: Pull back the data in order, then loop over the rows and maintain the running sum in a variable.  But thanks to the SqlPipe's SendResults methods, we don't need a temporary table for this one -- the results can be sent back one row at a time, and will still show up on the client as a single result set.

Here's how I did it:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;


public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void TransactionHistoryRunningSum()
    {
        using (SqlConnection conn = new SqlConnection("context connection=true;"))
        {
            SqlCommand comm = new SqlCommand();
            comm.Connection = conn;
            comm.CommandText = @"" +
                "SELECT TransactionID, ActualCost " +
                "FROM Production.TransactionHistory " +
                "ORDER BY TransactionID";

            SqlMetaData[] columns = new SqlMetaData[3];
            columns[0] = new SqlMetaData("TransactionID", SqlDbType.Int);
            columns[1] = new SqlMetaData("ActualCost", SqlDbType.Money);
            columns[2] = new SqlMetaData("RunningTotal", SqlDbType.Money);

            decimal RunningSum = 0;

            SqlDataRecord record = new SqlDataRecord(columns);

            SqlContext.Pipe.SendResultsStart(record);

            conn.Open();

            SqlDataReader reader = comm.ExecuteReader();

            while (reader.Read())
            {
                decimal ActualCost = (decimal)reader[1];
                RunningSum += ActualCost;

                record.SetInt32(0, (int)reader[0]);
                record.SetDecimal(1, ActualCost);              
                record.SetDecimal(2, RunningSum);

                SqlContext.Pipe.SendResultsRow(record);
            }

            SqlContext.Pipe.SendResultsEnd();
        }
    }
};

Results?  3 seconds on my laptop.  25% better than the previous best solution.  Not bad!

And, I even did better than that.  By creating a CLR table-valued user-defined function that uses a custom class wrapping a SqlDataReader, I was able to get this operation down to 2 seconds on my laptop.  However, due to restrictions related to passing around context connections, that solution can not use a context connection and is therefore highly suboptimal.  Until I figure out how to pass around a context connection (or if it's even possible), I'll keep that one off the blog.

So to recap: We certainly have not gotten rid of the cursor.  This SQLCLR solution is really just a cursor in disguise.  But we've built a better cursor, because this one doesn't require temporary tables.  And that I can live with--for now.

Thanks to Jamie Thompson for prompting me to not wait several weeks before following up as I usually do with these posts!


Published Wednesday, July 12, 2006 10:52 PM by Adam Machanic
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Jeff Smith said:

Hi Adam -- great post.  It reminds me a lot of my post about doing cross tabs at the client instead of trying to jump through hoops to do them in T-SQL.  And the results are the same -- not only is it easier to do these things at the client, it is faster!  

http://weblogs.sqlteam.com/jeffs/archive/2005/05/12/5127.aspx

The only time that SQL Server should be calculating running sums is when those values are needed for further processing within SQL Server (i.e., to store in a table or to calc percentages or something like that).  As I often say, if people will just use SQL the way it is intended they would be surprised at how much simplier their code is and how much easier their lives are!

Great blog, the site looks great, looking forward to your posts.  
July 13, 2006 12:54 PM
 

stook said:

Hi,

great post.

the update syntax is not undocumented actually. At least my sql 2000 help says:

SET @variable = column = expression sets the variable to the same value as the column. This differs from SET @variable = column, column = expression, which sets the variable to the pre-update value of the column.

SQLCLR is faster of course, but the SET is simpler one.

About performance tests - if these take less then 10 seconds I'll usualy make a 100 loop for timing to be sure.

May 8, 2007 6:28 PM
 

Adam Machanic said:

Hi Stook,

That's technically only documented for a single row; the result of a multirow operation is what is not documented (and not deterministic).

May 22, 2007 2:13 PM
 

Adam Machanic said:

I found Linchi's recent post on use of cursors in the TPC-E test to be quite interesting. The question

October 13, 2007 10:06 AM
 

Kevin Boles said:

Per Hugo's blog here, http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/21/curious-cursor-optimization-options.aspx, you can actually get the TSQL cursor solution to run a tad faster.  Instead of this for the cursor declaration:

CURSOR LOCAL FAST_FORWARD  FOR

try this:

CURSOR LOCAL FORWARD_ONLY STATIC  FOR

On my mondo test server . . . er laptop :), I was able to cut runtime from 5.56 seconds to 4.82 seconds average time.  13.3% faster if I did the math correctly.

Really blows my mind to find that FAST_FORWARD isn't the fastest cursor!  I have been advising clients and students for years to use that 'optimization' if they absolutely had to use a cursor.  oopsie!

November 24, 2007 8:41 PM
 

David said:

Did some tests with:

LOCAL FAST_FORWARD

vs

LOCAL READ_ONLY FORWARD_ONLY STATIC

(note the READ_ONLY might speed things up a bit)

And it seemed the FORWARD_ONLY approach was faster only on SQL 2000 and not 2005.

December 1, 2008 10:17 AM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement