<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx</link><description>It seems like every couple of months we see yet another post on SQLCLR string splitting routines. Many bloggers, I suppose, are still struggling, even three years later, to find that "perfect" use case for SQLCLR. Is string splitting it? Probably not.</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13572</link><pubDate>Mon, 27 Apr 2009 01:13:29 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13572</guid><dc:creator>Ted Krueger</dc:creator><description>&lt;p&gt;Very impressive Adam as with anything I've read of yours. &amp;nbsp;I'm glad the one on LTD got you up to writing this and that I had the chance to get the feedback on the imperfections in mine along with the more efficient ways of performing the task. &amp;nbsp; It’s easy to say things can be better, but showing it like you have here is the best way we can learn from it. &amp;nbsp; I know I have! &amp;nbsp;-Ted&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13579</link><pubDate>Mon, 27 Apr 2009 12:11:04 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13579</guid><dc:creator>Paul White</dc:creator><description>&lt;p&gt;Hey Adam,&lt;/p&gt;
&lt;p&gt;This debate is currently running over at SQLServerCentral:&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_new" href="http://www.sqlservercentral.com/Forums/Topic695508-338-1.aspx?Update=1"&gt;http://www.sqlservercentral.com/Forums/Topic695508-338-1.aspx?Update=1&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We used a very similar TVF to yours, and found that leaving SqlChars.Value as char[] improved the speed of the split slightly.&lt;/p&gt;
&lt;p&gt;That is the current fastest CLR TVF for most cases, though for larger strings with fewer delimiters, a RegEx split is even faster.&lt;/p&gt;
&lt;p&gt;A nice graph showing the two fastest CLR TVFs versus two of the best T-SQL approaches can be found here:&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_new" href="http://www.sqlservercentral.com/Forums/FindPost704614.aspx"&gt;http://www.sqlservercentral.com/Forums/FindPost704614.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The other interesting thing to come out so far is that the CLR TVFs are very significantly faster than tally (numbers) table based T-SQL routines, long held to be the best way to split strings.&lt;/p&gt;
&lt;p&gt;There are a large number of tests in that thread, from small strings to splitting the entire text of Moby Dick!&lt;/p&gt;
&lt;p&gt;Finally, it slightly sucks that while the TVF can stream its output via IEnumerator and IEnumerable, it is a shame that streaming input can't be achieved at the same time, either via the SqlChars or by using a SqlDataReader to read the rows-to-split from a persistent table.&lt;/p&gt;
&lt;p&gt;Cheers,&lt;/p&gt;
&lt;p&gt;Paul&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13581</link><pubDate>Mon, 27 Apr 2009 12:50:42 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13581</guid><dc:creator>Adam Machanic</dc:creator><description>&lt;p&gt;Hi Paul,&lt;/p&gt;
&lt;p&gt;Thanks for the pointer. &amp;nbsp;I just took a quick look through the thread but couldn't find any methods similar to mine. &amp;nbsp;I saw the &amp;quot;chars&amp;quot; method, but it has the same problem as String.Split: It does all of the splitting work upfront, producing an intermediate collection (it uses an ArrayList -- if I've looked at the wrong solution please point me to the correct one). &amp;nbsp;I didn't see anything in there about anyone actually load testing these methods, and that's where the scalability factor comes into play. &amp;nbsp;The upfront allocations simply will not scale.&lt;/p&gt;
&lt;p&gt;By the way, SqlChars does stream the data in, and if I can figure out a way to leverage the char collection without doing an upfront large allocation I'll post back an updated solution that should be even better.&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13583</link><pubDate>Mon, 27 Apr 2009 13:27:43 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13583</guid><dc:creator>Paul White</dc:creator><description>&lt;p&gt;Thanks for the quick reply - and, yes, you are quite right - all the solutions there do the allocation up-front, not using IEnumerator methods explicitly. &amp;nbsp;I do apologize for my confusion there. &amp;nbsp;Sorry.&lt;/p&gt;
&lt;p&gt;The reason for my confusion was that I had experimented with a solution using explicit IEnumerator methods, when trying to write a CLR TVF to read all the strings to split directly from the source table. &amp;nbsp;The rows were read into a SqlDataReader, and the plan was to stream the input from the Reader and also stream the output in the IEnumerator methods.&lt;/p&gt;
&lt;p&gt;(BTW yes SqlChars can stream, but this method was trying to read all rows at once, rather than streaming individual rows...)&lt;/p&gt;
&lt;p&gt;Sadly, it failed for two reasons:&lt;/p&gt;
&lt;p&gt;(a) CLR TVFs can only do data access in the InitMethod, not in the method called via FillRowMethodName, so I would have had to read all the data up front again.&lt;/p&gt;
&lt;p&gt;(b) Sending the results down the SqlPipe row-by-row using a SqlDataRecord and associated metadata is really slow. &amp;nbsp;Building a DataTable in a DataSet and sending a SqlDataReader from the DataSet is even worse.&lt;/p&gt;
&lt;p&gt;My apologies again.&lt;/p&gt;
&lt;p&gt;Paul&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13586</link><pubDate>Mon, 27 Apr 2009 15:01:24 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13586</guid><dc:creator>Adam Machanic</dc:creator><description>&lt;p&gt;Hi Paul, &lt;/p&gt;
&lt;p&gt;No need to apologize for anything :-)&lt;/p&gt;
&lt;p&gt;I did some work with the SqlPipe for a running sums problem and found it to be plenty fast for my needs. &amp;nbsp;How big was the data you were returning?&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13595</link><pubDate>Mon, 27 Apr 2009 20:52:18 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13595</guid><dc:creator>Paul White</dc:creator><description>&lt;p&gt;It was returning the split string results from Jeff Moden's test table. &amp;nbsp;This generates 800K rows of single small strings O.0&lt;/p&gt;
&lt;p&gt;It wasn't that slow in absolute terms, it was just a little disappointing that it was slower than the CROSS APPLY with the original CLR TVF. &amp;nbsp;It was about the same speed as the simple tally table solution, which was one of the worst performers overall.&lt;/p&gt;
&lt;p&gt;Of course, it may be simply that my C# skills are not all they could be!&lt;/p&gt;
&lt;p&gt;The table-at-once approach was never intended to be a good addition to the SQL CLR developer's toolkit, for anyone wondering - it was an attempt to find the fastest possible solution for the original posted problem on SSC.&lt;/p&gt;
&lt;p&gt;It's memory allocation in particular would be horrible - I just wanted to find a way to cut down the round trips between the database engine and CLR, and to reduce the number of calls (late-bound by reflection?) to the IEnumerator methods.&lt;/p&gt;
&lt;p&gt;I was apologising for replying too quickly without taking the time to fully digest your solution. &amp;nbsp;I stand by that! &amp;nbsp;I will learn :c)&lt;/p&gt;
&lt;p&gt;Paul&lt;/p&gt;
</description></item><item><title>SQLCLR String Splitting Part 2: Even Faster, Even More Scalable</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13613</link><pubDate>Tue, 28 Apr 2009 19:08:28 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13613</guid><dc:creator>Adam Machanic</dc:creator><description>&lt;p&gt;Two days ago, after posting what I thought was a pretty solid SQLCLR string splitting method , I received&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13666</link><pubDate>Thu, 30 Apr 2009 07:07:05 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13666</guid><dc:creator>M</dc:creator><description>&lt;p&gt;Have you investigated Regex.Split, using a precompiled regex?&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13667</link><pubDate>Thu, 30 Apr 2009 08:25:23 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13667</guid><dc:creator>Paul White</dc:creator><description>&lt;p&gt;M,&lt;/p&gt;
&lt;p&gt;Yes - see the link to SSC in Adam's article. &amp;nbsp;The Regex is modestly faster for very large strings - but see Adam's next post for the ultimate solution. &amp;nbsp;As you will see, the problem with all the CLR methods up to this point (including a pre-compiled Regex) is that memory is allocated for the entire split up front...&lt;/p&gt;
&lt;p&gt;Cheers,&lt;/p&gt;
&lt;p&gt;Paul&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13672</link><pubDate>Thu, 30 Apr 2009 14:03:20 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13672</guid><dc:creator>Adam Machanic</dc:creator><description>&lt;p&gt;M: As Paul mentioned, Regex.Split is out, due to upfront memory allocation. &amp;nbsp;Precompiled regex might still be an option, but I can't find a way to stream the results (i.e., an IndexOf method or something similar). &amp;nbsp;Do you have any ideas?&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13765</link><pubDate>Mon, 04 May 2009 20:58:40 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13765</guid><dc:creator>Simon Sabin</dc:creator><description>&lt;p&gt;What if you stream the positions back and then use substring in the TSQL. This avoids the need to make any copies of the strings.&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13767</link><pubDate>Mon, 04 May 2009 21:53:42 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13767</guid><dc:creator>Paul White</dc:creator><description>&lt;p&gt;Simon,&lt;/p&gt;
&lt;p&gt;Wouldn't SQL Server be making a copy of the string when it performs the substring? ;c)&lt;/p&gt;
&lt;p&gt;Paul&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#13797</link><pubDate>Tue, 05 May 2009 17:11:32 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13797</guid><dc:creator>Adam Machanic</dc:creator><description>&lt;p&gt;I tried something similar in the past without much success, but it was 100% T-SQL. &amp;nbsp;I'm not sure if the performance issue was with SUBSTRING or CHARINDEX but on VARCHAR(MAX) it was pretty bad. &amp;nbsp;I'll have to test and see if that makes a difference. &amp;nbsp;Only issue is that it would require two separate modules to do a split, which becomes a bit more of a maintenance nightmare.&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#15342</link><pubDate>Sun, 19 Jul 2009 16:28:22 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:15342</guid><dc:creator>TechVsLife</dc:creator><description>&lt;p&gt;Thank you, this is very useful (though it should be included as a built-in T-SQL function). &amp;nbsp;I haven't read through all the many pages spilled on this subject, but I assume the conclusion is that your CLR version is among the fastest, and faster in most scenarios than the T-SQL version?&lt;/p&gt;
&lt;p&gt;btw, I added a comment about an update to your pure T-SQL version, at:&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_new" href="http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/splitting-a-string-of-unlimited-length.aspx"&gt;http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/splitting-a-string-of-unlimited-length.aspx&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#41507</link><pubDate>Thu, 02 Feb 2012 12:21:49 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:41507</guid><dc:creator>Dave</dc:creator><description>&lt;p&gt;Hi Adam&lt;/p&gt;
&lt;p&gt;Would you mind please posting the whole code for your example? &amp;nbsp;I am trying to learn how to create a CLR TVF do split strings, and this page comes highly recommended. &amp;nbsp;However, I can't figure out how to fill in the gaps... Thanks :-)&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#41542</link><pubDate>Sat, 04 Feb 2012 04:29:33 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:41542</guid><dc:creator>Adam Machanic</dc:creator><description>&lt;p&gt;Hi Dave,&lt;/p&gt;
&lt;p&gt;First of all, you should be using this code:&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_new" href="http://sqlblog.com/blogs/adam_machanic/archive/2009/04/28/sqlclr-string-splitting-part-2-even-faster-even-more-scalable.aspx"&gt;http://sqlblog.com/blogs/adam_machanic/archive/2009/04/28/sqlclr-string-splitting-part-2-even-faster-even-more-scalable.aspx&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To use: Fire up Visual Studio, launch a new C# Database Project, right-click on the project and click &amp;quot;Add,&amp;quot; select any item, and then go to the editor window and paste the code in on top of the template. Build it, deploy it, and you're done :-)&lt;/p&gt;
&lt;p&gt;Feel free to shoot back with any remaining questions!&lt;/p&gt;
</description></item><item><title>re: Faster, More Scalable SQLCLR String Splitting</title><link>http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx#47273</link><pubDate>Thu, 24 Jan 2013 10:41:44 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:47273</guid><dc:creator>Jesús López</dc:creator><description>&lt;p&gt;The function would be simpler using yield return:&lt;/p&gt;
&lt;p&gt;public partial class UserDefinedFunctions&lt;/p&gt;
&lt;p&gt;{&lt;/p&gt;
&lt;p&gt;	[Microsoft.SqlServer.Server.SqlFunction(&lt;/p&gt;
&lt;p&gt;		TableDefinition=&amp;quot;token nvarchar(4000)&amp;quot;, &lt;/p&gt;
&lt;p&gt;		FillRowMethodName=&amp;quot;GetNextString&amp;quot;)]&lt;/p&gt;
&lt;p&gt;	public static IEnumerable Split(&lt;/p&gt;
&lt;p&gt;		[SqlFacet(MaxSize=-1)] SqlString str, &lt;/p&gt;
&lt;p&gt;		[SqlFacet(IsFixedLength=true, MaxSize=1)] SqlString separator)&lt;/p&gt;
&lt;p&gt;	{&lt;/p&gt;
&lt;p&gt;		if (str.IsNull || separator.IsNull) yield break;&lt;/p&gt;
&lt;p&gt;		char splitter = separator.Value[0];&lt;/p&gt;
&lt;p&gt;		int currentPosition = 0;&lt;/p&gt;
&lt;p&gt;		int nextPosition = 0;&lt;/p&gt;
&lt;p&gt;		string inputStr = str.Value;&lt;/p&gt;
&lt;p&gt;		int strLength = inputStr.Length;&lt;/p&gt;
&lt;p&gt;		do&lt;/p&gt;
&lt;p&gt;		{&lt;/p&gt;
&lt;p&gt;			nextPosition = inputStr.IndexOf(splitter, currentPosition);&lt;/p&gt;
&lt;p&gt;			if (nextPosition == -1) nextPosition = strLength;&lt;/p&gt;
&lt;p&gt;			yield return inputStr.Substring(currentPosition, nextPosition - currentPosition).Trim();&lt;/p&gt;
&lt;p&gt;			currentPosition = nextPosition + 1;&lt;/p&gt;
&lt;p&gt;		} while (currentPosition &amp;lt; strLength);		&lt;/p&gt;
&lt;p&gt;	}&lt;/p&gt;
&lt;p&gt;	public static void GetNextString(Object obj, out SqlString str)&lt;/p&gt;
&lt;p&gt;	{&lt;/p&gt;
&lt;p&gt;		str = new SqlString((string)obj);&lt;/p&gt;
&lt;p&gt;	}&lt;/p&gt;
&lt;p&gt;};&lt;/p&gt;
</description></item></channel></rss>