<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'SQL' and 'collation'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=SQL,collation&amp;orTags=0</link><description>Search results matching tags 'SQL' and 'collation'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Using SQL Execution Plans to discover the Swedish alphabet</title><link>http://sqlblog.com/blogs/rob_farley/archive/2011/01/03/using-sql-execution-plans-to-discover-the-swedish-alphabet.aspx</link><pubDate>Mon, 03 Jan 2011 05:54:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:32237</guid><dc:creator>rob_farley</dc:creator><description>&lt;P&gt;SQL Server is quite remarkable in a bunch of ways. In this post, I’m using the way that the Query Optimizer handles LIKE to keep it SARGable, the Execution Plans that result, Collations, and PowerShell to come up with the Swedish alphabet.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://msmvps.com/blogs/robfarley/archive/2010/01/22/sargable-functions-in-sql-server.aspx" target=_blank&gt;SARGability&lt;/A&gt; is the ability to seek for items in an index according to a particular set of criteria. If you don’t have SARGability in play, you need to scan the whole index (or table if you don’t have an index). For example, I can find myself in the phonebook easily, because it’s sorted by LastName and I can find Farley in there by moving to the Fs, and so on. I can’t find everyone in my suburb easily, because the phonebook isn’t sorted that way. I can’t even find people who have six letters in their last name, because also the book is sorted by LastName, it’s not sorted by LEN(LastName). This is all stuff I’ve looked at before, including in the &lt;A href="http://www.sqlbits.com/Sessions/Event7/Understanding_SARGability_to_make_your_queries_run_faster" target=_blank&gt;talk I gave at SQLBits in October 2010&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;If I try to find everyone who’s names start with F, I can do that using a query a bit like:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;SELECT LastName FROM dbo.PhoneBook WHERE LEFT(LastName,1) = 'F';&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Unfortunately, the Query Optimizer doesn’t realise that all the entries that satisfy LEFT(LastName,1) = 'F' will be together, and it has to scan the whole table to find them.&lt;/P&gt;
&lt;P&gt;&lt;IMG style="BACKGROUND-IMAGE:none;BORDER-BOTTOM:0px;BORDER-LEFT:0px;PADDING-LEFT:0px;PADDING-RIGHT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;PADDING-TOP:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/rob_farley/image_0BB56CF3.png" width=595 height=143&gt;&lt;/P&gt;
&lt;P&gt;But if I write:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;SELECT LastName FROM dbo.PhoneBook WHERE LastName LIKE 'F%';&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;then SQL is smart enough to understand this, and performs an Index Seek instead.&lt;/P&gt;
&lt;P&gt;&lt;IMG style="BACKGROUND-IMAGE:none;BORDER-BOTTOM:0px;BORDER-LEFT:0px;PADDING-LEFT:0px;PADDING-RIGHT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;PADDING-TOP:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/rob_farley/image_0F7311C3.png" width=566 height=133&gt;&lt;/P&gt;
&lt;P&gt;To see why, I look further into the plan, in particular, the properties of the Index Seek operator. The ToolTip shows me what I’m after:&lt;/P&gt;
&lt;P&gt;&lt;IMG style="BACKGROUND-IMAGE:none;BORDER-BOTTOM:0px;BORDER-LEFT:0px;PADDING-LEFT:0px;PADDING-RIGHT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;PADDING-TOP:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/rob_farley/image_14754F72.png" width=302 height=200&gt;&lt;/P&gt;
&lt;P&gt;You’ll see that it does a Seek to find any entries that are at least F, but not yet G. There’s an extra Predicate in there (a Residual Predicate if you like), which checks that each LastName is really LIKE F% – I suppose it doesn’t consider that the Seek Predicate is quite enough – but most of the benefit is seen by its working out the Seek Predicate, filtering to just the “at least F but not yet G” section of the data.&lt;/P&gt;
&lt;P&gt;This got me curious though, particularly about where the G comes from, and whether I could leverage it to create the Swedish alphabet.&lt;/P&gt;
&lt;P&gt;I know that in the Swedish language, there are three extra letters that appear at the end of the alphabet. One of them is ä that appears in the word Västerås. It turns out that Västerås is quite hard to find in an index when you’re looking it up in a Swedish map. I talked about this briefly in my five-minute &lt;A href="http://bit.ly/RFCollation" target=_blank&gt;talk on Collation&lt;/A&gt; from SQLPASS (the one which was slightly less than serious).&lt;/P&gt;
&lt;P&gt;So by looking at the plan, I can work out what the next letter is in the alphabet of the collation used by the column. In other words, if my alphabet were Swedish, I’d be able to tell what the next letter after F is – just in case it’s not G.&lt;/P&gt;
&lt;P&gt;It turns out it is… Yes, the Swedish letter after F is G. But I worked this out by using a copy of my PhoneBook table that used the Finnish_Swedish_CI_AI collation. I couldn’t find how the Query Optimizer calculates the G, and my friend &lt;A href="http://sqlblog.com/blogs/paul_white/" target=_blank&gt;Paul White&lt;/A&gt; (&lt;A href="http://twitter.com/SQL_Kiwi" target=_blank&gt;@SQL_Kiwi&lt;/A&gt;) tells me that it’s frustratingly internal to the QO. He’s particularly smart, even if he is from New Zealand.&lt;/P&gt;
&lt;P&gt;To investigate further, I decided to do some PowerShell, leveraging the &lt;A href="http://sqlblog.com/blogs/rob_farley/archive/2010/12/07/powershell-to-fetch-a-sql-execution-plan.aspx" target=_blank&gt;Get-SqlPlan function&lt;/A&gt; that I blogged about recently (make sure you also have the SqlServerCmdletSnapin100 snap-in added. To do that, run &lt;EM&gt;get-pssnapin -reg&lt;/EM&gt; to see if it's on the machine, and &lt;EM&gt;get-pssnapin -reg | add-pssnapin&lt;/EM&gt; to make the cmdlet available).&lt;/P&gt;
&lt;P&gt;I started by indicating that I was going to use Finnish_Swedish_CI_AI as my collation of choice, and that I’d start whichever letter came straight after the number 9. I figure that this is a cheat’s way of guessing the first letter of the alphabet (but it doesn’t actually work in Unicode – luckily I’m using varchar not nvarchar. Actually, there are a few aspects of this code that only work using ASCII, so apologies if you were wanting to apply it to Greek, Japanese, etc). I also initialised my $alphabet variable.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;$collation = 'Finnish_Swedish_CI_AI'; &lt;BR&gt;$firstletter = '9'; &lt;BR&gt;$alphabet = '';&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Now I created the table for my test. A single field would do, and putting a Clustered Index on it would suffice for the Seeks. &lt;EM&gt;[Edit: You may need to insert some rows if you find that it's producing&amp;nbsp;an Index Scan. Theoretically it should prefer to Scan,&amp;nbsp;knowing there's&amp;nbsp;no rows, but my tests showed it would always Seek]&lt;/EM&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;Invoke-Sqlcmd -server . -data tempdb -query "create table dbo.collation_test (col varchar(10) collate $collation primary key);"&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Now I get into the looping. &lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;$c = $firstletter; &lt;BR&gt;$stillgoing = $true; &lt;BR&gt;while ($stillgoing) &lt;BR&gt;{&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I construct the query I want, seeking for entries which start with whatever $c has reached, and get the plan for it:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;$query = "select col from dbo.collation_test where col like '$($c)%';"; &lt;BR&gt;[xml] $pl = get-sqlplan $query "." "tempdb";&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;At this point, my $pl variable is a scary piece of XML, representing the execution plan. A bit of hunting through it showed me that the EndRange element contained what I was after, and that if it contained NULL, then I was done.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;$stillgoing = ($pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange -ne $null);&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Now I could grab the value out of it (which came with apostrophes that needed stripping), and append that to my $alphabet variable.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;&amp;nbsp; if ($stillgoing) &lt;BR&gt;&amp;nbsp; {&amp;nbsp; &lt;BR&gt;$c=$pl.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple.QueryPlan.RelOp.IndexScan.SeekPredicates.SeekPredicateNew.SeekKeys.EndRange.RangeExpressions.ScalarOperator.ScalarString.Replace("'",""); &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; $alphabet += $c; &lt;BR&gt;&amp;nbsp; }&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Finally, finishing the loop, dropping the table, and showing my alphabet!&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;} &lt;BR&gt;Invoke-Sqlcmd -server . -data tempdb -query "drop table dbo.collation_test;"; &lt;BR&gt;$alphabet;&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;When I run all this, I see that the Swedish alphabet is ABCDEFGHIJKLMNOPQRSTUVXYZÅÄÖ, which matches what I see at &lt;A href="http://en.wikipedia.org/wiki/Swedish_orthography" target=_blank&gt;Wikipedia&lt;/A&gt;. Interesting to see that the letters on the end are still there, even with Case Insensitivity. Turns out they’re not just “letters with accents”, they’re letters in their own right.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/rob_farley/image_528E6A23.png"&gt;&lt;IMG style="BACKGROUND-IMAGE:none;BORDER-BOTTOM:0px;BORDER-LEFT:0px;PADDING-LEFT:0px;PADDING-RIGHT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;PADDING-TOP:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/rob_farley/image_thumb_0511C796.png" width=609 height=292&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I’m sure you gave up reading long ago, and really aren’t that&amp;nbsp;grabbed about the idea of doing this using PowerShell. I chose PowerShell because I’d already come up with an easy way of grabbing the estimated plan for a query, and PowerShell does allow for easy navigation of XML.&lt;/P&gt;
&lt;P&gt;I find the most interesting aspect of this as the fact that the Query Optimizer uses the next letter of the alphabet to maintain the SARGability of LIKE. &lt;A href="https://connect.microsoft.com/SQLServer/feedback/details/526431/make-more-functions-sargable" target=_blank&gt;I’m hoping they do something similar for a whole bunch of operations&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Oh, and the fact that you know how to find stuff in the IKEA catalogue.&lt;/P&gt;
&lt;HR&gt;

&lt;P&gt;Footnote: &lt;/P&gt;
&lt;P&gt;If you are interested in whether this works in other languages, you might want to consider the following screenshot, which shows that in principle, it should work with Japanese. It might be a bit harder to run this in PowerShell though, as I’m not sure how it translates. In Hiragana, the Japanese alphabet starts あ, ぃ, ぅ, ぇ, ぉ, ...&lt;/P&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/rob_farley/image_20F2D6C1.png"&gt;&lt;IMG style="BACKGROUND-IMAGE:none;BORDER-BOTTOM:0px;BORDER-LEFT:0px;PADDING-LEFT:0px;PADDING-RIGHT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;PADDING-TOP:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/rob_farley/image_thumb_57DC8F2D.png" width=644 height=413&gt;&lt;/A&gt;&lt;/P&gt;</description></item><item><title>My Lightning Talk in MP3 format</title><link>http://sqlblog.com/blogs/rob_farley/archive/2010/12/23/my-lightning-talk-in-mp3-format.aspx</link><pubDate>Fri, 24 Dec 2010 01:51:01 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:31939</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;Download it now via &lt;a href="http://bit.ly/RFCollation" target="_blank"&gt;http://bit.ly/RFCollation&lt;/a&gt;&amp;#160;&lt;a href="http://twitter.com/Kendra_Little/status/2834157470228480" target="_blank"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;margin:5px;padding-left:0px;padding-right:0px;display:inline;float:right;border-top:0px;border-right:0px;padding-top:0px;" title="Thanks to Kendra Little for this photo!" border="0" alt="Thanks to Kendra Little for this photo!" align="right" src="http://sqlblog.com/blogs/rob_farley/wglw_30A289FA.jpg" width="184" height="244" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Lots of people tell me they wish they’d heard my &lt;a href="http://sqlpass.eventpoint.com/topic/details/LT100R" target="_blank"&gt;Lightning Talk from the PASS Summit&lt;/a&gt;. This was the one that was five minutes, in which I explained Collation using examples comparing US English, UK English and Australian English. At the end, I showed my Arsenal thongs. You can see a picture of them below. There was a visual joke involving the name Arsenal too...&lt;/p&gt;  &lt;p&gt;After the recordings became available, I asked the PASS legal people, and they said I could do what I liked with my own five-minute set, so long as I didn’t sell it.&lt;/p&gt;  &lt;p&gt;So I made an MP3. I’ve uploaded it to the &lt;a href="http://lobsterpot.com.au" target="_blank"&gt;LobsterPot Solutions&lt;/a&gt; web server, and provided an easy link via &lt;a href="http://bit.ly/RFCollation" target="_blank"&gt;http://bit.ly/RFCollation&lt;/a&gt;. It’s a link straight to the MP3, and you’re welcome to download it, put it on your iPod, whatever you like.&lt;/p&gt;  &lt;p&gt;And also feel free to write comments here, to let me know what you think.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/rob_farley/003_1589BAEC.jpg"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;display:inline;border-top:0px;border-right:0px;padding-top:0px;" title="Arsenal Thongs" border="0" alt="Arsenal Thongs" src="http://sqlblog.com/blogs/rob_farley/003_thumb_48499B93.jpg" width="244" height="184" /&gt;&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Collation errors in business</title><link>http://sqlblog.com/blogs/rob_farley/archive/2010/12/13/collation-errors-in-business.aspx</link><pubDate>Mon, 13 Dec 2010 23:59:21 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:31584</guid><dc:creator>rob_farley</dc:creator><description>&lt;p&gt;At the &lt;a href="http://www.sqlpass.org/summit/na2010/default.aspx" target="_blank"&gt;PASS Summit&lt;/a&gt; last month, I did a set (Lightning Talk) about collation, and in particular, the difference between the “English” spoken by people from the US, Australia and the UK.&lt;a href="http://www.sqlservercentral.com/blogs/steve_jones/archive/2010/12/07/t_2D00_sql-tuesday-_2300_13-_2D00_-what-the-business-says-is-not-what-the-business-wants.aspx" target="_blank"&gt;&lt;img style="background-image:none;border-bottom:0px;border-left:0px;margin:5px;padding-left:0px;padding-right:0px;display:inline;float:right;border-top:0px;border-right:0px;padding-top:0px;" title="TSQL2sDay150x150" border="0" alt="T-SQL Tuesday" align="right" src="http://sqlblog.com/blogs/rob_farley/TSQL2sDay150x150_21DC5B54.jpg" width="154" height="154" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;One of the examples I gave was that in the US drivers might stop for gas, whereas in Australia, they just open the window a little. This is what’s known as a &lt;a href="http://en.wikipedia.org/wiki/Paraprosdokian"&gt;paraprosdokian&lt;/a&gt;, where you suddenly realise you misunderstood the first part of the sentence, based on what was said in the second. My current favourite is Emo Phillip’s line “I like to play chess with old men in the park, but it can be hard to find thirty-two of them.”&lt;/p&gt;  &lt;p&gt;Essentially, this a collation error, one that good comedians can get mileage from.&lt;/p&gt;  &lt;p&gt;Unfortunately, collation is at its worst when we have a computer comparing two things in different collations. They might look the same, and sound the same, but if one of the things is in SQL English, and the other one is in Windows English, the poor database server (with no sense of humour) will get suspicious of developers (who all have senses of humour, obviously), and declare a collation error, worried that it might not realise some nuance of the language.&lt;/p&gt;  &lt;p&gt;One example is the common scenario of a case-sensitive collation and a case-insensitive one. One may think that “Rob” and “rob” are the same, but the other might not. Clearly one of them is my name, and the other is a verb which means to steal (people called “Nick” have the same problem, of course), but I have no idea whether “Rob” and “rob” should be considered the same or not – it depends on the collation.&lt;/p&gt;  &lt;p&gt;I told a lie before – collation isn’t at its worst in the computer world, because the computer has the sense to complain about the collation issue.&lt;/p&gt;  &lt;p&gt;People don’t.&lt;/p&gt;  &lt;p&gt;People will say something, with their own understanding of what they mean. Other people will listen, and apply their own collation to it. I remember when someone was asking me about a situation which had annoyed me. They asked if I was ‘pissed’, and I said yes. I meant that I was annoyed, but they were asking if I’d been drinking. It took a moment for us to realise the misunderstanding.&lt;/p&gt;  &lt;p&gt;In business, the problem is escalated. A business user may explain something in a particular way, using terminology that they understand, but using words that mean something else to a technical person. &lt;/p&gt;  &lt;p&gt;I remember a situation with a checkbox on a form (back in VB6 days from memory). It was used to indicate that something was approved, and indicated whether a particular database field should store True or False – nothing more. However, the client understood it to mean that an entire workflow system would be implemented, with different users have permission to approve items and more. The project manager I’d just taken over from clearly hadn’t appreciated that, and I faced a situation of explaining the misunderstanding to the client. Lots of fun...&lt;/p&gt;  &lt;p&gt;Collation errors aren’t just a database setting that you can ignore. You need to remember that Americans speak a different type of English to Aussies and Poms, and techies speak a different language to their clients.&lt;/p&gt;</description></item></channel></rss>