THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

String comparison: binary vs. dictionary

It is well known that you get better performance if you compare two strings in a binary collation than in a dictionary collation. But is the performance difference significant enough to warrant the use of an explicit COLLATE clause in the string comparison expression? That was a question came up in a conversation I had with a colleague recently.


To get a feel for the extent of the performance difference, I ran several tests comparing two commonly-used collations:


·      Latin1_General_BIN, and

·      SQL_Latin1_General_CP1_CI_AS


More specifically, I ran the following T-SQL script in a loop for 100 times:


DECLARE @s1 varchar(max), @s2 varchar(max), @i int

DECLARE @dummy int, @dt datetime

DECLARE @length int


SET @length = 1000000  -- or 100, or 1000000


SELECT @s1 = REPLICATE(CAST('a' as varchar(max)), @length)

SELECT @s2 = REPLICATE(CAST('a' as varchar(max)), @length)


SELECT @i = 1, @dt = GETDATE()

    WHILE @i < 1000


       IF @s1 = @s2 COLLATE SQL_Latin1_General_CP1_CI_AS

       --IF @s1 = @s2 COLLATE Latin1_General_BIN

            SET @dummy =0


       SET @i = @i + 1




The script was run in a number of scenarios:


·      The @length variable was set to 100, 10,000, and 1,000,000 to see the impact of the string length on the comparison method,

·      The string comparison was done in the IF clause with either the Latin1_General_BIN collation or the SQL_Latin1_General_CP1_CI_AS collation


The following table summarizes the test results. The value in each cell is the average elapsed time of the WHILE loop in the above script running 100 times for the corresponding test scenario. All values are in milliseconds.



String length (bytes)














The most salient point to note in the table is that as the string length increases, the performance difference between the two comparison methods increases. While there was no performance difference when the strings were 100 bytes long, the binary comparison was about twice as fast as the dictionary comparison when the string length was 10,000 bytes. When the string length increased to 1,000,000, the binary comparison was about seven times faster.


So, if you are comparing two short strings, you probably shouldn’t bother to explicitly cast the comparison into a binary collation. But when the strings can be very long, it can result in a huge performance improvement to explicitly specify a binary collation for string comparison.


Note that string comparison in COLLATE Latin1_General_BIN and string comparison in COLLATE SQL_Latin1_General_CP1_CI_AS have different semantics. But there are many cases where you have control over your data, and the difference in comparison semantics does not matter (e.g. when you know your data is all in lower case).


Nothing is particularly new here. Just want to contribute some data points to the community.


Published Saturday, January 31, 2009 11:46 PM by Linchi Shea
Filed under:


No Comments
New Comments to this post are disabled

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog


Privacy Statement