THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Unicode, VARCHAR, NVARCHAR and index usage in SQL Server

This is not the first time I found non-intuitive issues with collation, Unicode and 8-bit strings with SQL Server, but this time I discovered (more precisely: one of my customers discovered and asked me...) a strange behavior that affects the query plan and the overall query optimization strategy.

This was my original post to Microsoft Connect:

A SQL query with a WHERE condition does not use the existing index if the comparison is made against a Unicode constant string. It happens when the field has a collation setting different than the current one (i.e. field has SQL_Latin1_General_CP1_CI_AS where server default is now Latin1_General_CI_AS).
Query plan use Clustered Index Scan instead of Clustered Index Seek operation.
Performance become very slow when the query is made many times into a loop (not a good practice, but the program can't be changed).
Using ANSI string (8 bit) the problem disappear and query plan shows the use of a Clustered Index Seek Operation

The answer is detailed:

This is not about the collation but about the difference between 8-bit and 16-bit character set.
In fact if you use the default collation (remove "COLLATE SQL_Latin1_General_CP1_CI_AS" from the column definition) then the plans do not change.
There is no generalized valid way to down-convert a 16-bit string to an 8-bit string but an 8-bit string is required to perform a seek because that is the type of the index.
The scan plan works because the 8-bit strings in the table are upconveted to 16-bit and compared to the 16-bit constant.
If you use nvarchar in the table then both forms of the predicate will support index seek (since the 8-bit constant in the predicate can be up-converted to 16-bit).
Thanks,
SQL Server dev.

I well understand all of this, and apparently the issue doesn't affect many of us. However, think about it: when you build a Data Warehouse, for a lot of reasons you could have to manage a VARCHAR field instead of a NVARCHAR one. If they are on the staging database, you have to be careful in comparing them with constants (it appears to be simple). If you decide to build a dimension using VARCHAR types instead of NVARCHAR, the user who access the database could use a query tool which use Unicode constants for user input. Hum... very dangerous, angry users could blame on you for this.

Now, guess what? SQL Server Reporting Services 2005 build parametric queries using sp_executesql statement, putting all constants with a N (for Unicode) prefix. Can you hear the user screaming now?

If I wasn't clear, you can read the feedback here: a repro script is included.

Now, while I can find a workaround here (probably a stored procedure can convert the Unicode constant into the corresponding 8-bit version), I am asking if this only issue should encourage you to move dimension fields to NVARCHAR type. I know all implications and I know that dimensions size

Published Saturday, October 21, 2006 12:25 AM by Marco Russo (SQLBI)

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Kirill said:

Hey Marco,

Here's my question. I'm using 2005 SQL Express Database server. Last night I converted all of my tables from unicode to regular 8-bit. When I'm trying to do a simple loop on SELECT query the result takes forever, and eventually outputs whatever it can with a timeout message.

I've rebuild indexes for the tables, and not sure anymore what else I can do to bring it back to it's original performance. Basically the difference is about 100 times. Which is rediculous.

Thanks in advance!

August 25, 2009 4:57 PM
 

Marco Russo said:

I would take a look at the execution plan - this seems to be an issue larger than simlpe lack of use of an index.

August 25, 2009 5:02 PM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement