THE SQL Server Blog Spot on the Web


John Paul Cook

Shakespeare and storing Unicode characters

This post is about the political issues involved in using multiple languages in a global organization and how to troubleshoot the technical details. The CHAR and VARCHAR data types are NOT suitable for global data. Some people still cling to CHAR and VARCHAR, justifying their use by truthfully pointing out that they take up only half the space of the NCHAR and NVARCHAR data types. But you’ll never be able to store Chinese, Korean, Greek, Japanese, Arabic, or many other languages unless you either use NCHAR and NVARCHAR, which are double byte Unicode data types, or use specific collations with CHAR and VARCHAR. Collations are something I will address in a future post. Using CHAR and VARCHAR with collations does not save space and is trickier, which is why this post focuses on Unicode.
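The space claim is easy to verify with DATALENGTH, which returns storage size in bytes. Here is a quick sketch you can run in any query window:

-- NVARCHAR stores each character as two bytes (UTF-16); VARCHAR stores one byte per character here.
select DATALENGTH('hello')  as [varchar bytes],   -- 5
       DATALENGTH(N'hello') as [nvarchar bytes];  -- 10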

In troubleshooting a problem displaying Chinese characters, I needed a test character to experiment with. I picked 坡 (decimal 22369, hexadecimal 5761) as my test character. Why? Because it looked Chinese to me! Now doesn’t that remind you of Act 1, Scene 2 of William Shakespeare’s Julius Caesar, where Casca says to Cassius “but, for mine own part, it was Greek to me”? It really looked Chinese to me, but then, what do I know? That presented a problem: what if it meant something really offensive? After consulting a Chinese immigrant, I was relieved to learn that my randomly chosen character was politically correct. And just to be extra safe, I double checked with fellow SQL Server MVP Greg Low. Thanks, Greg.
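If you want to verify the code point yourself, the built-in NCHAR and UNICODE functions convert between a code point and its character. A quick check:

-- NCHAR turns an integer code point into its character; UNICODE does the reverse.
select NCHAR(22369)   as [from decimal],  -- 坡
       NCHAR(0x5761)  as [from hex],      -- same character
       UNICODE(N'坡') as [code point];    -- 22369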

I wrote the following test code to show you what happens when Unicode (which is always double byte) is implicitly converted to single byte CHAR or VARCHAR.

create table #c (sqlString NVARCHAR(30), sqlChar NCHAR(1));

insert into #c values (N'insert into #c values ( ''坡'')', '坡'); -- invalid single byte syntax

insert into #c values (N'insert into #c values (N''坡'')', N'坡'); -- valid double byte Unicode syntax

select sqlString
, sqlChar
, UNICODE(sqlChar) as N'UNICODE(sqlChar)'
from #c;

drop table #c;

sqlString                      sqlChar UNICODE(sqlChar)
------------------------------ ------- ----------------
insert into #c values ( '坡')  ?       63
insert into #c values (N'坡')  坡      22369

Here is another test you can run:

select '坡', N'坡';

The point is that if you start with a Unicode character such as 坡 but insert it as '坡', it is no longer a double byte character: you’ve crammed it into a single byte and corrupted it. You must insert it as N'坡' to prevent corruption, and you must have a double byte destination for it. By the way, the N stands for National Language. You may also see references to National Language Setting(s) or even National Character Set (NCS).
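You can see the corruption at the byte level with DATALENGTH. This sketch assumes a server whose default collation lacks a Chinese code page, so the single byte literal is replaced by a question mark before it is ever stored:

select '坡'              as [single byte],    -- ?
       DATALENGTH('坡')  as [bytes],          -- 1
       N'坡'             as [double byte],    -- 坡
       DATALENGTH(N'坡') as [unicode bytes];  -- 2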

The Character Map utility in Windows is helpful when working with Unicode characters.


Figure 1. Character Map utility in Windows.

Personally I prefer the richer functionality of BabelMap, which is available as a free download, online application, or portable application.


Figure 2. BabelMap free Unicode utility.

Published Tuesday, June 25, 2013 10:21 PM by John Paul Cook



About John Paul Cook

John Paul Cook is a database and Azure specialist in Houston. He previously worked as a Data Platform Solution Architect in Microsoft's Houston office. Prior to joining Microsoft, he was a SQL Server MVP. He is experienced in SQL Server and Oracle database application design, development, and implementation. He has spoken at many conferences, including Microsoft TechEd and the SQL PASS Summit. He has worked in the oil and gas, financial, manufacturing, and healthcare industries. John is also a Registered Nurse currently studying to be a psychiatric nurse practitioner. He is a contributing author to SQL Server MVP Deep Dives and SQL Server MVP Deep Dives Volume 2. Connect with him on LinkedIn.
