THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web

Michael Coles: Sergeant SQL

SQL Server development, news and information from the front lines

ICANN Approves International (國際) Characters for Domain Names

Many people have complained about the space tradeoff involved in storing international characters in the database (using nvarchar instead of varchar data types, for instance).  A lot of folks have decided to forgo the ability to store international character sets, with justifications like "we don't do business internationally."  The argument tends to be along the lines of "we'll change the database in the future... if we have to."
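To put rough numbers on that space tradeoff, here is a minimal Python sketch.  It assumes a varchar column on a Latin code page (modeled here as cp1252, one byte per character) and an nvarchar column stored as UTF-16 (two bytes per character for text like this).  The helper names are made up for illustration, and the counts ignore per-row overhead and any compression, so treat them as a back-of-the-envelope estimate rather than what SQL Server reports.

```python
# Back-of-the-envelope model of character storage size.
# Assumption: varchar uses a single-byte Latin code page (cp1252 here);
# nvarchar uses UTF-16, two bytes per character for these strings.
# varchar_bytes and nvarchar_bytes are hypothetical helper names.

def varchar_bytes(text: str, code_page: str = "cp1252") -> int:
    """Bytes needed to hold text in the assumed varchar code page."""
    return len(text.encode(code_page))


def nvarchar_bytes(text: str) -> int:
    """Bytes needed to hold text as UTF-16 (no byte order mark)."""
    return len(text.encode("utf-16-le"))


english = "international"
chinese = "國際"  # the word used in the title of this post

print(english, "as varchar: ", varchar_bytes(english), "bytes")
print(english, "as nvarchar:", nvarchar_bytes(english), "bytes")
print(chinese, "as nvarchar:", nvarchar_bytes(chinese), "bytes")
```

Under these assumptions the English word doubles in size when moved to nvarchar, while the Chinese word needs only a handful of bytes and cannot be stored in the Latin code page at all.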

Well, if you're waiting for the future, it appears that ICANN has started the process of bringing the future to us.  The organization that controls Internet name and number assignments has announced that it is going to start approving international non-Latin characters for domain names (http://www.pcworld.com/article/181138/icann_oks_international_domains_the_pros_and_cons.html).  This will affect Web-based data stored in databases, such as Web URLs and email addresses.  Even if you don't currently do business internationally, some of your Web resources and contacts might be implementing international names soon.  In the meantime, you can start future-proofing your databases by using the nvarchar data type where appropriate, including for Web addresses, email addresses, and pointers to other Web-based resources.
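Here is a small Python sketch of what can go wrong if an internationalized address lands in a varchar column.  It is only a simulation: the email address is invented, the function names are hypothetical, and the varchar column is again assumed to use a Latin code page (cp1252).  SQL Server generally substitutes a question mark for a character that has no equivalent in the column's code page, and Python's errors="replace" option is used here to approximate that behavior.

```python
# Simulation only: approximates storing text in varchar vs. nvarchar.
# Assumption: the varchar column uses the cp1252 code page.
# store_as_varchar and store_as_nvarchar are hypothetical helper names.

def store_as_varchar(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page.

    Characters the code page cannot represent come back as "?".
    """
    return text.encode(code_page, errors="replace").decode(code_page)


def store_as_nvarchar(text: str) -> str:
    """Round-trip text through UTF-16, which preserves it unchanged."""
    return text.encode("utf-16-le").decode("utf-16-le")


address = "info@國際.example"  # made-up address with a non-Latin domain

print("varchar: ", store_as_varchar(address))
print("nvarchar:", store_as_nvarchar(address))
```

The point of the sketch is that the loss is silent: the varchar round trip still produces a string, just not the one you were given, so the damage may not show up until someone tries to use the address.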

Published Sunday, November 01, 2009 8:05 PM by Mike C

Comments

 

Greg Linwood said:

International characters don't always take more space.

Take the word "understand" as an example. In Chinese it only takes two characters, 明白 ("Ming Bai"), or 4 bytes in Unicode format, whereas the English version takes 10 bytes when stored in ANSI format (20 bytes in Unicode).

My point is that not everything doubles purely due to use of international characters. In fact, allowing Asian languages to store URLs in their native format will actually take up LESS space if your data is predominantly in an Asian language.

November 2, 2009 9:22 AM
 

Mike C said:

Exactly, just like the word international [traditional Chinese] I used in the title of this post (國際).  As you pointed out, storing the English word "understand" takes twice as much space in Unicode as it does in ANSI.  Presumably this is where the space argument comes from.

November 2, 2009 5:57 PM
 

Greg Linwood said:

Another thing I love about the Chinese word 明白 is that the two characters individually mean "Bright" & "Clear", which gives a far clearer indication of the meaning "understand" than the English equivalent. What exactly do "under" & "stand" have to do with understanding? (c:

November 3, 2009 3:39 AM