THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Merrill Aldrich

Dear ISV: You’re Keeping Me Awake Nights with Your VARCHAR() Dates

I generally sleep well and deeply. My wife and I once went to sleep, back when we lived in the Heights neighborhood in Houston, and when we woke up the next day, the house across the street had been removed. We never heard a thing. However, tonight it’s 3 AM here in Seattle and I am wide awake writing to you about data types. Why? Because a software vendor is making me crazy with their database schema. This is sad and wrong on many levels, but there it is. It’s harder, I think, to be held responsible for nonsense someone else built, than to be responsible for something you have some control over – even if your own creation is equally flawed.

So, at work we have a big system from a major, TLA vendor that you have certainly heard of. I won’t say who, but if you want to guess, start by Googling Tougher than Leather and rhyme on with Run. Soon, this product will soon be amped up to a bigger, even more mission-critical role for our organization. I’m sure it’ll work fine, despite its … erm … suboptimal database design. I hope so, or I will have a rough year. I’m sure we can make it work.

Anyway, it’s an old song, but I’ll sing it again just to make myself feel better: don’t store dates as strings. Just stop it. Why not? How could it matter? I will give you five seven real, down-to-earth, pragmatic reasons why storing dates as text is a bad idea. These are not highfalutin, fancy-pants, Chris Date-inspired theoretical notions; this is just basic common sense. It’s 2010, for goodness sake.

Dates as Text Waste Storage

Here’s some simple arithmetic: a datetime value in SQL Server 2005 takes 8 bytes of storage. ‘2010-01-01 04:00:00’ takes 19 bytes. The newer date type in 2008 takes 3 bytes of storage. ‘2010-01-01’ takes 10 bytes to store. Multiply by 100 million. For SQL Server 2008, the storage required for the text form of a date is approximately 3.3 times larger than for a real date.

Dates as Text Waste Memory

All data manipulated by the server passes into the buffer cache, in RAM. The bigger the data, the more RAM you need. Simple, right?

Dates as Text Make my Servers More Expensive

So, if the data itself is bigger than necessary, and puts pressure on the server in terms of memory, then that pressure can be relieved by a fast IO subsystem. But a fast IO subsystem is expensive. I only like expensive when it’s either necessary or stylish. This scenario is neither necessary nor stylish. And the more you make me spend on servers, the farther I am from saving for that Maserati I have my eye on.

All things being equal, why not use the better/friendlier/smaller format? It’s FREE. No charge. Honest. But this is all trivial compared to the reliability issues that follow.

Dates as Text Invite a Whole Species of Preventable Bugs into your System

The whole idea of typed languages, and the reason data types caught on, is that strong typing helps you find bugs when designing, compiling and testing a system, so your customers don’t have to find them when they are running your system. Neat idea. So, if the values in a column in SQL Server are “supposed” to be dates, then make a rule that enforces that by typing the column. That way, when someone tries to put the words “N/A” or “Unknown” or “Fred Flintstone” into said column, they will be prevented from doing so, and everybody wins. It’s an idea that is at least 20 years old. At least. Basic defensive programming.

In addition, I would like you to consider the date ‘2009-02-30.’ Expansion of this subtopic is left as an exercise for the reader.

Dates as Text Create More Work for the Developer

If a developer has to write code against a table in SQL Server, and they face a column whose type is “possibly a date; perhaps not,” then he/she has to go to extra lengths to ensure that the app won’t be broken when (not if) that data has the wrong type. This introduces a whole requirement for a layer of exception handling like “cast all the values from the DB to dates, and if things explode, then do x, y, z.” That’s just a nuisance, in the first place. But what’s worse is how one would discover all the places in the system where such error handling is missing. How would you do that?

Dates as Text make Date Arithmetic Painful

When you have dates in a computer, there is clever code, already written, that can do things like add and subtract dates, or find ranges of dates. Etc. Text, not so much. Can dates in text form be stuffed into these functions anyway? Yes. Would you have to check that all the values cast to dates correctly? Yes. Annoying? Absolutely.

Dates as Text Cause Intractable Performance Problems Because of Sorting and Type Mismatch

Dates in a system like the one we use are not decoration – they are keys, they are vital search criteria, they are important. They might, for example, need to be in partition functions. And they need to be fast. Consider how your hapless DBA would index a column of date strings. Of course, the pain could be reduced some by, at minimum, using a string format that sorts, like ‘yyyy-mm-dd’. Just imagine how ‘yyyy-mmm-dd’ sorts. Or ‘m/d/yyyy’ (shudder).

Still, when a T-SQL statement comes in to the server, and the optimizer has to examine it, make a plan, and fetch data, there’s a whole minefield of issues with type casting. For example, if the app sends in a query that has a real date type parameter, and the server is faced with locating that value in an index containing text strings, is it reasonable to expect that to work? I don’t think it is, in the long term. Sure, it might be possible to stay on top of this issue and ensure that every single incoming query has the same date-string format as the stored data, but why set yourself up for such pain and suffering? The right way is so easy, and the wrong way is so hard.

It’s Tomorrow Somewhere

Anyway, I think the sun is up in New York now. I feel a little better, and at least I’ve passed some time. On the other hand, to me these issues are like a bad song played over and over again, which we never seem to be rid of. Why is that?

We’ll make this system work anyway. Afterward, I will try not to look at it too hard, or too often, because it makes my eyes hurt. But it’s all in a day’s work.

Published Thursday, March 11, 2010 6:07 AM by merrillaldrich

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

jamiet said:

Hey Merrill,

If you're going to have "Dates as Text Waste Memory" & "Dates as Text Waste Storage" then surely you can have "Dates as Text Waste IO" too?

-Jamie

March 11, 2010 8:47 AM
 

jamiet said:

Oh hang on, I guess you did, you just happened to call it "Dates as Text Make my Servers More Expensive" :)

March 11, 2010 8:48 AM
 

merrillaldrich said:

Exactly :-)

March 11, 2010 9:05 AM
 

James Luetkehoelter said:

Awesome!

March 11, 2010 9:24 AM
 

Emtucifor said:

Dates as text seem to me to expose an underlying and gigantic conceptual failure on the part of the people who designed the database. You can be sure the same people don't *really* understand pointers, or unicode, or indexes, or how an signed negative number can be equal in memory to a signed positive number. That's because all of those require understanding that "the storage isn't the value" or maybe better said, "the human-presented representation of the value isn't the computer's native encoding of the value."

March 11, 2010 1:49 PM
 

merrillaldrich said:

Aha - you just hit Reason 8, maybe the cause of my sleepless night:

Dates as Text Make People Wonder What the Rest of Your Code Looks Like

One obvious blunder can erode the confidence of a dev or dba in the whole system...

March 11, 2010 3:17 PM
 

TheCosmicTrickster said:

I had this situation recently.  A vendor hired a programmer to create a program & database for a telephony application.  I was then given the task of reporting from this database.  Very reliant on date/time information - and the developer had created the schema using varchar(50) for almost everything, including datetime fields.  Ignoring what both the vendor & I had agreed on for the datatype definitions for the various tables.  Suffice to say, the programmer wasn't there long.  Still left me with all manner of nested CONVERT & CAST statements to do a simple date calculation or comparison.  :o(

March 11, 2010 6:46 PM
 

ALZDBA said:

And time and again, (N)(var)char date/datetime colmuns are just prone to user CLIENT SETTINGS !!

So on top of your arguments of using the correct datatype for the data, just add this buggy factor.

Since the data will not be stored in a "nutral culture", it will be prone to "translation" issues.

March 12, 2010 2:55 AM
 

Emtucifor said:

That was supposed to be "an unsigned positive number"... sorry.

March 12, 2010 1:10 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement