I generally sleep well and deeply. My wife and I once went to sleep, back when we lived in the Heights neighborhood in Houston, and when we woke up the next day, the house across the street had been removed. We never heard a thing. However, tonight it’s 3 AM here in Seattle and I am wide awake writing to you about data types. Why? Because a software vendor is making me crazy with their database schema. This is sad and wrong on many levels, but there it is. It’s harder, I think, to be held responsible for nonsense someone else built, than to be responsible for something you have some control over – even if your own creation is equally flawed.
So, at work we have a big system from a major, TLA vendor that you have certainly heard of. I won’t say who, but if you want to guess, start by Googling Tougher than Leather and rhyme on with Run. Soon, this product will soon be amped up to a bigger, even more mission-critical role for our organization. I’m sure it’ll work fine, despite its … erm … suboptimal database design. I hope so, or I will have a rough year. I’m sure we can make it work.
Anyway, it’s an old song, but I’ll sing it again just to make myself feel better: don’t store dates as strings. Just stop it. Why not? How could it matter? I will give you five seven real, down-to-earth, pragmatic reasons why storing dates as text is a bad idea. These are not highfalutin, fancy-pants, Chris Date-inspired theoretical notions; this is just basic common sense. It’s 2010, for goodness sake.
Dates as Text Waste Storage
Here’s some simple arithmetic: a datetime value in SQL Server 2005 takes 8 bytes of storage. ‘2010-01-01 04:00:00’ takes 19 bytes. The newer date type in 2008 takes 3 bytes of storage. ‘2010-01-01’ takes 10 bytes to store. Multiply by 100 million. For SQL Server 2008, the storage required for the text form of a date is approximately 3.3 times larger than for a real date.
Dates as Text Waste Memory
All data manipulated by the server passes into the buffer cache, in RAM. The bigger the data, the more RAM you need. Simple, right?
Dates as Text Make my Servers More Expensive
So, if the data itself is bigger than necessary, and puts pressure on the server in terms of memory, then that pressure can be relieved by a fast IO subsystem. But a fast IO subsystem is expensive. I only like expensive when it’s either necessary or stylish. This scenario is neither necessary nor stylish. And the more you make me spend on servers, the farther I am from saving for that Maserati I have my eye on.
All things being equal, why not use the better/friendlier/smaller format? It’s FREE. No charge. Honest. But this is all trivial compared to the reliability issues that follow.
Dates as Text Invite a Whole Species of Preventable Bugs into your System
The whole idea of typed languages, and the reason data types caught on, is that strong typing helps you find bugs when designing, compiling and testing a system, so your customers don’t have to find them when they are running your system. Neat idea. So, if the values in a column in SQL Server are “supposed” to be dates, then make a rule that enforces that by typing the column. That way, when someone tries to put the words “N/A” or “Unknown” or “Fred Flintstone” into said column, they will be prevented from doing so, and everybody wins. It’s an idea that is at least 20 years old. At least. Basic defensive programming.
In addition, I would like you to consider the date ‘2009-02-30.’ Expansion of this subtopic is left as an exercise for the reader.
Dates as Text Create More Work for the Developer
If a developer has to write code against a table in SQL Server, and they face a column whose type is “possibly a date; perhaps not,” then he/she has to go to extra lengths to ensure that the app won’t be broken when (not if) that data has the wrong type. This introduces a whole requirement for a layer of exception handling like “cast all the values from the DB to dates, and if things explode, then do x, y, z.” That’s just a nuisance, in the first place. But what’s worse is how one would discover all the places in the system where such error handling is missing. How would you do that?
Dates as Text make Date Arithmetic Painful
When you have dates in a computer, there is clever code, already written, that can do things like add and subtract dates, or find ranges of dates. Etc. Text, not so much. Can dates in text form be stuffed into these functions anyway? Yes. Would you have to check that all the values cast to dates correctly? Yes. Annoying? Absolutely.
Dates as Text Cause Intractable Performance Problems Because of Sorting and Type Mismatch
Dates in a system like the one we use are not decoration – they are keys, they are vital search criteria, they are important. They might, for example, need to be in partition functions. And they need to be fast. Consider how your hapless DBA would index a column of date strings. Of course, the pain could be reduced some by, at minimum, using a string format that sorts, like ‘yyyy-mm-dd’. Just imagine how ‘yyyy-mmm-dd’ sorts. Or ‘m/d/yyyy’ (shudder).
Still, when a T-SQL statement comes in to the server, and the optimizer has to examine it, make a plan, and fetch data, there’s a whole minefield of issues with type casting. For example, if the app sends in a query that has a real date type parameter, and the server is faced with locating that value in an index containing text strings, is it reasonable to expect that to work? I don’t think it is, in the long term. Sure, it might be possible to stay on top of this issue and ensure that every single incoming query has the same date-string format as the stored data, but why set yourself up for such pain and suffering? The right way is so easy, and the wrong way is so hard.
It’s Tomorrow Somewhere
Anyway, I think the sun is up in New York now. I feel a little better, and at least I’ve passed some time. On the other hand, to me these issues are like a bad song played over and over again, which we never seem to be rid of. Why is that?
We’ll make this system work anyway. Afterward, I will try not to look at it too hard, or too often, because it makes my eyes hurt. But it’s all in a day’s work.