Originally posted here.
Someone named "Krygim" posted the following question today in the microsoft.public.sqlserver.programming newsgroup:
"Will defaulting a TEXT field to an empty string take up more space
than defaulting it to Null when no data is entered into the field.
[sic]"
Before reading any further, I ask that you think for a moment and consider what you think the answer should be...
If you answered "yes", you're correct. An empty TEXT column will
take up more space than a NULL one -- quite a bit more space, as it
turns out. Here's the test code I posted to prove it:
USE tempdb
CREATE TABLE atable(acol TEXT NULL)
INSERT atable(acol)
SELECT NULL
FROM pubs..authors a,
pubs..authors b,
pubs..authors c,
pubs..authors d
EXEC sp_spaceused 'atable'
CREATE TABLE btable(acol TEXT NOT NULL)
INSERT btable(acol)
SELECT ''
FROM pubs..authors a,
pubs..authors b,
pubs..authors c,
pubs..authors d
EXEC sp_spaceused 'btable'
DROP TABLE atable
DROP TABLE btable
Running this on my end tells me that the NULLs took up only 3 MB,
whereas the empty strings took up 33 MB -- 11 times more space used!
I did a quick scan of Inside Microsoft SQL Server 2000 and discovered that for TEXT and IMAGE datatypes:
"If the amount of data is less than 32 KB, the text pointer in the data row points to an 84-byte text root structure."
This, in addition to the 16-byte pointer to the off-row location of
the data, explains the huge difference in size between these two
tables.
At this point, you're probably either wondering how to fix this
or you've drifted away and are half-reading, half-thinking about what's
for dinner. If you're in the latter category and not also in the former
category, you either know the answer already or are not enough of a
certified geek to be reading my blog. And if you're in the former
category and not in the latter category, your priorities are certainly
skewed, as we all know that food is more important than any stupid DBMS
space wasted due to empty strings problem.
But for those still following me, the answer is one of my
favorite about-to-be-deprecated features of SQL Server, the mighty
text-in-row option... Re-run the second part of the previous example,
with a new line inserted:
USE tempdb
CREATE TABLE btable(acol TEXT NOT NULL)
EXEC sp_tableoption 'btable', 'text in row', '24'
-- HINT: This is the new line
INSERT btable(acol)
SELECT ''
FROM pubs..authors a,
pubs..authors b,
pubs..authors c,
pubs..authors d
EXEC sp_spaceused 'btable'
DROP TABLE btable
There it is. Back down to 3 MB, even with the non-empty strings in
the TEXT column. By storing the small data in-row, we've eliminated the
16-byte off-row pointer and the 84-byte root structure, and other
assorted bytes that are used by the TEXT datatype that I don't know
about (the math didn't quite add up when I tried to calculate where all
of the space went).
So what does this tell us? I'm thinking that as a best
practice, perhaps the text-in-row option should be used for every table
with LOB columns, and that it should be set to around 100 bytes. This
will still keep row sizes down when larger data is inserted (as it will
go off-row), but it will also keep overall IO way down if the amount of
rows with LOB data larger than 100 bytes is fairly sparse.
Comments?