THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Caveats of the TEXT datatype

Someone named "Krygim" posted the following question today in the microsoft.public.sqlserver.programming newsgroup:

 

"Will defaulting a TEXT field to an empty string take up more space than defaulting it to Null when no data is entered into the field. [sic]"

Before reading any further, I ask that you think for a moment and consider what you think the answer should be...

If you answered "yes", you're correct. An empty TEXT column will take up more space than a NULL one -- quite a bit more space, as it turns out. Here's the test code I posted to prove it:

 

USE tempdb

CREATE TABLE atable(acol TEXT NULL)

INSERT atable(acol)
SELECT NULL
FROM pubs..authors a,
pubs..authors b,
pubs..authors c,
pubs..authors d

EXEC sp_spaceused 'atable'

CREATE TABLE btable(acol TEXT NOT NULL)

INSERT btable(acol)
SELECT ''
FROM pubs..authors a,
pubs..authors b,
pubs..authors c,
pubs..authors d

EXEC sp_spaceused 'btable'

DROP TABLE atable
DROP TABLE btable

Running this on my end tells me that the NULLs took up only 3 MB, whereas the empty strings took up 33 MB -- 11 times more space used!

I did a quick scan of Inside Microsoft SQL Server 2000 and discovered that for TEXT and IMAGE datatypes:

 

"If the amount of data is less than 32 KB, the text pointer in the data row points to an 84-byte text root structure."

This, in addition to the 16-byte pointer to the off-row location of the data, explains the huge difference in size between these two tables.

At this point, you're probably either wondering how to fix this or you've drifted away and are half-reading, half-thinking about what's for dinner. If you're in the latter category and not also in the former category, you either know the answer already or are not enough of a certified geek to be reading my blog. And if you're in the former category and not in the latter category, your priorities are certainly skewed, as we all know that food is more important than any stupid DBMS space wasted due to empty strings problem.

But for those still following me, the answer is one of my favorite about-to-be-deprecated features of SQL Server, the mighty text-in-row option... Re-run the second part of the previous example, with a new line inserted:

 

USE tempdb

CREATE TABLE btable(acol TEXT NOT NULL)

EXEC sp_tableoption 'btable', 'text in row', '24'
-- HINT: This is the new line

INSERT btable(acol)
SELECT ''
FROM pubs..authors a,
pubs..authors b,
pubs..authors c,
pubs..authors d

EXEC sp_spaceused 'btable'

DROP TABLE btable

There it is. Back down to 3 MB, even with the non-empty strings in the TEXT column. By storing the small data in-row, we've eliminated the 16-byte off-row pointer and the 84-byte root structure, and other assorted bytes that are used by the TEXT datatype that I don't know about (the math didn't quite add up when I tried to calculate where all of the space went).

So what does this tell us? I'm thinking that as a best practice, perhaps the text-in-row option should be used for every table with LOB columns, and that it should be set to around 100 bytes. This will still keep row sizes down when larger data is inserted (as it will go off-row), but it will also keep overall IO way down if the amount of rows with LOB data larger than 100 bytes is fairly sparse.

Comments?

Published Wednesday, July 12, 2006 10:02 PM by Adam Machanic
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

No Comments

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement