Someone named “Krygim” posted the following question today in the microsoft.public.sqlserver.programming newsgroup:
Before reading any further, I ask that you think for a moment and consider what you think the answer should be…
If you answered “yes”, you’re correct. An empty TEXT column will take up more space than a NULL one — quite a bit more space, as it turns out. Here’s the test code I posted to prove it:
USE tempdb CREATE TABLE atable(acol TEXT NULL) INSERT atable(acol) SELECT NULL FROM pubs..authors a, pubs..authors b, pubs..authors c, pubs..authors d EXEC sp_spaceused 'atable' CREATE TABLE btable(acol TEXT NOT NULL) INSERT btable(acol) SELECT '' FROM pubs..authors a, pubs..authors b, pubs..authors c, pubs..authors d EXEC sp_spaceused 'btable' DROP TABLE atable DROP TABLE btable
Running this on my end tells me that the NULLs took up only 3 MB, whereas the empty strings took up 33 MB — 11 times more space used!
I did a quick scan of Inside Microsoft SQL Server 2000 and discovered that for TEXT and IMAGE datatypes:
This, in addition to the 16-byte pointer to the off-row location of the data, explains the huge difference in size between these two tables.
At this point, you’re probably either wondering how to fix this or you’ve drifted away and are half-reading, half-thinking about what’s for dinner. If you’re in the latter category and not also in the former category, you either know the answer already or are not enough of a certified geek to be reading my blog. And if you’re in the former category and not in the latter category, your priorities are certainly skewed, as we all know that food is more important than any stupid DBMS space wasted due to empty strings problem.
But for those still following me, the answer is one of my favorite about-to-be-deprecated features of SQL Server, the mighty text-in-row option… Re-run the second part of the previous example, with a new line inserted:
USE tempdb CREATE TABLE btable(acol TEXT NOT NULL) EXEC sp_tableoption 'btable', 'text in row', '24' -- HINT: This is the new line INSERT btable(acol) SELECT '' FROM pubs..authors a, pubs..authors b, pubs..authors c, pubs..authors d EXEC sp_spaceused 'btable' DROP TABLE btable
There it is. Back down to 3 MB, even with the non-empty strings in the TEXT column. By storing the small data in-row, we’ve eliminated the 16-byte off-row pointer and the 84-byte root structure, and other assorted bytes that are used by the TEXT datatype that I don’t know about (the math didn’t quite add up when I tried to calculate where all of the space went).
So what does this tell us? I’m thinking that as a best practice, perhaps the text-in-row option should be used for every table with LOB columns, and that it should be set to around 100 bytes. This will still keep row sizes down when larger data is inserted (as it will go off-row), but it will also keep overall IO way down if the amount of rows with LOB data larger than 100 bytes is fairly sparse.
Comments?