THE SQL Server Blog Spot on the Web


Aaron Bertrand

Aaron is a Product Evangelist for SentryOne, makers of Plan Explorer and a performance monitoring and event management platform for the Microsoft Data Platform and VMware. He has been blogging here at sqlblog.com since 2006, focusing on manageability, performance, and new features, and also blogs at blogs.sentryone.com and SQLPerformance.com; has been a Microsoft MVP since 1997; tweets as @AaronBertrand; and speaks frequently at major conferences, user group meetings, and SQL Saturday events worldwide.

Bad habits to kick : declaring VARCHAR without (length)

In my last post in this series, I talked about using meaningless table aliases.  This time I'm going to talk about a pet peeve of mine: declaring varchar / nvarchar variables or parameters without specifying how many characters they should hold.

Thankfully, I seldom see this issue in the code I've inherited in my own systems.  But I do see it quite a bit in blog entries, forum posts, and newsgroup questions.  Because the length attribute is optional, people seem to assume that defining a VARCHAR in T-SQL is like defining a string in C#: "it will just expand to whatever size you stuff into it."  The problem is, if you don't define a length, SQL Server will assign one for you, and it may not be as wide as you expect.  In some scenarios it will be 1 (yes, that is the number one, not a typo), and in others it will be 30.  Here is an example:

DECLARE @foo VARCHAR;
SET @foo = REPLICATE('a', 255);
SELECT LEN(@foo);
-- 1

SELECT LEN(CAST(REPLICATE('a', 255) AS VARCHAR));
-- 30
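For contrast, here is the same variable with an explicit length (a quick sketch using the same code as above):

```sql
DECLARE @foo VARCHAR(255);       -- explicit length
SET @foo = REPLICATE('a', 255);
SELECT LEN(@foo);
-- 255: the variable holds the full string
```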

I have asked for the default size of VARCHAR values with unspecified lengths to be consistent; Erland wants the "optional" part deprecated.  We've each pleaded our case on Connect:

#244395 : Deprecate (n)varchar without length specification

#267605 : Make size of VARCHAR (no length) consistent

Anyway, back to the original assumption (that any size will do); this assumption is not devastating when you are creating a table, because you will get an error message if you try to stuff more than 1 character.  An example:

CREATE TABLE dbo.foo(bar VARCHAR);
GO
INSERT dbo.foo SELECT REPLICATE('b'2);
GO
DROP TABLE dbo.foo

The column is actually created as a VARCHAR(1).  So the result of the INSERT statement:

.Net SqlClient Data Provider: Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.

The statement has been terminated.

It will only take one or two tries before you (or some of your users!) discover the problem; surely it will be corrected quickly.  However, it can very easily lead to data loss or corruption in other scenarios, and you might not notice for some time.  Imagine if you forget to specify the length of a parameter to a stored procedure; since the parameter is created as VARCHAR(1), any value longer than one character will be truncated silently, and portions of your strings will simply disappear!  A more elaborate example to illustrate this:

USE [tempdb];
GO

CREATE TABLE dbo.foo
(
bar VARCHAR(255)
);
GO

CREATE PROCEDURE dbo.foo_create
    @bar VARCHAR
AS
BEGIN
    SET NOCOUNT ON;

    INSERT dbo.foo(bar) VALUES(@bar);
END
GO

DECLARE @bar VARCHAR(255);

SET @bar = REPLICATE('x', 255);

EXEC dbo.foo_create @bar = @bar;

SELECT
    [variable] = LEN(@bar),
    [stored]   = LEN(bar)
FROM
    dbo.foo;
GO

DROP PROCEDURE dbo.foo_create;
DROP TABLE dbo.foo;
GO

Result:

variable   stored
--------   ------
255        1

This is bad news.  Unless your testing process includes checking the data in the table(s) directly after the procedure call, you might not figure this problem out for a while.
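The fix is trivial: give the parameter an explicit length that matches the column it feeds.  A corrected sketch of the procedure above (same names as the example):

```sql
CREATE PROCEDURE dbo.foo_create
    @bar VARCHAR(255)  -- explicit length, matching dbo.foo.bar
AS
BEGIN
    SET NOCOUNT ON;

    INSERT dbo.foo(bar) VALUES(@bar);
END
GO
```

With this version, the test above returns 255 for both [variable] and [stored].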

So, please don't be lazy: declare your string lengths, always!

Similar problems, though probably not as severe, can happen with DECIMAL and NUMERIC types.  George Mastros blogged about it this morning.

I am working on a series of "Bad habits to kick" articles, in an effort to motivate people to drop some of the things that I hate to see when I inherit code.  Up next: using dashes and spaces in entity names.

Published Friday, October 9, 2009 10:04 AM by AaronBertrand

Comments

 

jamiet said:

Aaron,

Not sure if this is the sort of thing you're looking for but one of my pet peeves is developers using a stored proc to return a dataset (i.e. it contains only a single SELECT statement) when a TVF would do the job just as well with the added bonus that you have some metadata in sys.columns.

Of course, this is rather subjective so I know many won't agree.

-Jamie

October 9, 2009 9:20 AM
 

George Mastros said:

There is a very similar problem with the Decimal/Numeric data type (defaulting to (18,0)).  Your blog inspired me to write this.

http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/decimal-and-numeric-problems-when-you-do

October 9, 2009 10:09 AM
 

Adam Machanic said:

I'm guilty of using the default VARCHAR in pretty much all of my code whenever I convert integers or dates to strings, e.g.:

SELECT CONVERT(VARCHAR, 1)

I never declare variables like that, though.

October 9, 2009 10:13 AM
 

AaronBertrand said:

Adam, in those cases I use RTRIM() instead of CONVERT().  Not that that's a better habit; just agreeing that I have a similarly bad habit.  :-)

October 9, 2009 10:17 AM
 

dd said:

It seems kind of ridiculous to me that a varchar has an automatic internal limit in case we do not specify one.  A varchar must mean you can store any string of any length in it.  This seems like a feature stuck in the last century when we had to count every byte and word to accommodate hardware limitations.

October 12, 2009 8:16 AM
 

AaronBertrand said:

dd, that's what VARCHAR(MAX) is for.  If I don't *need* the ability to store virtually unlimited text, why should SQL Server create a column automatically that is capable of doing so?

Personally, I believe that there shouldn't be a default limit at all, but I think you should be forced to specify one.

Just because hardware has improved does not give you an excuse to be wasteful.  I/O is still a very vital metric in SQL Server performance, and so page usage and fragmentation are very important factors to consider when designing tables.  Making every column a VARCHAR(MAX) is just inviting abuse.

October 12, 2009 9:03 AM
 


Madhivanan said:

Adam, note that converting to varchar without specifying the length results in a maximum of 30 characters

October 13, 2009 8:39 AM
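Madhivanan's point is easy to demonstrate (a small sketch; the 30-character default applies to CAST and CONVERT, as opposed to the 1-character default for variable declarations):

```sql
SELECT LEN(CONVERT(VARCHAR,      REPLICATE('x', 100)));  -- 30: default length
SELECT LEN(CONVERT(VARCHAR(100), REPLICATE('x', 100)));  -- 100: explicit length
```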
 

AaronBertrand said:

Madhivanan, this is certainly okay if he is converting dates and integers (unless you know of dates or integers that will be > 30 characters)?  He wasn't suggesting that what he's doing is a good idea, but compared to other ways this "feature" is abused, I think his exception is relatively safe.

October 13, 2009 9:19 AM
 

Joe Celko said:

The Standards say the default length is one.  End of discussion.

October 13, 2009 10:48 AM
 

AaronBertrand said:

Ha!  So the standard says x, therefore we can ignore all implementation details, even if they don't match the standard.  <shrug>  Can't win with you Celko.

October 13, 2009 10:55 AM
 

Ken Lee said:

I agree. Always specify the length for clarity. Also, if you really do want a 1-character field in a table, specify CHAR(1), not VARCHAR(1), because of the overhead. A zero-length field actually takes 3 more bytes to store than the fixed-length field. There is also a lookup performance hit, since any prior variable-length fields have to be parsed to find your VARCHAR(1) field value in the row.

PS I was unaware that it could be 30 characters long in some situations since I never investigated a practice I didn't use.

October 21, 2009 4:22 PM
 

Bogdan Calmac said:

Sometimes it is not reasonable to assume the size of function arguments. Let's say I want to write a function IS_SUBSTRING(s1,s2) which is a wrapper around CHARINDEX(s1,s2).

Now, since this is a general purpose function, I cannot make an educated guess about the size of the arguments and I would be forced to play safe and use the maximum size, varchar(8000).

Is there any performance penalty for doing that? Are those 8000 bytes ever allocated if the actual arguments are small?
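For reference, the wrapper Bogdan describes might be sketched like this (IS_SUBSTRING is his hypothetical name, not a built-in function; the VARCHAR(8000) sizes are the "play safe" choice he mentions):

```sql
CREATE FUNCTION dbo.IS_SUBSTRING
(
    @s1 VARCHAR(8000),  -- string to find
    @s2 VARCHAR(8000)   -- string to search
)
RETURNS BIT
AS
BEGIN
    -- thin wrapper around CHARINDEX, per the comment above
    RETURN CASE WHEN CHARINDEX(@s1, @s2) > 0 THEN 1 ELSE 0 END;
END
```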

Regarding my use case, I need the wrapper so I can write SQL which is portable across multiple DBMSs.

October 22, 2009 2:36 PM
 

AaronBertrand said:

Bogdan, obviously choosing an appropriate maximum size is not the same problem as leaving off the size entirely.  I don't believe you will see any performance penalty for doing so, and even if there is, you will probably experience a much greater overhead for using the function than anything having to do with the size of the data types you pass in.  I am sure there are cases where that size matters, but not in the simple case you describe.

October 22, 2009 2:42 PM
 

Bogdan Calmac said:

Hi Aaron,

While I understand why it makes sense to specify the size of DB columns (because you need a hint of how to organize data in the blocks), I think that requiring to specify the size of varchar function arguments is an unjustified and arbitrary restriction.

Now talking about performance, if those 8000 bytes are actually allocated on every invocation of the function and I do something like:

select * from A_BIG_TABLE where IS_SUBSTRING(TAG, ?)

(where the function is called tens of millions of times) then I'm not sure if we can say that the allocation overhead is negligible. My use case is just as described above and I'm a bit concerned. I'll probably need to run some benchmarks to make sure that the 8000 bytes don't affect performance.

October 22, 2009 3:17 PM
 

AaronBertrand said:

Bogdan, your use case looked like you were passing in two strings, not a column reference.  Obviously, ALWAYS test where you're not sure about performance impact when you could do things in a number of ways.

As for the unjustified and arbitrary restriction, I disagree wholeheartedly.  First of all, this is not *my* restriction.  Second of all, there are good reasons where the potential size of a string should be well-defined, and not just "any string of any size can go here."  If that is your need then perhaps consider testing with a MAX type instead of <= 8000.

October 22, 2009 3:24 PM
 

Bogdan Calmac said:

Hi Aaron,

Well, then why should my IS_SUBSTRING function be subjected to different standards than the built-in CHARINDEX function? Were the developers of CHARINDEX forced to decide on a size for the arguments? Is this all because arguments are passed by value?

Also, Oracle does not force you to specify the size of varchar function arguments. If they can do without it ...

October 22, 2009 4:19 PM
 

AaronBertrand said:

You think the CHARINDEX function is not restricted in any way?  The first expression is limited to 8000 characters (my guess is that the documentation should say bytes here, not chars).  The second expression is limited to the MAX type.  The function behaves differently depending on whether the second parameter is <= 8000 or > 8000, but your IS_SUBSTRING function could check the length and behave in a similar way as well.

As for what Oracle does that SQL Server doesn't, well, I don't know what to tell you.  I can't snap my fingers and magically change what SQL Server can do in response to some wonky Oracle feature.  For that you'll have much better luck at connect.microsoft.com/sql than telling me about it.

October 22, 2009 4:29 PM
 

Bogdan Calmac said:

Hi Aaron,

No, I'm not that unrealistic to expect SQL Server getting changed because of a chat :-)

But regarding your comment about "varchar without limit is a bad habit", I don't know what to say, it's not that obvious if you are talking about it in principle. Yes, given that Sql Server transparently truncates arguments to varchar(1), it's a recipe for disaster (I have spent about the whole morning debugging this).

But if you were my teacher in a DBMS class, you would have a hard time justifying why a function that takes a string argument must always declare the max size of the string. If it's by reference, who cares? Yes, the implementation of the function might decide to check the length and report some error but it should not be forced by the language to declare a size. Transparently truncating to a string of size 1 is probably the worst option. How can one figure out the problem before seeing your blog post? :-)

October 22, 2009 5:04 PM
 

AaronBertrand said:

Still not sure I understand the problem.  If you don't want to specify a max size, use a MAX type.  Then you can send in 1 byte or 2 GB, so who cares?

Keep in mind that not everyone working with databases understands (or needs to understand) the differences between by reference and by value.  I'm not teaching a DBMS class, or reveling in theory; I'm trying to show some bad habits around specific use cases.

The point of my post was that in SQL Server, because when you don't specify you can get different and potentially tragic behavior in some scenarios, it is a bad habit.  I was not intending to get into a philosophical discussion about whether or not, in general, defining the bounds of a string is a good idea.  I think it is; you don't.  We'll have to agree to disagree, and sit by and watch whether Microsoft jumps on your suggestion and changes the way the platform currently works.

October 22, 2009 5:13 PM
 

Bogdan Calmac said:

Yes, I'm happy to disagree.

But I hope there is no debate that silently truncating arguments defined as "varchar" to "varchar(1)" is an unfortunate and error-prone design choice. Disallowing varchar without a size or reporting a truncation error would have been a much wiser approach.

Do you consider this quirk something that everybody who works with SQL Server should know? Is it intuitive?

October 22, 2009 5:39 PM
 

AaronBertrand said:

Yes, it was a very unfortunate and error-prone design choice.

We have asked for the elimination of the lazy syntax, or at least making the silent truncation consistent (see the two connect items in the original post).  And truncation errors abound in other areas (such as when you make your parameter VARCHAR(255) and try to insert > 32 characters into a VARCHAR(32) column - why can't we choose to get a similar error when passing to a parameter or variable?).
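The inconsistency is easy to show side by side (a sketch; error 8152 is the "String or binary data would be truncated" message shown in the original post):

```sql
CREATE TABLE #t(c VARCHAR(5));
INSERT #t SELECT REPLICATE('x', 10);  -- fails with error 8152

DECLARE @v VARCHAR(5);
SET @v = REPLICATE('x', 10);          -- succeeds, silently truncated
SELECT @v;                            -- 'xxxxx'
DROP TABLE #t;
```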

No, I do not think this behavior is intuitive, and yes, I think everyone working with SQL Server should be aware of it.  I have seen countless people hitting the truncation problem and not understanding why.  Part of the reason I put this post together.

October 22, 2009 5:45 PM
 

Bogdan Calmac said:

Hi Aaron,

Actually I forgot to thank you for the post, so I'm doing it now at the end. It is because of your post that I was able to troubleshoot my problem.

Thanks,

Bogdan.

October 22, 2009 6:10 PM
 

Emtucifor said:

>> Joe Celko said: The Standards say the default length is one.  End of discussion.

God said: The Standards say pride is a sin. End of discussion.

October 23, 2009 8:44 PM
 

Niall Baird said:

Thanks Aaron - I've just come across this issue after nearly 12 years of working with SQL Server (from 6.5->2005).   I have a table (ref_Parameters) with one column declared as varchar(max).   I (following the naming standards where I currently work) named the columns [Object], [Parameter], [ParameterValue], and when I tried to bring the value (ParameterValue) into an SSIS variable, found that I had to convert "ParameterValue" to a varchar (even though it's a varchar in the database).

My sql was "SELECT ParameterValue FROM dbo.ref_Parameters WHERE [Object] = 'Something' AND [Parameter] = 'Path'"

Had to change that to "SELECT CONVERT(varchar,ParameterValue) AS ParameterValue FROM dbo.ref_Parameters WHERE [Object] = 'Something' AND [Parameter] = 'Path'" to at least get it putting something into my SSIS variable.   But - the length of the value was 80 characters, and it was being truncated to 30 characters.

Eventually, after asking around at work, someone mentioned that SQL Server truncates on Convert(varchar) if you don't put a length in - problem solved.  What really interests me, is why I have never come across this in 13 years of coding as a database developer?

Is this 'collation' dependent?

September 22, 2010 8:49 PM
 

Aaron Bertrand said:

In my last post in this series, I talked about the assumptions many people make about IDENTITY columns.

April 25, 2012 8:45 AM
 

TxCharlie said:

jamiet,

When Table Variables first came out like a decade ago, I seem to remember a limitation that quickly blew my TVF's out of the water. That's one reason why old-timers like me use Stored Procs that return recordsets instead of Table Valued Functions.

I THINK it was a varchar size limit of 256 and it must have been SQL 2000, but I just tried to Google it, and couldn't find any mention of the limitation.

On Aaron's pet peeve, I've been guilty of using CONVERT() to VARCHAR (no size), and it seemed to work for long strings, but maybe I was just lucky. If that's a problem, I guess I have some code out there that's truncating data!

Oh well, I'm laid off now, so that's someone else's problem, like maybe one of you guys!!! :-)

I remember at the time that the limit wasn't well-documented.

Also, sometimes you need to return four or five recordsets with transaction consistency, for the consumption of a C# function, which is just easier and cleaner with Stored Procs - Eventually you have to do a SELECT on the table variables, anyway.

May 18, 2013 6:13 AM