THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Louis Davidson

Commenting your code

As I am easing back into real life from writing the book, I am in search of easy targets for blogging.  My boss mentioned this blog over on Jeff Atwood's Coding Horror Blog and it got me thinking about commenting.  His advice is to only comment "why" the code works.  I can't quite agree, because the code he claims to be acceptable is:

private double SquareRootApproximation(n) {
  r = n / 2;
  while ( abs( r - (n/r) ) > t ) {
    r = 0.5 * ( r + (n/r) );
  }
  return r;
}
System.out.println( "r = " + SquareRootApproximation(r) );

I mean, it is better than some code I have seen,  but still, I would like a bit more information about why this works.  Maybe the name of the algorithm used, or at least what to do if this fails to provide the expected results.  Admittedly this is probably something that could be easily found, but most algorithms are not.  Comments in my mind should at least lead you to understand the mindset of the programmer.  What would actually improve this code in my mind is to change the variables to full words (though in this case it might not make sense to do this.)

On an extremely different side of things is this article from "Edgewood Solutions Engineers" on mssqltips.com. Their answer is to explain what the code is doing in simple terms, making sure to comment almost everything.  They have a very elaborate header devised, with dependencies, both users of the object and objects it used.  Most of what is said seems a bit like overkill, but their point here "Comment all of the major code blocks of the code and the critical minor points that can be easily overlooked such as a obscure WHERE clause." is a good one.  I generally pepper my code with comments where I think it will be hard to debug for myself later, with a consideration for others, particularly when those others will call me to explain the code.

Which brings me to my commenting philosophy. I personally think you have to comment to the expected lowest common denominator.  Think of the dumbest person who could have the need to read your code who is also qualified to have their job (otherwise you would have to write instructions on every line of code). If the qualified person can figure out what you are doing just by your naming conventions and , then it doesn't need comments. But if that person would look at the code and reasonable figure it out, then there is no need to comment the code.  What this requires is a few things:

  • Naming objects - if your procedures, tables, columns, functions all have meaningful names, you won't have to explain what they mean, saving time
  • Good design - if the relationship between objects and the cardinality of those relationships is clear, then you don't need to explain that what you are doing is hack due to poor thinking...
  • Naming variables - probably the most important thing to avoid the need for comments is naming stuff.  Name variables with words, not single character values (except sometimes i, x, etc will suffice for obvious typical uses)
  • Reasonable code formatting - SQL has no real form, so you could write procedures on a single line.  You could.  You could smash your hand with a hammer too.  Neither action would be very good.  (Consider using Red-Gate's SQL Refactor tool if nothing else.)

However, the fact is, for SQL code, the real problem comes in when you start coming up with cool relational methods of solving problems that most moderately qualified people wouldn't get. For example the trick of using a sequence table to break apart a comma delimited list. Couple that with a join and you get some amazingly cool code, but how do you comment it?

For example, say an architect that shouldn't be an architect designs a table with a comma delimited list like this (didn't I mention good design earlier?  I hate having to say this is a hack, but it is an elegant hack...)

--excerpted from Chapter 7 of Pro SQL Server 2008 Relational Database Design and Implementation
CREATE TABLE poorDesign
(
     poorDesignId int,
     badValue varchar(20)
)

INSERT INTO poorDesign --using 2008 syntax
VALUES (1,'1,3,56,7,3,6'),
            (2,'22,3'),
            (3,'1')

You can "normalize" this set using a table of numbers (in my examples named tools.sequence) and a really cool join:

SELECT    poorDesign.poorDesignId as betterDesignId,
          SUBSTRING(',' + poorDesign.badValue + ',',i + 1,
                CHARINDEX(',',',' + poorDesign.badValue + ',',i + 1) - i - 1)
                              
as betterScalarValue
FROM     poorDesign
             JOIN tools.sequence
                    on i >= 1
                      AND i < LEN(',' + poorDesign.badValue + ',') - 1
                      AND SUBSTRING(',' + + poorDesign.badValue + ',', i, 1) = ','

But are there enough pixels available on the planet to make that more understandable to most SQL programmers? Even the reasonably qualified?   I mean, I am still kind of amazed at the technique and the fact that it returns the following:

betterDesignId betterScalarValue
-------------- -----------------
             1                 1
             1                 3
             1                56
             1                 7
             1                 3
             1                 6
             2                22
             2                 3
             3                 1

still impresses me.  Frankly I don't know how to comment that code to make it readable.  In a real situation I would settle for a comment before the SELECT that stated:

--Uses a table of numbers to parse the comma delimited list into a SQL acceptable format.
--If you don't understand this code, read this article: http://www.sommarskog.se/arrays-in-sql-2005.html#tblnum

Opinions? What do you use for a comments in your code?  Do you have commenting policies?

Published Wednesday, July 30, 2008 10:19 PM by drsql

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

Personally I try to err on the side of over-commenting if the code is especially complex or important to the system.  But I also try to keep things relatively "self-commenting" when possible.  Good use of clean coding style and descriptive variable names, etc.  This greatly reduces the need for a lot of comments.

The biggest headache for me is going back through code I've already written and adding comments that should have been there to begin with.  To combat this, there is an interesting coding technique where you write all of your comments first, THEN write your code.  This way it's already commented.

So you would start with, e.g.:

--declare loop variable

--loop 1000 times

 --print the value of the variable

And then flesh it out, adding code beneath each comment:

--declare loop variable

DECLARE @x INT = 1

--loop 1000 times

WHILE @x <= 1000

BEGIN

 --print the value of the variable

 PRINT @x

 SET @x = @x + 1

END

... I don't use this technique as often as I probably should, but I think it would work quite well in conjunction with a test-first methodology, producing well-tested and well-commented code in one shot.

July 31, 2008 11:29 AM
 

drsql said:

Yeah, writing the comments first is a very good thing to do.  I *try* to do that when I have to write complex code.  Definitely makes you design the code you write, rather than just letting it unspool as you go. And when you realize what you have done is wrong, you have all that code written...eh, making it work isn't such a bad idea

July 31, 2008 4:19 PM
 

Geoff Bennett said:

I think what Jeff is getting at is, coupled with a good test, that code really needs no comments. At the most, if the algorythm was particularly daunting you could add a hyperlink to some internal company document or website describing the workings of it. Ie:

private double SquareRootApproximation(n) {

 // uses the brackston-hicks apathetic approximation method

 // see: http://www.b-h.com/apathy/

 r = n / 2;

 while ( abs( r - (n/r) ) > t ) {

   r = 0.5 * ( r + (n/r) );

 }

 return r;

}

Or something similar.

The test will prove that the routine works correctly for a given set of inputs. The method name tells me if I need it or not, because I'll know if I want an approximate square root or to delete a file. The how is not (yet) important to me. In the end, there's no comment you could add to the routine that could help me better understand *why* I would want to use it.

I've seen too much code (even still today...) that looks like this:

private double SquareRootApproximation(n) {

 // divide n by 2

 r = n / 2;

 // while the absolute value of yada yada....

 while ( abs( r - (n/r) ) > t ) {

   // multiply the engstrom by the manifold pressure

   // of a european swallow...

   r = 0.5 * ( r + (n/r) );

 }

 return r;

}

All of which adds nothing, and actually makes it harder to debug the code when something's wrong anyway. I mean, what if it's meant to be an African swallow? The comment says it should be European, and it's using a European one, so that can't be the problem...

August 1, 2008 12:31 AM
 

drsql said:

Geoff, thanks for the comment.  I agree with you mostly, but the how is important for later debugging.

// uses the brackston-hicks apathetic approximation method

// see: http://www.b-h.com/apathy/

This is perfect for ideas that you copy from somewhere. However, if you are the person stuck trying to figure out this:

r = 0.5 * ( r + (n/r) );

Without that...I pity you :)  

I am 110 percent in agreement with this:

// while the absolute value of yada yada....

while ( abs( r - (n/r) ) > t ) {

An English (or whatever your language is!) reiteration of the code is not useful.  Even using the method Adam mentions, to comment the code first, you would rarely see that occur.

You would however see something like

//loop until the ____ calculation is reater than whatever the heck t

//is, since ______.

>>I mean, what if it's meant to be an African swallow? The comment says it should be European, and it's using a European one, so that can't be the problem...<<

I think that messing things like that up is actually grounds for justifiable homicide in many jurisdictions. (don't think that is spelled right, but you should get the point :)

And while I agree with you too many comments can make things hard to read (especially when people explain what a system function does, rather than why they are using it.) It is certainly better than no comments on a poorly named, poorly formatted module:

private double sApp(n) {

r = n / 2; while ( abs( r - (n/r) ) > t ) {  r = 0.5 * ( r + (n/r) );  }  return r; }

Guh?

And + 10 points for the Python reference :)

August 1, 2008 1:05 AM
 

Geoff Bennett said:

Ha, thanks. :)

Absolutely! I'm not against comments, just not for lots of them. I think a problem that SQL has is a lack of unit test-ability. Quite often, a unit test will tell you more about the code than any comment will, but it still won't explain an algorythm that looks like someone bounced a basketball on the keyboard.

And set operations can be particularly tricky to understand if you're not up on your joins! I still find developers here resorting to cursors when you could achieve the same thing with a join, not to mention nested subqueries.

Typically, the only time you need comments is when you're trying to figure out why you should call a method, or a method you're calling has broken and you need to understand whats going on. What Jeff was writing about goes a long way to point 1, and where you're coming from (I think, could be wrong...) goes a long way to point 2.

August 1, 2008 2:41 AM
 

Donald Knooth said:

Only weenie programmers leave comments like bread crumbs to find their way home.  It was hard to code, it should be hard to understand!  ;-)

August 4, 2008 2:32 PM
 

JJEugene said:

I comment a lot - both for myself and that mythical person that is going to take my place in the future when I win the lottery and no longer have to work (better than getting hit by a bus).

The idea that you need to comment for the lowest common denominator doesn't hold with me.  Even the smartest cookie on the block, who can figure out every bit of code no matter how obscure, will be able to read real English faster than s/he can read code.  One of the many reasons to put in lots of good documentation is to make maintenance much faster.  I may have written the code a year ago or even last month.  I can understand my code no problem, but I can read the comments much faster.  When I need to fix something quickly, I want to be able to quickly scan my code to get to the offending parts.  Comments help with that.

That's just one reason to comment, but since it hadn't been mentioned yet, I wanted to mention it.

The idea of commenting first is great.  I use it all the time both when working on hard procs and to help me stay focused.  It actually speeds up code writing while making it more accurate/less likely to have bugs and preventing the need to go back and document later.

The arguments concerning SquareRootApproximation miss the point because that is a trivial example.  Yes, I can figure out what SquareRootApproximation does by reading the name.  And I have similar procs in my databases (for example: fs_FormatDate()).  However, most of my procs and application code are not so trivial.  And even for the trivial code, WHY are you writing it?  How do you justify it?  What other code calls that code?  Why didn't you do it the other way which looks better because of bla bla reason?  By explaining all that in comments, then future coders will know when your original code is no longer needed because conditions have changed.

Comment what.  Comment why.  Comment why not.  Comment use.

August 6, 2008 2:52 PM
 

JJEugene said:

I forgot one more important area to comment - assumptions.  Having to document your assumptions may shame you into doing more error checks (so that you are not assuming anything after all).  At a minimum, it really forces you to think about your code, consider it's weaknesses and decide what trade-offs you consider worth risking for faster coding.  

By actually documenting/writing up assumptions, and not just thinking about them (leading to forgetting this step completely), you force yourself to do this process and create some documentation that may even be helpful in catching future program bugs.  If the assumption is written out in plain English (or whatever your language), then it is much easier to catch the problem when the assumption is no longer true.

Ask yourself for every line or related code block: what am I assuming when I use this syntax?  If it is an important assumption, then write it down.

August 6, 2008 6:36 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Links to my other sites

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement