THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Paul Nielsen

www.SQLServerBible.com

What’s wrong with SELECT * ?

As an interview question this one reveals the mindset of the person.

 

“Returns more columns than needed – wastes resources”

He’s concerned about the network/systems.

“If you add a column it might break the app”

He’s a developer.

“It compromises the extensibility of the database.”

He’s an architect type.

“It’s hard to build a covering index for a SELECT * query”

He’s a SQL Perf Tuning & Optimization guy.

Published Wednesday, December 12, 2007 2:21 PM by Paul Nielsen
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

andyleonard said:

LOL! Too true!

:{> Andy

December 12, 2007 3:29 PM
 

Adam Machanic said:

OK, apparently I'm not an architect type.  How does it "compromise extensibility"?

December 12, 2007 3:37 PM
 

Paul Nielsen said:

Select * would build a weak abstraction layer or be an indication that there's no abstraciton layer. Abstraction is a key for extensibility. Beside, architects see everything in terms of extensility ;-)

December 12, 2007 4:28 PM
 

Adam Machanic said:

Paul,

I'm confused about what abstraction has to do with extensibility in this case.  Isn't extensibility the ability for an application or system to be extended -- i.e., to add new features, attributes, or whatever else?  If I'm correct there, then couldn't it be argued (playing a bit of devil's advocate here) that SELECT * actually -helps- in that regard, by "automatically" sending new attributes to the caller?

December 12, 2007 6:32 PM
 

Paul Nielsen said:

Abstraction is the means of loosely coupling the database from the client. By defining and enforcing a contract API between the database and the client (be it a middle tier, ETL, report, or app) then both sides of the contract can freely refactor without breaking the other side of the contract. When new features are required, the change to the API contract can be agreed upon and both sides can build out to the new API, and the number of components affected is contained by the API. It's the same idea as SOA, but between the database component and the client component. And both sides of the contract need an abstraction layer, not just the client.

In contrast, a tightly coupled database – one with ad-hoc SQL directly touching the tables - is a brittle database; any change to the database structure causes a ripple effect of broken code (reports, tiers, forms, ETL, etc). Shops with tightly coupled databases spend more time figuring out how much code will break if they implement a database change than they do coding the change, or they can’t make the change at all which leads to outsourcing or switching to a competitor. I’ve spoken with a number of clients who have tightly coupled databases. I recommend changes to the schema and they say, “wish we could but we can’t; the SQL code is scattered all over in 2 million lines of VB code.”

That's why abstraction is the key to extensibility.

December 12, 2007 6:33 PM
 

Denis Gobo said:

Why do I have a feeling this will be kind of a repeat of the comments in the Data Architecture post  :-)

December 12, 2007 6:43 PM
 

Adam Machanic said:

Denis: Not a chance!

Paul: You seem to have drifted the discussion from SELECT * into the world of ad hoc SQL.  I think we are in perfect agreement there (see: http://www.simple-talk.com/sql/t-sql-programming/ to-sp-or-not-to-sp-in-sql-server-an-argument-for-stored-procedures/).  And I agree that a properly designed system is key to extensibility, but I'm not sure that proper encapsulation can be said to contribute, given that encapsulation will create MORE code to change any time a new feature is added.  This seems to me to sacrifice some of the extensibility, in favor of numerous other "-itys" -- security, maintainability, auditability, etc.

December 12, 2007 7:20 PM
 

Jared Ko said:

You forgot the ever-popular answer: "I don't know. I've always heard that it's bad."

December 12, 2007 7:30 PM
 

Denis Gobo said:

Oh I am not missing this thread, I have setup the email alert, in my gmail I created a filter which forwards the email to my blackberry

Now if we could get Joe Celko and Tony Rogerson to join the debate then it would really become interesting. We could even throw in a little Steve Dassin to tell us that SQL is crap and we should all use RAC  instead  :-)

December 12, 2007 7:48 PM
 

Alex Kuznetsov said:

December 12, 2007 8:40 PM
 

Sqlgoof said:

And lastly, putting people in a box :) But it's a good way to scope follow up questions.

December 12, 2007 8:52 PM
 

Paul Nielsen said:

obviously, my attempt to migrate my career from DBA to blog comedian is failing and I should keep my day job (as if I had one) ;-)

December 12, 2007 10:31 PM
 

Paul Nielsen said:

So these two atoms are walking into a bar, and one atom says to the other,

"Hey, I think I lost an electron."

"Are you sure?" say the other atom.

"Yes, in fact I'm positive!"

I know, day job.

December 12, 2007 10:34 PM
 

Paul Nielsen said:

Hi Adam,

Andy “got it” in the first comment. The post was never intended to launch a debate on abstraction layers and extensibility. Nevertheless, you’re a friend, and I want you to understand my view even if you don’t agree.

There’s plenty of theory around the ideas of loose vs. tight coupling and SOA. But, as you know, I’m not very impressed with theory unless I can observe the benefit. So I’ll be very pragmatic in my reasoning.

If I understand your question correctly, we agree that stored procedures are the way to go, but you’re asking me to clarify why I believe extensibility is a reason for a stored procedure abstraction layer even though it means more code. Right?

Let me build a common enough example.

Let’s assume there’s a widget system and five tables store widget configuration data. Based on four criteria some widgets can be selected for some purpose. A few websites, several forms, a couple ETL processes, and several reports all use this feature adding up to 50 separate components.

A tightly coupled database would have code in each of the 50 components that accesses the five tables, does the same logic, and gets the result.  Number of routines: 50

A loosely coupled database would place the logic that selects the widgets in a proc and the 50 components would call the proc. Number of routines: 51

Now the business requires a change. The same four criteria will be used to select widgets, but now the method will be a bit more complex and will require adding two tables, modifying three tables, and completely rewriting the logic.

The tightly coupled database requires the project team to modify the schema and then find, modify, and unit test 50 routines in multiple technologies and languages.

The loosely coupled database requires the project team to modify the schema and then modify and unit test one proc. The 50 external components need never be touched. The loosely coupled design is more extensible because it reduces the redundency of code, reduces the cost of change, and reduces the risk of errors introduced by the change.

Admittedly, if the new feature requires a new parameter or a new column then the API and all components have to modified, but from my experience, a large percent of of changes involve feature changes or improvements inside the database that can be easily coded with an abstraction layer but become nearly impossible, or at least very expensive, with a tightly coupled database. And if the change request does modify the API, having the API makes the entire process much more manageable.

December 13, 2007 2:59 AM
 

Alexey Knyazev said:

IF EXISTS(SELECT * ...)

December 13, 2007 4:07 AM
 

Denis Gobo said:

Joke ha? Here is one

A surgeon, a civil engineer, and a software engineer are sitting in a bar arguing about whose profession is the oldest.

The surgeon says "God created Eve from Adam's rib. Clearly that required some surgery, and my profession is the oldest"

Then the civil engineer says "I don't know, in the beginning it says God created the earth from chaos, so my profession must be the oldest"

On hearing this, the software engineer's eyes light up and he says "Hold on a second fellas. Who do you think created the chaos?"

December 13, 2007 6:48 AM
 

AaronBertrand said:

Alexey, FWIW, I prefer to use SELECT 1 inside an EXISTS clause.  Not that it makes any difference to the optimizer, but when I spot the SELECT query I know at a glance that this subquery does not return any data (you know, if I'm blind, and didn't see the wrapping EXISTS()).

December 13, 2007 6:56 AM
 

Adam Machanic said:

Hi Paul,

Of course anything you say on any public Web forum can spark a debate, unless it's a joke about atoms.  I think we've seen that time and again :-)

December 13, 2007 7:50 AM
 

steve dassin said:

Denis Gobo said:

'We could even throw in a little Steve Dassin to tell us that SQL is crap

and we should all use RAC  instead  :-)'

Well Denis I really don't think sql is crap but I much prefer another

database system:) Now Rac is something else. Rac is conceptually on the

same level as LINQ. Just as LINQ hides the sql and allows a declarative

syntax much more in line with a programming language so does Rac. So

anyone having a problem with Rac also has a problem with MS LINQ:) Do Rac

and LINQ imply a judgement on the value of using sql directly? Sure they do.

Don't all utilities. Now that judgement may not sit well with many people but

it nevertheless is there. MS and I do not think it necessary that all developers

have to possess indepth knowledge of sql. Does this mean MS thinks sql is crap?:)

www.beyondsql.blogspot.com

December 20, 2007 10:27 PM
 

Denis Gobo said:

A year in review, The 21 + 1 best blog posts on SQLBlog Best posts according to me, it might have been

December 27, 2007 4:11 PM
 

Brian Tkatch said:

A query is to return information. So, ask for what you want. Don't just say "give me everything". That's being lazy and inefficient.

When you re-look at your code tomorrow, SELECT * tells you nothing. SELECT important_column_1, useful_column_2 .... says a lot.

I always tell people * is for EXISTS, COUNT, and ad-hoc.

EXISTS - because i don't care about any individual COLUMNs, i only care if the record itself is there.

COUNT - to include NULLs, also, same reason as EXISTS. I'm usually COUNTing records, not values.

ad-hoc - Well, that just easier. :)

July 22, 2009 9:31 AM

Leave a Comment

(required) 
(required) 
Submit

About Paul Nielsen

Paul Nielsen believes SQL is the romance language of data. As such he’s a hands-on database developer, Microsoft SQL Server MVP, trainer, and author of SQL Server Bible series (Wiley). As a data architect, he developed the concepts of Smart Database Design and Nordic – an open source O/R dbms for SQL Server. He lives in Colorado Springs.

This Blog

Syndication

News

news item test
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement