THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Aaron Bertrand

Bad habits to kick : putting an IDENTITY column on every table

Back in October, I started a series of blog posts called "Bad Habits to Kick," and thought I would revive the theme.

I've worked with developers that dabble in SQL, and they tend to have a few common traits.  I'm not sure where they come from, but one that I find rather distracting is the tendency to place an IDENTITY column on every single table.  Usually this is done because this is an "easy" way to add a column to the table that allows you to identify a single row.

Now don't get me wrong, I am a firm believer that IDENTITY columns have their place, and I use them quite liberally.  When you are storing information about customers, products or companies, where there are going to be multiple references to those entities in related tables, it makes little sense to copy all of the "real" attributes that identify that entity into all of the related tables.  In some cases this would make your storage requirements surge dramatically, never mind that the performance of JOINs will suffer.  It also can make for a real mess when you are repeating information that is likely to change - such as surname, e-mail address, phone number, ISBN.  Both of these issues are technical and not logical limitations - SQL Server doesn't support a "behind-the-scenes" foreign key construct that would make repeating the information unnecessary, and there are cases where cascading updates do not exactly work as advertised.

A simple example is where you have an Orders table and an OrderDetails table.  (And as a heads up, whiteboarding your perception of Barnes & Noble's database schema is a typical part of my interviewing process - how detailed we get is up to you :-).)

CREATE TABLE dbo.Orders
(
  
OrderID INT IDENTITY(1,1PRIMARY KEY,
  
...
);

CREATE TABLE dbo.OrderDetails
(
 
OrderDetailID INT IDENTITY(1,1PRIMARY KEY,
  
OrderID INT NOT NULL FOREIGN KEY REFERENCES dbo.Orders(OrderID),
  ProductID INT NOT NULL FOREIGN KEY REFERENCES dbo.Products(ProductID),
Quantity INT NOT NULL,
  
...
);

Now tell me, what business purpose does the IDENTITY column on the OrderDetails table satisfy?  When are you ever going to need to reference rows in this table by the OrderDetailID, and not by the OrderID and/or ProductID?  And why is *that* the column that should uniquely identify the row?  This seems to imply that you would allow multiple rows for the same order and product.  Even if the PK is elsewhere and these questions disappear, the value of this column is questionable at best - for no discernible gain whatsoever, you seem to be taking up an extra 4 bytes (plus space in any indexes that include the column, and then the additional I/O involved in maintaining those indexes).

Another example is a logging table.  Let's say you have a table that logs exceptions, or API calls, or other events over time.  You insert rows monotonically either in real time or from log files.  I often see this designed as:

CREATE TABLE dbo.ActivityLog
(
 
LogID INT IDENTITY(1,1PRIMARY KEY,
  
EventDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  
...
);

I think part of the knee-jerk reaction to add an IDENTITY column and make it the PK is that events can happen at the same time, so it makes it hard to define a real natural key.  I'll concede this point, as I have several log tables that do not have a PK or a unique constraint - but they certainly have a clustered index, and they do not have an IDENTITY column.  The purpose of this type of table is usually to maintain a history of activity and not to do a lot of searching for individual rows (and when you are searching for individual rows, you are not looking for a specific LogID, you are looking for details in other columns - say, part of an error message).

An interesting thing that came up in one of these cases was the need to purge older data.  So, the developer intended to have a 30-day running window, and once a day a job would wake up and would delete any rows from the log that were more than 30 days old.  Since the clustered index was on the IDENTITY column, their code basically did this:

  - determined the highest IDENTITY value more than 30 days old
  - assigned that to a variable
  - deleted from the table where IDENTITY < variable

With the clustered index moved to the DATETIME column, and the IDENTITY column disposed of, his code became a lot easier (simply a DELETE with a WHERE clause), and more efficient too, since it didn't have to perform a lookup first.

Now let's talk about a very similar case of a logging table where I think an IDENTITY column *is* useful, such as a logging table that keeps track of backup or index activity.

CREATE TABLE dbo.ActivityLog
(
  
LogID INT IDENTITY(1,1UNIQUE NONCLUSTERED,
  
EventType TINYINT-- assume some kind of FK
  
StartDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  
EndDate DATETIME NULL,
  
...
);

Now why is an IDENTITY column useful here?  Well, rather than logging the start and end events in different rows, keeping the data on a single row makes it much easier to consolidate the start and end of a specific event.  So the code would do something like:

DECLARE @LogID INT;

INSERT dbo.ActivityLog(EventTypeSELECT 1;

SELECT @LogID SCOPE_IDENTITY();

-- do other work

UPDATE dbo.ActivityLog
SET EndDate CURRENT_TIMESTAMP
WHERE LogID @LogID;

The unique constraint makes it slightly easier for the engine to locate the row to be updated, but the clustered index could still be placed on the StartDate and/or EndDate columns to provide useful searches on date ranges (with the same monotonically increasing property as the IDENTITY column), and a primary key could still be defined on other columns.  I am sure Celko is rolling his eyes right now, though he has probably not even read this far.*

So back to the basics, am I telling you to not use IDENTITY columns?  Absolutely (and hopefully obviously) not.  Just showing that before blindly applying an IDENTITY column to a table, think about what it means and why you are using it.

* P.S. I don't want to debate the merits of using an IDENTITY column or otherwise surrogate key (such as NEWID() or NEWSEQUENTIALID()), vs. using a natural primary key.  This comes up with Celko a lot - he is completely against any kind of system-generated identifier that can't be physically verified.  When asked a pointed question such as, "what do you think Barnes & Noble uses as the primary key to identify a customer or an order?" he disappears, never to appear in the thread again.  So while he is passionate about his convictions, he has no desire to defend them.

Published Monday, February 08, 2010 1:33 AM by AaronBertrand

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Greg Low said:

Hi Aaron,

"When are you ever going to need to reference rows in this table by the OrderDetailID, and not by the OrderID and/or ProductID?"

Easy example: if I tell you that if you buy 12 items, I'll give you two more at a discounted price, how are you going to show that with just an OrderID and a ProductID? This sort of thing happens all the time in real systems.

Regards,

Greg

February 8, 2010 12:41 AM
 

Twitter Trackbacks for Aaron Bertrand : Bad habits to kick : putting an IDENTITY column on every table [sqlblog.com] on Topsy.com said:

February 8, 2010 12:49 AM
 

AaronBertrand said:

Greg, there is a row for each OrderID / ProductID combination (as well as a quantity column, etc.).  So unless I misunderstood, the query to answer your question is simply:

SELECT COUNT(*) FROM dbo.OrderDetails WHERE OrderID = <x>;

What additional information would a system-assigned OrderDetailID provide that makes solving the problem you mention easier?  Surely you aren't suggesting that you subtract the difference between the highest OrderDetailID and lowest OrderDetailID and expect to get a reliable answer that indicates the count?

February 8, 2010 12:54 AM
 

AaronBertrand said:

(Or, if you're saying 12 items, not 12 *different* items):

SELECT SUM(Quantity) FROM dbo.OrderDetails WHERE OrderID = <x>;

February 8, 2010 12:59 AM
 

Uri Dimant said:

Hi Aaron

I met some developers who do not know/ doubt what columns should be PK and simple create an IDENTITY property to 'meet' business requirements. When I ask them what is the purpose of they say "to prevent page spilts". (like reading books --keep CI as small a possible and create CI on monotonous increasing column)

I agree with you that we do not have to create blindly an IDENTITY on every table , it really depends on business requirements /work load and many other things...

February 8, 2010 1:30 AM
 

Uri Dimant said:

Hi Aaron

I met some developers who do not know/ doubt what columns should be PK and simple create an IDENTITY property to 'meet' business requirements. When I ask them what is the purpose of they say "to prevent page spilts". (like reading books --keep CI as small a possible and create CI on monotonous increasing column)

I agree with you that we do not have to create blindly an IDENTITY on every table , it really depends on business requirements /work load and many other things...

February 8, 2010 1:30 AM
 

Uri Dimant said:

Hi Aaron

I met some developers who do not know/ doubt what columns should be PK and simple create an IDENTITY property to 'meet' business requirements. When I ask them what is the purpose of they say "to prevent page spilts". (like reading books --keep CI as small a possible and create CI on monotonous increasing column)

I agree with you that we do not have to create blindly an IDENTITY on every table , it really depends on business requirements /work load and many other things...

February 8, 2010 1:30 AM
 

RamiReddy said:

OrderDetails

(

OrderId int,

ProductId int,

Quantity int

)

1. Suppose, I wants to make Payments Per the Ordered Product instead of entire Order..  I might made the payments in installments alos..

in that case, i need a seperate table to store the payment details of that product.. so my billpayments table need a reference to the Orderdetails

table..

BillPayments  Table

(

OrderId int,

ProductId int

)

2. Suppose, assument that I ordered a book in an order with the quantity 50. Now assume,we might not able to send all  the books ordered

in one go due to some stock availablitity.. and we might send in installments again.. so to capture this info we need another table

   ProductsDistribution

   (

OrderId int,

ProductId int,

QuantitySent int

   )

 So, in this 2 situations, i created 4 columns in the 2 tables... if the orderdetails table has an identity column, i could have been reduce columns to 2..

 and my select queries also requires, less join conditions...

 We never sure about our future needs... if someday, some new requirement came, we should be in a position to extend that simply....

 I always go for identity column atleast in business related tables....

February 8, 2010 1:35 AM
 

AaronBertrand said:

Rami, in both of these cases I find it highly unlikely that I could justify another IDENTITY column to save 4 bytes.  (Note that you are only saving *one* column, even if *both* of these situations affect you, because you forgot to count the IDENTITY column that you added to the OrderDetails table, that doesn't exist in my version.  So you've saved 4 bytes in two tables, that are going to be <= number in the details table, and added 4 bytes in the other.)  I think I would *want* both the OrderID and ProductID information in the other tables, since that combination is not going to change, and I am just going to have to do a lookup to figure out the OrderID and/or ProductID anyway.

As for throwing an IDENTITY column on a table to future-proof, you can add an IDENTITY column at any time, so I don't think it's prudent to put one there "just in case."  Why not also put a GUID, CreatedDate, ModifiedDate, CreatedbySPID, ModifiedBySPID columns on every table, just in case business requirements change someday?  Further to that, I can't think of any cases where going back to an existing table and adding an IDENTITY column would suddenly solve some new business problem or requirement.

February 8, 2010 1:54 AM
 

AaronBertrand said:

Uri, I think your submit button got stuck again.

February 8, 2010 1:58 AM
 

uberVU - social comments said:

This post was mentioned on Twitter by aaronbertrand: [Blog] : Bad habits to kick : putting an IDENTITY column on every table : http://bit.ly/9zj7dJ

February 8, 2010 2:07 AM
 

Marzena said:

I agree that there are situations, where IDENTITY column is definitely not needed. But the example of Order and OrderDetails tables is IMHO not appropriate. May be this is more a philospical than a technical issue. I have contributed to two ERP systems - first of them did not allow multiple multiple rows per one product in a single order and it was definitely painful for the users.

The other one allowed it and both for the users (as was mentioned - different price, different tax etc.) and and the programmers benefitted from such architecture. Having a single identity column simplifies 90% queries and DB objects related to that table. My experience tells me, that discussion about correct PK to such table is just a waste of time - of course, you could probably find a better architecture from the storage size point of view. But the performance gain is unlikely to be as significant. Simplier SQLs (and moreover standardized approach accross the whole system, where all or almost all tables have the same PK architecture and the programmer does not need to waste time searching for correct ID combination of fields) is the benefit that I prefer. Sparing four or eight bytes of data in a row that total consumes at least 100 bytes is IMHO not a good reason to change the standard.

Also I do not agree that an architect can assume, that there will never be need for such column - our customers always find a way to surprise us by their requests ;-).

On the other hand there is at least one pattern, where special IDENTITY column is definitely not needed but very often is present - M:N relationship binding table with no other information columns.

CREATE TABLE Suppliers (Id IDENTITY...)

CREATE TABLE Products (Id IDENTITY...)

CREATE TABLE SuppliedProducts (

  Id IDENTITY - why???,

  SupplierId,

  ProductId)

February 8, 2010 4:41 AM
 

AaronBertrand said:

All right, so I guess my choice in examples was poor - I consider the IDENTITY column on OrderDetails as useful as the IDENTITY column on Marzena's SuppliedProducts table, but clearly others have found a use for it.  Hopefully the point of my post is not lost due to this detail (perhaps I should re-write it with a better example).

February 8, 2010 7:10 AM
 

Greg Linwood said:

Nice article & I agree, too many db designer throw identities around willy nilly, often without reason.

Why did you choose NONCLUSTERED in your ActivityLog example? I'd have thought that would have been a good candidate for clustering?

February 8, 2010 7:25 AM
 

AaronBertrand said:

Greg, I explained after the example:

"...but the clustered index could still be placed on the StartDate and/or EndDate columns to provide useful searches on date ranges..."

Now, if all you were doing was inserting and updating the rows, and never running any queries to find errors that happened, say, yesterday, or last Tuesday between 3 and 4 PM, and never cleaning up older data based on a point in time, then I would probably agree that the IDENTITY column could be clustered.

I've found that what you lose by choosing a non-clustered index for the lookup for single-row updates, you gain several times over when doing date range queries.  Of course it depends on how you are using the data, but for this use I've found that putting the clustered index on the datetime column is more useful in the long run, and since both are monotonically increasing, it doesn't change the impact of table growth, page contention etc.

February 8, 2010 8:13 AM
 

Brian Tkatch said:

"I have several log tables that do not have a PK or a unique constraint - but they certainly have a clustered index, and they do not have an IDENTITY column."

Well said!

I like this post, it illustrates a good point. I so have an issue with this post, however. The post comes from a position that:

1) IDENTITY should be used unless there is no point in using it.

2) IDENTITYs usefulness is defined by it's use.

I approach it differently. I separate TABLEs (mostly) into two groups:

1) Object

2) Relationship

Object TABLE define objects, such as Customer, Order, or Book. Each record is an instance of that object. Hence, the TABLE is named in the singular--it is not collection of its records--it defines what each record is. This requires a unique way of identifying each individual record. Because each individual record is another object.

The question is, how is this uniqueness defined. The answer is, by it being another record. That is, two records can have exactly the same information, and yet we have two records--two objects. They are unique only because they are separate, not because they are different. In a sense, the unique "id" is the internal identifier for that particular record.

However, the internal id is not exposed. So, another identifier is required. In comes IDENTITY. It is an exposed unique identifier, we not have a unique COLUMN.

That is all true if the data can be the same in two different records. What if they cannot, by data rule definition, be the same. Then there is already a unique identifier in the field. And there would be no reason to expose a new method via IDENTITY. That would be redundant, and mask the true uniqueness of the object.

The other type of TABLE is a relationship TABLE. That is, a TABLE the related two objects in the system, such as Order_Product that related Order and Product. In that case, usually only one instance of any particular join itself is allowed (the user can order each item only once in any given Order, otherwise it's just quantity that goes up), making a natural PK. If this uniqueness is not guaranteed, that TABLE is not a join of those objects, and likely not a relationship TABLE.

There are also logs and lists. But those aren't TABLEs, per se, they just happen to be stored in TABLEs for convenience, so the method used is indeed more based on use.

I'd suggest then, IDENTITY is to be used as a unique identifier of an instance of an object, when no other data rule guarantees uniqueness.

February 8, 2010 9:29 AM
 

mjswart said:

Very cool article Aaron, this query gives a list of unreferenced identity columns:

select t.name, c.name

from sys.tables t

join sys.columns c on c.object_id = t.object_id

left join sys.foreign_key_columns fkc

on fkc.referenced_object_id = c.object_id

and fkc.referenced_column_id = c.column_id

where fkc.constraint_object_id is null

and c.is_identity = 1

order by t.name, c.name

All that's left is to determine whether the application uses (or should use) these columns.

February 8, 2010 9:31 AM
 

Nick Canale said:

I really liked this article and the comments on this article. I'm one of those guys who adds an identity column on every table for the reason that Marzena described, so that developers know that that is the key and doesn't have to waste time searching. I haven't ran into any performance issues by doing this, even if it doesn't always make sense sometimes. :)

IMO you can still have a problem with this approach if 2 rows are exactly the same except their identity primary key. I don't think this should ever realistically happen.

February 8, 2010 10:21 AM
 

AaronBertrand said:

Nick, that's right, I think another thing that tends to get lost is when people put a PK constraint on a surrogate key they forget to put a unique constraint on the column(s) that make up the natural key.  (Well, forget, or just don't think it's necessary.)

February 8, 2010 10:41 AM
 

Paul Nielsen said:

Another pro for using a surrogate keys is taht a database-wide consistent PK enables building database-wide systems (such as audit, temporal updates, data policy checks, etc). I’ve found this to be very powerful when combined with code-gen techniques.

February 8, 2010 1:10 PM
 

AaronBertrand said:

So Paul, you would advocate using an IDENTITY column for code generation purposes even on the relational table that Marzena described in a comment earlier today?

February 8, 2010 1:16 PM
 

Paul Nielsen said:

Hi Aaron,

Yes, for the purposes of adding common utilites/system to the tables. But where there is suitable natural key I would also add a unique constraint/index on that natural key. I think the greatest con to identity columns is that developers assume that the identity col satifies the need for a PK and they fail to add a correct unique constraint whcih allows duplicate rows (different ID but all other data identical.)

to summerize: It's almost as if the surrogate key is there for internal system purposes (FKs and Utilties) and the unique constraint is there for relational table design (entity integrity) purposes.

and, conposite primary keys is in my list of top ten most evil database design and development practices.

February 8, 2010 1:55 PM
 

Aaron Bertrand said:

In my last post in this series , I talked about the common habit of creating an IDENTITY column on every

February 8, 2010 10:02 PM
 

AmosFiveSix said:

We use a large system (NOT designed by our company ;-) That is setup the way your describe with an OrderDetails table with an OrderDetailID. There are different types of products and for each type we need to collect different information when an order is placed for that product. Like guest information for a hotel registration. Those bits are in separate tables so we only add a record when someone orders that type of product, and those are linked by the OrderDetailID.

February 9, 2010 9:39 AM
 

Stephen Horne (Bluedog67) said:

I lean towards using IDENTITY columns as primary keys but not in all cases. Some examples where I think an IDENTITY column as PK is excessive include the following: ZIP Code table, State table (the USPS 2 character code is sufficent for PK), a Fiscal Year table (if like some companies you have a 52 week FY - CISCO), anywhere you have a 1:0/1 relationship (even if main table has IDENTITY PK the supplementary table only needs a non-IDENTITY int column as a PK and is FK back to main table - automatically get the 1:0 and 1:1 behavior. An IDENTITY column on 2nd table is excessive and unnecessary).

I believe strongly that everywhere a developer uses an IDENTITY column as the PK in a table, the Natural Key must be determined and a unique constraint defined on the NK column(s). In all likelihood a NK exists - make it unique either as PK or with unique constraint. If an NK cannot be found then this probably indicates a problem in the design or this fact needs to be well documented. Even simple lookup tables that are just an IDENTITY PK column and a Name/Description - the Name/Description should be uniquely constrained - if not your database is incomplete.

February 9, 2010 9:22 PM
 

Lee Everest, M.S. said:

"When are you ever going to need to reference rows in this table by the OrderDetailID, and not by the OrderID and/or ProductID?"

A: Only when you need to use the order of entry

"Easy example: if I tell you that if you buy 12 items, I'll give you two more at a discounted price, how are you going to show that with just an OrderID and a ProductID?"

A: Good point.  Requires additional information in the record.  Should add a column to note the special order type. Then you add use orderdate and ordertype as the key.

Great topic for discussion, Aaron.  Made a few comments of my own on your great entry... http://www.texastoo.com/post/2010/02/09/IDENTITY-property-and-Primary-Keys.aspx

Thanks,

Lee

February 10, 2010 10:10 AM
 

Mark Stacey said:

I always use A) an ID column that is an Identity and B) an Active column that is an int(not bit? LONG set of reasons. Sometimes it's bit. But by default int)

And on the tablet tables, I have a [Key] column as well.

The reason being, I build Occasionally connected applications : ensuring that I consistently ALWAYS have a column with the same name to uniquely identify every row in every table makes coding my sync framework (yes, I use MS Sync as a backbone) much easier, and in many cases faster as well.

Order details? Well, yes. There too.

As for having multiple products of the same product for the same order....

There is a particular data model that is common, and can lead to ISSUES.

The most common way to implement this is with a quantity.

So here we have an operational system where if someone returns one of a pair of products because it is faulty, we cannot uniquely identify which one it is.  Can be coded round, but it's an instance that needs upfront thought.

March 1, 2010 7:14 AM
 

telcogod said:

I would use identity column on the orderdetail table as the primary key clustered index because it should never become fragmented.  this table should never have deletes and so should always be sequential.  A composite key would not be, and would need lots of maintenance.  And with the online option available only for clustered indexes on tables that do not have a lob, rebuilding the clustered index on the table if it had a varchar(max), etc., would impact 24*7 uptime.

March 1, 2010 7:19 AM
 

stan lewis said:

I think you have missed the most important point here; the use of an identity key removes the proper primary key identity which itself prevents insertion of duplicate key values and naturally organises the table data. of course you can add the real key columns as secondary uniquing key but this is generally not done even by IT professionals and thus duplicate data creeps into these mis analysed tables. then databases get a bad name because if they do not prevent duplicate data then what use are they? In the case given above, an order is naturally a generated row and therefore to use a rising sequential number is easily the best method of creating a unique id. This example is a red herring.

March 1, 2010 10:50 AM
 

Aaron Bertrand said:

Mark, if two chocolate bars can be uniquely identified (and should be), then I would argue that they should each be represented on their own line.  If one has been returned then that seems like information that is stored elsewhere anyway... the original order details should remain intact somewhere.

If two chocolate bars can't be uniquely identified, and one gets returned, then why isn't Quantity -= 1 sufficient?

March 1, 2010 5:35 PM
 

Paul K said:

I often purchase the same book (same order and product) but want to send them to different addresses. Wouldn't supporting that require an OrderDetailID? Maybe not.

I note what Brian said about not needing an Identity column on a many-to-many relationship table. That's a case where I tend to use the Identity and probably don't need it.

Another (too obvious) case where Identity is not needed and we avoid it is when the incoming data has a field that is essentially an Identity column anyway. But what if the developers working on the source system read this column and decide to get rid of the Identity column? Ouch.

Another case where we manage to not use Identity columns is when the table itself is so small, say <= 3 columns. That's when the "extra 4 bytes" starts to sound like a big problem to me.

But our log table's LogID is used in other tables (holds generated SQL strings and it is not 1 to 1) and I tend to search or order by LogID all the time. Can't order by Date because messages might have the same date and I want to be sure of the ordering.

March 1, 2010 5:49 PM
 

Aaron Bertrand said:

Telcogod, if the OrderID is already sequential, then you're not gaining much by adding an identity column.  Now, I've already admitted that my example wasn't the best one.  What about a many-to-many join table?  You'd put a clustered identity on that one too?  Why?  

March 1, 2010 6:13 PM
 

Aaron Bertrand said:

Stan, I agree that is a problem as well.  I am trying to publish this series in small, distinguishable units.  You can treat slapping an identity column on every table separately from picking the wrong pk.  I could treat every single bad habit in one post but it wouldn't be very consumable... and it would take so long to write that by the time I got to the end my opinions about the beginning may have changed.  :-/

March 1, 2010 6:18 PM
 

Brian Tkatch said:

"I often purchase the same book (same order and product) but want to send them to different addresses. Wouldn't supporting that require an OrderDetailID? Maybe not."

Paul, good one! I love it when someone shows reality is not as we think or want. :)

It doesn't matter which is shipped to which address. As long as one goes here and one goes there. So, add ship-to-address to the order detail-table.

Ship-to: Id (PK), Address...

Order: Id (PK), Customer

Order_Detail: Order, Product, Quantity, Ship-To

PK(Order, Product)

To enforce the ship-to matches the customer, just pull the customer_id down:

Ship-to: Id (PK), Address...

Order: Id (PK), Customer

Order_Detail: Order(PK), Customer(PK), Product, Quantity, Ship-To

PK(Order, Product)

FK(Order, Customer) to Order

FK(Customer, Ship-To) to Ship-to.

March 2, 2010 8:51 AM
 

AjarnMark said:

Aaron, I particularly LIKE your example with the Order Detail table because it made me think.  I probably would have put the IDENTITY on there in the past, but now I'll have to give it more thought.  One thing that might influence my decision is whether the ProductID really uniquely identifies a particular physical product, or is there more?  For example, is a sweater that comes in blue and red two different product IDs or is it one ID with additional defining information (e.g. color). Then you have to consider the multiple columns to define uniqueness issue and make an informed decision on what approach to take.  And therefore your post was successful because it made me think instead of respond out of habit.

Regarding your first Log table example, again it makes me think.  I probably would have put the IDENTITY on there, but you make a convincing argument that it is not necessary.  Although I have to admit to a somewhat visceral reaction to the idea of not having a PK on a table, even though you still have a CI.

March 5, 2010 1:20 PM
 

AaronBertrand said:

>>

Although I have to admit to a somewhat visceral reaction to the idea of not having a PK on a table, even though you still have a CI.

>>

I am sure in some cases there would be a good candidate key, though I have seen several cases where there is none.  Consider an ad serving platform where an ad can be served from multiple servers at the same time, and in a particular log there is nothing to uniquely identify them.  You don't have the ability to pre-aggregate the count (since it is coming from multiple servers) so you record each event individually.  I don't think that the requirement of a PK is so overwhelming that throwing an IDENTITY PK solves anything.  It is basically having a PK for the sake of being able to say "this table has a PK."  Now how do you answer the question, "what do you *use* the PK for?"  YMMV.

March 5, 2010 1:30 PM
 

AjarnMark said:

I agree Aaron...another area for me to do a little more thinking and introspection.  It's good to occasionally ask ourselves WHY we believe the things we believe.  Thanks for the prodding.

March 5, 2010 6:23 PM
 

An update of my SQL Server “Bad Habits” series « OTO One to One Interactive said:

March 8, 2010 11:31 PM
 

free blogs said:

A word about home canning and food preserving: DO YOUR HOMEWORK. Nobody wants to end up in the hospital or worse, dead, over food that’ s not preserved correctly. This sounds like it should be common sense, but this summer I had to convince a friend to

May 2, 2010 3:01 AM
 

free blog hosting no ads said:

Kite Aerial Photography- Puts Your Eye in the Sky. To take pictures from a kite, you need three things: a kite, a camera, and a special rig that attaches the camera to the kiteline and activates the shutter button on the camera. Here\'s how to do it.

June 1, 2010 4:53 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About AaronBertrand

...about me...

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement