Early in 2007, I wrote an article for Simple-Talk
called Ten Common Database Design Mistakes
, and it has been my most successful bit of writing every with 333994 views at last counting. Anything where you point out the negatives of what other people are doing is going to be polarizing, and this was no different. Recently a reader posted a comment on my blog Why I love to (and am starting again) to write about design
, and rather than hijacking that topic I figured I would push it out to a full topic and see if I could start me up a flame war in the SQL Blog comments section for this post.
My particular point of view is that every domain deserves its own table because every table should have a single purpose both for the logical (requirements) and physical (implementation) parts of a design.
- Using the data in a query is much easier, as you join naturally to tables, not subsets of one big table
- Data can be validated using foreign key constraints very naturally, something not feasible if you have one table, perhaps named “genericDomain”. It isn’t that it isn’t totally possible, it is just messy to use constraints
- All of the smaller domain tables will fit on a single page of disk. This ensures a single read (and likely a single page in cache). If the other case, you might have your domain table spread across many pages, unless you cluster on the referring table name, which then could cause it to be more costly to use a non-clustered index if you have many values.
- You can still have one editor for all rows, as most domain tables will likely have the same base structure/usage. And while you would lose the ability to query all domain values in one query easily, why would you want to? (A union query could easily be created of the tables easily if needed, but this would seem an unlikely need.)
While I am less enthusiastic about the third entry at this point, I still stand behind everything said. The counter point was provided by Prakash Nadkarni on his article here: http://ycmi.med.yale.edu/nadkarni/domainlists.htm. To summarize, his reasons for desiring the single table approach are:
- Efficiency Issues, from a management standpoint. Having more tables makes more work for the modeler, and the tables really don’t merit inclusion on the model because they are the equivalent to metadata
- The consolidation of domain-value tables simplifies the creation of user-interface generation code, or the use of generic routines to present these values in the interface. – Basically that the coder has the single place to go to get domains from, so code generation of domain values is easier
- It simplifies the administration of content maintenance greatly - if you're a DBA, are the number of DDL statements that you write reflected in your pay or your level of importance, or do you care about getting the job done efficaciously? (Admittedly I am not a Yale man, so I had to look up that word).
- To sum it up, he says:
All three issues above emphasize the importance of looking at the big picture. The database schema is not only the Database Administrator's concern. The database exists to meet a business need, and DBA has to partner with applications developers and power users/content experts, and ensure that appropriate schema design facilitates productive software development and metadata-content maintenance.
He makes some good points, to be sure, but honestly I still must almost completely disagree. First off, using a little bit of metadata, you can counteract all of these issues and not cause a few new ones. For example, consider the efficiency issue. When I drop a domain table in my model I know that it is implemented correctly, will be used, etc. An FK and PK/AK is easily implemented as desired, and I can choose a key style (surrogate, either guid or integer, or natural string code) that makes sense for the situation. With the single table method, that domain never shows up in the model. Using a proper modeling tool, I have a view that shows domain tables and one that doesn’t (a bit of coloring helps out too :) This also makes the use of domains a declarative operation. Need to reuse the same domain? Simply add a relationship to the domain from a different column.
The generation of code is just a lazy factor really. If you have 100 tables with the same pattern of creation, it isn’t rocket science to implement a solution that allows creation of an editor to manage the individual tables just as they were one. It isn’t “simple”, but a proper editor with security built in to manage what user can modify what rows is ugly (avoid row level security at all costs is one of my primary tenets of modeling…row level security is doable, but often it just shows a lacking in the model.) Actually this is one of the places where a simple set of stored procedures, a bit of dynamic SQL, and a few extended properties to identify the domain style tables could encapsulate the whole thing so the app wouldn’t even realize that there are > 1 tables.
Finally as a data architect/developer, I look for solutions that make my future self happy. It is certainly more work up front to build the model with 100s of domain tables, no doubt. If I just want to leave things up to the developer to manage, just creating every domain as an integer is easier. But I don’t like leaving data integrity up to the application, even when I do the coding. Having a foreign key constraint that easily validates that the value chosen is in the desired range is the complete win. When I use the data, it is going to be right, and that is all that matters to me.
Look, if it turned out that people had to dynamically add domain columns to the model (perhaps to implement an EAV pattern), and one of the types was “named range” or some such, then this single domain set would be an interesting solution. But this is almost never the case in relational databases. We add a column as design decision. We choose a domain table as a design decision, etc. And when I add a new domain table, there is a testing benefit too. I am 100% sure I haven’t introduced regression bugs because someone coded a use of the domain that didn’t work right.
In the end, I will always choose more work upfront for less work later as long as it doesn’t impact the user.