This one should be simple to anyone who sees it (once I decode what I mean by fundamentally… and sound…by then for sure!) In the initial post I defined this as – fundamental rules enforced such that you don’t have to check datatypes, base domains, relationships, etc. The gist here is that you at a minimum don’t have to spend all of your time validating what has been saved.
As an analogy, consider the first car your mom and dad (or mom or dad or grandparent or even yourself, so as not to be offensive to anyone!) purchased. If it was like my first car, my father made sure that it was fundamentally a safe vehicle, but that was pretty much all you could say about it. It didn’t have all of the bells and whistles (unlike my car now that has lots of whistles) but it had an engine that was solid and managed, a steering wheel that turned, and brakes that stopped the car. Near the end of my ownership of the car, it had a set of pliers for a gear shift, but at that point the car was losing the fundamentally sound “zone”.
So why does this matter? There is a large number of non-db programmer types who would prefer it if the database was treated as nothing more than a data storage device with some search capabilities built in. SQL Server is happy to serve in this capacity, but what a waste of resources. Why?
- Data has to be validated to be used– ETL type processes are all to often ETCCCCCL processes, where instead of Extract, Transform, and Load, there is a bunch of Clean, Clean, Clean processes tossed in. The problem is that no matter how good you are, without using at least foreign key constraints and transactions, you will end up with less than pleasant data once users and the reality of concurrent data access gets a hold of your data.
- Some forms of validation are very difficult to do outside of the data layer – starting with foreign key constraints, this list is actually more large than you might imagine. For example, any validation where another table or multiple rows is involved, like a check constraint that check the cardinality of a relationship. Another example is uniqueness. Uniqueness enforcement should always be done in a UNIQUE constraint. Worst case you are going to end up with a unique index that is very helpful to the query processor.
- Loading data is far safer – If you rely completely on a layer to manage the integrity of the data, what happens when the ETL layer is pointed AT your data? Do you have to recode all of the rules? Of do you just hope that the rules are all followed? Unfortunately hope springs eternal in a lot of situations where it was not such a dandy idea, and the UI starts failing because of unknown data rules.
- Data in a nullable column ought to allow nulls – Nothing quite cheeses my cracker more than trying to set up a scenario and I leave all data nullable and it crashes the app. So you start filling in data as you figure it out (hopefully the app tells you what column it is stumbling on) until you finally get there. If the app requires data, then the database should too.
Look, In the early days of client-server programming, we tried using the data layer as the be-all and end-all layer of data programming, and it stank. Our UIs were as bad as a lot of web pages are now. No validation, just let a different layer handle it and report errors after the user had entered lots of data. I am not suggesting that. And be clear about something else too. This is NOT about stored procedures. Stored procedures are part of the data access layer, not the data layer. Whether or not you favor procs as a data access layer is immaterial to this discussion. I am talking about the fundamentals that can be done with or without a TSQL implemented data access layer.
- Theoretical data ranges – Salary greater than 0, Temperature for the weather forecast within a range that could actually occur, etc.
- Data types – don’t just use sql_variant for every column, leaving the user to deal with it at runtime. Pick the type that actually represents what you are modeling.
- Data Length – And in the same vein as data type, choose a length that is reasonable. Too many databases have varchar(50) for all string types, unless it is text. Come on, think about it and set the max limit if there is one. Don’t use an “unlimited” type and only give the user 50 characters. And don’t do the opposite either. Varchar(50) with 100 characters on the UI. Unless your testers are excellent (or psychotic) you won’t find that out until much later
- Uniqueness – Already mentioned. Don’t leave that to the UI, you are going to need an index in any case to do this well.
- Relationships – Foreign key constraints rule and take no thought.
And implement as much of this on the UI as needed also. Best case you generate the code from both from a rule repository. Don’t force the poor user to type data into 100 columns and then find out that the need another piece of information in a proper format, only have the form timed out when they come back to it. Unless you hate your users, in which case you are excused as long as one of my friends isn’t stuck using your evil UI…then it just isn’t cool.