I guess eventually I had to admit that performance matters. Whenever I speak, and in all of my books, I try to stress over and over that performance is NOT the only thing. The fact is, far too much time is spent trying to make database applications run faster when the real goal should be to architect data structures that solve the problems of the user in a natural way, with proper integrity.
But performance is important. Extremely important. Like a fancy sports car, it is the second thing that a user will notice about the system…the first being the UI (the paint job).
The problem is that it isn’t the most important thing. Just like with a sports car, the most important thing is that everything works. Even if the car can do 100 miles an hour in first gear, that hardly matters if the seats aren’t bolted in and the steering wheel works backwards. I initially defined well performing as “gives you answers fast”, and in the context of performance that is true, but in reality it is a very simplistic attitude. Taken out of context, it is definitely NOT a good explanation of well performing. Maybe: “performs all operations required in a minimal amount of time, within the realities of physics.”
When considering performance, there are many facets to the problem:
- Initial Database design – Sometimes the problem to be solved simply cannot be handled quickly, but often it is just poorly designed data structures that get in the way. Designing the database with the concepts of Normalization close at hand is the first step in getting things right. The relational engine of SQL Server is named relational because it likes data to be shaped in a relational manner, and the concepts of Normalization are the backbone of relational design (sorry, I have to mention Normalization every time I blog, I think).
- Concurrency – Hey, if all I ever had to work on were single-user systems, life would be easy. But the fact is, on the main database system I work with, there are FAR more users than one. And most of the “users” aren’t humans (no, they aren’t zombies either) but rather machines, reading in data from mail, the internet, and phone calls and processing it. Actual human beings work slowly compared to the rate at which a computer can pump in data. And to make sure the data isn’t complete rubbish by the time the humans actually look at it, we have to use locks, and locks slow things down…but the people want to feel like they come first…it is a difficult task, but good design and decent throttling mechanisms can make it happen. (And don’t get me started on all of the moving parts: disk, memory, etc. Beyond data-level locks, you have hardware/resource locks called latches that make certain the laws of physics are honored.)
- Indexing – The first thing most people think of when they are adjusting performance, and with good reason. An index can turn a full table scan into a single-row operation, simply by adding a primary key or unique constraint (usually part of the logical database design) or any other unique or non-unique indexes the queries may need. But indexes aren’t free, and you have to be careful not to pile on useless and irrelevant indexes “just in case.”
- Hardware – SQL Server can be run on simple, cheap hardware, and in most situations you wouldn’t notice if it were run on a server that could easily be a file server. Unfortunately, as you start to need “real” power, the hardware configuration cannot stay so…simplistic. Adding CPU power and RAM is simple, but the hardest part of the process is disks. Until solid state drives really hit it big (and even afterwards, in many ways), disk speed and redundancy are the big factors in making your server run fast.
- Good code that accesses your data – No matter how well your database is designed, if you code in a crappy manner, using loops where queries would work better, lots of unnecessary temp tables, and poorly formed search arguments like WHERE datediff(day, columnName, getdate()) > 1 (which prevents an index on columnName from being used), you are hosed. If you had the foresight to use stored procedures, you can go back and tune later in the process as needed (in case you get stuck with programmers who aren’t really relational “yet”), or at least have an architecture where you can adjust the code being executed from your app; if not, well, good luck (and get ready to throw a lot of bucks at hardware.)
- Adjusted Database Design – Sometimes you just can’t use the design you have, and as a good designer/architect you have to be big enough to admit that. Denormalization can be a solution (especially if you are trying to optimize reports…), but often you just have to go in and adjust the design in other ways. The better your code/encapsulation layer, the easier that is to do (it might take a bit longer, but it will be straightforward and safe).
- Etc – Network speed, faulty hardware, poorly performing applications…I could go on for days, but I won’t. The fact is, there are many problems that can make performance bad; sometimes they are database related, and sometimes not.
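To make the design point concrete, here is a minimal sketch of what Normalization buys you. The table and column names are hypothetical, invented for illustration:

```sql
-- A repeating-group design: three phone columns means three of
-- everything in every query, and a customer's fourth phone number
-- has nowhere to go. (Hypothetical schema for illustration.)
--   Customer (CustomerId, Name, Phone1, Phone2, Phone3)

-- The normalized version: each phone number is a row of its own,
-- searchable and constrainable like any other data.
CREATE TABLE Customer
(
    CustomerId int           NOT NULL CONSTRAINT PKCustomer PRIMARY KEY,
    Name       nvarchar(100) NOT NULL
);

CREATE TABLE CustomerPhone
(
    CustomerId  int         NOT NULL
        CONSTRAINT FKCustomerPhone_Customer REFERENCES Customer (CustomerId),
    PhoneType   varchar(10) NOT NULL,
    PhoneNumber varchar(20) NOT NULL,
    CONSTRAINT PKCustomerPhone PRIMARY KEY (CustomerId, PhoneType)
);
```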
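On the concurrency point, one knob worth knowing about is row versioning; whether it fits is workload-dependent, and the database name below is a placeholder:

```sql
-- READ COMMITTED SNAPSHOT keeps readers from blocking behind writers
-- by serving them the last committed row version instead of making
-- them wait on locks. The trade-off is tempdb/version-store overhead.
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;
```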
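And a tiny sketch of the indexing bullet, again with made-up names; the CREATE INDEX is what turns the scan into a seek:

```sql
-- Without a supporting index, this query has to scan the whole table:
SELECT OrderId, OrderDate
FROM   Sales.[Order]
WHERE  CustomerId = 12345;

-- A nonclustered index on the search column (INCLUDE covers OrderDate
-- so the seek never has to go back to the base table) gives the
-- optimizer a seek instead of a scan:
CREATE INDEX Order$CustomerId
    ON Sales.[Order] (CustomerId)
    INCLUDE (OrderDate);
```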
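The datediff example from the coding bullet is worth spelling out. The rewrite below should be logically equivalent for datetime values, and it leaves the column bare so an index on it can actually be used (table and column names are made up):

```sql
-- Non-sargable: the function call wrapped around columnName hides it
-- from index seeks, so the predicate is evaluated against every row.
SELECT *
FROM   SomeTable
WHERE  datediff(day, columnName, getdate()) > 1;

-- Sargable rewrite: leave the column bare and move the date math to
-- the other side of the comparison.
SELECT *
FROM   SomeTable
WHERE  columnName < dateadd(day, -1, cast(getdate() AS date));
```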
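As one example of adjusting a design without giving up integrity, a persisted computed column lets the engine maintain a redundant value for you (the names here are hypothetical):

```sql
-- The engine maintains LineTotal on every insert and update, so the
-- redundancy can never drift out of sync with the base columns, and
-- PERSISTED makes it indexable for reporting queries.
ALTER TABLE Sales.OrderLine
    ADD LineTotal AS (Quantity * UnitPrice) PERSISTED;
```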
I am not numb to the fact that performance is an issue. It is a real, major pain point, because it changes so much with all of the aforementioned factors, but also because it is negotiable. Data integrity is not negotiable. You can’t say “bah!” to users when they complain that “their numbers aren’t adding up”, but “the server seems kind of slow” can easily be written off as whining (mostly because it usually is). The problem is, whether your company admits it to itself or not, time is money, and if a user has to do a task 200 times a day and performance issues make it take 10 seconds instead of 4, it doesn’t seem like much…but 1200 seconds a day is not a small amount of time in the least. And if there are 100 people doing the same repetitive tasks…that’s a lot of time (and not uncommon in, say, a call center).
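That back-of-the-envelope math is easy to check; a quick sketch using the hypothetical numbers from the paragraph above:

```python
# Hypothetical numbers from above: a task that takes 10 seconds
# instead of 4, done 200 times a day, by 100 people.
slow_seconds = 10
fast_seconds = 4
tasks_per_day = 200
users = 100

# Time lost per user per day, in seconds.
lost_per_user = (slow_seconds - fast_seconds) * tasks_per_day

# Time lost across the whole team per day, in hours.
lost_team_hours = lost_per_user * users / 3600

print(lost_per_user, round(lost_team_hours, 1))  # 1200 seconds, ~33 hours
```

Twenty minutes per person is easy to shrug off; thirty-three person-hours a day is a full-time headcount problem.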
So what’s to be done? Planning, testing, and having proper environments for performance testing are great things, but to do it right you will probably need to double your spending on hardware to have both a production and a “pre-production” environment to test your designs under load. Most smaller companies (particularly if hardware isn’t the backbone of what they do) don’t have that. Production is where the load gets tested, and they hope for the best.
Well, stop hoping for the best and design for great performance. Whether you can test performance or not, you can still be prepared by building your database for performance and, just as important, for tuning in the easiest manner possible.