SQLblog.com - The SQL Server blog spot on the web

The Rambling DBA: Jonathan Kehayias

The random ramblings and rantings of a frazzled SQL Server DBA

RANT: Got Performance Problems? Buy bigger hardware!

Ok, so the title of this post is very tongue in cheek, or in layman's terms, a bunch of crap. However, it is the most common answer I seem to get from application vendors these days when dealing with performance issues, despite the fact that I can point out a dozen reasons why their application design/code is the problem. As I tell my end users all the time, I am sure that bigger hardware will hide the problem, but shouldn't we expect an application vendor to fix their junk design/code rather than throwing more hardware at the problem?

In my own experience in database and application design, I have run up against more than a few problem spots where performance problems could easily have been fixed by adding bigger hardware. However, when I started with SQL Server, hardware was still quite expensive, so every attempt was made to resolve a problem by refactoring database design, indexing strategies, and/or application code well ahead of considering the purchase of new hardware. Lately, with declining hardware costs, the lower cost of 64-bit computing, and the scalability of hardware, I've seen this dynamic change, both with vendors and with home-grown applications.

It seems as if the talent pool for writing highly performant code is being drained by the easy availability of high-powered hardware. In my own personal experience, I recently purchased a home test server for under $1K that had four 3GHz Xeon processors and 16GB of RAM. At that kind of cost, I have to admit that adding hardware can be a much cheaper alternative to spending the time to learn how to solve the code problems. However, personal experience has taught me that hardware can only get you so far, and eventually the cost to scale up hardware exceeds the cost to fix the actual problem.

I experienced this recently while working on the Extended Events Addin for SSMS. One of my pain points in the original application that really came to light when I ported it over to an SSMS Addin was the time it could take to query data from the asynchronous_file_target. My first design of the Target Data Viewer was shortsighted in that it wasn't designed to handle the amount of data that could be stored in the file target, but was instead built for the memory-resident targets. The problem I ran into was that the UI would hang while reading the data. This was easily fixed by refactoring the code to use a BackgroundWorker, but a new problem popped up: a nasty OutOfMemory exception when reading data that was under 100MB in size on disk.

To put things into perspective, I have a dual-core, 4GB RAM laptop that was running Visual Studio 2008 and SSMS in debug mode for development purposes only. The problem was that as I read the data, SSMS would eventually consume over 2GB of memory, exceeding the user-mode VAS limit for a 32-bit system and throwing an exception, so a bit of debugging was required to figure out what exactly was happening. Admittedly, I am not a .NET expert, and this took a bit of time to figure out, but after a few days of work I was able to refactor the code so that the same file data that was causing an OOM exception now requires less than 50MB of memory. Even better, it led me to understand better how to refactor other code, so that the impact of the SSMS Addin was almost negligible when performing most operations.
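The fix in the addin was on the .NET side, but the same file target can also be read server-side a row at a time rather than materializing the whole file in memory. As a hedged illustration (the file paths here are placeholders, not from the addin), sys.fn_xe_file_target_read_file streams the events back as rows:

```sql
-- Sketch: reading an asynchronous_file_target's files row by row on the
-- server, instead of pulling the entire file into client memory.
-- The paths below are illustrative placeholders.
SELECT
    object_name AS event_name,
    CONVERT(xml, event_data) AS event_xml
FROM sys.fn_xe_file_target_read_file(
        'C:\XELogs\MySession*.xel',   -- event files (wildcard allowed)
        'C:\XELogs\MySession*.xem',   -- metadata files (required on SQL 2008)
        NULL, NULL);                  -- start at the first file/offset
```

Because the function returns one event per row, a consumer can process and discard each event as it goes, which is essentially the same streaming discipline the refactored addin code applies on the client.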

How does this relate to SQL Server? Well, in a number of ways, actually. I am not necessarily the best at performance tuning, but I can certainly hold my own, and I've helped a number of people/companies solve problems with badly performing code. Finding ways to improve performance happens to be a personal love of mine, though I rarely get to do it these days. Understanding design patterns in SQL Server is just as important as understanding design patterns in object-oriented code. In general, when I get to look at badly performing code, I often find the same problems, and I have a short list of things I look for up front before I dive into more complex troubleshooting. Correlated subqueries, user-defined functions that perform data access, table-valued parameters holding thousands of rows of data, and functions on the column side of a predicate seem to be the most common killers of TSQL performance that I've seen, though there are other problems as well.
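To make the last item on that list concrete, here is a hedged sketch (the table and column names are made up for illustration) of a function on the column side of a predicate, and the rewrite that fixes it:

```sql
-- Non-sargable: wrapping the column in DATEADD means the optimizer
-- cannot seek an index on OrderDate, so the table gets scanned.
SELECT OrderID
FROM   dbo.Orders
WHERE  DATEADD(DAY, 30, OrderDate) >= GETDATE();

-- Sargable rewrite: move the arithmetic to the constant side of the
-- predicate, leaving the bare column free for an index seek.
SELECT OrderID
FROM   dbo.Orders
WHERE  OrderDate >= DATEADD(DAY, -30, GETDATE());
```

The two queries return the same rows, but only the second one can use an index on OrderDate for a seek; that one-line change is often the entire fix.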

So what exactly did I expect to accomplish with this blog post? Well, not all that much really, except to make you think before you scale up hardware to solve a performance problem. If you are an ISV, this might be a quick fix, but eventually you are going to meet the customer that will stress your application beyond what is reasonable for hardware and associated software costs.

Published Tuesday, August 04, 2009 3:32 AM by Jonathan Kehayias

Comments

 

AaronBertrand said:

I have a similar problem with one of my vendors.  After demonstrating that their code simply wasn't able to scale, they are now adamant that we must replace all of our dual-core, x86, 4GB machines in our data center with dual (or more!) quad-core, x64, 16GB machines in order to squeeze any more performance out of their stuff.  These are application servers, mind you.  So rather than even investigate the shortcomings in their code, we should spend $40,000 (or more!) to defer that work for them.  My concern of course is that, since they've clearly never tested this level of scale, the new hardware won't allow their application to do much or any better ... they honestly believe that all of their performance limitations are due to old, crappy hardware.

August 4, 2009 7:38 AM
 

BlackWasp said:

I had mixed feelings when I read this post.

On the one hand, I understand that business wants return on investment. If a software vendor produces non-performant code I have two choices (or actually a range between these two choices). I can have them improve the performance or I can get bigger kit. If the new hardware costs no more to run and maintain than the existing, and improves the speed of the application by 100% (come on, let's pretend!) then I have instant satisfaction. If going back to the vendor means development that I have to pay for, I may still be able to get satisfaction pretty quickly. So now it comes down to money. If the redevelopment costs £20,000 and the tin costs £10,000 we have an answer - buy the tin.

On the other hand, as a developer I like to think that my code gives good performance. I say "good" rather than "the best possible" because sometimes it isn't viable to optimise every last detail. It's better to get rid of anything that gives poor performance and ship software than spend the rest of your life optimising whilst the competition corners the market. So, I develop on the best machine I can and test on an old 1GHz laptop with 768MB RAM. If it is quick enough on that, it should be good on anything remotely approaching modernity!

August 4, 2009 7:52 AM
 

Jonathan Kehayias said:

>> If the redevelopment costs £20,000 and the tin costs £10,000 we have an answer - buy the tin.

If this is an internally developed application maybe, but not if this is a major application provided by a vendor that cost a fair amount of money.  We wouldn't be the only customer that has problems and in the grand scheme of things, the vendor should invest the £20,000 rather than telling their customers to lay out an additional £10,000 because their application has problems.

>> So, I develop on the best machine I can and test on an old 1GHz laptop with 768MB RAM. If it is quick enough on that, it should be good on anything remotely approaching modernity!

I think that this kind of thinking is part of the scalability problem. Unless you are actually throwing production-level load at the application in a simulated real-world environment, you don't actually know how the application will work. For example, twice I have done consulting for performance problems where the exact statement I got walking in the door was, "The database worked great for the first few months, but the longer it runs the slower it gets."

The problem in both cases was that all testing was done against a 100MB database with only a few rows of data in each of the tables, rather than against a simulated production copy of the expected database size. It worked great on their "1GHz laptop with 768MB RAM", but it choked under a production workload on a 4-way 16GB RAM SQL Server because the database size was 30GB and there were no supporting index structures, so everything resulted in table scans.
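In cases like those two, the eventual fix is usually unglamorous: add the supporting index structures that testing at production volume would have shown were missing. A hedged sketch (table, column, and index names are invented for illustration):

```sql
-- A supporting nonclustered index turns a 30GB table scan into a seek.
-- The INCLUDE columns cover the query so no key lookups are needed.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalDue);
```

On a few-hundred-row test database this index makes no measurable difference, which is exactly why the problem never shows up until the data grows; only a production-sized testbed exposes it.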

August 4, 2009 8:36 AM
 

noeldr said:

I also have mixed feelings.

This is a "business decision" and for the most part is not a technical one.

I do understand that as professionals we want well designed and balanced systems but the bottom line always comes first! If you don't get a perfect system but it is good enough for the task by just adding tin you should probably do that if there is a good cost/benefit ratio.

August 4, 2009 9:21 AM
 

Denis Gobo said:

I hear you, and this is a common occurrence  :-)

I developed on machines that were faster than the production machine; I remember specifically one instance where the 'prod' SQL Server was a 700MHz PIII box with 1GB of RAM. I had at least 2GB of RAM and I believe a 2GHz CPU. The customer would restart this box every so often because there was a 'memory leak' in their code and SQL consumed all the memory...sigh....

On Stack Overflow someone had this gem:

I do not believe there is a problem with the create trigger statement itself. The create trigger statement was successful and quick in a test environment, and the trigger works correctly when rows are inserted/updated to the table. Although when I created the trigger on the test database there was no load on the table and it had considerably less rows, which is different than on the live/production database (100 vs. 13,000,000+).

Of course you can't do stuff like this...Your testbed has to have the same volume of data as on production in order to simulate normal usage.. See also here: http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/your-testbed-has-to-have-the-same-volume

August 4, 2009 9:37 AM
 

Wesley Brown said:

I always run into the "throw more hardware at it" solution. They always assume that just because it runs like crap on the current setup, if we double the RAM, CPU, and disks it will run twice as fast. If the code were fixed you could stay on a 3 to 5 year hardware cycle. Instead I see an arms race where hardware is purchased every few months to offset crappy code, with diminishing returns.

Sometimes it's bad queries, but most of the time I see bad schemas. They ran just fine at smaller database sizes but have no hope of scaling. You can do all the fun query tricks you want; if the schema is fundamentally broken, all the hardware in the world won't fix it.

But, to play devil's advocate...

I have seen situations where you know your database will never be bigger than XX gigabytes, it will never see more than XX transactions, and you have fixed all the low-hanging fruit as far as query performance; there, sizing the hardware to the database can be the cheaper alternative to a rewrite of the app and underlying schema.

Sometimes, you just don't have to scale, or you scale so slowly that hardware purchases make more sense than the capital outlay of a rewrite.

August 4, 2009 9:46 AM
 

Jack Corbett said:

>> If the redevelopment costs £20,000 and the tin costs £10,000 we have an answer - buy the tin.

Yeah, but how often are you going to have to upgrade the tin vs. how often do you need to re-write the code?

To me this is another reason to encourage stored procedure use. I can tune the code in a stored procedure without having to touch the compiled app code, vs. having to re-compile, release, and upgrade the app if I'm using embedded SQL.

August 4, 2009 10:00 AM
 

David Stein said:

I deal with this problem on a regular basis with one of my vendors.  They make healthy profits from selling customizations to certain customers as "enhancements" and then roll that code into future versions for general consumption.  

They finance a lot of their software development that way.  It's highly frustrating.

August 4, 2009 10:26 AM
 

BlackWasp said:

@Jonathan

>>If this is an internally developed application maybe...

Absolutely the authors of the software should improve it. However, as a user with a business that has to make decisions based at least partially on cost, getting the tin in _some_ cases is better. I think my point still stands. If the vendor is going to charge me £20,000 for the software upgrades and I get the same benefit from hardware upgrades for £10,000 then why would I sign the larger cheque? If it is a major application from a major vendor with lots of customers then it may be difficult to get the vendor to make the changes you want anyway. At the extreme, try getting Microsoft to make CRM quicker (please). Life is too short and I would spend the money on kit.

>> I think that this kind of thinking is part of the scalability problem.

I disagree. Running your software on low-end kit gives you a very good idea of the performance that your low-end users will get. Where I do agree with you is that you can't test with less data / load / users, just because you are on that slower platform. Nobody should release software that has not been thoroughly and appropriately load-tested. You misconstrue my meaning here.

I could turn the argument around. Let's say I test on the 4way 16GB RAM SQL Server and everything is fine with the 30GB of data that you cite. When my next customer complains that it doesn't run on the 1GHz laptop with 768MB RAM, what are you going to say to them? "Buy bigger hardware?".

Now, these are extremes and it is unlikely that the same software would be targeted at these same machines, but the point holds, I think.

August 4, 2009 12:33 PM
 

AaronBertrand said:

>> £20,000 for the software upgrades and I get the same benefit from hardware upgrades for £10,000 then why would I sign the larger cheque?

Because fixing the code is more long-term, whereas throwing more hardware at the problem is just a temporary solution serving only to delay the inevitable... the bad code will outgrow the better hardware at some point, too.

August 4, 2009 1:03 PM
 

Jonathan Kehayias said:

>> If the vendor is going to charge me £20,000 for the software upgrades and I get the same benefit from hardware upgrades for £10,000 then why would I sign the larger cheque? If it is a major application from a major vendor with lots of customers then it may be difficult to get the vendor to make the changes you want anyway.

I wouldn't pay a vendor to fix their broken code; I'd change vendors at that point, personally. This is actually my current recommendation: send an RFP out to the competition. You reach a point of throwing good money after bad, even in making a major hardware purchase. Let's say that you make the investment in the hardware and in 6 months you have performance problems again: do you scale up the hardware once again? I've seen the requirements change twice for this one application, each time doubling the size of the recommended configuration.

>> You misconstrue my meaning here.

That wasn't intentional, but you did catch on to the point I was trying to make.  

>> I could turn the argument around.

Yes, you certainly could. My argument wasn't about the hardware that you were actually using, but the assumption that it should run on any modern hardware. In most cases where small hardware is being used for development/testing, the testing itself doesn't meet real-world scenarios. If you are developing a single-user, local-only application, that is one thing, but if you are developing a multi-user web application with multi-tier services, you can't test a legitimate production load on the hardware you listed.

That being said, I think that a vendor should know their application well enough to make a proper configuration recommendation based on the environment being implemented. There shouldn't be a blanket supported configuration of 4 CPUs and 16GB RAM for all installations. If they had done proper testing, they should know the benchmarks: X users + Y database size requires Z hardware configuration. This kind of information would negate my entire argument here, because they could show benchmarks for where the server sizes would need to be in the future as well.

August 4, 2009 1:07 PM
 

Dandy Taylor said:

Having worked at an ISV before, I know that it's incredibly hard to get some of these things right.  There are a large number of variables at play.  That being said, just because it's hard doesn't mean you should just suggest a server with more power.  You're right, when you suggest a client upgrade their hardware, you're only prolonging the inevitable.  They will hit the wall again and they'll be back to you looking for answers.

In my opinion, too many ISVs have it wrong when they benchmark their systems. All too often they do that benchmarking in huge labs, starting with state-of-the-art servers and moving up from there, when in reality they should be starting with a less-than-ideal modern system. The reason: no matter what you test on, your customers will try to run your multi-tiered web app on a laptop. It seems absurd to us as developers, but I know we've all seen server rooms with old 486s running web servers and the like. You can try to ignore it, but your customers will not all have the best hardware, yet they will expect equal performance.

August 4, 2009 1:40 PM
 

Glenn Berry said:

Unless your hardware is truly pathetic, a software vendor should never recommend that you upgrade your hardware to "solve" a performance or scalability issue.

Outside of SANs, hardware has gotten so cheap that it is often much cheaper and certainly easier to throw hardware at a performance problem, rather than have a team of developers fix it with rearchitecture or refactoring.

I can certainly see why a business might decide to go that route for an internally developed application. I don't think it is acceptable for an ISV to ask their customers to throw hardware at a performance problem.

Of course, when we are lucky, we can find and fix something easy like a missing index, that makes your boss think you are a magician...

August 4, 2009 2:04 PM
 

noeldr said:

>>>I wouldn't pay a vendor to fix their broken code, I'd change vendors at that point personally.

Oh brother, how much do you charge Microsoft for each Service Pack or CU they release?

:-))

August 4, 2009 2:25 PM
 

Jonathan Kehayias said:

noeldr,

I think you misread change as charge, which would definitely affect the meaning of that sentence.  If I, or any other individual for that matter, had to pay Microsoft to fix a problem with the current version of their applications, Microsoft would sink.  People would, as I suggested, change vendors because that is bad business.  You might have a few customers dedicated to the platform that would have to accept that kind of solution and pay for it but that would be the minority.

August 4, 2009 2:48 PM
 

Jason said:

I had a customer that grew from a few MB of data to a couple of TB over a few years. It probably started out as a contact form and grew into a CRM. In that time, the code base grew to several GB of classic ASP with inline SQL. They tried to scale out with linked servers. You know where this is going...

Anyway, it is like a house of cards with all of the dependent processes and integration points from internal and external systems.

To rewrite following all best practices would probably cost several million in dev time. You can buy a loaded 8-way, 6-core box with 256/512GB of RAM for less than $100K.

Not much of a case you can make to the decision makers with that math.

August 4, 2009 3:30 PM
 

Jonathan Kehayias said:

Jason,

At some point the code base will have to be updated from what is more and more becoming legacy code. Your example, however, is an internal app and not a vendor application.

August 4, 2009 3:44 PM
 

BlackWasp said:

@Jonathan

I think in the end we probably agree that poor software needs attention (or replacement). Sometimes though the hardware provided by users is ridiculously below par.

I have had customers who specced ten users and then upgraded to many hundreds, each executing an order of magnitude more transactions, and then wondered why the software slowed and why we suggested hardware improvements. These tend to be the same customers that read the minimum hardware requirements for the OS, the DBMS, and the mail solution and stick rigorously to those requirements, even when they decide to install all three on the same box.

On the other hand, I have been in situations where the customer has specified a solution with anticipated usage scenarios that ended up being way off the mark. I remember one gambling web site I worked on where the customer specified that the solution must support 5,000 bets per hour, but they ended up getting closer to ten. It was interesting to see a small farm of servers running almost permanently at 0% utilisation.

Still, it's all a part of the game and what keeps life interesting, eh?

August 4, 2009 3:59 PM
 

Remus Rusanu said:

I fully agree that the application is the *best* place to invest your money. Most applications I see when I'm called to 'save the day' are dismal: every query is a table scan, every update is a table lock (well, not quite, but close).

Unfortunately the know-how to fix these is usually quite hard to find. In my experience the typical application developer simply has absolutely no idea of SQL behavior beyond the syntax, and that at the level of 'WHERE goes after FROM'. This is really sad, especially when you consider that reading a curriculum of about 3 books or fewer would make any developer worth their salt a more than decent database developer. And if your application uses databases, then 99% of your problems are going to come from the database.

With the advent of LINQ I see things being even worse, developers straight out *refuse* to learn the SQL syntax. Don't get me wrong, I really like LINQ and admire the way it was integrated into the language. But LINQ is not yet anywhere close to the point where you can ignore what locks are or how logging and recovery works.

So given that SQL talent is actually quite hard to come by, I understand why an expenditure of 5k-10k on hardware is actually sound for some shops. That being said, any shop that actually has access to someone like Jonathan or Aaron and still wants to throw money at the hardware first, now that I don't understand.

August 4, 2009 9:41 PM
 

Kristofer said:

I'll just reiterate the point others have made above about the cost of upgrading hardware vs. fixing the software. In my experience many shops have tens of "database ignorant" devs per developer who has a clue what goes on in the database. That, and the extra overhead of changing existing code without introducing regression bugs, usually makes it expensive to fix even simple mistakes. First someone needs to wake up and drag in the [sometimes only] db-aware developer, and then he or she needs to dig into [usually] someone else's code to fix the errors.

I am starting to lose track of how many times I have been asked to do performance analysis on production systems only to discover that system-wide performance problems are caused by a handful of queries containing newbie errors resulting in either table scans (#1) or excessive roundtrips (looped queries) (#2). Simple things like "select .. from some_large_table slt where slt.one_column + slt.another_column = 'whatever'", or join predicates with 'or' conditions, or someone running the same query over and over in a loop, or ...
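That first example, an expression over two columns in the predicate, forces a scan because no index matches the expression. One hedged way to fix it without touching the application (the table and column names come from the quoted query; the computed column and index names are invented) is to persist the expression and index it:

```sql
-- Persist the predicate's expression as a computed column...
ALTER TABLE dbo.some_large_table
    ADD combined_cols AS (one_column + another_column) PERSISTED;

-- ...and index it so the existing query can seek instead of scan.
CREATE NONCLUSTERED INDEX IX_some_large_table_combined
    ON dbo.some_large_table (combined_cols);

-- The original query shape can now use the index; with certain SET
-- options in effect, the optimizer can even match the raw expression
-- "one_column + another_column = 'whatever'" to the computed column.
SELECT *
FROM   dbo.some_large_table
WHERE  combined_cols = 'whatever';
```

Rewriting the query to test each column separately is usually better still, but the computed-column route is handy when the SQL is baked into a compiled app.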

So what is the solution? Developers writing db code/queries need to bother being more 'database aware' instead of having the 'oh, a DBA can always tune that later' attitude. Most db 'performance problems' I come across boil down to [fewer than 20] common mistakes. The few patterns that keep repeating themselves should be easy enough for any developer slightly interested in doing a good job to memorize...

Oh, and peer reviews, peer reviews, peer reviews of code. With peer reviews of all code, simple mistakes can easily be spotted early on and fixed before the dev who wrote the code has forgotten the how/why/where. Fixing that kind of stuff early on is often (by magnitudes) cheaper than 6/12/18/24 months down the road, when no one remembers the details surrounding the implementation of a piece of code...

August 4, 2009 11:35 PM
 

Adam Haines said:

I have been there... done this. The very first "solution" always seems to be bigger hardware. Over the past year, my company has been migrating all of our data to a new third-party application. We chose to do a phased implementation, where we migrate specific regions per phase. Anyway, to make a long story short, I noticed a trend: as we added more phases, the application became less and less performant. I reported this to both upper management and the vendor. I recommended we start looking at the code to make sure it scales with our user load and data. The vendor recommended that we beef up our hardware, and management agreed. I again told both management and the vendor that hardware would not solve the scalability problem. I knew hardware alone was not enough because this application has a slew of coding problems and inefficiencies. I will leave it at this: we have finished our phased implementation, and now the vendor is starting to optimize the code and has even asked us to help them identify problematic code.

The bottom line is hardware will only take you so far...

August 5, 2009 12:50 AM
 

Ian said:

One problem I have had working for many application vendors over the years is that one customer's “production level load” is very different from another customer's, as customers use the application in very different ways. This can mean that a table that normally only has a few hundred rows will have 50,000 rows for one customer, or that the mix of queries is very different between customers.

It is easy for the customer's DBA to say, “change this and it will go fast,” without any understanding of how it might affect other customers or the cost of testing any such change. Even if the code change were done the day the customer reported the problem, it could take over a year for a new release to get to the customer by the time the testing etc. has been done; new hardware can be put in place a lot quicker.

(The best way to get the vendor to make the application run fast for your data is to let the vendor's developers use a copy of your database for their testing. This will take a few years to have an effect, but developers do like to test with “real data” when it is not too hard to do so.)

Often the customer will pay lots of money for the software and then try to run it on an old server that is already running other applications; this is common enough that you have to assume it is the customer's IT people at fault until proven otherwise!

August 5, 2009 1:55 PM
