THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

The Rambling DBA: Jonathan Kehayias

The random ramblings and rantings of frazzled SQL Server DBA

Some thoughts on the Virtualization Feedback in the SSWUG Newsletters

Last Thursday, March 25, 2010, the topic of Virtualization of SQL Server came up in the SSWUG Newsletter, with Steven Wynkoop asking if peoples perceptions and experiences have changed since the last time he covered virtualizing SQL Server.  I unfortunately missed the last coverage of this topic, but it appears from the newsletter that there was a general consensus that “low-traffic solution could be fine, but if you had a heavy hitting application, the net advise was to avoid a virtual environment because the load just couldn't be answered in the same way that a dedicated server could provide.”  I’ve personally heard this information numerous times over the last four years, so its not surprising to see it being repeated in the SSWUG newsletter.

There are certainly SQL Server implementations where the current limitations of a virtual world would preclude virtualizing the environment, since there are limits on the number of vCPU’s (Hyper V allows 4, and VSphere 4 allows 8) and amount of memory (Hyper V allows up to 64GB, and VSphere 4 allows up to 255GB) each virtual machine can use.  If you have a SQL Server with dual duo core processors and 32GB of RAM that is consistently running forty percent CPU load, its not a good candidate for virtualization, and attempting to do so will result in less performance than a physical implementation.    This doesn’t describe the majority of servers that I have seen in my own work, or in doing consulting work, and I’d say that a majority of the SQL Servers in use by most businesses could be virtualized and still achieve acceptable, if not better performance than their current physical implementations.

I the last two weeks I’ve answered quite a few forums posts that dealt with SQL Server performance problems that turned out to be virtualized servers, I’ve been having discussions on Twitter with Denny Cherry (blog/twitter) and Buck Woody (blog/twitter) about how to best configure a new virtual cluster we are building at my job for SQL Server, and I’ve answered some questions about monitoring virtualized servers in VMware ESX for a couple of MVP’s that do consulting work and have had to look at virtualized SQL Servers.  I am by no means an expert at virtualization, but I have almost four years of experience working with virtualized production SQL Servers that hosted all sizes and types of SQL databases, and I’ve had great results.

I am not alone in this experience based on the feedback provided by Mike in the March 30th newsletter, but if you read through Mike’s response and compare it to every other piece of feedback provided, you should easily see that there is a fundamental difference in his implementation and the level of technical knowledge of the staff.  Many of the other comments provided in the other newsletters show a fundamental lack of technical knowledge by the respondents as well as their support staff, which is a big contributor to their real and/or perceived problems with virtualizing SQL Server. 

Busting Some Myths in the Feedback

While reading through the feedback I noticed that there was a good bit of misinformation being presented by people who were against virtualizing SQL Server.  Some of the items might have been true three or four years ago, but hypervisor and feature improvements in the virtual space have done a lot to allow virtual machines to perform at comparable levels to similarly sized VM’s.

You won’t get acceptable IO performance from a VM

A large number of the responses focused around problems with IO performance in a virtualized SQL Server, for example:

“if you focusing on the disk IO as the key factor in performance you’ll find that in all but the most heavily used systems that the virtual server option is viable.” – Steve 03/31/2010

“The problem is that we test the setup on a new iSCSI device, then end up with a variety of systems sharing the same “shelves” (sets of drives) and the result is slower than expected performance.” – Rick 03/26/2010

“The main problem will always be one of storage subsystem and I/O.  The biggest challenge with SQL virtualization, I believe, is how to optimize the storage back-end, especially with so many IT shops deciding to go with iSCSI NAS solution for their guest OS farm.” – George 03/26/2010

In each of these scenarios, the virtual environment hasn’t been given a fair opportunity to perform.  There seems to be a major disconnect in how people think about storage for a VM versus a physical implementation of SQL Server, and this drives a vast majority of the performance problems I find when people run SQL Server in a virtual server.  This problem is almost entirely rooted in over selling the cost savings associated with virtualization to the point that you cut your legs out from under you.  Would you try running a physical production SQL Server using a heavily shared NAS iSCSI device for database storage? 

If you did, you’d have the same IO performance problems in the physical server that you would in the virtual server.  The problem here is that you are using shared disks, and you have competing IO demands, something that is well documented as a performance bottleneck for SQL Server.  In addition to this, a lot of NAS iSCSI devices I see these days have large drives 300GB up to 1TB in size in a RAID 5 configuration, so a NAS with 6 drives yields roughly 5TB of storage which is great capacity wise, but really slow performance wise.  The problem here has absolutely nothing to do with whether or not the SQL Server is virtual, it has to do entirely with the configuration of IO subsystem backing the server.

If the IO subsystem is properly configured good IO performance can be achieved from a virtual machine running SQL Server.  The following charts show the results from SQLIO on a 2 vCPU 4GB RAM, virtual machine running in VSphere 4, using VMDK’s on heavily shared LUN’s from a fiber attached SAN. The tests used a 48GB test file and ran 64K Random and Sequential Read and Write tests using 4, 16, and 64 threads with 4, 8, and 8 pending respectively, based on a short test set provided to me by Brent Ozar (blog/twitter) for benchmarking a server before and after a reconfiguration of the disk arrays in a minimal downtime to show improvements.

image

image

This is hardly the best configuration for SQL Server, but the limit for IO load is just at the limits of our 4Gb/s SAN fabric.  A better configuration would use raw device mapping for the data and log LUN’s and have MPIO properly configured for the HBA’s, a configuration I’ll be testing in the near future, but you can certainly get good IO performance out of a virtualized SQL Server when the hardware has been sized properly for IO in general.

You can’t adequately troubleshoot performance problems in a VM

This comment is very common from people that don’t have much experience working with VM’s. 

“Again, since it’s hard to tell what’s using the I/O, it’s hard to give them metrics as to why/how this isn’t performing well.” – Rick 03/26/2010

“And then has to prove the performance problem is with the server/storage area, instead of the application, yet doesn't have access to see anything about the vmWare/storage side that can really help to troubleshoot the problem?  And it sounds like he has no way, nor do the VM admins, to monitor the I/O in an application-oriented way?” -

Both Hyper V and VMware provide the ability to perform detailed analysis of performance at the host level.  In VMware has esxtop for performing analysis from the host level and monitoring the resources being used by the guests, and Hyper-V can be monitored through perfmon.  However, you don’t really need to look at the host to know that a problem or bottleneck exists, especially with SQL Server.  The guest OS still has performance counters and SQL Server has wait stats and file stats that can tell you where your bottlenecks are.  If it takes 25ms to read from disk it takes 25ms to read from disk.  If you have high accumulated wait times for PAGEIOLATCH_*, WRITELOG, ASYNC_IO_COMPLETION, or IO_COMPLETION waits, associated with high latency numbers for the Windows Performance Counters for Physical Disk in the VM, you can bet that you have an disk related bottleneck and at that point you need to jump out to the host to identify the what/why with the host level tools.

My own thoughts

It is very clear that there is a significant lack of understanding by the community and apparently some of its leaders how virtualization can or should be done with SQL or any other server for that matter.  The biggest problem I see, and it has occurred in my own jobs, is oversubscription of hardware, resulting in bottlenecks.  The other prevailing problem is the fact that SQL Server DBA’s aren’t benchmarking accurately or troubleshooting problems correctly leading them to incorrect decisions that make absolutely no sense.  Take for example the comments by Leanne in today’s newsletter:

"I just got done supporting a client that we started off with SQL Server 2008 on a Virtual Server 2008.  We were three months before going into production with it when it started getting heavy hitting from testing.  Over about 1 to 2 days of use, the memory use would climb slowly up to 100% and SQL would choke.  We would be forced to reboot the server to clean things up and then it would start all over again.  We tried everything we could think of with how SQL Server was using memory but nothing helped.  The sys admins tried everything they could think.  They finally told us that from what they could see, for some reason, the virtual was not recognizing the memory getting freed up by processes so slowly it chocked on itself.
Two weeks before we went live we were given a real Server and we had no problems with memory use from that point on.  Given my experience, I would be hesitant to recommend to anyone to use SQL Server 2008 on a virtual server without extensive testing in this vein."

To be honest, some more information is needed to be certain, but the problem described has nothing to do with being a VM, and is rooted in SQL Server being misconfigured for memory usage.  This is one of the most common problems people ask about on the forums, and it has nothing to do with a memory leak it is how SQL Server uses memory, and will be the same on the physical server if they had an identical setup. (Troubleshooting the SQL Server Memory Leak)

Another reader, Dan, comment in today’s newsletter:

“I can check waitstats in the DMV if it’s a recent version of SQL Server (but we still have a lot of SQL 2000 here).  But then what?”

You can check waitstats in SQL 2000 as well, DBCC SQLPERF(waitstats) will return the accumulated waits for the SQL Server.  Why didn’t Steven point this out and provide a service to the community instead of posting comments that are riddled with incorrect information and prone to lead others down the same incorrect thought process about virtualizing SQL Server?  I don’t know, I’ll leave that to Steven to address because I am sure he reads through the SQLBlog content based on other posts on here.

The complete list on newsletters covering this topic can be found at

Friday, March 26, 2010
Monday, March 29, 2010
Tuesday, March 30, 2010
Wednesday, March 31, 2010
Thursday, April 2, 2010

This is another good example of where you should be careful what you trust from the internet, even when it is purportedly coming from a trusted source in the community such as SSWUG, because the information contained in the feedback hasn’t been vetted for any kind of technical accuracy and is full of incorrect information.  I’d point out that Microsoft has migrated a majority of their own SQL Server environment to VM’s and even provides a whitepaper that documents how they did this:

http://technet.microsoft.com/en-us/library/dd557540.aspx

Published Thursday, April 01, 2010 8:49 AM by Jonathan Kehayias

Comments

 

MeltonDBA said:

Good post!!!!  

April 2, 2010 2:04 PM
 

Stephen Wynkoop said:

Great article and excellent feedback.

I do want to point out that what we're publishing in the newsletter is a discussion of the poster's experiences based in their reality.  I do get a lot of feedback where different experiences match or don't match a specific feedback item, but we pass along what makes sense to help further the conversation.  

We aren't in a position to confirm one's experiences that they take the time to write in and participate with - we can publish them and talk about them and let others chime in with pro and con feedback, and that's specifically what I try to do - further the conversation/discussion.

It's not a statement of ultimate fact, but it *is* a statement of the poster's experience, findings, and opinion.  I think these are incredibly valuable to the community. Even in cases where they're "wrong" I get people writing back saying that they've had the same experience about "X" and were glad to hear a solution, or glad to hear others hit the same issues, or glad to hear about options...

It's a discussion about reality.  I think it's really important to understand that and take experiences from it, experiences that are based on their own real use of the product or solutions we cover.  Sometimes it's great, sometimes not so much, most times somewhere in the middle with great lessons learned.

Hope this helps.

April 2, 2010 2:14 PM
 

Jonathan Kehayias said:

Steven,

I get that the comments are being quoted as they come in, what I don't get is why there isn't any further discussion from SSWUG about the legitimacy behind some of the comments when they are completely inaccurate.  What we perceive and what is actually correct for the product can be very different, for example SQL Server 2005 on a x64 server with 16GB RAM will use all the memory on the server by default, and people perceive that as a memory leak or bug/problem with SQL, but that perception is wrong, they just have no idea how SQL Server works so they make incorrect assumptions.  

I think that providing feedback in this manner drives incorrect perceptions.  Maybe its a lucky coincidence that I was already working on a couple of blog posts on this topic but I will certainly make sure to finish and publish them in the next week to show that consolidation through virtualization is very viable, and can handle sustained workloads without performance problems.  

A number of the problems people listed would probably very easy to pinpoint to oversubscription of the hardware, either at the VM Host or disk subsystem.  Everyone wants to do it as cheap as possible, and that is a recipe for disaster in any scenario.  Why you would expect performance from a shared disk configuration that doesn't meet the minimum requirements for SQL Server alone, is beyond me, but many of the comments in your feedback have SQL on shared iSCSI NAS/SAN devices that don't have multipathing from the information provided, and then when it doesn't perform with 8 VM's configured on it, the problem is SQL can't be virtualized?

April 2, 2010 2:36 PM
 

James Luetkehoelter said:

Great post Jonathan!

April 2, 2010 2:50 PM
 

Stephen Wynkoop said:

Your comments and questions point out exactly why I don't look to solve problems for all of these folks, but rather facilitate discussion.  Questions and variables that cannot be defined effectively in a single email would lead to posts that are far more than a discussion of experiences.  

I did indeed point out that there was a big difference between "doing it right" with all the right hardware and such and doing it with what you've got on hand.  But I can't (effectively) list every variable or dispute every situation with possible solutions in every case... in the context of an editorial.  Behind the scenes, I do write back to many folks every date with ideas, approaches and things I've seen solve problems.  But, in the context the editorial discussion, it's important to be able to present just that; a discussion.  

If there experiences lead to further discussions, be it on SSWUG or here on your articles or ... whatever, then I've done my job and people will learn from one-another.  

I can't change their experiences - it's reality.  I'm just talking about those experiences in a way that others can participate too and learn from them, good or bad.

Hope that helps explain the approach and intent.

April 2, 2010 3:21 PM
 

Rich said:

Jonathon is totally right about how a trusted source should handle user supplied 'experiences'. My own experience is a case in point. I was sent this newsletter and since our techs are getting closer to virtualize our sql server 2008 machine I have been reading anything I can find on it. Based on my readings I have been hesitant. When SSWUG sent the newsletter it was all I needed to go to my boss and tell him about my concerns and that we really needed to be sure this thing is done right. Had Stephen provided some feedback on the user's experiences, with corrections or advice on resolution i would have simply added the info to my list and no red flare would have been sent up to the director of my department. It's embarrassing to me. I want to be fair to both software and hardware when I raise my concerns and SSWUG did those in my position no favors by providing me with a series of impressions by non-expert users with incorrect assessments about virtualization. I will definitely think thrice about the accuracy from ANY internet source in the future.

April 6, 2010 1:43 PM
 

Jonathan Kehayias said:

Rich,

Thank you for the comment, this was unfortunately the exact kind of thing that I was hoping to get back from this post.  If you'd like to discuss the merits of virtualizing SQL please feel free to send me a message through the contact form on this blog.  I have extensive experience with running and troubleshooting SQL virtualized in VMware and I'd be happy to provide any information about troubleshooting/benchmarking the environment that I can.  I will be posting a couple of blog posts here shortly on the topic that should help you with making the determination of whether or not to virtualize based on requirements.

April 7, 2010 12:57 AM
 

The Rambling DBA: Jonathan Kehayias said:

At the beginning of the month I blogged about my thoughts on the virtualization feedback provided by

April 13, 2010 10:29 PM
 

Wes Brown said:

Excellent post!

I test our VM'ed SQL Servers just like I do our physical ones. The ones we can VM we do period. The big things on VM that kill SQL Server are the same things that kill it in any shared environment. If you over provision memory, CPU and IO it won't perform. Microsoft and VMWare both have recommended guidelines for virtualizing applications like databases. What I find is people start out there but then quickly overload their VM servers and then blame VM for not performing well.  

Again, it isn't what you don't know, it's what you do know that ain't so.

-wes

October 8, 2010 10:53 AM
Anonymous comments are disabled

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement