Last Thursday, March 25, 2010, the topic of Virtualization of SQL Server came up in the SSWUG Newsletter, with Steven Wynkoop asking if peoples perceptions and experiences have changed since the last time he covered virtualizing SQL Server. I unfortunately missed the last coverage of this topic, but it appears from the newsletter that there was a general consensus that “low-traffic solution could be fine, but if you had a heavy hitting application, the net advise was to avoid a virtual environment because the load just couldn't be answered in the same way that a dedicated server could provide.” I’ve personally heard this information numerous times over the last four years, so its not surprising to see it being repeated in the SSWUG newsletter.
There are certainly SQL Server implementations where the current limitations of a virtual world would preclude virtualizing the environment, since there are limits on the number of vCPU’s (Hyper V allows 4, and VSphere 4 allows 8) and amount of memory (Hyper V allows up to 64GB, and VSphere 4 allows up to 255GB) each virtual machine can use. If you have a SQL Server with dual duo core processors and 32GB of RAM that is consistently running forty percent CPU load, its not a good candidate for virtualization, and attempting to do so will result in less performance than a physical implementation. This doesn’t describe the majority of servers that I have seen in my own work, or in doing consulting work, and I’d say that a majority of the SQL Servers in use by most businesses could be virtualized and still achieve acceptable, if not better performance than their current physical implementations.
I the last two weeks I’ve answered quite a few forums posts that dealt with SQL Server performance problems that turned out to be virtualized servers, I’ve been having discussions on Twitter with Denny Cherry (blog/twitter) and Buck Woody (blog/twitter) about how to best configure a new virtual cluster we are building at my job for SQL Server, and I’ve answered some questions about monitoring virtualized servers in VMware ESX for a couple of MVP’s that do consulting work and have had to look at virtualized SQL Servers. I am by no means an expert at virtualization, but I have almost four years of experience working with virtualized production SQL Servers that hosted all sizes and types of SQL databases, and I’ve had great results.
I am not alone in this experience based on the feedback provided by Mike in the March 30th newsletter, but if you read through Mike’s response and compare it to every other piece of feedback provided, you should easily see that there is a fundamental difference in his implementation and the level of technical knowledge of the staff. Many of the other comments provided in the other newsletters show a fundamental lack of technical knowledge by the respondents as well as their support staff, which is a big contributor to their real and/or perceived problems with virtualizing SQL Server.
Busting Some Myths in the Feedback
While reading through the feedback I noticed that there was a good bit of misinformation being presented by people who were against virtualizing SQL Server. Some of the items might have been true three or four years ago, but hypervisor and feature improvements in the virtual space have done a lot to allow virtual machines to perform at comparable levels to similarly sized VM’s.
You won’t get acceptable IO performance from a VM
A large number of the responses focused around problems with IO performance in a virtualized SQL Server, for example:
“if you focusing on the disk IO as the key factor in performance you’ll find that in all but the most heavily used systems that the virtual server option is viable.” – Steve 03/31/2010
“The problem is that we test the setup on a new iSCSI device, then end up with a variety of systems sharing the same “shelves” (sets of drives) and the result is slower than expected performance.” – Rick 03/26/2010
“The main problem will always be one of storage subsystem and I/O. The biggest challenge with SQL virtualization, I believe, is how to optimize the storage back-end, especially with so many IT shops deciding to go with iSCSI NAS solution for their guest OS farm.” – George 03/26/2010
In each of these scenarios, the virtual environment hasn’t been given a fair opportunity to perform. There seems to be a major disconnect in how people think about storage for a VM versus a physical implementation of SQL Server, and this drives a vast majority of the performance problems I find when people run SQL Server in a virtual server. This problem is almost entirely rooted in over selling the cost savings associated with virtualization to the point that you cut your legs out from under you. Would you try running a physical production SQL Server using a heavily shared NAS iSCSI device for database storage?
If you did, you’d have the same IO performance problems in the physical server that you would in the virtual server. The problem here is that you are using shared disks, and you have competing IO demands, something that is well documented as a performance bottleneck for SQL Server. In addition to this, a lot of NAS iSCSI devices I see these days have large drives 300GB up to 1TB in size in a RAID 5 configuration, so a NAS with 6 drives yields roughly 5TB of storage which is great capacity wise, but really slow performance wise. The problem here has absolutely nothing to do with whether or not the SQL Server is virtual, it has to do entirely with the configuration of IO subsystem backing the server.
If the IO subsystem is properly configured good IO performance can be achieved from a virtual machine running SQL Server. The following charts show the results from SQLIO on a 2 vCPU 4GB RAM, virtual machine running in VSphere 4, using VMDK’s on heavily shared LUN’s from a fiber attached SAN. The tests used a 48GB test file and ran 64K Random and Sequential Read and Write tests using 4, 16, and 64 threads with 4, 8, and 8 pending respectively, based on a short test set provided to me by Brent Ozar (blog/twitter) for benchmarking a server before and after a reconfiguration of the disk arrays in a minimal downtime to show improvements.
This is hardly the best configuration for SQL Server, but the limit for IO load is just at the limits of our 4Gb/s SAN fabric. A better configuration would use raw device mapping for the data and log LUN’s and have MPIO properly configured for the HBA’s, a configuration I’ll be testing in the near future, but you can certainly get good IO performance out of a virtualized SQL Server when the hardware has been sized properly for IO in general.
You can’t adequately troubleshoot performance problems in a VM
This comment is very common from people that don’t have much experience working with VM’s.
“Again, since it’s hard to tell what’s using the I/O, it’s hard to give them metrics as to why/how this isn’t performing well.” – Rick 03/26/2010
“And then has to prove the performance problem is with the server/storage area, instead of the application, yet doesn't have access to see anything about the vmWare/storage side that can really help to troubleshoot the problem? And it sounds like he has no way, nor do the VM admins, to monitor the I/O in an application-oriented way?” -
Both Hyper V and VMware provide the ability to perform detailed analysis of performance at the host level. In VMware has esxtop for performing analysis from the host level and monitoring the resources being used by the guests, and Hyper-V can be monitored through perfmon. However, you don’t really need to look at the host to know that a problem or bottleneck exists, especially with SQL Server. The guest OS still has performance counters and SQL Server has wait stats and file stats that can tell you where your bottlenecks are. If it takes 25ms to read from disk it takes 25ms to read from disk. If you have high accumulated wait times for PAGEIOLATCH_*, WRITELOG, ASYNC_IO_COMPLETION, or IO_COMPLETION waits, associated with high latency numbers for the Windows Performance Counters for Physical Disk in the VM, you can bet that you have an disk related bottleneck and at that point you need to jump out to the host to identify the what/why with the host level tools.
My own thoughts
It is very clear that there is a significant lack of understanding by the community and apparently some of its leaders how virtualization can or should be done with SQL or any other server for that matter. The biggest problem I see, and it has occurred in my own jobs, is oversubscription of hardware, resulting in bottlenecks. The other prevailing problem is the fact that SQL Server DBA’s aren’t benchmarking accurately or troubleshooting problems correctly leading them to incorrect decisions that make absolutely no sense. Take for example the comments by Leanne in today’s newsletter:
"I just got done supporting a client that we started off with SQL Server 2008 on a Virtual Server 2008. We were three months before going into production with it when it started getting heavy hitting from testing. Over about 1 to 2 days of use, the memory use would climb slowly up to 100% and SQL would choke. We would be forced to reboot the server to clean things up and then it would start all over again. We tried everything we could think of with how SQL Server was using memory but nothing helped. The sys admins tried everything they could think. They finally told us that from what they could see, for some reason, the virtual was not recognizing the memory getting freed up by processes so slowly it chocked on itself.
Two weeks before we went live we were given a real Server and we had no problems with memory use from that point on. Given my experience, I would be hesitant to recommend to anyone to use SQL Server 2008 on a virtual server without extensive testing in this vein."
To be honest, some more information is needed to be certain, but the problem described has nothing to do with being a VM, and is rooted in SQL Server being misconfigured for memory usage. This is one of the most common problems people ask about on the forums, and it has nothing to do with a memory leak it is how SQL Server uses memory, and will be the same on the physical server if they had an identical setup. (Troubleshooting the SQL Server Memory Leak)
Another reader, Dan, comment in today’s newsletter:
“I can check waitstats in the DMV if it’s a recent version of SQL Server (but we still have a lot of SQL 2000 here). But then what?”
You can check waitstats in SQL 2000 as well, DBCC SQLPERF(waitstats) will return the accumulated waits for the SQL Server. Why didn’t Steven point this out and provide a service to the community instead of posting comments that are riddled with incorrect information and prone to lead others down the same incorrect thought process about virtualizing SQL Server? I don’t know, I’ll leave that to Steven to address because I am sure he reads through the SQLBlog content based on other posts on here.
The complete list on newsletters covering this topic can be found at
Friday, March 26, 2010
Monday, March 29, 2010
Tuesday, March 30, 2010
Wednesday, March 31, 2010
Thursday, April 2, 2010
This is another good example of where you should be careful what you trust from the internet, even when it is purportedly coming from a trusted source in the community such as SSWUG, because the information contained in the feedback hasn’t been vetted for any kind of technical accuracy and is full of incorrect information. I’d point out that Microsoft has migrated a majority of their own SQL Server environment to VM’s and even provides a whitepaper that documents how they did this:
http://technet.microsoft.com/en-us/library/dd557540.aspx