When you deploy a production cluster, you want any failover to be as transparent to your users as possible. It follows that the nodes within the cluster should be identical in terms of their hardware and software configurations. It further follows that the nodes are expected to perform the same or virtually the same.
In practice, performance parity among the nodes is often implicitly assumed, and this implicit assumption is rarely, if ever, checked. After all, if everything is the same among the nodes (other than the fact that they are different machines), why shouldn't you expect them to perform the same or similarly? And since these machines are made of literally millions of elements (hardware components, lines of software code, and program configurations), it is actually very difficult to verify whether two machines indeed perform the same, or similarly enough. Furthermore, in many cases when we compare different computers, we are not even sure what performance should be measured, or whether the performance we are measuring is relevant to what really matters at the database application level. Consequently, implicitly assuming identical or similar performance among the nodes may not be a lazy cop-out, but rather a practically cost-effective decision.
That is all fine until the performance difference among the nodes starts to hurt. And it turns out that identically spec’ed machines can perform differently more often than you may expect.
In fact, I have run into enough cases where the nodes had significantly different performance that I believe every node should go through a standard benchmarking test before the cluster is deployed into production. The benchmark is not expected to spot all the performance differences, but it can give us a higher level of confidence that when a SQL Server instance fails over from one node to another, the application performance will not degrade.
The question then is: what kind of benchmark should you run? Different people would offer different answers. None is necessarily wrong, nor is any necessarily the best in all scenarios.
Here is my own experience.
Since we are talking about SQL Server, it needs to be a SQL Server-specific benchmark. I have run into many cases where multiple generic hardware tests show no difference between two nodes, yet the application database behaves differently after a failover.
In addition, if we are going to run a benchmarking test regularly, it needs to be sufficiently simple to administer and must not require an excessive amount of time before we can determine whether the nodes do in fact perform the same. Ideally, we should be able to make that determination within an hour. Obviously, there is a tradeoff between simplicity and coverage, and it is not realistic to expect a benchmark to be both simple and able to uncover every difference that your application may be sensitive to.
So it makes sense to focus on common or likely scenarios. A common set of scenarios that I have run into is related to differences in memory modules: modules that are spec’ed identically, and therefore expected to perform the same, turn out to differ significantly in performance, causing database performance to differ across the nodes. In many of these scenarios, I tried straightforward memory benchmarks without much success. Note that these memory benchmarks typically perform repeated memory copies/writes, either single-threaded or multi-threaded. When I ran them, they reported no difference between the nodes. But the database applications suggested otherwise.
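For context, the "repeated memory copy" style of benchmark looks roughly like the sketch below. This is a toy single-process version in Python, written only to illustrate the shape of the test; the real tools of this kind run natively and often multi-threaded, and the function name and parameters here are my own invention, not from any particular tool.

```python
import time

def memory_copy_benchmark(buffer_mb=256, rounds=20):
    """Repeatedly copy one large buffer into another and report MB/s.

    This mimics the 'repeated memory copy' shape of generic memory
    benchmarks. Real benchmarking utilities do this in native code,
    often with one copy loop per thread; this sketch is single-threaded.
    """
    size = buffer_mb * 1024 * 1024
    src = bytearray(size)
    dst = bytearray(size)
    start = time.perf_counter()
    for _ in range(rounds):
        dst[:] = src  # bulk copy, performed in C inside CPython
    elapsed = time.perf_counter() - start
    return (buffer_mb * rounds) / elapsed  # MB copied per second

if __name__ == "__main__":
    print(f"{memory_copy_benchmark():.0f} MB/s")
```

A tool of this shape measures raw copy bandwidth, which is exactly why it can miss the module-interaction effects discussed below.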
Increasingly, I have found these simple memory performance benchmarks to be less than effective in predicting how database performance may vary with different memory modules. I think the following factors may be at play here:
- Servers now carry more memory modules, so the chance that some of them perform poorly, or interact poorly with the other modules, increases.
- Simple memory benchmarks may not exercise all the modules, or may have no option to configure how much memory to exercise.
- Simple memory benchmarks may not sufficiently exercise the interactions among the memory modules, making it very different from how SQL Server may exercise them.
My solution is to use an entirely cached database with a sufficient number of concurrent users randomly querying the cached data. This forces all the processors to be used ~100%, doing pure page lookups covering all the cached pages.
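To make the idea concrete, here is a toy in-process analogue of that workload. To be clear, this is not the actual benchmark: against a real cluster you would size a SQL Server database to fit entirely in the buffer pool and drive random-key SELECTs from many client connections. The function name and parameters below are mine, chosen only to illustrate the access pattern (concurrent workers randomly touching every cached "page").

```python
import random
import threading
import time

def random_lookup_benchmark(pages=1_000_000, workers=8, seconds=5.0):
    """Toy analogue of the cached-database benchmark: all 'pages' live
    in memory, and concurrent workers do random lookups over all of them.
    Returns total lookups per second across all workers.
    """
    data = [i * 7 for i in range(pages)]  # stand-in for cached data pages
    counts = [0] * workers
    stop = time.perf_counter() + seconds

    def worker(idx):
        rng = random.Random(idx)  # per-worker RNG, seeded for repeatability
        n = 0
        while time.perf_counter() < stop:
            _ = data[rng.randrange(pages)]  # one random page lookup
            n += 1
        counts[idx] = n

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts) / seconds

if __name__ == "__main__":
    print(f"{random_lookup_benchmark():,.0f} lookups/s")
```

Note that CPython's GIL keeps this sketch from actually saturating all processors; the real benchmark drives a SQL Server instance with genuinely concurrent connections, which is what pushes every CPU to ~100% doing page lookups.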
I have been using this approach for many years to benchmark server performance, and I have found it to be sufficiently effective. Note that this is not strictly a memory test; rather, it tests the entire host computer (CPU, memory, and memory controller/chipset). Most recently, I went through a series of test cycles with this benchmark, in both controlled and uncontrolled environments, to check the impact of memory modules on SQL Server performance, and I'm happy to report that the benchmark was able to detect differences that simple memory copy tests often failed to see.
One of the nodes in your cluster may already perform differently (due to differences in memory modules or any other component), and yet, when your database fails over between the nodes, it may appear to perform not only fine but exactly the same. The performance difference can stay invisible at the database level for a long time, only to show up when it really hurts, i.e. when the load reaches a certain capacity limit. It's far better to find any difference in node performance in advance, before it becomes visible at the database level.