Too many context switches per second are considered bad for your database performance. But how many is too many has never been clear. With the core count of new servers going up rapidly, it becomes even less clear how we should evaluate this counter to help understand the SQL Server behavior in the environments we support. Recognizing that any attempt to rehash what is already said/recommended out there will more likely be a disservice than a service, I’d like to look at it from a different angle, and hopefully contribute to its understanding with some data points.
Personally, I subscribe to the belief that one of the best ways to understand a behavior is to be able to create and manipulate the behavior on demand. And it naturally follows to ask: how can we drive up the value of the System\Context Switches/sec counter with a SQL Server workload?
Knowing how SQL Server schedules its tasks, I’d expect to be able to drive up context switches/sec by running a lot of very small tasks.
And that is indeed the case. Here is how it goes.
I first create two stored procedures that basically does nothing on the server side. These are just null transactions. (By the way, the parameters in these proc don’t mean anything. They are there because the client program I use expect them and I’m too lazy to modify the client program. Plus, it adds absolutely no value to modify the client code.)
CREATE PROC spStockLevel @w_id int, @d_id tinyint, @threshold smallint AS SET NOCOUNT ON SELECT 5 GO CREATE PROC spOrderStatus @w_id int, @d_id tinyint, @c_id int, @c_last char(16) = '' AS SET NOCOUNT ON SELECT 1, 'Jones', 'John', 'M', '2012-01-01', 2, 21, 2 GO |
Then, I simulate 200 concurrent users by starting 200 threads on a client and each thread calling these two procs within an infinite loop with no wait between the calls. The following chart shows the sustained values of the Context Switches/sec on a DL580 G7 with four E7-4870 processors when different number of cores are enabled. In all the cases, hyperthreading is enabled. And note that each E7-4870 has 10 cores.
With this approach, the value of the Context Switches/sec counter is driven into 200,000 to 250,000 per second range. These are pretty high numbers. I have no idea if they can be driven even higher with a different approach. But I know that I have not seen the counter approaching this level in any real production environment. If you have, let us know what kind values you have seen and with what kind of workload.
I should also report that this null transaction workload fails to push the total processor time very high. See the following chart:
The maximum %Processor Time (_Total) that can be reached by this workload ~24%. And this is not only the case with the different number of core count, but also is the case no matter how many concurrent users (threads) are submitting the transactions.
It is worth noting, and it is evident from the chart, that the %Privileged Time (_Total) accounts for a very large percentage of the %Processor Time (_Total). In a real production environment, this would spell trouble. With this null transaction workload, I don’t know whether this should be expected and is by design, or something is not behaving properly and the %Privileged Time should be much lower. But I do know that when the transactions are actually doing something useful (e.g. by including some non-trivial SELECT statements), we’ll see the %Privileged Time (_Total) value go down rather quickly. For instance, with the workload used in this previous post, the %Privileged Time (_Total) is typically around 1% while %Processor Time (_Total) is near 100%. And with that workload (which is doing a lot more useful work in its transactions), the Context Switches/sec counter is typically observed to be less than 49,000.
How useful are these data points? I’m not really sure. Hey, at least we know that this particular workload can drive up the Context Switches/sec counter. And if this starts a conversation, it would be a plus.