In my two previous posts, on the performance impact of controller cache configurations on sequential disk I/O workloads and on large SQL workloads, the empirical data showed that allocating the controller cache to writes resulted in better performance, at least with the described workloads. Let me now turn to SQL Server OLTP workloads and check how the controller cache configurations may impact the transaction throughput of an archetypal OLTP workload: TPC-C in its standard transaction mix.
As you'll see below, the same observation still holds.
The best way to understand TPC-C is to read its specification at http://www.tpc.org/. Briefly, it models the data processing of a business that sells items to customers in multiple districts. The customers place orders for items, and the business maintains the items in stock in multiple warehouses. The database keeps track of these business activities, so you'll find a customer table, an order table, an item table, a stock table, and so on. And as in any real business, there are constraints among these tables, so you can't scale one table without also scaling some of the others. In fact, the number of rows in most tables, and thus the size of the database, is ultimately determined by the number of warehouses; for instance, each warehouse comes with 10 districts, 30,000 customers, and 100,000 stock rows. So, if you want to test a larger database, you specify a larger number of warehouses.
For my tests, I sized the test database to 100 warehouses, which resulted in about 9GB of allocated storage space. And to ensure that the disk I/O subsystem would be heavily exercised, I capped the SQL Server buffer pool at 2GB.
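In case you want to reproduce the setup, the buffer pool cap can be applied with sp_configure. This is a minimal sketch, not necessarily the exact commands I ran; note that 'max server memory' is specified in MB:

```sql
-- Cap the SQL Server buffer pool at 2GB so that the ~9GB database
-- cannot be fully cached and the workload must exercise the disks
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory', 2048;  -- value is in MB
RECONFIGURE;
```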
The data file was placed on the F drive and the log file on the G drive. F and G were two individual 72GB 10,000rpm SAS disk drives that shared the same HP Smart Array P400 controller with 512MB of cache.
The stored procedures executed against these tables model five typical business activities: (1) placing a new order, (2) making a payment, (3) checking the order status, (4) making a delivery (i.e. fulfilling an order), and (5) checking the stock level. Among these stored procedures, order status and stock level are read-only; the others are read/write. The transaction throughput metric is the number of new orders processed.
The standard transaction mix is as follows:
New Order: 45%
Payment: 43%
Order Status: 4%
Delivery: 4%
Stock Level: 4%
This simply means that, for instance, 45% of the time the client program submits the new order transaction (i.e. calls the new order stored procedure). Strictly speaking, what I ran can't be called TPC-C, because not all the stringent TPC-C rules were followed. For instance, to simplify the tests, I did not leave any think time or wait time between consecutive database calls. That works fine for this post, since I just wanted to apply an OLTP workload to the test system; I'm not here to publish any official TPC-C results.
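To make the mix concrete, here is a minimal sketch of how a driver might pick the next transaction. The stored procedure names (usp_NewOrder and so on) are hypothetical placeholders, not the names from my test harness:

```sql
-- Pick the next transaction according to the standard mix
DECLARE @r int = ABS(CHECKSUM(NEWID())) % 100;  -- uniform over 0..99
IF      @r < 45 EXEC usp_NewOrder;     -- 45%
ELSE IF @r < 88 EXEC usp_Payment;      -- 43%
ELSE IF @r < 92 EXEC usp_OrderStatus;  --  4%
ELSE IF @r < 96 EXEC usp_Delivery;     --  4%
ELSE            EXEC usp_StockLevel;   --  4%
```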
The test results are shown in the following chart:
Note that what matters here is the relative throughput among the three test scenarios, which differed only in how the controller cache was configured. The absolute throughput numbers are not meaningful on their own.
So, as suspected, giving all 512MB of the controller cache to reads led to significantly lower throughput than the scenarios where a large portion (384MB) or all (512MB) of the cache was given to writes.
How was the I/O subsystem exercised by this workload? Let me focus on the tests with 100 users. From a disk I/O perspective, the OLTP workload during the steady state can be characterized as follows:
On the F drive with the data file
I recorded the following perfmon counters:
* Avg. Disk Bytes/Read = ~8KB, and Avg. Disk Bytes/Write = ~8KB
* Avg. Disk sec/Read = ~275ms, and Avg. Disk sec/Write = ~320ms
So 8KB pages were being read randomly from the data file into the buffer pool, and the lazy writer was writing 8KB pages back out to the data file, also randomly. These I/O requests saturated the F drive. Note that during the 3-minute test duration, one checkpoint was issued manually.
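In case you haven't issued one yourself, a manual checkpoint is just the standard T-SQL command, run against the test database; I'm assuming the plain form without a duration argument here:

```sql
-- Flush all dirty pages of the current database to the data file
CHECKPOINT;
```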
On the G drive with the log file
I recorded the following perfmon counters:
* Avg. Disk Bytes/Write = ~4KB
* Avg. Disk sec/Write = ~28ms
So SQL Server was writing to the log in ~4KB blocks, and the G drive was pushed hard.
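If you'd rather not rely on perfmon alone, the average I/O sizes can be cross-checked from inside SQL Server. Here is a sketch using the sys.dm_io_virtual_file_stats DMV; its counters are cumulative since instance startup, so sample them before and after the run and take the difference:

```sql
-- Average I/O sizes per database file from the cumulative
-- per-file counters
SELECT  DB_NAME(vfs.database_id) AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.num_of_bytes_read    / NULLIF(vfs.num_of_reads, 0)  AS avg_read_bytes,
        vfs.num_of_writes,
        vfs.num_of_bytes_written / NULLIF(vfs.num_of_writes, 0) AS avg_write_bytes
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id;
```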
If we look at the wait stats obtained during the test run, not surprisingly the waits were heavily concentrated on the page I/O latches (the PAGEIOLATCH_* wait types).
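For completeness, here is one way to pull these wait stats; a minimal sketch against the sys.dm_os_wait_stats DMV, whose counters are also cumulative, so clear or snapshot them around the run:

```sql
-- Top waits by total wait time since the counters were last cleared;
-- on this workload PAGEIOLATCH_SH and PAGEIOLATCH_EX dominate
SELECT TOP 10 wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

-- Optionally reset the counters right before the test run
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
```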