RAID 5 is a dirty word in the DBA community and beyond. There are websites devoted to trash RAID 5. I've seen DBAs declaring performance root cause found the very moment they found out that some database files were placed on RAID 5 volumes. You'd be ridiculed and run out of town if you dare to suggest putting the transaction log file on RAID 5. Is this knee jerk reaction to blaming RAID 5 for a database's performance failings really justified?
Why am I even asking this question? Isn't it already settled that RAID 5 is bad for writes? Well, only if life is that simple.
Below are two charts showing the performance of the same 8K sequential writes at various I/O queue depths applied to two RAID volumes presented to the same server: one is RAID 5 and the other is RAID 10. Both of them are presented from SANs. For all the data points in the charts, sqlio.exe--which is a free download from Microsoft--is configured with 32 threads to generate the I/O loads.
In the charts, the RAID 5 volume sustains much higher I/O throughput than does the RAID 10 volume, and it has significantly lower I/O latency than does the RAID 10 volume. Before you question the validity of the data, let me assure you that the performance difference illustrated in the charts are no fluke. The pattern is consistently reproducible.
How can RAID 5 outperform RAID 10 on 8K sequential writes?
Let's note that behind each of the two volumes is an I/O path consisting of many hardware/software components. Some of them are shared between the volumes, others are not. Many of the components have a significant impact on the I/O throughput and latency of the volume; they include HBAs and the software that manages the I/O paths on the host, SAN architecture, model of the SAN frames, cache on the SAN, RAID configuration, specifications of the physical disks inside the SAN frames, configuration of hardware hardware replication, number of spindles used by the volume, and so on.
So the RAID configuration is but one of many factors that influence the I/O performance of a volume. Differences in some of these other factors can result in a RAID 5 volume outperforming a RAID 10 volume on 8K sequential writes. The easiest example is that if the RAID 10 volume is enabled for synchronous replication in the SAN.
Okay, you can accuse me of comparing apples and oranges--i.e. it's not strictly RAID 5 vs. RAID 10 in the charts. Guilty as charged. But when you automatically declare the performance root cause found at the first sight of RAID 5, you have just committed the same fallacy of comparing apples and oranges.
I'm not disputing that RAID 5 is not as good as RAID 10 for writes with everything else being equal. However, the fact of life in a real enterprise SAN environment is that a RAID 5 volume and a RAID 10 volume almost never differ only in their RAID configuration. In such an environment, to be certain about the performance of a volume, you need to actually evaluate/test it. The information on the RAID configuration alone is far from being sufficient in making a recommendation.
If you have been asked to decide on which of these two volumes to place the transaction log file of a database, how would you proceed?