Over the last two years, I have stood up several proof-of-concept
(POC) database server systems with consumer grade SSD storage at cost $2-4K per TB.
Of course production servers are on enterprise class SSD, Fusion-IO and others, typically $25K+ per TB. (There are some special
situations where it is viable to deploy a pair of data warehouse servers with
non-enterprise SSD).
PCI-E SSDs - OCZ RevoDrive, RevoDrive X2, & RevoDrive 3 X2
The first POC system was a Dell T710 with 2 Xeon 5670
processors 96GB (12x8GB) memory, 16x10K SAS HDDs and 6 OCZ RevoDrive (original version)
PCI-E SSDs supporting table scans at nearly 3GB/s. The most difficult query repeatedly
hashed a large set of rows (as in there were multiple large intermediate result sets) generating
extremely heavy tempdb IO. With tempdb on 12 10K HDDs, the query time was 1
hour. With tempdb on the 6 OCZ RevoDrives, the query time was reduced to 20min.
Before SSDs became viable, I would normally have configured
a 2-socket system with 48 (2x24) 15K HDDs, with one RAID controller for each
24-disk enclosure. This setup costs about $11K per enclosure with 24x146GB 15K
SAS drive and can be expected to deliver 4GB/s sequential bandwidth, 10K IOPS at
low queue, low latency (200 IOPS per 15K disk) and in the range of 15-20K IOPS
at high queue, high latency. As it was my intent to deploy on SSD, I only
configured 16 HDDs in the internal disk bays and did not direct the purchase of external HDDs.
The 6 OCZ RevoDrive 120GB PCI-E SSDs in the POC system cost about
$400 each at the time (now $280?). I recall that the tempdb IO traffic was
something like 40K IOPS (64KB), around 2.5GB/s bandwidth. This was consistent with
the manufacturers specifications of 540MB/s read and 480MB/s write at 128K IO,
and considering that there will be some degradation in aggregating performance
over 6 devices. The IO latency was somewhere in the range of 40-60ms (note that the
SQL Server engine issues tempdb IO at high queue depth). OK, so the real purpose of
the POC exercise was to tell the SAN admin in no uncertain terms that the
350MB/s from his $200K iSCSI storage system (4x1GbE) was pathetic, and even the
700MB/s on 2x4Gbps FC ports does not cut mustard in DW.
The next set of systems was ordered with 4 OCZ RevoDrive X2,
160GB (<$500 each). There was some discussion on whether to employ the OCZ enterprise
class ZDrive R3, but this product was cancelled and the OCZ substitute, the
VeloDrive (4 SandForce 1565 controllers, rated for ~1GB/s), was not yet
available. I was expecting somewhat better performance for 4 RevoDrive X2 (4 SandForce
1222 controllers each, rated for 700MB/s) over 6 of the original RevoDrives (2 SandForce
controllers each). The tempdb IO intensive query
that took 20min with the 6 RevoDrives now ran in 15min with the 4 RevoDrive X2s.
In additional, IO latency was under 10ms.
I was hoping to test the new OCZ RevoDrive 3 X2 with 4
SandForce 2281 controllers, rated for 1500MB/s read and 1200MB/s write.
Unfortunately there is an incompatibility with the Dell T110 II with the E3-1240
(Sandy Bridge) processor which has a new UEFI replacing the BIOS. OCZ does not
provide server system support on their workstation/enthusiast products.
Hopefully Dell will eventually resolve this.
SATA SSDs - OCZ Vertex 2, Vertex 3 & Vertex 3 Max IOPS, Crucial C300 & m4
My preference is to employ PCI-E rather than SATA/SAS SSD
devices. This is mostly driven by the fact the disk enclosures reflect the IO
capability of HDDs, with 24 bays on 4 SAS lanes. An SSD oriented design should
have 4 SSDs on each x4 SAS port. Of course, 4 SSDs and 4-8 HDDs on each x4 SAS
port is also a good idea.
So I have also looked at SATA SSDs. Earlier this year, I
started with the OCZ Vertex 2 and Crucial C300 SSDs. After encountering the
issue with the RevoDrive 3 on the new Dell server, I acquired OCZ Vertex 3,
Vertex 3 Max IOPS, and Crucial m4 SATA SSDs. The OCZ Vertex 2 has a 3Gbps
interface, the Vertex 3 and both Crucial C300 and m4 all support 6Gbps SATA
interface.
The OCZ Vertex SSDs use SandForce controllers; the Vertex 2 uses
the previous generation SandForce 1222 and the Vertex 3 uses the current generation
2281 controller. The Crucial SSDs use Marvel controllers (both?). Perhaps the
significant difference between the OCZ Vertex and Crucial SSDs are that the
SandForce controllers implement compression. The OCZ Vertex SSDs have far
better write performance with compressible data, but is comparable for
incompressible data. It does appear that SQL Server tempdb IO is compressible and
benefits from the compression feature.
Another difference is that OCZ offers 60, 120, 240 and 480GB
capacities while Crucial offers 64, 128, 256 and 512GB capacities. All
capacities are in decimal, that is, 1GB = 10^9 bytes. Both OCZ 60GB and Crucial
64GB presumably have 64GB NAND flash, the 64GB being binary, meaning 1GB =
1024^3, or 7.37% more than 1GB decimal. Basically, OCZ has more over-provisioning
than the Crucial, which in theory should also contribute to better write
performance. (Earlier Vertex drives had 50 and 100GB capacities. But there are so
many varieties of the Vertex 2 that I cannot keep track.)
In brief, the performance difference between SSD
generations, both from the OCZ Vertex 2 to Vertex 3 and from the Crucial C300
to m4, is substantial, so I will focus mostly on the newer Vertex 3 and m4
drives. The performance observed in SQL Server operations seemed to be consistent
with manufacturer specifications for both generations of OCZ and Crucial SSDs. It
was not noted whether writing compressed SQL Server database tables were
further compressible by the SandForce controller. This may be because it is
difficult to achieve the high write performance necessary to stress modern SSDs
in SQL Server with transactional integrity features.
Test System - Dell T110 II, Xeon E3 quad-core Sandy Bridge processor
The test system is a Dell PowerEdge T110 II, Xeon E3-1240
3.30GHz quad-core processor (Sandy Bridge) with 16GB memory. This system has 2
PCI-E Gen2 x8 and 1 Gen 1 x4 slot. All SSDs were attached to a LSI MegaRAID SAS
8260 controller (PCI-E Gen2 x8, 8 6Gbps SAS ports). I did not some testing with
2 SSDs on the SATA ports (3Gbps) but did make detailed observations.
Incidentally, the cost of this system, processor, memory and 1 SATA HD was $1078? The 128GB SSDs were about $220 ($268 for the Max IOPS). So a very capable system with 2 SSDs could be built for $1300-1500 (64GB or 128GB SSDs). A better configuration with 2 SATA HDDs, 4 SSDs and SAS controller would push this to $2500. But if Dell and OCZ could resolve this RevoDrive 3 -UEFI issue, then I would recommend the T110 II, E3 processor, 16GB memory, 2 SATA HDDs and 1 RevoDrive 3 X2.
One unfortunate aspect of this system is that
the SATA ports are all 3Gbps per the Intel C202 PCH, even though the ever so
slightly more expensive C204 supports 6Gpbs SATA on 2 ports. Basically, this
has similar characteristics as the database laptop with super IO that I
proposed earlier, except that the laptop would be 2.5GHz to keep power
reasonable.
Performance tests with the TPC-H SF 100 database
With 8 SSDs (2 Vertex 3, 2 Vertex 3 MaxIOPS, 2 m4, and 2 C300)
I was able to generate 2.4GB/s in table scan aggregation query, possibly
gated by the 355MB/s rating of the C300s. A configuration consisting of the 4
Vertex 3 and 2 m4’s would have been gated by the 415MB/s rating of the m4. If
can get a total 8 Vertex 3s, which are rated at 550/500MB/s for compressible
and incompressible data, then I would either be limited by the adapter or the
PCI-E Gen2 x8 limit of 3.2GB/s. There is an LSI SAS8265 adapter with dual-cores
that has even higher IOPS capability, but it is not known whether this is
necessary for large block IO.
The tests consisted of running the TPC-H queries, single
stream (but not per official benchmark requirements).
The figure below show the time to run the the 22 queries (excluding statistics, parse and compile) for 2, 4 and 6 SSDs with no data compression (raw) and with page mode data compression.
Run time on 2 OCZ Vertex 3 (regular) SSDs was 815 sec with compression and 936 sec raw (w/o compression).
On 4 OCZ Vertex 3 SSDs (2 regular, 2 MaxIOPS)
total query times were reduced to 658 sec with compression and 622 sec raw.
On 6 SSDs, 4 OCZ and 2 Crucial m4, total query times are 633 sec with compression and 586 sec raw.
The figure below shows tempdb IO write latency for 2, 4 and 6 SSDs, with raw and compressed tables.
On the 2 OCZ SSDs, IO latency from fn virtual file stats (for the entire run)
averaged 30ms (temp write) and 90ms (data read) with compression
and 60ms (temp write) - 130ms (data read) without compression.
The performance with 2 Crucial m4 drives was less, showing much higher write latencies.
On 4 OCZ Vertex 3 SSDs, IO latency was 14 temp write and 30ms data read with compression
and 18-60ms without compression. The IO latencies on the Max IOPS models were lower than on
the regular Vertex 3 models.
For 6 SSDs, IO latencies are now down to the 15ms range, with somewhat higher
latency on the Crucial m4 SSDs.
With data and tempdb on 2 OCZ Vertex 3 SSDs, performance was decent but IO constrained. Performance was 15% better with data compression (page) than without, even though CPU was 23% higher.
Performance with 4 OCZ Vertex 3 SSDs (2 regular, 2 Max IOPS)
was 20% better for compression on and 34% better without compression, relative
to performance with 2 SSDs. The performance without compression was now 6% better than
with compression.
At 6 SSDs (4 Vertex 3, 2 m4), there was another 5%
performance improvement relative to 4 SSDs, for both compressed and not
compressed.
In the above tests, each SSD was kept as a standalone disk, i.e., I did not use RAID.
There was 1 data and 1 tempdb file on each SSD.
I noticed that the uncompressed (raw) database tended to generate 64K or 128K IO,
while the compressed database tended to have 256K IO.
Two queries, 17 and 19(?) generated 8KB IO,
and would have much better performance with data in memory.
There was also wide variation from query to query in whether performance was better with or without compression.