While going through the
Flash Management Summit 2012 slide decks, I came across the session
Flash Implications in Enterprise Storage Designs
by Denis Vilfort of EMC, that provided information on performance of the CLARiiON, VNX, a VNX2 and VNX Future.
A common problem with SAN vendors is that it is almost impossible to find meaningful performance
information on their storage systems.
The typical practice is to cited some meaningless numbers like IOPS to cache or the combined IO bandwidth of the FC ports, conveying the impression of massive IO bandwidth, while actually guaranteeing nothing.
The original VNX was introduced in early 2011?
The use of the new Intel Xeon 5600 (Westmere-EP) processors was progressive.
The decision to employ only a single socket was not.
EMC did provide the table below on their VNX mid-range systems in the document "VNX: Storage Technology High Bandwidth Application" (h8929) showing the maximum number of front-end FC and back-end SAS channels along with the IO bandwidths for several categories.
It is actually unusual for a SAN storage vendor to provide such information, so good for EMC.
Unfortunately, there is no detailed explanation of the IO patterns for each category.
Now obviously the maximum IO bandwidth can be reached in the maximum configuration,
that is with all IO channels and all drive bays populated.
There is also no question that maximum IO bandwidth requires all back-end IO ports populated
and a sufficient number of front-end ports populated.
(The VNX systems may support more front-end ports than necessary for configuration flexibility?)
However, it should not be necessary to employ the full set of hard disks to reach maximum IO bandwidth. This is because SAN systems are designed for capacity and IOPS.
There are Microsoft Fast Track Data Warehouse version 3.0 and 4.0 documents for the EMC VNX
5300 or 5500 system. Unfortunately Microsoft has backed away from bare table scan tests of disk rates in favor of a composite metric. But it does seem to indicate that 30-50MB/s per disk is possible in the VNX.
What is needed is a document specifying the configuration strategy for high bandwidth specific to SQL Server. This includes the number and type of front-end ports, the number of back-end SAS buses, the number of disk array enclosures (DAE) on each SAS bus, the number of disks in each RAID group and other details for each significant VNX model.
It is also necessary to configure the SQL Server database file layout to match the storage system structure, but that should be our responsibility as DBA.
It is of interest to note that the VNX FTDW reference architectures do not employ
Fast Cache (flash caching) and (auto) tiered-storage.
Both of these are an outright waste of money on DW systems and actually impedes performance.
It does make good sense to employ a mix of 10K/15K HDD and SSD in the DW storage system,
but we should use the SQL Server storage engine features (filegroups and partitioning)
to place data accordingly.
A properly configured OLTP system should also employ separate HDD and SSD volumes, again using of filegroups and partitioning to place data correctly. The reason is that the database engine itself is a giant data cache, with perhaps as much as 1000GB of memory.
What do we really expect to be in the 16-48GB SAN cache that is not in the 1TB database buffer cache? The IO from the database server is likely to be very misleading in terms of what data is important and whether it should be on SSD or HDD.
CLARiiON, VNX, VNX2, VNX Future Performance
Below are performance characteristics of EMC mid-range for CLARiiON, VNX, VNX2 and VNX Future.
This is why I found the following diagrams highly interesting and noteworthy.
Here, the CLARiiON bandwidth is cited as 3GB/s and the current VNX as 12GB/s
(versus 10GB/s in the table above).
I am puzzled that the VNX is only rated at 200K IOPS. That would correspond to 200 IOPS per disk and 1000 15K HDDs at low queue depth. I would expect there to be some capability to support short-stroke and high-queue depth to achieve greater than 200 IOPS per 15K disk.
The CLARiiON CX4-960 supported 960 HDD. Yet the IOPS cited corresponds to the queue depth 1 performance of 200 IOPS x 200 HDD = 40K. Was there some internal issue in the CLARiiON.
I do recall a CX3-40 generating 30K IOPS over 180 x 15K HDD.
A modern SAS controller can support 80K IOPS, so the VNX 7500 with 8 back-end SAS buses
should handle more than 200K IOPS (HDD or SSD), perhaps as high as 640K? So is there some limitation in the VNX storage processor (SP), perhaps the inter-SP communication? or a limitation of write-cache which requires write to memory in both SP?
Below (I suppose) is the architecture of the new VNX2. (Perhaps VNX2 will come out in May with EMC World?) In addition to transitioning from Intel Xeon 5600 (Westmere) to E5-2600 series (Sandy Bridge EP), the diagram indicates that the new VNX2 will be dual-processor (socket) instead of single socket on the entire line of the original VNX. Considering that the 5500 and up are not entry systems, this was disappointing.
VNX2 provides 5X increase in IOPS to 1M and 2.3X in IO bandwidth to 28GB/s. LSI mentions a FastPath option that dramatically increases IOPS capability of their RAID controllers from 80K to 140-150K IOPS. My understanding is that this is done by completely disabling the cache on the RAID controller. The resources to implement caching for large array of HDDs can actually impede IOPS performance, hence caching is even more degrading on an array of SSDs.
The bandwidth objective is also interesting. The 12GB/s IO bandwidth of the original VNX would require 15-16 FC ports at 8Gbps (700-800MBps per port) on the front-end. The VNX 7500 has a maximum of 32 FC ports, implying 8 quad-port FC HBAs, 4 per SP.
The 8 back-end SAS busses implies 4 dual-port SAS HBAs per SP? as each SAS bus requires 1 SAS port to each SP? This implies 8 HBAs per SP? Intel Xeon 5600 processor connects over QPI to a 5220 IOH with 32 PCI-E gen 2 lanes, supporting 4 x8 and 1x4 slots, plus a 1x4 Gen1 for other functions.
In addition, a link is needed for inter-SP communication. If one x8 PCI-E gen2 slot is used for this, then write bandwidth would be limited to 3.2GB/s (per SP?).
A single socket should only be able to drive 1 IOH even though it is possible to connect 2.
Perhaps the VNX 7500 is dual-socket?
An increase to 28GB/s could require 40 x8Gbps FC ports (if 700MB/s is the practical limit of 1 port). A 2-socket Xeon E5-2600 should be able to handle this easily, with 4 memory channels and
5 x8 PCI-E gen3 slots per socket.
The future VNX is cited as 5M IOPS and 112GB/s.
I assume this might involve the new NVM-express driver architecture
supporting distributed queues and high parallelism.
Perhaps both VNX2 and VNX Future are described is that the basic platform is ready
but not all the components to support the full bandwidth?
The 5M IOPS should be no problem with an array of SSDs, and the new NVM express architecture of course. But the 112GB/s bandwidth is curious. The number of FC ports, even at a future 16Gbit/s is too large to be practical. When the expensive storage systems will finally be able to do serious IO bandwidth, it will also be time to ditch FC and FCOE. Perhaps the VNX Future will support infini-band?
The puprose of having extreme IO bandwidth capability is to be able to deliver all of it to a single database server on demand, not a little dribblet here and there. If not, then the database server should have its own storage system.
The bandwidth is also too high for even a dual-socket E5-2600. Each Xeon E5-2600 has 40 PCI-E gen3 lanes, enough for 5 x8 slots. The nominal bandwidth per PCIe G3 lane is 1GB/s, but the realizable bandwidth might be only 800MB/s per lane, or 6.4GB/s. A socket system in theory could drive 64GB/s. The storage system is comprised of 2 SP, each SP being a 2-socket E5-2600 system.
To support 112GB/s each SP must be able to simultaneously move 56GB/s on storage and 56GB/s on the host-side ports for a total of 112GB/s per SP.
In addition, suppose the 112GB/s bandwidth for read, and that the write bandwidth is 56GB/s.
Then it is also necessary to support 56GB/s over the inter-SP link to guarantee write-cache
coherency (unless it has been decided that write caching flash on the SP is stupid).
Is it possible the VNX Future has more than 2 SP's? Perhaps each SP is a 2-socket E5-4600 system, but the 2 SPs are linked via QPI? Basically this would be a 4-socket system, but running as 2 separate nodes, each node having its own OS image. Or that it is a 4-socket system?
Later this year, Intel should be releasing an Ivy Bridge-EX, which might have more bandwidth?
Personally I am inclined to prefer a multi-SP system over a 4-socket SP.
Never mind, I think Haswell-EP will have 64 PCIe gen4 lanes at 16GT/s. The is 2GB/s per lane raw, and 1.6GB/s per lane net, 12.8GB/s per x8 slot and 100GB/s per socket.
I still think it would be a good trick if one SP could communicate with the other over QPI, instead of PCIe. Write caching SSD at the SP level is probably stupid if the flash controller is already doing this? Perhaps the SP memory should be used for SSD metadata?
In any case, there should be coordination between what each component does.
It is good to know that EMC is finally getting serious about IO bandwidth.
I was of the opinion that the reason Oracle got into the storage business was that they were
tired of hearing complaints from customers resulting from bad IO performance on the multi-million dollar
My concern is that the SAN vendor field engineers have been so thoroughly indoctrinated
in the SaaS concept that only capacity matters while having zero knowledge of bandwidth,
that they are not be able to properly implement the IO bandwidth capability
of the existing VNX, not to mention the even higher bandwidth in VNX2 and Future.
Updates will be kept on