Long ago, almost 20 years, there was legitimate concern on whether servers had sufficient capability to drive transaction processing requirements for large businesses.
Then for a long period, perhaps from the late 1990’s to 2012, the better choice for most mid to large businesses was to simply buy a 4-way (socket) system.
Larger systems were based on a non-uniform memory access (NUMA) architecture that had unusual characteristics and vendors provided almost no information on how to work around issues that could cripple such systems.
Towards the end of the second era of server sizing, processors (at the socket level) became ridiculously powerful such that properly tuned systems ran at very low overall CPU utilization.
However, 2-socket systems in this period did not have adequate memory or IO capability, unless it was a Xeon 7500 or later E7 series system with 2 sockets populated.
And even then, the structure of SQL Server per processor licensing meant that many people opted for the 4-way system regardless of requirements.
In 2012, Intel finally put powerful processors (Sandy Bridge, up to 8-cores) with larger memory and IO capability into the Xeon E5 line (there were both 2 and 4 socket versions). This coincided with SQL Server 2012 per core licensing. Now there was a stronger technical justification and financial incentive to not automatically opt for a 4-socket system with the most number of cores.
It would seem reasonable to make some effort to first determine the total number of processor cores that will meet requirements with some headroom.
Then determine the number processor sockets as more processor sockets means more memory sockets (also considering the difference in DIMM sockets per processor between Intel Xeon E5 and E7 processors).
It is unclear whether anyone actually engaged in the technical analysis versus just making the decision strictly on budget goals.
Today, it is time to consider the astonishing next step, that a single socket system is the best choice for a transaction processing systems.
First, with proper database architecture and tuning, 12 or so physical cores should be more than sufficient for a very large majority of requirements.
We should also factor in that the second generation hyper-threading (two logical processors per physical core) from Nehalem on has almost linear scaling in transactional workloads (heavy on index seeks involving few rows). This is very different from the first generation HT in Willamette and Northwood which was problematic, and the improved first generation in Prescott which was somewhat better in the positive aspects, and had fewer negatives.
A single socket Intel Xeon E5 processor supports 12 DIMM sockets for 384GB with 32GB DIMMs, or 768GB with 64GB DIMMs which is now at a moderate (50%) premium per GB over the 32GB DIMM. In the old days, DBA’s were taught that memory was precious in reducing disk IO to a tolerable level. That was back when a total system memory of 1GB was considered very large.
I noticed that after common server memory configuration reached 32GB memory, disk IO could be brought to a very manageable level, well under the IOPS possible with 100-200 15K HDDs.
Today, all-flash storage is practical.
Definitely in direct-attach storage, and maybe in SAN based storage
depending on the degree of price gouging from your SAN vendor.
So while it might be possible to show that TB-plus memory reduces IOPS from 10,000 to nil, the reality is a modern storage system correctly configured for database applications can easily support 100,000 IOPS if not 1M (which would require NVMe to be practical).
In this case, we should find that memory requirements are far lower than the memory capacity of recent generation 2 and 4-socket systems.
Why not continue to spectacularly over-configured the server system when even with SQL Server per core licensing is only a small portion of the costs in deploying a critical line-of-business application? The big bosses may not question the costs of a 2-socket system or perhaps 4-socket system even factoring in that there may be several environments between production, DR, QA and development. But if performance is inadequate, it may cost you your job.
The answer is performance. Today, transaction processing performance is heavily dependent on serialized round-trip memory access latency. The term is pointer chasing in that a CPU instruction accesses memory to determine the next memory location to access. No amount of L2/L3 cache can hide this, nor is it possible to pre-fetch memory. (What Intel needs to do is increase Hyper-threading beyond two logical processors per physical core. IBM Power is 4 and SPARC is 8?).
For a processor with integrated memory controllers (AMD from Opteron on, Intel from Nehalem on), local node memory access round-trip latency is probably 50ns. For memory on an adjacent processor, a 1-hop remote node, it is perhaps 100ns. This information is absolutely crucial in understanding system level performance for transactional databases, and yet vendors try to not mention it, instead stressing metrics that are mostly irrelevant.
In a single socket system, 100% of memory accesses are local node, because there is only 1 node. In a two socket system, the expectation is that memory accesses are split 50-50 between local and remote node. In principle, it is possible to devise some strategy such that a higher percentage is local, but SQL Server does not make it practical to implement such a scheme. In the Intel Xeon E7 4-socket systems, 1 out of 4 memory nodes is local, and there are 3 memory nodes 1-hop away. In an 8-socket system, 4 nodes are 2-hop away.
An argument in favor of multi-socket systems is that the number of memory channels scales with the number of sockets. However, in transaction processing, it is memory round-trip latency the dominates, not memory bandwidth. (it is a factor that more memory channels means there can be more memory transactions, but it is unclear as to whether this offsets the increase in memory latency.)
If someone has reasonably comparable systems with 1, 2 and 4 processor sockets, all of the same processor generation, the impact of the number of sockets can be tested.
I am also of the opinion that SQL Server cannot make meaningful use of all the processor cores in a top-of-the-line 2-socket (36-cores for Xeon E5 v3), let alone a 4-socket (72-cores) due to most write operations data and log being single threaded. But I would like here opinions on this. Is this addressed in SQL Server 2016 to a degree?
The above applies mostly to transaction processing applications.
It is reasonable to suppose that decision support system application could use both the greater number of processors cores and the greater number of memory channels in a multi-socket systems. However it is unclear as to whether SQL Server is effective in implementing parallel execution plans when the degree of parallelism is greater than the number of cores in one socket.
I would think that the strategy of partitioning and alignment of compute to local (system) node memory from (formerly) PDW applied to a single SQL Server instance, treating each processor socket as though it were a node, which it is, could dramatically improved very DOP parallel execution plans. But it does not seem that Microsoft is interested in this approach.
I am presuming that column store is better capable of utilizing multi-socket, as it is definitely not encumbered by serialized round-trip memory latency.
The Dell PowerEdge R730 can be purchased with 1 or 2 of 2 sockets populated. The R930 must have 2 or 4 of 4 sockets populated. It appears that the premium from 32GB to 64GB memory modules might be because the 32 is RDIMM and the 64GB is LRDIMM. The 2400MT/s 32GB RDIMM is $558, and the 64GB LRIMM is $1483.
For this differential, I would stay single socket, opting for the more expensive 64GB (if necessary) over 2 sockets using the less expensive 32GB, even though I am giving up the 4 extra memory channels.
If the plan is to implement memory-optimized tables in 2016 (which supports more than 256GB? and has fewer other restrictions) and more than 768GB system memory is needed, then go ahead with 2-sockets.
But I would try to get my transaction processing DB under the single socket limit. In the old days, we moved the old out to the DW system.