THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Michael Zilberstein

Scalability features I would like to have in SQL Server

1. Active-Active cluster.

Like RAC in Oracle. Maximum that commodity server can provide by now is 4 quad-core CPU machine. If application requires more, we can split the database (which is not always possible; and even when it is - such a solution often looks very artificial) or buy high-end server with more CPUs and much higher price line. The only thing that keeps SQL Server alive in the market for applications with high scalability requirements is totally unreasonable (IMHO) price that Oracle charges for RAC. It costs so much that even high-end machine is cheaper. For 4 quad-core CPU machine you'll pay 100K$ for SQL Server 2005 Enterprise Edition license (25K$ per license per CPU) and 240K$ for Oracle 11g Enterprise Edition (40K$ per license per every 2 cores). If you 2 machines - for Microsoft the price is doubled. For Oracle - if you want them in RAC - double the price for licensing plus add 20K$ addition for RAC per every pair of cores. Which gives us 200K$ vs 960K$ for two identical 4 quad-core CPU machines. Price is the only real answer Microsoft actually has to RAC - all the rest is pretty pathetic. Judge by yourself - here is the link to the whitepaper (PDF) on Microsoft site that compares SQL Server with RAC. And what will happen if tomorrow Oracle reduces price for RAC by 50%? BTW, the prices for Oracle are list prices but they're willing to negotiate. So the gap isn't that big.

2. Indexes per partition

SQL Server 2005 gave partial answer (or workaround - to be honest) - indexed views. But with huge limitation - since index on view couldn't be aligned with partition schema, sliding window scenario was impossible (index on view had to be dropped in order to perform partition switch or merge). In SQL Server 2008 the situation is better - now indexed view can be aligned with partition schema and support switch and merge operations on partitions. Nevertheless, indexed view is pretty costly when compared to the normal index. It also requires unique index which is not always possible.

Another part of the answer in SQL Server 2008 are filtered indexes. They should be cheaper then indexed view and can be non-unique. But filtered indexes can't be created on computed columns (Microsoft's answer to Oracle's function-based indexes) or UDT columns. Close but still not there.

3. Bitmap indexes and function-based indexes

Those aren't pure scalability features but they can be very useful when ability to analyze, apply filters and run queries against large read-only partitions is required (this requirement is one of the two main reasons to keep huge amounts of historical data - second reason is legislation and accountability requirements).

I'm sure, there're other features I forgot to mention but lack of the features in the above list causes me to curse more often then I would like to.

Published Wednesday, April 23, 2008 8:03 PM by Michael Zilberstein
Filed under:



jchang said:

conceptually RAC is neat, but getting a real app to benefit from RAC is not a simple matter. The degree of difficulty is aggravated by the remote node memory access latency (and CPU cost) relative to local node. I heard a goal of 1 micro-sec, and at the time it was actually several micro-sec. And this is not going to happen on Gig E. Most people think they do not need infini-band for RAC, but suprise suprise.

If you consider that most applications will barely scale if at all on NUMA system (where the remote node memory access relative to local is 3 to 1, how bad will be when its 30:1). It is possible to get an app to scale on NUMA, but substantial re-architecture will most likely be required. Think about it, if you are not aware of which specific SQL operations scale well on a NUMA platform, what it is the chance that you will accidentally force all the right operations?

yes, RAC is a vast improvement over OPS, which Oracle happily sold for vast sums, which almost no one benefited from, and many suffered on its short falls. I do think it is great of Oracle to push the technology. But in the current stage, it really for hardware vendors to study, until will get hardware and OS support for super low remote node memory latency

April 23, 2008 8:49 PM

Jason said:

Info on MatrixDB leaked. Anyone else have any details? Check out

Filtered indexes takes indexed views a step further but I think they have the same partitioning limitations.

As for as function based indexes,how about indexing persisted computed columns?

April 24, 2008 8:07 PM

Michael Zilberstein said:

Hello Jason,

Wow, looks like Microsoft at last started to think in the right direction. Interesting though, how much time will it take before the RTM. BTW, immediately after SQL2005 had been released, I've heard rumors from inside Microsoft that SQL2008 would contain active-active cluster feature. I don't know whether they planned it and dropped at a later stage or those were just rumors but evidently SQL2008 contains nothing of a sort.

Partitioning limitation - indexed views in SQL2008 support operations on partitions. If you perform merge or split, index on view would undergo the same merg/split automatically. There're other limitations. You mentioned indexing persistant computed columns as analog of funtion-based indexes in Oracle. But if you create column as computed, filtered index can't be created on it. And also you can't add column to a single partition - you;ve to do it for the entire table including the active part i.e. it implies performance penalty (bigger row length - less rows per page etc.).

April 25, 2008 9:22 AM

Michael Zilberstein said:

Hello Joe,

Thanks for you response - via it I've found you articles on Somehow missed them in the last years. I've been thinking about trying NUMA in one of the projects and din't find much real-world info about it. So I'm reading your articles now - specifically "High Call Volume SQL Server Applications on NUMA Systems".

Regarding your response - of course, I'm not anticipating 100% performance improvement after adding second machine on RAC. Oracle guys claim that it'll give 80% improvement. You also say: "It is possible to get an app to scale on NUMA, but substantial re-architecture will most likely be required." - do you have any material on this subject? Besides your article and several blog posts by Slava Oks, I've seen almost nothing about soft and hard NUMA.

April 25, 2008 9:39 AM

jchang said:

I would be extremely skeptical of any RAC scaling assessments, or scaling assessments period. Every single time some one told me something, on further probing, there were always serious flaws in their assessment. One person claimed his Power6 box gets 98% scaling. But he was pegging 2 CPUs, overall CPU was 25%. So how could he be getting high 90 scaling? turns out he was citing a benchmark, not his own app. In a carefully designed app, making high use of locality, its possible to get good scaling. It will most likely never happen with an app designed without prior knowledge of NUMA or RAC systems. And the app must be designed around the key asymmetric characteristics. There is no app I know that was desgined from the beginning to run on NUMA or RAC.

I have some unpublished material, but most customers I supported with expensive NUMA systems would rather hide the fact that they getting no scaling benefits over a standard 4-socket. Meaning they don't want the boss to know. lets see, a $400K box, 16 SQL EE licenses

April 25, 2008 11:37 AM

Michael Zilberstein said:

I understand that key factor in RAC scalability is minimization of remote node memory access. Example is data storage where any row in the database has equal chance to be requested. In such a case - if we're talking about terabytes or at least hundreds of gigabytes of data - cache hit ratio will be near zero (hitting row that remained in cache since previous request will be a matter of pure luck), all the data would go straight from storage to the relevant node, so almost no remote node access will be required. In this case RAC will provide more CPU power and enable to build system on commodity hardware instead of buying single monster like HP SuperDome.

On hardware NUMA the idea (as I understand it) should be - divide different tasks between nodes in a way every task runs on as little nodes as possible (ideally - task per node). It can be done via ports assignment. That's not the case in my application so I was thinking about soft NUMA in order to gain additional IO and lazy writer threads.

April 25, 2008 4:08 PM
New Comments to this post are disabled
Privacy Statement