THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

SQL storms from your data grids

As organizations start to deploy various data grids into their application layers, they may start to experience what I’d call a SQL storm—a massive number of simple SQL requests that are issued over a large number of concurrent database connections at a high frequency.


This is not about SQL viruses or denial of service attacks. But the end result is similar. On the SQL Server machine, the CPU usage is being pegged to 100% for a sustained time period, and the applications that share the SQL Server instance complain about slow performance or query/connection timeouts. Upon further examination of the SQL Server instance, you typically find that a large number of connections coming from a group of application servers, and on each connection you see the application server trying to issue simple SQL statements in either ad hoc or proc calls against the SQL Server instance at a high frequency.


This is not an inherent problem of a data grid (such as those based on Oracle Coherence). But if not managed properly, a data grid has the capability and potential to cripple a SQL Server instance (or for that matter any DBMS).


Typically, these data grids cache frequently used data on the application servers, close to where the data is being used. A grid enables the app to automatically and dynamically partition the data in memory across the app servers, and help manage their transactional consistency and data integrity.


One of the issues any data grid must grapple with is how to ensure changes to data in memory across the app servers are persisted in the database. This is where things may not be managed properly and SQL storms are seen against the database.


Now, these changes can be reflected on the database backend in a number of different ways. The simplest approach is to leverage the normal grid calls to the database to perform your typical OLTP inserts, updates and deletes. Often, a grid performs these inserts, updates, and deletes in a rather simple and uniform fashion over all its connections, and this works well when changes to the cached data only trickle in.


Sometimes, business rules may dictate that a large amount of grid-cached data be updated at a given point in time. The grid can usually handle it without any issue because all the updates are memory only operations and are happening on multiple app servers. But as you can imagine, if all the app servers in the grid start to update the database backend all at once with the changes they see in their own memory, it can cause a significant load on the database server. Furthermore, note that these grids are often architected in such a way that each app server maintains multiple connections to the database server. If the grid keeps all the connections busy with tight loops in trying to sync’ing up the database with its cache as quickly as possible, it may cause the database server to be so busy that it ends up creating a significant CPU bottleneck on the database server, prolonging rather than shortening the synchronization.


If you have data grids running in your environments, watch out for SQL storms from these grids, and ask your application development folks to channel these massive updates through a more efficient batch process against your database backend. Avoid performing batch updates in an OLTP fashion.


Published Monday, June 29, 2009 2:41 PM by Linchi Shea



dugi said:

Nice explanation about the SQL Storm, Linchi! Yes, this should consider as critical point during the Application Solutions with Database in this case SQL Server!



July 1, 2009 7:23 AM

Chris Compton said:

As a developer and DBA, I can appreciate this as I have seen these issues arise in the past. However, I believe that specifically the grid controls currently available in the Microsoft .NET platform leave alot to be desired in regards to technology. The freedom of implementation (lots of coding required), lack of performance minded developers, ORMs, and stateless web applications gets us there in a hurry.

One example of this that comes to mind is the old Powersoft/Sybase datawindow control. This is 15+ year old client server technology, but on enterprise level systems you rarely encountered the SQL Storm. The rows and columns kept state and dynamically generated what needed to happen. The control handled the implementation, not the developer.

Not sure why we are going backwards.



July 1, 2009 8:34 AM

Kevin Boles said:

One thing that may well help out here is FORCED PARAMETERIZATION.  It is made for just such a situation.

July 5, 2009 10:38 AM

Linchi Shea said:


The probem here is the massive number of calls to the database from multiple servers (i.e. grid nodes) in a short time span. These calls may already be fully opimized individually. At least, the ones I have seen are all stored procedure calls, and they are rather simple and are optimized if you look at these stored procedures individually. They drive the database server crazy because of their sheer number and intensity.

July 5, 2009 3:06 PM

Linchi Shea said:


I should clarify that by data grid, I don't mean the two-dimensional grid control used on a Windows GUI form or a .NET DataGrid. Rather, I'm referring to data grids as in grid computing (aka high performance computing), typically a clustered cache distributed across multiple app servers. See this Wikipedia page for a short description; At this point, I believe the most widely used data grid product is Coherence by Oracle. Microsoft is also working on a data grid project code named Velocity. I'm not sure about its release schedule.

These data grid servers or distributed caching servers have the great potential to seriously hammer your database!

July 7, 2009 12:41 AM

Joe said:

Thanks Linchi,

What are the wait types you see when this happens?

Ive seen something similar happen to me, and the wait types

were usually CMEMTHREAD. Is there much that can be done

on the db side when the developers dont respond?

July 9, 2009 4:14 PM

Linchi Shea said:

Mostly waiting on schedulers -- SOS_SCHEDULER_YIELD. The bottleneck is on CPU. But it could be on any resource depending on the nature the SQL storm.

July 10, 2009 5:00 PM
New Comments to this post are disabled

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog


Privacy Statement