As organizations start to deploy various data grids into their application layers, they may start to experience what I’d call a SQL storm—a massive number of simple SQL requests that are issued over a large number of concurrent database connections at a high frequency.
This is not about SQL viruses or denial of service attacks. But the end result is similar. On the SQL Server machine, the CPU usage is being pegged to 100% for a sustained time period, and the applications that share the SQL Server instance complain about slow performance or query/connection timeouts. Upon further examination of the SQL Server instance, you typically find that a large number of connections coming from a group of application servers, and on each connection you see the application server trying to issue simple SQL statements in either ad hoc or proc calls against the SQL Server instance at a high frequency.
This is not an inherent problem of a data grid (such as those based on Oracle Coherence). But if not managed properly, a data grid has the capability and potential to cripple a SQL Server instance (or for that matter any DBMS).
Typically, these data grids cache frequently used data on the application servers, close to where the data is being used. A grid enables the app to automatically and dynamically partition the data in memory across the app servers, and help manage their transactional consistency and data integrity.
One of the issues any data grid must grapple with is how to ensure changes to data in memory across the app servers are persisted in the database. This is where things may not be managed properly and SQL storms are seen against the database.
Now, these changes can be reflected on the database backend in a number of different ways. The simplest approach is to leverage the normal grid calls to the database to perform your typical OLTP inserts, updates and deletes. Often, a grid performs these inserts, updates, and deletes in a rather simple and uniform fashion over all its connections, and this works well when changes to the cached data only trickle in.
Sometimes, business rules may dictate that a large amount of grid-cached data be updated at a given point in time. The grid can usually handle it without any issue because all the updates are memory only operations and are happening on multiple app servers. But as you can imagine, if all the app servers in the grid start to update the database backend all at once with the changes they see in their own memory, it can cause a significant load on the database server. Furthermore, note that these grids are often architected in such a way that each app server maintains multiple connections to the database server. If the grid keeps all the connections busy with tight loops in trying to sync’ing up the database with its cache as quickly as possible, it may cause the database server to be so busy that it ends up creating a significant CPU bottleneck on the database server, prolonging rather than shortening the synchronization.
If you have data grids running in your environments, watch out for SQL storms from these grids, and ask your application development folks to channel these massive updates through a more efficient batch process against your database backend. Avoid performing batch updates in an OLTP fashion.