
Linchi Shea

Checking out SQL Server via empirical data points

Hash Partition as a Design Method

SQL Server 2005 does not support hash partition as a product feature. But that doesn't mean you can't apply the concept of hash partition in your application or database design.

Recently, I worked on an application that used SQL Server to persist customized ASP.NET session state information, similar to the SQL Server mode for ASP.NET session state management. The table schema and the stored procedures were similar to those found in Microsoft KB article 311209. The most salient points to note are listed below, with a minimal schema sketch after the list:

  • The session state information was stored in a single table,
  • Every row was associated with a GUID session state id, and
  • Every stored procedure had a session state id parameter
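
For reference, a minimal table along these lines might look like the sketch below. This is my own illustration modeled loosely on the KB 311209 schema; the column names and types are assumptions, not the exact production schema:

-- A minimal session state table: one row per session, keyed by a
-- GUID session state id. Column names and types are illustrative only.
CREATE TABLE dbo.SessionState
(
    SessionId    varchar(32)  NOT NULL PRIMARY KEY,  -- GUID, stored as text
    Created      datetime     NOT NULL DEFAULT GETUTCDATE(),
    Expires      datetime     NOT NULL,
    SessionItem  image        NULL                   -- serialized session data
)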

In studying the scalability and capacity of the session state database, I found that under stress load the database transaction throughput was not constrained by CPU, memory, disk I/O, or network. Rather, the transaction throughput was limited by contention inside SQL Server. More precisely, almost all the SQL Server resource waits were found to be on page latches. Reducing this page-level contention should therefore lead to higher transaction throughput.
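
For what it's worth, one way to observe this kind of latch contention in SQL Server 2005 is to check the server-wide wait statistics while the stress load is running. The query below is a sketch of that diagnostic step; it is not from the original test harness:

-- Server-wide waits on page latches; under the stress load described
-- above, these wait types dominated all other resource waits.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms
  FROM sys.dm_os_wait_stats
 WHERE wait_type LIKE 'PAGELATCH%'
 ORDER BY wait_time_ms DESC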

One potential solution was to split the single table into multiple tables. There were several ways to do this. One approach was to split the table into multiple tables of identical structure in the same database.

For the load test, I took a different, but simpler, route: splitting the table across multiple identically structured databases. In fact, I created 20 clones of the original database, which differed only in their names. I kept the original database, but removed all its user tables and replaced the stored procedures that were directly called by the application with corresponding wrapper stored procedures. The only function of a wrapper stored procedure was to route calls from the application to the 20 newly created databases, which were not directly visible to the application.

So how would the incoming stored procedure calls be routed? Hash partition was the answer.

Basically, since each call contained a GUID session state id, we could map that GUID to a number corresponding to one of the 20 databases, and route the call to that database. The following is a sample procedure for routing a call to stored procedure ExecCommand:

CREATE PROCEDURE dbo.ExecCommand
    @id         varchar(32),
    @param1     varchar(128),  
    { other parameters }
AS
   SET NOCOUNT ON
   DECLARE @hash int
   SET @hash = (abs(checksum(@id)) % 20) + 1
 
   IF (@hash = 1)
      EXECUTE sessionDB1.dbo.ExecCommand
                  @id,
                  @param1,
                  { other parameters go here }
   ELSE
   IF (@hash = 2)
      EXECUTE sessionDB2.dbo.ExecCommand
                  @id,
                  @param1,
                  { other parameters go here }

   ...  -- ELSE IF branches for @hash values 3 through 19 follow the same pattern

   ELSE
   IF (@hash = 20)
      EXECUTE sessionDB20.dbo.ExecCommand
                  @id,
                  @param1,
                  { other parameters go here }
   ELSE
      -- Should never reach here
      RAISERROR('Invalid code path for session id %s and hash %d', 17, -1, @id, @hash)
RETURN

The hash function in the above code snippet is (abs(checksum(@id)) % 20) + 1. This function maps a GUID to an integer between 1 and 20, inclusive, and the mapping is relatively evenly distributed.
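
To convince yourself that the distribution is reasonably even, you can hash a batch of random GUIDs and count how many fall into each bucket. The following is a quick sketch of such a check; the temp table and the 100,000-row sample size are arbitrary choices of mine, not from the original test:

-- Hash 100,000 random GUIDs with the same function and count per bucket.
CREATE TABLE #buckets (hash int)

DECLARE @i int
SET @i = 0
WHILE (@i < 100000)
BEGIN
   INSERT INTO #buckets (hash)
   SELECT (abs(checksum(CAST(NEWID() AS varchar(36)))) % 20) + 1
   SET @i = @i + 1
END

-- With an even distribution, each bucket should hold roughly 5,000 rows.
SELECT hash, COUNT(*) AS cnt
  FROM #buckets
 GROUP BY hash
 ORDER BY hash

DROP TABLE #buckets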

So how effective was this hash partition strategy? I ran the same stress load test after the hash partitioning was applied, with everything else kept the same. The following chart (attached as HashPartition.gif) shows the performance difference:

With the original single-table design, the chart shows that after 50 simulated users the database was stressed out, and the transaction throughput actually started to drop significantly as even more users were added. However, when the table was hash partitioned into 20 databases, it took more than 200 simulated users to saturate the databases. For 100 users, the transaction throughput improved by about 50%, and for 200 users, the throughput improved by about 65%.

Not bad for a simple backend change that does not require any change on the application side!

Published Monday, January 22, 2007 10:35 PM by Linchi Shea

Attachment(s): HashPartition.gif


Comments

 

Alan Williams said:

A nice solution, but you could possibly speed it up by removing the IF...ELSE statements.

If you can get the sessionDB##.dbo.ExecCommand statement to execute from a parameter, you could build up the value by concatenation.
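
A minimal sketch of that dynamic-routing idea, assuming the sessionDB naming convention from the post (this is an illustration of the suggestion, not tested code from the post or the comment):

-- Build the three-part procedure name by concatenation and execute
-- it via sp_executesql instead of a 20-branch IF...ELSE chain.
DECLARE @hash int, @sql nvarchar(256)
SET @hash = (abs(checksum(@id)) % 20) + 1
SET @sql = N'EXECUTE sessionDB' + CAST(@hash AS nvarchar(2))
         + N'.dbo.ExecCommand @id, @param1'
EXECUTE sp_executesql @sql,
        N'@id varchar(32), @param1 varchar(128)',
        @id, @param1

Whether this beats the static IF...ELSE chain would need measuring; it trades branch evaluation for building and compiling a dynamic SQL string on every call.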

Let me know if it works

alan.williams@talk21.com

June 1, 2007 4:38 PM
