THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Michael Coles: Sergeant SQL

SQL Server development, news and information from the front lines

Let's Hash a BLOB

In my last post I talked about how to work around a couple of the limitations of SQL Server encryption by using SQL CLR and the .NET Framework to encrypt a BLOB value (up to 2.1 GB in size), using any supported algorithm you choose. In the example I used AES to encrypt data using a passphrase. SQL Server 2008 also allows you to generate cryptographic hashes of binary data using the MD2, MD4, MD5 or SHA-1 hash algorithms. The HashBytes function provides this functionality, as shown in the sample below:

SELECT HashBytes(N'SHA1', 'This is a test');

The result is a 160-bit binary hash value:

0xA54D88E06612D820BC3BE72877C74F257B561B19

The MD2, MD4 and MD5 algorithms all produce 128-bit binary hash values. There's a problem: the MDn-series of 128-bit hashes are currently considered insecure by cryptographers, and it's not advisable to use them for cryptographically secure applications. SHA-1 is considered more secure, but it's hash value length of 160 bits is considered relatively small by today's standards. The National Institute of Standards and Technology has already taken steps to require all federal agencies to stop using SHA-1 in 2010. Instead they are requiring federal agencies to use SHA-2 family of algorithms, which includes SHA-256, SHA-384 and SHA-512, which generate more cryptographically secure hash values that are 256, 384 and 512 bits in length.

SQL Server's HashBytes function doesn't give you the ability to access these more secure hash functions, but they are available through the .NET Framework and SQL CLR. The SQL CLR GetHash function below mirrors the functionality of the built-in HashBytes function. This function accepts an Algorithm name, which can be SHA256, SHA384, or SHA512. You also need to pass the Plaintext value to be hashed to the function. While HashBytes is limited to an 8,000 byte plaintext value (it truncates everything past 8,000 bytes), the GetHash function accepts a varbinary(max) up to 2.1 GB in size.

[Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]

public static SqlBytes GetHash

(

  SqlString Algorithm,

  [SqlFacet(MaxSize = -1)] SqlBytes Plaintext

)

{

  if (Algorithm.IsNull || Plaintext.IsNull)

    return SqlBytes.Null;

  bool HashDefined = true;

  HashAlgorithm Hash = null;

  switch (Algorithm.Value.ToUpper())

  {

    case "SHA256":

      Hash = new SHA256Managed();

      break;

 

    case "SHA384":

      Hash = new SHA384Managed();

      break;

 

    case "SHA512":

      Hash = new SHA512Managed();

      break;

 

    default:

      HashDefined = false;

      break;

  }

  if (!HashDefined)

    throw new Exception("Unsupported hash algorithm - use SHA256, SHA384 or SHA512");

  byte[] HashBytes = Hash.ComputeHash(Plaintext.Value);

  // Convert result to a SqlBytes value

  return new SqlBytes(HashBytes);

}

As you can see, accessing .NET Framework hash functionality through the SQL CLR is very simple. In a future post I'll talk about making your hash values even more secure to protect against so-called "rainbow table" attacks.

Published Sunday, April 12, 2009 5:50 PM by Mike C

Comments

 

Paul White said:

Hi - thanks for the article, it's an interesting read.

The only problem I have encountered in the past with this sort of CLR solution is that CLR functions that take or return a MAX datatype force a fully serial plan.  'Ordinary' CLR functions which don't do data access are fully parallelizable.

On an application I worked on (before SQL2K8) encryption and compression were required on-the-fly at INSERT or UPDATE time.  The CLR solution just tanked performance-wise due to the serial plan.  Not just a serial 'island' by the way - the whole plan, just like a T-SQL UDF!

In the event, we hacked around it by breaking up the MAX datatype in a trigger - splitting it into VARCHAR(8000) chunks in the database at INSERT time.  A (parallelizable) CLR compression/encryption function could then be used on the chunks whenever access was required by a complex query for example.

As far as I know, the serial nature of CLR functions with MAX parameters ([MaxSize=-1]) is still present in patched-to-date SQL2K5 and SQL2K8.  I don't re-test after every CU or hotfix, so if this is vital to anyone, please test it now!

Cheers

Paul

April 16, 2009 6:17 AM
 

Jack said:

Hash a BLOG, a strange idea! Thanks for sharing!

April 18, 2009 4:16 AM
 

Mike C said:

If you're relying on parallelization the method you mention (breaking the data into 8,000 byte chunks and hashing individually) definitely makes sense.  The advantage is that you can potentially do this in an all-SQL (no CLR) solution (but you would lose access to the SHA-2 family of hash functions since SHA-1 and the MD_ series are the only natively supported hash algorithms).  Thanks for the comments!

April 26, 2009 9:05 AM
New Comments to this post are disabled

This Blog

Syndication

News

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement