THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Peter DeBetta's SQL Programming Blog

Peter DeBetta's blog about programming in SQL Server using technologies such as T-SQL, .NET, CLR, C#, VB, Visual Studio, and SQL Server Management Studio.

T-SQL UrlEncode

I was perusing the newsgroups when I came across a request in the microsoft.public.sqlserver.clr newsgroup to see if there was a way to use System.Web.HttpUtility.UrlEncode in T-SQL. I know using the CLR is an option, but I decided to write a T-SQL equivalent. I reflected the code to figure out the exact rules for UrlEncoding. Then, I discovered much to my delight, a new system function in SQL Server 2005 that converts a string to its hexadecimal equivalent as a string– sys.fn_varbintohexstr. I knew there was an extended stored procedure in prior versions of SQL Server to do this conversion, but this function made writing this UrlEncode function even easier.

CREATE FUNCTION dbo.UrlEncode(@url NVARCHAR(1024))
RETURNS NVARCHAR(3072)
AS
BEGIN
    DECLARE
@count INT, @c NCHAR(1), @i INT, @urlReturn NVARCHAR(3072)
    
SET @count = LEN(@url)
    
SET @i = 1
    
SET @urlReturn = ''    
    
WHILE (@i <= @count)
    
BEGIN
        SET
@c = SUBSTRING(@url, @i, 1)
        
IF @c LIKE N'[A-Za-z0-9()''*\-._!~]' COLLATE Latin1_General_BIN ESCAPE N'\' COLLATE Latin1_General_BIN
        
BEGIN
            SET
@urlReturn = @urlReturn + @c
        
END
        ELSE
         BEGIN
            SET
@urlReturn =
                  
@urlReturn + '%'
                  
+ SUBSTRING(sys.fn_varbintohexstr(CAST(@c AS VARBINARY(MAX))),3,2)
                   +
ISNULL(NULLIF(SUBSTRING(sys.fn_varbintohexstr(CAST(@c AS VARBINARY(MAX))),5,2), '00'), '')
        
END
        SET
@i = @i +1
    
END
    RETURN
@urlReturn
END
GO

This function works in a very similar fashion to the UrlEncodeBytesToBytesInternal found in the .NET Framework (in System.Web.HttpUtility) except that the function I wrote does not convert the URL to a byte array. If you have the latest version of Reflector, you can view the reflected code for UrlEncodeBytesToBytesInternal by going to this URL:

code://System.Web:2.0.0.0:b03f5f7f11d50a3a/System.Web.HttpUtility/UrlEncodeBytesToBytesInternal(Byte%5b%5d,Int32,Int32,Boolean):Byte%5b%5d

And speaking of Reflector – if you have the latest version, you can copy a URL link to a method, class, property, and so on by select the item and pressing Ctrl-Alt-C (which is how I got the URL to lookup the UrlEncodeBytesToBytesInternal method in Reflector).

[Edit - I've made some modifications which should address everyone's very valid points!]

Published Friday, March 09, 2007 12:08 PM by Peter W. DeBetta
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Phil Factor said:

/*

Could one use a number 'helper' table to avoid the use of iteration? Here is an alternative method which can also be used in SQL Server 2000 (use 8000 instead of MAX)

*/

create FUNCTION ufsURLEncoded

(

@String varchar(max)

)

RETURNS varchar(max)

BEGIN

Declare @URLEncodedString varchar(max)

select @URLEncodedString=''

select @URLEncodedString=@URLEncodedString+

case when theChar LIKE '[A-Za-z0-9()''*-._!]'

then theChar

else '%'

+ substring ('0123456789ABCDEF',

(ascii(theChar) / 16)+1,1)

+  substring ('0123456789ABCDEF',

(ascii(theChar) % 16)+1,1)

end

from

(

select [theChar]=substring(@string,number,1)

from numbers

where number <= LEN(@String) ) Characterarray

-- Return the result of the function

RETURN @URLEncodedString

END

/* on a largeish string of 6000 characters it outperforms your function by at least 100%, usually a lot more. On the downside, it needs a NUMBERS table, which surely everyone uses anyway. here is code to put it in if it isn't there....

Create procedure spMaybeBuildNumberTable

as

BEGIN

IF NOT EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[Numbers]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1)

BEGIN

CREATE TABLE [dbo].[Numbers](

[number] [int],

CONSTRAINT [Index_Numbers] PRIMARY KEY CLUSTERED

(

[number] ASC

) ON [PRIMARY]

) ON [PRIMARY]

Declare @ii int

Select @ii=10000

SET NOCOUNT ON

while (@ii>0)

BEGIN

INSERT INTO NUMBERS(NUMBER) SELECT @II

SELECT @II=@II-1

END

end

end

March 12, 2007 5:23 AM
 

Paul Tiemann said:

I discovered a small difference between the way each version (Peter's and Phil's) works and the way that System.Web.HttpUtility.UrlEncode works. The .NET function converts spaces to +. Here is a modified version of Phi's that does that. It is also modified as Phil suggested for SQL Server 2000. I also discovered by making the same change (MAX to 8000) Peter's function works fine on SQL 2000.

Our client also requested that all URL's be lower case so my version does that as well. Thanks for the great post, it saved me a lot of time.

/* Note: this function has been modified to lower case each character that is not encoded

by changing theChar to LOWER(theChar) */

ALTER   FUNCTION dbo.fn_URLEncodeLC

(@String varchar(8000))

RETURNS varchar(8000)

BEGIN

DECLARE @URLEncodedString varchar(8000)

SELECT @URLEncodedString=''

SELECT @URLEncodedString=@URLEncodedString+

CASE WHEN

theChar LIKE '[A-Za-z0-9()''*-._!]'

THEN LOWER(theChar)

       WHEN

               theChar = ' '

       THEN '+'

ELSE '%' + substring ('0123456789ABCDEF', (ascii(theChar) / 16)+1,1) +  substring('0123456789ABCDEF', (ascii(theChar) % 16)+1,1)

END

FROM

(SELECT [theChar]=substring(@string,number,1)

FROM [master].[dbo].[NUMBERS]

WHERE number <= LEN(@String) ) Characterarray

-- Return the result of the function

RETURN @URLEncodedString

END

August 3, 2007 10:25 AM
 

hilla said:

the regex needs to be '[-A-Za-z0-9()''*._! ]'

the " - " char has to be first to apply.

February 4, 2008 5:04 AM
 

Peter W. DeBetta said:

Hilla, I checked and '[A-Za-z0-9()''*-._! ]' worked fine and no different than '[-A-Za-z0-9()''*._! ]'. Do you have an example where my version fails. Perhaps you are using a different collation than I am? I'm using SQL_Latin1_General_CP1_CI_AS.

February 4, 2008 2:26 PM
 

smoovbcaltex said:

I think hilla may be on to something.

The original pattern seems to consider *-. to mean "characters between * and ." which includes '+' and ','.  I noticed this because when I tested out the function, my plus and comma characters weren't being encoded.  Moving the dash to the beginning or the end of the pattern, as hilla suggested, seems to fix this problem.

February 19, 2008 8:07 PM
 

snoochybooches said:

the "-" in the LIKE clause needs to be escaped. And add "~".

'[A-Za-z0-9()''*\-._!~]' ESCAPE '\'

May 28, 2008 8:21 PM
 

Ben said:

This would be great, but unfortunately these URLEncode methods don't appear to be complete.

If I want to URL-encode the string 'Valkeapää', for example, I believe it should return 'Valkeap%C3%A4%C3%A4', but it doesn't encode the accented characters at all.

Does anyone have a workaround for this?

March 10, 2009 12:48 PM
 

SYD said:

I couldn't add System.Web to clr function so i pulled out code from microsoft's implementation to be sure I'd always get the same results. Could probably refactor this logic into a sql function.

using System;

using System.Data;

using System.Web;

using System.Data.SqlClient;

using System.Data.SqlTypes;

using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions

{

   [Microsoft.SqlServer.Server.SqlFunction]

   public static SqlString UrlEncode(string url)

   {

       byte[] bytes = System.Text.Encoding.UTF8.GetBytes(url);

       return System.Text.Encoding.ASCII.GetString(UrlEncodeBytesToBytesInternal(bytes, 0, bytes.Length, false));

   }

   private static byte[] UrlEncodeBytesToBytesInternal(byte[] bytes, int offset, int count, bool alwaysCreateReturnValue)

   {

       int num = 0;

       int num2 = 0;

       for (int i = 0; i < count; i++)

       {

           char ch = (char)bytes[offset + i];

           if (ch == ' ')

           {

               num++;

           }

           else if (!IsSafe(ch))

           {

               num2++;

           }

       }

       if ((!alwaysCreateReturnValue && (num == 0)) && (num2 == 0))

       {

           return bytes;

       }

       byte[] buffer = new byte[count + (num2 * 2)];

       int num4 = 0;

       for (int j = 0; j < count; j++)

       {

           byte num6 = bytes[offset + j];

           char ch2 = (char)num6;

           if (IsSafe(ch2))

           {

               buffer[num4++] = num6;

           }

           else if (ch2 == ' ')

           {

               buffer[num4++] = 0x2b;

           }

           else

           {

               buffer[num4++] = 0x25;

               buffer[num4++] = (byte)IntToHex((num6 >> 4) & 15);

               buffer[num4++] = (byte)IntToHex(num6 & 15);

           }

       }

       return buffer;

   }

   internal static bool IsSafe(char ch)

   {

       if ((((ch >= 'a') && (ch <= 'z')) || ((ch >= 'A') && (ch <= 'Z'))) || ((ch >= '0') && (ch <= '9')))

       {

           return true;

       }

       switch (ch)

       {

           case '\'':

           case '(':

           case ')':

           case '*':

           case '-':

           case '.':

           case '_':

           case '!':

               return true;

       }

       return false;

   }

   internal static char IntToHex(int n)

   {

       if (n <= 9)

       {

           return (char)(n + 0x30);

       }

       return (char)((n - 10) + 0x61);

   }

};

May 24, 2009 6:29 PM
 

Chris Ludwig said:

/*

I went looking for a function like this, and after seeing the approaches used, I remembered a technique that I used when I needed to encode huge blocks of binary data into BASE64 data... let SQL Server do the work for you by using the FOR XML clause in a SELECT.  Here's my version, which I have to believe is about as fast as it could possibly be (especially for large blocks of text) since all the work is done for you by SQL Server.  I don't have access to a SQL Server 2000 instance, but I believe this would work there too (again, by changing VARCHAR(MAX) to VARCHAR(8000) ) since FOR XML was introduced in 2000.  I feel a little dirty relying on the specific output and chopping text off the beginning and end, but I believe it's a sound strategy as changing the output format of FOR XML RAW would likely cause many other compatibility issues for MS.  That dirty feeling goes away when you see how wicked fast it is.

*/

CREATE FUNCTION [dbo].[ENC_URL] ( @string VARCHAR(MAX) )

RETURNS VARCHAR(MAX)

BEGIN

   DECLARE @encodedText VARCHAR(MAX)

   DECLARE @table TABLE ( textToBeEncoded VARCHAR(MAX) )

   INSERT  INTO @table ( textToBeEncoded ) VALUES ( @string )

   SET @encodedText = ( SELECT textToBeEncoded FROM @table FOR XML RAW ) -- Thank you SQL Server for encoding this for me. :-)

   SET @encodedText = LEFT(@encodedText, LEN(@encodedText) - 3) -- Chop off "/>

   SET @encodedText = RIGHT(@encodedText, LEN(@encodedText) - 22) -- Chop off <row textToBeEncoded="

-- If you still want convert spaces to '+'.  I don't, so I comment this line out.

-- SET @encodedText = REPLACE(@encodedText, ' ', '+')

   RETURN @encodedText

END

GO

July 8, 2010 12:38 PM
 

anon said:

January 10, 2011 1:26 PM
 

Altaf said:

Please modify the regular expression to take "-" into account properly. Moved "-" to the end of the regular expression:

IF @c LIKE '[A-Za-z0-9()''*._! -]'

March 9, 2011 12:49 PM
 

Nataraj said:

Hi Peter, Blog for URLencoding is very useful and wonderful. Thanks a lot.

January 23, 2012 12:16 AM

Leave a Comment

(required) 
(required) 
Submit

About Peter W. DeBetta

Peter DeBetta works for Microsoft as an Application Platform Technology Strategist, specializing in strategies, design, implementation, and deployment of Microsoft SQL Server and .NET solutions. Peter writes courseware, articles, and books – most recently the title Introducing SQL Server 2008 from Microsoft Press. Peter speaks at conferences around the world, including TechEd, SQL PASS Community Summit, DevTeach, SQL Connections, DevWeek, and VSLive!

When Peter isn’t working, you can find him singing and playing guitar (click here to hear an original song by Peter), taking pictures, or simply enjoying life with his wife, son, and daughter.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement