SQLblog.com - The SQL Server blog spot on the web

Linchi Shea

Checking out SQL Server via empirical data points

  • Performance impact: stored procedures, SQL batches, and CPU usage

    What is the simplest way to drive a CPU to 100% using T-SQL? In a language like C++ or C#, the simplest way is to create a tight loop and perform some CPU-consuming work inside it. For instance, the following tight loop in C# can peg a CPU at 100%:

     

          while (true)

            Math.Sin(2.5);

     

    Note that on a multi-processor machine, this code may end up being executed on more than one processor, and you may not see any particular processor being pegged at 100%. But the cumulative effect is equivalent to keeping a single processor busy.

     

    What about T-SQL? Does the following T-SQL code keep a CPU 100% busy?

     

    set nocount on

    go

    declare @a float

    while(1=1)

    begin

              select @a = cos(2.5);

    end

     

    If you try the above SQL batch as is in SQL Server Management Studio from a remote client, it’s unlikely that it would peg a CPU on the server side at 100%. When I ran the above code, it kept a single CPU at about 20~30%. So apparently, the loop in this little piece of T-SQL code is not tight enough to keep a CPU fully occupied.

     

    Note: To help see the CPU usage clearly, for all the T-SQL tests in this post I explicitly set the affinity mask option on the SQL Server instance to use only a single processor.
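
    For reference, this is roughly how I pinned the instance to a single processor (a sketch of my own; affinity mask is an advanced option, and a value of 1 binds the instance to CPU 0):

    exec sp_configure 'show advanced options', 1;
    reconfigure;
    exec sp_configure 'affinity mask', 1;   -- bit 1 = CPU 0 only
    reconfigure;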

     

    Is there a way to execute a tight loop in T-SQL? Yes, all you need to do is to put the above code inside a stored procedure:

     

    create proc p_test

    as

    set nocount on

    declare @a float

    while(1=1)

    begin

              select @a = cos(2.5);

    end

     

    If you execute proc p_test, you would see a CPU being pegged at 100%.

     

    What is the difference between executing a loop in a SQL batch and inside a stored procedure for them to exhibit such dramatically different CPU usage?

     

    The difference is that while executing the loop in a batch, SQL Server has to yield numerous times to chat with the client, whereas when it’s executing the loop inside a stored procedure, it doesn’t. [Updated. Note that as Andy Kelly pointed out, this was when SET NOCOUNT ON was in effect. All bets are off if you don't do SET NOCOUNT ON.]

     

    An example should make this abundantly clear. Let’s modify the loop a bit to iterate a finite number of times (say 10,000,000 times):

     

    The batch:

    set nocount on

    go

    declare @a float,

            @i int = 1

    while(@i<=10000000)

    begin

      select @a = cos(2.5);

      select @i += 1;

    end

     

    The stored procedure:

    create proc p_test

    as

    set nocount on

    declare @a float,

              @i int = 1

    while(@i<=10000000)

    begin

      select @a = cos(2.5);

      select @i += 1;

    end

     

    In my test environment, the batch finished in about 33 seconds, whereas the stored procedure finished in about 4 seconds. The stored procedure was about eight times faster!
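
    If you want to time the two versions yourself, a minimal harness along these lines will do (my own sketch; the absolute numbers will of course differ on your hardware):

    declare @start datetime = getdate();

    exec dbo.p_test;   -- or paste the batch version here instead

    select datediff(ms, @start, getdate()) as elapsed_ms;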

     

    It was most revealing to look at the client statistics. With the batch, the client sent one TDS packet to the server, but received 151,623 TDS packets from the server. In contrast, with the stored procedure, the client sent one TDS packet and received one TDS packet.

     

    Executing the batch is dramatically chattier than executing the stored procedure. In fact, it is so chatty that the server cannot keep the CPU busy doing useful work.

  • Performance impact: implicit type conversions

    Often you see an implicit type conversion or a type mismatch causing a performance issue when it trips up the SQL Server query optimizer. A while back though, I ran into a case where a harmless looking type mismatch caused significant performance degradation when there was no bad plan involved.

     

    So we had a rather straightforward transactional replication setup for a table that included about 30 varchar columns. When there was a burst of inserts into the table on the publisher, the distribution agent could not keep up with the traffic and a significant latency ensued. Upon investigation, we found that the inserts were distributed to the subscriber via a stored procedure, which was not unusual in itself because that is the default setting when you configure transactional replication through the SQL Server Management Studio (SSMS) GUI. The stored procedure could not be simpler:

     

    CREATE PROC sp_MSins_MyTable

      @c1 nvarchar(40),

      @c2 nvarchar(30),

      ...

      @c30 nvarchar(30)

    AS

    INSERT MyTable(c1, c2, ..., c30)

    SELECT @c1, @c2, ..., @c30

     

    This procedure was generated by SSMS when the published article was configured to call a procedure instead of using the INSERT statement for distributing inserts.

     

    What was odd was that even though the table columns were varchar types, the generated stored procedure above defined the input parameters as nvarchar, so implicit type conversions had to be performed during the proc execution. I'm not sure why, but that is how SSMS generates these replication wrapper procs.

     

    Since the type conversion was not involved in any where clause or join condition, there was no issue with a terribly inefficient query plan.

     

    Still, the insert latency at the subscriber was huge, and there was no other resource issue that might have helped explain the poor insert throughput on the subscriber.

     

    Thinking that all these type conversions in each insert might have contributed to the slowdown, we replaced all the nvarchar's with varchar's in the stored procedure. Voila! The insert performance increased dramatically and the distribution latency disappeared quickly.

     

    All was happy and we moved on.

     

    Yesterday, however, we ran into the same problem because upon recreating the replication over the weekend, we had inadvertently replaced our custom-modified stored procedure with the SSMS-generated default proc shown above, which was using nvarchar's instead of varchar's. Upon seeing the distribution latency, we promptly put back the proc with the varchar types and the problem went away almost immediately.

     

    That got me wondering about the exact performance impact of these nvarchar/varchar type conversions. So I ran some tests outside the replication setup with two stored procedures, one with nvarchar for all the input parameters and the other with varchar for all the input parameters (note that the table columns are all varchar’s).

     

    CREATE PROC p_insert_nvarchar

      @c1 nvarchar(40),

      @c2 nvarchar(30),

      ...

      @c30 nvarchar(30)

    AS

    INSERT MyTable(c1, c2, ..., c30)

    SELECT @c1, @c2, ..., @c30

     

    CREATE PROC p_insert_varchar

      @c1 varchar(40),

      @c2 varchar(30),

      ...

      @c30 varchar(30)

    AS

    INSERT MyTable(c1, c2, ..., c30)

    SELECT @c1, @c2, ..., @c30

     

    I put them through a loop to insert 100,000 rows into a truncated MyTable. To my surprise, I did not see much of a difference between the two procs.
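
    The driver for that test was conceptually no more than the following loop (a sketch only; the argument list is elided here just as the parameter lists are elided in the proc definitions above):

    set nocount on;
    declare @i int = 1;

    while @i <= 100000
    begin
       -- swap in p_insert_nvarchar to test the other proc
       exec dbo.p_insert_varchar 'value1', 'value2', /* ..., */ 'value30';
       set @i += 1;
    end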

     

    At this point, I’m at a loss as to why changing from nvarchar to varchar had such a huge performance impact for the distribution agent, but had almost no impact in my controlled tests.

     

    Thus the search for the explanation continues, and I’ll certainly report back when it’s found. By the way, if you know the reason, please post a comment here.

     

    However, it is clear from this experience that a simple nvarchar/varchar type mismatch can have a huge performance impact even when it does not involve any foul query plan. And if you are experiencing latency on the subscriber side (not on the distributor), this is one more thing you may want to check.

  • Why is it good to be comfortable with multi-threading?

    If you are an application developer, you should find a new job if you can't do multi-threaded programming. What if you are a DBA? Probably not, not to the same extent anyway.

    Some in our community clearly realize the importance of being able to do things in parallel. Adam Machanic, for instance, has put a lot of effort into this area and is trying hard to spread the message on parallelism.

    Unfortunately, the community in general does not seem to be as convinced. Perhaps when you've become accustomed to finding workarounds to drive nails with a wrench, you may not realize that hammers are a much better tool for that task.

     

    Although embracing parallelism does not necessarily mean that every DBA should become conversant in multi-threaded programming, I’d argue that it’s a good skill to have, and once you are comfortable with it, you’ll find plenty of opportunities to apply it fruitfully.

     

    Here is a little anecdotal evidence from my recent experience.

     

    So I needed to automate the removal and creation of a lot of replication setups, and for that I fully automated the generation of the replication delete and create scripts. I also automated the execution of these scripts. However, one potential issue with executing these scripts is that they are essentially DDL scripts, and as you may know, any DDL change can easily get blocked. Thus, I had to automate the monitoring of blocking. Furthermore, I needed to automatically remove any blocking if the execution of a replication script got blocked.

     

    Now, this handling of blocking needed to be granular and precise in that I needed to clear a spid only if it was blocking the execution of a replication delete or create script. The solution I ended up with was to have the code that controlled the replication script execution spawn a monitoring thread for each of the servers (i.e. the publishers and subscribers), and as the main control code cycles through these servers to apply the replication scripts, it goes through the following logic:

     

    • The main control code connects to the server on which a replication script is to be executed, and retrieves the spid of its own connection. It passes this spid to the monitoring thread so that the monitoring thread can focus on that connection only and will not be bothered by any other activity that may be going on on the same server.
    • The monitoring thread connects to the same server and waits for the go-ahead.
    • The main control code signals the monitoring thread to start monitoring.
    • The main control code executes the replication script.
    • The main control code signals the monitoring thread to stop monitoring and wait for a new replication script to be executed.

    The actual logic is slightly more complex because when we execute a replication create/delete script on a publisher, the publisher will make a connection to the subscriber and/or the distributor, and we don't want those connections to be blocked either. The monitoring thread must take care of that, too.
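
    At its core, the monitoring thread simply polls for blockers of that one spid with a query along these lines (a simplified sketch; @script_spid stands for the spid passed in by the main control code, and the real code also watches the connections the publisher itself opens):

    declare @script_spid int = 57;   -- placeholder: the spid of the script connection
    declare @blocker int;

    select @blocker = blocking_session_id
      from sys.dm_exec_requests
     where session_id = @script_spid
       and blocking_session_id <> 0;

    if @blocker is not null
       print 'kill ' + cast(@blocker as varchar(10));   -- the real code would issue the KILL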

     

    Although most DBA work appears to be conducive to single-threaded automation, tasks like this are not. If you think using multi-threading for this task is overkill or there is a better way, I’d appreciate it if you leave a comment.

     

  • Performance impact: Try/Catch blocks in SQLCLR – a follow up

    My previous post showed a simple test that appears to suggest that you may experience significant performance degradation if multiple users are calling the same SQLCLR function at the same time and they are all catching a lot of exceptions.

     

    However, it’s not clear whether that behavior is limited to SQLCLR or applies to .NET in general. To see if I would run into similar behavior, I wrote a simple C# program for a quick test. To simulate the concurrent exception-handling load, the test program spawns 10 background threads, each calling the following method nonstop (the complete program is listed at the end of this post):

     

        static void TryCatch()

        {

            int c = 0;

     

            for(int i = 0; i < 200000; i++)

            {

                try { c= i/c; } catch (Exception) { c = 1;}

            }

        }

     

    Note that variable c is assigned value 0, thus forcing a divide-by-zero exception, and this exception is handled 200,000 times in a loop.

     

    The elapsed time of the method is measured both when there are no additional background threads and when 10 additional threads are running.

     

    On my old two-core 2GB PC workstation, the following output from the program is typical among many runs:

     

    C:\junk>test2.exe

    TryCatch() without any background thread = 15617968 ticks

    TryCatch() without any background thread = 15559616 ticks

    TryCatch() without any background thread = 15566064 ticks

    TryCatch() without any background thread = 17472496 ticks

    TryCatch() without any background thread = 15782952 ticks

    Thread 0 Called TryCatch() 1000 times.

    Thread 1 Called TryCatch() 1000 times.

    Thread 2 Called TryCatch() 1000 times.

    Thread 3 Called TryCatch() 1000 times.

    Thread 4 Called TryCatch() 1000 times.

    Thread 5 Called TryCatch() 1000 times.

    Thread 6 Called TryCatch() 1000 times.

    Thread 7 Called TryCatch() 1000 times.

    Thread 8 Called TryCatch() 1000 times.

    Thread 9 Called TryCatch() 1000 times.

    TryCatch() with 10 background threads = 17498336 ticks

    TryCatch() with 10 background threads = 17535984 ticks

    TryCatch() with 10 background threads = 17664424 ticks

    TryCatch() with 10 background threads = 17515200 ticks

    TryCatch() with 10 background threads = 17465312 ticks

    TryCatch() with 10 background threads = 17498432 ticks

    TryCatch() with 10 background threads = 17508656 ticks

    TryCatch() with 10 background threads = 17710856 ticks

    ^C

     

    At least for this test, the adverse concurrency impact that we saw with SQLCLR--and reported in the previous post--is not observed.

     

    Although it’s not strictly an apples-to-apples comparison between this test without SQLCLR and the one described in the previous post with SQLCLR, the end-user experience is so different that it calls into question why SQLCLR does not seem to handle many concurrent exceptions as gracefully. I don’t have an answer.

     

    I have absolutely no knowledge of how SQLCLR works internally, and can’t explain the concurrency behavior observed in the previous post.

     

    By the way, when I set variable c to 1 in the TryCatch() method, thus avoiding the exception, its concurrency impact (or the lack of) did not change much, if at all.

     

    Anyway, here is the test program. For the output presented above, the program was compiled with .NET Framework 3.5.

     

    using System;

    using System.Diagnostics;

    using System.Threading;

     

    public partial class Test

    {

     

        public static void Main()

        {

            Stopwatch stop_watch = new Stopwatch();

     

            // warming up a bit

            for(int i = 0; i < 5; i++)

            {

                TryCatch();

            }

     

            // measure the elapsed time without any additional background threads

            for(int i = 0; i < 5; i++)

            {

                stop_watch.Reset();

                stop_watch.Start();

                TryCatch();

                stop_watch.Stop();

                Console.WriteLine("TryCatch() without any background thread = {0} ticks",

                                    stop_watch.ElapsedTicks);

     

                Thread.Sleep(2000);

            }

     

            Thread[] user_threads = new Thread[10];

     

            for(int i = 0; i < 10; i++)

            {

                user_threads[i] = new Thread(new ThreadStart(StartTryCatch));

                user_threads[i].Name = i.ToString();

                user_threads[i].IsBackground = true;

                user_threads[i].Start();

                Thread.Sleep(10);

            }

     

            Thread.Sleep(5000);

     

            // now measure the elapsed time again with 10 additional threads running

            for(int i = 0; i < 20; i++)

            {

                stop_watch.Reset();

                stop_watch.Start();

                TryCatch();

                stop_watch.Stop();

                Console.WriteLine("TryCatch() with 10 background threads = {0} ticks",

                                    stop_watch.ElapsedTicks);

     

                Thread.Sleep(2000);

            }

     

            // this will never be reached. You have to Ctrl-C to stop the program

            for (int i = 0; i < 10; i++)

            {

                if (user_threads[i] != null)

                {

                    user_threads[i].Join();

                }

            }

        }

     

        static void StartTryCatch()

        {

            int i = 0;

     

            while (true)

            {

                if (i % 1000 == 0)

                {

                    Console.WriteLine("Thread {0} Called TryCatch() 1000 times.", Thread.CurrentThread.Name);

                }

     

                TryCatch();

                Thread.Sleep(10);

     

                i++;

            }

     

        }

     

        static void TryCatch()

        {

            int c = 0;

     

            for(int i = 0; i < 200000; i++)

            {

                try { c= i/c; } catch (Exception) { c = 1;}

            }

        }

    };

     

  • Performance impact: too many try/catch blocks may be harmful in SQLCLR

    If you have many try/catch blocks in your .NET code and your code execution actually passes through them, you should expect a performance hit. That’s intuitive and no surprise.

     

    What is surprising is the extent to which you may experience severe performance degradation when multiple users are executing a piece of SQLCLR code (e.g. calling a SQLCLR table-valued function) that contains many try/catch blocks. In other words, there may be significant and adverse concurrency implications if you use too many try/catch blocks in your SQLCLR code.

     

    Here is an example to prove it!

     

    I have implemented two SQLCLR table-valued functions: tvf_GetDataWithTryCatch() and tvf_GetDataWithoutTryCatch(). These two functions return exactly the same resultset: 2000 rows with 100 columns of integers. The functions simulate the case where the integer values are out of bound (i.e. greater than the maximum value allowed for the integer datatype, which is SqlInt32.MaxValue in the attached C# code), and when that happens, the functions return NULL instead.

     

    There are many ways to check whether a value is equal to or greater than SqlInt32.MaxValue. As the name implies, tvf_GetDataWithTryCatch() implements the check in a try block and returns NULL in the corresponding catch block. The tvf_GetDataWithoutTryCatch() function, on the other hand, implements the check with a C# conditional operator (?:).

     

    When there is no background concurrent load

     

    The tvf_GetDataWithTryCatch() function is expected to be slower than the tvf_GetDataWithoutTryCatch() function because of the inherent overhead of a try/catch block, and tests show that is indeed the case. On my rather antiquated PC with 2GB of physical memory, the following query

     

    select * from dbo.[tvf_GetDataWithTryCatch]()

     

    returns all the 2000 rows in about 8 seconds, whereas this query

     

    select * from [dbo].[tvf_GetDataWithoutTryCatch]()

     

    returns in about 1 second. Consistently, the function doing 200,000 try/catch’es is about eight times slower than the function doing 200,000 conditional operations. So far so good! No surprises!

     

    When there is background concurrent load

     

    When I have 10 users running tvf_GetDataWithoutTryCatch() nonstop against the same SQL Server 2008 instance in the background, the tvf_GetDataWithoutTryCatch() returns in about 2 seconds consistently. So running additional load of tvf_GetDataWithoutTryCatch() does slow it down a bit, but not a lot.

     

    Now, when I have 10 users running tvf_GetDataWithTryCatch() nonstop in the background against the same SQL Server 2008 instance, the performance of tvf_GetDataWithTryCatch() degrades dramatically. This time it takes about 76 seconds to return all the 2000 rows. That is about 10 times slower than when there is no concurrent load, and about 38 times slower than tvf_GetDataWithoutTryCatch() under the same 10-user background load. Note that tvf_GetDataWithoutTryCatch() returns in about 2 seconds regardless of whether the 10-user background load is calling tvf_GetDataWithTryCatch() or tvf_GetDataWithoutTryCatch().
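
    If you don't want to write a client program to drive the background load, one crude way to approximate it (my own shortcut, not necessarily how the attached test harness does it) is to open 10 query windows and run a loop like this in each:

    -- keep calling the try/catch TVF nonstop; count(*) avoids flooding the client with rows
    declare @cnt int;

    while 1 = 1
       select @cnt = count(*) from dbo.tvf_GetDataWithTryCatch();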

     

    If you think 38 times slower is bad, in real production environments I have seen far worse concurrency impact as a result of too many try/catch blocks.

     

    With the attached C# code, you can run the tests yourself to see the behavior. Your results may differ in some details, but you should see a similar pattern. If not, post back a comment and let us know.

     

  • Performance impact: SqlBulkCopy.WriteToServer() and interesting space allocation behavior

    After reading Adam Machanic’s comment to my previous post, I started to wonder if what I saw was all due to the fact that SQL Server blindly chooses to allocate at least one extent in response to each call to WriteToServer(), perhaps in its zeal to achieve minimal logging.

     

    Such an algorithm sounds a bit crazy, or even dumb. But anything is possible.

     

    To check if this is indeed the case, I wrote the simple test program below with most of the parameters hard-coded. The only exception is the payload, i.e. the number of rows that WriteToServer() copies to the server. BTW, the target database needs to be in either the bulk-logged or simple recovery model.

     

    using System;

    using System.Collections;

    using System.IO;

    using System.Data;

    using System.Data.SqlClient;

     

    class Loader

    {

        static void Main(string[] args)

        {

            int payload = 0;

            try

            {

                payload = int.Parse(args[0]);

            }

            catch

            {

                Console.WriteLine("***Err: The argument must be an integer.");

                Environment.Exit(-1);

            }

     

            int row_counter = 0;

            int write_2_server_counter = 0;

            int extent_counter = 0;

           

            using (SqlConnection bcp_conn = new SqlConnection("server=myServer;database=myDB;Integrated Security=SSPI;"),

                                     conn = new SqlConnection("server=myServer;database=myDB;Integrated Security=SSPI;"))

            {

                try

                {

                    bcp_conn.Open();

                    conn.Open();

                   

                    try

                    {

                        SqlCommand cmd = new SqlCommand();

                        cmd.Connection = conn;

                        cmd.CommandType = CommandType.Text;

                        cmd.CommandText = "if object_id('dbo.junk') is not NULL " +

                                              " drop table junk";                   

                        cmd.ExecuteNonQuery();

     

                        cmd.CommandText = "CREATE TABLE dbo.junk( " +

                                               " c_id int NOT NULL, " +

                                               " c_data char(950) NOT NULL)";

                        cmd.ExecuteNonQuery();

                    }

                    catch(Exception e)

                    {

                        Console.WriteLine(e.ToString());

                            Environment.Exit(-1);

                    }

                   

                    

                    DataTable dt = new DataTable();

                    dt.MinimumCapacity = 20000;

     

                    DataColumn c_id = new DataColumn();

                    c_id.DataType = System.Type.GetType("System.Int32");

                    c_id.ColumnName = "c_id";

                    dt.Columns.Add(c_id);

     

                    DataColumn c_data = new DataColumn();

                    c_data.DataType = System.Type.GetType("System.String");

                    c_data.ColumnName = "c_data";

                    dt.Columns.Add(c_data);

     

                    using (SqlBulkCopy bcp = new SqlBulkCopy(bcp_conn, SqlBulkCopyOptions.TableLock, null))

                    {

                        bcp.DestinationTableName = "dbo.junk";

                        bcp.BatchSize = 20000;

                        bcp.BulkCopyTimeout = 30000000;

     

                        for (int i = 1; i <= 1000; i++)

                        {

                            row_counter++;

                            DataRow row = dt.NewRow();

     

                            row["c_id"] = i;

                            row["c_data"] = "12345667890abcdefgh";

                            dt.Rows.Add(row);

     

                            if (row_counter % payload == 0)

                            {

                                bcp.WriteToServer(dt);

                                write_2_server_counter++;

                                extent_counter = GetExtentCount(conn);

                                Console.WriteLine("{0},{1}", write_2_server_counter, extent_counter);

                                dt.Rows.Clear();

                            }

     

                        }

                        if (dt.Rows.Count > 0)

                        {

                            bcp.WriteToServer(dt);

                            dt.Rows.Clear();

                        }

                    }  // using

     

                }

                catch(Exception e)

                {

                    Console.WriteLine(e.ToString());

                    Environment.Exit(-1);

                }

            }

        }  // Main()

     

        static int GetExtentCount(SqlConnection conn)

        {

            int extent_count = 0;

            try

            {

                SqlCommand cmd = new SqlCommand();

                cmd.Connection = conn;

                cmd.CommandType = CommandType.Text;

                cmd.CommandText = "select cast(u.total_pages/8 as int) " +

                            " from sys.partitions p join sys.allocation_units u " +

                                  " on p.partition_id = u.container_id " +

                                  " where p.object_id = object_id('junk')";

                extent_count = (int) cmd.ExecuteScalar();

            }

            catch(Exception e)

            {

                Console.WriteLine(e.ToString());

                    Environment.Exit(-1);

            }

       

            return extent_count;

        }

       

    }  // class Loader

     

    If you run this program (and I have run it against a SQL Server 2008 instance), you’ll find that as long as the payload parameter value you supply on the command line is 64 or less, SQL Server will allocate a new extent for every call to WriteToServer(), no matter how little data is being written to the server and no matter how much free space is already allocated to the table.

     

    This behavior is utterly striking!!!

     

    If the payload parameter value is 65, the amount of data no longer fits on one extent, and understandably SQL Server needs to allocate--and does allocate--two extents for each WriteToServer().

     

    I’m all for reducing the amount of transaction logging whenever appropriate. Furthermore, if this space cost is the result of my explicitly configuring something to favor minimal logging, I’m willing to live with the consequence. But in this test nowhere did I say that I wanted minimal logging at any cost.

     

    One may argue that this is an edge case because people are not expected to--or should not--call WriteToServer() with a small payload. But regardless, at least the potential space implication of doing so should have been clearly made known so that people can explicitly protect their code from falling into it. And in the software business, it’s often the edge cases that trip people up.

  • Performance impact: how to waste time and space with SqlBulkCopy.WriteToServer()?

    If you have some time to kill, you could try to load a tiny amount of data, say ~3GB or 3,000,000 rows, with SqlBulkCopy, but do remember to give the WriteToServer method a very small payload every time it is called. The test results reported in my previous post suggest that once you have started loading, you might as well go for a lunch break.

     

    It turns out wasting space is just as easy. In fact, wasting a great amount of space is as simple as calling SqlBulkCopy.WriteToServer with a small payload. All you need to do is direct that payload towards a heap table.

     

    How much space can you waste? Well, how about SQL Server allocating more than 90GB of the storage space in order to import about 3GB of data?

     

    I ran a series of data loading tests with SqlBulkCopy against the same customer table used in my previous post. Each test loaded 3,000,000 rows into the customer table without any index. The database was pre-allocated 200GB of space, and the customer table was the only user table. The test data was generated in a C# program and added to a DataTable object, which was then passed to the WriteToServer method to be copied to the customer table on the SQL Server 2008 instance. This data loader program can be configured to run the bulk importing either single threaded or in any number of threads concurrently into multiple non-overlapping ranges.

     

    The tests were first divided into two groups. The first group of tests was to load the data in a single thread, and the second group of tests in 10 threads. In each group, tests were run with different levels of payload. More specifically, I tested populating the DataTable object with the following number of rows before calling the WriteToServer method with that object:

     

    2, 5, 10, 20, 50, and 100

     

    Before presenting the results, let me establish the frame of reference with the results from calling WriteToServer with a payload of 2,000 rows:

     

    Payload (# of rows)   Threads   Duration (seconds)   Data (GB)   Reserved (GB)   Unused (GB)
    2,000                 1         289                  2.4         3.8             1.4
    2,000                 10        94                   2.4         3.8             1.4

     

    Okay, there was a huge performance difference between loading the data in a single thread versus loading the data in 10 threads. But I’d like to turn your attention to the space usage. At the completion of loading the data, I looked at how much space was actually used by the table data, how much space was reserved by SQL Server for the table, and how much space was allocated to the table but not used.

     

    The above table shows that with payload = 2,000 rows, there was 1.4GB of unused space allocated to the table in both cases.

     

    Now take a look at the following table, and pay particular attention to the Reserved and Unused numbers for the smallest payloads:

     

    Payload (# of rows)   Threads   Duration (seconds)   Data (GB)   Reserved (GB)   Unused (GB)
    2                     1         10,228               12          96              84
    5                     1         4,074                4.8         38              33.2
    10                    1         2,206                2.4         19.2            16.8
    20                    1         1,267                2.4         9.6             7.2
    50                    1         574                  2.4         3.8             1.4
    100                   1         385                  2.4         3.8             1.4
    2                     10        1,998                10          87              77
    5                     10        872                  4.8         38              33.2
    10                    10        441                  2.4         19              16.8
    20                    10        240                  2.4         9.6             7.2
    50                    10        135                  2.4         3.8             1.4
    100                   10        105                  2.4         3.8             1.4

     

    The amount of space wasted was staggering when the payload was 2 rows or 5 rows.

     

    For instance, SQL Server had to allocate a whopping 96GB in order to load the 3,000,000 rows, or about 3GB worth of data, into the customer table when WriteToServer had to copy 2 rows at a time. The good news is that this appears to be an anomaly only when the payload is extremely low. As shown in the above two tables, once the payload level reached 50 rows, SQL Server did not need to allocate more than 3.8GB of storage space.

     

    The number of threads made little difference in terms of how much space was wasted.

     

    The extra space is quite easy to reclaim by creating a clustered index. The potential problem, however, is that you could run out of disk space during the bulk load if you didn't know about this behavior. In addition, if you have not pre-allocated a lot of free space, bulk loading with a small payload can cause the database file to autogrow, further degrading performance.
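
    Concretely, the reclaim step can be as simple as this (a sketch assuming the customer table from these tests; the key columns are just an example, and the index can be dropped afterwards if you don't want to keep it):

    -- rebuilding the heap as a clustered index rewrites the rows into densely packed pages
    create clustered index cix_customer on dbo.customer (c_w_id, c_d_id, c_id);
    -- drop index cix_customer on dbo.customer;   -- optional, once the space is reclaimed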

     

    So why would SQL Server waste so much space in the case of a very small payload for WriteToServer? I don’t know enough about the internals of the SQL Server space allocation algorithm and/or SqlBulkCopy to offer an answer. But it’s revealing to check how the pages were allocated and used in some of these outrageous cases. For instance, when the payload was 2 rows, each database extent only had a single used page on average, and in each used page only 17% was occupied with the customer table rows.

     

    The space usage picture was slightly better when the payload was 5 rows. While it remained the same that one page was used per extent, the average density of the page was higher at about 42%.

     

    When the payload was bumped up to 50, the used page count per extent increased to 5 and the page density was respectable at about 85%.
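
    For the curious, the per-page density numbers quoted above can be pulled from sys.dm_db_index_physical_stats in DETAILED mode; something like this (a sketch, again assuming the customer table from these tests):

    -- avg_page_space_used_in_percent shows how full the heap pages are
    select index_type_desc,
           page_count,
           avg_page_space_used_in_percent
      from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.customer'),
                                          null, null, 'DETAILED');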

     

    Whatever the rationale for this behavior may be, SQL Server does not seem to handle space allocation efficiently when the WriteToServer payload is extremely small.

     

    The solution? Well, just avoid small payloads when using SqlBulkCopy. In my own test environment, that means I don’t want the WriteToServer method to keep copying fewer than several hundred rows at a time.

  • Performance impact: What is the optimal payload for SqlBulkCopy.WriteToServer()?

    For many years, I have been using a C# program to generate the TPC-C compliant data for testing. The program relies on the SqlBulkCopy class to load the data generated as per the TPC-C specifications into the SQL Server tables. In general, the performance of this C# data loader is satisfactory. Lately however, I found myself in a situation where I needed to generate a much larger amount of data than I typically do and the data needed to be loaded within a confined time frame. So I was driven to look into the code more carefully to see if it could be significantly sped up.

     

    Among other things, I became curious about the performance impact of the amount of data copied by the WriteToServer(DataTable) method. I’ll call this amount of data the WriteToServer payload in this post, or simply the payload.

     

    I was a bit lazy when I first coded the C# data loader, and not knowing the full implications, I set the payload to the same value I assigned to the BatchSize property. That proved to be terribly inconvenient when it came to tuning the loader’s performance.

     

    By the way, in the Microsoft Windows SDK documentation, the descriptions for WriteToServer(DataTable) and BatchSize are as follows:

     

    WriteToServer

    Copies all rows in the supplied DataTable to a destination table specified by the DestinationTableName property of the SqlBulkCopy object.

    BatchSize

    Number of rows in each batch. At the end of each batch, the rows in the batch are sent to the server. If UseExternalTransaction is not in effect, each batch of rows is inserted as a separate transaction.

     

    Going off topic a bit: the BatchSize description says that “At the end of each batch, the rows in the batch are sent to the server.” This is at least confusing. I could be wrong, but setting BatchSize alone will not cause the SqlBulkCopy object to initiate any action of sending rows over the wire to the server.

     

    The question is how many rows should we have in the DataTable object before we send them over to the server with WriteToServer? In other words, how should we size the WriteToServer payload?

     

    The optimal setting no doubt depends on the configuration of the environment. However, intuitively there is overhead in sending data across the wire, and you would not want the payload-to-overhead ratio to be too low. On the other hand, you probably don’t want to wait an excessive amount of time just for the DataTable object to be populated with a very large payload before you send its content. So there should be a happy medium somewhere, we hope.

     

    Some tests should help give us a more precise idea.

     

    I used the TPC-C customer table for my tests. The customer table was defined as follows:

     

    (

        c_id                int,

        c_d_id              tinyint,

        c_w_id              int,

        c_first             char(16),

        c_middle            char(2),

        c_last              char(16),

        c_street_1          char(20),

        c_street_2          char(20),

        c_city              char(20),

        c_state             char(2),

        c_zip               char(9),

        c_phone             char(16),

        c_since             datetime,

        c_credit            char(2),

        c_credit_lim        numeric(12,2),

        c_discount          numeric(4,4),

        c_balance           numeric(12,2),

        c_ytd_payment       numeric(12,2),

        c_payment_cnt       smallint,

        c_delivery_cnt      smallint,

        c_data              char(500)

    )

     

    For each of the tests, three million rows were loaded into the customer table without any index, and the following row numbers were tested for the WriteToServer method:

     

    2, 20, 100, 200, 2,000, 20,000, and 3,000,000

     

    Three million rows amounted to about 2.2GB of data after they were all loaded into the customer table. This amount of test data was small compared to the amount of data I needed to load, but it was sufficient for the test purpose, and it helped reduce the test time. In addition, for all the tests, SqlBulkCopy.BatchSize was set to 20,000.

     

    The other test parameter considered was the number of threads in loading the data concurrently into non-overlapping ranges. I looked at loading the three million rows single threaded versus in 10 threads.

     

    The results are in the following table:

     

    Threads   Payload (WriteToServer row number)   Load Duration (seconds)
    1         2                                    >1440
    1         20                                   ~1440
    1         100                                  397
    1         200                                  300
    1         2,000                                289
    1         20,000                               278
    1         3,000,000                            270
    10        2                                    1,998
    10        20                                   238
    10        100                                  105
    10        200                                  90
    10        2,000                                94
    10        20,000                               100
    10        3,000,000                            N/A

     

    So, at least for my test environment, it does not look like a good idea for WriteToServer to send 20 or fewer rows to the server each time. We should stock up the payload a bit more before each trip.

     

    The loader program was run on one of the nodes in a two-node cluster, and the SQL Server instance was on the other node. The network in between was 1Gbps. There was more than sufficient bandwidth and the latency was low. With extremely low payload (i.e. when the DataTable object only had 2~20 rows), the network bandwidth was barely used (4% or so).

     

    A salient point to note from the test result is that once the payload crossed a certain point, for instance 200 rows in these tests, the exact payload size no longer mattered much. For instance, it made little difference whether it was 2000 rows or 20,000 rows.

     

    Loading data concurrently did help to mask the penalty of a very low payload. For instance, when the data was loaded in a single thread, setting payload to 20 would cause the loader to take 1440 seconds to load the three million rows, whereas the same payload with 10 threads dramatically reduced the load time to 238 seconds.

     

    In the next post, I’ll look at a few other implications of the payload setting more closely.

  • Linked servers: set them up in QA

    How do you set up linked servers in your QA environment? You set them up the same as you do in the production environment, right? Why even ask?

    Correct. There is no difference in the actual steps of setting up a linked server between your QA and prod. However, consider the following typical code fragment involving a linked server:

     SELECT abc FROM NYCSQL01.inventory.dbo.my_table;

    where NYCSQL01 is a linked server pointing to a production SQL Server instance, most likely also named NYCSQL01. Let's further assume that you name all your production SQL instances with the NYC prefix and all your QA instances with a QA prefix. In this case, when you test the code that includes the above fragment, you have a choice when setting up the linked server on your QA database server: (1) name the linked server QASQL01, or (2) name the linked server NYCSQL01. With the second choice, you need to make sure NYCSQL01 actually points to QASQL01 (assuming that QASQL01 is the QA server for NYCSQL01). For obvious reasons, the second choice is much preferable. The most important reason is that you don't need to alter your code when testing it in QA.
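
    In other words, on the QA instance you create a linked server that is named after production but whose data source is the QA server. A sketch (the provider shown is just the common SQL Server Native Client default; adjust as needed):

    -- the linked server NAME matches production, the data source points at QA
    exec sp_addlinkedserver
         @server     = 'NYCSQL01',
         @srvproduct = '',
         @provider   = 'SQLNCLI',
         @datasrc    = 'QASQL01';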

    But it doesn't really stop there. If you want to simplify your life, you should make sure that any reference to NYCSQL01 in your QA environment actually points to QASQL01 by default, for at least two reasons. First, no code or app in QA has any business referencing a production instance such as NYCSQL01. Well, that's not 100% true; in the rare cases where you do need to reference NYCSQL01 as NYCSQL01, a special arrangement should be made for that to happen (not by default). Second, linked servers are not the only things that may reference NYCSQL01. The app being tested in QA may have other components, such as an SSIS package, a batch file, or a connection from a piece of CLR code, that access NYCSQL01. And when the app is being tested in QA, you want all those references to access QASQL01 instead, without changing any of your code. One way to make this happen is to modify the hosts file in C:\Windows\system32\drivers\etc to map NYCSQL01 to the IP address of QASQL01.

    Is this how you set up your QA environment?

  • Performance impact: The cost of doing small lookups in a large batch update

    Lookup tables are widely used in database applications for good reasons. Usually, a lookup table has a small number of rows and looking it up with a join is fast, especially when the table is already cached.

    Recently, I needed to update every row in many relatively large tables, each of which was identically structured, had ~25 million rows, and was ~30GB in size. The tables were denormalized to include both a lookup index column (i.e. CategoryID, an integer) and the corresponding lookup value column (i.e. CategoryName, a char(50)). The batch update I was performing was to ensure that the CategoryName column of these tables had the correct matching value. The CategoryID-to-CategoryName mapping was defined in a small lookup table, CategoryLookup, with 10 rows.

    Question

    What would be the most efficient method to perform this batch update?

    Three lookup methods

    For the batch update scenario described above, you have three alternatives to look up the CategoryName values (assume that the table to be updated is called Transactions):

    • Method 1—The inline CASE method, which performs an inline lookup with a CASE expression in the UPDATE statement. Okay, this is not really a lookup. But this method provides a baseline for comparison.

    UPDATE Transactions

         SET CategoryName = CASE

                   when CategoryID = 1  then 'abc1'

                   when CategoryID = 2  then 'abc2'

                   when CategoryID = 3  then 'abc3'

                   when CategoryID = 4  then 'abc4'

                   when CategoryID = 5  then 'abc5'

                   when CategoryID = 6  then 'abc6'

                   when CategoryID = 7  then 'abc7'

                   when CategoryID = 8  then 'abc8'

                   when CategoryID = 9  then 'abc9'

                   when CategoryID = 10 then 'abc10'

                 END

     

    • Method 2—The JOIN method, which relies on joining the Transactions table and the CategoryLookup table to do the lookup.

      UPDATE t1

         SET t1.CategoryName = t2.CategoryName

        FROM Transactions t1 JOIN CategoryLookup t2

              ON t1.CategoryID = t2.CategoryID

     

    • Method 3—The subquery method, which does a lookup with a subquery. Clearly, there is a join in the subquery.

      UPDATE Transactions

         SET CategoryName =

              (SELECT CategoryLookup.CategoryName

                 FROM CategoryLookup

                WHERE CategoryLookup.CategoryID=Transactions.CategoryID ) 

     

    You can also do the lookup with a scalar function. But it’s so horrifically inefficient that you should not seriously consider it, and it’s not interesting to include in this discussion. In addition, you could do the lookup with an inline table-valued function, which has a performance profile similar to that of the inline CASE method.

    It should be highlighted that method 2 (the JOIN method) and method 3 (the Subquery method) are not semantically identical. For instance, if the Transactions table has a CategoryID value that is not present in the CategoryLookup table, the Subquery method will, if permitted, set the CategoryName column to NULL, or the update will fail if NULL is not permitted, whereas the JOIN method will leave the CategoryName value unchanged. For the scenario we are interested in, the results of these two methods are identical: all the CategoryID values in the Transactions table are also in the CategoryLookup table, and the mapping from CategoryID to CategoryName in the CategoryLookup table is one to one.

    I ran a series of controlled tests that mimicked the update scenario described previously. To keep the tests more manageable, I used a smaller and artificially created Transactions table that had 5,000,000 rows and was ~5GB in size. You can find the DDLs and the test script at the bottom of this post.

    Test results and practical implications

    I made sure that the results shown below were stable in that (1) they were taken from 50 repeated tests with a small number of outliers thrown out, and (2) the remaining results were inspected and made sure that the variances were relatively small among them and the values exhibited a consistent pattern.

    Clearly, if you do a massive number of lookups (like what I did in this test), the cumulative cost can be quite visible. In fact, in this test using an inline CASE expression was more than twice as fast as lookups using either a subquery or a straight join.  As the number of rows increases, you can expect to see this difference (or the cost of doing lookups) grow more prominent. So, if you are doing a very large batch update, it’s definitely worth replacing the table lookups with an inline CASE expression for better performance.

    The difference between the CASE method and the table lookups (either the Subquery method or the JOIN method) remained stable across different test environments. But the difference between the Subquery method and the JOIN method was more subtle. In fact, if you run the same test in a different environment, you may see different relative performance between them. In some environments, the Subquery method can perform significantly better than the JOIN method.

    Although there was a significant performance penalty when using Subquery or JOIN lookups in a massive update, this does not mean you should jettison lookups in your individual transactions. Because the marginal cost of an individual lookup is infinitesimally small compared to many other performance-related factors, you’d lose much more in terms of code reuse, flexibility, and so on if you started to embed ‘lookups’ inline. To emphasize the point, note that the difference between the CASE method and the Subquery method in the test was ~34 seconds. Dividing 34 seconds by the 5,000,000 lookups the update performed gives 6.8 microseconds as the marginal cost of an individual lookup.

    There is no surprise that avoiding a massive number of table lookups could give you better performance. But it’s still good to be able to appreciate it with some concrete numbers. My update of all those 25-million-row tables mentioned at the beginning of this post took more than 10 hours to complete, and I used the Subquery method. Had I had the results reported here, I could have finished the same update process in five hours. That would have been a very nice saving!

    Test setup

    The lookup DDL and data:

    drop table CategoryLookup

    go

    create table CategoryLookup(CategoryID int, CategoryName char(20))              

    go

    with tmp(a, b) as (

       select 1, 'abc' + cast(1 as varchar(5))

       union all

       select a+1, 'abc' + cast(a+1 as varchar(5))

         from tmp

        where a < 10

    )

    insert CategoryLookup

    select * from tmp

    go

    create clustered index cix_CategoryLookup on CategoryLookup(CategoryID)

    go

    The Transactions test table DDL and data:

    drop table Transactions

    go

    create table Transactions(CategoryID int,

                      CategoryName char(50),

                      filler char(1000))

    go

    set nocount on

    go

    declare @i int

    set @i = 1

     

    begin tran

    while @i <= 5000000

    begin

         insert Transactions

         select @i % 10 + 1, 'abc', 'filler'

        

         if @i % 100000 = 0

         begin

             commit tran

             begin tran

         end

         set @i = @i + 1

    end

    if @@trancount > 0

       commit tran

    go

    sp_spaceused Transactions

    go

     

    create clustered index cix_Transactions on Transactions(CategoryID)

    go

      

    drop table test_log -- this table is used to log the test times

    go

    create table test_log (

        Name        varchar(50),

        Num         int,

        StartTime   datetime,

        EndTime     datetime NULL

    ) 

    go

     

    The test script:

    set nocount on

    go

    declare @dt datetime,

            @i int

     

    set @i = 1

     

    while @i <= 20 -- run the test 20 times

    begin

       set @dt = getdate()

       insert test_log select 'CASE method', 10, @dt, NULL

     

       update Transactions

         set CategoryName = case

                   when CategoryID = 1  then 'abc1'

                   when CategoryID = 2  then 'abc2'

                   when CategoryID = 3  then 'abc3'

                   when CategoryID = 4  then 'abc4'

                   when CategoryID = 5  then 'abc5'

                   when CategoryID = 6  then 'abc6'

                   when CategoryID = 7  then 'abc7'

                   when CategoryID = 8  then 'abc8'

                   when CategoryID = 9  then 'abc9'

                   when CategoryID = 10 then 'abc10'

                 end

     

      update test_log

         set EndTime = getdate()

       where StartTime = @dt

     

     

      set @dt = getdate()

      insert test_log select 'Subquery method', 10, @dt, NULL

     

      update Transactions

         set CategoryName =

            (select CategoryLookup.CategoryName

              from CategoryLookup

             where CategoryLookup.CategoryID= Transactions.CategoryID ) 

     

      update test_log

         set EndTime = getdate()

       where StartTime = @dt

     

      set @dt = getdate()

      insert test_log select 'JOIN method', 10, @dt, NULL

     

      update t1

         set t1.CategoryName = t2.CategoryName

        from Transactions t1

             join CategoryLookup t2 on t1.CategoryID = t2.CategoryID

     

      update test_log

         set EndTime = getdate()

       where StartTime = @dt

     

      set @i = @i +1

    end
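
    Once the runs complete, a simple aggregation over test_log gives the average duration per method (this summary query is my own addition, not part of the original script):

    select Name,
           count(*)                                  as runs,
           avg(datediff(second, StartTime, EndTime)) as avg_seconds
      from test_log
     where EndTime is not null
     group by Name
     order by avg_seconds;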

    The reported results were obtained on a DL585 G1 with 64GB of RAM and eight 2.6GHz cores, running Windows Server 2003 Enterprise Edition and SQL Server 2008 SP2 Enterprise x64 Edition. 50GB was allocated to the SQL Server instance.

     

  • There are no sequential or random disk I/Os

    Okay, it’s a bit of a hyperbole. But let me elaborate.

    I regularly bump into SQL Server discussions, online or elsewhere, in which I hear people speaking with confidence that an operation is so and so because it is doing sequential disk I/Os or because it is doing random disk I/Os. That always makes me wonder how they know and how they can be so confident about that. In most cases, I absolutely have no idea whether the disk I/Os are sequential or random.

    It makes sense to talk about I/Os issued by SQL Server being sequential or random at the SQL Server level. For instance, if a SQL Server operation (e.g. a table scan or a backup) accesses the database pages in file page number order, it’s considered to be issuing sequential I/Os, and SQL Server certainly has control over that. Similarly, if you write a program (outside of SQL Server, in the OS) to access a file, you can control how your I/O requests are issued with respect to how the data is laid out in that file. Whether the I/Os are sequential or random is relative to that file. No question there!

    However, in today’s storage environments, there is usually a very long path and many layers of API calls between the SQL Server operations we are interested in and the I/Os on the physical drives (i.e. spindles and platters). It’s rare that you can control how your I/O requests at the application level are serviced at the physical disk level. In a lot of cases, you don’t have any visibility into how your I/O requests are eventually handled on the disk platters.

    In addition, unlike other I/O access characteristics, whether I/Os are sequential or random, or to what extent they are sequential or random, is almost never exposed as any measurable metric. If you can’t measure or monitor whether I/Os are sequential or random, you can’t be so sure whether they are sequential or random. I don’t know where people get their confidence in talking about sequential disk I/Os or random disk I/Os.

    All is not lost though. For all intents and purposes of discussing sequential vs. random disk I/Os, we can turn our attention to large vs. small disk I/Os.

    First of all, disk I/O sizes are measurable metrics and they are readily exposed by many tools.
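    For example, from within SQL Server you can approximate the average I/O size per database file with sys.dm_io_virtual_file_stats (a rough sketch; the counters are cumulative since the instance started):

    select db_name(database_id)                            as db,
           file_id,
           num_of_bytes_read    / nullif(num_of_reads, 0)  as avg_read_bytes,
           num_of_bytes_written / nullif(num_of_writes, 0) as avg_write_bytes
      from sys.dm_io_virtual_file_stats(null, null);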

    In addition, when people speak of a database system such as SQL Server being optimized for sequential I/Os, they really mean that the system is optimized for large I/Os. Even if they didn't mean it, you are free to interpret it as such. Looking from an I/O perspective, you can view key database techniques such as database checkpoints, read-ahead reads, and transaction logging as methods to optimize for large disk I/Os (among their other purposes).

    On a disk platter, fewer sequential disk I/Os are indeed much more efficient than many random disk I/Os for accessing the same amount of data. The bet is that when you make a single large I/O request at the application level (e.g. SQL Server or your own code), it has a high probability of being translated into sequential disk I/Os at the lowest level (i.e. disk I/Os on the spindles and platters). Or at least, the chance of getting sequential disk I/Os is much higher for a large application-level I/O request than it is for multiple small I/O requests at the same application level. And that is a good bet!

    Personally, I pay more attention to the I/O size than I do whether disk I/Os are sequential or random.

  • Linked servers: how long do they stay connected?

    I was looking at some of my old notes on linked servers and found a tidbit on how the linked server connections are managed by SQL Server. I'm posting it here because I don’t think the information is widely known.

    When you make a linked server call from a SQL Server instance (say ServerA) to another SQL Server instance (say ServerB) over the SQL Server Native Client OLE DB provider, SQL Server on ServerA acts as a client to the instance on ServerB and will open or reuse a connection to ServerB. That connection is managed by the SQL Server instance on ServerA.

    If you check on ServerB with the following query, you should see that connection from ServerA (if it's still there):

    select * from sysprocesses
     where hostname = 'ServerA'
       and program_name = 'Microsoft SQL Server'

    And if no linked server call from ServerA is using the connection, its status will be ‘dormant’.

    The question is, “how long will ServerA keep the connection alive if no call is using it? And can you configure it?”

    I can’t find any official documentation that answers these two questions. But my own tests appear to yield a consistent answer to the first question: a dormant SQL Server linked server connection stays alive for about 4~5 minutes, and is closed after that. All my attempts to determine whether this number is configurable suggest that the answer is negative. If anyone knows a more authoritative answer, please post it.

    Here is a simple test to determine how long a dormant connection stays alive.

    On ServerB, use the previous query to ensure that there is no linked server connection from ServerA. If there is, kill the connection and ensure ServerA does not open a new one.

    On ServerB, run this script:

    declare @dt datetime

    -- Wait for the linked server connection from ServerA to show up
    while not exists (select * from sysprocesses
                       where hostname = 'ServerA'
                         and program_name = 'Microsoft SQL Server')
        waitfor delay '00:00:01'

    set @dt = getdate()

    -- Now wait for the dormant connection to be closed
    while exists (select * from sysprocesses
                   where hostname = 'ServerA'
                     and program_name = 'Microsoft SQL Server')
        waitfor delay '00:00:01'

    select 'Duration' = datediff(second, @dt, getdate())

     

    Go to ServerA and run the next script:

     

    select * from openquery(ServerB, 'select @@servername')


    I have consistently been getting Duration values between ~260 and ~300 seconds, which is about 4~5 minutes.
    If you have any test numbers of your own, or if you run this test, please post your findings.

  • Sybase ASE

    I sat in a Sybase ASE class last week for five days. Although it didn't cover the more advanced features introduced in the more recent versions of Sybase ASE, the class did touch on all the basics of administering Sybase ASE. While I was successful in suppressing any urge to openly compare Sybase ASE with Microsoft SQL Server in the class, I could not help making mental notes on the differences between the two database platforms.

     

    It's always interesting to look at how two DBMS platforms that share the same root went their own separate ways in handling the same or similar tasks. Here are some random notes I jotted down while sitting in the class.

     

    I had some limited experience with Sybase about 15 years ago, and had expected to see some major improvements in the basic tasks of managing a Sybase ASE instance. So I was surprised that very little had changed; it felt like a throwback to SQL Server 4.21a or 6.5. For those of you who never used SQL Server versions earlier than 7.0, you'd probably have a hard time understanding why having a good database backup file alone is not enough to get the database restored. Instead, you must first create the database with exactly the same layout it had when the backup was taken. If you don't know that layout, there is a good chance that you won't get your database back even if you have the backup. As a result, religiously backing up your master database and keeping your database creation scripts up to date become paramount, much more so than with SQL Server. Having seen a better alternative, this all seemed rather archaic to me.

     

    SQL Server 2005 and later implemented a feature that allows a database to come online to users as soon as the redo phase is finished during instance startup. A database on a Sybase ASE instance still needs to wait until both the redo phase and the undo phase are completed before it can accept user connections. I guess to help ease the pain, Sybase ASE allows the user to specify the order in which the databases on an instance are recovered. So if database XYZ is the most important, you could configure it to be recovered first, before all the other user databases. This is nice and would be good to have on SQL Server. But because SQL Server databases can come online faster, the feature is not as useful on SQL Server.

     

    Like SQL Server, Sybase ASE supports autogrowing the database space allocation, and relying on the feature is discouraged there just as it is on SQL Server. But because knowing exactly how space is allocated for a database is so much more critical on Sybase ASE than it is on SQL Server for recovery purposes, using autogrowth is much more dangerous on Sybase ASE.

     

    One nice thing about Sybase ASE is that all its sp_configure options are exposed in a plain-text configuration file. You can still use sp_configure just like you do on SQL Server, but all the configuration changes are saved to the configuration file. The main motivation for externalizing the sp_configure options to a configuration file is that on Sybase ASE you could easily mis-configure an option and cause the instance to fail to start. With the sp_configure options in a file, you can easily correct such a mistake and restart the instance without having to start it in some special mode just to correct the mistake. On SQL Server, this feature would be less useful because you can hardly mis-configure an sp_configure option in a way that causes the SQL Server instance to fail to start.
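
    For comparison, here is what the same kind of change looks like on SQL Server, where sp_configure persists the setting in system metadata rather than in an external text file (a minimal illustration; the memory value is made up for the example):

    exec sp_configure 'show advanced options', 1
    reconfigure
    go
    exec sp_configure 'max server memory (MB)', 49152   -- illustrative value only
    reconfigure
    go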

     

  • Linked servers and performance impact: Direction matters!

    When you have some data on a SQL Server instance (say SQL01) and you want to move the data to another SQL Server instance (say SQL02) through openquery(), you can either push the data from SQL01, or pull the data from SQL02.

     

    To push the data, you can run a SQL script like the following on SQL01, which is the source server:

     

    -- The push script
    -- Run this on SQL01
    use testDB
    go
    insert openquery(SQL02,
                     'select * from testDB.dbo.target_table')
    select * from source_table;

     

    To pull the data, you can run a SQL script like the following on SQL02, which is the target server:

     

    -- The pull script
    -- Run this on SQL02
    use testDB
    go
    insert target_table
    select * from openquery(SQL01,
                            'select * from testDB.dbo.source_table')

     

    The question is: does it make a difference whether you push the data or pull the data? The answer is absolutely yes; which method you use makes a huge performance difference. Let’s demonstrate the difference with some tests.

     

    First, we need to create two identically-structured tables, one on SQL01 and the other on SQL02. The script for the source table is as follows:

     

    -- Run this on SQL01
    use testDB
    go
    create table source_table (
      i int,
      filler char(200))
    go

     

    The table script for the target table is as follows:

     

    -- Run this on SQL02
    use testDB
    go
    create table target_table (
      i int,
      filler char(200))
    go

     

    Then, let’s populate the source table with 50,000 rows:

     

    use TestDB
    go
    ;with tmp as (
      select 1 as i, replicate('a', 200) as filler
      union all
      select i+1, replicate('a', 200)
        from tmp
       where i < 50000
    )
    insert source_table
    select i, filler from tmp
    option (maxrecursion 0)
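
    Each run can be timed with a simple datediff() wrapper like the sketch below (shown for the push side, to be run on SQL01 after truncating target_table on SQL02; the pull script can be wrapped the same way on SQL02):

    -- A minimal timing wrapper for the push test (a sketch)
    declare @start datetime = getdate();

    insert openquery(SQL02, 'select * from testDB.dbo.target_table')
    select * from source_table;

    select datediff(second, @start, getdate()) as duration_seconds;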

     

    Once we have populated the source table with test data, we can take turns running the push script and the pull script shown above. I ran the tests several times in my test environment, each time with the target table truncated first. Here are the results:

     

    Method    Duration (seconds)
    ------    ------------------
    Push      240
    Pull      2

     

    The pull method was more than 100 times faster in this test environment. I ran the tests many times. Sometimes the difference was smaller, and other times it was larger, but these numbers seem to be a good enough average. Regardless, there is no mistaking that the performance difference was profound. I was originally going to plot a bar chart, but with these numbers the chart would look ridiculous!

     

    Why is the performance difference so large?

     

    To see the reason, we need to fire up SQL Profiler on both sides and watch the SQL traffic while running the tests. In the case of the pull method, you’ll see that a single SQL statement is executed across the linked server (i.e. on SQL01), and the result set is pulled over from SQL01 to SQL02.

     

    In the case of the push method, you may see the following call executed 50,000 times across the linked server (with different specific values each time):

     

    exec sp_cursor 180150005,4,0,N'[testDB].[dbo].[target_table]',@i=4167,@filler='aaaaaaaaaaaaaaaaaaaa'

     

    This executes something like the following:

     

    INSERT [testDB].[dbo].[target_table]([i],[filler])VALUES(@Param000004,@Param000005)

     

    In other words, the data is pushed one row at a time from SQL01 to SQL02. That’s 50,000 round trips between SQL01 and SQL02. Not efficient at all! In addition, the larger the source table, the more pronounced the inefficiency becomes.
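
    As a rough sanity check, the 240-second push divided by 50,000 rows works out to roughly 4.8 milliseconds per row, which is a plausible cost for one parameterized cursor insert plus its network round trip, and it lines up with the measured gap between the two methods.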

  • Linked servers: permissions and distributed query performance

    In an earlier post, I highlighted that linked server security configuration can have a huge performance impact, complained that this was not properly documented in SQL Server Books Online, and filed a Connect item for it. The good news is that in Books Online for SQL Server 2008 R2, Microsoft has revised the documentation and included the following guideline for using distributed queries:

     

    To create the best query plans when you are using a table on a linked server, the query processor must have data distribution statistics from the linked server. Users that have limited permissions on any columns of the table might not have sufficient permissions to obtain all the useful statistics, and might receive a less efficient query plan and experience poor performance. If the linked server is an instance of SQL Server, to obtain all available statistics, the user must own the table or be a member of the sysadmin fixed server role, the db_owner fixed database role, or the db_ddladmin fixed database role on the linked server.

    This is a major improvement. To drive it home, however, it would be nice to see the impact in action. In my presentation at the recent SQL Saturday #59 in NYC, I demonstrated, with a simple example, that the performance of a distributed query can be significantly impacted by the user's permissions. Unfortunately, due to the time limit, I could not go into great detail during the demo. Let me describe the example in this post.

    To make the example work, some assumptions are in order:

    ·   You have two SQL Server 2005 (or 2008) instances: SQL01 and SQL02. There is a user database called TestDB on SQL02.
    ·   You have a SQL login called bob on SQL02, which is granted access to TestDB as a user in the public role only.
    ·   A linked server called SQLBOB is created on SQL01 as follows:

     

    EXEC master.dbo.sp_addlinkedserver @server=N'SQLBOB',
         @srvproduct=N'',
         @provider=N'SQLNCLI',
         @datasrc=N'SQL02',
         @catalog=N'TestDB'
    EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'SQLBOB',  -- the linked server name, not the remote host
         @useself=N'False',
         @locallogin=NULL,
         @rmtuser=N'bob',
         @rmtpassword='########' -- replace with the real password for bob
    GO
    EXEC master.dbo.sp_serveroption @server=N'SQLBOB',
         @optname=N'data access', @optvalue=N'true'
    GO
    EXEC master.dbo.sp_serveroption @server=N'SQLBOB',
         @optname=N'rpc', @optvalue=N'true'
    GO
    EXEC master.dbo.sp_serveroption @server=N'SQLBOB',
         @optname=N'rpc out', @optvalue=N'true'
    GO

    ·   A second linked server called SQLSA is created on SQL01 as follows:

    EXEC master.dbo.sp_addlinkedserver @server=N'SQLSA',
         @srvproduct=N'',
         @provider=N'SQLNCLI',
         @datasrc=N'SQL02',
         @catalog=N'TestDB'
    EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'SQLSA',
         @useself=N'False',
         @locallogin=NULL,
         @rmtuser=N'sa',
         @rmtpassword='########' -- replace with the real sa password
    GO
    EXEC master.dbo.sp_serveroption @server=N'SQLSA',
         @optname=N'data access', @optvalue=N'true'
    GO
    EXEC master.dbo.sp_serveroption @server=N'SQLSA',
         @optname=N'rpc', @optvalue=N'true'
    GO
    EXEC master.dbo.sp_serveroption @server=N'SQLSA',
         @optname=N'rpc out', @optvalue=N'true'
    GO

     

    The only difference between SQLBOB and SQLSA is that the former is configured to log into SQL02 with bob, whereas the latter is configured to log into SQL02 with sa.

    Now, let’s create a table in TestDB on SQL02 and populate it with 100,000 rows using the following script:

    use TestDB
    go
    drop table item
    go
    create table item(i int, j int, c char(200))
    go

    ;with tmp(i, j, c) as (
       select 1, 1, replicate('a', 200)
       union all
       select i + 1, j + 1, replicate('a', 200)
         from tmp
        where i < 100000
    )
    insert item
    select * from tmp
    option (maxrecursion 0)
    go
    create clustered index ix_i on item(i)
    go
    create index ix_j on item(j)
    go
    grant select on item to bob

    Then on SQL01, let’s create a temporary table and populate it with 20 rows.

    drop table #tmp
    go
    create table #tmp(i int, c varchar(10))
    go

    ;with tmp(i, c) as (
       select 1, cast('abc' as varchar(10))
       union all
       select i + 1, cast(i as varchar(10))
         from tmp
        where i < 20
    )
    insert #tmp
    select * from tmp
    option (maxrecursion 0)
    go
    create clustered index ix_c on #tmp(c)
    go

     

    Now, we are ready for our demo. On SQL01, run the following queries:

     

    -- Query 1
    select t1.*, t2.*
      from #tmp t1, SQLBOB.TestDB.dbo.item t2
     where t1.i = t2.j

    -- Query 2
    select t1.*, t2.*
      from #tmp t1, SQLSA.TestDB.dbo.item t2
     where t1.i = t2.j

    Note that the only difference between Query 1 and Query 2 is that the former accesses the item table on SQL02 via linked server SQLBOB, whereas the latter does so via linked server SQLSA.

    You should find that both queries will come back relatively quickly with no big difference in duration. However, if you capture their query plans, you’ll see that the remote query for Query 1 is something similar to the following:

    SELECT "Tbl1004"."i""Col1010","Tbl1004"."j""Col1011","Tbl1004"."c" "Col1012" FROM"TestDB"."dbo"."item" "Tbl1004" ORDERBY "Col1011" ASC'

     

    In other words, the optimizer on SQL01 has decided to bring all the rows of the table item from SQL02 to SQL01 in a single scan, and then process the query locally.
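
    Incidentally, the ORDER BY on "Col1011" (that is, on column j) in the remote query suggests that SQL01 is asking for the rows back in join-key order, presumably so it can run a merge-style join against #tmp locally; this is an inference from the remote query text rather than something verified in the plan.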

     

    And if you capture the query plan for Query 2 or use SQL Profiler on SQL02, you’ll see that the remote query would look like the following:

     

    SELECT "Tbl1004"."i""Col1010","Tbl1004"."j" "Col1011","Tbl1004"."c""Col1012" FROM "TestDB"."dbo"."item""Tbl1004" WHERE "Tbl1004"."j"=?

     

    This time, SQL01 is using a different strategy. Basically, it is fetching only those rows from SQL02 where column item.j matches the column i values from the local temporary table. If you look at the incoming traffic on SQL02 with SQL Profiler, you’ll see the following:

     

    declare @p1 int
    set @p1 = 1
    exec sp_prepexec @p1 output, N'@P1 int',
         N'SELECT "Tbl1004"."i" "Col1006", "Tbl1004"."j" "Col1007", "Tbl1004"."c" "Col1008" FROM "master"."dbo"."item" "Tbl1004" WHERE "Tbl1004"."j"=@P1',
         2
    select @p1

     

    And this is followed by 20 executions of sp_execute. Essentially, SQL01 has decided to parameterize the calls to SQL02 through the linked server SQLSA by creating a prepared statement handle on SQL02 and then executing it 20 times, once for each value of column i in #tmp.
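
    For the record, each of those follow-up calls looks something like the following in the trace (illustrative only; the prepared handle and the parameter value, which is the #tmp.i value for that execution, will differ):

    exec sp_execute 1, 7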

     

    So what is the implication? Well, you’ll see a huge performance difference if table item on SQL02 is populated not with 100,000 rows but with 10,000,000 rows. In that case, 20 parameterized calls, each doing an index seek, will beat a full scan of a large table every time. The larger the table on SQL02, the more significant the performance difference between Query 1 and Query 2.

     

    Why is there such a performance difference between Query 1 and Query 2?

     

    The root cause is explained by the quotation from SQL Server 2008 R2 Books Online (reproduced at the beginning of this post). With SQLBOB, the user accessing SQL02 is bob, who has only the SELECT permission on the item table and does not have permission to see its distribution statistics. With SQLSA, however, the user is sa, who has permission to access everything on SQL02, including the distribution statistics for the item table.
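
    If you want to see the permission difference directly on SQL02, a minimal sketch like the following should do (it assumes the item table and the bob login created above; on SQL Server 2005/2008, DBCC SHOW_STATISTICS requires table ownership or membership in sysadmin, db_owner, or db_ddladmin, which bob does not have):

    use TestDB
    go
    -- As a sysadmin, this returns the statistics on index ix_j
    dbcc show_statistics ('dbo.item', ix_j)
    go
    -- Now impersonate bob, who has only SELECT on the table;
    -- the same command is expected to fail with a permission error
    execute as login = 'bob'
    go
    dbcc show_statistics ('dbo.item', ix_j)
    go
    revert
    go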

     

    Note that the worst-case scenario for the Query 2 query plan would be scanning the item table 20 times if column item.j turns out not to be selective, whereas with Query 1, a single scan of the item table is the worst case. In a way, it makes sense to choose the query plan seen with Query 1 when the optimizer on SQL01 has no knowledge of the distribution statistics of the item table on SQL02. In terms of the worst-case scenario, it is actually safer.

     

    If you run this example in your environment, you may not see the exact behavior I have described. But if you play with different numbers of rows or different column widths in the test tables, you should eventually see the two different plans described above.
