THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

Linked servers and performance impact: Direction matters!

When you have some data on a SQL Server instance (say SQL01) and you want to move the data to another SQL Server instance (say SQL02) through openquery(), you can either push the data from SQL01, or pull the data from SQL02.

 

To push the data, you can run a SQL script like the following on SQL01, which is the source server:

 

-- The push script

-- Run this on SQL01

use testDB

go

insert openquery(SQL02,

                 'select * from testDB.dbo.target_table')

select * from source_table;

 

To pull the data, you can run a SQL script like the following on SQL02, which is the target server:

 

-- The pull script

-- Run this on SQL02

use testDB

go

insert target_table

select * from openquery(SQL01,

                       'select * from testDB.dbo.test')

 

The question is: does it make a difference whether to push the data or pull the data? The answer is absolutely yes. Which method you use makes a huge performance difference. Let’s demonstrate the difference with some tests.

 

First, we need to create two identically-structured tables, one on SQL01 and the other on SQL02. The script for the source table is as follows:

 

-- Run this on SQL01

use testDB

go

create table source_table (

i int,

filler char(200))

go

 

The table script for the target table is as follows:

 

-- Run this on SQL02

use testDB

go

create table target_table (

i int,

filler char(200))

go

 

Then, let’s populate the source table with 50,000 rows:

 

use TestDB

go

;with tmp as (

  select 1 as i, replicate('a', 200) as filler

  union all

  select i+1, replicate('a', 200)

    from tmp

   where i < 50000

)

insert source_table

select i, filler from tmp

option(maxrecursion 0)

 

Once we have populated the source table with test data, we can take turn running the push script and the pull script mentioned above. I ran the tests several times in my test environment, each time with the target table first truncated. Here are the results:

 

Method

Duration (seconds)

Push

240

Pull

2

 

The pull method was more than 100 times faster in this test environment. I ran the tests many times. Sometimes, the difference was smaller, and other times the difference was larger. But this seems to be a good enough average. Regardless, there is no mistake that the performance difference was profound. Originally, I was going to plot a bar chart. But with these numbers, the bar chart would look ridiculous!

 

Why is the performance difference so large?

 

To see the reason, we need to fire up SQL Profiler on both sides and watch the SQL traffic while running the tests. In the case of the pull method, you’ll see that a single SQL statement is executed across the linked server (i.e. SQL01), and the resultset is pulled over from SQL01 to SQL02.

 

In the case of the push method, you may see the following calls executed 50,000 times across the linked server (the specific values would be different each time):

 

exec sp_cursor 180150005,4,0,N'[testDB].[dbo].[target_table]',@i=4167,@filler='aaaaaaaaaaaaaaaaaaaa'

 

This executes something like the following:

 

INSERT [testDB].[dbo].[target_table]([i],[filler])VALUES(@Param000004,@Param000005)

 

In other words, the data is pushed one row at a time from SQL01 to SQL02. That’s 50,000 round trips between SQL01 and SQL02. Not efficient at all! In addition, the larger the source table, the more pronounced the inefficiency would show.

Published Wednesday, December 01, 2010 12:31 AM by Linchi Shea

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Uri Dimant said:

Hi Linchi

>>The pull method was more than 100 times faster in this test environment.

I see duration for Pull method is 240 , should it be 2 seconds?

December 1, 2010 3:26 AM
 

Uri Dimant said:

Hi Linchi

>>The pull method was more than 100 times faster in this test environment.

I see duration for Pull method is 240 , should it be 2 seconds?

December 1, 2010 3:26 AM
 

David said:

Hi,

It's really a interesting behavior.

Maybe I miss something, but I think the table is wrong.

You say "The pull method was more than 100 times faster", but the table shows the opposite.

Regards,

David EG

December 1, 2010 3:48 AM
 

Linchi Shea said:

Oops! Thanks Uri and David. I've corrected the table. Oh BTW, that's an embedded test to see if people are really reading :-)

December 1, 2010 9:01 AM
 

Elad said:

More interesting performance test will be activation of bulk insert from one instance to another using linked server.

December 5, 2010 4:30 AM
 

Michael Harmon said:

I don't dispute the speed differences between the push and pull, but, I found a different reason to change my query from the pull variety to push (i.e. running on the server where the data is located).

In the environment at my company, I have setup the SQL Server service to run under a Windows Domain account, instead of local system. I needed to pull the data from one SQL server which is still being maintained by an outside consultant, so, I am not in real control of that server.

I changed the Service account of that other server and created a Linked Server to the server where the data needed to be pushed to the same Windows Domain account. I then coded a stored procedure on the target server using four part names (instead of OpenQuery) and everything worked fine.

However, the consultant changed the service account to the service account to local system again and the stored proc. started to fail. I don't have the error message handy.

Anyway, I recoded the stored proc. on the source server (running on local system), and there is no longer a problem. I created a linked server on the source server to the target server, and now, it works fine.

Permissions issues may dictate whether you use the push or pull method.

December 11, 2010 9:48 AM
 

Phil Elmousse said:

Thanks for that tip.

I had some trouble with insert in a remote server which took me several hours. Now I used the pull method and it's about 10 ten times better.

September 9, 2013 9:50 AM
 

Matthew said:

I know it's been long time since this post was written and commented but I hope anyway to get an answer. Thanks in advance.

I'm facing an a similar issue but in this case the problm is the opposite: a Pull approach for populating a table with data extracted from a remote linked server is taking long time while the Push approach

is taking just few minutes.

I'm interested about Michael Harmon's comment 'cause I'm wondering whether or not that could be my case. Could anyone clarify that point about permission issues? Could it be related to my performace issue?

November 8, 2013 6:27 AM
 

[翻译]——SQL Server使用链接服务器的5个性能杀手 - 博客园 said:

June 8, 2014 1:10 AM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement