When you have some data on a SQL Server instance (say SQL01) and you want to move the data to another SQL Server instance (say SQL02) through openquery(), you can either push the data from SQL01, or pull the data from SQL02.
To push the data, you can run a SQL script like the following on SQL01, which is the source server:
-- The push script
-- Run this on SQL01
use testDB
go
insert openquery(SQL02,
'select * from testDB.dbo.target_table')
select * from source_table;
To pull the data, you can run a SQL script like the following on SQL02, which is the target server:
-- The pull script
-- Run this on SQL02
use testDB
go
insert target_table
select * from openquery(SQL01,
'select * from testDB.dbo.test')
The question is: does it make a difference whether to push the data or pull the data? The answer is absolutely yes. Which method you use makes a huge performance difference. Let’s demonstrate the difference with some tests.
First, we need to create two identically-structured tables, one on SQL01 and the other on SQL02. The script for the source table is as follows:
-- Run this on SQL01
use testDB
go
create table source_table (
i int,
filler char(200))
go
The table script for the target table is as follows:
-- Run this on SQL02
use testDB
go
create table target_table (
i int,
filler char(200))
go
Then, let’s populate the source table with 50,000 rows:
use TestDB
go
;with tmp as (
select 1 as i, replicate('a', 200) as filler
union all
select i+1, replicate('a', 200)
from tmp
where i < 50000
)
insert source_table
select i, filler from tmp
option(maxrecursion 0)
Once we have populated the source table with test data, we can take turn running the push script and the pull script mentioned above. I ran the tests several times in my test environment, each time with the target table first truncated. Here are the results:
|
Method |
Duration (seconds) |
|
Push |
240 |
|
Pull |
2 |
The pull method was more than 100 times faster in this test environment. I ran the tests many times. Sometimes, the difference was smaller, and other times the difference was larger. But this seems to be a good enough average. Regardless, there is no mistake that the performance difference was profound. Originally, I was going to plot a bar chart. But with these numbers, the bar chart would look ridiculous!
Why is the performance difference so large?
To see the reason, we need to fire up SQL Profiler on both sides and watch the SQL traffic while running the tests. In the case of the pull method, you’ll see that a single SQL statement is executed across the linked server (i.e. SQL01), and the resultset is pulled over from SQL01 to SQL02.
In the case of the push method, you may see the following calls executed 50,000 times across the linked server (the specific values would be different each time):
exec sp_cursor 180150005,4,0,N'[testDB].[dbo].[target_table]',@i=4167,@filler='aaaaaaaaaaaaaaaaaaaa'
This executes something like the following:
INSERT [testDB].[dbo].[target_table]([i],[filler])VALUES(@Param000004,@Param000005)
In other words, the data is pushed one row at a time from SQL01 to SQL02. That’s 50,000 round trips between SQL01 and SQL02. Not efficient at all! In addition, the larger the source table, the more pronounced the inefficiency would show.