SQLblog.com - The SQL Server blog spot on the web

Argenis Fernandez

Transactional Replication and WAN links

We recently worked on a transactional replication setup that involved a very active VLDB and a subscriber located in a different datacenter. What made it even more interesting is that the WAN link between the two was not particularly fast. In this post, I would like to describe a few of the challenges we faced and how we got past them, in the hope that our experience can help you in your future endeavours.

The Problem with Slow Distribution Servers

If your distribution server is slow, your replication performance will tank: you will fall behind on transactions and may never catch up. In our scenario the distribution server was outdated, running Windows Server 2003 and SQL Server 2005, and this turned out to be our biggest issue. After we moved to a new distribution server running Windows Server 2008 R2 and SQL Server 2008 R2, our performance increased greatly. One of the biggest benefits of moving to Windows Server 2008 R2 is the set of enhancements made to the TCP/IP stack, particularly around the TCP send and receive windows. For more information, see this article on TechNet.
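Why the TCP window matters so much on a slow, high-latency link comes down to the bandwidth-delay product: a single TCP connection can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window size / round-trip time. The Python sketch below works through that arithmetic. The 80 ms round-trip time and the 100 Mbit/s link are hypothetical figures chosen for illustration, not measurements from our setup.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP connection: at most one window in flight per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


def bandwidth_delay_product_bytes(link_mbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight (i.e. the window needed) to keep the link full."""
    return link_mbps * 1_000_000 / 8 * (rtt_ms / 1000)


if __name__ == "__main__":
    rtt_ms = 80  # hypothetical round-trip time between the two datacenters
    # 65,535 bytes is the largest TCP window without window scaling (RFC 1323).
    print(f"{max_tcp_throughput_mbps(65_535, rtt_ms):.2f} Mbit/s max with a 64 KB window")
    print(f"{bandwidth_delay_product_bytes(100, rtt_ms):,.0f} bytes in flight needed for 100 Mbit/s")
```

With these assumed numbers, a 65,535-byte window caps a single connection at roughly 6.55 Mbit/s, while keeping a 100 Mbit/s link busy would need about 1,000,000 bytes in flight. That gap is why a TCP stack that can grow its windows makes a visible difference across a WAN.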

Careful with WAN Accelerators

While WAN accelerators can be fantastic in a myriad of scenarios, in our testing we noticed that the device wasn't really optimizing replication traffic; it was actually adding latency. With the help of our Networking team, we made sure that traffic from the distributor to the subscriber bypassed the WAN accelerator. Obviously, your mileage will vary - so test accordingly.

Use Pull Subscriptions

Our initial setup used a Push subscription. This was a big mistake. While Push subscriptions are easier to set up and maintain, their performance across WAN links is plain dismal. The MSDN team at Microsoft put out a great white paper on Geo-Replication Performance which is, in my opinion, required reading for anyone replicating to other datacenters. We saw huge performance gains when we switched to Pull (with a Pull subscription, the Distribution Agent runs at the subscriber rather than at the distributor): orders of magnitude faster. Put simply - never use Push subscriptions across WAN links.

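One design decision in our scenario is worth pointing out: we intentionally kept the distribution database near the publisher (i.e., in the same datacenter). The reason is simple: if your level of confidence in your WAN link isn't that high, your main concern becomes the Log Reader Agent, and getting the publisher's transaction log to clear reliably and constantly. The publisher's log cannot be truncated past transactions that the Log Reader Agent has not yet delivered to the distribution database, so a distributor on the far side of an unreliable link puts the publisher's log at risk of growing.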
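For reference, a pull subscription is set up in two halves: the subscription is registered at the publisher with sp_addsubscription using @subscription_type = N'pull', and then sp_addpullsubscription and sp_addpullsubscription_agent are run at the subscriber, which is where the Distribution Agent job ends up. The Python helper below only renders that T-SQL so the names can be parameterized. It is a sketch, not a complete script: every server, database and publication name is hypothetical, and agent security parameters such as @job_login and @job_password are deliberately left out, so check the procedure documentation before running anything like it.

```python
def _q(value: str) -> str:
    """Escape a value for use inside an N'...' T-SQL string literal."""
    return value.replace("'", "''")


def pull_subscription_batches(publisher: str, publisher_db: str, publication: str,
                              subscriber: str, subscriber_db: str,
                              distributor: str) -> tuple[str, str]:
    """Return (publisher_batch, subscriber_batch) T-SQL for a pull subscription."""
    # Run at the publisher, in the publication database.
    publisher_batch = (
        "EXEC sp_addsubscription\n"
        f"    @publication = N'{_q(publication)}',\n"
        f"    @subscriber = N'{_q(subscriber)}',\n"
        f"    @destination_db = N'{_q(subscriber_db)}',\n"
        "    @subscription_type = N'pull';"
    )
    # Run at the subscriber, in the subscription database. The Distribution
    # Agent job is created here, so it runs on the subscriber side of the WAN.
    subscriber_batch = (
        "EXEC sp_addpullsubscription\n"
        f"    @publisher = N'{_q(publisher)}',\n"
        f"    @publisher_db = N'{_q(publisher_db)}',\n"
        f"    @publication = N'{_q(publication)}';\n"
        "EXEC sp_addpullsubscription_agent\n"
        f"    @publisher = N'{_q(publisher)}',\n"
        f"    @publisher_db = N'{_q(publisher_db)}',\n"
        f"    @publication = N'{_q(publication)}',\n"
        f"    @distributor = N'{_q(distributor)}';"
        # Agent security (@job_login, @job_password, ...) intentionally omitted.
    )
    return publisher_batch, subscriber_batch


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    pub, sub = pull_subscription_batches(
        "PUBSRV", "SalesDB", "SalesPub", "SUBSRV", "SalesDB", "DISTSRV")
    print(pub)
    print(sub)
```

Passing @distributor explicitly matters in a layout like ours, where the distributor is a separate server sitting next to the publisher rather than the publisher itself.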

Initialize the Subscriber from a Backup

With a WAN link and high latency involved, initializing the subscriber from a backup is your best bet, and it saved us a lot of headaches. Generating and transferring a snapshot of a VLDB is out of the question when WAN latency is a concern.

In our scenario, the publisher was running SQL Server 2005 and backups were being taken with LiteSpeed. We transferred the most recent full backup to the remote datacenter using robocopy (FTP would also have worked), plus the latest differential, which was taken after we changed the publication's properties to allow initialization from backup. We restored these at the subscriber using the LiteSpeed tools, and then used the Extractor utility to create native-format backup files to initialize from. This extra step is needed because you cannot initialize from a LiteSpeed backup: SQL Server needs to read LSN information from the backup file, and the system stored procedure it uses for that purpose cannot read LiteSpeed's format.

A couple of tips: you only need the first backup file created by Extractor to initialize. Also, you don't have to initialize with a differential backup; a T-Log backup works just as well.

 Here is a good post on Initialization from Backup at ReplTalk that might be helpful if you run into issues.
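The two publisher-side steps of initialization from backup can be sketched as T-SQL rendered by a small Python helper. It assumes the publication already exists: the publication property is allow_initialize_from_backup (set with sp_changepublication), and the subscription is added with @sync_type = N'initialize with backup', pointing @backupdevicename at the native backup file (in our case, the first file produced by Extractor). All names and the UNC path are hypothetical. Since sp_addsubscription runs at the publisher, we assume the path has to be readable by the publisher's SQL Server instance; verify that against the documentation for your version.

```python
def _q(value: str) -> str:
    """Escape a value for use inside an N'...' T-SQL string literal."""
    return value.replace("'", "''")


def init_from_backup_batches(publication: str, subscriber: str, subscriber_db: str,
                             backup_file: str,
                             subscription_type: str = "pull") -> tuple[str, str]:
    """Render the two publisher-side batches; both run in the publication database."""
    # Step 1: must happen BEFORE the backup used for initialization is taken.
    enable = (
        "EXEC sp_changepublication\n"
        f"    @publication = N'{_q(publication)}',\n"
        "    @property = N'allow_initialize_from_backup',\n"
        "    @value = N'true';"
    )
    # Step 2: after that backup has been restored at the subscriber.
    # backup_file is assumed to be a native (non-LiteSpeed) backup file.
    subscribe = (
        "EXEC sp_addsubscription\n"
        f"    @publication = N'{_q(publication)}',\n"
        f"    @subscriber = N'{_q(subscriber)}',\n"
        f"    @destination_db = N'{_q(subscriber_db)}',\n"
        f"    @subscription_type = N'{_q(subscription_type)}',\n"
        "    @sync_type = N'initialize with backup',\n"
        "    @backupdevicetype = N'disk',\n"
        f"    @backupdevicename = N'{_q(backup_file)}';"
    )
    return enable, subscribe


if __name__ == "__main__":
    # Hypothetical names and path, for illustration only.
    enable, subscribe = init_from_backup_batches(
        "SalesPub", "SUBSRV", "SalesDB", r"\\FILESRV\repl\SalesDB_extracted_1.bak")
    print(enable)
    print("GO")
    print(subscribe)
```

Keeping the two batches separate mirrors the ordering constraint from our experience: the property change has to come before the differential (or T-Log) backup you initialize from, with the copy and restore happening in between.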

Other Optimizations

There are other replication features that help reduce the number of commands sent across to the subscriber. Namely:

Happy Publishin'!

Published Tuesday, May 31, 2011 9:39 AM by Argenis


Comments

 

Robert L Davis said:

Which WAN accelerator are you using? We use Bedrock and just recently got it installed in our datacenters. I don't have metrics yet on how it has affected replication.

May 31, 2011 4:40 PM
 

Argenis said:

Robert,

The WAN accelerator in our scenario was a Riverbed. I don't have any details on how it was implemented, etc. We treated it as a black box - our go live dates got in the way of working with the Network team and performing proper troubleshooting.

May 31, 2011 5:06 PM
 

Robert L Davis said:

See, I even got the name wrong. Yes, Riverbed. Not Bedrock. It may be Tuesday, but my brain thinks it is Monday.

May 31, 2011 5:53 PM
 

Dan said:

Argenis, how did robocopy work for you? We had to copy a 4TB VLDB and tried all sorts of copy utilities, and all of them worked terribly due to limitations on buffered copies. They would start fast and gradually slow down to a trickle.

I ended up finding a few articles that recommend using Exchange's ESEUTIL, and with it we were able to get a sustained 650MB/sec throughput. Even if LiteSpeed cut the backup down to a few hundred GB, I'd still recommend using ESEUTIL for a faster copy.

http://blogs.msdn.com/b/granth/archive/2010/05/10/how-to-copy-very-large-files-across-a-slow-or-unreliable-network.aspx

June 1, 2011 10:18 AM
 

Argenis said:

Dan,

robocopy worked pretty well for us. One thing we did to prevent copy problems was to perform the remote copy between two servers that weren't all that busy, and we made sure we always pulled the file - i.e., started the copy at the remote location.

June 1, 2011 11:54 AM
 

Sankar Reddy said:

To copy (large) files over the network, I use WinRAR to split the file into multiple smaller parts and then transfer the smaller chunks. If something fails, I just have to restart that one smaller file instead of retrying the complete large file.

June 1, 2011 12:13 PM
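Sankar's approach is essentially a chunked transfer with per-chunk retry. A minimal Python sketch of the split, verify and join side follows. It assumes a plain byte-level split with a SHA-256 checksum per part, which is not WinRAR's volume format, and the 64 MiB part size is an arbitrary choice. The copy itself (robocopy, FTP, or anything else) happens between split_file and join_parts, and only the parts that fail verify_part at the destination need to be sent again.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per part; an arbitrary illustrative choice


def split_file(src: Path, out_dir: Path, chunk_size: int = CHUNK_SIZE) -> list[tuple[Path, str]]:
    """Split src into numbered parts; return (part_path, sha256_hex) for each part."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as f:
        index = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            part = out_dir / f"{src.name}.{index:05d}"
            part.write_bytes(data)
            parts.append((part, hashlib.sha256(data).hexdigest()))
            index += 1
    return parts


def verify_part(part: Path, expected_sha256: str) -> bool:
    """True if the copied part matches the checksum recorded at the source."""
    return hashlib.sha256(part.read_bytes()).hexdigest() == expected_sha256


def join_parts(parts: list[Path], dest: Path) -> None:
    """Concatenate the parts, in order, back into a single file."""
    with dest.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())
```

Zero-padded part numbers keep the parts in the right order when they are listed or sorted by name at the destination.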
 

Dave said:

Very interesting thanks. We replicate to subscribers in the same datacentre as well as remote ones, and although we have a fat pipe between the datacentres I've definitely noticed that the remote subscribers take a lot longer to initialize than the local ones.

Is it still possible to initialize from backup when publishing only a subset of data? I wouldn't think so, but haven't had any experience with it.

June 3, 2011 3:11 AM
 

Argenis said:

Sankar,

In our case the backup files were split. That also helped :)

June 3, 2011 11:46 AM
 

Argenis said:

Dave,

Sure you can. You'll just end up with unfiltered data at the subscriber and it would be up to you to _carefully_ deal with that fact.

June 3, 2011 11:51 AM
 

Kendra Little said:

This is great! I'm adding this to my list of great links on replication.

June 15, 2011 5:59 PM
 

Will S said:

Hi Argenis. Were your subscriber servers also running Windows/SQL 2008 R2? Just wondering if it's required on both ends to see the TCP/IP improvements. Thanks!

August 18, 2011 1:07 PM
 

Tom said:

Argenis ,

Can you help me set up transactional replication over a WAN in SQL Server 2008 R2?

I can't find it on the net.

Thanks

August 11, 2012 3:49 AM
