SQL Server transactional replication has been a rather solid feature and works well for delivering data to reporting servers (among other things) in near real time. That said, it may not work too well when you need to perform massive updates on a published table, for instance, when you need to archive a large amount of data. It can take a long time for all the changes to be replicated to the subscribers.
For some requirements, a perfect solution for this scenario would be to temporarily disable the transactional replication, perform the large scale data archiving concurrently on both the publisher and all the subscribers, and then resume the replication. That is, this would work very well if you can keep the published table from being modified by any other process for the duration.
But wait! This won’t work because there is no built-in SQL Server feature to disable/enable transactional replication so that it becomes completely transparent to data updates while it’s disabled.
A potential workaround is to remove the replication completely before making massive data changes and then re-create it afterwards. Before I actually started using this approach, it sounded to me like a rather dicey way of getting better performance. A number of things could go horribly wrong to help ruin your otherwise peaceful weekend:
- Replication you re-create afterwards may not be identical to what you have removed.
- Your table(s) could get out of sync among the servers before you re-create replication.
- Coordinating concurrent updates on multiple servers is inherently more complex than just updating the publisher.
But now that I have put this approach in practice and battle tested for many months, I can report that this can be a very effective approach to gaining performance in your archiving process. Here are some of the tips I’d like to share in order to achieve that effectiveness:
- You must have a robust mechanism to quiesce all the processes that update your published data. If your data is updated (by a non-archiving process) on the publisher while replication is removed, your subscribers will be out of sync with the publisher and you must re-sync them with the publisher, which defeats the purpose of removing replication in the first place.
- Once the data update processes are quiesced, you must verify that the tables are indeed in sync among the publisher and the subscribers. If this fails, there is no point of proceeding further.
- You’ll need a program or a script to generate the T-SQL scripts for deleting and creating the replication. I do this with a C# program via RMO, but a Poweshell script will do just fine. The T-SQL scripts need to be generated immediately before you remove the replication so that any change to the replication is captured.
- If you are archiving a single table or a few tables, you can just script out the replication setup on the table(s). In many cases, you are archiving a database and you should script out all the replication setup for that database.
- Among other things, your replication script generator should take the name of the table or database as an input and automatically find out all the replication setup that needs to be scripted. This ensures that you don’t need to change the script generator as publications and/or subscriptions are being added, dropped, or modified.
- You’ll also need a program to execute the generated replication delete scripts and another program to execute the generated replication create scripts. These programs need to be aware of where a given script must be executed. For instance, a script to create the pull subscript needs to be executed on the subscriber. My solution is to the replication script generator record this information. As it goes through each publication and pull subscription to generate scripts, the script generator records the generated T-SQL script locations and the target server names in a table. The programs that execute the generated scripts are driven by this table to find each script and what server on which that script needs to be executed.
- Once the tables are verified to be in sync and the replication is removed (by executing the delete scripts), you can start to archive data concurrently on all the servers. This step is best driven by a common list of what need to be archived so that you are guaranteed that archiving will be done on all the server on exactly the same data, and at the end of archiving, you are guaranteed that the tables will still be in sync.
- Since archiving is done on separate servers, things can obviously go wrong to crash it on one server but not on the others. Regardless what may go wrong, before you execute the generated T-SQL scripts to re-create the replication, you must verify that:
- The archiving processes are indeed done on all the servers, and
- The data is still in sync across all the servers
- All the subscriptions need to be re-created with no-sync. Otherwise, you have just archived the subscriber for nothing. This means that the replication script generator needs to modify the generated T-SQL scripts so that the value of the @sync_type parameter for the sp_addsubscription procedure is set to ‘none’ regardless of how it is currently set.
- To ensure that the re-created replication is the same as the original replication (before it’s removed), you can take a snapshot of the syspublications, sysarticles, and syssubscriptions tables before the replication is removed, and take another snapshot of the same three tables after the replication is re-created. By comparing the before and after replication configurations recorded in these three tables, you can determine whether you have changed any replication setup.
- I would also suggest that you run a test change and see it get replicated from the publisher to the subscribers. Nothing gives you more comfy feeling than see it in action and see it works.
- The distribution agents should run under the security context of the SQL Agent service account. Otherwise, because the generated scripts don’t include any password text, the script containing sp_addpushsubscription_agent or sp_addpullsubscription_agent will fail and no distribution agent would be created.
As mentioned, this approach has been proven to be effective in practice, and it is also quite maintainable. Still this amounts to regular schema changes with a lot of moving parts. It would be much better that replication can be disabled with a single DDL statement, just like you can disable an index or a constraint.