THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

Bad database practices: abusing linked servers

In SQL Server, it is rather handy to retrieve data from a different SQL Server instance and use the result locally in another SQL statement for further processing. In theory and in the set purists’ fantasy land, it shouldn’t matter where you get your data or even how you get the data as long as you can use them to further compose a set-based solution.

 

That is all fine. After all, in a distributed environment the data you want may not be on the same instance where the processing takes place. Well, that is fine until it comes to performance and troubleshooting. If you use linked servers indiscriminately, sooner or later you’ll run into these issues, and they are not pretty.

 

In the real world, you may have seen different kinds of linked server abuses than what I have seen. For me, two types of abuse stand out rather prominently. (For brevity, I’ll limit the discussions primarily to remote procedure calls, although they apply as well to any other use of linked servers such as openquery and heterogeneous joins.)

 

The worst kind of linked server abuse is in making an excessive number of linked server hops. Let me clarify what I mean by that. If ProcA makes a call to ProcB via a linked server ServerB and ProcB does not make any further cross server linked server calls, it would be one hop. If ProcB itself makes another one-hop call to ProcC on ServerC, there would be two hops from ProcA to ProcC, and so on and so forth.

 

Generally speaking, I would consider a two- or three-hop call stretched, but in reality I have seen calls that span four hops. I can only classify that as abusive, even though a lot of times the developer who added the very last hop did not know that the proc he was calling had three hops already. He was just making use of the resultset, a practice perfectly in alignment with the set-based approach.

 

So what is the problem? The headache lies in performance and troubleshooting.

 

Although getting data across several server hops does not necessarily mean poor performance, there is no doubt that increasing the number of hops significantly increases the susceptibility to poor performance. Unless you are getting a tiny amount of data, retrieving data across server is relatively expensive, and if that is subject to some implicit loop in an execution plan, your chance of poor performance has just been dramatically magnified. As pointed out in my previous post, you may end up processing data on the more expensive side of the server boundary, and by adding another hop, you only manage to further obscure that danger rather than expose or highlight it. This is especially dangerous when you have layers of views on top of cross-server queries and that view is hidden several hops down stream. You may think you are just doing a local join when in fact you are doing a distributed join.

 

In addition to real or potential performance problems, excessive number of server hops makes it very difficult to troubleshoot an issue, whether it’s performance or function related. The fact that SQL Server error messages rarely identify which server they come from only help exacerbate the grief. Often, this forces you to have to execute the calls at different levels in order to see where the problem may come from, a task that may be hard to perform in many circumstances. Just getting the right parameters for a call at each level can become time consuming.

 

The second type of linked server abuse that I consider among the worst is to combine multiple layers (or hops) of linked server calls with dynamic SQL. For instance, to gain flexibility, one may store linked server names in a table, and construct calls dynamically at the run time. This solves the problem of keeping the code stable when one changes the linked server names. But this approach makes it nearly impossible to troubleshoot when there is a problem somewhere in one of the hops because looking at a message you would have no idea where it comes from, and you can’t just scan through the code to spot problematic location. It’s not a pretty situation, and I wouldn’t wish that on anyone.

 

It is often beyond your control as to why there are so many linked server hops. And once they are there, they can quickly become such a tangled web that any attempt to shrink the number of hops is too painful to be worth any benefits such a successful attempt may bring about. Hopefully, by highlighting the pain one may suffer from the abuse, we can help deter its future practice.

 

Published Friday, November 06, 2009 8:56 PM by Linchi Shea

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Vijaya Kadiyala said:

Its good information. At the first place is it good to have Linked servers?

November 23, 2009 9:41 PM
 

Linchi Shea said:

Vijaya;

In general, linked servers are an extremely useful tool for accessing data among the SQL instances, and in a distributed environment, you can expect data to be 'distributed' among different servers.

December 1, 2009 11:19 PM
 

Michelle Jenks said:

Recently a Microsoft employee stated that linked servers should not be used, and to add that to our best practices.   I did not hear what he said as so black and white, but others in my company did.  Your thoughts?

May 17, 2011 10:59 PM
 

Dave M said:

If that is actually a "Microsoft Stance" they would do away with them.

FWIW the proof is always in the pudding.

November 19, 2013 9:22 AM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement