I am using SQL Server 2008 Enterprise. And I am using Linked Server technologies to link another SQL Server 2008 Enterprise instance from another server. I write TSQL to manipulate objects (e.g. tables) from both server instances.
My question is, for linked server, is there a big performance issue? If yes, what is the key performance bottleneck and best practice we should follow?
thanks in advance, George
Compared to what? For what queries? of course it all depends on what you are doing. For some queries the performance hit will be negligible for others massive.
There are a bunch of concerns you should keep in mind:
In the past I have found situations where it was a few orders of magnitude faster to move the remote data locally, and index it before joining into it.
It depends on what you are doing.
If you are running queries that join between tables in the two server instances, and transferring large amounts of data, then you have a bottleneck that you need to be aware of.
If the servers are on their own subnet with a 1GB link, then you should not have to worry a great deal. I would be concerned if the two servers are connected by a shared, slow link.
You're going to take a bit of a hit to transfer the results across the wire twice (linked server to SQL Server to your machine). Secondly, it's got to resolve the name and log in, which isn't much of a hit, but it's a hit nonetheless.
Anyway, I've found the only major bottleneck is jumping servers, since it has to transmit the information twice.
@George2,
Sam Saffron is correct in this case. When a join is performed locally then SQL Server uses indexes to perform a join and then does lookups for the columns not included into an index definition.
With linked server to do a join all table needs to be transferred from a remote server first, then join is performed. This is a bottle neck. If you can pre-filter all remoted tables before joining them to local tables it will considerably improve performance (e.g. select into #temp tables with good filter to reduce number of rows), then if you need to perform multiple operations against that table you are better off creating an index right away.
I use linked servers frequently to synchronise data between environments, mainly because I found it to be the easiest solution to code and manage.
One tip I found, but may not be an option for others, was to run any procedures on the server that has the most data or is doing the most updating/inserting. For example I have a procedure that compares two tables and inserts/updates from A to B. If I ran this on server A it would take many times longer than running the procedure on B. If you don't have a choice where to run our code, and you are stuck on, say, server A, then this advice may not help.
Another tip is to reduce the data returned to the minimum necessary. Whereas you might normally have data returned almost instantly on a local server, if a linked server is some distance away then the latency can be very painful. Be stricter than normal in accessing only those columns you need.
I've found that if you're doing outer joins (left/right) the performance degrades fast. It's sometimes faster to select the data from the remote server into a temp table and index it rather than joining across the network. Mostly, the best strategy is to write the query the way it makes sense and then only tune it if performance is a real problem.