Linked Server Insert-Select Performance

Posted 2019-02-09 23:12

Question:

Assume that I have a table on my local server, Local_Table, and a table on another server in another database, Remote_Table (the table structures are the same).

Local_Table has data, Remote_Table doesn't. I want to transfer data from Local_Table to Remote_Table with this query:

Insert into RemoteServer.RemoteDb..Remote_Table
select * from Local_Table (nolock)

But the performance is quite slow.

However, when I use SQL Server import-export wizard, transfer is really fast.

What am I doing wrong? Why is it fast with Import-Export wizard and slow with insert-select statement? Any ideas?

Answer 1:

The fastest way is to pull the data rather than push it. When the tables are pushed, every row requires a connection, an insert, and a disconnect.

If you can't pull the data because you have a one-way trust relationship between the servers, the workaround is to construct the entire table as one giant T-SQL statement and run it all at once.

DECLARE @sql NVARCHAR(max)

-- Build one INSERT per row. Columns are assumed to be character types (CAST others first).
-- NULLs become the NULL keyword and embedded single quotes are doubled.
SET @sql = (
        SELECT 'insert Remote_Table values (' + isnull('''' + replace(first_col, '''', '''''') + '''', 'NULL') + ',' +
            -- repeat for each col
            isnull('''' + replace(last_col, '''', '''''') + '''', 'NULL') + ');'
        FROM Local_Table
        FOR XML path(''), TYPE
        ).value('.', 'nvarchar(max)') --FOR XML concatenates all the rows; the empty path keeps it from wrapping <colname> </colname> around each value, and .value() returns a plain string without XML entity escaping

SET @sql = 'set nocount on;' + @sql + 'set nocount off;'

EXEC ('use RemoteDb;' + @sql) AT RemoteServer
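
EXEC (...) AT only works when the RPC Out option is enabled for the linked server. A minimal sketch of turning it on and checking it, run on the local server; RemoteServer is the linked server name from the question, and the rest assumes default settings:

```sql
-- Allow remote procedure calls from this server to the linked server
-- (required for EXEC (...) AT RemoteServer).
EXEC sp_serveroption @server = N'RemoteServer', @optname = 'rpc out', @optvalue = 'true';

-- Verify the setting.
SELECT name, is_rpc_out_enabled
FROM sys.servers
WHERE name = N'RemoteServer';
```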


Answer 2:

It seems like it's much faster to pull data from a linked server than to push data to a linked server. See "Which one is more efficient: select from linked server or insert into linked server?"

Update: My own, recent experience confirms this. Pull if possible -- it will be much, much faster.

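Try this on the other server (the names here are relative to the server that runs the statement, so Local_Table is the destination table and RemoteServer is a linked server pointing back at the source):

INSERT INTO Local_Table
SELECT * FROM RemoteServer.RemoteDb..Remote_Table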


Answer 3:

The Import/Export wizard essentially does this as a bulk insert, whereas your code does not.

Assuming that you have a clustered index on the remote table, make sure that you have the same clustered index on the local table, set trace flag 610 globally on your remote server, and make sure the remote database uses the simple or bulk-logged recovery model.
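
The server-side settings mentioned above could be applied roughly like this. It is a sketch to run on the remote (destination) server; RemoteDb is the database name from the question, and switching the recovery model is assumed to be acceptable for your backup strategy:

```sql
-- Check the current recovery model of the destination database.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'RemoteDb';

-- Switch to bulk-logged for the load (SIMPLE also allows minimal logging).
ALTER DATABASE RemoteDb SET RECOVERY BULK_LOGGED;

-- Enable trace flag 610 globally (-1 applies it to all sessions).
DBCC TRACEON (610, -1);

-- Confirm the flag is active.
DBCC TRACESTATUS (610);
```

A flag set with DBCC TRACEON does not survive a restart of the instance, so it has to be set again (or added as a startup parameter) if the load is repeated later.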

If your remote table is a heap (which will speed things up anyway), make sure your remote database uses the simple or bulk-logged recovery model and change your code to read as follows:

INSERT INTO RemoteServer.RemoteDb..Remote_Table WITH(TABLOCK)
SELECT * FROM Local_Table WITH (nolock)


Answer 4:

Inserting into the remote table from the local table is slow because it inserts a row, checks that it was inserted, then inserts the next row, checks that it was inserted, and so on.

Don't know if you figured this out or not, but here's how I solved this problem using linked servers.

First, I have a LocalDB.dbo.Table with several columns:

IDColumn (int, PK, Auto Increment)
TextColumn (varchar(30))
IntColumn (int)

And I have a RemoteDB.dbo.Table that is almost the same:

IDColumn (int)
TextColumn (varchar(30))
IntColumn (int)

The main difference is that the remote IDColumn isn't set up as an identity column, so that I can insert values into it.

Then I set up a trigger on the remote table that fires on delete:

Create Trigger Table_Del
    On [Table]
    After Delete
AS
Begin
    Set NOCOUNT ON;

    -- Pull any rows from the main server that are missing on this (remote) server
    Insert Into [Table] (IDColumn, TextColumn, IntColumn)
     Select IDColumn, TextColumn, IntColumn from MainServer.LocalDB.dbo.[Table] L
      Where not exists (Select * from [Table] R Where L.IDColumn = R.IDColumn)

END

Then when I want to do an insert, I do it like this from the local server:

Insert Into LocalDB.dbo.[Table] (TextColumn, IntColumn) Values ('textvalue', 123);
-- Matches no rows, but the delete still fires the trigger on the remote server
Delete From RemoteServer.RemoteDB.dbo.[Table] Where IDColumn = 0;

--And if I want to clean the table out and make sure it has all the most up to date data:
Delete From RemoteServer.RemoteDB.dbo.[Table]

By triggering the remote server to pull the data from the local server and then do the insert, I was able to turn a job that took 30 minutes to insert 1258 rows into a job that took 8 seconds to do the same insert.

This does require a linked server connection on both sides, but once that's set up it works pretty well.
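
For the second direction, the remote server needs its own linked server entry pointing back at the main server. A rough sketch, run on the remote server: MAINSERVER is the name used in the trigger, and the login mapping shown (each login connects with its own credentials) is only an assumption about your security setup:

```sql
-- With @srvproduct = N'SQL Server', the linked server name must be
-- the network name of the SQL Server instance.
EXEC sp_addlinkedserver
     @server     = N'MAINSERVER',
     @srvproduct = N'SQL Server';

-- Map logins: every local login connects using its own credentials.
-- A fixed remote login (@rmtuser / @rmtpassword) is another option.
EXEC sp_addlinkedsrvlogin
     @rmtsrvname = N'MAINSERVER',
     @useself    = N'TRUE',
     @locallogin = NULL;

-- Quick connectivity check.
EXEC sp_testlinkedserver N'MAINSERVER';
```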

Update:
So in the last few years I've made some changes, and have moved away from the delete trigger as a way to sync the remote table.

Instead I have a stored procedure on the remote server that has all the steps to pull the data from the local server:

CREATE PROCEDURE [dbo].[UpdateTable]
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    --Fill temp table from the local (main) server
    Insert Into WebFileNamesTemp Select * From MAINSERVER.LocalDB.dbo.WebFileNames

    --Fill normal table from temp table
    Delete From WebFileNames
    Insert Into WebFileNames Select * From WebFileNamesTemp

    --Empty temp table
    Delete From WebFileNamesTemp
END

And on the local server I have a scheduled job that does some processing on the local tables, and then triggers the update through the stored procedure:

EXEC sp_serveroption @server='REMOTESERVER', @optname='rpc', @optvalue='true'
EXEC sp_serveroption @server='REMOTESERVER', @optname='rpc out', @optvalue='true'
EXEC REMOTESERVER.RemoteDB.dbo.UpdateTable
EXEC sp_serveroption @server='REMOTESERVER', @optname='rpc', @optvalue='false'
EXEC sp_serveroption @server='REMOTESERVER', @optname='rpc out', @optvalue='false'


Answer 5:

If you must push data from the source to the target (e.g., for firewall or other permissions reasons), you can do the following:

In the source database, convert the recordset to a single XML string (i.e., multiple rows and columns combined into a single XML string). Then push that XML over as a single row (as a varchar(max), since the xml data type isn't allowed in queries against linked servers in SQL Server).

    DECLARE @xml XML

    SET @xml = (select * from SourceTable FOR XML path('row'))

    Insert into TempTargetTable values (cast(@xml AS VARCHAR(max)))

In the target database, cast the varchar(max) as XML and then use XML parsing to turn that single row and column back into a normal recordset.

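-- Wrap the imported string in a single root element so it parses as one XML document
DECLARE @x XML = (select '<toplevel>' + ImportString + '</toplevel>' from TempTargetTable)

-- Parse the document and get a handle to it
DECLARE @ix INT
EXEC sp_xml_preparedocument @ix output, @x

-- Flag 2 = element-centric mapping: child elements of each <row> map to columns
insert into TargetTable
SELECT [col1],
       [col2]
FROM OPENXML(@ix, '//row', 2)
WITH ([col1] [int],
       [col2] [varchar](128)
)

-- Release the memory held by the parsed document
EXEC sp_xml_removedocument @ix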
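
As an alternative to OPENXML, the same shredding can be sketched with the xml type's nodes() and value() methods, which need no document handle. The sample payload below is made up for illustration and mirrors the toplevel/row/col1/col2 shape used above:

```sql
-- Hypothetical payload standing in for the string read from TempTargetTable.
DECLARE @x XML = N'<toplevel>
  <row><col1>1</col1><col2>alpha</col2></row>
  <row><col1>2</col1><col2>beta</col2></row>
</toplevel>';

-- One output row per <row> element; value() extracts and converts each child.
SELECT r.value('(col1/text())[1]', 'int')          AS col1,
       r.value('(col2/text())[1]', 'varchar(128)') AS col2
FROM @x.nodes('/toplevel/row') AS t(r);
```

To load the real data, put an insert into TargetTable in front of the SELECT and read @x from TempTargetTable as in the OPENXML version.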


Answer 6:

I've found a workaround. Since I'm not a big fan of GUI tools like SSIS, I've reused a bcp script to export the table to a CSV file and load it back in on the other side. Yes, it's odd that bulk operations are supported for files but not for tables. Feel free to edit the following script to fit your needs:

exec xp_cmdshell 'bcp "select * from YourLocalTable" queryout C:\CSVFolder\Load.csv -w -T -S .' 
exec xp_cmdshell 'bcp YourAzureDBName.dbo.YourAzureTable in C:\CSVFolder\Load.csv -S yourdb.database.windows.net -U youruser@yourdb.database.windows.net -P yourpass -q -w' 

Pros:

  • No need to define table structures every time.
  • I've tested and it worked way faster than inserting directly through the LinkedServer.
  • It's easier to manage than XML (which is limited to varchar(max) length anyway).
  • No need for an extra layer of abstraction (tools like SSIS).

Cons:

  • Using the external tool bcp through the xp_cmdshell interface.
  • Table properties are lost when exporting to and importing from CSV (e.g. data types, nullability, length, separators within values, etc.).