My company is cursed by a symbiotic partnership turned parasitic. To get our data from the parasite, we have to use a painfully slow ODBC connection. I did notice recently, though, that I can get more throughput by running queries in parallel (even against the same table).
There is a particularly large table that I want to extract data from and move into a local table. Running queries in parallel gets me the data faster, but I imagine that writing the results of multiple queries into the same table at once could cause problems.
What advice can you give me on how to best handle this situation so that I can take advantage of the increased speed of using queries in parallel?
EDIT: I've gotten some great feedback here, but I think I wasn't completely clear that I'm pulling data via a linked server (which uses the ODBC drivers). In other words, I can run normal INSERT statements, and I believe that would provide better performance than either SqlBulkCopy or BULK INSERT (actually, I don't believe BULK INSERT would even be an option).
Have you read "Load 1TB in less than 1 hour"? Its recommendations:
- Run as many load processes as you have available CPUs. If you have
32 CPUs, run 32 parallel loads. If you have 8 CPUs, run 8 parallel
loads.
- If you have control over the creation of your input files, make them
of a size that is evenly divisible by the number of load threads you
want to run in parallel. Also make sure all records belong to one
partition if you want to use the switch partition strategy.
- Use BULK insert instead of BCP if you are running the process on the
SQL Server machine.
- Use table partitioning to gain another 8-10%, but only if your input
files are GUARANTEED to match your partitioning function, meaning
that all records in one file must be in the same partition.
- Use TABLOCK to avoid row at a time locking.
- Use ROWS PER BATCH = 2500, or something near this if you are
importing multiple streams into one table.
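For the linked-server case in the question, the "one load per CPU" advice could be sketched roughly as below: split the source table's key range into contiguous chunks and give each worker its own INSERT ... SELECT. This is only a sketch under several assumptions that are not from the thread: the source table has an integer key column called `Id`, the linked server is named `REMOTE_SRV`, and each worker writes to its own staging table (`dbo.BigTable_stage_0`, `dbo.BigTable_stage_1`, ...). The per-worker staging tables are a design choice, not something the article prescribes: a TABLOCK hint on the target of a plain INSERT takes an exclusive table lock, so parallel INSERTs into one shared table with TABLOCK would simply queue behind each other. The `execute` callable is a placeholder for whatever runs a statement on your server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_key_range(min_key: int, max_key: int, workers: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [min_key, max_key] into contiguous,
    non-overlapping inclusive chunks of near-equal size."""
    if workers < 1 or max_key < min_key:
        raise ValueError("need workers >= 1 and max_key >= min_key")
    total = max_key - min_key + 1
    workers = min(workers, total)  # never create empty chunks
    base, extra = divmod(total, workers)
    ranges = []
    start = min_key
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more key
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def build_statements(min_key: int, max_key: int, workers: int) -> List[str]:
    """Build one INSERT ... SELECT per chunk.

    REMOTE_SRV, RemoteDb, BigTable, Id and the staging table names are
    hypothetical placeholders; substitute your own objects.
    """
    statements = []
    for i, (lo, hi) in enumerate(split_key_range(min_key, max_key, workers)):
        statements.append(
            f"INSERT INTO dbo.BigTable_stage_{i} WITH (TABLOCK) "
            f"SELECT * FROM REMOTE_SRV.RemoteDb.dbo.BigTable "
            f"WHERE Id BETWEEN {lo} AND {hi};"
        )
    return statements


def run_parallel(statements: List[str], execute: Callable[[str], None], max_workers: int) -> None:
    """Run every statement on its own worker thread.

    `execute` is supplied by the caller and should open its own
    connection, since database connections are generally not safe to
    share across threads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for all workers and re-raises the first exception, if any.
        list(pool.map(execute, statements))
```

Once all workers finish, the staging tables can be combined into the final table in a single step, or switched in as partitions if the chunk boundaries match the partition function.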
For SQL Server 2008, there are certain circumstances where you can utilize minimal logging for a standard INSERT SELECT:
SQL Server 2008 enhances the methods that it can handle with minimal
logging. It supports minimally logged regular INSERT SELECT
statements. In addition, turning on trace flag 610 lets SQL Server
2008 support minimal logging against a nonempty B-tree for new key
ranges that cause allocations of new pages.
If you're looking to do this in code, i.e. C#, there is the option to use SqlBulkCopy
(in the System.Data.SqlClient namespace), and as this article suggests, it's possible to do this in parallel:
http://www.adathedev.co.uk/2011/01/sqlbulkcopy-to-sql-server-in-parallel.html
If by any chance you've upgraded to SQL Server 2014, SELECT ... INTO can run in parallel (the database compatibility level must be at least 110). See this:
http://msdn.microsoft.com/en-us/library/bb510411%28v=sql.120%29.aspx