I'd like to know if you can recommend any advanced ADO.NET libraries for working with databases.
I've discovered that LINQ-to-Entities is great for pulling data out of databases, but not very useful for inserting data into them. It's missing functionality like fast bulk insert, culling of duplicates, and most of the advanced functionality you can achieve with plain SQL.
So: can you recommend some ADO.NET libraries that offer the sorts of advanced functionality that LINQ-to-Entities is missing?
The ADO.NET SqlBulkCopy class enables quick, mass upload of records into a table:
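For example, a minimal sketch (the connection string, destination table, and batch size are placeholders you'd adapt to your schema):

```csharp
// Minimal SqlBulkCopy sketch: the table name "dbo.TargetTable" and the
// batch size are illustrative placeholders, not part of the original post.
using System.Data;
using System.Data.SqlClient;

class BulkCopyDemo
{
    static void BulkInsert(string connectionString, DataTable rows)
    {
        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.TargetTable";
            bulkCopy.BatchSize = 5000;       // rows sent per round trip
            bulkCopy.WriteToServer(rows);    // one fast, set-based insert
        }
    }
}
```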
You can use the LINQ Entity Data Reader library to write an IEnumerable list to a database using SqlBulkCopy behind the scenes. Since the results of a LINQ query are IEnumerable, you can bulk upload the results of a LINQ query straight into the database.
Because there are LINQ-to-everything adapters, you can do tricks like using the LINQ to CSV library to grab the data out of a .csv file with a LINQ query, then using LINQ Entity Data Reader to bulk write that data directly into the database.
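A sketch of that combination, assuming the LINQtoCSV package and the Entity Data Reader sample (whose `AsDataReader()` extension adapts an IEnumerable to the IDataReader that SqlBulkCopy expects); the `Person` class, column indices, and table name are illustrative:

```csharp
// Sketch: .csv file -> LINQ query -> SqlBulkCopy.
// Assumes the LINQtoCSV and EntityDataReader libraries; all
// class/table/column names here are hypothetical examples.
using System.Linq;
using System.Data.SqlClient;
using LINQtoCSV;                           // CsvContext, CsvFileDescription
using Microsoft.Samples.EntityDataReader;  // AsDataReader() extension (namespace may differ by version)

class Person
{
    [CsvColumn(FieldIndex = 1)] public string Name { get; set; }
    [CsvColumn(FieldIndex = 2)] public int Age { get; set; }
}

class CsvBulkLoad
{
    static void Load(string csvPath, string connectionString)
    {
        // LINQ query over the .csv file -- filter/shape the rows as needed.
        var rows = new CsvContext()
            .Read<Person>(csvPath, new CsvFileDescription())
            .Where(p => p.Age >= 18);

        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.People";
            bulkCopy.WriteToServer(rows.AsDataReader()); // streams the IEnumerable
        }
    }
}
```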
Case study:
Problem: read a .csv file quickly into a database. The connection to the SQL database is via LINQ-to-Entities from C#.
Solution 1: Use the LINQ to CSV library, construct a LINQ query to pull out the data you want, then write it in using the standard LINQ-to-Entities calls (ctx.AddObject(), ctx.SaveChanges(), etc.). Time taken: 30 seconds for 20,000 records, as LINQ-to-Entities ends up generating a query for every single record (slow!).
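The row-at-a-time shape of solution 1 looks roughly like this (the `MyEntities` context, entity set name, and `peopleFromCsv` sequence are placeholder names for illustration):

```csharp
// Sketch of the slow, row-at-a-time approach via the EF ObjectContext.
// "MyEntities", "People", and "peopleFromCsv" are hypothetical names.
using (var ctx = new MyEntities())
{
    foreach (var person in peopleFromCsv)  // results of the LINQ-to-CSV query
    {
        ctx.AddObject("People", person);
    }
    // Even with one SaveChanges call, EF still issues a separate
    // INSERT statement per tracked entity -- hence ~30s for 20,000 rows.
    ctx.SaveChanges();
}
```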
Solution 2: Use LINQ to CSV library, construct a LINQ query to pull out the data you want into an IEnumerable, use LINQ Entity Data Reader to bulk write this data directly into the target data table. Time taken: 3 seconds for 20,000 records.
Solution 3: Use a stored procedure with SQL "bulk copy". Time taken: 2 seconds for 20,000 records. However, this solution is quite brittle, as it relies on a stored procedure, and SQL bulk copy is simply not compatible with some .csv file formats. This method also requires a staging table between the actual target table and the .csv file, to deal with file formatting issues and to help with normalization.
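For reference, solution 3 boils down to T-SQL along these lines, invoked from C#; the file path, table names, and CSV options are placeholders, and the hard-coded path is exactly the brittleness mentioned above:

```csharp
// Sketch of solution 3: BULK INSERT into a staging table, clean up,
// then move rows into the real target table. All names/paths are
// illustrative; BULK INSERT requires a literal file path.
using System.Data.SqlClient;

class StagingLoad
{
    static void Load(string connectionString)
    {
        const string sql = @"
            BULK INSERT dbo.PeopleStaging
            FROM 'C:\data\people.csv'
            WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

            -- Normalize / de-duplicate in the staging table, then copy across.
            INSERT INTO dbo.People (Name, Age)
            SELECT DISTINCT Name, Age FROM dbo.PeopleStaging;

            TRUNCATE TABLE dbo.PeopleStaging;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```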
And, here is the source code for solution #2: