c# DataSet.Fill dreadful performance issues with F

2019-03-04 17:00发布

问题:

REMARK I completely rewrite the question as while exploring options and getting insights, I realized the origin of my problem was not at all what I thought.

I use Firebird as a database engine and the standard .Net provider (v.5.11.0) to fetch data using following code:

// myBlob1 is BLOB SUB_TYPE 1 (text field) with some empty, but some
// VERY long stuff (xml-content) which exceeds VARCHAR(32765), but I removed
// those before performing my tests!!!
var tick = Stopwatch.StartNew();
DataTable dataTable = new DataTable();      
DbLive.Open();
FbCommand command = new FbCommand("SELECT myBlob FROM MY_TABLE", DbLive);
try {
    dataTable.BeginLoadData();
    FbDataAdapter fda = new FbDataAdapter(command);
    fda.Fill(dataTable);
    dataTable.EndLoadData();
}
command.Dispose();
DbLive.Close();
tick.Stop();
Console.WriteLine("Execution time: " + tick.ElapsedMilliseconds + " [ms]");

I am fetching about 30k rows with, in total, about 16Mb of data (according to database workbench's statistics) but only 20k rows with non-null data.

So I go fetching the whole thing using naive approach, then I used the cast (VARCHAR(8192)) approach (note that I removed all rows with > 8192 characters before doing the tests). Now, here are the results :

// Obtained when loading data over wifi, with a bandwidth of about 100Mbps)
// the performance on local machine did not make a big difference!)
No casting:  73287.0788 ms
With casting: 2360.2244 ms

There really is something bad going on with Blobs here. I tested performance using Firebird 3 and compression, but results weren't much better (a very small improvement, but difference is still of the same order of magnitude).

To the point: 1. Can I expect an improvement if I use another provider (e.g. www.ibprovider.com as suggested, but I was not able to test it) 2. Is this a known issue on other database engines, or is there hope to get improvements by changing to another engine?

回答1:

I got a bit confused about many things with the above, so let me clarify some insights:

  1. FbDataReader and the Dapper approaches did, in fact, NOT load the blob fields, so performance was indeed irrelevant;

  2. Using the CAST to VARCHAR(4096) did show significant performance improvements (note that here, I tested the text-blob's content to fully match, which did work):

    DataTable no cast: 73287.0788 ms

    FbDataReader cast: 2224.1387 ms

    DataTable cast: 2360.2244 ms

I even tried with Firebird 3 using compression, without significant improvement. So it is indeed the fact I use BLOB instead of large VARCHAR that caused the problem.

Note that I did not succeed in testing with another provider.

Bottom line : using BLOB SUBTYPE 1 kills my performance here, I should use VARCHAR instead.



回答2:

I encountered a similar performance issue when using Firebird 3 embedded and FBDataAdapter to read BLOB text fields. The query itself didn't seem to be the problem - it executed pretty quickly in Flamerobin. Than I noticed that Visual Studio was reporting a long series of:

Microsoft C++ exception: Firebird::status_exception at memory location 0x0000007DD010B920

although these exceptions are not seen by the calling .Net application.

It turned out that much of the slowdown was apparently due to the Visual Studio IDE trying to handle the many exceptions that are raised. There's a "quiet" exception raised for every BLOB field that is read. This occurs when reading "segments" of the BLOB via isc_get_segment, which is called repeatedly until all segements have been read. isc_get_segement apparently throws an exception behind the scenes when the end of data is encountered.

When I'm running under the Visual Studio IDE and debugger, this slows things way down, but performance is reasonable if I run the application on its own.