What does the buffered parameter do in Dapper dot

2019-03-11 14:46发布

问题:

Dapper dot net has a buffer parameter (a bool), but as far as I can tell the only thing it does is cast the result to a list before returning it.

As per the documentation:

Dapper's default behavior is to execute your sql and buffer the entire reader on return. This is ideal in most cases as it minimizes shared locks in the db and cuts down on db network time.

However when executing huge queries you may need to minimize memory footprint and only load objects as needed. To do so pass, buffered: false into the Query method.

I'm not sure how casting the result to a list accomplishes this. Am I missing something? My only idea is that it is supposed to set the CommandBehavior for the ExecuteReader to CommandBehavior.SequentialAccess (but it doesn't).

回答1:

but as far as I can tell the only thing it does is cast the result to a list before returning it

You aren't missing anything. That is the key difference. Except it isn't a cast as such: the actual returned object is very different. Basically, there are two ways of reading data:

  • in a streaming API each element is yielded individually; this is very memory efficient, but if you do lots of subsequent processing per item, mean that your connection / command could be "active" for an extended time
  • in a buffered API all the rows are read before anything is yielded

If you are reading a very large amount of data (thousands to millions of rows), a non-buffered API may be preferable. Otherwise lots of memory is used, and there may be noticeable latency before even the first row is available. However, in most common scenarios the amount of data read is within reasonable limits, so it is reasonable to push it into a list before handing it to the caller. That means that the command / reader etc has completed before it returns.

As a side note, buffered mode also avoids the oh-so-common "there is already an open reader on the connection" (or whatever the exact phrasing is).



回答2:

I have to disagree with @chris-marisic on this... I ran into multiple "Out Of Memory" exceptions at that exact line (data.ToList()) when using buffered:true. It was not a "zillion rows X bazillion columns" query, just a regular 5-6k rows SQL-result with about 30 columns.

It really depends on your configuration. E.g. whether your SQL and IIS run on same physical machine or not. And how much memory is installed on the IIS machine, and what is the page file setting etc. If the web-server has 2 GB or less - consider setting "buffered:false" for super-heavy reports.



回答3:

In practice it is better to never use buffered: false.

I've found reading even many millions of rows that it is both faster and more memory efficient to use buffered results than unbuffered. Perhaps there is a cross over point if your tables have 500 columns and you're reading 10s of millions or 100s of millions of rows.

If your result sets are smaller than many billions of values it is not worth using buffered: false for any reason.

I was shocked during actual analysis that reading gigabytes of data from Sql Server was both faster (2-6x faster) and more memory efficient in standard buffered mode. The performance increase even accounts for the most minute operation possible, adding an object to a sparse array by index to an array that does not resize. Using a multi-gigabyte sparse array switching from unbuffered to buffered saw the 2x improvement in load time. Writing to a dictionary using buffered saw the 6x improvement in load time when inserting millions of records (dictionary used the table's int PK as key so as basic of a hashcode calculation as possible).

As with everything regarding performance you always must analyze. However I can tell you with a very high level of certainty always start with the default buffered behavior of Dapper.