I've got the lovely task of working out how to handle large files being loaded into our application's script editor (it's like VBA for our internal product for quick macros). Most files are about 300-400 KB, which load fine. But when they go beyond 100 MB the process has a hard time (as you'd expect).
What happens is that the file is read and shoved into a RichTextBox which is then navigated - don't worry too much about this part.
The developer who wrote the initial code is simply using a StreamReader and calling [Reader].ReadToEnd(), which could take quite a while to complete.
My task is to break this bit of code up, read it in chunks into a buffer and show a progressbar with an option to cancel it.
Some assumptions:
- Most files will be 30-40 MB
- The contents of the file are text (not binary); some are Unix format, some are DOS.
- Once the contents are retrieved we work out which line terminator is used.
- Once it's loaded, no one is concerned about the time it takes to render in the RichTextBox. It's just the initial load of the text that matters.
Now for the questions:
- Can I simply use StreamReader, then check the Length property (so ProgressMax) and issue a Read for a set buffer size and iterate through in a while loop WHILST inside a background worker, so it doesn't block the main UI thread? Then return the StringBuilder to the main thread once it's completed.
- The contents will be going to a StringBuilder. Can I initialise the StringBuilder with the size of the stream if the length is available?
Are these (in your professional opinions) good ideas? I've had a few issues in the past with reading content from Streams, because it will always miss the last few bytes or something, but I'll ask another question if this is the case.
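For reference, the chunked approach described in the question could look roughly like the sketch below. It is only a sketch under some assumptions: `ChunkedTextLoader`, `Load`, `MaxPreallocChars` and the 64K-character buffer are made-up names and values, the file is assumed to be UTF-8 unless a byte order mark says otherwise, and progress is taken from the underlying FileStream position, which runs slightly ahead because StreamReader buffers. The "missing last few bytes" problem usually comes from ignoring the count returned by Read, so the loop appends only `read` characters.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;

public static class ChunkedTextLoader
{
    // Hypothetical cap so a huge file does not force one enormous up-front allocation.
    private const int MaxPreallocChars = 16 * 1024 * 1024;

    // Reads a text file in fixed-size character chunks and reports 0-100 percent.
    public static string Load(string path, Action<int> reportProgress,
                              CancellationToken token, int bufferChars = 64 * 1024)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var reader = new StreamReader(fs, Encoding.UTF8, true))
        {
            long totalBytes = fs.Length;

            // The byte length is only a capacity hint; the character count may be smaller.
            var sb = new StringBuilder((int)Math.Min(totalBytes, MaxPreallocChars));
            var buffer = new char[bufferChars];
            int read;

            // Read returns how many chars were actually placed in the buffer;
            // 0 means end of file. Append exactly that many, never the whole buffer.
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                token.ThrowIfCancellationRequested();
                sb.Append(buffer, 0, read);

                // Approximate: StreamReader reads ahead, so Position can lead the text.
                reportProgress((int)(fs.Position * 100 / totalBytes));
            }

            reportProgress(100);
            return sb.ToString();
        }
    }
}
```

Inside a BackgroundWorker's DoWork handler the same loop would check `CancellationPending` and call `ReportProgress` instead of using a token and a delegate.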
If you read the performance and benchmark stats on this website, you'll see that the fastest way to read (because reading, writing, and processing are all different) a text file is the following snippet of code:
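A minimal sketch of a line-by-line read with `File.OpenText` and `StreamReader.ReadLine`, assuming that is the approach meant here. `LineReaderSketch` and `CountLines` are hypothetical names used only for illustration; replace the counter with whatever per-line work you need.

```csharp
using System.IO;

public static class LineReaderSketch
{
    // Hypothetical helper: walks the file one line at a time and counts the lines.
    public static int CountLines(string path)
    {
        int count = 0;
        using (StreamReader sr = File.OpenText(path))
        {
            string line;
            // ReadLine returns null at end of file and accepts \n, \r and \r\n terminators.
            while ((line = sr.ReadLine()) != null)
            {
                count++; // do your per-line processing here
            }
        }
        return count;
    }
}
```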
All up, about 9 different methods were benchmarked, but that one seems to come out ahead the majority of the time, even outperforming the buffered reader that other answers have mentioned.
This should be enough to get you started.
You can improve read speed by using a BufferedStream, like this:
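A minimal sketch of wrapping the FileStream in a BufferedStream before handing it to a StreamReader. `BufferedReadSketch` and `ReadLines` are hypothetical names, and the default buffer size is used as an assumption; pass a size to the BufferedStream constructor if you want to tune it.

```csharp
using System.Collections.Generic;
using System.IO;

public static class BufferedReadSketch
{
    // Hypothetical helper: lazily yields the lines of a file read through a BufferedStream.
    public static IEnumerable<string> ReadLines(string path)
    {
        using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var bs = new BufferedStream(fs))
        using (var sr = new StreamReader(bs))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```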
March 2013 UPDATE
I recently wrote code for reading and processing (searching for text in) 1 GB-ish text files (much larger than the files involved here) and achieved a significant performance gain by using a producer/consumer pattern. The producer task read in lines of text using the BufferedStream and handed them off to a separate consumer task that did the searching. I used this as an opportunity to learn TPL Dataflow, which is very well suited for quickly coding this pattern.
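To illustrate the producer/consumer split, here is a rough sketch that uses `BlockingCollection<string>` from the base class library rather than TPL Dataflow, so it needs no extra package. It is not the code described above; `ProducerConsumerSketch`, `CountMatches` and the bounded capacity of 10000 lines are assumptions for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class ProducerConsumerSketch
{
    // Hypothetical helper: counts the lines that contain `term`.
    public static int CountMatches(string path, string term)
    {
        // Bounded so the reader cannot run arbitrarily far ahead of the searcher.
        using (var queue = new BlockingCollection<string>(10000))
        {
            Task producer = Task.Run(() =>
            {
                try
                {
                    using (FileStream fs = File.OpenRead(path))
                    using (var bs = new BufferedStream(fs))
                    using (var sr = new StreamReader(bs))
                    {
                        string line;
                        while ((line = sr.ReadLine()) != null)
                        {
                            queue.Add(line);
                        }
                    }
                }
                finally
                {
                    // Always signal completion, or the consumer would wait forever.
                    queue.CompleteAdding();
                }
            });

            Task<int> consumer = Task.Run(() =>
            {
                int matches = 0;
                foreach (string line in queue.GetConsumingEnumerable())
                {
                    if (line.IndexOf(term, StringComparison.Ordinal) >= 0)
                    {
                        matches++;
                    }
                }
                return matches;
            });

            Task.WaitAll(producer, consumer);
            return consumer.Result;
        }
    }
}
```

Whether this beats a single loop depends on how expensive the per-line work is; for a trivial search the queue overhead can outweigh the gain.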
Why BufferedStream is faster
December 2014 UPDATE: Your Mileage May Vary
Based on the comments, FileStream should be using a BufferedStream internally. At the time this answer was first provided, I measured a significant performance boost by adding a BufferedStream. At the time I was targeting .NET 3.x on a 32-bit platform. Today, targeting .NET 4.5 on a 64-bit platform, I do not see any improvement.
Related
I came across a case where streaming a large, generated CSV file to the Response stream from an ASP.NET MVC action was very slow. Adding a BufferedStream improved performance by 100x in this instance. For more, see Unbuffered Output Very Slow.
An iterator might be perfect for this type of work:
You can call it using the following:
As the file is loaded, the iterator will return the progress number from 0 to 100, which you can use to update your progress bar. Once the loop has finished, the StringBuilder will contain the contents of the text file.
Also, because you want text, we can just use BinaryReader to read in characters, which will ensure that your buffers line up correctly when reading any multi-byte characters (UTF-8, UTF-16, etc.).
This is all done without using background tasks, threads, or complex custom state machines.
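A minimal sketch of such an iterator, under a few assumptions: the names `IteratorLoaderSketch`, `LoadFileWithProgress` and `LoadAll` and the 4096-character chunk size are illustrative, the file is assumed to be UTF-8, and progress is derived from the FileStream position. Note that BinaryReader does not strip a byte order mark, so a BOM at the start of the file would show up as a character in the output.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class IteratorLoaderSketch
{
    // Appends the file's text to `target` chunk by chunk, yielding percent complete (0-100).
    public static IEnumerable<int> LoadFileWithProgress(string path, StringBuilder target,
                                                        int chunkChars = 4096)
    {
        using (FileStream fs = File.OpenRead(path))
        using (var br = new BinaryReader(fs, Encoding.UTF8))
        {
            long length = fs.Length;
            yield return 0;

            while (true)
            {
                // ReadChars decodes whole characters, so multi-byte sequences
                // are not split across chunks. An empty array means end of file.
                char[] chunk = br.ReadChars(chunkChars);
                if (chunk.Length == 0)
                {
                    break;
                }

                target.Append(chunk);
                yield return (int)(fs.Position * 100 / length);
            }

            yield return 100;
        }
    }

    // Example caller: replace the loop body with a progress bar update.
    public static string LoadAll(string path)
    {
        var sb = new StringBuilder();
        foreach (int percent in LoadFileWithProgress(path, sb))
        {
            // e.g. progressBar.Value = percent; (progressBar is a hypothetical control)
        }
        return sb.ToString();
    }
}
```

Because the loop runs on the calling thread, the UI still has to get a chance to repaint between iterations for the progress bar to move.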
My file is over 13 GB:
The link below contains code that reads a piece of a file easily:
Read a large text file
More information
You say you have been asked to show a progress bar while a large file is loading. Is that because the users genuinely want to see the exact % of file loading, or just because they want visual feedback that something is happening?
If the latter is true, then the solution becomes much simpler. Just do reader.ReadToEnd() on a background thread, and display a marquee-type progress bar instead of a proper one.
I raise this point because in my experience this is often the case. When you are writing a data processing program, then users will definitely be interested in a % complete figure, but for simple-but-slow UI updates, they are more likely to just want to know that the computer hasn't crashed. :-)
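A minimal sketch of that idea, assuming .NET 4.5 or later for `Task.Run`. `BackgroundLoadSketch` and `ReadAllTextInBackground` are hypothetical names; the UI side is left out so the snippet stays self-contained.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class BackgroundLoadSketch
{
    // Runs the blocking ReadToEnd call on a thread-pool thread and returns a task for the text.
    public static Task<string> ReadAllTextInBackground(string path)
    {
        return Task.Run(() =>
        {
            using (var reader = new StreamReader(path))
            {
                return reader.ReadToEnd();
            }
        });
    }
}
```

In a WinForms handler you would set the progress bar's Style to ProgressBarStyle.Marquee, await the task, then hide the bar and assign the result to the RichTextBox.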