Out of Memory Exception when using File Stream Wri

Posted 2019-02-27 19:59

Question:

I have the following code, which throws an out-of-memory exception when writing large files. Is there something I'm missing?

I'm not sure why it throws an out-of-memory error, since I thought the FileStream would use a buffer of at most 4096 bytes. To be honest, I'm not entirely sure what the buffer means here, so any advice would be appreciated.

    public static async Task CreateRandomFile(string pathway, int size, IProgress<int> prog)
    {
        byte[] fileSize = new byte[size];
        new Random().NextBytes(fileSize);
        await Task.Run(() =>
        {
            using (FileStream fs = File.Create(pathway, 4096))
            {
                for (int i = 0; i < size; i++)
                {
                    fs.WriteByte(fileSize[i]);
                    prog.Report(i);
                }
            }
        });
    }

    public static void p_ProgressChanged(object sender, int e)
    {
        int pos = Console.CursorTop;
        Console.WriteLine("Progress Copied: " + e);
        Console.SetCursorPosition(0, pos);
    }

    public static void Main()
    {
        Console.WriteLine("Testing CopyLearning");
        //CopyFile()
        Progress<int> p = new Progress<int>();
        p.ProgressChanged += p_ProgressChanged;
        Task ta = CreateRandomFile(@"D:\Programming\Testing\RandomFile.asd", 99999999, p);
        ta.Wait();
    }

Edit: the value 99,999,999 was just chosen to produce a file of roughly 99 MB.

Note: if I comment out prog.Report(i), it works fine. For some reason, the error seems to occur at the line

Console.WriteLine("Progress Copied: " + e);

I'm not sure why this would cause an error. Could the error be caused by the progress event?

Edit 2: Following advice, I changed the code so that it reports progress only every 4000 bytes:

 if (i%4000==0)
     prog.Report(i);

For some reason, I can now write files of up to 900 MB without problems.

So I guess the question is: why does the Edit 2 code let me write files of up to 900 MB without problems? Is it because it reports progress and writes to the console 4000 times less often than before? I didn't realize the console would take up so much memory, especially since all it's doing is printing "Progress Copied".

Edit 3:

For some reason, when I change the loop as follows:

 for (int i = 0; i < size; i++)
 {
     fs.WriteByte(fileSize[i]);
     Console.WriteLine(i);
     prog.Report(i);
 }

with a Console.WriteLine() call before prog.Report(i), it works fine and copies the file, although it takes a very long time. This makes me think it is somehow a console-related issue, but I'm not sure what exactly.

Answer 1:

           fs.WriteByte(fileSize[i]);
           prog.Report(i);

You created a fire-hose problem. After deadlocks and threading races, it is probably the third most common problem caused by threads, and just as hard to diagnose.

It is easiest to see with the debugger: open Debug > Windows > Threads and look at the thread that is executing CreateRandomFile(). With some luck, you'll see that it has completed and written all 99 MB. But the progress reported on the console is far behind, having reported only about 125 KB written, give or take.

The core issue is the way Progress<T>.Report() works. It uses SynchronizationContext.Post() to invoke the ProgressChanged event handler. In a console-mode app, which has no synchronization context of its own, that ends up calling ThreadPool.QueueUserWorkItem(). That call is quite fast, so your CreateRandomFile() method isn't bogged down much by it.
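To check this yourself, here is a minimal sketch assuming a .NET Core 3.0 or later console project using C# 9 top-level statements. It prints whether SynchronizationContext.Current is null and records whether the Progress<int> callback runs on a thread-pool thread; the TaskCompletionSource is only there so the program can wait for the callback. Progress<T> implements Report() explicitly, so it is called through IProgress<int>, just as the question's code does.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A console app installs no SynchronizationContext of its own.
bool noContext = SynchronizationContext.Current is null;
Console.WriteLine($"SynchronizationContext.Current is null: {noContext}");

// Lets the main program wait for the (asynchronous) progress callback.
var handlerRan = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

// Progress<T> captures SynchronizationContext.Current when it is constructed.
// With none available it uses a default context whose Post() queues to the thread pool.
IProgress<int> progress = new Progress<int>(value =>
    handlerRan.TrySetResult(Thread.CurrentThread.IsThreadPoolThread));

// Report() returns immediately; the handler runs later on a pool thread.
progress.Report(42);

bool ranOnPool = await handlerRan.Task;
Console.WriteLine($"Handler ran on a thread-pool thread: {ranOnPool}");
```

Each Report() call therefore becomes one queued thread-pool work item, which is why the producer loop itself stays fast.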

But the event handler itself is a lot slower, because console output is not very fast. So in effect you are adding thread-pool work requests at an enormous rate: 99 million of them in a handful of seconds. The thread-pool scheduler has no way to keep up; you'll have roughly 4 of them executing at the same time, all competing to write to the console, and only one of them at a time can acquire the underlying lock.

So it is the thread-pool scheduler that causes the OOM: it is forced to store all those pending work requests.
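A rough way to watch that backlog build up, again as a sketch assuming .NET Core 3.0 or later (ThreadPool.PendingWorkItemCount was added there) and top-level statements. A ManualResetEventSlim "gate" stands in for the slow console handler by blocking every callback, which exaggerates the effect; the count of 200,000 reports is arbitrary and much smaller than the question's 99,999,999 so the demo stays cheap.

```csharp
using System;
using System.Threading;

const int Reports = 200_000;                 // arbitrary, far fewer than 99,999,999
var gate = new ManualResetEventSlim(false);  // closed: handlers cannot finish yet
var done = new CountdownEvent(Reports);      // counts handled reports

// Stand-in for a slow ProgressChanged handler: every callback blocks on the gate,
// so the few pool threads that pick up work cannot drain the queue.
IProgress<int> progress = new Progress<int>(_ =>
{
    gate.Wait();
    done.Signal();
});

// The producer, like the WriteByte loop, posts work much faster than it is consumed.
for (int i = 0; i < Reports; i++)
    progress.Report(i);

// Items queued but not yet started; nearly all of the reports are still waiting.
long backlog = ThreadPool.PendingWorkItemCount;
Console.WriteLine($"Work items waiting in the thread-pool queue: {backlog:N0}");

gate.Set();   // open the gate so the handlers can run
done.Wait();  // wait until every queued report has been handled
Console.WriteLine("All queued reports handled.");
```

Every one of those pending items is an object the thread pool has to keep alive until a worker runs it; scale the loop up to the question's 99,999,999 reports and that queue alone can exhaust memory.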

And sure, when you call Report() less frequently, the fire-hose problem becomes much less severe. It is not actually that simple to ensure it never causes a problem, although calling Console.Write() directly is an obvious fix. Ultimately the solution is simple: create a UI that is useful to a human. Nobody likes a crazily scrolling window or a blur of text. Reporting progress no more than 20 times per second is plenty for the user's eyes, and the console has no trouble keeping up with that.
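One way to apply that advice, sketched under stated assumptions: the names CreateRandomFileAsync and ChunkSize, the 80 KB chunk size, the 50 ms interval and the returned report count are all hypothetical, not taken from the question. Besides throttling Report() with a Stopwatch, it writes one chunk at a time instead of calling WriteByte() per byte and reuses a small buffer instead of holding the whole file in memory; those are separate choices, not something the answer prescribes.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

// Hypothetical rewrite: writes `size` random bytes in chunks and reports progress at most
// once per reportInterval (about 20 times per second), plus one final report.
// Returns the number of Report() calls so the throttling is easy to check.
static async Task<int> CreateRandomFileAsync(string path, int size, IProgress<int> progress)
{
    const int ChunkSize = 81920;                                // assumed chunk size
    TimeSpan reportInterval = TimeSpan.FromMilliseconds(50);    // ~20 reports per second
    var buffer = new byte[ChunkSize];
    var random = new Random();
    var clock = Stopwatch.StartNew();
    TimeSpan lastReport = TimeSpan.Zero;
    int reports = 0;
    int written = 0;

    using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write,
                                   FileShare.None, bufferSize: 4096, useAsync: true))
    {
        while (written < size)
        {
            int count = Math.Min(ChunkSize, size - written);
            random.NextBytes(buffer);                 // refill the reusable buffer
            await fs.WriteAsync(buffer, 0, count);    // one write per chunk, not per byte
            written += count;

            if (clock.Elapsed - lastReport >= reportInterval)
            {
                lastReport = clock.Elapsed;
                progress.Report(written);             // throttled: at most every 50 ms
                reports++;
            }
        }
    }

    progress.Report(written);                         // final report so the UI can show 100%
    return reports + 1;
}

string outputPath = Path.Combine(Path.GetTempPath(), "RandomFile.asd");
var consoleProgress = new Progress<int>(n => Console.WriteLine($"Progress copied: {n:N0} bytes"));
int reportCount = await CreateRandomFileAsync(outputPath, 10_000_000, consoleProgress);
Console.WriteLine($"Wrote {new FileInfo(outputPath).Length:N0} bytes with {reportCount} progress reports");
```

A time-based throttle keeps the report rate bounded no matter how fast the disk is, whereas the `i % 4000` check from Edit 2 still scales with file size.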