I am trying to stream the contents of a file.
The code works for smaller files, but with larger files, I get an Out of Memory error.
public void StreamEncode(FileStream inputStream, TextWriter tw)
{
    byte[] base64Block = new byte[BLOCK_SIZE];
    int bytesRead = 0;
    try
    {
        do
        {
            // read one block from the input stream
            bytesRead = inputStream.Read(base64Block, 0, base64Block.Length);
            // encode the base64 string
            string base64String = Convert.ToBase64String(base64Block, 0, bytesRead);
            // write the string
            tw.Write(base64String);
        } while (bytesRead == base64Block.Length);
    }
    catch (OutOfMemoryException)
    {
        MessageBox.Show("Error -- Memory used: " + GC.GetTotalMemory(false) + " bytes");
    }
}
I can isolate the problem and watch the memory used grow as it loops.
The problem seems to be the call to Convert.ToBase64String().
How can I free the memory for the converted string?
Edited from here down ... Here is an update.
I also created a new thread about this -- sorry I guess that was not the right thing to do.
Thanks for your great suggestions. Based on them, I shrank the buffer size used to read from the file, and memory consumption looks better, but I'm still seeing an OOM problem, even with file sizes as small as 5 MB. I potentially want to deal with files ten times larger.
My problem seems now to be with the use of TextWriter.
I create a request as follows [with a few edits to shrink the code]:
HttpWebRequest oRequest = (HttpWebRequest)WebRequest.Create(new Uri(strURL));
oRequest.Method = httpMethod;
oRequest.ContentType = "application/atom+xml";
oRequest.Headers["Authorization"] = getAuthHeader();
oRequest.ContentLength = strHead.Length + strTail.Length + longContentSize;
oRequest.SendChunked = true;

using (TextWriter tw = new StreamWriter(oRequest.GetRequestStream()))
{
    tw.Write(strHead);
    using (FileStream fileStream = new FileStream(strPath, FileMode.Open,
        FileAccess.Read, System.IO.FileShare.ReadWrite))
    {
        StreamEncode(fileStream, tw);
    }
    tw.Write(strTail);
}
.....
Which calls into the routine:
public void StreamEncode(FileStream inputStream, TextWriter tw)
{
    // For Base64 there are 4 bytes output for every 3 bytes of input
    byte[] base64Block = new byte[9000];
    int bytesRead = 0;
    string base64String = null;
    do
    {
        // read one block from the input stream
        bytesRead = inputStream.Read(base64Block, 0, base64Block.Length);
        // encode the base64 string
        base64String = Convert.ToBase64String(base64Block, 0, bytesRead);
        // write the string
        tw.Write(base64String);
    } while (bytesRead != 0);
}
Should I use something other than TextWriter because of the potential large content? It seems very convenient for being able to create the whole payload of the request.
Is this totally the wrong approach? I want to be able to support very large files.
If you use a BLOCK_SIZE that is 32 kB or more, you will be creating strings that are 85 kB or more, which are allocated on the large object heap. Short-lived objects should live in the regular heaps, not the large object heap, so that may be the reason for the memory problems.
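To put numbers on that (a rough sketch; the commonly cited LOH threshold is 85,000 bytes):

int blockSize = 32 * 1024;                   // 32 kB of input per iteration
int base64Chars = 4 * ((blockSize + 2) / 3); // 43,692 characters of output
int stringBytes = base64Chars * 2;           // roughly 85 kB, since .NET strings
                                             // are UTF-16 (2 bytes per char) -> LOH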
Also, I see two potential problems with the code:
- The base64 encoding uses padding at the end of the string, so if you chop a stream into blocks, convert each block to a base64 string, and write those strings to a stream, you don't end up with a single valid base64 stream.

- Checking whether the number of bytes returned by the Read method equals the number of bytes requested is not the proper way of checking for the end of the stream. The Read method may return fewer bytes than requested any time it feels like it; the correct way to check for the end of the stream is when the method returns zero.
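A sketch of a loop that fixes both issues, keeping the signature from the question (assumptions: the buffer length is a multiple of 3 so that '=' padding can only occur on the final block, and end of stream is detected by Read returning zero):

public void StreamEncode(Stream inputStream, TextWriter tw)
{
    byte[] buffer = new byte[3 * 1024]; // multiple of 3: no padding mid-stream
    int filled;
    do
    {
        // Fill the buffer completely, unless the stream ends first.
        filled = 0;
        int read;
        while (filled < buffer.Length &&
               (read = inputStream.Read(buffer, filled, buffer.Length - filled)) > 0)
        {
            filled += read;
        }
        // Encode and write whatever was read; a full buffer encodes
        // to a multiple of 4 characters with no padding.
        if (filled > 0)
            tw.Write(Convert.ToBase64String(buffer, 0, filled));
    } while (filled == buffer.Length); // a partial fill means end of stream
}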
Keep in mind that when converting data to base64, the resulting string will be 33% longer (assuming the input size is a multiple of 3, which is probably a good idea in your case). If BLOCK_SIZE is too large there might not be enough contiguous memory to hold the resulting base64 string.
Try reducing BLOCK_SIZE, so that each piece of the base64 output is smaller, making it easier to allocate the memory for it.
However, if you're using an in-memory TextWriter like a StringWriter, you may run into the same problem, because it would fail to find a block of memory large enough to hold the internal buffer. If you're writing to something like a file, this should not be a problem, though.
Wild guess... HttpWebRequest.AllowWriteStreamBuffering is true by default, and according to MSDN, "setting AllowWriteStreamBuffering to true might cause performance problems when uploading large datasets because the data buffer could use all available memory". Try setting
oRequest.AllowWriteStreamBuffering = false
and see what happens.
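For reference, a minimal sketch of the request setup with buffering off (per MSDN, when AllowWriteStreamBuffering is false, either ContentLength must be set or SendChunked must be true, which the code above already does):

HttpWebRequest oRequest = (HttpWebRequest)WebRequest.Create(new Uri(strURL));
oRequest.Method = httpMethod;
oRequest.SendChunked = true;                // or set ContentLength up front
oRequest.AllowWriteStreamBuffering = false; // stream the body to the network
                                            // instead of buffering it in memory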
Try pulling your base64String declaration out of the loop. If that still doesn't help, try calling the garbage collector after so many iterations.
GC.Collect();
GC.WaitForPendingFinalizers();
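For example, a hypothetical variant of the loop that collects every 100 blocks (forcing collections is normally a last resort; the GC usually does better on its own):

int blockCount = 0;
do
{
    bytesRead = inputStream.Read(base64Block, 0, base64Block.Length);
    tw.Write(Convert.ToBase64String(base64Block, 0, bytesRead));
    if (++blockCount % 100 == 0) // "so many iterations" is arbitrary here
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
} while (bytesRead != 0);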
Try reducing the block size or avoid assigning the result of the Convert call to a variable:
bytesRead = inputStream.Read(base64Block, 0, base64Block.Length);
tw.Write(Convert.ToBase64String(base64Block, 0, bytesRead));
The code looks OK from a memory usage point of view, but I think you are passing a writer for a memory-based stream (like a MemoryStream), and storing the data there is what causes the OOM exception.
If BLOCK_SIZE is above ~85 kB, allocations will happen on the Large Object Heap (LOH), which changes the behavior of allocations, but should not cause an OOM by itself.
Note: your end condition is not correct; it should be bytesRead != 0. In general, Read can return fewer bytes than asked for even if there is more data left, although FileStream never does this, to my knowledge.
I would write the result to a temp file first.
using (TextWriter tw = new StreamWriter(oRequest.GetRequestStream()))
{
    tw.Write(strHead);
    var tempPath = Path.GetTempFileName();
    try
    {
        using (var input = File.OpenRead(strPath))
        using (var output = File.Open(
            tempPath, FileMode.Open, FileAccess.ReadWrite))
        {
            StreamEncode(input, output);
            output.Seek(0, SeekOrigin.Begin);
            tw.Flush(); // make sure strHead reaches the stream before the payload
            CopyTo(output, ((StreamWriter)tw).BaseStream);
        }
    }
    finally
    {
        File.Delete(tempPath);
    }
    tw.Write(strTail);
}
public void StreamEncode(Stream inputStream, Stream output)
{
    // For Base64 there are 4 bytes output for every 3 bytes of input
    byte[] base64Block = new byte[9000];
    int bytesRead = 0;
    string base64String = null;
    // Don't dispose the writer here: that would close the output stream,
    // which the caller still needs to rewind and copy from. Flush instead.
    var tw = new StreamWriter(output);
    do
    {
        // read one block from the input stream
        bytesRead = inputStream.Read(base64Block, 0, base64Block.Length);
        // encode the base64 string
        base64String = Convert.ToBase64String(base64Block, 0, bytesRead);
        // write the string
        tw.Write(base64String);
    } while (bytesRead != 0);
    tw.Flush();
}
static void CopyTo(Stream input, Stream output)
{
    const int length = 10240;
    byte[] buffer = new byte[length];
    int count = 0;
    while ((count = input.Read(buffer, 0, length)) > 0)
        output.Write(buffer, 0, count);
}