We are consuming large JSON streams from an HTTP POST request. The goal is to stream the incoming body as JSON using JsonTextReader and extract the embedded base64-encoded binary files to disk. In XML, an equivalent method might be XmlReader.ReadElementContentAsBase64Async.
Using JSON.NET, as we iterate, how do we send each item of the encodedImages array into a FileStream without holding the whole string in memory?
Example JSON Object:
{
  "company": "{clientCompany}",
  "batchName": "{clientBatchName}",
  "fileType": "{clientFileType}",
  "encodedImages": [
    "{base64encodedimage}",
    "{base64encodedimage}",
    "{base64encodedimage}"
  ],
  "customFields": {
    "{clientCustomField1}": "{clientCustomValue}",
    "{clientCustomField2}": "{clientCustomValue}",
    "{clientCustomField3}": "{clientCustomValue}",
    "{clientCustomField4}": "{clientCustomValue}"
  }
}
It seems like your problem can be addressed in two parts: 1) how to parse and process the JSON in a memory-efficient way, and 2) how to perform base-64 decoding iteratively.
1) Memory-efficient JSON parsing:
Assuming you can use the Newtonsoft JSON.NET library, the ReadAsBytes or ReadAsBytesAsync methods of the JsonReader class are going to be your best friends, as they allow iterative, stream-based processing that minimizes your memory footprint during JSON parsing and processing. To avoid writing low-level parsing code for your entire document, you might consider writing a JsonConverter implementation for the encodedImages node of your example.
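As a rough illustration of the direct-reader variant (rather than a JsonConverter), the sketch below walks a JsonTextReader over the request body and, when it reaches the encodedImages array, pulls one element at a time with ReadAsBytes and writes it to its own file. The method name, output directory, and file-naming scheme are assumptions for the example only:

using System.IO;
using Newtonsoft.Json;

// Illustrative helper: streams the JSON body and writes each element of
// the "encodedImages" array to a separate file on disk.
static void ExtractImages(Stream httpBody, string outputDirectory)
{
    using (var textReader = new StreamReader(httpBody))
    using (var jsonReader = new JsonTextReader(textReader))
    {
        int index = 0;
        while (jsonReader.Read())
        {
            if (jsonReader.TokenType == JsonToken.PropertyName
                && (string)jsonReader.Value == "encodedImages")
            {
                jsonReader.Read(); // advance onto StartArray

                // ReadAsBytes base64-decodes the next string element;
                // it returns null once the end of the array is reached.
                byte[] imageBytes;
                while ((imageBytes = jsonReader.ReadAsBytes()) != null)
                {
                    string path = Path.Combine(outputDirectory, "image" + index++ + ".bin");
                    using (var file = File.Create(path))
                    {
                        file.Write(imageBytes, 0, imageBytes.Length);
                    }
                }
            }
        }
    }
}

Note that each image is still materialized as a single byte[]; only the document as a whole is streamed, which is usually enough unless individual images are themselves very large.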
2) Iterative base-64 decoding:
Most base-64 decoding implementations decode a string in its entirety. Support for iterative, buffered decoding (as supported by the ReadElementContentAsBase64Async method of XmlReader) requires some state to be maintained. Digging into the implementation of that class, you will find the internal Base64Decoder class, which does precisely what you need.
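Since Base64Decoder is internal and cannot be called directly, a minimal sketch of the same chunked-decoding idea using only public APIs is to wrap the base64 text in a CryptoStream with FromBase64Transform, which decodes a fixed-size buffer at a time so the decoded output never exists as one large array. The stream parameters here are illustrative:

using System.IO;
using System.Security.Cryptography;

// Illustrative sketch: decodes base64 text to binary in fixed-size chunks.
static void DecodeBase64Stream(Stream base64Text, Stream binaryOutput)
{
    using (var transform = new FromBase64Transform())
    using (var decoder = new CryptoStream(base64Text, transform, CryptoStreamMode.Read))
    {
        var buffer = new byte[64 * 1024]; // decode roughly 64 KB of output at a time
        int bytesRead;
        while ((bytesRead = decoder.Read(buffer, 0, buffer.Length)) > 0)
        {
            binaryOutput.Write(buffer, 0, bytesRead);
        }
    }
}

This only helps, of course, if you can obtain the base64 content as a Stream rather than as a fully read string.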