I'm developing a multi-part MIME parser using F# and FParsec. I'm developing iteratively, and so this is highly unrefined, brittle code--it only solves my first immediate problem. Red, Green, Refactor.
I'm required to parse a stream rather than a string, which is really throwing me for a loop. Given that constraint, to the best of my understanding, I need to call a parser recursively. How to do that is beyond my ken, at least with the way I've proceeded thus far.
namespace MultipartMIMEParser
open FParsec
open System.IO
type private Post = { contentType : string
; boundary : string
; subtype : string
; content : string }
type MParser (s:Stream) =
let ($) f x = f x
let ascii = System.Text.Encoding.ASCII
let str cs = System.String.Concat (cs:char list)
let q = "\""
let qP = pstring q
let pSemicolon = pstring ";"
let manyNoDoubleQuote = many $ noneOf q
let enquoted = between qP qP manyNoDoubleQuote |>> str
let skip = skipStringCI
let pContentType = skip "content-type: "
>>. manyTill anyChar (attempt $ preturn () .>> pSemicolon)
|>> str
let pBoundary = skip " boundary=" >>. enquoted
let pSubtype = opt $ pSemicolon >>. skip " type=" >>. enquoted
let pContent = many anyChar |>> str // TODO: The content parser needs to recurse on the stream.
let pStream = pipe4 pContentType pBoundary pSubtype pContent
$ fun c b t s -> { contentType=c; boundary=b; subtype=t; content=s }
let result s = match runParserOnStream pStream () "" s ascii with
| Success (r,_,_) -> r
| Failure (e,_,_) -> failwith (sprintf "%A" e)
let r = result s
member p.ContentType = r.contentType
member p.Boundary = r.boundary
member p.ContentSubtype = r.subtype
member p.Content = r.content
The first line of the example POST follows:
content-type: Multipart/related; boundary="RN-Http-Body-Boundary"; type="multipart/related"
It spans a single line in the file. Further sub-parts in the content include content-type
values that span multiple lines, so I know I'll have to refine my parsers if I am to reuse them.
Somehow I've got to call pContent
with the (string?) results of pBoundary
so that I can split the rest of the stream on the appropriate boundaries, and then somehow return multiple parts for the content of the post, each of which will be a separate post, with headers and content (which will obviously have to be something other than a string). My head is spinning. This code already seems far too complex to parse a single line.
Much appreciation for insight and wisdom!