F#, FParsec, and Calling a Stream Parser Recursive

2019-03-01 00:41发布

问题:

Thank you for the replies to my first post and my second post on this project. This question is basically the same question as the first, but with my code updated according to the feedback received on those two questions. How do I call my parser recursively?

I'm scratching my head and staring blankly at the code. I've no idea where to go from here. That's when I turn to stackoverflow.

I've included in code comments the compile-time errors I'm receiving. One stumbling block may be my discriminated union. I've not worked with discriminated unions much, so I may be using mine incorrectly.

The example POST I'm working with, bits of which I've included in my previous two questions, consists of one boundary that includes a second post with a new boundary. That second post includes several additional parts separated by the second boundary. Each of those several additional parts is a new post consisting of headers and XML.

My goal in this project is to build a library to be used in our C# solution, with the library taking a stream and returning the POST parsed into headers and parts recursively. I really want F# to shine here.

namespace MultipartMIMEParser

open FParsec
open System.IO

type Header = { name  : string
              ; value : string
              ; addl  : (string * string) list option }

type Content = Content of string
             | Post of Post list
and Post = { headers : Header list
           ; content : Content }

type UserState = { Boundary : string }
  with static member Default = { Boundary="" }

module internal P =
  let ($) f x = f x
  let undefined = failwith "Undefined."
  let ascii = System.Text.Encoding.ASCII
  let str cs = System.String.Concat (cs:char list)

  let makeHeader ((n,v),nvps) = { name=n; value=v; addl=nvps}

  let runP p s = match runParserOnStream p UserState.Default "" s ascii with
                 | Success (r,_,_) -> r
                 | Failure (e,_,_) -> failwith (sprintf "%A" e)

  let blankField = parray 2 newline

  let delimited d e =
      let pEnd = preturn () .>> e
      let part = spaces
                 >>. (manyTill
                      $ noneOf d
                      $ (attempt (preturn () .>> pstring d)
                                  <|> pEnd)) |>> str
       in part .>>. part

  let delimited3 firstDelimiter secondDelimiter thirdDelimiter endMarker =
      delimited firstDelimiter endMarker
      .>>. opt (many (delimited secondDelimiter endMarker
                      >>. delimited thirdDelimiter endMarker))

  let isBoundary ((n:string),_) = n.ToLower() = "boundary"

  let pHeader =
      let includesBoundary (h:Header) = match h.addl with
                                        | Some xs -> xs |> List.exists isBoundary
                                        | None    -> false
      let setBoundary b = { Boundary=b }
       in delimited3 ":" ";" "=" blankField
          |>> makeHeader
          >>= fun header stream -> if includesBoundary header
                                   then
                                     stream.UserState <- setBoundary (header.addl.Value
                                                                      |> List.find isBoundary
                                                                      |> snd)
                                     Reply ()
                                   else Reply ()

  let pHeaders = manyTill pHeader $ attempt (preturn () .>> blankField)

  let rec pContent (stream:CharStream<UserState>) =
      match stream.UserState.Boundary with
      | "" -> // Content is text.
              let nl = System.Environment.NewLine
              let unlines (ss:string list) = System.String.Join (nl,ss)
              let line = restOfLine false
              let lines = manyTill line $ attempt (preturn () .>> blankField)
               in pipe2 pHeaders lines
                        $ fun h c -> { headers=h
                                     ; content=Content $ unlines c }
      | _  -> // Content contains boundaries.
              let b = "--" + stream.UserState.Boundary
              // VS complains about pContent in the following line: 
              // Type mismatch. Expecting a
              //    Parser<'a,UserState>
              // but given a
              //    CharStream<UserState> -> Parser<Post,UserState>
              // The type 'Reply<'a>' does not match the type 'Parser<Post,UserState>'
              let p = pipe2 pHeaders pContent $ fun h c -> { headers=h; content=c }
               in skipString b
                  >>. manyTill p (attempt (preturn () .>> blankField))
                  // VS complains about Content.Post in the following line: 
                  // Type mismatch. Expecting a
                  //     Post list -> Post
                  // but given a
                  //     Post list -> Content
                  // The type 'Post' does not match the type 'Content'
                  |>> Content.Post

  // VS complains about pContent in the following line: 
  // Type mismatch. Expecting a
  //    Parser<'a,UserState>    
  // but given a
  //    CharStream<UserState> -> Parser<Post,UserState>
  // The type 'Reply<'a>' does not match the type 'Parser<Post,UserState>'
  let pStream = runP (pipe2 pHeaders pContent $ fun h c -> { headers=h; content=c })


type MParser (s:Stream) =
  let r = P.pStream s

  let findHeader name =
    match r.headers |> List.tryFind (fun h -> h.name.ToLower() = name) with
    | Some h -> h.value
    | None   -> ""

  member p.Boundary =
    let header = r.headers
                 |> List.tryFind (fun h -> match h.addl with
                                           | Some xs -> xs |> List.exists P.isBoundary
                                           | None    -> false)
     in match header with
        | Some h -> h.addl.Value |> List.find P.isBoundary |> snd
        | None   -> ""
  member p.ContentID = findHeader "content-id"
  member p.ContentLocation = findHeader "content-location"
  member p.ContentSubtype = findHeader "type"
  member p.ContentTransferEncoding = findHeader "content-transfer-encoding"
  member p.ContentType = findHeader "content-type"
  member p.Content = r.content
  member p.Headers = r.headers
  member p.MessageID = findHeader "message-id"
  member p.MimeVersion = findHeader "mime-version"

EDIT

In response to the feedback I've received thus far (thank you!), I made the following adjustments, receiving the errors annotated:

let rec pContent (stream:CharStream<UserState>) =
    match stream.UserState.Boundary with
    | "" -> // Content is text.
            let nl = System.Environment.NewLine
            let unlines (ss:string list) = System.String.Join (nl,ss)
            let line = restOfLine false
            let lines = manyTill line $ attempt (preturn () .>> blankField)
             in pipe2 pHeaders lines
                      $ fun h c -> { headers=h
                                   ; content=Content $ unlines c }
    | _  -> // Content contains boundaries.
            let b = "--" + stream.UserState.Boundary
            // The following complaint is about `pContent stream`:
            // This expression was expected to have type
            //     Reply<'a>    
            // but here has type
            //     Parser<Post,UserState>
            let p = pipe2 pHeaders (fun stream -> pContent stream) $ fun h c -> { headers=h; content=c }
             in skipString b
                >>. manyTill p (attempt (preturn () .>> blankField))
                // VS complains about the line above:
                // Type mismatch. Expecting a
                //     Parser<Post,UserState>    
                // but given a
                //     Parser<'a list,UserState>    
                // The type 'Post' does not match the type ''a list'

// See above complaint about `pContent stream`. Same complaint here.
let pStream = runP (pipe2 pHeaders (fun stream -> pContent stream) $ fun h c -> { headers=h; content=c })

I tried throwing in Reply ()s, but they just returned parsers, meaning c above became a Parser<...> rather than Content. That seemed to have been a step backwards, or at least in the wrong direction. I admit my ignorance, though, and welcome correction!

回答1:

I can help with one of the errors.

F# generally binds arguments left to right, so you need to use either parentheses around the recursive calls to pContent or a pipe-backward operator <| to show that you want to evaluate the recursive call and bind the return value.

It's also worth noting that <| is the same as your $ operator.

Content.Post is not a constructor for a Post object. You need a function to accept a Post list and return a Post. (Does something from the List module do what you need?)



回答2:

My first answer was completely wrong, but I'd thought I'd leave it up.

The types Post and Content are defined as:

type Content =
    | Content of string
    | Post of Post list
and Post =
    { headers : Header list
    ; content : Content }

Post is a Record, and Content is a Discriminated Union.

F# treats the cases for Discriminated Unions as a separate namespace from types. So Content is different from Content.Content, and Post is different from Content.Post. Because they are different, having the same identifier is confusing.

What is pContent supposed to be returning? If it's supposed to be returning the Discriminated Union Content, you need to wrap the Post record you are returning in the first case in the Content.Post case i.e.

$ fun h c -> Post [ { headers=h
                    ; content=Content $ unlines c } ]

(F# is able to infer that 'Post' refers to Content.Post case, instead of the Post record type here.)