Chunked Parsing with FParsec

Posted 2019-01-26 14:07

Is it possible to submit input to an FParsec parser in chunks, as from a socket? If not, is it possible to retrieve the current result and unparsed portion of an input stream so that I might accomplish this? I'm trying to run the chunks of input coming in from SocketAsyncEventArgs without buffering entire messages.

Update

The reason for noting the use of SocketAsyncEventArgs was to denote that sending data to a CharStream might result in asynchronous access to the underlying Stream. Specifically, I'm looking at using a circular buffer to push the data coming in from the socket. I remember the FParsec documentation noting that the underlying Stream should not be accessed asynchronously, so I had planned on manually controlling the chunked parsing.

Ultimate questions:

  1. Can I use a circular buffer under my Stream passed to the CharStream?
  2. Can I avoid manually controlling the chunking in this scenario?

1 answer
老娘就宠你
#2 · 2019-01-26 14:33

The normal version of FParsec (though not the Low-Trust version) reads the input chunk-wise, or "block-wise", as I call it in the CharStream documentation. Thus, if you construct a CharStream from a System.IO.Stream and the content is large enough to span multiple CharStream blocks, you can start parsing before you've fully retrieved the input.

Note, however, that the CharStream consumes the input stream in chunks of a fixed (but configurable) size, i.e. it calls the Read method of the System.IO.Stream as often as necessary to fill a complete block. Hence, if you parse the input faster than new input arrives, the CharStream may block even though some unparsed input is already available, because there isn't yet enough input to fill a complete block.
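As a minimal sketch of the setup described above, the following runs a parser directly on a System.IO.Stream with runParserOnStream. The grammar (`numbers`, a comma-separated list of integers) and the MemoryStream standing in for a socket-backed stream are assumptions made purely for illustration; in practice you would pass a NetworkStream or your own Stream implementation.

```fsharp
open System.IO
open System.Text
open FParsec

// Hypothetical grammar for illustration: comma-separated integers, then end of input.
let numbers : Parser<int64 list, unit> =
    sepBy1 pint64 (pchar ',') .>> eof

// runParserOnStream constructs a CharStream over the given Stream and
// pulls input from it through Stream.Read while the parser advances.
let parseFromStream (stream: Stream) =
    runParserOnStream numbers () "socket input" stream Encoding.UTF8

// A MemoryStream stands in for the socket stream here (an assumption).
let demo () =
    use stream = new MemoryStream(Encoding.UTF8.GetBytes "1,2,3")
    parseFromStream stream

// Turn the ParserResult into a readable string.
let describe result =
    match result with
    | Success (value, _, _) -> sprintf "parsed %A" value
    | Failure (message, _, _) -> sprintf "failed: %s" message
```

With an input this small everything fits into a single block, so the block-filling behaviour only becomes visible with a real, slow-arriving stream.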

Update

The answer(s) to your ultimate questions: 42.

  • How you implement the Stream from which you construct the CharStream is entirely up to you. The restriction you're remembering, which rules out parallel access, only applies to the CharStream class itself, which isn't thread-safe.

  • Implementing the Stream as a circular buffer will likely restrict the maximum distance over which you can backtrack.

  • The block size of the CharStream influences how far you can backtrack when the Stream does not support seeking.

  • The simplest way to parse input asynchronously is to do the parsing in an async task (i.e. on a background thread). In the task you could simply read the socket synchronously, or, if you don't trust the buffering by the OS, you could use a stream class like the BlockingStream described in the article you linked in the second comment below.

  • If the input can be easily separated into independent chunks (e.g. lines for a line-based text format), it might be more efficient to chunk it up yourself and then parse the input chunk by chunk.
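The last two points can be sketched together as follows, assuming a line-based format. The `line` grammar ("key=value" with an integer value) and the helper names `parseLines` and `parseLinesInBackground` are hypothetical and chosen only for this example; the TextReader would typically wrap the socket stream.

```fsharp
open System.IO
open FParsec

// Hypothetical line grammar for illustration: letters, '=', an integer, end of line.
let line : Parser<string * int64, unit> =
    many1Satisfy isLetter .>> pchar '=' .>>. pint64 .>> eof

// Parse every line independently with `run`, so no parser state
// (and no backtracking) ever spans more than one line.
let parseLines (reader: TextReader) =
    Seq.unfold (fun () ->
        match reader.ReadLine() with
        | null -> None                       // end of input
        | text -> Some (run line text, ())) ()

// Do the blocking reads and the parsing on a thread-pool thread.
// The reader must not be touched by other threads while this runs.
let parseLinesInBackground (reader: TextReader) =
    async { return parseLines reader |> Seq.toList }
    |> Async.StartAsTask
```

Because each line gets its own parser run, a failure on one line shows up as a Failure value for that line only and does not affect the lines that follow.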
