Is there an established way to write parsers that

Say I want to parse a file in language X. Really, I'm only interested in a small part of the information within. It's easy enough to write a parser in one of Haskell's many eDSLs for that purpose (e.g. Megaparsec).

data Foo = Foo Int  -- the information I'm after.

parseFoo :: Parsec Void Text Foo
parseFoo = ...

That readily gives rise to a function getFoo :: Text -> Maybe Foo.
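A minimal sketch of `parseFoo` and `getFoo`, assuming a hypothetical file format in which the interesting value appears as `foo = <digits>` somewhere in the input. To stay runnable with only `base`, it uses `Text.ParserCombinators.ReadP` over `String` in place of Megaparsec over `Text`; with Megaparsec, `getFoo` would be roughly `parseMaybe parseFoo`.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- The information we are after.
data Foo = Foo Int deriving (Show, Eq)

-- Hypothetical format: "... foo = <digits> ...".
-- Everything around the value is skipped without being inspected.
parseFoo :: ReadP Foo
parseFoo = do
  _ <- manyTill get (string "foo = ") -- skip up to the first "foo = "
  n <- munch1 isDigit                 -- the part we care about
  _ <- munch (const True)             -- ignore the rest of the input
  return (Foo (read n))

-- Accept only parses that consumed the whole input.
getFoo :: String -> Maybe Foo
getFoo s = case [foo | (foo, "") <- readP_to_S parseFoo s] of
  (foo : _) -> Just foo
  []        -> Nothing

main :: IO ()
main = do
  print (getFoo "name = x\nfoo = 42\nbar = 1\n") -- Just (Foo 42)
  print (getFoo "no foo here")                   -- Nothing
```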

But now I would also like to modify the Foo information in the source text, i.e. I want to implement

changeFoo :: (Foo -> Foo) -> Text -> Text

with the properties

changeFoo id ≡ id
getFoo . changeFoo f ≡ fmap f . getFoo
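The two properties can be written down as executable predicates, which makes candidate implementations easy to test (e.g. with QuickCheck). This sketch uses `String` instead of `Text` and a deliberately naive toy format (the whole file is one integer) to show that the first law is the one a "parse, modify, print from scratch" approach breaks. The names `lawIdentity`, `lawGetChange`, `getFooToy` and `naiveChange` are hypothetical.

```haskell
import Data.Char (isSpace)

data Foo = Foo Int deriving (Show, Eq)

-- changeFoo id ≡ id, checked on one input text.
lawIdentity :: ((Foo -> Foo) -> String -> String) -> String -> Bool
lawIdentity changeFoo s = changeFoo id s == s

-- getFoo . changeFoo f ≡ fmap f . getFoo, checked on one f and one input text.
lawGetChange
  :: (String -> Maybe Foo)
  -> ((Foo -> Foo) -> String -> String)
  -> (Foo -> Foo) -> String -> Bool
lawGetChange getFoo changeFoo f s = getFoo (changeFoo f s) == fmap f (getFoo s)

-- Toy format (hypothetical): the file is one integer, possibly padded with whitespace.
getFooToy :: String -> Maybe Foo
getFooToy s = case [n | (n, rest) <- reads s, all isSpace rest] of
  (n : _) -> Just (Foo n)
  []      -> Nothing

-- Naive changeFoo: parse, apply f, print from scratch.
-- The original whitespace is lost, so the identity law can fail.
naiveChange :: (Foo -> Foo) -> String -> String
naiveChange f s = case getFooToy s of
  Just foo -> let Foo n = f foo in show n
  Nothing  -> s

main :: IO ()
main = do
  print (lawIdentity naiveChange "  42 ") -- False: padding is dropped
  print (lawGetChange getFooToy naiveChange (\(Foo n) -> Foo (n + 1)) "  42 ") -- True
```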

It's possible to do that by changing the result of the parser to something like a lens

parseFoo :: Parsec Void Text (Foo, Foo -> Text)
parseFoo = ...

but that makes the definition a lot more cumbersome: I can no longer just gloss over irrelevant information, but need to store the matched text of every sub-parse and reassemble it manually.
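To make that cost concrete, here is a minimal sketch of the `(Foo, Foo -> Text)` approach for the same hypothetical `foo = <digits>` format, again with `ReadP` and `String` standing in for Megaparsec and `Text`. `ReadP`'s `gather` returns the exact text a sub-parser consumed (Megaparsec's counterpart is `match`), which is how each skipped part has to be kept. Reusing the original spelling of an unchanged value (e.g. `007`) is what keeps `changeFoo id ≡ id` true.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

data Foo = Foo Int deriving (Show, Eq)

-- Parse the Foo and also return a function that rebuilds the whole input
-- around a (possibly new) Foo. Every skipped part is captured by hand.
parseFooL :: ReadP (Foo, Foo -> String)
parseFooL = do
  (before, _) <- gather (manyTill get (string "foo = ")) -- verbatim prefix, including "foo = "
  digits <- munch1 isDigit
  after <- munch (const True)                            -- verbatim suffix
  let orig = read digits :: Int
      rebuild (Foo m)
        | m == orig = before ++ digits ++ after -- unchanged: keep e.g. "007" as written
        | otherwise = before ++ show m ++ after -- assumes m >= 0; the format only allows digits
  return (Foo orig, rebuild)

-- changeFoo via the lens-like parser; unparsable input is returned unchanged.
changeFoo :: (Foo -> Foo) -> String -> String
changeFoo f s = case [r | (r, "") <- readP_to_S parseFooL s] of
  ((foo, rebuild) : _) -> rebuild (f foo)
  []                   -> s

main :: IO ()
main = do
  putStrLn (changeFoo id "a\nfoo = 007\nb")
  putStrLn (changeFoo (\(Foo n) -> Foo (n + 1)) "a\nfoo = 41\nb")
```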

This could be somewhat automated by keeping the string reassembly in a StateT layer around the parser monad, but then I couldn't simply reuse the existing primitive parsers.

Is there an existing solution for this problem?

Tags: parsing haskell parsec bijection

2 answers

Bombasti

Post #2 · 2020-08-09 11:57

Is this a case of "bidirectional transformation"? E.g., http://ceur-ws.org/Vol-1571/

In particular, "Invertible Syntax Descriptions: Unifying Parsing and Pretty Printing" by Rendel and Ostermann, Haskell Symposium 2010: http://dblp.org/rec/conf/haskell/RendelO10 (cf. http://lambda-the-ultimate.org/node/4191 )
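The paper's idea is that a single syntax description yields both a parser and a printer. A much simplified sketch in that spirit (not the paper's API, which is built on partial isomorphisms and a `Syntax` type class) is a handful of hypothetical combinators over `ReadP`: irrelevant parts are wrapped with `verbatim`, which reuses any existing `ReadP` parser and remembers its exact text via `gather`, so they can still be glossed over; only the interesting fields need a printer.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- A parser that also returns a function rebuilding the text it consumed
-- from a (possibly modified) value. Hypothetical type, not from a library.
newtype Edit a = Edit (ReadP (a, a -> String))

-- Run on the whole input; Nothing if it does not parse.
runEdit :: Edit a -> String -> Maybe (a, a -> String)
runEdit (Edit p) s = case [r | (r, "") <- readP_to_S p s] of
  (r : _) -> Just r
  []      -> Nothing

-- Apply f to the focused value and rebuild; unparsable input is left alone.
overE :: Edit a -> (a -> a) -> String -> String
overE e f s = maybe s (\(a, rebuild) -> rebuild (f a)) (runEdit e s)

-- Reuse any existing ReadP parser for an irrelevant part, keeping its text verbatim.
verbatim :: ReadP x -> Edit ()
verbatim p = Edit $ do
  (txt, _) <- gather p
  return ((), const txt)

-- A field we read and write; an unchanged value keeps its original spelling.
field :: Eq a => ReadP a -> (a -> String) -> Edit a
field p pr = Edit $ do
  (txt, a) <- gather p
  return (a, \a' -> if a' == a then txt else pr a')

-- Sequencing: the rebuilt text is the concatenation of the parts.
pairE :: Edit a -> Edit b -> Edit (a, b)
pairE (Edit p) (Edit q) = Edit $ do
  (a, fa) <- p
  (b, fb) <- q
  return ((a, b), \(a', b') -> fa a' ++ fb b')

-- Change the value type along a (total) bijection.
isoE :: (a -> b) -> (b -> a) -> Edit a -> Edit b
isoE to from (Edit p) = Edit $ do
  (a, fa) <- p
  return (to a, fa . from)

skipL :: Edit () -> Edit a -> Edit a
skipL u e = isoE snd (\a -> ((), a)) (pairE u e)

skipR :: Edit a -> Edit () -> Edit a
skipR e u = isoE fst (\a -> (a, ())) (pairE e u)

data Foo = Foo Int deriving (Show, Eq)

unFoo :: Foo -> Int
unFoo (Foo n) = n

-- Hypothetical "foo = <digits>" format; the surrounding text is glossed over.
fooSyntax :: Edit Foo
fooSyntax =
  verbatim (manyTill get (string "foo = "))
    `skipL` isoE Foo unFoo (field (read <$> munch1 isDigit) show)
    `skipR` verbatim (munch (const True))

main :: IO ()
main = do
  print (fmap fst (runEdit fooSyntax "x\nfoo = 21\ny"))
  putStrLn (overE fooSyntax (\(Foo n) -> Foo (n * 2)) "x\nfoo = 21\ny")
```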

放荡不羁爱自由

Post #3 · 2020-08-09 12:18

A solution implemented in Haskell? I don't know of one, though one may exist.

In general, though, one can store enough information to regenerate a legal version of the program that resembles the original to an arbitrary degree, by storing "formatting" information with collected tokens. In the limit, the format information is the original string for the token; any approximation of that will give successively less accurate answers.

If you keep whitespace as explicit tokens in the parse tree, in the limit you can even regenerate the original whitespace exactly. Whether that is useful likely depends on the application; in general, I think it is overkill.
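A minimal sketch of that idea, assuming a toy lexer (the token kinds and the `setFoo` rule are hypothetical, not taken from the linked answer): every token, including whitespace, keeps its original text, so concatenating the tokens reproduces the input exactly, and an edit only rewrites the text of the token it changes.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

-- Every token keeps its exact source text; whitespace is a token as well.
data Kind = Space | Ident | Number | Symbol deriving (Show, Eq)
data Token = Token { kind :: Kind, text :: String } deriving (Show, Eq)

tokenize :: String -> [Token]
tokenize [] = []
tokenize s@(c : cs)
  | isSpace c = run Space isSpace
  | isDigit c = run Number isDigit
  | isAlpha c = run Ident isAlpha
  | otherwise = Token Symbol [c] : tokenize cs
  where
    run k p = let (t, rest) = span p s in Token k t : tokenize rest

-- Printing is concatenation, so untouched tokens reproduce the input exactly.
render :: [Token] -> String
render = concatMap text

-- Hypothetical edit: rewrite the first number in "foo = <number>",
-- allowing whitespace tokens in between and keeping them as they are.
setFoo :: (Int -> Int) -> [Token] -> [Token]
setFoo f = go
  where
    go (t@(Token Ident "foo") : rest) = t : afterFoo rest
    go (t : rest) = t : go rest
    go [] = []
    afterFoo (t@(Token Space _) : rest) = t : afterFoo rest
    afterFoo (t@(Token Symbol "=") : rest) = t : afterEq rest
    afterFoo rest = go rest
    afterEq (t@(Token Space _) : rest) = t : afterEq rest
    afterEq (Token Number n : rest) = Token Number (show (f (read n))) : rest
    afterEq rest = go rest

main :: IO ()
main = do
  let src = "a  foo =\t41 ;x"
  putStrLn (render (tokenize src))
  putStrLn (render (setFoo (+ 1) (tokenize src)))
```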

Details on what to capture and how to regenerate the source can be found in my SO answer: Compiling an AST back to source code
