Efficient Context-Free Grammar parser, preferably-第2页回答

I am in need of parsing a small subset of English for one of my project, described as a context-free grammar with (1-level) feature structures (example) and I need to do it efficiently .

Right now I'm using NLTK's parser which produces the right output but is very slow. For my grammar of ~450 fairly ambiguous non-lexicon rules and half a million lexical entries, parsing simple sentences can take anywhere from 2 to 30 seconds, depending it seems on the number of resulting trees. Lexical entries have little to no effect on performance.

Another problem is that loading the (25MB) grammar+lexicon at the beginning can take up to a minute.

From what I can find in literature, the running time of the algorithm used to parse such a grammar (Earley or CKY) should be linear to the size of the grammar and cubic to the size of the input token list. My experience with NLTK indicates that ambiguity is what hurts the performance most, not the absolute size of the grammar.

So now I'm looking for a CFG parser to replace NLTK. I've been considering PLY but I can't tell whether it supports feature structures in CFGs, which are required in my case, and the examples I've seen seem to be doing a lot of procedural parsing rather than just specifying a grammar. Can anybody show me an example of PLY both supporting feature structs and using a declarative grammar?

I'm also fine with any other parser that can do what I need efficiently. A Python interface is preferable but not absolutely necessary.

标签： python parsing nlp grammar nltk

8条回答

聊天终结者

2楼-- · 2019-01-31 14:16

Try running it on PyPy, it might be a lot faster.

0人赞添加讨论(0) 举报

Anthone

3楼-- · 2019-01-31 14:24

If it can be expressed as a PEG language (I don't think all CFGs can, but supposedly many can), then you might use pyPEG, which is supposed to be linear-time when using a packrat parsing implementation (although potentially prohibitive on memory usage).

I don't have any experience with it as I am just starting to research parsing and compilation again after a long time away from it, but I am reading some good buzz about this relatively up-to-date technique. YMMV.

0人赞添加讨论(0) 举报

上一页 1 2

Efficient Context-Free Grammar parser, preferably

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间