Parse raw text with MaltParser in Java

Published 2019-05-10 09:17

Question:

I found that NLTK in Python does this via its *raw_parse* function, but I need to use Java. I also found that ClearTK has a MaltParser wrapper, but there is no documentation for it. I'm looking for a function or a project that first converts raw English text to a CoNLL file that MaltParser can use and then parses it with MaltParser. Any help is appreciated.

Answer 1:

The MaltParser 1.7.2 distribution ships with examples in the folder examples/apiexamples/srcex.

However, these examples only show how to run MaltParser programmatically after tokenization and POS tagging have already been performed (and after the output of these steps has been converted to a CoNLL-like format).
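To illustrate what "converted to a CoNLL-like format" means, here is a minimal sketch that turns already tokenized and tagged words into tab-separated rows. It assumes a six-column layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS), which is what I recall the bundled examples using; the class name `ConllFormatter` and the method `toConllRows` are made up for illustration and are not part of MaltParser. Check the column layout against the model you actually load.

```java
public class ConllFormatter {

    /**
     * Builds one tab-separated row per token.
     * Assumed columns: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS.
     * Unknown values (lemma, features) are written as "_".
     */
    public static String[] toConllRows(String[] tokens, String[] posTags) {
        if (tokens.length != posTags.length) {
            throw new IllegalArgumentException(
                    "tokens and posTags must have the same length");
        }
        String[] rows = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            rows[i] = String.join("\t",
                    String.valueOf(i + 1), // ID, 1-based position in the sentence
                    tokens[i],             // FORM, the surface token
                    "_",                   // LEMMA, not available here
                    posTags[i],            // CPOSTAG, coarse tag (assumption: same as POSTAG)
                    posTags[i],            // POSTAG, fine-grained tag
                    "_");                  // FEATS, not available here
        }
        return rows;
    }

    public static void main(String[] args) {
        // Tokens and tags are hard-coded here; in practice they would come
        // from a tokenizer and a POS tagger such as the ones in OpenNLP.
        String[] tokens = {"The", "cat", "sleeps", "."};
        String[] tags = {"DT", "NN", "VBZ", "."};
        for (String row : toConllRows(tokens, tags)) {
            System.out.println(row);
        }
    }
}
```

An array of rows like this, one array per sentence, is the kind of input the API examples expect; the tokenization and tagging that produce `tokens` and `tags` still have to come from another tool.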

Since I currently cannot offer a simpler or shorter alternative, I can at least share a link to a Groovy script that performs tokenization, part-of-speech tagging (using OpenNLP), and dependency parsing (using MaltParser). The tools are made interoperable using UIMA. If you are familiar with Maven, it should be quite straightforward to derive a Java version of that script.

Mind you, this is not the best answer, but at this point it is possibly better than nothing.

Note: I'm a developer on both Apache UIMA and DKPro Core (the project to which the link points).