Stanford Parser is now 'thread-safe' as of version 2.0 (02.03.2012). I am currently running the command line tools and cannot figure out how to make use of my multiple cores by threading the program.
In the past, this question has been answered with "Stanford Parser is not thread-safe", as the FAQ still says. I am hoping to find someone who has had success threading the latest version.
I have tried using the -t flag (-t10 and -tLLP), since that was all I could find in my searches, but both throw errors.
An example of a command I issue is:
java -cp stanford-parser.jar edu.stanford.nlp.parser.lexparser.LexicalizedParser \
-outputFormat "oneline" ./grammar/englishPCFG.ser.gz ./corpus > corpus.lex
Starting with version 2.0.5, you can easily use multiple threads with the option -nthreads k, where k is the number of threads. For example, adding -nthreads 4 to the options in your command (next to -outputFormat) parses with four threads. (Releases of version 2 prior to 2013 had no way to enable multithreading from the command line; it was available only through the API.)
Internally, you can simultaneously run as many parsing threads inside one JVM process as you want. You can do this either by getting and using multiple LexicalizedParserQuery objects (via the parserQuery() method) or implicitly by calling apply(...) or parseTree(...) off one LexicalizedParser. The -nthreads k option does this for you by sending successive sentences to different parsers using the Executor framework. You can also simultaneously create multiple LexicalizedParsers, e.g., for parsing different languages.
Multiple LexicalizedParserQuery objects share the same grammar (LexicalizedParser), but the memory space savings aren't huge, as most of the memory goes to the transient structures used in chart parsing. So, if you are running lots of parsing threads concurrently, you will need to give a lot of memory to the JVM (for example, with java's -Xmx option).
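To illustrate the "successive sentences to different workers" pattern described above, here is a minimal sketch using only the standard java.util.concurrent classes. It is not the parser's actual implementation: parseOne is a hypothetical stand-in that just wraps the sentence in brackets, so the example runs without the parser jar. In real use, each task would presumably obtain its own LexicalizedParserQuery via parserQuery() and parse with that instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelParseSketch {

    // Hypothetical stand-in for a real parse call (assumption, not parser API):
    // it only wraps the sentence so the sketch is runnable on its own.
    static String parseOne(String sentence) {
        return "(ROOT " + sentence + ")";
    }

    // Submits each sentence as its own task to a fixed pool of nThreads workers
    // and collects the results in the same order as the input.
    static List<String> parseAll(List<String> sentences, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String s : sentences) {
                Callable<String> task = () -> parseOne(s);
                pending.add(pool.submit(task));
            }
            List<String> trees = new ArrayList<>();
            for (Future<String> f : pending) {
                trees.add(f.get()); // blocks until this sentence's task finishes
            }
            return trees;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while parsing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a parse task failed", e.getCause());
        } finally {
            pool.shutdown(); // let worker threads exit once tasks are done
        }
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("The dog barks .", "Cats sleep .");
        for (String tree : parseAll(corpus, 2)) {
            System.out.println(tree);
        }
    }
}
```

Collecting the Future objects in submission order keeps the output aligned with the input file even when sentences finish out of order, which matters if you redirect results to something like corpus.lex.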
p.s. Sorry, yes, some of the documentation still needs updating. But -tLPP is one flag for specifying language-specific resources. The Stanford Parser has no -t flag.