We are integrating Stanford NLP into our system and it works fine, except that it eventually fails with a "GC overhead limit exceeded" error. We have the memory dump and will analyze it, but if anyone has an idea about this issue, please let us know. The server is quite powerful: SSD, 32 GB RAM, Xeon E5-series CPU.
Code we have:
String text = Jsoup.parse(groupNotes.getMnotetext()).text();
String lang;
try {
    DetectorFactory.clear();
    DetectorFactory.loadProfile("/home/deploy/profiles/");
    Detector detector = DetectorFactory.create();
    detector.append(text);
    lang = detector.detect();
} catch (Exception ignored) {
    lang = "de";
}
LexicalizedParser lp;
if ("de".equalsIgnoreCase(lang)) {
    lp = LexicalizedParser.loadModel(GERMAN_PCG_MODEL);
} else {
    lp = LexicalizedParser.loadModel(ENGLISH_PCG_MODEL);
}
Tree parse = lp.parse(text);
List<String> stringList = new ArrayList<>();
List<TaggedWord> taggedWords = parse.taggedYield();
// System.out.println(taggedWords);
for (TaggedWord taggedWord : taggedWords) {
    String tag = taggedWord.tag();
    // Keep nouns (NN, NNS, NNP, NNPS), verbs (VB, VBD, VBG, VBN, VBP, VBZ),
    // adjectives (JJ, JJR, JJS) and foreign words (FW), adding only the word
    // without its tag.
    if (tag.startsWith("NN") || tag.startsWith("VB")
            || tag.startsWith("JJ") || tag.equals("FW")) {
        stringList.add(taggedWord.word());
    }
}
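In case it is relevant, here is a minimal sketch of what we mean by loading each parser model only once per JVM and reusing it, instead of calling loadModel() on every request as above. ParserCache is just an illustrative name of ours, and the model paths shown are the standard CoreNLP ones, which may not match our constants; we have not verified that this alone removes the GC overhead.

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

final class ParserCache {
    // Assumed standard model paths; substitute our own GERMAN_PCG_MODEL /
    // ENGLISH_PCG_MODEL constants as appropriate.
    private static final String GERMAN_PCG_MODEL =
            "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz";
    private static final String ENGLISH_PCG_MODEL =
            "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";

    // Each model is loaded once when this class is first used and then kept
    // in memory for the lifetime of the JVM, rather than being re-read from
    // the serialized grammar on every call.
    private static final LexicalizedParser GERMAN = LexicalizedParser.loadModel(GERMAN_PCG_MODEL);
    private static final LexicalizedParser ENGLISH = LexicalizedParser.loadModel(ENGLISH_PCG_MODEL);

    private ParserCache() {
    }

    static LexicalizedParser forLanguage(String lang) {
        return "de".equalsIgnoreCase(lang) ? GERMAN : ENGLISH;
    }
}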
JVM options for Apache Tomcat:
CATALINA_OPTS="$CATALINA_OPTS -server -Xms2048M -Xmx3048M -XX:OnOutOfMemoryError="/home/deploy/scripts/tomcatrestart.sh" -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemory -XX:HeapDumpPath=/path/to/date.hprof -XX:-UseGCOverheadLimit -Dspring.security.strategy=MODE_INHERITABLETHREADLOCAL"
Any ideas?
pom.xml:
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.7.0</version>
    <classifier>models</classifier>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.7.0</version>
    <classifier>models-german</classifier>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-parser</artifactId>
    <version>3.7.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.7.0</version>
    <scope>provided</scope>
</dependency>
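Since the CoreNLP artifacts are declared with scope provided, here is a quick check we can run inside the webapp to confirm the model jars actually resolve on Tomcat's runtime classpath (the resource paths are our assumption of where the PCFG models live inside the jars):

ClassLoader cl = Thread.currentThread().getContextClassLoader();
// Both should print a jar: URL; null means the corresponding models jar is missing at runtime.
System.out.println(cl.getResource("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"));
System.out.println(cl.getResource("edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz"));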