I want to tag phrases like the following as SKILL using Stanford CoreNLP's TokensRegexNERAnnotator:
AREAS OF EXPERTISE
Areas of Knowledge
Computer Skills
Technical Experience
Technical Skills
There are many more phrases like the ones above.
My code:
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreAnnotations.*;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.pipeline.TokensRegexNERAnnotator;
import edu.stanford.nlp.util.CoreMap;

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// Add the regex-based NER annotator (second argument: ignore case)
pipeline.addAnnotator(new TokensRegexNERAnnotator("./mapping/test_degree.rule", true));
String[] tests = {"Bachelor of Arts is a good degree.", "Technical Skill is a must have for Software Developer."};
List<CoreLabel> tokens = new ArrayList<>();
// Traverse each sentence in the array of test sentences.
for (String txt : tests) {
    System.out.println("String is : " + txt);
    // Create an empty Annotation with just the given text
    Annotation document = new Annotation(txt);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    /* Next we can go over the annotated sentences and extract the annotated words,
       using the CoreLabel object */
    for (CoreMap sentence : sentences) {
        for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
            System.out.println("annotated coreMap sentences : " + token);
            // Extract the NER tag for the current token
            String ne = token.get(NamedEntityTagAnnotation.class);
            String word = token.get(CoreAnnotations.TextAnnotation.class);
            System.out.println("Current Word : " + word + " POS :" + token.get(PartOfSpeechAnnotation.class));
            System.out.println("Lemma : " + token.get(LemmaAnnotation.class));
            System.out.println("Named Entity : " + ne);
        }
    }
}
My regex rule file is:
$SKILL_FIRST_KEYWORD = "/area of/|/areas of/|/technical/|/computer/|/professional/"
$SKILL_KEYWORD = "/knowledge/|/skill/|/skills/|/expertise/|/experience/"
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }
{ ruleType: "tokens", pattern: ($SKILL_FIRST_KEYWORD + $SKILL_KEYWORD), result: "SKILL" }
I am getting an ArrayIndexOutOfBoundsException. I guess there is something wrong with my rule file. Can somebody please point out where I am making a mistake?
Desired output:
AREAS OF EXPERTISE - SKILL
Areas of Knowledge - SKILL
Computer Skills - SKILL
and so on.
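To make the desired output concrete, here is a minimal sketch of the matching I am after, written with plain java.util.regex rather than TokensRegex or any CoreNLP API. It only illustrates which phrases should end up labelled SKILL; it is not a fix for the rule file. The class name SkillPhraseSketch and the method findSkillPhrases are made up for illustration, and folding "area of"/"areas of" and "skill"/"skills" into "areas? of" and "skills?" is my own shorthand for the keyword lists in the rule file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkillPhraseSketch {
    // First keyword group followed by second keyword group, case-insensitive.
    // Mirrors the two keyword lists from the rule file (assumed equivalent).
    private static final Pattern SKILL_PHRASE = Pattern.compile(
        "\\b(areas? of|technical|computer|professional)\\s+"
            + "(knowledge|skills?|expertise|experience)\\b",
        Pattern.CASE_INSENSITIVE);

    /** Returns every phrase in the text that should be labelled SKILL. */
    public static List<String> findSkillPhrases(String text) {
        List<String> phrases = new ArrayList<>();
        Matcher m = SKILL_PHRASE.matcher(text);
        while (m.find()) {
            phrases.add(m.group());
        }
        return phrases;
    }

    public static void main(String[] args) {
        String[] tests = {
            "AREAS OF EXPERTISE",
            "Areas of Knowledge",
            "Computer Skills",
            "Technical Skill is a must have for Software Developer.",
            "Bachelor of Arts is a good degree."
        };
        for (String txt : tests) {
            for (String phrase : findSkillPhrases(txt)) {
                System.out.println(phrase + " - SKILL");
            }
        }
    }
}
```

Running main prints one "phrase - SKILL" line for each of the first four inputs and nothing for the last one, which is the behaviour I would like to get from the NER annotator at the token level.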
Thanks in advance.