I want to tag the following phrases as SKILL using Stanford CoreNLP's TokensRegexNERAnnotator:
Areas of Knowledge
Computer Skills
Technical Experience
Technical Skills
There are many more phrases like these.
Code -
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.*;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.pipeline.TokensRegexNERAnnotator;
import edu.stanford.nlp.util.CoreMap;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// Add the custom NER annotator that reads my rule file.
pipeline.addAnnotator(new TokensRegexNERAnnotator("./mapping/test_degree.rule", true));
String[] tests = {"Bachelor of Arts is a good degree.", "Technical Skill is a must have for Software Developer."};
List<CoreLabel> tokens = new ArrayList<>();
// Traverse each sentence in the array of test sentences.
for (String txt : tests) {
    System.out.println("String is : " + txt);
    // Create an empty Annotation just with the given text.
    Annotation document = new Annotation(txt);
    // Run all annotators on this text; without this call no sentences are attached.
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    /* Next we can go over the annotated sentences and extract the annotated words,
       using the CoreLabel object. */
    for (CoreMap sentence : sentences) {
        for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
            System.out.println("annotated coreMap sentences : " + token);
            // Extract the NER tag for the current token.
            String ne = token.get(NamedEntityTagAnnotation.class);
            String word = token.get(TextAnnotation.class);
            System.out.println("Current Word : " + word + " POS :" + token.get(PartOfSpeechAnnotation.class));
            System.out.println("Lemma : " + token.get(LemmaAnnotation.class));
            System.out.println("Named Entity : " + ne);
        }
    }
}
My regex rule file is -
$SKILL_FIRST_KEYWORD = "/area of/|/areas of/|/technical/|/computer/|/professional/"
$SKILL_KEYWORD = "/knowledge/|/skill/|/skills/|/expertise/|/experience/"
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }
{ ruleType: "tokens", pattern: ($SKILL_FIRST_KEYWORD + $SKILL_KEYWORD), result: "SKILL" }
I am getting an ArrayIndexOutOfBoundsException. I guess there is something wrong with my rule file. Can somebody please point out where I am making a mistake?
Desired Output -
Areas of Knowledge - SKILL
Computer Skills - SKILL
and so on.
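To make the desired output concrete, here is a minimal plain-Java sketch of the matching I am after. It does not use CoreNLP at all: it only mimics the intended behavior with java.util.regex, using the keywords from my rule file. The class name SkillPhraseSketch and the method tag are hypothetical names made up for this illustration, and "O" is used as the label for anything that is not a SKILL phrase.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: it is not part of CoreNLP.
final class SkillPhraseSketch {
    // First keyword(s) followed by a skill keyword, both taken from the rule file above.
    private static final Pattern SKILL_PHRASE = Pattern.compile(
            "\\b(areas? of|technical|computer|professional)\\s+"
                    + "(knowledge|skills?|expertise|experience)\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns "SKILL" if the text contains a skill phrase, otherwise "O".
    static String tag(String text) {
        return SKILL_PHRASE.matcher(text).find() ? "SKILL" : "O";
    }

    public static void main(String[] args) {
        String[] phrases = {
            "Areas of Knowledge", "Computer Skills",
            "Technical Experience", "Technical Skills", "Bachelor of Arts"
        };
        for (String phrase : phrases) {
            System.out.println(phrase + " - " + tag(phrase));
        }
    }
}
```

This prints "SKILL" for the four phrases listed at the top and "O" for "Bachelor of Arts". What I want is the same result as NER tags on the tokens produced by the pipeline.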
Thanks in advance.