How to handle two entity extraction methods in NLP

2020-02-13 05:42发布

问题:

I am using two different entity extraction methods (https://rasa.com/docs/nlu/entities/) while building my NLP model in the RASA framework to build a chatbot. The bot should handle different questions which have custom entities as well as some general ones like location or organisation. So I use both components ner_spacy and ner_crf to create the model. After that I build a small helper script in python to evaluate the model performance. There I noticed that the model struggles to choose the correct enity.

For example for a word 'X' it choosed the pre-defined enity 'ORG' from SpaCy, but it should be recogniced as a custom enity which I defined in the training data.

If I just use the ner_crf extractor I face huge problems in identifing location enities like capitals. Also one of my biggest problems are single answer enities.

Q : "What´s your favourite animal?"

A : Dog

My model is not able to extract this single entity 'animal' for this single answer. If I answer this question with two words like 'The Dog', the model has no problems to extract the animal entity with the value 'Dog'.

So my question is, is it clever to use two different components to extract entities? One for custom enities and the other one for pre-defined enities. If I use two methods, what´s the mechanism in the model which extractor is used?

By the way, currently I´m just testing things out, so my training samples are not that huge it should be (less then 100 examples). Could the problem been solved if I have much more training examples?

回答1:

You are facing 2 problems here. I am suggesting few ways that i found helpful.

1. Custom entity recognition: To solve this you need to add more training sentences with all possible lengths of entities. ner_crf is going to predict better when there are identifiable markers around entities (e.g. prepositions)

2. Extracting entities from single word answer : As a workaround, i suggest you to do below manipulations on client end.

When you are sending question like What´s your favorite animal?, append a marker to question to indicate to client that a single answer is expected. e.g. You can send ##SINGLE## What´s your favorite animal? to client.

Client can remove the ##SINGLE## from question and show it to user. But when client sends user's response to server, it doesn't send Dog, it send something like User responded with single answer as Dog

You can train your model to extract entities from such an answer.