Training and evaluating spaCy model by sentences o

Posted 2019-08-11 22:44

Question:

Observation:

Paragraph: I love apple. I eat one banana a day
Sentences: I love apple. / I eat one banana a day
This paragraph contains two sentences: "I love apple." and "I eat one banana a day". If I pass the whole paragraph to spaCy, it recognizes only one entity (for example, apple), but if I pass the sentences one by one, spaCy recognizes two entities, apple and banana. (This is just an example to illustrate my point; the actual recognition result could be different.)

Situation:

After training a model myself, I want to evaluate its recognition accuracy. There are two ways to pass the text to the spaCy model:
  1. Split the paragraph into sentences and pass the sentences one by one:

       for sentence in paragraph:
           doc = nlp(sentence)
           # retrieve the parsing result

  2. Pass the paragraph at once:

       doc = nlp(paragraph)
       # retrieve the parsing result
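To compare the two ways on equal footing, the per-sentence predictions have to be mapped back to character offsets in the paragraph. The following is only a minimal sketch: `predict` is a hypothetical callable returning `(start_char, end_char, label)` tuples (with spaCy it could plausibly be built from `doc.ents` using `ent.start_char`, `ent.end_char` and `ent.label_`), `toy_predict` is a stand-in for a trained model, and the sentence boundaries are assumed to be known in advance.

```python
import re
from typing import Callable, Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)
Predictor = Callable[[str], Iterable[Span]]


def entities_by_paragraph(predict: Predictor, paragraph: str) -> Set[Span]:
    """Way 2: run the model once over the whole paragraph."""
    return set(predict(paragraph))


def entities_by_sentence(predict: Predictor, paragraph: str,
                         sentence_bounds: List[Tuple[int, int]]) -> Set[Span]:
    """Way 1: run the model per sentence, then shift spans to paragraph offsets."""
    found: Set[Span] = set()
    for sent_start, sent_end in sentence_bounds:
        for start, end, label in predict(paragraph[sent_start:sent_end]):
            found.add((start + sent_start, end + sent_start, label))
    return found


def toy_predict(text: str) -> List[Span]:
    """Hypothetical stand-in for a trained model: tags two fixed words as FRUIT."""
    return [(m.start(), m.end(), "FRUIT")
            for m in re.finditer(r"apple|banana", text)]


paragraph = "I love apple. I eat one banana a day"
sentence_bounds = [(0, 13), (14, 36)]  # character ranges of the two sentences

by_paragraph = entities_by_paragraph(toy_predict, paragraph)
by_sentence = entities_by_sentence(toy_predict, paragraph, sentence_bounds)
```

With the toy predictor both ways return the same spans, because it ignores context. With a real model, the difference between the two sets would show which entities depend on how the text is segmented.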

Question:

  1. Which way is better for testing the performance of the model? I ask because I'm sure that passing the text sentence by sentence always recognizes more entities than passing the whole paragraph.
  2. If the second way is better, do I also need to change how I train the model? Currently, I train the spaCy model sentence by sentence rather than paragraph by paragraph.
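Recognizing more entities is not necessarily the same as recognizing them more accurately, so one way to make the comparison concrete is to score both ways against the same gold annotations. This is a minimal sketch of exact-match entity scoring, not spaCy's own scorer. The function name `prf` is hypothetical, and the spans below are made-up illustration data that mirror the apple/banana example rather than real model output.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def prf(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustration data only: offsets refer to "I love apple. I eat one banana a day".
gold = {(7, 12, "FRUIT"), (24, 30, "FRUIT")}
paragraph_pred = {(7, 12, "FRUIT")}                     # whole paragraph at once
sentence_pred = {(7, 12, "FRUIT"), (24, 30, "FRUIT")}   # sentence by sentence

paragraph_scores = prf(gold, paragraph_pred)
sentence_scores = prf(gold, sentence_pred)
```

Scoring both ways with the same function shows whether the extra entities found sentence by sentence are correct (higher recall at similar precision) or spurious (lower precision).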

The goal of my project:

Given a document, recognize all the entities in it that I'm interested in.

Thanks!