I'm trying to create a simple program with Rasa which extracts a (French) street address from a text input.
Following the advice in the Rasa NLU docs (http://rasa-nlu.readthedocs.io/en/latest/entities.html), I want to use spaCy to do the address detection.
I saw (https://spacy.io/usage/training) that the corresponding spaCy prebuilt entity would be LOC.
However, I don't understand how to create a training dataset with this entity.
Here is an excerpt from my current JSON training dataset:
{
  "text" : "je vis au 2 Rue des Platanes",
  "intent" : "donner_adresse",
  "entities" : [
    {
      "start" : 10,
      "end" : 28,
      "value" : "2 Rue des Platanes",
      "entity" : "adresse"
    }
  ]
}
If I train the program and run it with the text input "je vis au 2 Rue des Hetres", I get this output:
{
  "entities": [
    {
      "end": 26,
      "entity": "adresse",
      "extractor": "ner_crf",
      "start": 10,
      "value": "2 rue des hetres"
    }
  ],
  "intent": null,
  "intent_ranking": [],
  "text": "je vis au 2 Rue des Hetres"
}
This is fine given my training dataset, but I would like to use spaCy's LOC entity instead.
How can I achieve that? (What am I doing wrong?)
Here is a relevant summary of my config file, if needed:
{
  "pipeline" : "spacy_sklearn",
  "language" : "fr",
  "spacy_model_name" : "fr_core_news_md"
}
If you want to use spaCy's pre-trained NER, you just need to add the "ner_spacy" component to your pipeline.
Depending on what you need, you might want to just copy one of the preconfigured pipelines and add "ner_spacy" at the end.
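As a rough sketch of that second option, the config below spells out the spacy_sklearn template as an explicit list and appends "ner_spacy". The component list is an assumption based on older Rasa NLU releases and may differ in your version, so check it against the pipeline documentation for the release you run. The language and model name are taken from the question.

```json
{
  "pipeline": [
    "nlp_spacy",
    "tokenizer_spacy",
    "intent_entity_featurizer_regex",
    "intent_featurizer_spacy",
    "ner_crf",
    "ner_synonyms",
    "intent_classifier_sklearn",
    "ner_spacy"
  ],
  "language": "fr",
  "spacy_model_name": "fr_core_news_md"
}
```

With a setup like this, entities found by "ner_spacy" should be reported under spaCy's own labels (such as LOC) with "ner_spacy" as the extractor, alongside any "adresse" entities from "ner_crf". The pre-trained extractor needs no annotated examples in the training data.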