I'm trying to create a simple program with Rasa which extracts a (French) street address from a text input.
Following the advice in the Rasa NLU documentation (http://rasa-nlu.readthedocs.io/en/latest/entities.html), I want to use spaCy to do the address detection.
I saw (https://spacy.io/usage/training) that the corresponding spaCy prebuilt entity would be LOC.
However, I don't understand how to create a training dataset with this entity.
Here is an excerpt from my current JSON training dataset:
{
"text" : "je vis au 2 Rue des Platanes",
"intent" : "donner_adresse",
"entities" : [
{
"start" : 10,
"end" : 28,
"value" : "2 Rue des Platanes",
"entity" : "adresse"
}
]
}
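For reference, the start and end offsets in the excerpt above can be double-checked with a small script. This is only a sketch: `make_example` is a hypothetical helper name of my own, not part of Rasa, and it assumes the entity value appears verbatim (and only once) in the text.

```python
import json


def make_example(text, intent, value, entity):
    """Build one training example, computing character offsets for `value`.

    Hypothetical helper (not a Rasa API): assumes `value` occurs verbatim in `text`.
    """
    start = text.find(value)
    if start == -1:
        raise ValueError(f"{value!r} not found in {text!r}")
    end = start + len(value)  # the end offset is exclusive
    return {
        "text": text,
        "intent": intent,
        "entities": [
            {"start": start, "end": end, "value": value, "entity": entity}
        ],
    }


example = make_example(
    "je vis au 2 Rue des Platanes",
    "donner_adresse",
    "2 Rue des Platanes",
    "adresse",
)
print(json.dumps(example, indent=2, ensure_ascii=False))
```

For the sentence above this gives start 10 and end 28, which matches what I wrote by hand.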
If I train the program and run it with the text input "je vis au 2 Rue des Hetres", I get this output:
{
"entities": [
{
"end": 26,
"entity": "adresse",
"extractor": "ner_crf",
"start": 10,
"value": "2 rue des hetres"
}
],
"intent": null,
"intent_ranking": [],
"text": "je vis au 2 Rue des Hetres"
}
This is fine given my training dataset, but the entity comes from the ner_crf extractor and I would like to use spaCy's LOC entity instead.
How can I achieve that? (What am I doing wrong?)
Here is the relevant part of my config file, if needed:
{
"pipeline" : "spacy_sklearn",
"language" : "fr",
"spacy_model_name" : "fr_core_news_md"
}