Use spaCy entities in Rasa-NLU training data

Posted 2019-07-15 02:00

Question:

I'm trying to create a simple program with Rasa which extracts a (French) street address from a text input.

Following the advice in Rasa-NLU doc (http://rasa-nlu.readthedocs.io/en/latest/entities.html), I want to use spaCy to do the address detection.

I saw (https://spacy.io/usage/training) that the corresponding spaCy prebuilt entity would be LOC.

However, I don't understand how to create a training dataset with this entity.

Here is an excerpt from my current JSON training dataset:

{
    "text" : "je vis au 2 Rue des Platanes",
    "intent" : "donner_adresse",
    "entities" : [
        {
            "start" : 10,
            "end" : 28,
            "value" : "2 Rue des Platanes",
            "entity" : "adresse"
        }
    ]
}
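
The "start" and "end" values in this format are character offsets into "text", with "end" exclusive, so they are easy to get wrong when counted by hand. A minimal Python sketch of generating such an entry follows; make_example is a hypothetical helper, not part of Rasa-NLU, and it assumes the entity value occurs exactly once in the text:

```python
import json


def make_example(text, intent, value, entity):
    """Build one training example with computed character offsets."""
    start = text.index(value)  # raises ValueError if value is absent
    return {
        "text": text,
        "intent": intent,
        "entities": [
            {
                "start": start,
                "end": start + len(value),  # exclusive end offset
                "value": value,
                "entity": entity,
            }
        ],
    }


example = make_example(
    "je vis au 2 Rue des Platanes", "donner_adresse",
    "2 Rue des Platanes", "adresse",
)
print(json.dumps(example, indent=4, ensure_ascii=False))
```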

If I train the program and run it with the text input "je vis au 2 Rue des Hetres", I get this output:

{
    "entities": [
        {
            "end": 26,
            "entity": "adresse",
            "extractor": "ner_crf",
            "start": 10,
            "value": "2 rue des hetres"
        }
    ],
    "intent": null,
    "intent_ranking": [],
    "text": "je vis au 2 Rue des Hetres"
}

Which is fine given my training dataset. But I would like to use spaCy's LOC entity.

How can I achieve that? (What am I doing wrong?)

Here is a relevant summary of my config file, if needed:

{
    "pipeline" : "spacy_sklearn",
    "language" : "fr",
    "spacy_model_name" : "fr_core_news_md"
}

Answer 1:

If you want to use spaCy's pre-trained NER, you just need to add it to your pipeline, e.g.

pipeline = ["nlp_spacy", "tokenizer_spacy", "ner_spacy"]

But depending on what you need, you might want to just copy one of the preconfigured pipelines and add "ner_spacy" at the end.
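
As a rough sketch of that second option, the Python snippet below builds a config whose pipeline spells out the components explicitly and appends "ner_spacy". The component list is an assumption about what the "spacy_sklearn" template expanded to in Rasa-NLU releases of that period, so check it against the templates in your installed version. The "language" and "spacy_model_name" values are taken from the question.

```python
import json

# Assumed expansion of the "spacy_sklearn" template; verify against the
# pipeline templates shipped with your Rasa-NLU version.
SPACY_SKLEARN = [
    "nlp_spacy",
    "tokenizer_spacy",
    "intent_featurizer_spacy",
    "intent_entity_featurizer_regex",
    "ner_crf",
    "ner_synonyms",
    "intent_classifier_sklearn",
]

config = {
    "pipeline": SPACY_SKLEARN + ["ner_spacy"],  # pre-trained spaCy NER last
    "language": "fr",
    "spacy_model_name": "fr_core_news_md",
}

print(json.dumps(config, indent=4))
```

With "ner_spacy" in the pipeline, entities found by the spaCy model should show up in the output under spaCy's own labels (such as LOC) with "extractor": "ner_spacy", alongside anything "ner_crf" extracts from your annotated examples.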