How to use Hindi Model in RASA NLU?

Posted 2019-07-25 18:34

Question:

I have built a model for the Hindi language using FastText with a spaCy backend. I followed this tutorial to build my model using FastText.

This URL

I have also linked my model with spaCy using the following command:

python -m spacy link nl_model hi

The model linked successfully, as you can see in the image below.

Now I cannot find any help on using the Hindi language: what kind of config files do I need, where do I import the Hindi model, and how do I proceed from here? I also have questions about the data.json file: what should it look like for Hindi, and how do we use entities and intents? Should the names of the entities and intents be in Hindi or in English? Can someone help me proceed further? I am stuck here. I have to build a chatbot in Hindi using the RASA Stack only.

Thanks in advance....

Answer 1:

It seems that you have successfully trained a hi model for spaCy. The next step is to write a config file like this:

language: "hi"

pipeline:
- name: "tokenizer_whitespace"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"

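If the hi model you just trained also includes a tokenizer, you can replace tokenizer_whitespace with tokenizer_spacy. In that case the pipeline also needs nlp_spacy as its first component, so that the linked spaCy model is actually loaded.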
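Hindi written in Devanagari separates words with spaces, which is why a whitespace tokenizer is a reasonable starting point. The snippet below is only a rough approximation of what such a tokenizer produces, using plain Python string splitting rather than Rasa's own component; the sample utterance is a made-up example.

```python
# Hypothetical sample utterance ("Tell me the weather in Delhi").
utterance = "मुझे दिल्ली का मौसम बताओ"

# Splitting on whitespace roughly mimics a whitespace tokenizer:
# each space-separated Hindi word becomes one token.
tokens = utterance.split()

for position, token in enumerate(tokens):
    print(position, token)
```

If the tokens look right for your own sentences, the whitespace tokenizer is probably sufficient and the spaCy tokenizer is optional.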

I should mention that Rasa's new TensorFlow-based intent classifier does not need the word vectors of your hi model; it learns its word vectors from scratch (see here). Entity extraction does not need the hi model either; the tokenizer alone does the job. So, overall, you can build your bot even without the hi model!

The training data file can be JSON or Markdown, as fully explained in the docs. I think the names of your intents and entities should be in English, but the sample queries can clearly be in any UTF-8 language, such as Hindi.
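As a minimal sketch of what such a JSON file could contain, the script below builds two examples in the Rasa NLU JSON layout (rasa_nlu_data / common_examples) with Hindi text and English intent and entity names. The intents greet and ask_weather, the entity city, and the helper make_example are all made up for illustration; entity positions are character offsets into the text.

```python
import json


def make_example(text, intent, entity_value=None, entity_name=None):
    """Build one 'common_examples' entry; offsets are character positions."""
    example = {"text": text, "intent": intent, "entities": []}
    if entity_value is not None:
        start = text.index(entity_value)  # raises ValueError if not found
        example["entities"].append({
            "start": start,
            "end": start + len(entity_value),
            "value": entity_value,
            "entity": entity_name,
        })
    return example


# Hypothetical examples: Hindi queries, English intent/entity names.
examples = [
    make_example("नमस्ते", "greet"),
    make_example("मुझे दिल्ली का मौसम बताओ", "ask_weather", "दिल्ली", "city"),
]
training_data = {"rasa_nlu_data": {"common_examples": examples}}

# ensure_ascii=False keeps the Devanagari text readable instead of \u escapes.
payload = json.dumps(training_data, ensure_ascii=False, indent=2)
print(payload)
```

The printed text can be saved as a UTF-8 encoded data.json file. A real bot needs many more examples per intent than shown here.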

Then you can train your model using any of the methods explained in the docs. For example:

python3 -m rasa_nlu.train \
    --config YOUR_CONFIG_FILE.yml \
    --data YOUR_TRAIN_DATA.json \
    --path PATH_TO_SAVE_MODEL
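The same training step can also be driven from Python. The sketch below assumes the older standalone rasa_nlu package (the 0.x releases that ship the rasa_nlu.train module used above); later Rasa releases reorganized these modules, so treat the imports as version-dependent. The function name train_and_parse and its arguments are hypothetical.

```python
def train_and_parse(config_path, data_path, model_path, text):
    """Train a Rasa NLU model, persist it, and parse one message.

    Assumes the standalone rasa_nlu 0.x package is installed; the
    imports are kept inside the function so that this file can be
    loaded even where rasa_nlu is missing.
    """
    from rasa_nlu import config
    from rasa_nlu.model import Interpreter, Trainer
    from rasa_nlu.training_data import load_data

    training_data = load_data(data_path)          # JSON or Markdown file
    trainer = Trainer(config.load(config_path))   # pipeline from the YAML config
    trainer.train(training_data)
    model_directory = trainer.persist(model_path)  # returns the saved model dir

    interpreter = Interpreter.load(model_directory)
    return interpreter.parse(text)  # dict with the intent and entities


# Example call (file names are placeholders, as in the command above):
# result = train_and_parse("YOUR_CONFIG_FILE.yml", "YOUR_TRAIN_DATA.json",
#                          "PATH_TO_SAVE_MODEL", "नमस्ते")
# print(result["intent"], result["entities"])
```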

You can find a good quick start in the docs.