I have built my model for the Hindi language using FastText with the spaCy backend.
I followed this tutorial to build my model using FastText:
This URL
I have also linked my model to spaCy with the following command:
python -m spacy link nl_model hi
The model was linked successfully, as you can see in the image below.
Now I can't find any help on using the Hindi language: what kind of config file do I need, where do I import the Hindi model, and how should I proceed?
I also have questions about what the data.json file should look like for Hindi and how to use entities and intents. Should the names of the entities and intents also be in Hindi, or in English?
Can someone help me proceed? I am stuck here.
I have to build a chatbot in Hindi using only the Rasa Stack.
Thanks in advance.
It seems that you have successfully trained the hi model using spaCy. The next step is to write a config file like:
language: "hi"
pipeline:
- name: "tokenizer_whitespace"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
If the hi model you just trained also has a tokenizer, you can replace tokenizer_whitespace with tokenizer_spacy (it relies on the spaCy document, so nlp_spacy must then come first in the pipeline).
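For intuition, here is a rough sketch of what a whitespace tokenizer produces for a Hindi sentence: each token together with its character offsets, which is roughly what ner_crf relies on to line entity spans up with tokens. This is a simplified illustration, not Rasa's actual tokenizer_whitespace code; the Token and whitespace_tokenize names are hypothetical.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    start: int  # character offset where the token begins
    end: int    # character offset just past the token


def whitespace_tokenize(text: str) -> List[Token]:
    """Split on runs of whitespace and keep character offsets.

    Hypothetical, simplified stand-in for a whitespace tokenizer; Devanagari
    vowel signs are not whitespace, so they stay attached to their words.
    """
    return [Token(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


if __name__ == "__main__":
    for tok in whitespace_tokenize("मुझे दिल्ली की फ्लाइट चाहिए"):
        print(tok)
```

Because Hindi separates words with spaces, this kind of splitting is usually good enough to start with, which is why no language-specific tokenizer is strictly required.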
I should mention that Rasa's new TensorFlow-based intent classifier does not need the word vectors of your hi model; it learns its own representations from scratch (see here). For entity extraction you don't need the hi model either; the tokenizer is enough!
So, overall, you can build your bot even without the hi model!
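To illustrate why no pretrained hi vectors are needed: intent_featurizer_count_vectors turns each message into bag-of-words counts over a vocabulary collected from your own training examples (the Rasa component wraps scikit-learn's CountVectorizer), and the embedding classifier learns from those counts. The sketch below is a simplified pure-Python illustration of such count features under a whitespace-splitting assumption; build_vocabulary and count_vector are hypothetical helpers, not Rasa APIs.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(examples: List[str]) -> Dict[str, int]:
    """Assign an index to every whitespace token seen in the training examples."""
    vocab: Dict[str, int] = {}
    for text in examples:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def count_vector(text: str, vocab: Dict[str, int]) -> List[int]:
    """Bag-of-words counts over the vocabulary; tokens never seen in training are ignored."""
    counts = Counter(token for token in text.split() if token in vocab)
    vector = [0] * len(vocab)
    for token, n in counts.items():
        vector[vocab[token]] = n
    return vector


if __name__ == "__main__":
    training = ["नमस्ते", "मुझे दिल्ली की फ्लाइट चाहिए", "मुझे मुंबई की फ्लाइट चाहिए"]
    vocab = build_vocabulary(training)
    print(count_vector("मुझे पुणे की फ्लाइट चाहिए", vocab))
```

The features depend only on the Hindi examples you write, so more (and more varied) training sentences matter more than a pretrained model here.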
The training data file can be JSON or Markdown, as fully explained in the docs. I think the names of your intents and entities should be in English, but the example sentences can clearly be in any UTF-8 language, such as Hindi.
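As a sketch of what a Hindi data.json could look like, the script below builds the rasa_nlu JSON layout (rasa_nlu_data with common_examples, each holding text, intent and entities) using English intent/entity names and Hindi text. The intent and entity names (greet, book_flight, city) and the helper functions are hypothetical. Entity start/end are Python character offsets, so they are computed from the text rather than typed by hand, which is error-prone with Devanagari combining marks.

```python
import json
from typing import Dict, List, Optional


def make_example(text: str, intent: str,
                 entities: Optional[Dict[str, str]] = None) -> dict:
    """Build one common_examples entry.

    entities maps an English entity name to the exact Hindi substring;
    only the first occurrence of each substring is annotated.
    """
    spans = []
    for name, value in (entities or {}).items():
        start = text.find(value)
        if start == -1:
            raise ValueError(f"{value!r} not found in {text!r}")
        spans.append({"start": start, "end": start + len(value),
                      "value": value, "entity": name})
    return {"text": text, "intent": intent, "entities": spans}


def build_training_data(examples: List[dict]) -> dict:
    """Wrap examples in the top-level rasa_nlu_data structure."""
    return {"rasa_nlu_data": {"common_examples": examples,
                              "regex_features": [],
                              "entity_synonyms": []}}


if __name__ == "__main__":
    data = build_training_data([
        make_example("नमस्ते", "greet"),
        make_example("मुझे दिल्ली की फ्लाइट चाहिए", "book_flight", {"city": "दिल्ली"}),
    ])
    # ensure_ascii=False keeps the Devanagari readable instead of \u escapes
    print(json.dumps(data, ensure_ascii=False, indent=2))
```

Redirect the output to data.json (saved as UTF-8) and pass it to the training command with --data.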
Then you can train your model using any of the methods explained in the docs, for example:
python3 -m rasa_nlu.train \
--config YOUR_CONFIG_FILE.yml \
--data YOUR_TRAIN_DATA.json \
--path PATH_TO_SAVE_MODEL
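Before running the training command, a quick check can catch entity offsets that do not match their values, a common mistake when annotating Hindi by hand. This is a hypothetical helper sketch, not part of rasa_nlu; it assumes the JSON layout with rasa_nlu_data and common_examples described in the docs.

```python
import json
import sys
from typing import List


def check_training_data(data: dict) -> List[str]:
    """Return a description of every entity whose span does not match its value."""
    problems = []
    examples = data.get("rasa_nlu_data", {}).get("common_examples", [])
    for i, ex in enumerate(examples):
        text = ex.get("text", "")
        for ent in ex.get("entities", []):
            if text[ent["start"]:ent["end"]] != ent["value"]:
                problems.append(f"example {i}: span {ent['start']}-{ent['end']} "
                                f"does not match value {ent['value']!r}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python check_data.py YOUR_TRAIN_DATA.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for problem in check_training_data(json.load(f)):
            print(problem)
```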
You can find a good quick start in the docs.