I am trying to code the Dissociated Press algorithm based on n-grams in Scala. How do I generate n-grams for large files? For example, for a file containing "the bee is the bee of the bees":
- First, it has to pick a random n-gram, for example "the bee" (with n = 2).
- Then it has to look for n-grams whose first (n-1) words match the last (n-1) words of the current n-gram, for example "bee of".
- It prints the last word of this n-gram, then repeats.
Can you please give me some hints on how to do it? Sorry for the inconvenience.
Here is a stream-based approach. It does not need much memory while computing the n-grams, because the words are read lazily instead of being loaded all at once.
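A minimal sketch of what such a stream-based n-gram generator can look like, assuming whitespace-separated words and Scala 2.13 collections. The object name `StreamNGrams`, the method `ngrams` and the fallback to the example sentence in `main` are illustrative choices, not taken from the original answer:

```scala
import scala.io.Source

object StreamNGrams {
  // Turns a lazy iterator of lines into a lazy iterator of n-grams.
  // Only one window of n words is held at a time, so the file is never
  // loaded into memory as a whole.
  def ngrams(lines: Iterator[String], n: Int): Iterator[Vector[String]] = {
    require(n >= 1, "n must be positive")
    lines
      .flatMap(_.split("\\s+").iterator) // split each line into words
      .filter(_.nonEmpty)                // drop empty tokens from leading whitespace
      .sliding(n)                        // overlapping windows of n words, across line breaks
      .withPartial(false)                // skip a final window shorter than n
      .map(_.toVector)
  }

  // Illustrative entry point: reads the file named by the first argument,
  // or falls back to the example sentence from the question.
  def main(args: Array[String]): Unit = {
    val n = 2
    if (args.nonEmpty) {
      val src = Source.fromFile(args(0))
      try ngrams(src.getLines(), n).foreach(g => println(g.mkString(" ")))
      finally src.close()
    } else {
      ngrams(Iterator("the bee is the bee of the bees"), n)
        .foreach(g => println(g.mkString(" ")))
    }
  }
}
```

Run without arguments, it prints the seven bigrams of the example sentence: "the bee", "bee is", "is the", "the bee", "bee of", "of the", "the bees". Only the n-gram stream itself is lazy; anything that later collects the n-grams into a lookup table will still grow with the input.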
You may try this with different values of the parameter n.
Your question could be a little more specific, but here is my attempt.
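One way to put the three steps from the question together is sketched below. It assumes the whole word list fits in memory (the prefix table grows with the input), and the names `DissociatedPress`, `buildTable`, `generate` and `maxWords` are illustrative, not part of any library:

```scala
import scala.annotation.tailrec
import scala.util.Random

object DissociatedPress {
  // All complete n-grams of the input, in order.
  // sliding yields one short window when words.size < n, so it is filtered out.
  def ngrams(words: Vector[String], n: Int): Vector[Vector[String]] =
    words.sliding(n).filter(_.size == n).toVector

  // Maps each (n-1)-word prefix to every word that follows it in the input.
  // Duplicates are kept, so frequent continuations are picked more often.
  def buildTable(words: Vector[String], n: Int): Map[Vector[String], Vector[String]] = {
    require(n >= 2, "n must be at least 2")
    ngrams(words, n).groupBy(_.init).map { case (prefix, gs) => prefix -> gs.map(_.last) }
  }

  private def pick[A](xs: Vector[A], rnd: Random): A = xs(rnd.nextInt(xs.size))

  // Step 1: start from a random n-gram.
  // Steps 2-3: look up the last (n-1) words, append a random successor, repeat
  // until maxWords words exist or the prefix has no successor.
  def generate(words: Vector[String], n: Int, maxWords: Int, rnd: Random): Vector[String] = {
    val table = buildTable(words, n)
    val grams = ngrams(words, n)
    if (grams.isEmpty) Vector.empty // fewer than n words: nothing to generate
    else {
      @tailrec
      def loop(out: Vector[String]): Vector[String] =
        if (out.size >= maxWords) out
        else table.get(out.takeRight(n - 1)) match {
          case Some(successors) => loop(out :+ pick(successors, rnd))
          case None             => out // dead end, e.g. the text's final (n-1) words
        }
      loop(pick(grams, rnd))
    }
  }

  def main(args: Array[String]): Unit = {
    val words = "the bee is the bee of the bees".split("\\s+").toVector
    println(generate(words, 2, 20, new Random()).mkString(" "))
  }
}
```

Because successors are stored with duplicates, for n = 2 the prefix "bee" continues with "is" or "of" equally often, while "the" leads to "bee" twice as often as to "bees". Generation stops early when it reaches a prefix that only occurs at the very end of the text, such as "bees" here.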