I have a list of sentences:
text = ['cant railway station', 'citadel hotel', ' police stn']
I need to form bigram pairs and store them in a variable. The problem is that when I do that, I get a pair of sentences instead of words. Here is what I did:
import nltk

text2 = [line.split() for line in text]
bigrams = list(nltk.bigrams(text2))  # list() so the pairs print even when bigrams() returns a generator
print(bigrams)
which yields
[(['cant', 'railway', 'station'], ['citadel', 'hotel']), (['citadel', 'hotel'], ['police', 'stn'])]
Here 'cant railway station' and 'citadel hotel' together form one bigram. What I want is
[([cant],[railway]),([railway],[station]),([citadel],[hotel]), and so on...
The last word of the first sentence should not merge with the first word of second sentence. What should I do to make it work?
Using the enumerate and split functions.
There are a number of ways to solve this, but I solved it this way:
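One possible sketch of an approach along these lines, not necessarily the exact code from this answer. It assumes the `text` list from the question; the helper name `sentence_bigrams` is made up for illustration. Each sentence is split into words on its own, and `enumerate` pairs every word with its successor, so no pair ever crosses a sentence boundary. Note that it produces tuples of plain strings rather than single-element lists.

```python
text = ['cant railway station', 'citadel hotel', ' police stn']

def sentence_bigrams(sentences):
    """Return word bigrams built within each sentence, never across sentences.

    Hypothetical helper written for illustration only.
    """
    pairs = []
    for line in sentences:
        # split() with no arguments also discards the leading space in ' police stn'
        words = line.split()
        # Stop one word early so that words[i + 1] always exists
        for i, word in enumerate(words[:-1]):
            pairs.append((word, words[i + 1]))
    return pairs

bigrams = sentence_bigrams(text)
print(bigrams)
# [('cant', 'railway'), ('railway', 'station'), ('citadel', 'hotel'), ('police', 'stn')]
```

If NLTK is preferred, calling `nltk.bigrams(line.split())` once per sentence and concatenating the results should give the same pairs, since the problem in the question comes from passing the whole list of sentences in a single call.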