Forming Bigrams of words in list of sentences with-第2页回答

I have a list of sentences:

text = ['cant railway station','citadel hotel',' police stn'].

I need to form bigram pairs and store them in a variable. The problem is that when I do that, I get a pair of sentences instead of words. Here is what I did:

text2 = [[word for word in line.split()] for line in text]
bigrams = nltk.bigrams(text2)
print(bigrams)

which yields

[(['cant', 'railway', 'station'], ['citadel', 'hotel']), (['citadel', 'hotel'], ['police', 'stn'])

Can't railway station and citadel hotel form one bigram. What I want is

[([cant],[railway]),([railway],[station]),([citadel,hotel]), and so on...

The last word of the first sentence should not merge with the first word of second sentence. What should I do to make it work?

标签： python list list-comprehension nltk collocation

9条回答

smile是对你的礼貌

2楼-- · 2020-02-17 10:06

>>> text = ['cant railway station','citadel hotel',' police stn']
>>> bigrams = [(ele, tex.split()[i+1]) for tex in text  for i,ele in enumerate(tex.split()) if i < len(tex.split())-1]
>>> bigrams
[('cant', 'railway'), ('railway', 'station'), ('citadel', 'hotel'), ('police', 'stn')]

Using enumerate and split function.

0人赞添加讨论(0) 举报

祖国的老花朵

3楼-- · 2020-02-17 10:08

from nltk import word_tokenize 
from nltk.util import ngrams


text = ['cant railway station', 'citadel hotel', 'police stn']
for line in text:
    token = nltk.word_tokenize(line)
    bigram = list(ngrams(token, 2)) 

    # the '2' represents bigram...you can change it to get ngrams with different size

0人赞添加讨论(0) 举报

叛逆

4楼-- · 2020-02-17 10:16

There are a number of ways to solve it but I solved in this way:

>>text = ['cant railway station','citadel hotel',' police stn']
>>text2 = [[word for word in line.split()] for line in text]
>>text2
[['cant', 'railway', 'station'], ['citadel', 'hotel'], ['police', 'stn']]
>>output = []
>>for i in range(len(text2)):
    output = output+list(bigrams(text2[i]))
>>#Here you can use list comphrension also
>>output
[('cant', 'railway'), ('railway', 'station'), ('citadel', 'hotel'), ('police', 'stn')]

0人赞添加讨论(0) 举报

上一页 1 2

Forming Bigrams of words in list of sentences with

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间