How to generate word clouds from LDA models in Python

I am doing some topic modeling on newspaper articles and have implemented LDA using gensim in Python 3. Now I want to create a word cloud for each topic, using the top 20 words for each topic. I know I can print the words and save the LDA model, but is there a way to save just the top words for each topic so that I can use them to generate word clouds?

I tried to google it, but could not find anything relevant. Any help is appreciated.

Tags: python lda word-cloud

3 Answers

啃猪蹄的小仙女

Reply #2 · 2019-02-07 16:07

You may also consider using the pyLDAvis package, which can be used to visualize LDA models generated through gensim. An example is shown here
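As a rough illustration of that suggestion, the sketch below wraps the pyLDAvis calls in a helper. The function name save_ldavis_html and the default file name are hypothetical, and it assumes you still have the corpus and dictionary that were used to train the model. The gensim adapter module was renamed between pyLDAvis releases, so the import tries both names.

```python
def save_ldavis_html(lda, corpus, dictionary, path="lda_vis.html"):
    """Write an interactive pyLDAvis page for a trained gensim LDA model.

    `lda`, `corpus` and `dictionary` are assumed to be the gensim model,
    bag-of-words corpus and Dictionary used during training.
    """
    import pyLDAvis  # third-party: pip install pyLDAvis

    try:
        # Adapter module name in newer pyLDAvis releases.
        import pyLDAvis.gensim_models as gensim_vis
    except ImportError:
        # Older releases exposed the same helper as pyLDAvis.gensim.
        import pyLDAvis.gensim as gensim_vis

    prepared = gensim_vis.prepare(lda, corpus, dictionary)
    pyLDAvis.save_html(prepared, path)
    return path
```

The resulting HTML file can be opened in a browser. It shows the topics and their most relevant terms, which is a different view from a word cloud.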


可以哭但决不认输i

Reply #3 · 2019-02-07 16:09

You can get the top-n words for a topic from an LDA model using Gensim's built-in method show_topic.

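from gensim import models

lda = models.LdaModel.load('lda.model')

# Open the file once, outside the loop, so earlier topics are not overwritten.
with open('output_file.txt', 'w', encoding='utf-8') as outfile:
    for i in range(lda.num_topics):
        outfile.write('Topic #{}: \n'.format(i + 1))
        # show_topic returns (word, probability) pairs for topic i.
        for word, prob in lda.show_topic(i, topn=20):
            # In Python 3 the word is already a str, so no encode() is needed.
            outfile.write('{}\n'.format(word))
        outfile.write('\n')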

This will write a file with a format similar to this:

Topic #69: 
pet
dental
tooth
adopt
animal
puppy
rescue
dentist
adoption
animal
shelter
pet
dentistry
vet
paw
pup
patient
mix
foster
owner

Topic #70: 
periscope
disneyland
disney
snapchat
brandon
britney
periscope
periscope
replay
britneyspear
buffaloexchange
britneyspear
https
meerkat
blab
periscope
kxci
toni
disneyland
location

You may need to adjust this to your needs, e.g. to yield a list of the top 20 words instead of writing them to a text file.
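Returning the words instead of writing them to a file could look roughly like this. It is a minimal sketch: top_words_per_topic and the FakeLda stand-in are hypothetical names used only for illustration, and the probabilities in FakeLda are made up. Only show_topic and num_topics come from the code above.

```python
def top_words_per_topic(lda, topn=20):
    """Return {topic_id: [word, ...]} for every topic in the model."""
    return {
        topic_id: [word for word, _prob in lda.show_topic(topic_id, topn=topn)]
        for topic_id in range(lda.num_topics)
    }


class FakeLda:
    """Hypothetical stand-in mimicking the two LdaModel members used above."""

    num_topics = 2
    # Made-up (word, probability) pairs, only so the demo runs without a model.
    _topics = [
        [("pet", 0.05), ("dental", 0.04), ("tooth", 0.03)],
        [("periscope", 0.06), ("disney", 0.02)],
    ]

    def show_topic(self, topicid, topn=10):
        return self._topics[topicid][:topn]


words = top_words_per_topic(FakeLda(), topn=2)
print(words)
```

With a real model you would pass the loaded lda object in place of FakeLda().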

The answer to the post "How do I print lda topic model and the word cloud of each of the topics" gives a good explanation of how to use raw text to create the word clouds.
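If you would rather build the clouds directly from the topic weights than from raw text, one option is the third-party wordcloud package, which is an assumption here because the thread does not name a library. The helper names topic_frequencies and save_topic_cloud are hypothetical. The sketch turns the (word, probability) pairs from show_topic into a frequency dictionary and hands it to WordCloud.generate_from_frequencies.

```python
def topic_frequencies(lda, topic_id, topn=20):
    """Map each top word of one topic to its probability as a plain float."""
    return {
        word: float(prob)
        for word, prob in lda.show_topic(topic_id, topn=topn)
    }


def save_topic_cloud(lda, topic_id, path, topn=20):
    """Render one topic as a word cloud image and save it to `path`."""
    # Imported here so the helper above works without the package installed.
    from wordcloud import WordCloud  # third-party: pip install wordcloud

    cloud = WordCloud(background_color="white")
    cloud.generate_from_frequencies(topic_frequencies(lda, topic_id, topn))
    cloud.to_file(path)
    return path
```

Calling save_topic_cloud(lda, i, 'topic_{}.png'.format(i + 1)) for each i in range(lda.num_topics) would then produce one image per topic, with word size driven by the topic probability.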

骚年 ilove

Reply #4 · 2019-02-07 16:14

"Is there any way to just save the top words for each topic?"

Yes, there is. jLDADMM outputs the top topical words for each topic. In version 1.0, only the top topical words are written to the top-word output file, without their probabilities given the topic.
