
Creating a word cloud using Python

Posted 2019-03-02 10:29

Question:

I am trying to create a word cloud in Python after cleaning a text file.

I get the results I need, i.e. the words used most often in the text file, but I am unable to plot them.

My code:

import collections
from wordcloud import WordCloud
import matplotlib.pyplot as plt

file = open('example.txt', encoding = 'utf8' )
stopwords = set(line.strip() for line in open('stopwords'))
wordcount = {}

for word in file.read().split():
    word = word.lower()
    word = word.replace(".","")
    word = word.replace(",","")
    word = word.replace("\"","")
    word = word.replace("“","")
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1

d = collections.Counter(wordcount)
for word, count in d.most_common(10):
    print(word , ":", count)

#wordcloud = WordCloud().generate(text)
#fig = plt.figure()
#fig.set_figwidth(14)
#fig.set_figheight(18)

#plt.imshow(wordcloud.recolor(color_func=grey_color, random_state=3))
#plt.title(title, color=fontcolor, size=30, y=1.01)
#plt.annotate(footer, xy=(0, -.025), xycoords='axes fraction', fontsize=infosize, color=fontcolor)
#plt.axis('off')
#plt.show()

Edit: I tried to plot the word cloud with the following code:

wordcloud = WordCloud(background_color='white',
                          width=1200,
                          height=1000
                         ).generate((d.most_common(10)))


plt.imshow(wordcloud)
plt.axis('off')
plt.show()

But this raises TypeError: expected string or buffer

When I tried the above code with .generate(str(d.most_common(10))) instead, a word cloud is drawn, but it shows an apostrophe (') after several words.

Environment: Jupyter Notebook, Python 3, IPython

Answer 1:

First, download the Symbola.ttf font file into the same folder as the script below.

Folder contents:

file.txt Symbola.ttf my_word_cloud.py

file.txt:

foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz foo buzz bizz
foo foo foo foo foo foo foo foo foo foo bizz bizz bizz bizz foo foo

my_word_cloud.py:

import io
from collections import Counter
from os import path

import matplotlib.pyplot as plt
from wordcloud import WordCloud

d = path.dirname(__file__)

# Pass an explicit encoding to io.open so the file is loaded as UTF-8
text = io.open(path.join(d, 'file.txt'), encoding='utf-8').read()

words = text.split()
print(Counter(words))

# Generate a word cloud image
# The Symbola font includes most emoji
font_path = path.join(d, 'Symbola.ttf')
word_cloud = WordCloud(font_path=font_path).generate(text)

# Display the generated image:
plt.imshow(word_cloud)
plt.axis("off")
plt.show()

Result:

Counter({'foo': 17, 'bizz': 9, 'buzz': 5})

The script above is just a simple example I put together for you. There are many more examples here:

https://github.com/amueller/word_cloud/tree/master/examples
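
If you still want to strip punctuation and drop stopwords before counting, as the question does, the loop can be folded into the same Counter-based approach. The following is only a rough sketch: count_words and STRIP_CHARS are names made up for illustration, and unlike the chained replace() calls in the question it removes punctuation only at the edges of each token, not inside words.

```python
import string
from collections import Counter

# Characters stripped from both ends of each token. The curly quotes are
# added by hand because string.punctuation only covers ASCII.
STRIP_CHARS = string.punctuation + '“”‘’'


def count_words(text, stopwords=()):
    """Count lower-cased words, skipping stopwords and edge punctuation."""
    stop = set(stopwords)
    words = (token.strip(STRIP_CHARS).lower() for token in text.split())
    # Tokens that were pure punctuation become empty strings; skip those.
    return Counter(word for word in words if word and word not in stop)


sample = 'Foo, foo. "Buzz" the bizz; the “foo”'
print(count_words(sample, stopwords={'the'}))
```

The returned Counter can be used exactly like the d variable in the question, for example with most_common(10).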



Answer 2:

most_common(x) is not a method of WordCloud (it is a method of collections.Counter). However, you can pass the parameter

max_words = 

to WordCloud to limit how many words are drawn, and this should do what you're attempting.
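
Since the word counts already exist, another option is to hand them to WordCloud directly as frequencies instead of as text. generate() expects a single string, which is why passing a list fails, and wrapping the list in str() puts the quote characters from its printed form into the text, which is most likely where the stray apostrophes come from. The sketch below assumes the counts are in a Counter like d in the question; top_frequencies and plot_top_words are hypothetical helper names, not part of the wordcloud library.

```python
from collections import Counter


def top_frequencies(counts, n=10):
    """Return the n most common words as a plain {word: count} dict."""
    return dict(Counter(counts).most_common(n))


def plot_top_words(counts, n=10):
    """Draw a word cloud from word counts rather than from raw text."""
    # Imported here so the counting helper works without these packages.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(background_color='white', width=1200, height=1000,
                      max_words=n)
    # generate_from_frequencies takes a word -> frequency mapping, so no
    # str() conversion is involved.
    cloud.generate_from_frequencies(top_frequencies(counts, n))
    plt.imshow(cloud)
    plt.axis('off')
    plt.show()


d = Counter({'foo': 17, 'bizz': 9, 'buzz': 5})
print(top_frequencies(d, 2))  # {'foo': 17, 'bizz': 9}
```

Calling plot_top_words(d) in the notebook should then draw the cloud for the ten most frequent words.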