Making Word Clouds in Python (WordCloud)

Date: 2019-11-16


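1. Installation

One tutorial says to download the matching wordcloud package from [here][1] and then run pip install from that directory.
In fact, you can simply run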

pip install wordcloud

and that's it. Start Python; if import wordcloud succeeds, the installation worked.
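The import check can also be scripted. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of wordcloud.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once `pip install wordcloud` has succeeded in this environment.
print("wordcloud installed:", is_installed("wordcloud"))
```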

2. A Brief Look at the Documentation

The documentation lists only three main entry points. Here I cover the WordCloud class and its related methods.

WordCloud()

class wordcloud.WordCloud(font_path=None, width=400, height=200, margin=2, ranks_only=None, prefer_horizontal=0.9, mask=None, scale=1, color_func=None, max_words=200, min_font_size=4, stopwords=None, random_state=None, background_color='black', max_font_size=None, font_step=1, mode='RGB', relative_scaling=0.5, regexp=None, collocations=True, colormap=None, normalize_plurals=True)

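font_path: path to the font file. For Chinese text you need to point this at a font that contains Chinese glyphs.
prefer_horizontal: float; the ratio of horizontal placement attempts to vertical ones. If it is less than 1, the algorithm tries rotating a word as soon as it does not fit horizontally. In effect it balances how many words end up horizontal versus vertical.
mask: nd-array or None (default=None); controls the shape of the word cloud. If None, the width and height parameters are used. Otherwise the mask defines the drawing area and width and height are ignored.
scale: float; scales the output image.
max_words: the maximum number of words to display.
stopwords: words to exclude from the cloud.
relative_scaling: this one is interesting. It is a float, not a boolean. With 0, font size depends only on a word's rank; with 1, font size is proportional to word frequency (a word twice as frequent is drawn twice as large). Values in between blend the two.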
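To make relative_scaling concrete, here is a simplified model of how the next word's font size could be derived from the previous one. This is a sketch based on my reading of the layout loop, not a copy of the library's code; the function name next_font_size and the sample numbers are made up for illustration.

```python
def next_font_size(font_size, freq, last_freq, relative_scaling):
    """Blend rank-based and frequency-based sizing (simplified model).

    font_size        -- size used for the previous (more frequent) word
    freq, last_freq  -- frequency of the current and the previous word
    relative_scaling -- 0 keeps the size (rank only), 1 scales by freq ratio
    """
    rs = relative_scaling
    if rs == 0:
        return font_size
    return int(round((rs * (freq / float(last_freq)) + (1 - rs)) * font_size))


# The previous word was drawn at size 100 and is twice as frequent.
for rs in (0, 0.5, 1.0):
    print(rs, next_font_size(100, freq=2, last_freq=4, relative_scaling=rs))
```

In the real layout the size also shrinks whenever a word does not fit, so rank still matters even at relative_scaling=1.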

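Related methods

fit_words(frequencies): generates a word cloud from words and their frequencies.
generate(text): generates a word cloud directly from text. It works out of the box only for English-style text, where words are separated by whitespace.
generate_from_frequencies(frequencies, max_font_size=None): generates a word cloud from words and their frequencies; the maximum font size can be specified.
generate_from_text(text): generates a word cloud directly from text. Same limitation: English-style text.
process_text(text): counts the words in text and returns a dict {word: int} with stopwords removed. Again only suitable for English-style text.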
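The "English only" limitation comes from tokenization: the text is split with a regular expression, so unsegmented Chinese comes out as one long "word". The sketch below is a simplified stand-in for process_text, not the library's implementation; the pattern is an assumption modeled on the library's default, and count_words is a hypothetical name.

```python
import re
from collections import Counter

# Assumed pattern, similar in spirit to the library default: a word is at
# least two "word" characters, with apostrophes allowed after the first.
WORD_PATTERN = re.compile(r"\w[\w']+")


def count_words(text, stopwords):
    """Count lower-cased tokens in text, skipping stopwords."""
    stop = {word.lower() for word in stopwords}
    tokens = (token.lower() for token in WORD_PATTERN.findall(text))
    return dict(Counter(token for token in tokens if token not in stop))


# English text splits into words as expected.
print(count_words("The cat and the hat", {"the", "and"}))

# Unsegmented Chinese is treated as a single token, which is why the text
# has to be segmented first (for example with jieba) and joined with spaces.
print(count_words("我爱自然语言处理", set()))
```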

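A note on the argument to fit_words:

The documentation tells us to pass tuples, each containing a word and its frequency. When I actually did that, the argument was rejected. Let's look at the source code: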

def fit_words(self, frequencies):
        """Create a word_cloud from words and frequencies. Alias to generate_from_frequencies. Parameters ---------- frequencies : array of tuples A tuple contains the word and its frequency. Returns ------- self """
        return self.generate_from_frequencies(frequencies)

The parameter description still says "A tuple contains the word and its frequency.", yet the method simply calls self.generate_from_frequencies(frequencies):

def generate_from_frequencies(self, frequencies, max_font_size=None):
        """Create a word_cloud from words and frequencies. Parameters ---------- frequencies : dict from string to float A contains words and associated frequency. max_font_size : int Use this font-size instead of self.max_font_size Returns ------- self """
        # make sure frequencies are sorted and normalized
        frequencies = sorted(frequencies.items(), key=item1, reverse=True)
        frequencies = frequencies[:self.max_words]
        # largest entry will be 1
        max_frequency = float(frequencies[0][1])

        frequencies = [(word, freq / max_frequency)
                       for word, freq in frequencies]

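Here the parameter has become "dict from string to float", and the list comprehension builds a new frequencies value that is an array of tuples. So we still have to pass a dict; it only becomes an array of tuples inside the function. Has the fit_words docstring simply not been updated in a long time?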
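The behaviour can be reproduced without the library. The sketch below mimics the sort, truncate and normalize steps from the source quoted above; normalize_frequencies is a hypothetical stand-in, and a lambda replaces the library's key function. It shows why a list of tuples fails and how dict() fixes it.

```python
def normalize_frequencies(frequencies, max_words=200):
    """Mimic the sort / truncate / normalize steps quoted above."""
    # .items() is the call that requires a dict rather than a list of tuples.
    ranked = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    ranked = ranked[:max_words]
    # The largest entry is scaled to 1.
    max_frequency = float(ranked[0][1])
    return [(word, freq / max_frequency) for word, freq in ranked]


pairs = [("python", 4), ("wordcloud", 2), ("jieba", 1)]

error_message = None
try:
    normalize_frequencies(pairs)  # what the fit_words docstring suggests
except AttributeError as exc:
    error_message = str(exc)  # a list has no .items() method
print(error_message)

# Converting the pairs to a dict first makes the call work.
result = normalize_frequencies(dict(pairs), max_words=2)
print(result)
```

The same dict(pairs) conversion is what you would apply before calling fit_words or generate_from_frequencies.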

3. Example: A Word Cloud of Zhihu Users' Education Backgrounds

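#coding=utf-8


# Import the wordcloud and matplotlib modules
from wordcloud import WordCloud,ImageColorGenerator
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image  # replaces scipy.misc.imread, which was removed from SciPy
import jieba
import jieba.analyse

# data2 is a pandas DataFrame loaded elsewhere (not shown here)
content = (",").join(data2['教育经历'].values.tolist())
tags = jieba.analyse.extract_tags(content, topK=200, withWeight=False)

text =" ".join(tags)
print(tags)

# Read the background (mask) image
bj_pic=np.array(Image.open('1.png'))

# Generate the word cloud (fonts usually live in C:\Windows\Fonts\; you can also download one)
font=r'C:\Windows\Fonts\STFANGSO.ttf'  # without this line Chinese renders as empty boxes; font_path="" raises an error
wordcloud=WordCloud(mask=bj_pic,background_color='white',font_path=font,scale=0.5).generate_from_text(text)  # generate the word cloud directly from text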


plt.imshow(wordcloud)
plt.axis('off')
plt.show()

wordcloud.to_file('test2.jpg')

You can try adding stopwords such as "university", or simply delete the unwanted entries from tags. The image comes out small because I used scale=0.5.
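Deleting unwanted entries from tags can be sketched as follows. The sample tags and the helper name drop_unwanted are made up for illustration; in the example above, tags comes from jieba.analyse.extract_tags.

```python
def drop_unwanted(tags, unwanted):
    """Return tags without the unwanted words (case-insensitive match)."""
    unwanted_lower = {word.lower() for word in unwanted}
    return [tag for tag in tags if tag.lower() not in unwanted_lower]


# Hypothetical sample; real tags would come from jieba.analyse.extract_tags.
tags = ["University", "清华大学", "university", "计算机"]
text = " ".join(drop_unwanted(tags, {"university"}))
print(text)
```

The alternative is to leave tags alone and pass stopwords={"university"} to WordCloud, which, as far as I can tell, is consulted when generating from text but not when generating from frequencies.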

I also tried fit_words: the argument has to be a dict!

wordcloud = WordCloud(mask=bj_pic,background_color='white',font_path=font,scale=3.5).fit_words({"sb":3,"我是":4,"操":10})

plt.imshow(wordcloud)
plt.axis('off')
plt.show()
