Analysis:
1. Read the novel: open the file in read mode
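import jieba

# replace '文件名.txt' ("filename.txt") with the path to the novel
with open('文件名.txt', 'r', encoding='utf-8') as f:
    text = f.read()  # named text rather than str, which would shadow the built-in
2. Segment the novel into words
ret = jieba.lcut(text)  # returns a list of words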
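`jieba.lcut` returns the segmented words as a list. jieba's own method is dictionary-based with statistical scoring and is more sophisticated than what follows; purely to illustrate what "segmenting" means, here is a sketch of greedy forward maximum matching, a different and simpler technique. The function name and the toy vocabulary are hypothetical.

```python
def forward_max_match(text, vocab):
    """Greedy forward maximum matching (a simplified stand-in, not jieba's algorithm)."""
    max_len = max((len(w) for w in vocab), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first and shrink it until it is in the
        # vocabulary; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


vocab = {'关羽', '张飞'}  # hypothetical toy vocabulary
print(forward_max_match('关羽与张飞', vocab))  # ['关羽', '与', '张飞']
```

With '张飞' removed from the vocabulary, the same call returns ['关羽', '与', '张', '飞']: a name the dictionary does not know falls apart into single characters, which the length filter in step 3 would then discard.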
3. Count the occurrences of every word, storing them in a dictionary so they can be sorted later
dic = {}
for word in ret:
    if len(word) == 1:  # skip single-character words; they are not counted as names
        continue
    dic[word] = dic.get(word, 0) + 1
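The dictionary-plus-`get` loop is a hand-written counter. A minimal equivalent sketch uses `collections.Counter` from the standard library; `count_words` is a hypothetical name, and the sample tokens stand in for the output of `jieba.lcut`.

```python
from collections import Counter


def count_words(words):
    """Count every word of two or more characters, like the loop in step 3."""
    return Counter(w for w in words if len(w) > 1)


counts = count_words(['曹操', '曰', '曹操', '孔明', '之'])  # sample tokens
print(counts['曹操'], counts['孔明'], counts['曰'])  # 2 1 0
```

A `Counter` returns 0 for a missing key instead of raising `KeyError`, and since it is a `dict` subclass, `.items()` and the sorting in step 4 work on it unchanged.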
Prepare a list of redundant forms of address (aliases) to delete
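excludes = ["诸葛亮", "卧龙", "玄德", "关公", "丞相", ...]
If a character in the text goes by several names, add the counts of all those names onto one of them, then delete the redundant ones:
# get(..., 0) avoids a KeyError when an alias never appears in the text
dic['关羽'] = (dic.get('关羽', 0) + dic.get('美髯公', 0) + dic.get('关公', 0)
              + dic.get('关云长', 0) + dic.get('云长', 0))
for i in excludes:
    dic.pop(i, None)  # unlike del, pop with a default does not raise KeyError for a missing word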
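Hard-coding one sum per character gets long once several characters have aliases. A sketch of a more general approach, assuming you maintain an alias-to-canonical-name table yourself; `merge_aliases` and the table are hypothetical, and only the 关羽 aliases come from the text above.

```python
def merge_aliases(counts, aliases):
    """Fold each alias's count into its canonical name and drop the alias.

    `aliases` maps alias -> canonical name; this table is maintained by hand.
    """
    merged = dict(counts)  # work on a copy; leave the caller's dict untouched
    for alias, canonical in aliases.items():
        if alias in merged:
            merged[canonical] = merged.get(canonical, 0) + merged.pop(alias)
    return merged


aliases = {'美髯公': '关羽', '关公': '关羽', '关云长': '关羽', '云长': '关羽'}
print(merge_aliases({'关羽': 5, '云长': 3, '关公': 2, '曹操': 9}, aliases))
# {'关羽': 10, '曹操': 9}
```

Words in the excludes list that should not be merged into anyone, such as 丞相, still go through the plain deletion loop.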
4. Sort the dictionary entries by count, in descending order
lis = list(dic.items())
lis.sort(key=lambda x: x[1], reverse=True)  # x is a (word, count) pair; highest count first
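Python's sort is stable even with `reverse=True`, so words with equal counts keep their original relative order. If you want ties listed in a fixed order regardless of where each word was first seen, one option is a compound key; a small sketch with made-up counts:

```python
counts = {'孔明': 3, '曹操': 5, '玄德': 3}  # made-up counts

# Negate the count to get "highest first", then break ties by the word itself
# (Unicode code-point order), so equal counts always come out in the same order.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)  # [('曹操', 5), ('孔明', 3), ('玄德', 3)]
```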
5. Print the ten most frequent words
for word, count in lis[:10]:  # slicing avoids an IndexError when there are fewer than ten words
    print(word, count)
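If the counts are kept in a `collections.Counter` instead of a plain dict, steps 4 and 5 collapse into one call: `most_common(n)` returns the n highest-count `(word, count)` pairs, highest first, and simply returns fewer pairs when there are fewer than n words. A sketch with made-up counts:

```python
from collections import Counter

counts = Counter({'曹操': 5, '孔明': 4, '刘备': 3})  # made-up counts

# most_common(10) asks for ten entries but only three exist, so three come back.
for word, n in counts.most_common(10):
    print(word, n)
```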