Python 3 + itchat Scraping in Practice

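This post records how to use Python with itchat to fetch WeChat friend information, then draw a pie chart of friends' genders and a word cloud of their signatures. The following modules are involved:

itchat: an open-source interface for personal WeChat accounts that supports sending and receiving messages, fetching the friend list, and more.

jieba: a Chinese word segmentation component for Python, used when building the word cloud.

matplotlib: a Python plotting library.

wordcloud: used to generate the word cloud.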

How do you download them?

How do you install them?

Where are the details?

Click the bold module names above to find out.

OK! Let's get started.

Environment: Python 3 + Windows 10

Step 1: log in to WeChat from Python and fetch information on all friends

import itchat

def my_friends():
    # Log in by scanning a QR code
    itchat.auto_login()
    # Fetch the friend information
    friends = itchat.get_friends(update=True)
    return friends

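When this function runs, a QR code appears on the computer screen; scan it with WeChat on your phone to complete the login. Meanwhile the terminal prints the following: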

    Getting uuid of QR code.
    Downloading QR code.
    Please scan the QR code to log in.
    Please press confirm on your phone.
    Loading the contact, this may take a little while.
    Login successfully as <your nickname>

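itchat's get_friends method returns information on all friends. Note that the returned friends is a list whose elements are dictionaries, and that element 0 of the list is your own account, which matters for the data processing later. That completes step 1.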
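To make the shape of that return value concrete, here is a minimal sketch that uses hand-written stand-in data instead of a live login. The names `sample_friends` and `split_self_and_friends` are hypothetical, and the dictionaries are trimmed to a few of the keys that real entries carry.

```python
# Hypothetical stand-in for the list returned by itchat.get_friends();
# real entries carry many more keys than the ones shown here.
sample_friends = [
    {"NickName": "me", "Sex": 1, "Signature": "hello"},  # index 0: the logged-in account
    {"NickName": "Alice", "Sex": 2, "Signature": "carpe diem"},
    {"NickName": "Bob", "Sex": 1, "Signature": ""},
]


def split_self_and_friends(friends):
    """Separate the account owner (element 0) from the actual friends."""
    return friends[0], friends[1:]


me, others = split_self_and_friends(sample_friends)
print(me["NickName"])                    # the account owner
print([f["NickName"] for f in others])   # everyone else
```

Slicing with `friends[1:]` is the same trick the later functions rely on to leave yourself out of the statistics.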

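Step 2: extract the data

After step 1, all the WeChat friend data sits in the friends list. Next we walk the list and pull out the fields we need.

1. Friend gender statistics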

def my_friends_sex(friends):
    # Dictionary that stores the gender counts
    friends_sex = dict()
    # Keys of the gender dictionary
    male = "男性"      # male
    female = "女性"    # female
    other = "其余"     # other

    # Walk every friend; friends[0] is yourself, so start from the second element
    for i in friends[1:]:
        sex = i["Sex"]
        if sex == 1:
            # Dictionary operation: look up the key (default 0) and add 1 to its value
            friends_sex[male] = friends_sex.get(male, 0) + 1
        elif sex == 2:
            friends_sex[female] = friends_sex.get(female, 0) + 1
        elif sex == 0:
            friends_sex[other] = friends_sex.get(other, 0) + 1
    # Uncomment to print the gender dictionary
    # print(friends_sex)
    # Total number of friends, again excluding yourself
    total = len(friends[1:])
    # get(k, 0) avoids a KeyError when a category has no friends;
    # the "if total" guard avoids dividing by zero on an empty friend list
    proportion = [friends_sex.get(k, 0) / total * 100 if total else 0.0
                  for k in (male, female, other)]
    print(
        "男性好友：%.2f%% " % proportion[0] + '\n' +
        "女性好友：%.2f%% " % proportion[1] + '\n' +
        "其余：%.2f%% " % proportion[2]
    )
    return friends_sex

Well, the comments are detailed enough, mainly because I am afraid I will forget all this myself in a couple of days...

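While walking the friends list, this function reads the key Sex from each element, because that is the key that holds the friend's gender. A few other commonly used keys:

       'NickName'     nickname
       'RemarkName'   remark name
       'Signature'    signature
       'Province'     province
       'City'         city
       'Sex'          gender: 1 male, 2 female, 0 other

The returned friends_sex is a dictionary with at most three keys, the strings held in the variables male, female, and other. Since the goal is to plot a chart of friends' genders, what we need is the head count for each gender.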
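The same counting can be sketched with `collections.Counter`, which starts every category at zero and so sidesteps missing keys. This is an alternative sketch, not the code used in this post: `SEX_LABELS`, `count_by_sex`, and `percentages` are hypothetical names, and English labels stand in for the Chinese dictionary keys.

```python
from collections import Counter

# Hypothetical mapping from the numeric Sex codes to readable labels.
SEX_LABELS = {1: "male", 2: "female", 0: "other"}


def count_by_sex(friends):
    """Count friends per gender, skipping element 0 (the account owner)."""
    counts = Counter({label: 0 for label in SEX_LABELS.values()})
    for friend in friends[1:]:
        # Unknown or missing codes fall back to "other".
        counts[SEX_LABELS.get(friend.get("Sex"), "other")] += 1
    return counts


def percentages(counts):
    """Convert counts to percentages; an empty friend list yields all zeros."""
    total = sum(counts.values())
    return {k: (v / total * 100 if total else 0.0) for k, v in counts.items()}


sample = [{"Sex": 1}, {"Sex": 1}, {"Sex": 2}, {"Sex": 2}, {"Sex": 0}]
counts = count_by_sex(sample)
print(dict(counts))
print(percentages(counts))
```

Because every label is present from the start, the result always has three entries, even when one gender has no friends at all.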

2. Fetching friends' signatures

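import io
import re

import jieba

def my_friends_style(friends):
    # List that collects the cleaned signatures
    style = []
    # Compiled once: matches emoji codes such as 1f388 and the leftover tag characters < > / =
    rep = re.compile(r'1f\d+\w*|[<>/=]')
    for friend in friends:
        # Each friend is a dictionary; the signature is stored under the key Signature.
        # strip() removes leading and trailing whitespace; the replace() calls drop
        # the words span, class, and emoji that emoji tags leave behind
        signature = friend['Signature'].strip().replace('span', '').replace('class', '').replace('emoji', '')
        # rep.sub works like str.replace, but on a regular-expression pattern
        signature = rep.sub('', signature)
        style.append(signature)
    # join() concatenates the elements of a sequence with the given separator;
    # here every cleaned signature is glued into one string
    text = ''.join(style)
    # Segment the text with jieba and save the result to a file
    # ('w' overwrites, so repeated runs do not duplicate the text)
    with io.open(r'F:\python_实战\itchat\微信好友个性签名词云\text.txt', 'w', encoding='utf-8') as f:
        wordlist = jieba.cut(text, cut_all=False)
        word_space_split = ' '.join(wordlist)
        f.write(word_space_split)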

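Processing the signatures is a little more involved than counting genders. Signatures tend to be, well, personal, and most contain emoji or special symbols. So after reading Signature, we use strip() to remove leading and trailing whitespace, use a regular expression to remove the special symbols, segment the result with jieba, and write it to a file for the word cloud step later.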
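The cleaning step can be tried in isolation on a few made-up signatures. The sketch below assumes that emoji arrive as empty tags of the form `<span class="emoji emoji1f388"></span>`, which is what the replace calls and the regular expression above appear to target; `EMOJI_SPAN` and `clean_signature` are hypothetical names, and removing the whole tag in one pattern is a variation on the approach above rather than the same code.

```python
import re

# Assumption: emoji arrive as empty <span> tags such as
# <span class="emoji emoji1f388"></span>; other markup is not handled.
EMOJI_SPAN = re.compile(r'<span class="emoji[^"]*"></span>')


def clean_signature(raw):
    """Drop embedded emoji spans, then strip surrounding whitespace."""
    return EMOJI_SPAN.sub("", raw).strip()


samples = [
    '  stay hungry <span class="emoji emoji1f388"></span> ',
    "no markup here",
    "",
]
cleaned = [clean_signature(s) for s in samples]
print(cleaned)
```

Matching the whole tag leaves ordinary English words such as "class" untouched, which the word-by-word replace calls would also delete.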

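cut_all=False selects jieba's precise mode. If you set it to True (full mode, which emits every word it can find, overlaps included), the word cloud gets very...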

Step 3: plotting

1. Friend gender pie chart

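import matplotlib.pyplot as plt

def drow_sex(friends_sex):
    # Labels and sizes for the pie chart
    labels = []
    sizes = []
    for key in friends_sex:
        labels.append(key)
        sizes.append(friends_sex[key])
    # Slice colors; they are reused cyclically when there are more slices than colors
    colors = ['red', 'yellow', 'blue']
    # Offset of each slice from the center: one entry per slice, first slice pulled out
    explode = [0.1 if n == 0 else 0 for n in range(len(sizes))]
    # autopct='%1.2f%%' shows percentages with two decimals; shadow=True adds a shadow for depth
    # startangle is the starting angle (default 0 degrees); 90 usually looks better
    plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.2f%%', shadow=True, startangle=90)
    # Equal scaling on both axes so the pie is drawn as a circle
    plt.axis('equal')
    # Legend that maps colors to labels
    plt.legend()
    # Title; garbled Chinese text is a pitfall here, but there is a fix for it
    plt.suptitle("微信好友性别统计图")
    # Save before show(): show() leaves a blank figure behind, so saving afterwards writes an empty image
    plt.savefig(r'F:\python_实战\itchat\好友性别饼状图.png')
    plt.show()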

It is all plain matplotlib usage, so there is not much to add.
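One detail worth a sketch: plt.pie expects explode to have exactly one entry per slice, and a dictionary is iterated in insertion order, so the slice that gets pulled out depends on which gender was encountered first. The helper below fixes the order and sizes explode to match. `ORDER` and `pie_inputs` are hypothetical names, and English labels stand in for the Chinese keys.

```python
# Hypothetical fixed display order; keys missing from the input count as 0.
ORDER = ["male", "female", "other"]


def pie_inputs(counts, order=ORDER):
    """Return labels, sizes, and explode in a fixed order, dropping empty categories."""
    labels = [k for k in order if counts.get(k, 0) > 0]
    sizes = [counts[k] for k in labels]
    # Pull out only the first slice, however many slices there are.
    explode = [0.1 if n == 0 else 0 for n in range(len(labels))]
    return labels, sizes, explode


labels, sizes, explode = pie_inputs({"female": 12, "male": 30})
print(labels, sizes, explode)
```

The three lists can then be passed to plt.pie as sizes, labels=labels, and explode=explode.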

If the Chinese title comes out garbled, add the following at the start of the program:
from pylab import *
mpl.rcParams['font.sans-serif'] = ['SimHei']

2. Word cloud of friends' signatures

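import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

def wordart():
    # Mask image as a uint8 array; assumes 猫咪.png is RGB or RGBA with a pure white background
    back_color = np.array(Image.open(r'F:\python_实战\itchat\微信好友个性签名词云\猫咪.png'))
    wc = WordCloud(background_color='white',    # background color
                   max_words=1000,
                   mask=back_color,     # words are drawn only where the mask is not pure white
                   max_font_size=100,
                   font_path="C:/Windows/Fonts/STFANGSO.ttf",  # a font with Chinese glyphs, to avoid garbled text
                   random_state=42,     # fixed seed so the layout is reproducible
            )

    # Read the segmented text
    with open(r"F:\python_实战\itchat\微信好友个性签名词云\text.txt", encoding='utf-8') as f:
        text = f.read()
    wc.generate(text)
    # Take each word's color from the matching region of the mask image
    image_colors = ImageColorGenerator(back_color)
    wc.recolor(color_func=image_colors)
    # Show the image with the axes hidden
    plt.imshow(wc)
    plt.axis("off")
    # Save the image, then display it
    wc.to_file(r"F:\python_实战\itchat\微信好友个性签名词云\词云.png")
    plt.show()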

Done!

Python basics, for reference:

1. Dictionary operations

Example
    b={'A':1,'B':2,'C':3,'D':4}
    b['A']
    Out[28]: 1
    b['D']
    Out[29]: 4

2. The dictionary get method

Syntax of get():
dict.get(key, default=None)
Parameters
key -- the key to look up in the dictionary.
default -- the value returned when the key is not present.
Example
info = {'Name': 'Zara', 'Age': 27}
print("Value : %s" % info.get('Age'))
print("Value : %s" % info.get('Sex', "Never"))
Output:
Value : 27
Value : Never
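The gender statistics earlier lean on exactly this default-value behavior to count without initializing keys first. A minimal sketch of the idiom, with `tally` as a hypothetical helper name:

```python
def tally(items):
    """Count occurrences with dict.get, treating missing keys as 0."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


result = tally(["a", "b", "a", "c", "a"])
print(result)
```

A key only appears in the result once it has been seen at least once, which is why later code that reads a count should also go through get with a default.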

3. Writing list contents straight to a file

with open(r'F:\python_实战\itchat\friends.txt', 'a+', encoding='utf-8') as f:
    for friend in friends:
        f.write(str(friend))
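Writing str() of each element produces text that is awkward to read back. If the data needs to be reloaded later, one option is to write one JSON object per line. This is a sketch under the assumption that every entry is a plain dict of JSON-serializable values, which may not hold for every field a real friend entry contains; `dump_friends` and `load_friends` are hypothetical helpers, and a temporary file replaces the F: drive path.

```python
import json
import os
import tempfile


def dump_friends(friends, path):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for friend in friends:
            f.write(json.dumps(friend, ensure_ascii=False) + "\n")


def load_friends(path):
    """Read the file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


sample = [{"NickName": "小明", "Sex": 1}, {"NickName": "Alice", "Sex": 2}]
path = os.path.join(tempfile.mkdtemp(), "friends.jsonl")
dump_friends(sample, path)
restored = load_friends(path)
print(restored == sample)
```

Passing encoding="utf-8" explicitly matters on Windows, where the default encoding may not be able to represent every nickname.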

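4. The strip() method

Removes the given characters from both ends of a string; by default it removes whitespace. The argument is treated as a set of characters, and strip() returns a new string instead of modifying the original:
a = "assdgheas"
print(a.strip('as'))
Output: dghe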