[Python] Running statistics on my own blog to see which year and month had the most posts

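The code is simple. It mainly uses requests for network access, BeautifulSoup for parsing the page text, and re for extracting text with regular expressions. The first two have to be installed with pip install name (for example, pip install requests beautifulsoup4), while re is part of the standard library and needs no installation. The code is below, a mere twenty-seven lines: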

#encoding=utf-8

from bs4 import BeautifulSoup
import requests
import re

user_agent='Mozilla/4.0 (compatible;MEIE 5.5;windows NT)'
headers={'User-Agent':user_agent}
dic={}  # dictionary mapping each year-month to its post count

for i in range(1,90):
    html=requests.get('='+str(i),headers=headers)
    soup=BeautifulSoup(html.text,'html.parser',from_encoding='utf-8')  # html.text is already a str, so bs4 ignores from_encoding
    for descDiv in soup.find_all(class_="postDesc2"):
        rawInfo=descDiv.text  # text of the div with class="postDesc2"
        yearMonth=re.search(r'\d{4}-\d{2}',rawInfo).group()  # match the year-month with a regex and take its value
        # store the year-month in the dictionary; if it is already there, add one to its count
        if yearMonth in dic:
            dic[yearMonth]=dic[yearMonth]+1
        else:
            dic[yearMonth]=1

ranked=sorted(dic.items(),key=lambda x:x[1])  # sort the dictionary items by count into a list
ranked.reverse()  # largest count first

for item in ranked:
    print(item)

The result was as follows:

('2017-09', 80) 
('2019-10', 66) 
('2018-04', 56) 
('2018-05', 45) 
('2013-09', 43) 
('2019-09', 42) 
('2017-08', 38) 
('2019-03', 37)
('2013-08', 32)
('2017-11', 32)
('2014-07', 26)
('2014-12', 22)
('2017-06', 21)
('2017-12', 21)
('2017-01', 20)
('2018-03', 19)
('2019-08', 18)
('2016-07', 17)
('2013-11', 15)
('2014-08', 15)
('2016-03', 15)
('2013-10', 14)
('2014-04', 14)
('2014-05', 14)
('2015-01', 14)
('2019-11', 13)
('2014-11', 12)
('2016-08', 12)
('2015-07', 10)
('2016-02', 9)
('2017-07', 9)
('2014-01', 8)
('2014-10', 7)
('2015-08', 7)
('2018-01', 7)
('2015-04', 6)
('2014-02', 5)
('2015-06', 5)
('2017-10', 5)
('2013-12', 4)
('2015-02', 4)
('2015-05', 4)
('2014-03', 3)
('2017-02', 3)
('2014-09', 2)
('2015-12', 2)
('2017-03', 2)
('2018-06', 2)
('2018-07', 2)
('2019-05', 2)
('2014-06', 1)
('2015-11', 1)
('2016-05', 1)
('2016-06', 1)
('2016-10', 1)
('2017-04', 1)
('2017-05', 1)
('2019-04', 1)
('2019-07', 1)
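
The extract-and-count step of the script above can also be sketched offline with collections.Counter from the standard library. This is only an illustration under stated assumptions: the sample footer strings and the name count_by_month are made up here, since the real text of the postDesc2 divs is not shown in this post. The sketch also skips footers without a date, whereas the original script would raise an AttributeError when re.search returns None.

```python
import re
from collections import Counter

# Hypothetical footer strings, standing in for the text of each postDesc2 div.
sample_footers = [
    "posted @ 2019-11-03 15:26 author",
    "posted @ 2019-11-01 09:10 author",
    "posted @ 2019-10-28 21:05 author",
    "footer with no date",
]

YEAR_MONTH = re.compile(r"\d{4}-\d{2}")

def count_by_month(footers):
    """Count how many footers fall in each year-month."""
    counts = Counter()
    for text in footers:
        match = YEAR_MONTH.search(text)
        if match is None:
            # No date found: skip instead of failing on .group()
            continue
        counts[match.group()] += 1
    return counts

counts = count_by_month(sample_footers)

# most_common() yields (year-month, count) pairs, largest count first.
for item in counts.most_common():
    print(item)
```

Counter replaces the manual if/else bookkeeping, and most_common() replaces the sort-then-reverse pair, though the order of months with equal counts may differ from the listing above.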

Playing with Python now and then is quite fun; this is a skill I must not let slip.

--END-- November 3, 2019, 15:26:38

 

This is the result of running it on January 31, 2020:

C:\personal\programs\python>python 1.py
C:\Users\ufo\AppData\Local\Programs\Python\Python38\lib\site-packages\bs4\__init__.py:203: UserWarning: You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.
  warnings.warn("You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.")
('2017-09', 79)
('2020-01', 79)
('2019-11', 76)
('2019-12', 66)
('2019-10', 65)
('2018-04', 55)
('2018-05', 45)
('2019-09', 42)
('2019-03', 37)
('2017-11', 32)
('2014-12', 22)
('2017-06', 21)
('2017-12', 21)
('2017-01', 20)
('2018-03', 19)
('2017-08', 18)
('2016-07', 17)
('2019-08', 17)
('2016-03', 15)
('2015-01', 14)
('2014-11', 12)
('2016-08', 12)
('2014-08', 10)
('2015-07', 10)
('2016-02', 9)
('2017-07', 9)
('2014-10', 7)
('2015-08', 7)
('2018-01', 7)
('2015-04', 6)
('2015-06', 5)
('2017-10', 5)
('2015-02', 4)
('2015-05', 4)
('2017-02', 3)
('2014-09', 2)
('2015-12', 2)
('2017-03', 2)
('2018-06', 2)
('2018-07', 2)
('2019-05', 2)
('2015-11', 1)
('2016-05', 1)
('2016-06', 1)
('2016-10', 1)
('2017-04', 1)
('2017-05', 1)
('2019-04', 1)
('2019-07', 1)
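
The UserWarning at the top of the January 31, 2020 run appears because html.text is already a decoded str, so from_encoding has nothing left to do. The sketch below tries to reproduce that behaviour on a made-up snippet of markup; it assumes bs4 is installed, and the helper name parse_and_count is hypothetical. In the script, the corresponding choices would be passing html.text without from_encoding, or passing the raw bytes in html.content together with from_encoding.

```python
import warnings
from bs4 import BeautifulSoup

# Made-up markup standing in for one downloaded page.
markup = '<div class="postDesc2">posted @ 2019-11-03 15:26</div>'

def parse_and_count(make_soup):
    """Build a soup and count the warnings that mention from_encoding."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        soup = make_soup()
    hits = [w for w in caught if "from_encoding" in str(w.message)]
    return soup, len(hits)

# str input plus from_encoding: the argument is ignored and bs4 warns.
soup_str, n_str = parse_and_count(
    lambda: BeautifulSoup(markup, "html.parser", from_encoding="utf-8"))

# str input alone: there is nothing to ignore.
soup_plain, n_plain = parse_and_count(
    lambda: BeautifulSoup(markup, "html.parser"))

# bytes input plus from_encoding: the hint is used for decoding.
soup_bytes, n_bytes = parse_and_count(
    lambda: BeautifulSoup(markup.encode("utf-8"), "html.parser",
                          from_encoding="utf-8"))

print(n_str, n_plain, n_bytes)
```

All three calls parse the same div, so the warning does not affect the counts; it only signals that the from_encoding argument was redundant.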