[Crawler] Art homework, a crawler, and Baidu Images

Date: 2020-02-28


While I was studying probability theory, an art assignment suddenly showed up in the QQ group.

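But my drawing tablet had not arrived yet and the work was due the next day. No matter how much I rushed, I could not produce anything presentable.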

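Then I remembered a post I had once seen on Zhihu (https://www.zhihu.com/question/27621722). I had recently learned some web scraping, and my school is tolerant and open-minded, so an idea took shape: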

　　Stitch related pictures together to form the artwork.

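No sooner said than done. After some reading, I chose the legacy version of Baidu Images (easy to work with, and without crawler warnings or anti-scraping mechanisms). Analyzing its search URL shows the following: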

https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=%E6%AD%A6%E6%B1%89%E5%8A%A0%E6%B2%B9&pn=20&gsm=3c&ct=&ic=0&lm=-1&width=0&height=0

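For a given keyword (the part after word=, here the percent-encoded form of "武汉加油", "Wuhan, stay strong"), Baidu collects the related images. The pn parameter that follows is the result offset: the legacy Baidu Images shows 20 images per page, so pn=20 amounts to turning one page. (Honestly, I find this old design much better. The new version keeps loading more results as you scroll, which feels awkward and uncomfortable.)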
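The URL structure described above can be sketched as a small helper. This is only an illustration: the function name build_search_url and the per_page parameter are hypothetical, and the remaining query parameters are simply copied from the example URL without knowing what each of them does.

```python
from urllib.parse import quote

# Fixed parts copied verbatim from the example URL in this post.
BASE = "https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8"
TAIL = "&gsm=3c&ct=&ic=0&lm=-1&width=0&height=0"

def build_search_url(keyword, page, per_page=20):
    """Build a legacy Baidu Images search URL (hypothetical helper).

    keyword  -- search term, percent-encoded as UTF-8
    page     -- zero-based page index
    per_page -- images per page; 20 on the legacy site according to this post
    """
    word = quote(keyword)   # "武汉加油" -> "%E6%AD%A6%E6%B1%89..."
    pn = page * per_page    # pn is the result offset, not the page number
    return f"{BASE}&word={word}&pn={pn}{TAIL}"

# Page index 1 reproduces the example URL shown above (pn=20).
print(build_search_url("武汉加油", 1))
```

Keeping the offset arithmetic in one place makes it harder to confuse the page number with pn.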

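Next comes getting the image URLs. Looking at how Baidu builds the page, it is easy to see that each image's address sits in an "objURL" field of the page source.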

Matching with the regular expression "objURL":"(.*?)" is enough here, and it works well.
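To show what that pattern captures, here is a minimal sketch run against a made-up fragment. The sample string and the example.com addresses are invented for illustration and are not real Baidu output. The only thing taken from the post is the pattern itself.

```python
import re

# The pattern from the post: a non-greedy capture of whatever sits
# between the quotes that follow "objURL":
OBJ_URL_PATTERN = re.compile(r'"objURL":"(.*?)"')

def extract_image_urls(page_source):
    """Return every objURL value found in the page source, in order."""
    return OBJ_URL_PATTERN.findall(page_source)

# Hypothetical fragment shaped like the JSON embedded in the result page.
sample = (
    '{"thumbURL":"https://example.com/t1.jpg",'
    '"objURL":"https://example.com/a.jpg"},'
    '{"objURL":"https://example.com/b.png"}'
)

print(extract_image_urls(sample))
```

The non-greedy (.*?) matters: a greedy (.*) would run on to the last quote on the line and merge several entries into one match.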

Code:

import os
import re

import requests

path = "picture"  # output directory
tot = 0           # number of download attempts so far, also used as the file name

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE'
}

# Create the output folder if it does not exist yet
def mkdir(path):
    os.makedirs(path, exist_ok=True)

# Save the image bytes as <tot>.png (the extension is fixed, whatever the real format is)
def save(content):
    mkdir(path)
    with open(os.path.join(path, str(tot) + ".png"), "wb") as file:
        file.write(content)

# Download one image; a failure is reported and skipped
def download(url):
    global tot
    tot = tot + 1
    try:
        html = requests.get(url, timeout=2)
        html.raise_for_status()  # do not save an HTTP error page as an image
        save(html.content)
        print(tot, "succeeded")
    except (requests.RequestException, OSError):
        print(tot, "failed")

# Fetch a result page and return its source as text
def getHtml(url):
    html = requests.get(url, headers=headers, timeout=10)
    html.encoding = "utf-8"
    return html.text

# Main function
def main():
    # pn = pages * 20, so this walks the offsets 20, 40, ..., 580
    for pages in range(1, 30):
        print("Now page", pages)
        url = "https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=%E6%AD%A6%E6%B1%89%E5%8A%A0%E6%B2%B9&pn=" + str(pages * 20) + "&gsm=3c&ct=&ic=0&lm=-1&width=0&height=0"
        html = getHtml(url)
        pat = '"objURL":"(.*?)"'
        result = re.compile(pat).findall(html)
        for i in result:
            print(i)
            download(i)

if __name__ == "__main__":
    main()
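One detail in the script above: every file is saved with a .png extension, even when the address points at a JPEG or GIF. A possible refinement, sketched under the assumption that the image address ends in a normal file name, is to take the extension from the URL path. The helper guess_extension and the KNOWN_EXTENSIONS set are hypothetical and not part of the original script.

```python
import os
from urllib.parse import urlparse

# Extensions accepted as-is; anything else falls back to the default.
KNOWN_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}

def guess_extension(url, default=".jpg"):
    """Guess a file extension from the path part of an image URL."""
    path = urlparse(url).path                # drops the query string and fragment
    ext = os.path.splitext(path)[1].lower()  # ".JPG" -> ".jpg", "" if there is none
    return ext if ext in KNOWN_EXTENSIONS else default

print(guess_extension("https://example.com/a/b.JPG?x=1"))
print(guess_extension("https://example.com/image"))
```

The result could replace the hard-coded ".png" when building the file name. It is still only a guess, since a server may return a format that differs from what the URL suggests.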


The downloaded images:

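Next comes stitching the pictures together. The software Foto-Mosaik-Edda does the job (it is simple to operate, and first-grade English is enough to use it).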

The original picture, with a few modifications (the artist is of course not me, otherwise why would I be writing a crawler):

After stitching:

The rush job was finished within an hour.