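I had some free time recently and picked a few tricky websites to practice on. Then I remembered that I had said I would tackle the trademark site, so today I went back and took another look.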
The post I reshared earlier: "Trademark Office website, I bow to you"
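Looking at it again, the request parameters are now surprisingly obvious. It seems a lot of the anti-crawler restrictions have been removed.
I then tried simulating the requests. They succeeded, and I got the values back.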
The endpoints used are:
http://sbgg.saic.gov.cn:9080/tmann/annInfoView/selectInfoidBycode.html
http://sbgg.saic.gov.cn:9080/tmann/annInfoView/imageView.html
http://sbgg.saic.gov.cn:9080/tmann/annInfoView/annSearchDG.html
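Combined, they let you query by different conditions and download the final image. One thing to note: the response is a list of image links, and the one we need is the entry at index 3.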
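As a minimal sketch of that index-3 selection, the helper below parses a response body and checks the list length before indexing. The `imaglist` key is the one used in the script in this post; the helper name `pick_image_url` and the sample body are hypothetical, since the real link format is not shown here.

```python
import json


def pick_image_url(response_text, index=3):
    """Return the image link at `index` from an imageView.html response body."""
    payload = json.loads(response_text)
    links = payload.get("imaglist") or []
    if len(links) <= index:
        raise ValueError(
            f"expected at least {index + 1} image links, got {len(links)}"
        )
    return links[index]


# Hypothetical response body, used only to exercise the helper.
sample = json.dumps({"imaglist": ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]})
print(pick_image_url(sample))  # d.jpg
```

Raising on a short list makes a changed response format fail loudly instead of surfacing as an opaque `IndexError`.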
The simple code is as follows (for learning reference only):
import json
import time

import requests

SELECT_URL = "http://sbgg.saic.gov.cn:9080/tmann/annInfoView/selectInfoidBycode.html"
IMAGE_URL = "http://sbgg.saic.gov.cn:9080/tmann/annInfoView/imageView.html"

HEADERS = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "Connection": "keep-alive",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "",  # cookie
    "Host": "sbgg.saic.gov.cn:9080",
    "Origin": "http://sbgg.saic.gov.cn:9080",
    "Referer": "http://sbgg.saic.gov.cn:9080/tmann/annInfoView/annSearch.html?annNum=",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}


def run(ann_num, page_no, ann_type_code):
    # Step 1: exchange the announcement number and type code for an info id.
    payload = {
        "annNum": ann_num,
        "annTypecode": ann_type_code,
    }
    response = requests.post(url=SELECT_URL, headers=HEADERS, data=payload, timeout=15)
    info_id = response.text
    print(info_id)

    # Step 2: request the image link list for that id and page.
    payload2 = {
        "id": info_id,
        "pageNum": page_no,
        "flag": "1",
    }
    response2 = requests.post(url=IMAGE_URL, headers=HEADERS, data=payload2, timeout=15)
    # json.loads replaces the original eval(): remote text should not be executed.
    result = json.loads(response2.text)
    image = result["imaglist"][3]  # the link we need is at index 3
    print(image)


if __name__ == "__main__":
    # The code is for learning reference only.
    with open("搜索结果1.json", "r", encoding="utf-8") as f:
        data_dict = json.load(f)
    total = data_dict["total"]  # total number of trademarks
    rows = data_dict["rows"]  # list of trademark records
    print(total)
    for row in rows:
        page_no = row["page_no"]  # page number
        tm_name = row["tm_name"]  # trademark name
        ann_type_code = row["ann_type_code"]  # request parameter
        tmname = row["tmname"]  # trademark name
        reg_name = row["reg_name"]  # company name
        ann_type = row["ann_type"]  # announcement type
        ann_num = row["ann_num"]  # announcement issue number
        reg_num = row["reg_num"]  # trademark id
        row_id = row["id"]  # request id
        rn = row["rn"]  # position
        app_date = row["ann_date"]  # application date per the original comment; the key suggests the announcement date
        regname = row["regname"]  # applicant name (unconfirmed)
        if ann_type == "商标初步审定公告":
            run(ann_num, page_no, ann_type_code)
            time.sleep(5)  # pause between records
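The script above stops at printing the image link. A minimal sketch of the remaining download step might look like the following. It assumes the link is an absolute URL whose last path segment is a usable file name, which this post does not confirm. `fetch_bytes` and `save_image` are hypothetical helpers, and the standard library's `urllib` is swapped in for `requests` so the sketch has no third-party dependency.

```python
import os
import tempfile
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url, timeout=15):
    """Download the raw bytes behind `url` with the standard library."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def save_image(url, dest_dir, fetch=fetch_bytes):
    """Save the image at `url` into `dest_dir` and return the local path.

    `fetch` is injectable so the function can be tried without network access.
    """
    # Use the last path segment as the file name; fall back to a fixed name.
    name = os.path.basename(urlparse(url).path) or "image.jpg"
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, name)
    with open(path, "wb") as f:
        f.write(fetch(url))
    return path


# Usage with a stand-in fetcher, so nothing is requested from the network.
demo_dir = tempfile.mkdtemp()
saved = save_image(
    "http://example.invalid/imgs/page_12.jpg",
    demo_dir,
    fetch=lambda url: b"fake-bytes",
)
print(os.path.basename(saved))  # page_12.jpg
```

In the real script, `save_image(image, "images")` could be called where `run` currently prints `image`, keeping the existing `time.sleep(5)` pause between records.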