Tools >> Options >> Connections >> check "Allow remote computers to connect"
In the phone's browser, open:
http://10.209.143:1234
IP: the address you looked up in step 2; replace it with your own IP
port: 8888 is the port you configured in Fiddler
Note: some browsers will report that the page cannot be opened; simply switching to another browser fixes it
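If no browser on the phone can open the page, it helps to first confirm that Fiddler's proxy port is reachable from another device on the network. Below is a minimal sketch using the requests library; the address 192.168.1.100 and port 8888 are placeholders for your own PC's IP and Fiddler port:

import requests

# placeholder values: replace with your PC's LAN IP and the Fiddler port
proxies = {
    'http': 'http://192.168.1.100:8888',
    'https': 'http://192.168.1.100:8888',
}

# if the proxy is reachable, this request also shows up in Fiddler's session list
resp = requests.get('http://httpbin.org/ip', proxies=proxies, timeout=5)
print(resp.status_code, resp.text)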
After the page opens, tap the link at the end (where the cursor is) to install the certificate.
Some phones let you tap Install directly.
Other phones require: Settings >> Wi-Fi (or WLAN) >> Advanced settings >> Install certificates >>
select the certificate file you just downloaded, FiddlerRoot.cer >> OK
Settings >> More settings >> System security >> Install from storage
Name the certificate: enter whatever name you like, e.g. fiddler, tap OK, and "Certificate installed" is displayed.
After installation, go to Settings >> More settings >> System security >> Trusted credentials >>
there are two tabs, System and User >> under User you can see DO_NOT_TRUST_FiddlerRoot.
PS: without the certificate installed, capturing HTTP traffic works fine, but HTTPS traffic cannot be captured.
Note:
1. Most apps can be captured directly.
2. A small number of apps cannot be captured this way; you need Wireshark, decompilation, unpacking, and similar techniques to find the encryption algorithm.
3. When capturing an app, you are usually after the JSON data packets returned by the server.
Open the 豆果美食 app on the phone with Fiddler running, browse the pages whose data you want to scrape, and then analyze the captured network requests in Fiddler.
Since app data is usually in JSON format, pay close attention to the format of each request.
The request we need turns up quickly; next we use Scrapy to simulate the request and parse the data.
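Before porting the request into Scrapy, it can be worth replaying it once outside the app. Below is a sketch using the requests library; the URL and form fields are copied from the capture (they also appear in the spider further down), and the _session value comes from one particular capture, so it may have expired by the time you try it:

import json

import requests

url = 'http://api.douguo.net/recipe/v2/search/0/20'
data = {
    'client': '4',
    '_session': '1542354711458863254010224946',  # taken from the capture; may expire
    'keyword': '家常菜',
    'order': '0',
    '_vs': '400',
}

# replay the captured POST and peek at the first recipe entry
resp = requests.post(url, data=data)
result = json.loads(resp.text)
recipes = (result.get('result') or {}).get('list') or []
print(len(recipes), recipes[:1])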
items.py:

import scrapy


class DouguoItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    auth = scrapy.Field()
    cook_name = scrapy.Field()
    cook_time = scrapy.Field()
    cook_difficulty = scrapy.Field()
    cook_story = scrapy.Field()
    image_url = scrapy.Field()
settings.py:

# Obey robots.txt rules
ROBOTSTXT_OBEY = False
douguo_jiachang.py (the spider):

# -*- coding: utf-8 -*-
import scrapy
import json

from ..items import DouguoItem


class DouguoJiachangSpider(scrapy.Spider):
    name = 'douguo_jiachang'
    # allowed_domains = ['baidu.com']
    # start_urls = ['http://api.douguo.net/recipe/v2/search/0/20']
    page = 0

    def start_requests(self):
        base_url = 'http://api.douguo.net/recipe/v2/search/{}/20'
        url = base_url.format(self.page)
        data = {
            'client': '4',
            '_session': '1542354711458863254010224946',
            'keyword': '家常菜',  # search keyword (home-style dishes)
            'order': '0',
            '_vs': '400'
        }
        self.page += 20
        yield scrapy.FormRequest(url=url, formdata=data, callback=self.parse)

    def parse(self, response):
        data = json.loads(response.body.decode())  # parse the JSON response into a dict
        recipe_list = data.get('result').get('list')
        for i in recipe_list:
            douguo_item = DouguoItem()
            douguo_item['auth'] = i.get('r').get('an')
            douguo_item['cook_name'] = i.get('r').get('n')
            douguo_item['cook_time'] = i.get('r').get('cook_time')
            douguo_item['cook_difficulty'] = i.get('r').get('cook_difficulty')
            douguo_item['cook_story'] = i.get('r').get('cookstory')
            douguo_item['image_url'] = i.get('r').get('p')
            yield douguo_item
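An aside on pagination: start_requests runs only once, so the spider above fetches a single page even though self.page is incremented. One way to keep paging is to issue the next request from parse. The following is a sketch under two assumptions not confirmed by the capture: only the offset in the URL changes between pages, and an empty result list marks the end.

# -*- coding: utf-8 -*-
import scrapy
import json

from ..items import DouguoItem


class DouguoJiachangSpider(scrapy.Spider):
    name = 'douguo_jiachang'
    base_url = 'http://api.douguo.net/recipe/v2/search/{}/20'
    form_data = {
        'client': '4',
        '_session': '1542354711458863254010224946',
        'keyword': '家常菜',
        'order': '0',
        '_vs': '400'
    }

    def start_requests(self):
        # start at offset 0 and remember the offset on the request
        yield scrapy.FormRequest(url=self.base_url.format(0), formdata=self.form_data,
                                 callback=self.parse, meta={'offset': 0})

    def parse(self, response):
        data = json.loads(response.body.decode())
        recipes = (data.get('result') or {}).get('list') or []
        if not recipes:  # assumption: an empty list means the last page was reached
            return
        for entry in recipes:
            r = entry.get('r') or {}
            douguo_item = DouguoItem()
            douguo_item['auth'] = r.get('an')
            douguo_item['cook_name'] = r.get('n')
            douguo_item['cook_time'] = r.get('cook_time')
            douguo_item['cook_difficulty'] = r.get('cook_difficulty')
            douguo_item['cook_story'] = r.get('cookstory')
            douguo_item['image_url'] = r.get('p')
            yield douguo_item
        # request the next page of 20 results
        offset = response.meta['offset'] + 20
        yield scrapy.FormRequest(url=self.base_url.format(offset), formdata=self.form_data,
                                 callback=self.parse, meta={'offset': offset})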
Result:
Next we add image downloading on top of the previous code.
settings.py:

# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'douguo.pipelines.DouguoPipeline': 229,  # lower numbers run earlier
    'douguo.pipelines.ImagePipline': 300,
}
.............
IMAGES_STORE = './images/'  # directory where downloaded images are stored
DOWNLOAD_DELAY = 1          # wait 1 second between requests
items.py:

import scrapy


class DouguoItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    auth = scrapy.Field()
    cook_name = scrapy.Field()
    cook_time = scrapy.Field()
    cook_difficulty = scrapy.Field()
    cook_story = scrapy.Field()
    image_url = scrapy.Field()
    image_path = scrapy.Field()
pipelines.py:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import os

import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem

from .settings import IMAGES_STORE


class DouguoPipeline(object):
    def process_item(self, item, spider):
        print(item)
        return item


class ImagePipline(ImagesPipeline):
    def get_media_requests(self, item, info):
        '''Generate a download Request for the item's image URL'''
        yield scrapy.Request(url=item['image_url'])

    def item_completed(self, results, item, info):
        '''Called once the image download has finished'''
        image_paths = [x['path'] for ok, x in results if ok]  # relative paths of the downloaded images
        if not image_paths:
            raise DropItem('Image Download Failed')
        fmt = '.' + item['image_url'].split('.')[-1]       # file extension taken from the URL
        old_path = IMAGES_STORE + image_paths[0]           # path Scrapy saved the image under
        new_path = IMAGES_STORE + item['cook_name'] + fmt  # new path: directory + dish name + extension
        item['image_path'] = new_path                      # store the new path on the item
        try:
            os.rename(old_path, new_path)                  # move the file to its new name
        except OSError:
            raise DropItem('Image Download Failed')
        return item
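For reference, newer Scrapy versions make the os.rename step unnecessary: you can override file_path() on ImagesPipeline so the image is stored under the dish name in the first place. A sketch, assuming Scrapy >= 2.4 (where file_path receives the item) and that cook_name is safe to use as a file name; DishNameImagePipeline is a hypothetical name:

import scrapy
from scrapy.pipelines.images import ImagesPipeline


class DishNameImagePipeline(ImagesPipeline):
    '''Hypothetical alternative: choose the final file name up front.'''

    def get_media_requests(self, item, info):
        yield scrapy.Request(url=item['image_url'])

    def file_path(self, request, response=None, info=None, *, item=None):
        ext = request.url.split('.')[-1]  # extension from the URL, as above
        # the returned path is interpreted relative to IMAGES_STORE
        return '{}.{}'.format(item['cook_name'], ext)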
Result:
ImagesPipeline:
Scrapy's ImagesPipeline class provides a convenient way to download and store images. It requires the PIL library (installed these days as its maintained fork, Pillow).
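If the default file naming (a hash of the image URL) is acceptable, the stock pipeline can even be enabled without writing any subclass. A minimal sketch; RecipeImageItem is a hypothetical item whose image_urls/images field pair is what the stock pipeline expects, and Pillow must be installed (pip install Pillow):

# settings.py
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = './images/'

# items.py
import scrapy


class RecipeImageItem(scrapy.Item):
    image_urls = scrapy.Field()  # list of image URLs for the pipeline to download
    images = scrapy.Field()      # the pipeline fills this with download results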