Excerpted from the Scrapy shell documentation
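# Check whether the URL is the one we want to inspect
def parse(self, response):
    if ".org" in response.url:
        from scrapy.shell import inspect_response
        # Debugging statement: pauses the crawl and opens a shell on this response
        inspect_response(response, self)

>>> response.url
'http://example.org'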
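The substring test `".org" in response.url` also matches URLs where ".org" appears in the path or query string. A minimal sketch of a stricter check follows, assuming you only want to trigger the shell for hosts ending in ".org". The helper name `host_has_suffix` is hypothetical and not part of Scrapy.

```python
from urllib.parse import urlparse


def host_has_suffix(url: str, suffix: str) -> bool:
    """Return True when the hostname of `url` ends with `suffix`."""
    # hostname is None for URLs without a network location
    host = urlparse(url).hostname or ""
    return host.endswith(suffix)


def parse(self, response):
    # Intended as the callback of a scrapy.Spider subclass.
    if host_has_suffix(response.url, ".org"):
        # Imported lazily, as in the snippet above
        from scrapy.shell import inspect_response
        inspect_response(response, self)
```

With this helper, `http://example.org` triggers the shell, while `http://example.net/docs.org.html` does not, even though the plain substring test would match both.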
Test the extraction code:
>>> sel.xpath('//h1[@class="fn"]')
[]
An empty list means the XPath matched nothing in this response.
Open the response in a browser to check it:
>>> view(response)
True
Finally, press Ctrl-D (Ctrl-Z on Windows) to exit the shell and resume the crawl:
>>> ^D
2014-01-23 17:50:03-0400 [myspider] DEBUG: Crawled (200) <GET http://example.net> (referer: None)
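Opening a response in the browser
from scrapy.utils.response import open_in_browser

def parse(self, response):
    # response.text is the decoded body; under Python 3, response.body is bytes,
    # so testing a str against it raises TypeError
    if "item name" not in response.text:
        open_in_browser(response)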
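Under Python 3, `response.body` is a `bytes` object, so testing a `str` against it raises `TypeError`. A minimal sketch of one way to keep the check explicit follows, assuming the page is UTF-8 encoded. The helper name `lacks_marker` is hypothetical and not part of Scrapy.

```python
def lacks_marker(body: bytes, marker: str, encoding: str = "utf-8") -> bool:
    """Return True when `marker` does not occur in the decoded body."""
    # errors="replace" keeps the check from failing on stray invalid bytes
    text = body.decode(encoding, errors="replace")
    return marker not in text


def parse(self, response):
    # Intended as the callback of a scrapy.Spider subclass.
    # Imported lazily so the helper above can be used without Scrapy.
    from scrapy.utils.response import open_in_browser

    if lacks_marker(response.body, "item name"):
        # Opens the response as the spider received it in the default browser
        open_in_browser(response)
```

For HTML responses, comparing against `response.text` achieves the same result and uses the encoding Scrapy detected for the page.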