A pain point when debugging a crawl is that, to check whether the parse function for a page three or four link-levels deep is correct, you have to wait for all the preceding links to be crawled first. Scrapy's command-line parse command exists to solve exactly this problem.
Syntax: scrapy parse <url> [options]
That is: scrapy parse, followed by a URL and optional arguments.
The example given in the official docs: $ scrapy parse http://www.example.com/ -c parse_item
When I first ran it, it never printed any log output at all, so I upgraded scrapy from 0.25 to 1.0.
Then I ran:
scrapy parse http://www.douban.com -c group_parse
which reported this error:
ERROR: Unable to find spider for: http://www.douban.com
It can also show up like this:
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/commands/parse.py", line 220, in run
    self.set_spidercls(url, opts)
  File "/Library/Python/2.7/site-packages/scrapy/commands/parse.py", line 147, in set_spidercls
    self.spidercls.start_requests = _start_requests
AttributeError: 'NoneType' object has no attribute 'start_requests'
Fine. If it can't find the spider automatically, we specify the spider's name explicitly; that name is the value of the name attribute defined in the class that inherits from Spider:
class douban(Spider):
    name = "douban_spider"
Passing that name via the --spider option makes the command work:
scrapy parse --spider=douban_spider http://www.douban.com -c group_parse
OK, problem solved.