Crawlers - Custom Scrapy Commands

Running a single spider

from scrapy.cmdline import execute

if __name__ == '__main__':
    # Equivalent to running `scrapy crawl chouti --nolog` from the shell
    execute(["scrapy", "crawl", "chouti", "--nolog"])

Then right-click and run the .py file to launch the spider named 'chouti'.

Running multiple spiders at once

The steps are as follows:

- Create a directory (any name, e.g. commands) at the same level as spiders, and make it a Python package by adding an empty __init__.py
- Inside it, create a crawlall.py file (the file name becomes the name of the custom command)
- In settings.py, add the setting COMMANDS_MODULE = '<project name>.<directory name>'
- From the project directory, run the command: scrapy crawlall
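The steps above boil down to one directory plus one settings entry. A minimal sketch, where "myproject" and "commands" are placeholder names for your own project and directory:

```python
# settings.py -- COMMANDS_MODULE must point at the package that
# holds crawlall.py. Assumed layout (names are placeholders):
#
#   myproject/
#   |-- spiders/
#   `-- commands/
#       |-- __init__.py
#       `-- crawlall.py
COMMANDS_MODULE = 'myproject.commands'
```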


The code is as follows:
from scrapy.commands import ScrapyCommand


class Command(ScrapyCommand):

    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders'

    def run(self, args, opts):
        # Schedule every spider in the project, then start them together.
        # (Older Scrapy versions exposed this as self.crawler_process.spiders)
        spider_list = self.crawler_process.spider_loader.list()
        for name in spider_list:
            self.crawler_process.crawl(name, **opts.__dict__)
        self.crawler_process.start()

crawlall.py
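Note the shape of run(): every spider is queued with crawl() before the single call to start(), so all of them share one reactor and run concurrently. Stripped of Scrapy specifics, the pattern looks like this toy sketch (SpiderRunner and the spider names here are hypothetical stand-ins, not Scrapy APIs):

```python
# Toy illustration of the crawlall pattern: queue every job first,
# then start one shared loop. All names here are hypothetical.
class SpiderRunner:
    def __init__(self):
        self.queued = []
        self.finished = []

    def crawl(self, name):
        # analogous to self.crawler_process.crawl(name, ...)
        self.queued.append(name)

    def start(self):
        # analogous to self.crawler_process.start(): runs everything queued
        while self.queued:
            self.finished.append(self.queued.pop(0))

runner = SpiderRunner()
for name in ["chouti", "cnblogs"]:  # stand-in for spider_loader.list()
    runner.crawl(name)
runner.start()
```

Calling start() inside the loop instead would run each spider to completion before the next one is scheduled, which is exactly what the real command avoids.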