SpiderKeeper: a visual management tool for Scrapy

Normally, when a finished Scrapy spider is deployed to a server, you either run it with the nohup command or with scrapyd. With nohup, if the spider dies you may not even notice; you have to log into the server to check, or set up extra email notifications. With scrapyd, deployment is a bit fiddly and the feature set is limited, but otherwise it works fine.

SpiderKeeper is a tool for managing spiders, with deployment features similar to Scrapinghub's: it can deploy spiders to multiple servers, run spiders on a schedule, view spider logs, check spider execution status, and so on.
Project address: https://github.com/DormyMo/SpiderKeeper

1. Runtime environment

  • CentOS 7
  • Python 2.7
  • Python 3.6
    Note: supervisor runs under Python 2.7, while scrapyd runs under Python 3.6, which you have to compile and install yourself (look up the detailed Python 3 installation steps if needed). A quick sanity check of both interpreters is shown below.
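A simple way to confirm that both interpreters are actually available on the machine (a sketch; the exact version strings depend on your build):

python2 --version   # should report Python 2.7.x (used by supervisor)
python3 --version   # should report Python 3.6.x (used by scrapyd and SpiderKeeper)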

2. Install dependencies

1) supervisor: pip install supervisor
2) scrapyd: pip3 install scrapyd
3) SpiderKeeper: pip3 install SpiderKeeper
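If you are unsure which interpreter each tool ended up under, a quick check such as the following can help (a sketch; paths and output will vary on your system):

which supervisord scrapyd spiderkeeper
pip show supervisor                # should come from the Python 2.7 environment
pip3 show scrapyd SpiderKeeper     # should come from the Python 3.6 environment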

3. Configure scrapyd

1) Create a configuration file for scrapyd (see the note after the block for where scrapyd looks for it):

[scrapyd]
eggs_dir    = eggs
logs_dir    = logs
items_dir   =
jobs_to_keep = 5
dbs_dir     = dbs
max_proc    = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port   = 6800
debug       = off
runner      = scrapyd.runner
application = scrapyd.app.application
launcher    = scrapyd.launcher.Launcher
webroot     = scrapyd.website.Root

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
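scrapyd looks for its configuration in a few standard locations; /etc/scrapyd/scrapyd.conf is a common choice on Linux (a sketch, assuming that path):

mkdir -p /etc/scrapyd
vim /etc/scrapyd/scrapyd.conf    # paste the [scrapyd] and [services] sections above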

4. Configure supervisor

1) Create the configuration directory and the main configuration file:

mkdir /etc/supervisor
echo_supervisord_conf > /etc/supervisor/supervisord.conf

2) Edit the configuration file with vim /etc/supervisor/supervisord.conf and find:

;[include]
;files = relative/directory/*.ini

Change it to:

[include]
files = conf.d/*.conf

3) Create the conf.d directory: mkdir /etc/supervisor/conf.d
4) Add the scrapyd configuration file: vim /etc/supervisor/conf.d/scrapyd.conf

[program:scrapyd]
command=/usr/local/python3.5/bin/scrapyd
directory=/opt/SpiderKeeper
user=root
stderr_logfile=/var/log/scrapyd.err.log
stdout_logfile=/var/log/scrapyd.out.log
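The command= path must point to wherever your Python 3 build installed the scrapyd script, which is not necessarily the path shown above. One way to find it, assuming scrapyd is already on PATH:

which scrapyd
# use the printed path as the value of command= in scrapyd.conf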

5) Add the SpiderKeeper configuration file: vim /etc/supervisor/conf.d/spiderkeeper.conf

[program:spiderkeeper]
command=spiderkeeper --server=http://localhost:6800
directory=/opt/SpiderKeeper
user=root
stderr_logfile=/var/log/spiderkeeper.err.log
stdout_logfile=/var/log/spiderkeeper.out.log
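Before handing the process to supervisor, it can be worth running the same command by hand and confirming the web UI comes up; SpiderKeeper listens on port 5000 by default, and the project README gives admin/admin as the default login (a sketch using the flags from the config above):

spiderkeeper --server=http://localhost:6800
# open http://localhost:5000 in a browser, then Ctrl+C and let supervisor manage it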

6) Start supervisor: supervisord -c /etc/supervisor/supervisord.conf
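Once supervisord is up, supervisorctl can confirm that both programs started, pointing it at the same configuration file:

supervisorctl -c /etc/supervisor/supervisord.conf status
# both scrapyd and spiderkeeper should show as RUNNING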

5. Usage

1) Log in at http://localhost:5000.
2) Create a new project.
3) Package the spider into an egg (see the sketch after this list):
pip3 install scrapyd-client
scrapyd-deploy --build-egg output.egg
4) Upload the packaged egg file.
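scrapyd-deploy has to be run from the root of the Scrapy project, i.e. the directory containing scrapy.cfg; a minimal sketch (the project path is an assumption):

cd /path/to/your/scrapy/project    # directory that contains scrapy.cfg
pip3 install scrapyd-client
scrapyd-deploy --build-egg output.egg

The resulting output.egg is the file uploaded in step 4; after the upload, the project should also appear at http://localhost:6800/listprojects.json, one of the endpoints enabled in the [services] section above.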

SpiderKeeper can manage scrapyd on multiple servers; just pass additional --server flags, as shown below.
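For multiple scrapyd hosts, repeat the --server flag once per host (the second address below is a placeholder):

spiderkeeper --server=http://localhost:6800 --server=http://192.168.0.2:6800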
