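Requirement: an existing crawler (named CNSubAllInd) has to keep running in the background (as soon as a run finishes, it is restarted immediately and continues), and its run log has to be recorded.
The approach: use Python's logging module to record the log and the subprocess module to interact with the system and run commands; once the child process is detected to have exited, start it again.
The code, keeprunning.py, is as follows (CNSubAllInd is the program that needs to keep running in the background):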
    # -*- coding: UTF-8 -*-
    #!DATE: 2018/10/9
    #!@Author: yingying
    # keeprunning.py
    import os
    import subprocess

    # logging
    # require python2.6.6 and later
    import logging
    from logging.handlers import RotatingFileHandler

    ## log settings: SHOULD BE CONFIGURED BY config
    # raw string, so the backslashes in the Windows path are not read as escapes
    LOG_PATH_FILE = r"D:\workspace\PyCharmProject\CompanyInfoSpider\my_service_mgr.log"
    LOG_MODE = 'a'
    LOG_MAX_SIZE = 10 * 1024 * 1024  # 10M per file
    LOG_MAX_FILES = 10  # 10 Files: my_service_mgr.log.1, my_service_mgr.log.2, ...
    LOG_LEVEL = logging.DEBUG

    LOG_FORMAT = "%(asctime)s %(levelname)-10s[%(filename)s:%(lineno)d(%(funcName)s)] %(message)s"

    handler = RotatingFileHandler(LOG_PATH_FILE, LOG_MODE, LOG_MAX_SIZE, LOG_MAX_FILES)
    formatter = logging.Formatter(LOG_FORMAT)
    handler.setFormatter(formatter)

    Logger = logging.getLogger()
    Logger.setLevel(LOG_LEVEL)
    Logger.addHandler(handler)

    # color output
    pid = os.getpid()


    def print_error(s):
        print('\033[31m[%d: ERROR] %s\033[31;m' % (pid, s))


    def print_info(s):
        print('\033[32m[%d: INFO] %s\033[32;m' % (pid, s))


    def print_warning(s):
        print('\033[33m[%d: WARNING] %s\033[33;m' % (pid, s))


    def start_child_proc(command, merged):
        try:
            if command is None:
                raise OSError("Invalid command")

            child = None
            if merged is True:
                # merge stdout and stderr
                child = subprocess.Popen(command)
                # child = subprocess.Popen(command,
                #     stderr=subprocess.STDOUT,  # send the child's stderr to its stdout
                #     stdout=subprocess.PIPE     # create a new pipe for stdout
                # )
            else:
                # DO NOT merge stdout and stderr
                child = subprocess.Popen(command)
                # child = subprocess.Popen(command,
                #     stderr=subprocess.PIPE,
                #     stdout=subprocess.PIPE)
            return child
        except subprocess.CalledProcessError:
            pass  # handle errors in the called executable
        except OSError:
            raise OSError("Failed to run command!")


    def run_forever(command):
        print_info("start child process with command: " + ' '.join(command))
        Logger.info("start child process with command: " + ' '.join(command))

        merged = False
        child = start_child_proc(command, merged)

        failover = 0

        while True:
            # poll() returns None while the child is still running
            while child.poll() is not None:
                failover = failover + 1
                print_warning("child process shutdown with return code: " + str(child.returncode))
                Logger.critical("child process shutdown with return code: " + str(child.returncode))

                print_warning("restart child process again, times=%d" % failover)
                Logger.info("restart child process again, times=%d" % failover)
                child = start_child_proc(command, merged)

            # wait for the child to exit and log its error output;
            # out and err are None unless Popen was given stdout/stderr pipes
            out, err = child.communicate()
            returncode = child.returncode
            if returncode != 0:
                Logger.info("execute child process failed")
                if err:
                    for errorline in err.splitlines():
                        Logger.info(errorline)

        Logger.exception("!!!should never run to this!!!")


    if __name__ == "__main__":
        run_forever(['scrapy', 'crawl', 'CNSubAllInd'])
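The restart loop above can also be sketched in a more compact form. This is only a minimal sketch under a few assumptions that are not part of keeprunning.py: it targets Python 3, the function name `keep_running`, the logger name, and the `max_restarts` and `delay` parameters are made up for illustration, and it uses `wait()` because no pipes are attached to the child.

```python
import logging
import subprocess
import sys
import time

# Hypothetical logger name, used only by this sketch.
logger = logging.getLogger("keeprunning_sketch")


def keep_running(command, max_restarts=None, delay=0.0):
    """Run command again every time it exits; return the observed return codes.

    max_restarts=None restarts forever (the behaviour of run_forever);
    a number stops after that many restarts, which is handy for testing.
    """
    returncodes = []
    restarts = 0
    while True:
        child = subprocess.Popen(command)
        # No pipes are attached, so wait() simply blocks until the child exits.
        returncode = child.wait()
        returncodes.append(returncode)
        logger.warning("child process shutdown with return code: %s", returncode)
        if max_restarts is not None and restarts >= max_restarts:
            return returncodes
        restarts += 1
        logger.info("restart child process again, times=%d", restarts)
        time.sleep(delay)  # optional pause so a crashing child does not spin


if __name__ == "__main__":
    # Stand-in child that exits at once with code 3 (not the real crawler).
    demo = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(keep_running(demo, max_restarts=1))
```

For the crawler in this post the call would presumably be `keep_running(['scrapy', 'crawl', 'CNSubAllInd'])`, which never returns.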
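Thanks go to cheungmine for the article on writing a Python service monitor with a subprocess script.
How to run it on Windows: enter the command start pythonw keeprunning.py at the command line, after which a pythonw window opens.
Note: this window cannot be closed for good, because keeprunning is running in the background; as soon as it detects that the crawler has ended, it opens a new window (that is, it starts the program again). To shut it down, you have to end pythonw.exe in Task Manager, which stops the monitoring; the crawler then ends once its current run has finished.
However, when I used the original author's method of getting the output through read (shown below), it deadlocked: every time, execution got stuck at read and never went any further.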
    while True:
        while child.poll() != None:
            failover = failover + 1
            print_warning("child process shutdown with return code: " + str(child.returncode))
            Logger.critical("child process shutdown with return code: " + str(child.returncode))

            print_warning("restart child process again, times=%d" % failover)
            Logger.info("restart child process again, times=%d" % failover)
            child = start_child_proc(command, merged)

        # deadlock!!!
        ch = child.stdout.read(1)
        if ch != '' and ch != '\n':
            line += ch
        if ch == '\n':
            print_info(line)
            line = ''
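After going through related material (the question "Python Popen().stdout.read() hang") and the official documentation, I found that this is exactly where the problem lies. According to the official documentation, the call on .stdout hangs because the pipe is empty once the last line has been read, so the read blocks waiting for data that never arrives.
To avoid this, use communicate() instead of .stdout.read(); see the official documentation for how to use communicate.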
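A minimal sketch of that replacement, assuming Python 3 and assuming the child is started with the pipe arguments that are commented out in start_child_proc. The helper name `run_and_capture` and the stand-in child command are made up for illustration and are not part of keeprunning.py.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run command, wait for it to exit, and return (returncode, output_lines)."""
    child = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,    # create a pipe for the child's stdout
        stderr=subprocess.STDOUT,  # send the child's stderr into the same pipe
    )
    # communicate() reads until end-of-file and then waits for the child,
    # so the caller never sits in a read() on an empty pipe.
    out, _ = child.communicate()
    lines = out.decode("utf-8", errors="replace").splitlines()
    return child.returncode, lines


if __name__ == "__main__":
    # Stand-in child (not the real crawler): writes to both streams, exits with 2.
    demo = [
        sys.executable,
        "-c",
        "import sys; print('hello'); sys.stderr.write('oops\\n'); sys.exit(2)",
    ]
    code, lines = run_and_capture(demo)
    print(code)
    for line in lines:
        print(line)
```

Note that communicate() only returns after the child has exited and keeps the whole output in memory, so this fits a monitor that logs the output once per run rather than line by line while the crawler is still working.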