A while ago one of our programs suddenly stopped doing any work in its child processes.
Some investigation revealed that the program was using multithreading, multiprocessing, and the logging module at the same time, which led to a deadlock in the child process.
At the moment the child process was created, the logging module in a background thread happened to be holding a lock (a threading.RLock) while writing a log record. On Unix/Linux, Python creates child processes via fork, so the child also received a copy of logging's lock. When the child later needed to write a log record, it found the lock permanently held and deadlocked: the copied lock can never be released, because its owner is a thread in the parent process, and that thread releasing its own lock has no effect on the copy inside the child.
The following code reproduces the problem:
import os
import sys
import threading
import time


class ThreadWorker(threading.Thread):
    def __init__(self):
        print('ThreadWorker: init')
        super().__init__()

    def run(self):
        print('ThreadWorker: running (rlock = {0})'.format(global_rlock))
        global_rlock.acquire()
        print('ThreadWorker: i got lock {0}'.format(global_rlock))
        time.sleep(5)
        global_rlock.release()
        print('ThreadWorker: release lock {0} and '
              'sleeping forever'.format(global_rlock))
        time.sleep(600000)


global_rlock = threading.RLock()
worker = ThreadWorker()
worker.start()
time.sleep(1)

print('forking')
pid = os.fork()
if pid != 0:  # pid != 0: we are in the parent process
    print('parent: running (rlock = {0})'.format(global_rlock))
else:  # pid == 0: we are in the child process
    print('child: running (rlock = {0}), '
          'getting the lock...'.format(global_rlock))
    global_rlock.acquire()
    print('child: got the lock {0}'.format(global_rlock))
    sys.exit(0)

time.sleep(10)
Running the code above gives the following output:
$ python fork.py
ThreadWorker: init
ThreadWorker: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10116cb40>)
ThreadWorker: i got lock <locked _thread.RLock object owner=123145307557888 count=1 at 0x10116cb40>
forking
parent: running (rlock = <locked _thread.RLock object owner=123145307557888 count=1 at 0x10116cb40>)
child: running (rlock = <locked _thread.RLock object owner=123145307557888 count=1 at 0x10116cb40>), getting the lock...
ThreadWorker: release lock <unlocked _thread.RLock object owner=0 count=0 at 0x10116cb40> and sleeping forever
The output shows that although the thread later released the lock it had acquired, the child process is stuck forever trying to acquire it.
So how can this be solved? There are at least three ways.
The first is to create the child process first and only then start the thread:
import os
import sys
import threading
import time


class ThreadWorker(threading.Thread):
    def __init__(self):
        print('ThreadWorker: init')
        super().__init__()

    def run(self):
        print('ThreadWorker: running (rlock = {0})'.format(global_rlock))
        global_rlock.acquire()
        print('ThreadWorker: i got lock {0}'.format(global_rlock))
        time.sleep(5)
        global_rlock.release()
        print('ThreadWorker: release lock {0} and '
              'sleeping forever'.format(global_rlock))
        time.sleep(600000)


global_rlock = threading.RLock()
worker = ThreadWorker()

print('forking')
pid = os.fork()
if pid != 0:  # pid != 0: we are in the parent process
    print('parent: running (rlock = {0})'.format(global_rlock))
    worker.start()
else:  # pid == 0: we are in the child process
    time.sleep(1)
    print('child: running (rlock = {0}), '
          'getting the lock...'.format(global_rlock))
    global_rlock.acquire()
    print('child: got the lock {0}'.format(global_rlock))
    global_rlock.release()
    print('child: release the lock {0}'.format(global_rlock))
    sys.exit(0)

time.sleep(10)
Result:
$ python fork2.py
ThreadWorker: init
forking
parent: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>)
ThreadWorker: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>)
ThreadWorker: i got lock <locked _thread.RLock object owner=123145307557888 count=1 at 0x10f24cb70>
child: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>), getting the lock...
child: got the lock <locked _thread.RLock object owner=140735162044416 count=1 at 0x10f24cb70>
child: release the lock <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>
ThreadWorker: release lock <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70> and sleeping forever
This time both the child process and the thread acquire the lock without any trouble.
The second is to avoid mixing threading, multiprocessing, and logging (or any other module that uses thread locks): use either threads everywhere or processes everywhere.
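For instance, if you take the all-processes route, you can let multiprocessing create the workers and, on Python 3.4+, select the 'spawn' start method so that each child begins with a fresh interpreter and inherits no lock state from the parent. The snippet below is only a minimal sketch of this idea; the worker function and log messages are invented for illustration:

import logging
import multiprocessing as mp


def worker():
    # A 'spawn'ed child starts from a clean interpreter, so the logging
    # locks it uses are created here and cannot be inherited mid-acquire.
    logging.basicConfig(level=logging.INFO)
    logging.getLogger(__name__).info('child: logging works fine')


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    mp.set_start_method('spawn')  # do not fork-copy the parent's threads and locks
    p = mp.Process(target=worker)
    p.start()
    p.join()
    logging.getLogger(__name__).info('parent: child finished')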
The third is to configure logging to use a lock-free handler, i.e. one that does not take a lock when writing log records.
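The standard library does not ship such a handler, but every handler creates its own lock in Handler.createLock(), so one way to sketch a lock-free handler is to override that method and install no lock at all. This is an illustrative sketch, safe only if a single thread ever writes through the handler, not a drop-in fix:

import logging


class LockFreeStreamHandler(logging.StreamHandler):
    """StreamHandler variant that never creates a handler-level lock."""

    def createLock(self):
        # Handler.acquire()/release() skip locking when self.lock is None,
        # so a fork()ed child cannot inherit this lock in a held state.
        self.lock = None


root = logging.getLogger()
root.addHandler(LockFreeStreamHandler())
root.setLevel(logging.INFO)
root.info('logging without a handler-level lock')

Keep in mind this only removes the per-handler lock; logging still takes an internal module-level lock in a few places (for example when loggers and handlers are created), so it narrows the window rather than eliminating every lock.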