Network Programming: Process Pools and Thread Pools

Date: 2019-11-11

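1. Process pools and thread pools

When first learning multiprocessing or multithreading, we eagerly build concurrent socket communication on top of multiple processes or threads. The fatal flaw of this approach is that the number of processes or threads the server starts grows with the number of concurrent clients. This puts enormous pressure on the server host, which may eventually be overwhelmed and crash. We therefore have to limit the number of processes or threads the server starts, so that the machine runs within a load it can bear. That is the purpose of a process pool or thread pool. A process pool, for example, is a pool that holds processes: underneath it is still multiprocessing, only with a cap on the number of processes started.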
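
The cap described above can be observed directly. The following is a minimal sketch, assuming a thread pool stands in for the server's workers; handle_client and the figure of 20 clients are hypothetical names and values chosen for illustration, not part of the original article.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time


def handle_client(client_id):
    # Hypothetical stand-in for serving one client connection.
    time.sleep(0.01)
    # Report which worker thread handled this client.
    return threading.get_ident()


# 20 simulated clients, but the pool never starts more than 3 worker threads.
with ThreadPoolExecutor(max_workers=3) as pool:
    worker_ids = set(pool.map(handle_client, range(20)))

print('distinct workers used:', len(worker_ids))
```

However many clients arrive, the set of worker identifiers never grows beyond max_workers; the remaining tasks simply wait in the pool's queue.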

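Introduction:

Official documentation: https://docs.python.org/dev/library/concurrent.futures.html

The concurrent.futures module provides a high-level interface for asynchronous calls.
ThreadPoolExecutor: thread pool, provides asynchronous calls
ProcessPoolExecutor: process pool, provides asynchronous calls
Both implement the same interface, which is defined by the abstract Executor class.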

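Basic methods:

(1) submit(fn, *args, **kwargs)
Submits a task asynchronously and returns a Future object.

(2) map(func, *iterables, timeout=None, chunksize=1)
Replaces a for loop of submit calls.

(3) shutdown(wait=True)
Equivalent to pool.close() + pool.join() on a multiprocessing Pool.
wait=True: block until every task in the pool has finished and its resources have been reclaimed.
wait=False: return immediately without waiting for the tasks in the pool to finish.
Whatever the value of wait, the program as a whole does not exit until all pending tasks have finished.
submit and map must be called before shutdown.

(4) result(timeout=None)
A method of the Future object: returns the task's result.

(5) add_done_callback(fn)
A method of the Future object: registers a callback function.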
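
The five methods listed above can be tried together in a few lines. This is a minimal sketch, assuming a thread pool (the process pool exposes the same calls); square and run_demo are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


def run_demo():
    seen_by_callback = []
    executor = ThreadPoolExecutor(max_workers=2)

    # (1) submit returns a Future immediately.
    future = executor.submit(square, 3)
    # (5) the callback receives the Future itself, not the return value.
    future.add_done_callback(lambda f: seen_by_callback.append(f.result()))
    # (2) map replaces a loop of submit calls and yields results in input order.
    mapped = list(executor.map(square, [1, 2, 3]))
    # (3) wait for all tasks to finish, then release the workers.
    executor.shutdown(wait=True)

    # Submitting after shutdown is rejected with RuntimeError.
    try:
        executor.submit(square, 4)
        rejected = False
    except RuntimeError:
        rejected = True

    # (4) result returns the value computed by the task.
    return future.result(), mapped, seen_by_callback, rejected


if __name__ == '__main__':
    print(run_demo())
```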

2. Process pool

Introduction:

(1) The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.

(2) class concurrent.futures.ProcessPoolExecutor(max_workers=None, mp_context=None)

(3) An Executor subclass that executes calls asynchronously using a pool of at most max_workers processes. If max_workers is None or not given, it will default to the number of processors on the machine. If max_workers is lower or equal to 0, then a ValueError will be raised.

Usage:

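from concurrent.futures import ProcessPoolExecutor
import os
import random
import time


def task(n):
    # Runs in a worker process; the PID shows which worker picked up the task.
    print('%s is running' % os.getpid())
    time.sleep(random.randint(1, 3))
    return n ** 2


if __name__ == '__main__':
    # At most 3 worker processes, however many tasks are submitted.
    executor = ProcessPoolExecutor(max_workers=3)
    futures = []
    for i in range(11):
        # submit() returns a Future immediately; it does not block.
        future = executor.submit(task, i)
        futures.append(future)
    # Block until every submitted task has finished, then free the workers.
    executor.shutdown(wait=True)
    print('+++>')
    for future in futures:
        print(future.result())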

3. Thread pool

Introduction:

(1) ThreadPoolExecutor is an Executor subclass that uses a pool of threads to execute calls asynchronously.

(2) class concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix='')

(3) An Executor subclass that uses a pool of at most max_workers threads to execute calls asynchronously.

(4) Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.
Note: since Python 3.8 the default has been min(32, os.cpu_count() + 4).

(5) New in version 3.6: The thread_name_prefix argument was added to allow users to control the threading.Thread names for worker threads created by the pool for easier debugging.

Usage:

Replace ProcessPoolExecutor with ThreadPoolExecutor; everything else is used in exactly the same way.
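
As a rough illustration of that swap, here is a minimal sketch of the same submit-and-collect pattern on a thread pool. It also sets thread_name_prefix so the worker threads are easy to recognise; the prefix 'pool-worker' and the helper run_in_threads are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def task(n):
    # Return the name of the worker thread along with the computed value.
    return threading.current_thread().name, n ** 2


def run_in_threads(count):
    # Leaving the with block calls shutdown(wait=True) automatically.
    with ThreadPoolExecutor(max_workers=3,
                            thread_name_prefix='pool-worker') as executor:
        futures = [executor.submit(task, i) for i in range(count)]
    return [future.result() for future in futures]


if __name__ == '__main__':
    for name, value in run_in_threads(5):
        print(name, value)
```

Because all threads live in one process, os.getpid() would print the same PID for every task here, which is why the sketch reports thread names instead.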

4. The map method

from concurrent.futures import ThreadPoolExecutor
import os
import random
import time


def task(n):
    print('%s is running' % os.getpid())
    time.sleep(random.randint(1, 3))
    return n ** 2


if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=3)
    # for i in range(1, 12):
    #     future = executor.submit(task, i)
    # map() replaces the for + submit loop above and yields results in input order.
    for result in executor.map(task, range(1, 12)):
        print(result)
    executor.shutdown()
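
To make the equivalence between map and for + submit concrete, the sketch below runs both on the same inputs. slow_square is a hypothetical task whose later inputs finish first, so it also shows that map returns results in input order rather than completion order.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_square(n):
    # Larger inputs sleep less, so tasks complete in reverse input order.
    time.sleep((3 - n) * 0.05)
    return n ** 2


with ThreadPoolExecutor(max_workers=3) as executor:
    # map: one call, results come back in the order of the inputs.
    via_map = list(executor.map(slow_square, range(3)))

    # The equivalent for + submit loop, collecting results by hand.
    futures = [executor.submit(slow_square, i) for i in range(3)]
    via_submit = [future.result() for future in futures]

    # With several iterables, map passes one item from each per call.
    powers = list(executor.map(pow, [2, 3], [3, 2]))

print(via_map, via_submit, powers)
```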

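5. Callback functions

You can attach a function to each task submitted to a process pool or thread pool. The function is triggered automatically when the task finishes and receives the task's Future object as its argument; calling result() on that object yields the task's return value. Such a function is called a callback.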

from concurrent.futures import ProcessPoolExecutor
import os

import requests


def get_page(url):
    # Runs in a worker process: download the page.
    print('<process %s> get %s' % (os.getpid(), url))
    response = requests.get(url)
    if response.status_code == 200:
        return {'url': url, 'text': response.text}
    return None  # non-200 responses produce no result


def parse_page(future):
    # The callback receives a Future object, not the return value;
    # call future.result() to get what get_page returned.
    res = future.result()
    if res is None:
        return
    # With ProcessPoolExecutor the callback runs in the parent process.
    print('<process %s> parse %s' % (os.getpid(), res['url']))
    parse_res = 'url:<%s> size:[%s]\n' % (res['url'], len(res['text']))
    with open('db.txt', 'a') as f:
        f.write(parse_res)


if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/'
    ]
    p = ProcessPoolExecutor(3)
    for url in urls:
        p.submit(get_page, url).add_done_callback(parse_page)
    p.shutdown()
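
The callback pattern can also be tried without network access. The following is a minimal sketch, assuming a thread pool and a hypothetical fetch_length task that stands in for the download. It additionally shows one way a callback might check future.exception() before calling result(), since a failed task re-raises its error from result().

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_length(text):
    # Hypothetical stand-in for a download: fails on empty input.
    if not text:
        raise ValueError('empty input')
    return len(text)


def run_with_callbacks(inputs):
    outcomes = {}

    def make_callback(text):
        def on_done(future):
            # The callback is handed the Future; inspect it for an error first.
            error = future.exception()
            if error is None:
                outcomes[text] = future.result()
            else:
                outcomes[text] = type(error).__name__
        return on_done

    # Leaving the with block waits for all tasks and their callbacks.
    with ThreadPoolExecutor(max_workers=2) as executor:
        for text in inputs:
            executor.submit(fetch_length, text).add_done_callback(
                make_callback(text))
    return outcomes


if __name__ == '__main__':
    print(run_with_callbacks(['abc', '', 'hello']))
```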