The three states of a process: running, ready, and blocked.

Synchronous: submit one task, wait from the moment it starts running until it finishes (possibly involving IO) and returns a value, and only then submit the next task.

Asynchronous: submit several tasks at once, then move straight on to the next line of code; the results are waited for later.

How are those return values collected?

Example: handing out tasks to three teachers:
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f"{os.getpid()} started")
    time.sleep(random.randint(1, 3))
    print(f"{os.getpid()} finished")
    return i

if __name__ == '__main__':
    pool = ProcessPoolExecutor(4)
    for i in range(6):
        obj = pool.submit(task, i)
        # obj is a Future object whose state changes over time: it may be
        # running, pending (ready or blocked), or finished (e.g. "finished returned int").
        # obj.result() blocks until this task completes and returns its value,
        # so no further task is submitted until the current one is done --
        # calling it inside the loop makes the whole loop synchronous.
        print(obj.result())
    pool.shutdown(wait=True)
    print("=== main")
How do we receive the return value of an asynchronous call? Not solved yet.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f"{os.getpid()} started")
    time.sleep(random.randint(1, 3))
    print(f"{os.getpid()} finished")
    return i

if __name__ == '__main__':
    pool = ProcessPoolExecutor(4)
    for i in range(6):
        pool.submit(task, i)
    pool.shutdown(wait=True)
    # shutdown: make the main process wait until every task in the pool has
    # finished before running the code below -- similar in effect to join().
    # shutdown: once called, no new tasks may be submitted to the pool.
    # A task is implemented by a function; when the task completes, its result
    # is simply the function's return value.
    print("=== main")
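Both shutdown guarantees described in the comments above can be observed directly. A minimal sketch (task and variable names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def work(i):
    time.sleep(0.1)
    return i

pool = ThreadPoolExecutor(2)
futures = [pool.submit(work, i) for i in range(4)]
pool.shutdown(wait=True)                # blocks, like join(), until all tasks finish
print(all(f.done() for f in futures))   # True: every future has completed

try:
    pool.submit(work, 99)               # new submissions are rejected after shutdown
except RuntimeError as e:
    print("rejected:", e)
```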
Approach one: collect the results of the asynchronous calls all at once.

Drawback: we cannot get the return value of any task the moment it completes; we can only collect all the results together after every task has finished.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time
import random
import os

def task(i):
    print(f"{os.getpid()} started")
    time.sleep(random.randint(1, 3))
    print(f"{os.getpid()} finished")
    return i

if __name__ == '__main__':
    pool = ProcessPoolExecutor(4)
    lst = []
    for i in range(6):
        obj = pool.submit(task, i)
        lst.append(obj)            # keep the Future; collect results later
    pool.shutdown()
    for obj in lst:
        print(obj.result())        # results come back in submission order
    print("=== main")
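The standard-library fix for this drawback is `concurrent.futures.as_completed`, which yields each future the moment it finishes rather than in submission order. A minimal sketch:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time
import random

def task(i):
    time.sleep(random.uniform(0, 0.2))  # simulate IO of varying length
    return i

with ThreadPoolExecutor(4) as pool:
    futures = [pool.submit(task, i) for i in range(6)]
    results = []
    for fut in as_completed(futures):   # yields each future as it completes
        results.append(fut.result())

print(sorted(results))  # [0, 1, 2, 3, 4, 5]; the arrival order may differ
```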
Approach two: asynchronous call + callback function.

Simulate a browser in code: follow a browser's workflow to fetch a pile of page source code,

then clean that source code to extract the data we want.

HTTP response status codes:

Common values: 200 OK, 303 redirect, 400 bad request, 401 unauthorized, 403 forbidden, 404 not found, 500 internal server error.

Code:
import requests

ret = requests.get("http://www.baidu.com")
if ret.status_code == 200:
    print(ret.text)
Main code:
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import requests

def task(url):
    """Simulate crawling several pages' source code; this is IO-bound.
    :param url: page to fetch
    :return: page text on success
    """
    ret = requests.get(url)
    if ret.status_code == 200:
        return ret.text

def parse(content):
    """Simulate analysing the data; usually involves no IO.
    :param content: page text
    :return: a computed result
    """
    return len(content)

if __name__ == '__main__':
    # Serial version: takes far too long, not worth using.
    # ret1 = task("http://www.baidu.com")
    # print(parse(ret1))
    # ret2 = task("http://www.JD.com")
    # print(parse(ret2))
    # ret3 = task("http://www.taobao.com")
    # print(parse(ret3))
    # ret4 = task("https://www.cnblogs.com/jin-xin/articles/7459977.html")
    # print(parse(ret4))

    # Thread pool version: execute the requests concurrently.
    url_list = [
        "http://www.baidu.com",
        "http://www.JD.com",
        "http://www.taobao.com",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/9811379.html",
        "https://www.cnblogs.com/jin-xin/articles/11245654.html",
        "https://www.luffycity.com/",
    ]
    pool = ThreadPoolExecutor(4)
    obj_list = []
    for url in url_list:
        obj = pool.submit(task, url)
        obj_list.append(obj)
    pool.shutdown(wait=True)
    for obj in obj_list:
        print(parse(obj.result()))   # analysis happens serially, after all fetches
    print("=== main")
Summary:

Drawbacks:

Ten tasks are issued asynchronously and executed concurrently, but all their return values are collected in one batch at the end, so no result is available in real time (inefficient).

The analysis step runs serially in the main thread, which also hurts efficiency:
for res in obj_list:
    print(parse(res.result()))
Version two improves on drawback 2 of version one: it makes the serial analysis step concurrent (or parallel).

Solution:

Main code:
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import requests

def task(url):
    """Simulate crawling several pages' source code; this is IO-bound.
    :param url: page to fetch
    :return: the parsed result for this page
    """
    ret = requests.get(url)
    if ret.status_code == 200:
        return parse(ret.text)       # each worker parses its own page

def parse(content):
    """Simulate analysing the data; usually involves no IO.
    :param content: page text
    :return: a computed result
    """
    return len(content)

if __name__ == '__main__':
    url_list = [
        "http://www.baidu.com",
        "http://www.JD.com",
        "http://www.taobao.com",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/9811379.html",
        "https://www.cnblogs.com/jin-xin/articles/11245654.html",
        "https://www.luffycity.com/",
    ]
    pool = ThreadPoolExecutor(4)
    obj_list = []
    for url in url_list:
        obj = pool.submit(task, url)
        obj_list.append(obj)
    pool.shutdown(wait=True)
    for obj in obj_list:
        print(obj.result())
    print("=== main")
Summary:

Version one vs version two:

Version one: each task only fetches a page; all parsing happens afterwards in the main thread, serially.

Version two: each task both fetches and parses its page, so the analysis work is done concurrently as well.

Remaining drawbacks:

Results are still collected in one batch after all the asynchronous calls finish; we want to collect each result the moment it is ready.

Each concurrently executed task should only deal with the IO-bound work; we should not keep bolting extra functionality (such as parsing) onto the task itself.

Asynchronous call + callback function:
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import requests

def task(url):
    """Simulate crawling several pages' source code; this is IO-bound.
    :param url: page to fetch
    :return: page text on success
    """
    ret = requests.get(url)
    if ret.status_code == 200:
        return ret.text

def parse(obj):
    """Callback: simulate analysing the data; usually involves no IO.
    :param obj: the finished Future; its result is the page text
    """
    print(len(obj.result()))

if __name__ == '__main__':
    url_list = [
        "http://www.baidu.com",
        "http://www.JD.com",
        "http://www.taobao.com",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        "https://www.cnblogs.com/jin-xin/articles/9811379.html",
        "https://www.cnblogs.com/jin-xin/articles/11245654.html",
        "https://www.luffycity.com/",
    ]
    pool = ThreadPoolExecutor(4)
    for url in url_list:
        obj = pool.submit(task, url)
        obj.add_done_callback(parse)   # parse runs as soon as this task finishes
Summary:

Are "asynchronous" and "callback" the same thing? No. An asynchronous call is a way of submitting tasks: submit them all and move on without waiting. A callback is a way of handling results: a function attached to each task that runs as soon as that task's result is ready. They are independent mechanisms that happen to combine well.
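To make the distinction concrete, here is a minimal sketch (task and variable names are illustrative): the submissions are asynchronous either way; `add_done_callback` merely attaches per-future result handling, and each callback fires as soon as its own future finishes.

```python
from concurrent.futures import ThreadPoolExecutor

def task(x):
    return x * 2

collected = []

with ThreadPoolExecutor(2) as pool:
    for i in range(4):
        fut = pool.submit(task, i)          # asynchronous submission
        fut.add_done_callback(              # per-future result handling
            lambda f: collected.append(f.result()))

print(sorted(collected))  # [0, 2, 4, 6]
```

Note that for a ThreadPoolExecutor the callback runs in a worker thread, and for a ProcessPoolExecutor it runs in the submitting process; either way the pool, not your code, decides when it fires.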