aiohttp

Date: 2019-11-12

Tags: aiohttp

Reference:

http://www.topjishu.com/10935.html

aiohttp source code analysis: how a request is processed

[Reposted from 太阳尚远's blog: http://blog.yeqianfeng.me/2016/04/01/python-yield-expression/]
Anyone who has used Python's third-party aiohttp library knows how easy it is to build a minimal web server with it. A few lines of code are enough:

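from aiohttp import web
import asyncio

# Written against the old aiohttp API that this article analyzes
# (Application(loop=...) and make_handler()).

def index(request):
    # URL handler: returns the response body as bytes
    return web.Response(body=b'<h1>Hello World!</h1>')

async def init(loop):
    app = web.Application(loop=loop)
    # Register the handler for GET /index
    app.router.add_route('GET', '/index', index)
    server = await loop.create_server(app.make_handler(), '127.0.0.1', 8000)
    return server

def main():
    loop = asyncio.get_event_loop()
    # init() needs the loop passed in
    loop.run_until_complete(init(loop))
    loop.run_forever()

if __name__ == '__main__':
    main()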

That gives us the simplest possible web server.

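Run this Python file, open a browser, and enter http://127.0.0.1:8000/index in the address bar: you will see Hello World. Magical, isn't it? At this point some readers will wonder: when a user enters http://127.0.0.1:8000/index in the browser, how does the server route the request to our URL handler index(request)? Judging from the code, we can say with confidence that it is because of this line: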

app.router.add_route('GET', '/index', index)

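Thanks to this line, the server knows that your request (method: GET, path: /index) should be handled by index(request). So what does this line actually do internally, and how does the server respond to a request? Let's turn on single-step debugging and follow the server step by step to see what happens.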

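First, let's see how the server receives a request. After setting a few breakpoints it is not hard to find that, when a request comes in, the server eventually enters the start() function of the ServerHttpProtocol class in aiohttp's server.py module: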

@asyncio.coroutine
def start(self):
   """Start processing of incoming requests.

   It reads request line, request headers and request payload, then
   calls handle_request() method. Subclass has to override
   handle_request(). start() handles various exceptions in request
   or response handling. Connection is being closed always unless
   keep_alive(True) specified.
   """
   # Read the docstring first; the rest of the code is covered below

Judging from the docstring, this function is where the server starts processing a request.
Let's continue with the code of start():

......
@asyncio.coroutine
def start(self):
    .......
    while True:
        message = None
        self._keep_alive = False
        self._request_count += 1
        self._reading_request = False

        payload = None
        try:
            # read http request method
            # ....
            # several lines omitted here...
            # ....
            yield from self.handle_request(message, payload)
            # ....

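As we can see, the last statement of this code block hands the request to handle_request(). If you look for handle_request() in the ServerHttpProtocol class at this point, you will find that it is not a coroutine function there. What is going on? Let's single-step to this line and press F7 to step into the call. It turns out that we do not enter the function in ServerHttpProtocol, but the handle_request() function of the RequestHandler class in web.py. RequestHandler inherits from ServerHttpProtocol, overrides handle_request(), and decorates it with @asyncio.coroutine. Here is its code: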

@asyncio.coroutine
def handle_request(self, message, payload):
    if self.access_log:
        now = self._loop.time()
    app = self._app
    # The Request object is only really constructed here
    request = web_reqrep.Request(
        app, message, payload,
        self.transport, self.reader, self.writer,
        secure_proxy_ssl_header=self._secure_proxy_ssl_header)
    self._meth = request.method
    self._path = request.path
    try:
        # Note that match_info is obtained from self._router.resolve(request).
        match_info = yield from self._router.resolve(request)
        # The resulting match_info must be an AbstractMatchInfo instance
        assert isinstance(match_info, AbstractMatchInfo), match_info
        resp = None
        request._match_info = match_info
        ......
        if resp is None:
            handler = match_info.handler # could this handler be the function that finally processes our request?
            for factory in reversed(self._middlewares):
                handler = yield from factory(app, handler)
            # The key point: this looks like waiting for the result of our URL handler
            resp = yield from handler(request)
    except:
        ......
    # The next two statements send the result back to the client. The details are fairly involved and I only skimmed them, so I will not go into them here.
    resp_msg = yield from resp.prepare(request)
    yield from resp.write_eof()
    ......
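
The override that sent the debugger into RequestHandler can be reproduced in a few lines. This is only a sketch: BaseProtocol and ToyRequestHandler are made-up stand-ins for ServerHttpProtocol and RequestHandler, and it uses async/await instead of @asyncio.coroutine and yield from, because asyncio.coroutine no longer exists in recent Python versions.

```python
import asyncio


class BaseProtocol:
    """Hypothetical stand-in for ServerHttpProtocol."""

    def handle_request(self, message, payload):
        # Plain placeholder in the base class; subclasses are expected to override it.
        raise NotImplementedError

    async def start(self, message, payload):
        # start() calls self.handle_request, so the lookup goes through the
        # instance's class first and finds the subclass override.
        return await self.handle_request(message, payload)


class ToyRequestHandler(BaseProtocol):
    """Hypothetical stand-in for RequestHandler."""

    async def handle_request(self, message, payload):
        return "handled {}".format(message)


result = asyncio.run(ToyRequestHandler().start("GET /index", None))
print(result)  # prints "handled GET /index"
```

start() is written once in the base class, yet the method that actually runs is the one from the subclass, which is exactly what stepping in with F7 shows.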

From the comments in the handle_request() code above, we have identified a few key questions:

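What exactly is match_info, how is it obtained, and what does it contain?
What is handler, and how is it obtained?
handler(request) looks very much like the function that finally processes our request. How exactly does it execute?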

Once these three points are clear, we basically understand the whole request flow. Let's go through them one by one.

First point: where does match_info come from?

Back to the code. Let's step into the source of self._router.resolve(request):

@asyncio.coroutine
def resolve(self, request):
    path = request.raw_path
    method = request.method
    allowed_methods = set()
    # Note that this is a for loop
    for resource in self._resources:
        match_dict, allowed = yield from resource.resolve(method, path)
        if match_dict is not None:
            return match_dict
        else:
            allowed_methods |= allowed
    else:
        if allowed_methods:
            return MatchInfoError(HTTPMethodNotAllowed(method,allowed_methods))
        else:
            return MatchInfoError(HTTPNotFound())

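There is not much code here. path and method are the URL and method of the client request as wrapped in the request object (for example /index and GET). Notice the return match_dict statement inside the loop: if nothing goes wrong, match_dict is the result that gets returned. So what is match_dict? It comes from resource.resolve(method, path). Before looking inside that function, let's find out what type resource is. We cannot tell just by reading this code; all we know is that it is an element of self._resources (a list). Running to this line in the debugger shows the stored elements; as we will see below, the one that handles our request is a PlainResource. We will not go into the details yet; for now it is enough to know that it has a resolve() member function: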

@asyncio.coroutine
def resolve(self, method, path):
    allowed_methods = set()
    match_dict = self._match(path)
    if match_dict is None:
        return None, allowed_methods
    for route in self._routes:
        route_method = route.method
        allowed_methods.add(route_method)
        if route_method == method or route_method == hdrs.METH_ANY:
            # This return is the result in the normal case
            return UrlMappingMatchInfo(match_dict, route), allowed_methods
    else:
        return None, allowed_methods

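So the match_dict we talked about is actually a UrlMappingMatchInfo object. Careful readers will notice, however, that this function has a match_dict of its own, which is the return value of self._match(path). Let's see what self._match(path) does. The debugger shows that self here is a PlainResource instance, whose _match() method looks like this: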

def _match(self, path):
    # string comparison is about 10 times faster than regexp matching
    if self._path == path:
        return {}
    else:
        return None

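The code is very concise: it compares the incoming path (for example /index) with the _path attribute of the PlainResource instance, returning an empty dict if they are equal and None otherwise. Since the result is an empty dict, I figured its role in the caller would be to serve as the condition of an if statement, and that is indeed the case (the caller checks it against None). As for what PlainResource is, I will tell you now: it is an object instantiated when you add routes while initializing the server, and it lives as an attribute of app. Set it aside for the moment, but keep it in mind, because we will come back to it.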
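
One detail worth spelling out: an empty dict is falsy in Python, so the caller has to compare against None rather than rely on plain truthiness. A small sketch, where match() is a hypothetical free-function version of the _match() method above:

```python
def match(resource_path, request_path):
    # Same shape as the _match() shown above: {} on a match, None otherwise.
    if resource_path == request_path:
        return {}
    return None


hit = match("/index", "/index")
miss = match("/index", "/about")

# A bare truthiness test cannot tell a hit from a miss, because {} is falsy.
print(bool(hit), bool(miss))              # prints "False False"
# An explicit None check can.
print(hit is not None, miss is not None)  # prints "True False"
```

This is why resolve(self, method, path) tests "match_dict is None" instead of "not match_dict".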

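Now back to resolve(self, method, path) (note that there are two resolve functions; I tell them apart by their parameters). After obtaining match_dict, it checks for None. If it is None, the path of the request does not match this resource, so it returns None right away, and the outer resolve(self, request) moves on to the next resource object and tries again, and so on.
If match_dict is not None, the path of this resource object matches the path of the request, and we reach: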

for route in self._routes:
    route_method = route.method
    allowed_methods.add(route_method)
    if route_method == method or route_method == hdrs.METH_ANY:
        # This return is the result in the normal case
        return UrlMappingMatchInfo(match_dict, route), allowed_methods

Once the path matches, this loop checks the method. If a route's method equals the request's method, or the route's method is "*" (the asterisk matches every method), it returns a UrlMappingMatchInfo object built from match_dict and route. route is a ResourceRoute object, which wraps the PlainResource object, that is, the resource itself. In other words, the returned UrlMappingMatchInfo wraps the route whose PlainResource fully matches the path and method of the request. A bit messy, isn't it? Blame the author's limited skills...

Let's sort things out: where does this UrlMappingMatchInfo go? Looking back, it is returned into match_dict in resolve(self, request). Remember that this variable lives inside the for loop: once match_dict receives a value, it is checked against None. If it is None, the loop tries the next PlainResource (where PlainResource comes from is explained later, be patient); otherwise match_dict (a UrlMappingMatchInfo object) is returned directly. Returned to whom? Going back one more level, it is returned to match_info in handle_request(self, message, payload). The code of handle_request() requires match_info to be an AbstractMatchInfo, which is no contradiction, because UrlMappingMatchInfo inherits from AbstractMatchInfo.
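
To keep the two resolve levels apart, here is a stripped-down, synchronous sketch of the same lookup. ToyRoute, ToyResource and the 404/405 integers are made up for illustration; the real code is a coroutine and returns UrlMappingMatchInfo or MatchInfoError objects instead.

```python
class ToyRoute:
    def __init__(self, method, handler):
        self.method = method
        self.handler = handler


class ToyResource:
    def __init__(self, path):
        self.path = path
        self.routes = []

    def resolve(self, method, path):
        allowed = set()
        if self.path != path:
            return None, allowed          # path mismatch: nothing allowed here
        for route in self.routes:
            allowed.add(route.method)
            if route.method in (method, "*"):
                return route, allowed     # path and method both match
        return None, allowed              # path matched, method did not


def resolve(resources, method, path):
    allowed_methods = set()
    for resource in resources:
        route, allowed = resource.resolve(method, path)
        if route is not None:
            return route.handler
        allowed_methods |= allowed
    # No match at all: 405 if some resource knew the path, otherwise 404.
    return 405 if allowed_methods else 404


def index(request):
    return "Hello World!"


res = ToyResource("/index")
res.routes.append(ToyRoute("GET", index))
resources = [res]

print(resolve(resources, "GET", "/index") is index)  # prints "True"
print(resolve(resources, "POST", "/index"))          # prints "405"
print(resolve(resources, "GET", "/missing"))         # prints "404"
```

The allowed_methods set is what lets the outer loop distinguish "the path exists but not for this method" from "no such path".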

Good, the first question is settled: we know what match_info is, where it comes from, and what information it wraps.

Now let's see what handler is.

We continue with handle_request(self, message, payload):

# The returned match_info is stored on the request object for later use; ignore it for now
request._match_info = match_info
......  # the ellipsis stands for code that is not our focus
if resp is None:
    # Here we get handler; let's see what it really is
    handler = match_info.handler
    for factory in reversed(self._middlewares):
        handler = yield from factory(app, handler)
    resp = yield from handler(request)

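We are finally back at our handler. As you can see, handler is an attribute of match_info. Yet in the debugger, match_info shows no handler attribute. The reason is that the debugger window only shows non-function attributes; in Python, functions are also attributes of an object, and handler happens to be one, which is why the returned handler can be a callable. Enough digression: our goal is to find out what handler really is. To understand match_info.handler, let's look inside the AbstractMatchInfo class: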

class AbstractMatchInfo(metaclass=ABCMeta):
    ......
    @asyncio.coroutine  # pragma: no branch
    @abstractmethod
    def handler(self, request):
        """Execute matched request handler"""
    ......

Clearly handler is an abstract method whose concrete implementation must live in a subclass, so let's look at the UrlMappingMatchInfo class:

class UrlMappingMatchInfo(dict, AbstractMatchInfo):
    ......
    @property
    def handler(self):
        return self._route.handler
    ......

So handler returns self._route.handler of the UrlMappingMatchInfo. And what is _route? When in doubt, check the debugger: it turns out that _route is a ResourceRoute object.
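
Why handler is missing from the instance view can be illustrated with a property. This is a sketch under my own assumptions: ToyRoute and ToyMatchInfo are made-up stand-ins, and vars() is used here as a rough approximation of what a debugger lists for an instance.

```python
class ToyRoute:
    def __init__(self, handler):
        self._handler = handler

    @property
    def handler(self):
        return self._handler


class ToyMatchInfo(dict):
    def __init__(self, match_dict, route):
        super().__init__(match_dict)
        self._route = route

    @property
    def handler(self):
        # Delegates to the route, like UrlMappingMatchInfo.handler above.
        return self._route.handler


def index(request):
    return "Hello World!"


info = ToyMatchInfo({}, ToyRoute(index))

# The property is defined on the class, so the instance's own
# attribute dict only contains _route.
print("handler" in vars(info))       # prints "False"
print(info.handler is index)         # prints "True"
print(info.handler("fake request"))  # prints "Hello World!"
```

Reading info.handler runs the property and hands back the stored function, which is then called like any other callable.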

Careful readers will notice that even _route shows no handler, which means handler is also defined as a function in the ResourceRoute class. So we still have to look at the ResourceRoute class:

class ResourceRoute(AbstractRoute):
    """A route with resource"""
    ......
    # the rest is omitted

I searched for a long time and found no handler() there, so let's look in its parent class:

class AbstractRoute(metaclass=abc.ABCMeta):
    def __init__(self, method, handler, *,
                 expect_handler=None,
                 resource=None):
        self._method = method
        # _handler is assigned here
        self._handler = handler
        ......
    # returns self._handler
    @property
    def handler(self):
        return self._handler
    ......

Ha, there it is, found you at last. The value returned through all those layers of handler is the _handler of the AbstractRoute class, which is assigned in the AbstractRoute constructor. So when does an AbstractRoute object get instantiated?

Now we return to where it all started:

app.router.add_route('GET', '/index', index)

At this point an explanation is in order. app.router is actually a UrlDispatcher object: the Application class has a router() function decorated with @property that returns the Application's _router attribute, and _router is a UrlDispatcher object. So the add_route() above is a member function of the UrlDispatcher class. What does add_route() do? Let's step inside:

class UrlDispatcher(AbstractRouter, collections.abc.Mapping):
    ......
    def add_route(self, method, path, handler, *, name=None, expect_handler=None):
        resource = self.add_resource(path, name=name)
        return resource.add_route(method, handler,
                                  expect_handler=expect_handler)
    ......
    
    def add_resource(self, path, *, name=None):
        if not path.startswith('/'):
            raise ValueError("path should be started with /")
        if not ('{' in path or '}' in path or self.ROUTE_RE.search(path)):
            # Note that the resource object built here is a PlainResource
            resource = PlainResource(path, name=name)
            self._reg_resource(resource)
            return resource

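For convenience I pasted the next code block to analyze above as well, since both are member functions of UrlDispatcher.
As the comment says, add_resource() returns a PlainResource object. The PlainResource we mentioned so many times finally shows its origin here: when the resource object is constructed, the path passed to add_route() is wrapped inside it. Next we reach: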

return resource.add_route(method, handler,
                          expect_handler=expect_handler)

So PlainResource also has an add_route() member function (inherited from its parent class Resource). Let's press F7 again to step into PlainResource's add_route():

class Resource(AbstractResource):
    ......
    def add_route(self, method, handler, *,expect_handler=None):
        for route in self._routes:
            if route.method == method or route.method == hdrs.METH_ANY:
                raise RuntimeError("Added route will never be executed, "
                                   "method {route.method} is "
                                   "already registered".format(route=route))
        route = ResourceRoute(method, handler, self,expect_handler=expect_handler)
        self.register_route(route)
        return route
    ......

This function instantiates a ResourceRoute object, route, and passes the method and handler (the real URL handler) that we carried all the way down into the ResourceRoute constructor. Let's look at the ResourceRoute class:

class ResourceRoute(AbstractRoute):
    """A route with resource"""
    def __init__(self, method, handler, resource, *, expect_handler=None):
        super().__init__(method, handler, expect_handler=expect_handler, resource=resource)

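To our delight, ResourceRoute is a subclass of AbstractRoute, and instantiating it calls the parent constructor. So the AbstractRoute we were wondering about is instantiated at this moment, and its _handler attribute is assigned at this moment too. It corresponds to the index function in this line: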

app.router.add_route('GET', '/index', index)

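So when we add a route, the three pieces of information GET, /index and index end up wrapped in a ResourceRoute object, which after several more layers of wrapping becomes an attribute inside the app object. Calling this method several times to add other routes puts several ResourceRoute objects into app.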
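
The registration chain can be condensed into a toy model. Everything here (ToyDispatcher, ToyResource, ToyRoute) is made up to mirror the shape of the code shown earlier; it only handles plain paths and is not the library's implementation.

```python
class ToyRoute:
    def __init__(self, method, handler, resource):
        self.method = method
        self.handler = handler
        self.resource = resource


class ToyResource:
    def __init__(self, path):
        self.path = path
        self.routes = []

    def add_route(self, method, handler):
        # Refuse a route that could never be reached, as in the code above.
        for route in self.routes:
            if route.method in (method, "*"):
                raise RuntimeError(
                    "method {} is already registered".format(route.method))
        route = ToyRoute(method, handler, self)
        self.routes.append(route)
        return route


class ToyDispatcher:
    def __init__(self):
        self.resources = []

    def add_route(self, method, path, handler):
        # Step 1: wrap the path in a resource. Step 2: hang the route on it.
        resource = ToyResource(path)
        self.resources.append(resource)
        return resource.add_route(method, handler)


def index(request):
    return "Hello World!"


router = ToyDispatcher()
route = router.add_route("GET", "/index", index)

print(route.method, route.resource.path, route.handler is index)
print(len(router.resources))
```

The first print shows "GET /index True": the method, the path and the handler all end up reachable from the single route object, which is the point of the paragraph above.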

Good, the handler question is settled too: handler does indeed point to our final URL handler.

With that, let's go back to handle_request():

@asyncio.coroutine
def handle_request(self, message, payload):
    ......
    handler = match_info.handler
    for factory in reversed(self._middlewares):
        handler = yield from factory(app, handler)
    resp = yield from handler(request)
    .......

See? Having found the handler that matches the request, we can call it with confidence.

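Some readers may still wonder what the for loop in the middle is for, so here is a brief explanation. It involves another parameter assigned when the app is initialized, middlewares, like this: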

app = web.Application(loop=loop, middlewares=[
        data_factory, response_factory, logger_factory])

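middlewares is an interceptor mechanism: before and after a request is handled, it can first pass through interceptor functions, for example to log every request in one place. It works on the same principle as Python decorators (readers unfamiliar with decorators can look them up). middlewares takes a list whose elements are the interceptor functions you wrote, and the for loop wraps the URL handler with each interceptor in reverse order, finally yielding a function wrapped by all of them. This lets you do some extra processing before your URL handler is finally called.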
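
The effect of wrapping in reverse order is easiest to see by recording the call order. This is a sketch with made-up names (make_factory, dispatch, and the "logger" and "auth" labels); it keeps the factory(app, handler) shape from handle_request() but uses async/await and passes None in place of a real app.

```python
import asyncio

calls = []


def make_factory(name):
    # Builds a middleware factory with the factory(app, handler) shape.
    async def factory(app, handler):
        async def middleware(request):
            calls.append(name + " before")
            response = await handler(request)
            calls.append(name + " after")
            return response
        return middleware
    return factory


async def index(request):
    calls.append("index")
    return "Hello World!"


async def dispatch(middlewares, handler, request):
    # Same loop shape as in handle_request(): wrapping in reverse order
    # leaves the first factory in the list as the outermost layer.
    for factory in reversed(middlewares):
        handler = await factory(None, handler)
    return await handler(request)


middlewares = [make_factory("logger"), make_factory("auth")]
response = asyncio.run(dispatch(middlewares, index, "fake request"))

print(response)
print(calls)
```

The recorded order is logger before, auth before, index, auth after, logger after, so the middlewares run in list order on the way in and in reverse on the way out.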

Finally done. My knowledge is limited, so if anything here is off, please leave a comment and point it out. Let's improve together ^_^