Tornado 1.0 Source Code Analysis: Web Framework

#Web Framework

Author: MetalBug
Date: 2015-03-02
Source: http://my.oschina.net/u/247728/blog
Notice: All rights reserved; infringement will be pursued.
  • tornado.web: RequestHandler and Application classes

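A Tornado web application maps URLs or URL patterns to subclasses of RequestHandler. These subclasses define get(), post(), and similar functions to handle the different HTTP requests.

Here is an example: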

import tornado.httpserver
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("You requested the main page")

# Map the root URL "/" to MainHandler.
application = tornado.web.Application([(r"/", MainHandler),])
http_server = tornado.httpserver.HTTPServer(application)
http_server.listen(8080)
tornado.ioloop.IOLoop.instance().start()

MainHandler inherits from RequestHandler and overrides get(). The Application maps it to the URL /, so when we send a GET request to / on the host, the string "You requested the main page" is returned.

##1. Application##

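Application holds the URLs and their corresponding handlers (subclasses of RequestHandler). It defines __call__ internally, so it can be passed to HTTPServer as the request_callback; when a client accesses a URL, the corresponding handler is invoked.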

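###Internals: Data Structures###

self.transforms is used to chunk and compress the output.

self.handlers is the list of routes per host name; each element is (host, URLSpec objects).

self.named_handlers is a dictionary mapping a name to its handler, used for reverse lookup in reverse_url.

self.settings holds the settings, such as static_path and static_url_prefix.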
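
To make the role of a name-to-spec dictionary concrete, here is a minimal sketch of reverse URL lookup. The `Spec` class and the `reverse_url` helper are simplified stand-ins written for this illustration (in Python 3), not Tornado's actual URLSpec code, and the handler names are placeholders.

```python
import re


class Spec(object):
    """Hypothetical, simplified stand-in for a URL spec."""

    def __init__(self, pattern, handler, name=None):
        self.regex = re.compile(pattern)
        self.handler = handler
        self.name = name
        # Template for reverse lookup: each "(...)" group becomes "%s".
        # Assumes simple patterns without nested or escaped parentheses.
        self.template = re.sub(r"\([^)]*\)", "%s", pattern.rstrip("$"))

    def reverse(self, *args):
        return self.template % tuple(str(a) for a in args)


# Only specs that carry a name are registered for reverse lookup.
named_handlers = {}
for spec in [Spec(r"/post/([0-9]+)$", "PostHandler", name="post"),
             Spec(r"/$", "MainHandler", name="home")]:
    if spec.name:
        named_handlers[spec.name] = spec


def reverse_url(name, *args):
    """Look the spec up by name and fill its groups with args."""
    return named_handlers[name].reverse(*args)


print(reverse_url("post", 42))
print(reverse_url("home"))
```

The dictionary lets a template or handler build a URL from a route name instead of hard-coding the path.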

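###Internals: Main Functions###

Application.__init__()

Initializes the Application. It mainly does the following:

1. Initializes self.transforms. By default this is ChunkedTransferEncoding, preceded by GZipContentEncoding when the gzip setting is enabled.
2. Initializes self.handlers: the static file routes are set up first, then the supplied routing rules are added.
3. If debug mode is enabled (and the application is not running under WSGI), starts autoreload.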

def __init__(self, handlers=None, default_host="", transforms=None,
             wsgi=False, **settings):
    if transforms is None:
        self.transforms = []
        if settings.get("gzip"):
            self.transforms.append(GZipContentEncoding)
        self.transforms.append(ChunkedTransferEncoding)
    else:
        self.transforms = transforms
    ######
    if self.settings.get("static_path"):
        path = self.settings["static_path"]
        handlers = list(handlers or [])
        static_url_prefix = settings.get("static_url_prefix",
                                         "/static/")
        handlers = [
            (re.escape(static_url_prefix) + r"(.*)", StaticFileHandler,
             dict(path=path)),
            (r"/(favicon\.ico)", StaticFileHandler, dict(path=path)),
            (r"/(robots\.txt)", StaticFileHandler, dict(path=path)),
        ] + handlers
    if handlers: self.add_handlers(".*$", handlers)
    ####
    if self.settings.get("debug") and not wsgi:
        import autoreload
        autoreload.start()

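Application.add_handlers()

Application.add_handlers() adds routing rules to self.handlers.

self.handlers is the list of routes per host name; each element is a tuple containing the host name pattern and a list of routes (URLSpec).

Application.add_handlers() first combines host_pattern (the host name) and handlers (the route list) into a tuple, then adds it to self.handlers.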

def add_handlers(self, host_pattern, host_handlers):
    ####
    if self.handlers and self.handlers[-1][0].pattern == '.*$':
        self.handlers.insert(-1, (re.compile(host_pattern), handlers))
    else:
        self.handlers.append((re.compile(host_pattern), handlers))

    for spec in host_handlers:
        if spec.name:
    ####
            self.named_handlers[spec.name] = spec
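
The ordering rule in the excerpt above (new host groups go in front of the catch-all group) can be sketched in isolation. This is a simplified Python 3 illustration using a module-level list and plain strings in place of URLSpec objects; the `handlers_for` helper and the host names are assumptions made for the example.

```python
import re

handlers = []  # list of (compiled host regex, handler list)


def add_handlers(host_pattern, host_handlers):
    """Keep the catch-all '.*$' group last so specific hosts win."""
    if not host_pattern.endswith("$"):
        host_pattern += "$"
    entry = (re.compile(host_pattern), list(host_handlers))
    if handlers and handlers[-1][0].pattern == ".*$":
        # Insert just before the wildcard group.
        handlers.insert(-1, entry)
    else:
        handlers.append(entry)


def handlers_for(host):
    """Return the handler list of the first host pattern that matches."""
    for pattern, group in handlers:
        if pattern.match(host):
            return group
    return None


add_handlers(".*$", ["default"])
add_handlers(r"www\.example\.com", ["www"])
add_handlers(r"api\.example\.com", ["api"])

print([p.pattern for p, _ in handlers])
print(handlers_for("api.example.com"), handlers_for("other.org"))
```

Because lookup stops at the first matching host pattern, a wildcard placed anywhere but last would shadow every group added after it.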

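Application.__call__()

Application defines __call__() so that its instances can be called and used as the request_callback of HTTPServer. The function proceeds as follows:

1. Instantiates each class in self.transforms with the request; these transforms chunk and compress the outgoing data.
2. Looks up the route list by the request's host, then matches request.path against each entry in turn to find the handler class, parsing the parameters in the path at the same time (match.groups()).
3. The matched handler is a RequestHandler object; its _execute() method is called, which invokes the function corresponding to the HTTP method.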

def __call__(self, request):
    transforms = [t(request) for t in self.transforms]
    ####
    handlers = self._get_host_handlers(request)
    ####
    for spec in handlers:
        match = spec.regex.match(request.path)
        if match:
            handler = spec.handler_class(self, request, **spec.kwargs)
            kwargs=dict((k, unquote(v)) for (k, v) in match.groupdict().iteritems())
            args=[unquote(s) for s in match.groups()]
            break
    if not handler:
        handler = ErrorHandler(self, request, 404)
    ####
    handler._execute(transforms, *args, **kwargs)
    return handler
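
The matching step can be tried out on its own. The following Python 3 sketch mirrors the loop above with a hypothetical `resolve` function and placeholder handler names; it is not the Tornado implementation, only an illustration of how regex groups turn into handler arguments.

```python
import re
from urllib.parse import unquote

# (compiled pattern, handler name) pairs; the names are illustrative only.
routes = [
    (re.compile(r"/user/(?P<name>[^/]+)$"), "UserHandler"),
    (re.compile(r"/post/([0-9]+)$"), "PostHandler"),
]


def resolve(path):
    """Return (handler, args, kwargs) for the first matching route."""
    for regex, handler in routes:
        match = regex.match(path)
        if match:
            # Named groups become keyword arguments.
            kwargs = dict((k, unquote(v))
                          for k, v in match.groupdict().items())
            # match.groups() contains every group, named ones included.
            args = [unquote(s) for s in match.groups()]
            return handler, args, kwargs
    # No route matched: fall back to a 404 handler.
    return "ErrorHandler(404)", [], {}


print(resolve("/post/7"))
print(resolve("/user/john%20doe"))
print(resolve("/missing"))
```

Note that with this approach a named group shows up in both args and kwargs, since match.groups() does not distinguish named from unnamed groups.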

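###Internals: Implementation Details###

  1. When Application is initialized, it calls add_handlers(".*$", handlers).

Here .* serves as the default host name. Because .* matches any string, the route list passed to the constructor becomes the default route list.

  2. Because .* matches any string, Application.add_handlers() must ensure that it stays at the end of the list.

  3. Why does Application define __call__()? The description of __call__() is quoted below. It is similar to a functor in C++ and is mainly used where internal state needs to be kept.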

__call__(self, [args...]) Allows an instance of a class to be called as a function. Essentially, this means that x() is the same as x.__call__(). Note that __call__ takes a variable number of arguments; this means that you define __call__ as you would any other function, taking however many arguments you'd like it to. __call__ can be particularly useful in classes whose instances that need to often change state.

For the current Application, however, __call__() plays no special role here; an ordinary method such as self.callback would work just as well.
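
A small sketch of that equivalence, in Python 3: the `Server`, `CallableApp`, and `MethodApp` classes are hypothetical stand-ins, not Tornado classes. The server only needs something it can call with a request, so a callable instance and a bound method are interchangeable.

```python
class Server(object):
    """Hypothetical stand-in for a server that stores a request callback."""

    def __init__(self, request_callback):
        self.request_callback = request_callback

    def handle(self, request):
        return self.request_callback(request)


class CallableApp(object):
    # The instance itself is callable.
    def __call__(self, request):
        return "handled %s" % request


class MethodApp(object):
    # An ordinary method; the bound method is passed as the callback.
    def callback(self, request):
        return "handled %s" % request


print(Server(CallableApp()).handle("/"))
print(Server(MethodApp().callback).handle("/"))
```

In both cases the callback still carries the application's state, because a bound method keeps a reference to its instance.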

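##2. RequestHandler##

In Application.__call__(), RequestHandler exposes _execute() to Application; this function implements the actual dispatching and handling of the HTTP request. In practice, we subclass RequestHandler and override get(), post(), and so on to handle HTTP requests.

###Internals: Data Structures###

self.request is the request (HTTPRequest) that the RequestHandler has to process. self._auto_finish is used for the asynchronous case: when it is False, _execute() does not call finish() automatically.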

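###Internals: Main Functions###

RequestHandler._execute()

RequestHandler._execute() calls the function corresponding to the HTTP request method. The main flow is:

1. If the request is a POST and the xsrf_cookies setting is enabled, the XSRF cookie is checked first.
2. self.prepare() is called; subclasses override it to do preparation before the request is handled.
3. The handler function corresponding to the HTTP method is called.
4. If self._auto_finish is True, self.finish() is called to end the request.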

def _execute(self, transforms, *args, **kwargs):
    self._transforms = transforms
    try:
        if self.request.method not in self.SUPPORTED_METHODS:
            raise HTTPError(405)
        if self.request.method == "POST" and \
           self.application.settings.get("xsrf_cookies"):
            self.check_xsrf_cookie()
        self.prepare()
        if not self._finished:
            getattr(self, self.request.method.lower())(*args, **kwargs)
            if self._auto_finish and not self._finished:
                self.finish()
    except Exception, e:
        self._handle_request_exception(e)
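
The dispatch pattern (look the method name up with getattr, run prepare() first, finish automatically afterwards) can be reproduced in a few lines. This is a Python 3 sketch with a hypothetical `MiniHandler` class that records what happens in a list; the XSRF check and real I/O are left out, and only GET and POST are supported.

```python
class HTTPError(Exception):
    def __init__(self, status_code):
        Exception.__init__(self, "HTTP %d" % status_code)
        self.status_code = status_code


class MiniHandler(object):
    """Simplified stand-in; not Tornado's RequestHandler."""
    SUPPORTED_METHODS = ("GET", "POST")

    def __init__(self, method):
        self.method = method
        self.log = []
        self._finished = False
        self._auto_finish = True

    def prepare(self):
        self.log.append("prepare")

    def get(self, *args, **kwargs):
        self.log.append(("get", args))

    def post(self, *args, **kwargs):
        # Default for a method the subclass did not override.
        raise HTTPError(405)

    def finish(self):
        self.log.append("finish")
        self._finished = True

    def execute(self, *args, **kwargs):
        try:
            if self.method not in self.SUPPORTED_METHODS:
                raise HTTPError(405)
            self.prepare()
            if not self._finished:
                # "GET" -> self.get, "POST" -> self.post
                getattr(self, self.method.lower())(*args, **kwargs)
                if self._auto_finish and not self._finished:
                    self.finish()
        except HTTPError as e:
            self.log.append("error %d" % e.status_code)


handler = MiniHandler("GET")
handler.execute("42")
print(handler.log)
```

Setting `_auto_finish` to False in this sketch leaves the request open after get() returns, which is the behavior an asynchronous handler relies on.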

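RequestHandler.finish()

RequestHandler.finish() does the cleanup work after the business logic has run. It mainly does the following:

1. Sets the response headers.
2. Calls self.flush() to write the buffer out through IOStream.
3. Closes the connection.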

def finish(self, chunk=None):
    if chunk is not None: self.write(chunk)
    if not self._headers_written:
        ####set_header
    if hasattr(self.request, "connection"):
        self.request.connection.stream.set_close_callback(None)
    if not self.application._wsgi:
        self.flush(include_footers=True)
        self.request.finish()
        self._log()
    self._finished = True

RequestHandler.flush()

RequestHandler.flush() first runs the buffered data through the transforms (chunking and compression), then sends it to the client.

def flush(self, include_footers=False):
    if self.application._wsgi:
        raise Exception("WSGI applications do not support flush()")
    chunk = "".join(self._write_buffer)
    self._write_buffer = []
    if not self._headers_written:
        self._headers_written = True
        for transform in self._transforms:
            self._headers, chunk = transform.transform_first_chunk(
                self._headers, chunk, include_footers)
        headers = self._generate_headers()
    else:
        for transform in self._transforms:
            chunk = transform.transform_chunk(chunk, include_footers)
        headers = ""

    if self.request.method == "HEAD":
        if headers: self.request.write(headers)
        return

    if headers or chunk:
        self.request.write(headers + chunk)
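
To show what a transform in this chain might do, here is a Python 3 sketch of chunked transfer framing as defined by HTTP/1.1 (hex length, CRLF, data, CRLF, and a zero-length chunk at the end). `ChunkedTransform` and `flush_all` are hypothetical names for this example; they follow the two-method shape used in flush() above but are not Tornado's classes.

```python
class ChunkedTransform(object):
    """Simplified stand-in for an output transform doing chunked framing."""

    def transform_first_chunk(self, headers, chunk, finishing):
        # The first chunk is also where headers can still be changed.
        headers = dict(headers)
        headers["Transfer-Encoding"] = "chunked"
        return headers, self.transform_chunk(chunk, finishing)

    def transform_chunk(self, chunk, finishing):
        framed = b""
        if chunk:
            size = ("%x" % len(chunk)).encode("ascii")
            framed = size + b"\r\n" + chunk + b"\r\n"
        if finishing:
            framed += b"0\r\n\r\n"  # zero-length chunk ends the body
        return framed


def flush_all(chunks, transforms):
    """Run every chunk through every transform; return (headers, body)."""
    headers, out = {}, []
    for i, chunk in enumerate(chunks):
        finishing = (i == len(chunks) - 1)
        for t in transforms:
            if i == 0:
                headers, chunk = t.transform_first_chunk(
                    headers, chunk, finishing)
            else:
                chunk = t.transform_chunk(chunk, finishing)
        out.append(chunk)
    return headers, b"".join(out)


headers, body = flush_all([b"Hello, ", b"world"], [ChunkedTransform()])
print(headers)
print(body)
```

Separating the first chunk from later ones matters because headers can only be adjusted before they have been written.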

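###Internals: Implementation Details###

In RequestHandler.finish(), the close callback of self.request.connection.stream (close_callback below) is set to None. Because the request has finished, clearing close_callback avoids delayed collection of the RequestHandler. If it were not cleared and the request used a keep-alive connection, then after a request ends the RequestHandler would not be collected, because close_callback would still be bound to the request's stream.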

def finish(self, chunk=None):
    ####
    if hasattr(self.request, "connection"):
        # Now that the request is finished, clear the callback we
        # set on the IOStream (which would otherwise prevent the
        # garbage collection of the RequestHandler when there
        # are keepalive connections)
        self.request.connection.stream.set_close_callback(None)
    if not self.application._wsgi:
        self.flush(include_footers=True)
        self.request.finish()
        self._log()
    self._finished = True

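In the code above, close_callback is set to None first and request.finish() is called afterwards. According to the earlier analysis of HTTPRequest and IOStream, since _close_callback is already None by the time request.finish() runs, it will not be called. Why is it done this way?

The point to note here is that RequestHandler.on_connection_close() and the close callback of IOStream (set via set_close_callback()) do not mean the same thing.

In RequestHandler, on_connection_close() is for the case where the client is detected to have disconnected. It is called in asynchronous handlers and can be used for error handling and similar work.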

def on_connection_close(self):
    """Called in async handlers if the client closed the connection.

    You may override this to clean up resources associated with
    long-lived connections.

    Note that the select()-based implementation of IOLoop does not detect
    closed connections and so this method will not be called until
    you try (and fail) to produce some output.  The epoll- and kqueue-
    based implementations should detect closed connections even while
    the request is idle.
    """
    pass

In IOStream, self._close_callback is called in IOStream.close(), that is, when request.finish() closes the stream.

def set_close_callback(self, callback):
    """Call the given callback when the stream is closed."""
    self._close_callback = callback
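
The garbage-collection concern can be demonstrated with a small Python 3 sketch. `Stream` and `Handler` are hypothetical stand-ins, and it is assumed here that the handler registers its own bound method as the close callback; a weak reference is used to observe whether the handler object is still alive.

```python
import gc
import weakref


class Stream(object):
    """Hypothetical stand-in for a long-lived (keep-alive) stream."""

    def __init__(self):
        self._close_callback = None

    def set_close_callback(self, callback):
        self._close_callback = callback


class Handler(object):
    def __init__(self, stream):
        # The bound method holds a strong reference back to this handler.
        stream.set_close_callback(self.on_connection_close)

    def on_connection_close(self):
        pass


stream = Stream()
handler = Handler(stream)
ref = weakref.ref(handler)

del handler
gc.collect()
alive_while_registered = ref() is not None  # stream still references it

stream.set_close_callback(None)
gc.collect()
alive_after_clearing = ref() is not None

print(alive_while_registered, alive_after_clearing)
```

As long as the stream outlives the request, which is the case on a keep-alive connection, the registered callback keeps the handler reachable; clearing the callback releases it.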

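#Summary

From the analysis of Application and RequestHandler, we can see that the Tornado 1.0 web framework handles a request as follows:

1. The web application creates and initializes a RequestHandler object for every request.
2. It calls RequestHandler.prepare(). prepare() is called regardless of the HTTP method; subclasses override it.
3. It calls the handler function for the HTTP method: get(), post(), put(), and so on. If the URL regular expression contains groups, the matched values are passed to the method as arguments.

Of course, we can also see that the design of RequestHandler in Tornado 1.0 still has shortcomings. One is the question of what close_callback means, discussed above. Another: given that prepare() can be overridden for pre-processing, why does finish() not call an on_finish() hook for custom cleanup? These areas had room for improvement; see later Tornado versions for how they were handled.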
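
As an illustration of the kind of hook suggested above, here is a Python 3 sketch in which finish() calls an overridable on_finish(). `BaseHandler`, `AuditedHandler`, and the `run` method are hypothetical and exist only to show the symmetry with prepare(); this is not how Tornado 1.0 behaves.

```python
class BaseHandler(object):
    """Simplified stand-in showing a prepare()/on_finish() hook pair."""

    def __init__(self):
        self.events = []
        self._finished = False

    def prepare(self):
        pass  # hook: runs before the HTTP method function

    def on_finish(self):
        pass  # hypothetical hook: runs after the response is finished

    def get(self):
        self.events.append("get")

    def finish(self):
        self.events.append("flush")
        self._finished = True
        self.on_finish()

    def run(self):
        self.prepare()
        self.get()
        if not self._finished:
            self.finish()


class AuditedHandler(BaseHandler):
    def prepare(self):
        self.events.append("prepare")

    def on_finish(self):
        self.events.append("on_finish")


h = AuditedHandler()
h.run()
print(h.events)
```

With such a hook, subclasses can add cleanup without overriding finish() itself and without having to remember to call the base implementation.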

PS: My own understanding of the Web side is fairly weak, so please point out anything I got wrong. Thanks.
