BigPipe

In the traditional model, the life cycle of a user request is the following:

  1. Browser sends an HTTP request to web server.
  2. Web server parses the request, pulls data from storage tier then formulates an HTML document and sends it to the client in an HTTP response.
  3. HTTP response is transferred over the Internet to browser.
  4. Browser parses the response from web server, constructs a DOM tree representation of the HTML document, and downloads CSS and JavaScript resources referenced by the document.
  5. After downloading CSS resources, browser parses them and applies them to the DOM tree.
  6. After downloading JavaScript resources, browser parses and executes them.

BigPipe is a fundamental redesign of the dynamic web page serving system. The general idea is to decompose web pages into small chunks called pagelets, and pipeline them through several execution stages inside web servers and browsers. This is similar to the pipelining performed by most modern microprocessors: multiple instructions are pipelined through different execution units of the processor to achieve the best performance. Although BigPipe is a fundamental redesign of the existing web serving process, it does not require changing existing web browsers or servers; it is implemented entirely in PHP and JavaScript. BigPipe breaks the page generation process into several stages:

  1. Request parsing: web server parses and sanity checks the HTTP request.
  2. Data fetching: web server fetches data from storage tier.
  3. Markup generation: web server generates HTML markup for the response.
  4. Network transport: the response is transferred from web server to browser.
  5. CSS downloading: browser downloads CSS required by the page.
  6. DOM tree construction and CSS styling: browser constructs DOM tree of the document, and then applies CSS rules on it.
  7. JavaScript downloading: browser downloads JavaScript resources referenced by the page.
  8. JavaScript execution: browser executes JavaScript code of the page.

The first three stages are executed by the web server, and the last four stages are executed by the browser. Each pagelet must go through all these stages sequentially, but BigPipe enables several pagelets to be executed simultaneously in different stages.

In BigPipe, the life cycle of a user request is the following: The browser sends an HTTP request to the web server. After receiving the HTTP request and performing some sanity checks on it, the web server immediately sends back an unclosed HTML document that includes an HTML <head> tag and the first part of the <body> tag. The <head> tag includes BigPipe’s JavaScript library to interpret pagelet responses to be received later. In the <body> tag, there is a template that specifies the logical structure of the page and the placeholders for the pagelets.
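
To make this first flush concrete, here is a rough sketch of what the unclosed document could look like. The file name bigpipe.js and the placeholder ids other than pagelet_composer are made up for this illustration:

    <!-- Hypothetical first flush: the document is deliberately left unclosed -->
    <html>
    <head>
      <script src="bigpipe.js"></script>  <!-- BigPipe's client-side JavaScript library -->
    </head>
    <body>
      <!-- page template: one empty placeholder per pagelet -->
      <div id="pagelet_navbar"></div>
      <div id="pagelet_composer"></div>
      <div id="pagelet_newsfeed"></div>
      <!-- no closing </body></html> yet; pagelet <script> payloads follow on the same connection -->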

After flushing the first response to the client, the web server continues to generate pagelets one by one. As soon as a pagelet is generated, its response is flushed to the client immediately in a JSON-encoded object that includes all the CSS and JavaScript resources needed for the pagelet, its HTML content, and some metadata. For example:

    <script type="text/javascript">
    big_pipe.onPageletArrive({id: "pagelet_composer", content: <HTML>, css: [..], js: [..], …})
    </script>

At the client side, upon receiving a pagelet response via the onPageletArrive call, BigPipe’s JavaScript library first downloads its CSS resources; after the CSS resources are downloaded, BigPipe displays the pagelet by setting its corresponding placeholder div’s innerHTML to the pagelet’s HTML markup. Multiple pagelets’ CSS can be downloaded at the same time, and they can be displayed out of order depending on whose CSS download finishes earlier. In BigPipe, JavaScript resources are given lower priority than CSS and page content, so BigPipe won’t start downloading JavaScript for any pagelet until all pagelets on the page have been displayed. After that, all pagelets’ JavaScript is downloaded asynchronously, and each pagelet’s JavaScript initialization code is then executed out of order depending on whose JavaScript download finishes earlier.
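
As an illustration of this client-side flow, below is a minimal sketch of a pagelet-arrival handler. It assumes the payload fields shown in the example above (id, content, css, js); the helper loadCss, the deferredJs list, and the standalone function name are hypothetical, not the actual BigPipe library API:

    // Minimal sketch of client-side pagelet handling (hypothetical, not Facebook's library)
    var deferredJs = [];  // JavaScript URLs held back until every pagelet has been displayed

    function onPageletArrive(pagelet) {
      // CSS first: several pagelets' stylesheets may download in parallel
      loadCss(pagelet.css, function () {
        // Once this pagelet's CSS is ready, reveal it by filling its placeholder div
        document.getElementById(pagelet.id).innerHTML = pagelet.content;
        // JavaScript has lower priority: remember the URLs but do not fetch them yet
        deferredJs = deferredJs.concat(pagelet.js);
      });
    }

    function loadCss(urls, onDone) {
      var remaining = urls.length;
      if (remaining === 0) { onDone(); return; }
      urls.forEach(function (url) {
        var link = document.createElement('link');
        link.rel = 'stylesheet';
        link.href = url;
        link.onload = function () {
          if (--remaining === 0) onDone();  // run the callback after the last stylesheet loads
        };
        document.head.appendChild(link);
      });
    }

The deferredJs list feeds the barrier described next: only after every pagelet on the page has been displayed does the library start requesting those scripts (a sketch of that barrier appears in section 5.6 below).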

It is worth noting that BigPipe was inspired by pipelining microprocessors, but there are some differences between the two. For example, although most stages in BigPipe can only operate on one pagelet at a time, some stages, such as CSS downloading and JavaScript downloading, can operate on multiple pagelets simultaneously, which is similar to superscalar microprocessors. Another important difference is that in BigPipe we have implemented a ‘barrier’ concept borrowed from parallel programming: all pagelets have to finish a particular stage, e.g. the pagelet displaying stage, before any of them can proceed to download and execute JavaScript.

There is also a translated version of this article from Taobao, which adds some good discussion of implementation details:

1. Server-side parallelization

Ideally, the server-side implementation would generate the content of different pagelets in parallel, which improves performance. While the server is processing multiple pagelets concurrently, as soon as one pagelet’s content is ready it is flushed to the browser immediately. However, PHP does not support threads, so the server cannot rely on multithreading to generate multiple pagelets concurrently. For small sites, generating pagelet content serially already meets the optimization requirements. For large sites, to get more speed, the server can generate the content of different pagelets concurrently and independently, which can be implemented in the following ways:

  1. Java multithreading. If the backend logic is written in Java, Java’s multithreading can be used to generate the content of different pagelets at the same time, returning each pagelet’s content to the browser as it completes. The references at the end include an example of a Java multithreaded implementation found online.
  2. A PHP implementation. PHP does not support threads, so it cannot process different pagelets concurrently the way Java multithreading can. However, the business logic of Facebook and of Taobao’s main search is written in PHP, so we have to consider how to achieve concurrency in PHP. PHP’s curl extension provides the curl_multi family of functions, which can batch requests and execute what would otherwise be serial requests concurrently.

2. Outputting directly with flush()

At this point one might ask: why doesn’t the server simply flush() the generated HTML to the client piece by piece, instead of wrapping it in JSON and parsing it with JavaScript? Isn’t that an unnecessary extra step? In fact, direct flushing is the approach currently used by the main-search front end. Let’s look at the two big advantages of the BigPipe approach:

(1) If flush() is called directly to output HTML source, then when there are many modules they must load in order: a module earlier in the HTML has to finish loading before a later one can load, so the modules cannot each show some content at the same time. With the JavaScript approach, the front end can show multiple loading indicators and does not need to care which module finishes loading first, which also lets the backend take full advantage of multithreaded data processing.

(2) The JavaScript approach also makes the page structure clearer and easier to manage, and it decouples the page’s logical structure from its data: the page structure is returned first, JS script payloads keep arriving afterwards, and the page content is added dynamically, rather than outputting the complete HTML source in one go. This improves maintainability.

3. Crawlers and visitors whose browsers have JavaScript disabled

Since BigPipe uses JavaScript to load the page, a user who has disabled JavaScript in the browser (admittedly a small minority) would see the page fail to load, which is a very poor user experience. Search engine crawlers run into a similar problem. The solution is that when a request arrives, the server checks the user-agent and whether the client supports JavaScript. If the user-agent indicates a search engine crawler, or the client does not support JavaScript, BigPipe is not used and the page is served in the traditional model instead.

4. Impact on SEO

This is a problem that must be considered. In the age of search engines, a page that is unfriendly to them, or whose content is hard for them to recognize, will rank lower in search results and the site will get fewer visits. In BigPipe, page content is added dynamically, so search engines may not be able to read it. But as described above, the server first checks the user-agent to decide whether the client is a search engine crawler; if it is, the page is served in the traditional model rather than assembled dynamically, which resolves the unfriendliness to search engines.

5. Combining with other techniques

Besides BigPipe, Facebook’s page loading also incorporates other page optimization techniques, as follows:

5.1 Gzip compression of resource files

This is a very important technique: compressing CSS and JS files with gzip can reduce their size by around 70%, which is an attractive number. Stylesheets and script files make up most of what is transferred over the network, so gzip greatly reduces the amount of data sent and makes pages load faster. It can be implemented at the server; with Apache, for example, the mod_deflate module does the job with a configuration such as: AddOutputFilterByType DEFLATE text/html text/css application/x-javascript

5.2 Minifying JS files

Minifying JS files removes unnecessary characters, comments, and blank lines from the code, reducing file size and improving page load time. A tool such as JSMin can be used; minified scripts are roughly 20% smaller, which is also a big improvement.

5.3 Combining CSS and JS files

This is a standard principle of front-end optimization: merging multiple stylesheets and JS files reduces the number of HTTP requests. For a site with hundreds of millions of users, this also brings a performance gain, cutting roughly 5% of the time cost.

5.4 Using external JS and CSS

Another principle of front-end optimization. Purely in terms of speed, inline JS and CSS are faster because they save HTTP requests. However, external files are easier to reuse, much like the idea behind object-oriented programming. More importantly, although the first load is a little slower, CSS files and JS scripts can be cached by the browser, so across a user’s subsequent visits external JS and CSS deliver better speed.

5.5 Putting stylesheets at the top

Similar to the above, this is also a convention: loading the CSS files the HTML needs in the head is very important. If they are placed at the bottom of the page, the content may display sooner (since loading the CSS is deferred to the end, the content shows first), but that content is rendered without the stylesheet; when the CSS file finishes loading, the browser applies it and the page’s content and style change again. This is called a “flash of unstyled content” and is obviously unfriendly to users. In practice, simply place the CSS files inside the <head> tag.

5.6 Putting scripts at the bottom to implement the “barrier”

The JS scripts that power a page’s dynamic content do not help the initial page load; loading them at the top only makes the page load more slowly, which is exactly the opposite of the CSS files discussed above, so they can be loaded at the bottom of the page. The content the user can see loads first and the JS files load last, which makes the page feel faster. BigPipe implements a “barrier” concept: only after the content of all pagelets has finished loading does the browser send the HTTP requests for JS. BigPipe.js can store the paths of the JS files each pagelet needs and, once it determines that all the content has finished loading, send the requests to the server in one batch.
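
A minimal sketch of such a barrier is shown below; the counters and the onPageletDisplayed hook are hypothetical names used only to illustrate the idea:

    // Hypothetical barrier: JavaScript is requested only after every pagelet has been displayed
    var expectedPagelets = 0;   // set from the page template in the first flush
    var displayedPagelets = 0;
    var pendingJsUrls = [];     // JS paths collected from each pagelet payload

    function onPageletDisplayed(pagelet) {
      displayedPagelets += 1;
      pendingJsUrls = pendingJsUrls.concat(pagelet.js);
      // The barrier: the last pagelet to be displayed releases all deferred script requests
      if (displayedPagelets === expectedPagelets) {
        pendingJsUrls.forEach(function (url) {
          var script = document.createElement('script');
          script.src = url;     // scripts download asynchronously and may finish out of order
          script.async = true;
          document.body.appendChild(script);
        });
      }
    }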
