jQuery 2.0.3 Source Code Analysis: The Sizzle Engine - Parsing Principles

Date: 2019-11-14


Note: this is an original article. If you repost it, please credit the source and keep the link to the original (Aaron). Thank you!

First, to answer a reader's question:

How is the following selector parsed?

div > p + div.aaron input[type="checkbox"]

Along the way, let's take a deeper look at how the parsing works:

HTML structure

<div id="text">
  <p>
     <input type="text" />
  </p>
  <div class="aaron">
     <input type="checkbox" name="readme" value="Submit" />
     <p>Sizzle</p>
  </div>
</div>

Selector statement

div > p + div.aaron input[type="checkbox"]

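Taken together, it means roughly:

1. Select all <p> elements that are children of a <div> element

2. Select every <div> with class="aaron" that immediately follows one of those <p> elements

3. Then select every input with type="checkbox" inside that div.aaron element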

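Nobody would write such a selector for a structure this simple in practice, but I use the simple structure here to illustrate the complex processing behind it.

We are using a combined selector. In jQuery, modern browsers handle selectors through querySelectorAll, so everything discussed here concerns the implementation for older browsers. Pseudo-class selectors and XML are left for last; this article does not cover them for now.

A few pieces of background knowledge are needed:

1: Positional relationships in CSS selectors

2: The basic lookup interfaces the browser implements

3: CSS selectors are scanned and matched from right to left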

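Positional relationships in CSS selectors

Every node in a document is related to the others in one way or another.

It is easy to see that one node can relate to another in the following ways:

Ancestor and descendant

Parent and child

Adjacent siblings

General siblings

In CSS selectors these are expressed with, respectively: a space; >; +; ~

(There is actually one more relationship: div.aaron, with no space in between, selects a div node whose class is aaron.)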

<div id="grandfather">
  <div id="father">
    <div id="child1"></div>
    <div id="child2"></div>
    <div id="child3"></div>
  </div>
</div>

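grandfather and its grandchild child1 are in an ancestor-descendant relationship (expressed with a space)
father and its son child1 are in a parent-child relationship, which also counts as ancestor-descendant (expressed with >)
The older brother child1 and the younger brother child2 are adjacent siblings (expressed with +)
The older brother child1 and the younger brothers child2 and child3 are all general siblings (expressed with ~)

Sizzle has an object, Expr, that records selector-related properties and operations. It has the following property: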

relative = {
  ">": { dir: "parentNode", first: true },
  " ": { dir: "parentNode" },
  "+": { dir: "previousSibling", first: true },
  "~": { dir: "previousSibling" }
}

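So Expr.relative defines a first property that marks how "close" two nodes are; parent-child and adjacent-sibling relationships, for example, are close ones. When a positional matcher is created, the first property determines which nodes are eligible to match.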
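To make the role of dir and first concrete, here is a minimal sketch that walks the grandfather/father/child structure from above. It is not Sizzle's code: makeNode and related are hypothetical helpers, the nodes are plain objects rather than DOM elements, and text nodes between siblings are ignored.

```javascript
// Same shape as the Expr.relative table quoted above.
const relative = {
  ">": { dir: "parentNode", first: true },
  " ": { dir: "parentNode" },
  "+": { dir: "previousSibling", first: true },
  "~": { dir: "previousSibling" }
};

// Hypothetical stand-in for a DOM element: only the two links the table uses.
function makeNode(id, parentNode, previousSibling) {
  return { id, parentNode: parentNode || null, previousSibling: previousSibling || null };
}

// Hypothetical helper: follow `dir` from `node` and report whether `test`
// accepts a node on that path. With `first`, only the nearest node counts.
function related(node, combinator, test) {
  const { dir, first } = relative[combinator];
  let current = node[dir];
  while (current) {
    if (test(current)) return true;
    if (first) return false; // "close" relationship: do not look any further
    current = current[dir];
  }
  return false;
}

const grandfather = makeNode("grandfather");
const father = makeNode("father", grandfather);
const child1 = makeNode("child1", father);
const child2 = makeNode("child2", father, child1);
const child3 = makeNode("child3", father, child2);

const hasId = (id) => (node) => node.id === id;

const checks = {
  descendant: related(child1, " ", hasId("grandfather")), // true
  child: related(child1, ">", hasId("grandfather")),      // false: father is in between
  adjacent: related(child3, "+", hasId("child1")),        // false: child2 is in between
  sibling: related(child3, "~", hasId("child1"))          // true
};
console.log(checks);
```

The only difference between ">" and " " (and between "+" and "~") is whether the walk stops after the first node, which is exactly what the first flag encodes.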

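The basic lookup interfaces the browser implements

Leaving aside querySelector and querySelectorAll,

an HTML document offers these four APIs:

getElementById: the context can only be an HTML document.
getElementsByName: the context can only be an HTML document.
getElementsByTagName: the context can be an HTML document, an XML document, or an element node.
getElementsByClassName: the context can be an HTML document or an element node. Not supported in IE8.

So when compatibility matters, Sizzle ends up with only three fully reliable finders: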

Expr.find = {
      'ID'    : context.getElementById,
      'CLASS' : context.getElementsByClassName,
      'TAG'   : context.getElementsByTagName
}
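The table above is shorthand. As a rough sketch of the idea, each finder can be thought of as a function that takes the value to look for plus a context and delegates to the native method when the context has it. Everything below is illustrative: fakeDocument and elements are made-up stand-ins for a real document so the snippet runs outside a browser, and the find table is a simplification, not Sizzle's actual definition.

```javascript
// Hypothetical finder table in the spirit of Expr.find.
const find = {
  ID: (id, context) => {
    if (typeof context.getElementById !== "function") return undefined;
    const elem = context.getElementById(id);
    return elem ? [elem] : [];
  },
  CLASS: (className, context) =>
    typeof context.getElementsByClassName === "function"
      ? context.getElementsByClassName(className)
      : undefined,
  TAG: (tag, context) =>
    typeof context.getElementsByTagName === "function"
      ? context.getElementsByTagName(tag)
      : undefined
};

// Flat records mirroring the example HTML; a stand-in for real DOM elements.
const elements = [
  { tag: "div", id: "text", className: "" },
  { tag: "p", id: "", className: "" },
  { tag: "input", id: "", className: "", type: "text" },
  { tag: "div", id: "", className: "aaron" },
  { tag: "input", id: "", className: "", type: "checkbox" },
  { tag: "p", id: "", className: "" }
];

// Fake context exposing the three native lookups over the flat list.
const fakeDocument = {
  getElementById: (id) => elements.find((e) => e.id === id) || null,
  getElementsByClassName: (name) =>
    elements.filter((e) => e.className.split(/\s+/).includes(name)),
  getElementsByTagName: (tag) => elements.filter((e) => e.tag === tag)
};

const inputs = find.TAG("input", fakeDocument);
const aaron = find.CLASS("aaron", fakeDocument);
const hasAttrFinder = "ATTR" in find;
console.log(inputs.length, aaron.length, hasAttrFinder);
```

Note that there is no entry for attribute selectors: a token of type ATTR cannot be looked up directly, which matters once matching starts from the right-hand end of the selector.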

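CSS selectors are scanned and matched from right to left

Now let's start analyzing the parsing rules.

1. The selector statement

div > p + div.aaron input[type="checkbox"]

2. The lexer, tokenize, breaks it down into the corresponding tokens (analyzed in detail in the previous chapter)

Each piece after the breakdown: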
type: "TAG"
value: "div" 
matches ....

type: ">"
value: " > "

type: "TAG"
value: "p"
matches ....

type: "+"
value: " + "

type: "TAG"
value: "div"
matches ....

type: "CLASS"
value: ".aaron"
matches ....

type: " "
value: " "

type: "TAG"
value: "input"
matches ....

type: "ATTR"
value: "[type="checkbox"]"
matches ....

Apart from the combinators, every meaningful token also carries a parsed matches array.

For example,
the last branch, the attribute selector
"[type="checkbox"]"

matches = [
   0: "type"
   1: "="
   2: "checkbox"
]
type: "ATTR" 
value: "[type="checkbox"]"

So the selector is broken down into 9 parts.
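The breakdown into 9 parts can be reproduced with a much-simplified tokenizer. This is only a sketch under narrow assumptions: tokenizeSketch and its regular expressions are hypothetical, cover just tags, classes, double-quoted attribute equality and the four combinators, and ignore the escapes, pseudo-classes and comma groups that Sizzle's real tokenize handles.

```javascript
// Simplified patterns; Sizzle's real expressions are considerably stricter.
const combinator = /^\s*([>+~]|\s)\s*/;
const filters = {
  TAG: /^([\w-]+)/,
  CLASS: /^\.([\w-]+)/,
  ATTR: /^\[\s*([\w-]+)\s*(=)\s*"([^"]*)"\s*\]/
};

// Hypothetical helper: consume the selector from the left, one token at a time.
function tokenizeSketch(selector) {
  const tokens = [];
  let rest = selector;
  while (rest) {
    let matched = false;

    // Combinators: the type is the bare symbol, the value keeps the whitespace.
    let m = combinator.exec(rest);
    if (m) {
      matched = true;
      tokens.push({ value: m[0], type: m[1].trim() || " " });
      rest = rest.slice(m[0].length);
    }

    // Filters: the captured groups become the token's matches array.
    for (const type of Object.keys(filters)) {
      m = filters[type].exec(rest);
      if (m) {
        matched = true;
        tokens.push({ value: m[0], type, matches: m.slice(1) });
        rest = rest.slice(m[0].length);
      }
    }

    if (!matched) throw new Error("Unsupported selector: " + rest);
  }
  return tokens;
}

const tokens = tokenizeSketch('div > p + div.aaron input[type="checkbox"]');
console.log(tokens.length);                     // 9
console.log(tokens.map((t) => t.type));
console.log(tokens[tokens.length - 1].matches); // [ 'type', '=', 'checkbox' ]
```

Because each token keeps its original text in value, joining the values of any token list gives back a valid selector string, which is what makes it cheap to rebuild the selector after a token is removed.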

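So what is the most efficient way to match?

3. Match from right to left

In the end the work is still done through the APIs the browser provides, so Expr.find is the final implementation interface.

Matching definitely runs from right to left, but the first token on the right is

"[type="checkbox"]"

Expr.find clearly does not recognize this kind of selector, so we have to step back one more token

and arrive at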

type: "TAG"
value: "input"

Expr.find can handle this kind of token, so it is called directly:

Expr.find["TAG"] = support.getElementsByTagName ?
    function(tag, context) {
        if (typeof context.getElementsByTagName !== strundefined) {
            return context.getElementsByTagName(tag);
        }
    } :
    // (the fallback branch of this conditional is omitted in this excerpt)

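However, getElementsByTagName returns a collection,

so

this is where seed comes in: the seed set. Whatever the finder returns for the matching tag is placed in this initial set, seed.

OK, we pause here and stop matching this way; continuing to match like this would be slow.

Time to tidy up:

Rebuild the selector, dropping the tag token that has already been handled, input.

The selector therefore becomes:

selector: "div > p + div.aaron [type="checkbox"]"

One optimization applies here: if nothing is left after the removal, the match requirements are already satisfied and the result is returned directly.

Up to this step,

here is what we have to work with:

1 The seed set

2 The match set, made up of the tokens that tokenize produced

There were originally 9 tokens; since input has been matched, one is removed accordingly, leaving 8.

3 The selector statement, with input removed accordingly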

"div > p + div.aaron [type="checkbox"]"

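At this point the seed set contains 2 elements.

So how do we find the target among these 2 candidates in the simplest and most efficient way?

The relevant source code: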

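// Main entry point of the engine
    function select(selector, context, results, seed) {
        var i, tokens, token, type, find,
            // Tokenize the selector into its lexical form
            match = tokenize(selector);

        if (!seed) { // No initial seed set was supplied by the caller
            // Try to minimize operations if there is only one group
            // i.e. when there are not multiple groups:
            // a single selector with no comma (unlike "div, p") can take a shortcut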
            if (match.length === 1) {

                // Take a shortcut and set the context if the root selector is an ID
                tokens = match[0] = match[0].slice(0); // Copy the selector's token sequence

                // If the first token is an ID, we can set context to speed up the lookup
                if (tokens.length > 2 && (token = tokens[0]).type === "ID" &&
                    support.getById && context.nodeType === 9 && documentIsHTML &&
                    Expr.relative[tokens[1].type]) {

                    context = (Expr.find["ID"](token.matches[0].replace(runescape, funescape), context) || [])[0];
                    if (!context) {
                        // If the context element (the leading ID selector) does not exist, there is nothing to search
                        return results;
                    }
                    // Remove the leading ID selector
                    selector = selector.slice(tokens.shift().value.length);
                }

                // Fetch a seed set for right-to-left matching
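                // Where "needsContext" = new RegExp( "^" + whitespace + "*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\(" + whitespace + "*((?:-\\d)?\\d*)" + whitespace + "*\\)|)(?=[^-]|$)", "i" )
                // That is, if there is no leading combinator or positional pseudo-class (those are filtered another way, covered in a later article),
                // start from the last token and find the seed set first
                i = matchExpr["needsContext"].test(selector) ? 0 : tokens.length;

                // Search from right to left
                while (i--) { // Start at the end and work backwards
                    token = tokens[i]; // Take the token currently at the end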

                    // Abort if we hit a combinator
                    // Stop if a combinator is encountered:
                    //
                    //  > + ~ space
                    //
                    if (Expr.relative[(type = token.type)]) {
                        break;
                    }

                    /*
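                  First check whether a finder exists. Finders are the browser's native DOM lookup interfaces; put simply, they form the following object
                  Expr.find = {
                    'ID'    : context.getElementById,
                    'CLASS' : context.getElementsByClassName,
                    'NAME'  : context.getElementsByName,
                    'TAG'   : context.getElementsByTagName
                  }
                */
                    // A pseudo-class such as :first-child has no corresponding finder, so the loop moves on to the previous token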
                    if ((find = Expr.find[type])) {

                        // Search, expanding context for leading sibling combinators
                        // Try to obtain an initial seed set through this finder
                        if ((seed = find(
                            token.matches[0].replace(runescape, funescape),
                            rsibling.test(tokens[0].type) && context.parentNode || context
                        ))) {

                            // The finder returned a result
                            // If seed is empty or no tokens remain, we can return early
                            // Remove the token that was just used
                            tokens.splice(i, 1);
                            selector = seed.length && toSelector(tokens);

                            // Check whether the remaining selector is empty
                            if (!selector) {
                                // If so, return the result early
                                push.apply(results, seed);
                                return results;
                            }

                            // A seed set has been found and other tokens remain in front of it, so leave the loop
                            break;
                        }
                    }
                }
            }
        }


        // "div > p + div.aaron [type="checkbox"]"

        // Compile and execute a filtering function
        // Provide `match` to avoid retokenization if we modified the selector above
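        // Hand over to compile, which generates what is called the super matcher
        // The matcher filters seed and puts the qualifying results into results
        //
        //    // Generate the compiled function
        //  var superMatcher =   compile( selector, match )
        //
        //  // Execute it
        //    superMatcher(seed,context,!documentIsHTML,results,rsibling.test( selector ))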
        //
        compile(selector, match)(
            seed,
            context, !documentIsHTML,
            results,
            rsibling.test(selector)
        );
        return results;
    }

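To summarize the process briefly:

selector: "div > p + div.aaron input[type="checkbox"]"

Parsing rules:
1 Work from right to left
2 Take the last token, for example [type="checkbox"]
                            {
                                matches : Array[3]
                                type    : "ATTR"
                                value   : "[type="checkbox"]"
                            }
3 Check its type: if it is one of the four combinators (> + ~ space), the scan stops; if it is a type with no finder, such as ATTR, move one token to the left and check again
4 Continue until a token of type ID, CLASS or TAG is reached, because only those can be fetched through the browser's interfaces
5 The seed set now has values, which narrows the candidates considerably
6 If the seed set is not empty and tokens remain, further filtering is needed; the selector is revised to "div > p + div.aaron [type="checkbox"]"
7 OK, on to the next stage: the compile function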
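The steps above can be condensed into a small runnable sketch. It follows the shape of the loop in select but is not the real implementation: pickSeed is a hypothetical helper, the token list is written out by hand, and the finder table only knows TAG and searches a made-up flat list (page) instead of a DOM.

```javascript
// The four combinator types that end the right-to-left scan.
const combinators = new Set([">", " ", "+", "~"]);

// The 9 tokens for: div > p + div.aaron input[type="checkbox"]
const tokens = [
  { type: "TAG", value: "div", matches: ["div"] },
  { type: ">", value: " > " },
  { type: "TAG", value: "p", matches: ["p"] },
  { type: "+", value: " + " },
  { type: "TAG", value: "div", matches: ["div"] },
  { type: "CLASS", value: ".aaron", matches: ["aaron"] },
  { type: " ", value: " " },
  { type: "TAG", value: "input", matches: ["input"] },
  { type: "ATTR", value: '[type="checkbox"]', matches: ["type", "=", "checkbox"] }
];

// Made-up stand-in for the page: just the elements the finder needs to see.
const page = [
  { tag: "input", type: "text" },
  { tag: "input", type: "checkbox" },
  { tag: "p" }
];

// Simplified finder table; ATTR deliberately has no entry.
const find = {
  TAG: (tag) => page.filter((elem) => elem.tag === tag)
};

// Hypothetical helper mirroring the seed-selection loop.
function pickSeed(allTokens) {
  const remaining = allTokens.slice();
  let seed = null;
  for (let i = remaining.length - 1; i >= 0; i--) {
    const token = remaining[i];
    if (combinators.has(token.type)) break; // stop at a combinator
    const finder = find[token.type];
    if (!finder) continue;                  // e.g. ATTR: step one token to the left
    seed = finder(token.matches[0]);
    remaining.splice(i, 1);                 // drop the token that produced the seed
    break;
  }
  // Joining the token values rebuilds the selector that is left to check.
  return { seed, remaining, selector: remaining.map((t) => t.value).join("") };
}

const result = pickSeed(tokens);
console.log(result.seed.length);      // 2
console.log(result.remaining.length); // 8
console.log(result.selector);         // div > p + div.aaron [type="checkbox"]
```

The seed here still contains the text input, so it is only a candidate list; deciding which candidates satisfy the remaining 8 tokens is the job of the compiled matcher.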

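Sizzle does more than simply match from right to left.

Starting with Sizzle 1.8, the concept of compiled functions was introduced, which is also the focus of the next chapter.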