Close Reading: "RegExp in ES2018"

Date: 2019-11-21

1. Introduction

This week's close reading covers the article regexp-features-regular-expressions.html.

The article introduces several important regular expression features added in ES2018:

Lookbehind assertions
Named capture groups
s (dotAll) flag: . matches any character
Unicode property escapes

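2. Overview

Still picking matches out by index? Still think [\w\W] is the only way to match any character? Regular expressions now have simpler spellings for both. They keep getting easier to use, and it is time to update what we know about them.

2.1. Lookbehind assertions

Assertions come in four kinds, the Cartesian product of positive/negative and lookahead/lookbehind. Before ES2018 only lookahead was supported; lookbehind is now supported as well.

Here are the four kinds:

Positive lookahead (?=...) asserts that the text that follows matches the pattern.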

const re = /Item(?= 10)/;

console.log(re.exec("Item"));
// → null

console.log(re.exec("Item5"));
// → null

console.log(re.exec("Item 5"));
// → null

console.log(re.exec("Item 10"));
// → ["Item", index: 0, input: "Item 10", groups: undefined]

Negative lookahead (?!...) asserts that the text that follows does not match the pattern.

const re = /Red(?!head)/;

console.log(re.exec("Redhead"));
// → null

console.log(re.exec("Redberry"));
// → ["Red", index: 0, input: "Redberry", groups: undefined]

console.log(re.exec("Redjay"));
// → ["Red", index: 0, input: "Redjay", groups: undefined]

console.log(re.exec("Red"));
// → ["Red", index: 0, input: "Red", groups: undefined]

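ES2018 adds two new kinds of assertion:

Positive lookbehind (?<=...) asserts that the text that precedes matches the pattern.

With lookahead, the matched text comes first and the asserted pattern follows it; with lookbehind, the asserted pattern comes first and the matched text follows it. Lookahead checks what the match is followed by, lookbehind checks what it is preceded by.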

const re = /(?<=€)\d+(\.\d*)?/;

console.log(re.exec("199"));
// → null

console.log(re.exec("$199"));
// → null

console.log(re.exec("€199"));
// → ["199", undefined, index: 1, input: "€199", groups: undefined]

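Negative lookbehind (?<!...) asserts that the text that precedes does not match the pattern.

Note: the example below requires that " meters" is not preceded by three digits.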

const re = /(?<!\d{3}) meters/;

console.log(re.exec("10 meters"));
// → [" meters", index: 2, input: "10 meters", groups: undefined]

console.log(re.exec("100 meters"));
// → null

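The article gives a slightly more complex example that combines positive and negative lookbehind:

Note: the example below requires that " meters" is preceded by two digits, and that those digits are not 35.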

const re = /(?<=\d{2})(?<!35) meters/;

console.log(re.exec("35 meters"));
// → null

console.log(re.exec("meters"));
// → null

console.log(re.exec("4 meters"));
// → null

console.log(re.exec("14 meters"));
// → [" meters", index: 2, input: "14 meters", groups: undefined]
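
As a practical illustration of positive lookbehind, here is a minimal sketch that pulls every euro amount out of a string. The helper name euroAmounts and the sample text are made up for this example; it assumes an engine that supports lookbehind.

```javascript
// Hypothetical helper: collect all amounts that directly follow a "€" sign.
function euroAmounts(text) {
  // (?<=€) requires a "€" right before the digits without consuming it,
  // so the sign itself is not part of the match.
  const re = /(?<=€)\d+(?:\.\d+)?/g;
  return text.match(re) || [];
}

console.log(euroAmounts("€199, $250, €3.50"));
// → ["199", "3.50"]
```

Because the assertion consumes nothing, there is no need to strip the currency sign from the result afterwards.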

2.2. Named Capture Groups

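Named capture groups let you give a name to what a group captures, which is more readable than a numeric index.

The syntax is (?<name>...), and the captured values are exposed on the groups property of the match result: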

const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = re.exec("2020-03-04");
// Read the captures by name instead of by position.
const { year, month, day } = match.groups;

console.log(match[0]); // → 2020-03-04
console.log(year); // → 2020
console.log(month); // → 03
console.log(day); // → 04
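
Named groups can also be used in the replacement string of String.prototype.replace through the $<name> syntax. The following sketch reorders a date; the variable names are chosen only for this example.

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// $<name> in the replacement string refers to the named group.
const reordered = "2020-03-04".replace(isoDate, "$<day>/$<month>/$<year>");

console.log(reordered); // → 04/03/2020
```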

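Inside the regular expression itself, an earlier capture group can be referenced by index with \1, for example:

To explain: \1 stands for the text that (\w\w) matched, not for the pattern (\w\w) itself. So once (\w\w) has matched 'ab', \1 means a match against 'ab'.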

console.log(/(\w\w)\1/.test("abab")); // → true

// if the last two letters are not the same
// as the first two, the match will fail
console.log(/(\w\w)\1/.test("abcd")); // → false

A named capture group can be referenced with the \k<name> syntax instead of an index such as \1:

Indices and names can be used together.

const re = /\b(?<dup>\w+)\s+\k<dup>\b/;

const match = re.exec("I'm not lazy, I'm on on energy saving mode");

console.log(match.index); // → 18
console.log(match[0]); // → on on
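
To show that index and name references can be mixed, here is a small sketch. A named group is still numbered, so \1 and \k<word> point at the same capture. The pattern and the test strings are invented for illustration.

```javascript
// The same group is referenced once by index (\1) and once by name (\k<word>).
const tripled = /\b(?<word>\w+)\s+\1\s+\k<word>\b/;

const threeTimes = tripled.test("go go go");
const twoTimes = tripled.test("go go stop");

console.log(threeTimes); // → true
console.log(twoTimes); // → false
```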

2.3. s (dotAll) Flag

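Although . in a regular expression is described as matching any character, it does not match line terminators. Clever developers worked around this with [\w\W].

It was a design flaw all the same. ES2018 adds the /s flag, and in this mode . is equivalent to [\w\W]: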

console.log(/./s.test("\n")); // → true
console.log(/./s.test("\r")); // → true
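
A side-by-side sketch on multi-line text makes the difference visible. The sample string is made up; the dotAll property is the standard accessor that reports whether the s flag is set.

```javascript
const text = "first line\nsecond line";

// Without s, "." stops at the line terminator, so the pattern cannot span lines.
const withoutS = /first.*second/.test(text);
// With s, "." also matches "\n".
const withS = /first.*second/s.test(text);
// dotAll reports whether the s flag is present on a regex.
const flagIsSet = /./s.dotAll;

console.log(withoutS); // → false
console.log(withS); // → true
console.log(flagIsSet); // → true
```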

2.4. Unicode Property Escapes

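Regular expressions now support a more powerful way of matching Unicode. In /u mode, \p{Number} matches any numeric character:

The u flag makes it possible to recognize all Unicode characters above 0xFFFF.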

const regex = /^\p{Number}+$/u;
regex.test("²³¹¼½¾"); // true
regex.test("㉛㉜㉝"); // true
regex.test("ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫ"); // true

\p{Alphabetic} matches any character with the Alphabetic property, including Chinese characters, letters, and so on:

const str = "漢";

console.log(/\p{Alphabetic}/u.test(str)); // → true

// the \w shorthand cannot match 漢
console.log(/\w/u.test(str)); // → false

At last there is a simple way to match Chinese characters.
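
Since \p{Alphabetic} also matches Latin letters and other alphabets, a tighter option when only Chinese characters are wanted is the script property \p{Script=Han}. This is a sketch of that alternative rather than something taken from the article; the variable names are illustrative.

```javascript
// Matches strings made up entirely of Han-script characters.
const hanOnly = /^\p{Script=Han}+$/u;

const chinese = hanOnly.test("漢字");
const latin = hanOnly.test("abc");
// For comparison: Alphabetic accepts Latin letters as well.
const latinIsAlphabetic = /^\p{Alphabetic}+$/u.test("abc");

console.log(chinese); // → true
console.log(latin); // → false
console.log(latinIsAlphabetic); // → true
```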

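2.5. Compatibility table

See the original article for the compatibility table. Overall, at the time of writing only Chrome and Safari support these features; Firefox and Edge do not. Large projects will have to wait a few more years before using them.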
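
Given the uneven support, one way to check at runtime is to try compiling a pattern with the RegExp constructor and catch the SyntaxError that an unsupporting engine throws. This is a minimal sketch; supportsRegExp and the features object are hypothetical names, not part of any library.

```javascript
// Hypothetical helper: true if the engine can compile the pattern with the flags.
// Using the constructor (not a literal) keeps the file itself parseable everywhere.
function supportsRegExp(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    return false;
  }
}

const features = {
  lookbehind: supportsRegExp("(?<=a)b"),
  namedGroups: supportsRegExp("(?<name>a)"),
  dotAll: supportsRegExp(".", "s"),
  unicodeProperty: supportsRegExp("\\p{Number}", "u"),
};

console.log(features); // values depend on the engine running the code
```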

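3. Close reading

The four features listed in the article are what ES2018 added to regular expressions. But as the compatibility table shows, they are mostly not usable yet, so let us instead review what ES6 improved in regular expressions and look for the points where those improvements connect with the ES2018 changes.

3.1. RegExp constructor improvement

When the first argument to the RegExp constructor is a regular expression, a second argument with flags may now be given (ES5 throws an error):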

new RegExp(/book(?=s)/giu, "iu");

A minor improvement, since the constructor is rarely used this way.
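
A short sketch of what the call produces: the flags passed as the second argument replace the flags of the original regex rather than being merged with them, and the original is left unchanged. Variable names are illustrative.

```javascript
const original = /book(?=s)/giu;
// The second argument replaces the flags of the first argument.
const copy = new RegExp(original, "iu");

console.log(copy.source); // → book(?=s)
console.log(copy.flags); // → iu
console.log(original.flags); // → giu
```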

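3.2. String methods that take regular expressions

The string methods match(), replace(), search(), and split() now delegate internally to methods on the RegExp instance. For example,

String.prototype.match delegates to RegExp.prototype[Symbol.match].

In other words, matching is conceptually an operation of the regex instance, and calling it from the string is supported as a convenience. At execution time the call is forwarded to the regex instance, which makes the logic more uniform.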

For example:

"abc".match(/abc/g);
// internally, this is equivalent to
/abc/g[Symbol.match]("abc");
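
The delegation can be observed in two ways: both spellings return the same result, and any object that defines Symbol.match can be passed to match(). The startsWithUpper object below is a hypothetical matcher made up for this sketch, not a RegExp.

```javascript
// Both calls run the same RegExp.prototype[Symbol.match] logic.
const viaString = "abcabc".match(/abc/g);
const viaSymbol = /abc/g[Symbol.match]("abcabc");

// Hypothetical matcher object: match() forwards to its Symbol.match method
// and returns whatever that method returns.
const startsWithUpper = {
  [Symbol.match](str) {
    const first = str.charAt(0);
    return first !== "" && first !== first.toLowerCase();
  },
};

console.log(viaString, viaSymbol); // both are ["abc", "abc"]
console.log("Hello".match(startsWithUpper)); // → true
console.log("hello".match(startsWithUpper)); // → false
```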

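3.3. The u flag

The Unicode property escapes from the overview are an extension of the u flag, and the u flag itself was added in ES6.

The u flag means "Unicode mode" and is used to handle Unicode characters above \uFFFF correctly.

The u flag also changes the following regular expression behavior:

The dot normally matches a single UTF-16 code unit; in u mode it can match a Unicode character above 0xFFFF.
\u{61} changes meaning from "61 occurrences of u" to "the character with code point U+0061", the letter a.
Quantifiers are applied correctly to Unicode characters that take more than one code unit.
\S correctly recognizes Unicode characters above 0xFFFF.
When u is combined with the i flag, [a-z] also recognizes characters with a different code point but a very similar glyph, such as \u212A, the Kelvin sign, which is another K.

Basically, in u mode every Unicode character is interpreted correctly, and ES2018 adds character sets for u mode that match common classes of characters, for example \p{Number}, which matches ¼.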
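
The behaviors listed above can be checked one by one. The sketch below uses U+20BB7 as a sample character outside the BMP; the results object and its property names are made up for the example.

```javascript
// "\u{20BB7}" is one character that takes two UTF-16 code units.
const astral = "\u{20BB7}";

const results = {
  // The dot matches one code unit without u, one code point with u.
  dotWithoutU: /^.$/.test(astral), // false
  dotWithU: /^.$/u.test(astral), // true
  // \u{61} means 61 "u" characters without u, the letter "a" with u.
  braceWithoutU: /\u{61}/.test("a"), // false
  braceWithU: /\u{61}/u.test("a"), // true
  // Without u the quantifier binds to the second code unit only.
  quantWithoutU: new RegExp(astral + "{2}").test(astral + astral), // false
  quantWithU: new RegExp(astral + "{2}", "u").test(astral + astral), // true
  // \S sees two code units without u, one character with u.
  nonSpaceWithoutU: /^\S$/.test(astral), // false
  nonSpaceWithU: /^\S$/u.test(astral), // true
  // The Kelvin sign only folds to "k" when i is combined with u.
  kelvinWithoutU: /[a-z]/i.test("\u212A"), // false
  kelvinWithU: /[a-z]/iu.test("\u212A"), // true
};

console.log(results);
```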

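3.4. The y flag

The y flag is the "sticky" flag.

y is similar to the g flag: both match globally, meaning each match continues from where the previous successful match ended. The difference is that with y the next match must begin exactly at the position right after the previous match, otherwise it fails.

For example: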

/a+/g.exec("aaa_aa_a"); // ["aaa"]
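
Running g and y side by side on the same string shows where they diverge. This is a small sketch; the variable names are illustrative.

```javascript
const input = "aaa_aa_a";
const globalRe = /a+/g;
const stickyRe = /a+/y;

// First call: both match "aaa" at index 0 and set lastIndex to 3.
const g1 = globalRe.exec(input);
const y1 = stickyRe.exec(input);

// Second call: g scans forward past "_" and finds "aa";
// y must match exactly at index 3, which is "_", so it fails.
const g2 = globalRe.exec(input);
const y2 = stickyRe.exec(input);

console.log(g1[0], y1[0]); // → aaa aaa
console.log(g2[0]); // → aa
console.log(y2); // → null
```

After the failed sticky match, lastIndex is reset to 0, so a further call to stickyRe.exec starts from the beginning again.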

3.5. flags

The flags property returns the flags of a regex:

const regex = /[a-z]*/gu;

regex.flags; // 'gu'
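
The flags property pairs naturally with the constructor from 3.1, for instance to derive a regex with one extra flag. The helper withFlag below is hypothetical and only meant as a sketch.

```javascript
// Hypothetical helper: return a regex that has the given flag,
// reusing the source and the existing flags of the input.
function withFlag(re, flag) {
  if (re.flags.includes(flag)) return re;
  return new RegExp(re.source, re.flags + flag);
}

const base = /[a-z]+/iu;
const sticky = withFlag(base, "y");

console.log(sticky.flags); // → iuy
console.log(withFlag(base, "i") === base); // → true
```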

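4. Summary

Using the article regexp-features-regular-expressions as a starting point, this week's close reading walked through the regular expression features added in ES2018, and then followed the thread back to the regex enhancements made in ES6.

If this kind of outward-spreading study suits you, consider going one step further and reviewing everything ES6 introduced. The author strongly recommends Ruan Yifeng's book ECMAScript 6 入门.

The features introduced in ES2018 are still too new, but ES6 features should by now be as familiar to use as ES3.

If someone around you is still surprised by ES6 features, please share this article with them, so they do not degrade into "a JS beginner with nothing but project experience".

Discussion: 精读《正则 ES2018》 · Issue #127 · dt-fe/weekly

If you would like to take part, join the discussion in the issue above. There is a new topic every week, published on the weekend or on Monday. 前端精读: helping you filter for reliable content.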