Advanced Regular Expressions

“If you have a problem and decide to solve it with a regular expression, you now have two problems.” 🤷‍♀️

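Bronze: Regex Basics

A regular expression is a pattern used to match character combinations in a string.

Creating a regular expression

Use a regular expression literal: /ab+c/g
Call the RegExp constructor: new RegExp("ab+c", "g")
- It takes two arguments: the first is a string or a regular expression, the second is the flags string
- If the first argument is a regular expression and a second argument is given, only the flags in the second argument are used and the flags of the original regular expression are ignored (an ES6 extension)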
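
The flag-override rule for the constructor is easy to check. The following is a minimal sketch; the variable names are made up for illustration.

```javascript
// Build a new regex from an existing one, with and without a flags argument.
const original = /ab+c/gi;
const copyWithNewFlags = new RegExp(original, "m"); // the second argument wins
const copyKeepingFlags = new RegExp(original);      // no second argument: flags are kept

console.log(original.flags);          // "gi"
console.log(copyWithNewFlags.flags);  // "m"
console.log(copyKeepingFlags.flags);  // "gi"
console.log(copyWithNewFlags.source); // "ab+c"
```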

The RegExp object

1. Instance properties and methods

RegExp.prototype.exec(str)
RegExp.prototype.test(str)
RegExp.prototype.flags (ES6): returns the flags of the regular expression
RegExp.prototype.sticky (ES6): indicates whether the y flag is set ...

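2. The lastIndex property

regexObj.lastIndex: a property of each RegExp instance (not a static property of the RegExp constructor) that holds the index at which the next match starts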

3. String objects

Six string methods accept a regular expression:

str.search(regexp)
str.match(regexp): returns an array
str.replace(regexp|substr, newSubStr|function)
str.split(separator): the separator can be a string or a regexp
str.matchAll(regexp): added in ES2020
str.replaceAll(regexp|substr, newSubStr|function): added in ES2021
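
Of the methods listed above, split() and search() are the ones not demonstrated elsewhere with a regular expression that does real work. A small sketch, with a made-up input string:

```javascript
// split() with a regex separator: commas or semicolons, with optional spaces around them.
const csvLine = "apple, banana;cherry ,  date";
const fruits = csvLine.split(/\s*[,;]\s*/);
console.log(fruits); // ["apple", "banana", "cherry", "date"]

// search() returns the index of the first match, or -1 when nothing matches.
const firstSeparator = csvLine.search(/[,;]/);
const noDigit = csvLine.search(/\d/);
console.log(firstSeparator); // 5
console.log(noDigit);        // -1
```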

4. RegExp.prototype.test(str) and String.prototype.search(regexp)

test() checks whether the regular expression matches the given string and returns true or false.
String's search() method is similar, but it returns the index of the first match, or -1 if there is none.

let str = "hello world!";
/world/.test(str); // true

str.search(/world/); // 6

For more information about a match (at the cost of slower execution), use the exec() method ⬇️ ⬇️ ⬇️

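5. RegExp.prototype.exec(str) and String.prototype.match(regexp)

exec() searches the given string for a match. On success it returns an array; if the regular expression has the g or y flag, it also updates the lastIndex property of the regex object
- The array contains: the matched text as the first item, the captured groups from the second item on, plus extra properties (index: the index of the match, input: the original string, groups: the named capture groups)
match() also returns an array with the first full match and its capture groups (without the g flag, the result is the same as exec())
With the g flag, match() returns all full matches, without capture groups

// Same result
let str = "hello world world!";
/world/.exec(str); // ["world", index: 6, input: "hello world world!", groups: undefined]

str.match(/world/); // ["world", index: 6, input: "hello world world!", groups: undefined]

// Returns every match
str.match(/world/g); // ["world", "world"]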

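Writing a regular expression

A regular expression is made up of simple characters and special characters.

1. Six optional flags

Regular expressions have six optional flags that enable features such as global search and case-insensitive search.

g global search (global)
i case-insensitive search (ignoreCase)
m multi-line search (multiline)
s lets . match line terminators (ES2018)
u Unicode mode (ES6)
y “sticky” search: matching starts exactly at the current position in the target string (ES6)

Syntax:

let regExp = /pattern/flags;
let regExp = new RegExp("pattern", "flags");

let str = "Hello World!";
/world/i.test(str); // true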
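
The m flag is listed above without an example. As a rough illustration (the sample text is invented), it changes ^ and $ from "start/end of the whole input" to "start/end of each line":

```javascript
const text = "first line\nsecond line\nthird line";

// Without m, ^ only matches at the very start of the input.
const withoutM = text.match(/^\w+/g);
// With m, ^ also matches right after each line break.
const withM = text.match(/^\w+/gm);

console.log(withoutM); // ["first"]
console.log(withM);    // ["first", "second", "third"]
```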

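2. Special characters

Characters with a special meaning in regular expressions fall into these groups: special single characters, quantifiers, ranges/groups, assertions, and Unicode property escapes.

1. Special single characters

. matches any single character (except line terminators)
\d matches a digit => [0-9]
\D matches a non-digit => [^0-9]
\w matches a word character (letter, digit, or underscore) => [A-Za-z0-9_]
\W matches a non-word character => [^A-Za-z0-9_]
\s matches a whitespace character, including space, tab, form feed, and line feed
\S matches a non-whitespace character
\b matches a word boundary
\r matches a carriage return
\n matches a line feed
\uhhhh matches the Unicode character with the given four-digit hexadecimal code
\u{hhhh} matches the Unicode character with the given hexadecimal code (ES6 syntax; requires the u flag)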

let str = "He played the King in a8 and she moved her Queen in c2.";
str.match(/\w\d/g); // ["a8","c2"]

// Match Unicode characters
let str = "happy 🙂, confused 😕, sad 😢";
let reg = /[\u{1F600}-\u{1F64F}]/gu;
str.match(reg); // ['🙂', '😕', '😢']

// Match Chinese characters with [\u4e00-\u9fa5]
let str = "123我是456中文";
let reg = /[\u4e00-\u9fa5]/g;
str.match(reg); // ["我", "是", "中", "文"]

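2. Quantifiers

* matches 0 or more times (present or not, either is fine) => {0,}
+ matches 1 or more times (at least once) => {1,}
? matches 0 or 1 time (optional, a bit like an optional property in TypeScript) => {0,1}
{n} matches exactly n times
{n,m} matches at least n and at most m times
{n,} matches at least n times

// Pattern: one or more word characters followed by one whitespace character; global, case-insensitive
let re = /\w+\s/gi;
"fee fi fo fum".match(re); // ["fee ", "fi ", "fo "]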
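
The example above only exercises +. A short sketch of ?, {n}, and {n,m} on made-up inputs:

```javascript
// ? makes the preceding item optional: "u" may appear zero or one time.
const optionalU = "color colour colouur".match(/colou?r/g);
console.log(optionalU); // ["color", "colour"]

// {n} requires exactly n repetitions.
const fourDigits = /^\d{4}$/;
console.log(fourDigits.test("2024"));  // true
console.log(fourDigits.test("20245")); // false

// {n,m} accepts between n and m repetitions (here whole numbers of 2 to 4 digits).
const twoToFour = "1 12 123 1234 12345".match(/\b\d{2,4}\b/g);
console.log(twoToFour); // ["12", "123", "1234"]
```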

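3. Ranges / groups

[xyz] character set: matches any one of the characters in the brackets; a hyphen (-) specifies a range
[^xyz] negated character set: matches any character not listed in the brackets
x|y matches x or y

let str = "The Caterpillar and Alice looked at each other";
let reg = /\b[a-df-z]+\b/gi;
str.match(reg);  // ["and", "at"]

(x) capturing group: groups x and remembers the match. Later in the pattern, \n refers back to the n-th captured group; in a replacement string, $n refers to it.
(?:x) non-capturing group: groups x without remembering the match, which avoids the small cost of capturing

let reg = /(apple) (banana) \1 \2/;
"apple banana apple banana apple banana".match(reg);
// ["apple banana apple banana", "apple", "banana", index: 0, ...]

let reg = /(\w+)\s(\w+)/;
let str = "John Smith";
str.replace(reg, "$2 $1"); // "Smith John"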
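
The difference between (x) and (?:x) shows up in the result array. A minimal sketch, using an invented URL-like string:

```javascript
const input = "https://example";

// Both groups capture: the result has two extra entries.
const capturing = /(https?):\/\/(\w+)/.exec(input);
console.log(capturing.length, capturing[1], capturing[2]); // 3 "https" "example"

// The scheme is grouped but not captured: only the host is remembered.
const nonCapturing = /(?:https?):\/\/(\w+)/.exec(input);
console.log(nonCapturing.length, nonCapturing[1]); // 2 "example"
```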

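4. Assertions: mainly tests on boundaries

^ matches the start of the input (note: inside a character set such as [^xyz] it means negation)
$ matches the end of the input
\b matches a word boundary
x(?=y) lookahead: matches x only if it is followed by y, e.g. /Jack(?=Sparrow)/ matches Jack
(?<=y)x lookbehind (ES2018): matches x only if it is preceded by y, e.g. /(?<=Jack)Sparrow/ matches Sparrow

let str = "https://xxx.xx.com/#/index?type=xx&value=xxx";
let reg = /(?<=\?).+/g;
str.match(reg); // ['type=xx&value=xxx']

// Filtering by a condition
let oranges = ["ripe orange A", "green orange B", "ripe orange C"];
oranges.filter((item) => item.match(/(?<=ripe )orange/)); //  ["ripe orange A", "ripe orange C"]

x(?!y) negative lookahead: matches x only if it is not followed by y, e.g. /Jack(?!Sparrow)/
(?<!y)x negative lookbehind (ES2018): matches x only if it is not preceded by y, e.g. /(?<!Jack)Sparrow/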
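
The two negative assertions can be tried on one made-up string. The index property shows which occurrence was accepted:

```javascript
const names = "JackSparrow JackBlack TomSparrow";

// Negative lookahead: skips the "Jack" at index 0 because "Sparrow" follows it.
const jack = /Jack(?!Sparrow)/.exec(names);
console.log(jack[0], jack.index); // "Jack" 12

// Negative lookbehind: skips the "Sparrow" at index 4 because "Jack" precedes it.
const sparrow = /(?<!Jack)Sparrow/.exec(names);
console.log(sparrow[0], sparrow.index); // "Sparrow" 25
```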

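Silver: Intermediate Regex

This part covers flags (mostly added in ES6) and their corresponding properties.

The g flag and the lastIndex property

lastIndex specifies the index at which the next match starts; it only takes effect when the g (or y) flag is set
With the g flag set, a RegExp object is stateful: it records the position just after the last successful match in its lastIndex property
After a successful match with exec() or test(), the lastIndex property of the regex object is updated; after a failed match, lastIndex is reset to 0

let regExp = /ab*/g;
regExp.exec("abbcdefabh"); // ['abb',index:0]
regExp.lastIndex; // 3
// Match again
regExp.exec("abbcdefabh"); // ['ab',index:7]
regExp.lastIndex; // 9
// And once more
regExp.exec("abbcdefabh"); // null
regExp.lastIndex; // 0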

Thanks to this behaviour, exec() / test() can be called in a loop to find every match in a string

let reg = /ab*/g;
let str = "abbcdefabh";
let arr = [];
while ((arr = reg.exec(str)) !== null) {
  console.log(arr, reg.lastIndex);
}
// By comparison, match() only returns the matched text
str.match(reg); // ['abb','ab']
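
The same statefulness is a common source of surprises when one g regex is reused for independent test() calls. A small sketch of the effect (variable names are illustrative):

```javascript
const globalRe = /world/g;

const first = globalRe.test("hello world");  // true, lastIndex is now 11
const second = globalRe.test("hello world"); // false: the search started at index 11, lastIndex resets to 0
const third = globalRe.test("hello world");  // true again

console.log(first, second, third); // true false true

// Without the g flag the regex keeps no state, so every call gives the same answer.
const plainRe = /world/;
const plainResults = [plainRe.test("hello world"), plainRe.test("hello world")];
console.log(plainResults); // [true, true]
```

If a yes/no check is all that is needed, dropping the g flag (or resetting lastIndex to 0 before each call) avoids the alternating results.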

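The y flag and the sticky property (ES6)

y is known as the “sticky” flag. Like g, it resumes matching from lastIndex
The difference: with g, a match anywhere in the remaining text is enough, whereas y requires the match to start exactly at the first remaining position, hence “sticky”.

let regExp = /ab*/y;
regExp.exec("abbcdefabh"); // ['abb',index:0]
regExp.lastIndex; // 3
// Match again
regExp.exec("abbcdefabh"); // null
regExp.lastIndex; // 0

regExp.sticky; // true: the y flag is set

In other words, the y flag implies a start anchor. It was designed so that the start anchor ^ stays in effect at every step of a repeated match.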
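
One place where this "anchored at every step" behaviour is useful is tokenizing, because any unexpected character stops the scan instead of being skipped silently. The following is only a sketch: tokenize is a hypothetical helper, and the token grammar (integers and four operators) is an assumption made for the example.

```javascript
// Hypothetical helper: splits a simple arithmetic expression into tokens.
function tokenize(input) {
  // y (sticky): each match must begin exactly at lastIndex.
  const tokenRe = /\s*(\d+|[+\-*\/])/y;
  const source = input.trim();
  const tokens = [];
  while (tokenRe.lastIndex < source.length) {
    const start = tokenRe.lastIndex; // saved because a failed exec() resets lastIndex to 0
    const match = tokenRe.exec(source);
    if (match === null) {
      throw new SyntaxError(`No valid token starting at index ${start}`);
    }
    tokens.push(match[1]);
  }
  return tokens;
}

const tokens = tokenize("12 + 3*4");
console.log(tokens); // ["12", "+", "3", "*", "4"]
```

With the g flag instead of y, an input such as "12 ? 3" would simply skip the "?"; with y the helper throws a SyntaxError.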

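The u flag and the unicode property (ES6)

The u flag enables Unicode mode, in which characters above \uFFFF are handled correctly as single characters (\uhhhh matches the Unicode character with that hexadecimal code)
The unicode property indicates whether the u flag is set

/^\uD83D/.test('\uD83D\uDC2A') // true: "\uD83D\uDC2A" is one character, but without u it is treated as two code units
/^\uD83D/u.test('\uD83D\uDC2A') // false

let  r = /hello/u;
r.unicode; // true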
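
The u flag also changes how . counts characters. A brief sketch using the same camel character (U+1F42A) as above:

```javascript
const camel = "\u{1F42A}"; // 🐪, stored as the surrogate pair \uD83D\uDC2A

const lengthInCodeUnits = camel.length;        // 2
const dotWithoutU = /^.$/.test(camel);         // false: "." sees two code units
const dotWithU = /^.$/u.test(camel);           // true: "." sees one code point
const escapeWithU = /\u{1F42A}/u.test(camel);  // true: \u{...} needs the u flag

console.log(lengthInCodeUnits, dotWithoutU, dotWithU, escapeWithU); // 2 false true true
```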

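The s flag and the dotAll property (ES2018)

Before ES2018, . matches any character except line terminators
ES2018 adds the s flag, which lets . match any single character; this is called dotAll mode.

/foo.bar/.test("foo\nbar"); // false
// ES2018
/foo.bar/s.test("foo\nbar"); // true
/foo.bar/s.dotAll; // true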

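Gold: Regex in Depth

Named capture groups

1. Group matching

// The first item of the array returned by exec() is the matched text; from the second item on, each item is the text matched by one capturing group
let regex = /(\d{4})-(\d{2})-(\d{2})/;
regex.exec("1999-12-31"); // ["1999-12-31", "1999", "12", "31", index: 0, groups: undefined]

The meaning of each group is not obvious, and groups can only be referenced by number (\n)

2. Named capture groups (ES2018)

Each group can be given a name, which makes the code easier to read and the groups easier to reference.

Syntax: (?<name>x)

let regex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
regex.exec("1999-12-31");
// ["1999-12-31", "1999", "12", "31", index: 0, groups: {day: "31",month: "12",year: "1999"}]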
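
Named groups can also be referenced inside the pattern itself, with \k<name> in place of the numeric \n. A minimal sketch that looks for a doubled word (the sample sentence is made up):

```javascript
// \k<word> must match the same text that the group named "word" captured.
const doubledWord = /\b(?<word>\w+)\s+\k<word>\b/;
const hit = doubledWord.exec("this is is a test");

console.log(hit[0]);          // "is is"
console.log(hit.groups.word); // "is"
console.log(hit.index);       // 5
```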

3. Destructuring assignment

The result returned by a match can be destructured directly

let {
  groups: { one, two },
} = /^(?<one>.*):(?<two>.*)$/u.exec("foo:bar");
one; // foo
two; // bar

4. Replacement

In a replacement string, use $<name> to refer to a named group

let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;

"2015-01-02".replace(re, "$<day>/$<month>/$<year>");
// '02/01/2015'

(New) string methods

String.prototype.matchAll(regexp) (ES2020)

matchAll() returns every match at once, including capture groups. The return value is an iterator
The regular expression must have the g flag; otherwise a TypeError is thrown

Before matchAll existed, getting information on every match meant calling regexp.exec() in a loop. With matchAll, the while loop around exec is no longer needed

let regexp = /t(e)(st(\d?))/g;
let str = "test1test2";

// Using match
str.match(regexp); // ['test1', 'test2']

// Using exec
regexp.exec(str); //  ["test1", "e", "st1", "1", index: 0 ]

// Using matchAll, which gives better access to the capture groups
[...str.matchAll(regexp)]; // [Array(4), Array(4)]
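
Because matchAll() returns an iterator, it is often consumed with for...of rather than spread into an array. A sketch using the same pattern and input as above; the summary array is just for illustration:

```javascript
const regexp = /t(e)(st(\d?))/g;
const str = "test1test2";

const summary = [];
for (const match of str.matchAll(regexp)) {
  // Each item has the same shape as an exec() result: captures plus index.
  summary.push({ text: match[0], digit: match[3], index: match.index });
}

console.log(summary);
// [ { text: "test1", digit: "1", index: 0 }, { text: "test2", digit: "2", index: 5 } ]

// matchAll() works on an internal copy, so the original regex keeps lastIndex at 0.
console.log(regexp.lastIndex); // 0
```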

String.prototype.replace(regexp|substr, newSubStr|function)

When the first argument is a regular expression and the second is a function:

str.replace(regexp, function)
The function receives the arguments shown below and returns the string that replaces each match of regexp

let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
"2015-01-02".replace(
  re,
  (
    matched, // the full match
    capture1, // capture group 1 (the positions must line up)
    capture2, // capture group 2
    capture3, // capture group 3
    index, // index of the match
    input, // the original string
    groups // named groups
  ) => {
    console.log(matched, capture1, capture2, capture3, index, input, groups);
    let { day, month, year } = groups;
    return `${day}/${month}/${year}`;
  }
); // "02/01/2015"

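String.prototype.replaceAll(regexp|substr, newSubstr|function) (ES2021)

Replaces every match in one call
When the first argument is a regular expression (which must have the g flag) and the second is a function, the function receives the same arguments as with replace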
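
A short sketch contrasting replace() and replaceAll(), including the g flag requirement. The inputs are made up, and a runtime with ES2021 support is assumed:

```javascript
const dotted = "a.b.c";

const firstOnly = dotted.replace(".", "/");      // "a/b.c": a string pattern replaces one occurrence
const allLiteral = dotted.replaceAll(".", "/");  // "a/b/c": the string is taken literally, not as a regex
const allRegex = dotted.replaceAll(/\./g, "/");  // "a/b/c"

// A replacer function works the same way as with replace().
const doubled = "1-2-3".replaceAll(/\d/g, (digit) => String(Number(digit) * 2)); // "2-4-6"

// A regex without the g flag is rejected.
let errorName = null;
try {
  dotted.replaceAll(/\./, "/");
} catch (error) {
  errorName = error.name; // "TypeError"
}

console.log(firstOnly, allLiteral, allRegex, doubled, errorName);
```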
