JavaScript Reference Types: Key Points About the RegExp Type

Date: 2019-12-07

The RegExp Type

ECMAScript supports regular expressions through the RegExp type. The syntax is as follows:

var expression = / pattern / flags;

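Every regular expression can have zero or more flags. The matching behavior of a regular expression supports the following three flags:

g: global mode; the pattern is applied to the whole string instead of stopping as soon as the first match is found;
i: case-insensitive mode; the case of the pattern and of the string is ignored when determining matches;
m: multiline mode; after reaching the end of one line of text, matching continues on the next line to look for items that match the pattern.

For example: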

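var pattern1 = /at/g; // matches all instances of "at" in a string
var pattern2 = /[bc]at/i; // matches the first instance of "bat" or "cat", case-insensitive
var pattern3 = /.at/gi; // matches all three-character combinations ending with "at", case-insensitive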
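
To make the effect of each flag concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration and do not come from the original text.

```javascript
// Hypothetical three-line sample string used only for this illustration.
const sample = "Cat\ncat\nbat";

// g: collect every "at" instead of stopping at the first one.
const everyAt = sample.match(/at/g);        // ["at", "at", "at"]

// i: ignore case, so the capitalised "Cat" on the first line matches.
const firstCat = sample.match(/cat/i)[0];   // "Cat"

// m: ^ matches at the start of every line, not only at the start of the string.
const withoutM = sample.match(/^cat/gi);    // ["Cat"]
const withM = sample.match(/^cat/gim);      // ["Cat", "cat"]

console.log(everyAt, firstCat, withoutM, withM);
```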

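As with regular expressions in other languages, every metacharacter used in a pattern must be escaped when it is meant to be matched literally. The metacharacters are:

( [ { \ ^ $ | ) ? * + . ] }

For example: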

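var pattern1 = /\[bc\]at/i; // matches the first instance of "[bc]at", case-insensitive
var pattern2 = /\.at/gi; // matches all instances of ".at", case-insensitive

Regular expressions can also be created with the RegExp constructor, which accepts two arguments: a string pattern to match and an optional string of flags. For example: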
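
The difference between an escaped and an unescaped metacharacter can be checked directly. The following is a small sketch with illustrative variable names; none of the patterns carries the g flag, so each test() call is independent of the others.

```javascript
// Unescaped: [bc] is a character class and . matches any character.
const charClass = /[bc]at/i;
const anyChar = /.at/i;

// Escaped: the brackets and the dot are matched literally.
const literalBrackets = /\[bc\]at/i;
const literalDot = /\.at/i;

const results = {
    classMatchesCat: charClass.test("Cat"),                 // true
    literalMatchesCat: literalBrackets.test("Cat"),         // false
    literalMatchesBrackets: literalBrackets.test("[bc]at"), // true
    anyCharMatchesCat: anyChar.test("cat"),                 // true
    literalDotMatchesCat: literalDot.test("cat"),           // false
    literalDotMatchesDotAt: literalDot.test("look.at")      // true
};
console.log(results);
```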

var pattern1 = new RegExp("[bc]at","i");

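Because the pattern argument of the RegExp constructor is a string, some characters have to be double-escaped. Every metacharacter must be double-escaped, and so must sequences that are already escaped, such as \n, which is written \\n. A literal backslash, normally written \\ in a string, becomes \\\\ in a regular-expression string. The following shows some literal patterns and the equivalent strings: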

/\[bc\]at/          => "\\[bc\\]at"
/\.at/              => "\\.at"
/name\/age/         => "name\\/age"
/\d.\d{1,2}/        => "\\d.\\d{1,2}"
/\w\\hello\\123/    => "\\w\\\\hello\\\\123"
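
One way to check a few of these pairs is to compare the source property of the literal with that of the constructor form. This is only a sketch; the array and variable names are illustrative.

```javascript
// Each pair holds a literal and the double-escaped string for the same pattern.
const pairs = [
    [/\[bc\]at/, "\\[bc\\]at"],
    [/\.at/, "\\.at"],
    [/\d.\d{1,2}/, "\\d.\\d{1,2}"],
    [/\w\\hello\\123/, "\\w\\\\hello\\\\123"]
];

// The source property should be identical for both ways of writing the pattern.
const sameSource = pairs.map(function (pair) {
    return new RegExp(pair[1]).source === pair[0].source;
});
console.log(sameSource); // [true, true, true, true]

// Forgetting the second backslash changes the pattern: "\." is just "." in a
// string, so this expression matches any character followed by "at".
const underEscaped = new RegExp("\.at");
const underEscapedMatchesCat = underEscaped.test("cat");
console.log(underEscapedMatchesCat); // true
```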

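In ECMAScript 3, a regular expression literal always shares the same RegExp instance, whereas every RegExp created with the constructor is a new instance. (ECMAScript 5 changed this: a literal now creates a new instance each time it is evaluated.) For example: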

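var re = null,
    i;

// Literal: in ECMAScript 3 the same RegExp instance is reused on every iteration
for (i = 0; i < 10; i++) {
    re = /cat/g;
    re.test("catastrophe");
}

// Constructor: a new RegExp instance is created on every iteration
for (i = 0; i < 10; i++) {
    re = new RegExp("cat", "g");
    re.test("catastrophe");
}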

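Under ECMAScript 3 semantics, the first loop only ever uses one RegExp instance for /cat/g, even though the literal appears in the loop body. Instance properties such as lastIndex are not reset, so test() fails on every other iteration: the first call finds "cat", and the second call starts searching at index 3 (the end of the previous match) and finds nothing. Because that search runs to the end of the string, the next call to test() starts from the beginning again. The second loop uses the RegExp constructor to create the regular expression on each iteration. Because every iteration creates a new RegExp instance, every call to test() returns true. In engines that follow ECMAScript 5 or later, the first loop behaves like the second.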
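
The alternating result can still be reproduced in a current engine by deliberately reusing one instance that carries the g flag. The sketch below assumes nothing beyond standard RegExp behavior; the variable names are illustrative.

```javascript
const shared = /cat/g; // one instance, created once and reused on purpose
const sharedResults = [];
const freshResults = [];

for (let i = 0; i < 4; i++) {
    // lastIndex of the shared instance carries over between calls.
    sharedResults.push(shared.test("catastrophe"));
    // A new instance always starts with lastIndex at 0.
    freshResults.push(new RegExp("cat", "g").test("catastrophe"));
}

console.log(sharedResults); // [true, false, true, false]
console.log(freshResults);  // [true, true, true, true]
```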

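RegExp Instance Properties

global: a Boolean indicating whether the g flag is set;
ignoreCase: a Boolean indicating whether the i flag is set;
multiline: a Boolean indicating whether the m flag is set;
lastIndex: an integer giving the character position at which the search for the next match starts, counting from 0;
source: the string representation of the regular expression, always returned in literal form (without the delimiting slashes or the flags).

For example: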

var pattern = new RegExp("\\[bc\\]at","i");
document.write(pattern.global); //false
document.write(pattern.ignoreCase); //true
document.write(pattern.multiline); //false
document.write(pattern.lastIndex); //0
document.write(pattern.source); //\[bc\]at

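Note the last line: the source property holds the normalized form of the pattern, that is, the string as it would be written in literal form, even though this pattern was created with the constructor.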
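
Unlike the flag properties, lastIndex is writable, so it can be set by hand to control where a global search begins. A minimal sketch, with an illustrative sample sentence:

```javascript
const sentence = "this is a Global setting not a global function";
const globalPattern = /global/gi;

// The flag properties reflect how the expression was written.
const flags = {
    global: globalPattern.global,         // true
    ignoreCase: globalPattern.ignoreCase, // true
    multiline: globalPattern.multiline    // false
};

// Start the search at position 16, just past the first "Global" (10 to 15).
globalPattern.lastIndex = 16;
const found = globalPattern.exec(sentence);

console.log(flags);
console.log(found.index);             // 31
console.log(globalPattern.lastIndex); // 37
```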

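RegExp Instance Methods

There are two main methods: exec() and test().

exec() is designed specifically for capturing groups. It accepts a single string argument and returns an array containing information about the first match, or null if there is no match. The returned array has two additional properties: index, the position of the match in the string, and input, the string the expression was run against. The first item in the array is the string that matches the entire pattern; each remaining item is the string matched by the corresponding capturing group in the pattern. For example: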

var text = "mom and dad and baby";
var pattern = /mom( and dad( and baby)?)?/gi;

var matches = pattern.exec(text);
console.log(matches.index);
console.log(matches.input);
console.log(matches[0]);
console.log(matches[1]);
console.log(matches[2]);

/*
[Log] 0 (repetition.html, line 33)
[Log] mom and dad and baby (repetition.html, line 34)
[Log] mom and dad and baby (repetition.html, line 35)
[Log]  and dad and baby (repetition.html, line 36)
[Log]  and baby (repetition.html, line 37)
*/

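Because the entire string matches the pattern, the index property of the returned matches array is 0. The first item in the array is the entire matched string, the second item holds the content matched by the first capturing group, and the third item holds the content matched by the second capturing group.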
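
When an optional capturing group does not take part in the match, its slot in the array is undefined, and when nothing matches at all exec() returns null. The inputs below are made-up variations on the example above.

```javascript
const familyPattern = /mom( and dad( and baby)?)?/i;

// Only the outer optional group participates here.
const partial = familyPattern.exec("mom and dad");
console.log(partial[0]); // "mom and dad"
console.log(partial[1]); // " and dad"
console.log(partial[2]); // undefined (the inner group did not match)

// No match at all: exec() returns null, so check before indexing.
const none = familyPattern.exec("grandma");
console.log(none);       // null
```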

First example, a global pattern:

var text = "this is a Global setting not a global function";
var pattern = /global/gi;

var matches = pattern.exec(text);
console.log(matches); //["Global"]
console.log(matches.index); //10
console.log(matches.input); //this is a Global setting not a global function
console.log(matches[1]); //undefined: there are no capturing groups here
console.log(matches[0]); //Global
console.log(pattern.lastIndex); //16

matches = pattern.exec(text);
console.log(matches); //["global"] calling exec() again continues with the next match
console.log(matches.index); //31
console.log(pattern.lastIndex); //37

Second example, a non-global pattern:

var text = "this is a Global setting not a global function";
var pattern = /global/i;

var matches = pattern.exec(text);
console.log(matches); //["Global"]
console.log(matches.index); //10
console.log(pattern.lastIndex); //0

matches = pattern.exec(text);
console.log(matches); //["Global"] still "Global": a non-global pattern searches from the start again
console.log(matches.index); //10
console.log(pattern.lastIndex); //0

In global mode, each call to exec() returns the next match in the string; in non-global mode, each call to exec() returns the first match.
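
A common way to use this behavior is to call exec() in a loop until it returns null. The following sketch collects every match with its position; the found array and its field names are illustrative.

```javascript
const text = "this is a Global setting not a global function";
const pattern = /global/gi;
const found = [];
let match;

// With the g flag, each call resumes at lastIndex; null ends the loop
// and resets lastIndex to 0.
while ((match = pattern.exec(text)) !== null) {
    found.push({ value: match[0], index: match.index });
}

console.log(found);             // [{value: "Global", index: 10}, {value: "global", index: 31}]
console.log(pattern.lastIndex); // 0
```

Without the g flag this loop would never end, because every call would return the first match again.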

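test() accepts a string argument and returns true if the pattern matches the argument and false otherwise. It is convenient when you only need to know whether the target string matches a pattern and do not need the matched text. For example: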

var text = "testing!";
var pattern = /est/gi;

if (pattern.test(text)){
    document.write("matched")
}else{
    document.write("not matched")
}
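
test() is often wrapped in a small validation helper. The function below, isHexColor, is a hypothetical example and not part of the original text; it uses an anchored pattern without the g flag so that repeated calls do not depend on lastIndex.

```javascript
// Hypothetical helper: true for strings such as "#FFaa00".
function isHexColor(value) {
    // ^ and $ force the whole string to match; no g flag, so no state is kept.
    return /^#[0-9a-f]{6}$/i.test(value);
}

console.log(isHexColor("#FFaa00")); // true
console.log(isHexColor("#ffaa0"));  // false (only five hex digits)
console.log(isHexColor("ffaa00"));  // false (missing the leading #)
```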

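RegExp Constructor Properties

These properties apply to all regular expressions in scope and change according to the last regular-expression operation performed. Each one has a verbose name and a short name. The two most commonly used are:

leftContext ($`): the text in the input string before lastMatch;
rightContext ($'): the text in the input string after lastMatch;

Several other properties are poorly supported in Opera and IE:

input ($_): the last string matched against;
lastMatch ($&): the last matched text;
lastParen ($+): the last matched capturing group;
multiline ($*): a Boolean indicating whether all expressions use multiline mode;

For example: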

var text = "hello there";
var pattern = / /gi;
if(pattern.exec(text)){
    document.write("targeted" + "<br/>");
    document.write(RegExp.leftContext);
    document.write(RegExp.rightContext);
}else{
    document.write("missed" + "<br/>");
}

Another example:

var text = "hello there";
var pattern = / /gi;
if(pattern.exec(text)){
    document.write("targeted" + "<br/>");
    document.write(RegExp.input); //hello there
    document.write(RegExp.multiline); //false
}else{
    document.write("missed" + "<br/>");
}

Because most of the short property names are not valid identifiers, they must be accessed with bracket notation, for example RegExp["$'"].
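
Because these properties are shared global state, the same information can also be derived from the match object itself. The sketch below compares both approaches. It assumes an engine that still exposes the constructor properties (Node.js does at the time of writing), and it reads them immediately after exec(), before any other regular expression can run.

```javascript
const text = "hello there";
const match = / /.exec(text);

// Read the constructor properties right away, by long and by short name.
const leftLong = RegExp.leftContext;
const leftShort = RegExp["$`"];
const rightLong = RegExp.rightContext;
const rightShort = RegExp["$'"];

// Derived from the match object, independent of any global state.
const before = text.slice(0, match.index);               // "hello"
const after = text.slice(match.index + match[0].length); // "there"

console.log(before, after);
console.log(leftLong === before, leftShort === before);
console.log(rightLong === after, rightShort === after);
```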

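In addition, there are nine constructor properties that store capturing groups, accessed as RegExp.$1, RegExp.$2, and so on up to RegExp.$9. They are filled in automatically when exec() or test() is called. For example: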

var text = "this has been a short summer";
var pattern = /(..)or(.)/g;

if (pattern.test(text)){
    document.write(RegExp.$1);
    document.write(RegExp.$2);
}
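
The values held in RegExp.$1 and RegExp.$2 are the same ones that exec() returns in its array. A short sketch comparing the two, with illustrative variable names and the constructor properties read immediately after the match:

```javascript
const text = "this has been a short summer";
const match = /(..)or(.)/.exec(text);

// Read the constructor properties before any other regular expression runs.
const viaConstructor = [RegExp.$1, RegExp.$2];
// The same captures, taken from the array returned by exec().
const viaMatch = [match[1], match[2]];

console.log(viaConstructor); // ["sh", "t"]
console.log(viaMatch);       // ["sh", "t"]
```

Reading the captures from the returned array avoids depending on state that any later match overwrites.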

Pattern Limitations

For details, see the limitations of patterns.