Java Regular Expressions

Date: 2019-11-17


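1. Java regular expressions: java.util.regex

Matcher (matcher class): the object that actually performs the search

Pattern (pattern class): the object that expresses and describes the pattern to search for

Usage 1: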

  Pattern p = Pattern.compile("a*b");
  Matcher m = p.matcher("aaaaab");
  boolean b = m.matches();

Usage 2:

 
 boolean b = Pattern.matches("a*b", "aaaaab");
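
The two usages differ mainly in when the pattern is compiled. The sketch below contrasts them; the class and method names (CompileOnceDemo, matchesPrecompiled, matchesOneShot) are illustrative and not from the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompileOnceDemo {
    // Usage 1: compile once and reuse. Pattern instances are immutable.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    static boolean matchesPrecompiled(String input) {
        Matcher m = A_STAR_B.matcher(input); // one Matcher per input
        return m.matches();                  // true only if the whole input matches
    }

    // Usage 2: convenient for a one-off check, but compiles the pattern on each call.
    static boolean matchesOneShot(String input) {
        return Pattern.matches("a*b", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesPrecompiled("aaaaab"));  // true
        System.out.println(matchesOneShot("aaaaab"));      // true
        System.out.println(matchesPrecompiled("aaaaabc")); // false: trailing c is not covered
    }
}
```

When the same expression is applied to many inputs, the precompiled form is usually the better fit.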

Regular-expression constructs:

  x    The character x
  \\   The backslash character
  \t   The tab character ('\u0009')
  \n   The newline (line feed) character ('\u000A')
  \r   The carriage-return character ('\u000D')
  \f   The form-feed character ('\u000C')

  [abc]          a, b, or c (simple class)
  [^abc]         Any character except a, b, or c (negation)
  [a-zA-Z]       a through z or A through Z, inclusive (range)
  [a-d[m-p]]     a through d, or m through p: [a-dm-p] (union)
  [a-e&&[def]]   d or e (intersection): a character that is one of d, e, f and also within a-e
  [a-z&&[^bc]]   a through z, except for b and c: [ad-z] (subtraction)
  [a-z&&[^m-p]]  a through z, and not m through p: [a-lq-z] (subtraction)
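
The union, intersection, and subtraction classes above can be checked directly against single characters. This is a small sketch; CharClassDemo and matchesClass are illustrative names.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True if the single-character string s belongs to the character class.
    static boolean matchesClass(String charClass, String s) {
        return Pattern.matches(charClass, s);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("[a-d[m-p]]", "n"));    // true:  union
        System.out.println(matchesClass("[a-e&&[def]]", "e"));  // true:  in both sets
        System.out.println(matchesClass("[a-e&&[def]]", "f"));  // false: f is outside a-e
        System.out.println(matchesClass("[a-z&&[^bc]]", "b"));  // false: b is subtracted
        System.out.println(matchesClass("[a-z&&[^m-p]]", "q")); // true:  q is outside m-p
    }
}
```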

.   Any character (may or may not match line terminators, depending on the DOTALL flag)
\d  A digit: [0-9]
\D  A non-digit: [^0-9]
\s  A whitespace character: [ \t\n\x0B\f\r]
\S  A non-whitespace character: [^\s]
\w  A word character: [a-zA-Z_0-9]
\W  A non-word character: [^\w]
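
A brief sketch of the predefined classes, including the "may or may not match line terminators" behaviour of the dot, which depends on Pattern.DOTALL. The names PredefinedClassDemo, whole, and dotMatchesNewline are illustrative.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // True if the entire input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Checks whether "." matches a newline, with or without the DOTALL flag.
    static boolean dotMatchesNewline(boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile(".", flags).matcher("\n").matches();
    }

    public static void main(String[] args) {
        System.out.println(whole("\\d+", "2019"));     // true
        System.out.println(whole("\\w+", "regex_1"));  // true: underscore is a word character
        System.out.println(whole("\\w+", "regex-1"));  // false: hyphen is not
        System.out.println(whole("\\s", "\t"));        // true
        System.out.println(dotMatchesNewline(false));  // false
        System.out.println(dotMatchesNewline(true));   // true
    }
}
```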

 Boundary matchers
 ^       The beginning of a line, e.g. ^abc matches abc at the start
 $       The end of a line, e.g. abc$ matches abc at the end

 Greedy quantifiers (match as much as possible)
 X?      X, once or not at all
 X*      X, zero or more times
 X+      X, one or more times
 X{n}    X, exactly n times
 X{n,}   X, at least n times
 X{n,m}  X, at least n but not more than m times

Reluctant quantifiers (match as little as possible)
X??	X, once or not at all
X*?	X, zero or more times
X+?	X, one or more times
X{n}?	X, exactly n times
X{n,}?	X, at least n times
X{n,m}?	X, at least n but not more than m times

Possessive quantifiers (match as much as possible, without backtracking)
X?+	X, once or not at all
X*+	X, zero or more times
X++	X, one or more times
X{n}+	X, exactly n times
X{n,}+	X, at least n times
X{n,m}+	X, at least n but not more than m times
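
The difference between the three quantifier families is easiest to see on one input. The sketch below applies a greedy, a reluctant, and a possessive form of .*foo to the same string; QuantifierDemo and firstMatch are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backtracks until foo fits.
        System.out.println(firstMatch(".*foo", input));  // xfooxxxxxxfoo
        // Reluctant: .*? takes as little as possible.
        System.out.println(firstMatch(".*?foo", input)); // xfoo
        // Possessive: .*+ takes everything and never gives it back, so foo cannot match.
        System.out.println(firstMatch(".*+foo", input)); // null
    }
}
```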

Special constructs (named-capturing and non-capturing)
(?<name>X)	X, as a named-capturing group
(?:X)	X, as a non-capturing group
(?idmsuxU-idmsuxU) 	Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X)  	X, as a non-capturing group with the given flags i d m s u x on - off
(?=X)	X, via zero-width positive lookahead
(?!X)	X, via zero-width negative lookahead
(?<=X)	X, via zero-width positive lookbehind
(?<!X)	X, via zero-width negative lookbehind
(?>X)	X, as an independent, non-capturing group

\n	Whatever the nth capturing group matched
\k<name>	Whatever the named-capturing group "name" matched
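
A few of the constructs above in action: a named group, numbered and named back references, a positive lookahead, an inline flag, and a non-capturing group. This is a sketch; the class name, helper names, and sample inputs are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecialConstructDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Reads a named-capturing group by name.
    static String yearOf(String date) {
        Matcher m = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})").matcher(date);
        return m.matches() ? m.group("year") : null;
    }

    // Number of capturing groups declared by the regex (group 0 excluded).
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(yearOf("2019-11"));                     // 2019
        System.out.println(first("(\\w)\\1", "hello"));            // ll (back reference \1)
        System.out.println(first("(?<c>\\w)\\k<c>", "hello"));     // ll (named back reference)
        System.out.println(first("\\d+(?=px)", "width: 120px"));   // 120 (px is not consumed)
        System.out.println(first("(?i)java", "I like JAVA"));      // JAVA (case-insensitive flag)
        System.out.println(capturingGroups("(?:ab)+(c)"));         // 1 ((?:ab) does not capture)
    }
}
```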

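As required by the Java Language Specification, backslashes in string literals in Java source code are interpreted as Unicode escapes or other character escapes. Backslashes must therefore be doubled in string literals that represent regular expressions, to protect them from interpretation by the Java compiler.

Matching the string \string
At first glance the regular expression would be \string.
In a regular expression, the backslash introduces escaped constructs and also quotes characters that would otherwise be interpreted as unescaped constructs.
So the expression \\ matches a single backslash and \{ matches a left brace, and the regular expression should be \\string.
In Java source, the regular expression \\string must be written as the string literal "\\\\string". The literal "\\string" also compiles, but it yields the regular expression \string, which means \s (a whitespace character) followed by tring.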

   String num  = "\\string";
   System.out.println(num);
   Pattern pn = Pattern.compile("\\\\string");
   System.out.println(pn.toString());
   System.out.println(pn.matcher(num).matches());

Output:

   \string
   \\string
   true
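
When the text to match is a literal, Pattern.quote can do the escaping instead of hand-written backslashes. The sketch below also shows what the under-escaped literal "\\string" actually matches. EscapeDemo and its method names are illustrative.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Matches the input against a literal, letting Pattern.quote escape it.
    static boolean matchesLiteral(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    // Matches the input against a hand-written regex.
    static boolean matchesRegex(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String text = "\\string"; // the seven characters \string

        System.out.println(matchesRegex("\\\\string", text));  // true: regex \\string
        System.out.println(matchesLiteral(text, text));        // true: quoted literal
        // The literal "\\string" is the regex \string: \s followed by tring.
        System.out.println(matchesRegex("\\string", text));     // false
        System.out.println(matchesRegex("\\string", " tring")); // true
    }
}
```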

A second example, using quantifiers:

   String num  = "stresss";
   System.out.println(num);
   Pattern pn = Pattern.compile("s*tres{2,4}");
   System.out.println(pn.toString());
   System.out.println(pn.matcher(num).matches());

Output:

   stresss
   s*tres{2,4}
   true

Groups

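A group is a part of a regular expression enclosed in parentheses that can be referenced later in the expression. Group 0 denotes the whole expression, group 1 the first parenthesized group, and so on. So A(B(C))D contains three groups: group 0 is ABCD, group 1 is BC, and group 2 is C.

You can use the following Matcher methods to work with groups:

public int groupCount() returns the number of capturing groups in the matcher's pattern, not counting group 0.

public String group() returns the string matched by group 0 (the whole match) in the previous match operation (for example, find()).

public String group(int i) returns the string matched by the given group in the previous match operation. If the match succeeded but the group did not take part in it, it returns null.

public int start(int group) returns the start index of the given group in the previous match.

public int end(int group) returns the end index of the given group in the previous match, that is, the index of its last character plus one.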
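
The group methods can be exercised on the A(B(C))D example from the text. The sketch below prints each group with its start and end index, and also shows a group that does not take part in a match. GroupDemo and describeGroups are illustrative names, and the sample inputs are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Formats every group of the first match as index=text[start,end).
    static String describeGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        // groupCount() excludes group 0, so the loop runs to it inclusively.
        for (int i = 0; i <= m.groupCount(); i++) {
            sb.append(i).append('=').append(m.group(i))
              .append('[').append(m.start(i)).append(',').append(m.end(i)).append(") ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // prints: 0=ABCD[2,6) 1=BC[3,5) 2=C[4,5)
        System.out.println(describeGroups("A(B(C))D", "xxABCDxx"));
        // Group 1 does not take part: group(1) is null, start(1) and end(1) are -1.
        // prints: 0=b[0,1) 1=null[-1,-1) 2=b[0,1)
        System.out.println(describeGroups("(a)|(b)", "b"));
    }
}
```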

public Matcher appendReplacement(StringBuffer sb,String replacement)

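Implements a non-terminal append-and-replace step.

This method performs the following actions:

1. It reads characters from the input sequence, starting at the append position, and appends them to the given string buffer. It stops after reading the last character preceding the previous match, that is, the character at index start() - 1.

2. It appends the given replacement string to the string buffer.

3. It sets the append position of this matcher to the index of the last character matched, plus one, that is, to end().

The replacement string may contain references to subsequences captured during the previous match: each occurrence of $g is replaced by the result of evaluating group(g). The first number after the $ is always treated as part of the group reference. Subsequent numbers are incorporated into g if they would form a legal group reference. Only the numerals '0' through '9' are considered as potential components of the group reference. For example, if the second group matched the string "foo", then passing the replacement string "$2bar" causes "foobar" to be appended to the string buffer. A dollar sign ($) may be included as a literal in the replacement string by preceding it with a backslash (\$). Note that backslashes (\) and dollar signs ($) in the replacement string may produce a different result than if the string were treated as a literal replacement. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.

This method is intended to be used in a loop together with the appendTail and find methods.

For example, the following code writes one dog two dogs in the yard to the standard output stream:

 Pattern p = Pattern.compile("cat");
 Matcher m = p.matcher("one cat two cats in the yard");
 StringBuffer sb = new StringBuffer();
 while (m.find()) {
     m.appendReplacement(sb, "dog");
 }
 m.appendTail(sb);                  // append the text after the last match
 System.out.println(sb.toString()); // one dog two dogs in the yard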
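
The $g group references and the literal-dollar caveat can be sketched as follows. The first method swaps the two sides of key=value pairs using $2 and $1; the second uses Matcher.quoteReplacement so that a replacement containing $ is taken literally. ReplacementDemo, swapPairs, and replaceLiteral are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementDemo {
    // Rewrites every key=value pair as value=key via group references.
    static String swapPairs(String input) {
        Matcher m = Pattern.compile("(\\w+)=(\\w+)").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "$2=$1"); // $2 and $1 refer to the captured groups
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    // Replaces a literal target with a literal replacement, even if it contains $ or \.
    static String replaceLiteral(String input, String target, String replacement) {
        Matcher m = Pattern.compile(Pattern.quote(target)).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(swapPairs("a=1, b=2;"));                // 1=a, 2=b;
        System.out.println(replaceLiteral("price: X", "X", "$5")); // price: $5
    }
}
```

Without quoteReplacement, the replacement "$5" would be read as a reference to group 5, which this pattern does not have.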

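 public boolean       matches()                    Attempts to match the entire input against the pattern
 public StringBuffer  appendTail(StringBuffer sb)  Appends the remaining input after the last match to sb
 public int           start()                      Returns the start index of the previous match
 public int           end()                        Returns the index after the last character of the previous match
 public boolean       find()                       Finds the next matching subsequence, starting after the previous match
 public String        group()                      Returns the input subsequence matched by the previous match
 public boolean       lookingAt()                  Matches from the start of the input, without requiring the whole input to match
 public Matcher       reset()                      Resets the matcher, discarding state left by earlier operations
 public Matcher       reset(CharSequence input)    Resets the matcher with a new input sequence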
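
The methods in this summary behave differently on the same input, which the following sketch illustrates with the pattern \d+. MatcherMethodsDemo and findAll are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Collects every match of regex in input using find() and group().
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\d+").matcher("123abc456");

        System.out.println(m.matches());   // false: the whole input is not digits
        System.out.println(m.lookingAt()); // true: the input starts with digits

        m.reset(); // discard the state left by lookingAt() before searching
        while (m.find()) {
            // prints 123 [0,3) then 456 [6,9)
            System.out.println(m.group() + " [" + m.start() + "," + m.end() + ")");
        }

        m.reset("789"); // reuse the same matcher on a new input
        System.out.println(m.matches());   // true

        System.out.println(findAll("\\d+", "123abc456")); // [123, 456]
    }
}
```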
