String.replaceAll: the hyphen (-) is a regex special character

Date: 2019-11-13


Requirement: replace every one of the following characters with a space:

!#$%&()[]*+-@?{|}~¢£¤¥¦§©ª«¬®¯°±²³µ¶¹º»¼«½¾¿×~‘’`_\\^þÞ¡¨!<>\'*˝´\"ſß÷ΓΔΘΛΞΠΣΦΨΩγδθΛΦЂЃЉЊЋЍЏБДЖЗИЙЛФЦШЧЩЪЫЬЭЮЯ‐–—―‘’‚“”„†‡…•‰‹›‽₂₁₀ⁿ⁾⁽⁼⁻⁺⁹⁸⁷⁶⁵⁴⁰⁄₃₄₅₆₇₈₉₊₋₌₎₍€℅ℓ№℗⅟⅞⅝⅜⅛⅚⅙⅘⅗⅖⅕⅔⅓℮Ω™℠←↑→↓↔↕↖↗↘↙∂∆∏∑−∙√ﬄﬃﬂﬁﬀ◊≥≤≠≈∫∞ѲҐΏГПѝѢ

The natural choice is String's replaceAll. The JDK defines the method as follows:

    /**
     * Replaces each substring of this string that matches the given <a
     * href="../util/regex/Pattern.html#sum">regular expression</a> with the
     * given replacement.
     *
     * <p> An invocation of this method of the form
     * <i>str</i>{@code .replaceAll(}<i>regex</i>{@code ,} <i>repl</i>{@code )}
     * yields exactly the same result as the expression
     *
     * <blockquote>
     * <code>
     * {@link java.util.regex.Pattern}.{@link
     * java.util.regex.Pattern#compile compile}(<i>regex</i>).{@link
     * java.util.regex.Pattern#matcher(java.lang.CharSequence) matcher}(<i>str</i>).{@link
     * java.util.regex.Matcher#replaceAll replaceAll}(<i>repl</i>)
     * </code>
     * </blockquote>
     *
     *<p>
     * Note that backslashes ({@code \}) and dollar signs ({@code $}) in the
     * replacement string may cause the results to be different than if it were
     * being treated as a literal replacement string; see
     * {@link java.util.regex.Matcher#replaceAll Matcher.replaceAll}.
     * Use {@link java.util.regex.Matcher#quoteReplacement} to suppress the special
     * meaning of these characters, if desired.
     *
     * @param   regex
     *          the regular expression to which this string is to be matched
     * @param   replacement
     *          the string to be substituted for each match
     *
     * @return  The resulting {@code String}
     *
     * @throws  PatternSyntaxException
     *          if the regular expression's syntax is invalid
     *
     * @see java.util.regex.Pattern
     *
     * @since 1.4
     * @spec JSR-51
     */
    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }
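
As a rough illustration of the equivalence the Javadoc describes, the sketch below calls both forms on the same input and also shows Matcher.quoteReplacement for a replacement that contains a dollar sign. The class name ReplaceAllEquivalence and its helper methods are made up for this example and are not part of the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only String, Pattern and Matcher come from the JDK.
public class ReplaceAllEquivalence {

    // The convenience form on String.
    static String viaString(String input, String regex, String replacement) {
        return input.replaceAll(regex, replacement);
    }

    // The expanded form that String.replaceAll delegates to.
    static String viaPattern(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "a1b22c";

        // Both calls replace each run of digits with '#'.
        System.out.println(viaString(input, "[0-9]+", "#"));   // prints "a#b#c"
        System.out.println(viaPattern(input, "[0-9]+", "#"));  // prints "a#b#c"

        // '$' is special in the replacement string, so quote it
        // when it should be inserted literally.
        String literalDollar = Matcher.quoteReplacement("$");
        System.out.println(viaString(input, "[0-9]+", literalDollar)); // prints "a$b$c"
    }
}
```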

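The first parameter is a regular expression, so the characters to be replaced go inside [] and that character class is passed as the first argument. That is not the whole job, though: any of those characters that are regex special characters must also be escaped.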
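
A minimal sketch of that idea on a much smaller set of characters, assuming only $, (, ) and * need to be blanked out. The class name CharClassEscapeDemo and the method blankOut are invented for illustration.

```java
// Hypothetical demo class showing an escaped character class.
public class CharClassEscapeDemo {

    static String blankOut(String input) {
        // In Java source, "\\$" is the two-character regex \$ (a literal dollar sign).
        // Each metacharacter inside [] is escaped the same way.
        return input.replaceAll("[\\$\\(\\)\\*]", " ");
    }

    public static void main(String[] args) {
        System.out.println(blankOut("a$b(c)*d")); // prints "a b c  d"
    }
}
```

Note the doubled backslash: one level of escaping is consumed by the Java string literal, the other by the regex engine.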

The special characters are listed at the following link: link

Pull the special characters out and replace them in a separate call, as follows:

// First pass: regex metacharacters, each escaped
result = result.replaceAll("[\\$\\(\\)\\*\\+\\.\\[\\]\\?\\\\^\\{\\}\\|]", " ");
// Second pass: the remaining punctuation and symbols
result = result.replaceAll("[!#%&-@~¢£¤¥¦§©ª«¬\u00AD®¯°±²³µ¶¹º»¼«½¾¿×~‘’`_þÞ¡¨!<>'˝´\"ſß÷ΓΔΘΛΞΠΣΦΨΩγδθΛΦЂЃЉЊЋЍЏБДЖЗИЙЛФЦШЧЩЪЫЬЭЮЯ‐–—―‘’‚“”„†‡…•‰‹›‽₂₁₀ⁿ⁾⁽⁼⁻⁺⁹⁸⁷⁶⁵⁴⁰⁄₃₄₅₆₇₈₉₊₋₌₎₍€℅ℓ№℗⅟⅞⅝⅜⅛⅚⅙⅘⅗⅖⅕⅔⅓℮Ω™℠←↑→↓↔↕↖↗↘↙∂∆∏∑−∙√ﬄﬃﬂﬁﬀ◊≥≤≠≈∫∞ѲҐΏГПѝѢ]", " ");

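Testing this code showed that digits were being replaced as well, which was puzzling. Bisecting the character class to find the offending part eventually narrowed it down to &-@. It turns out the hyphen is a special character too: inside [] it defines a range, so every character whose ASCII code lies between & (38) and @ (64), such as digits, parentheses, the asterisk, and the plus sign, matches the regular expression. Moving the hyphen into the first call and escaping it fixes the problem, as follows: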

// First pass: regex metacharacters, each escaped, now including the hyphen
result = result.replaceAll("[\\$\\(\\)\\*\\+\\.\\[\\]\\?\\\\^\\{\\}\\|\\-]", " ");
// Second pass: &-@ has become &@, so no range is formed any more
result = result.replaceAll("[!#%&@~¢£¤¥¦§©ª«¬\u00AD®¯°±²³µ¶¹º»¼«½¾¿×~‘’`_þÞ¡¨!<>'˝´\"ſß÷ΓΔΘΛΞΠΣΦΨΩγδθΛΦЂЃЉЊЋЍЏБДЖЗИЙЛФЦШЧЩЪЫЬЭЮЯ‐–—―‘’‚“”„†‡…•‰‹›‽₂₁₀ⁿ⁾⁽⁼⁻⁺⁹⁸⁷⁶⁵⁴⁰⁄₃₄₅₆₇₈₉₊₋₌₎₍€℅ℓ№℗⅟⅞⅝⅜⅛⅚⅙⅘⅗⅖⅕⅔⅓℮Ω™℠←↑→↓↔↕↖↗↘↙∂∆∏∑−∙√ﬄﬃﬂﬁﬀ◊≥≤≠≈∫∞ѲҐΏГПѝѢ]", " ");
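
The range behaviour can be reproduced in isolation with a three-character class. The sketch below compares the unescaped hyphen with two ways of making it literal: escaping it, and placing it first in the class. The class name HyphenRangeDemo and its methods are hypothetical, chosen only for this example.

```java
// Hypothetical demo class isolating the hyphen-as-range behaviour.
public class HyphenRangeDemo {

    // "&-@" is a range covering code points 38 through 64,
    // which includes the digits and most ASCII punctuation.
    static String unescaped(String s) {
        return s.replaceAll("[&-@]", " ");
    }

    // An escaped hyphen is a literal: the class holds exactly '&', '-' and '@'.
    static String escaped(String s) {
        return s.replaceAll("[&\\-@]", " ");
    }

    // A hyphen in first position is also treated as a literal.
    static String hyphenFirst(String s) {
        return s.replaceAll("[-&@]", " ");
    }

    public static void main(String[] args) {
        String input = "a&b-c@d123";
        System.out.println("[" + unescaped(input) + "]");   // prints "[a b c d   ]"
        System.out.println("[" + escaped(input) + "]");     // prints "[a b c d123]"
        System.out.println("[" + hyphenFirst(input) + "]"); // prints "[a b c d123]"
    }
}
```

Only the unescaped form swallows the digits, which matches the symptom found during testing.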