String.replaceAll: the hyphen (-) is a special character in regular expressions

Requirement: replace each of the following characters with a space:

!#$%&()[]*+-@?{|}~¢£¤¥¦§©ª«¬­®¯°±²³µ¶¹º»¼«½¾¿×~‘’`_\\^þÞ¡¨!<>\'*˝´\"ſß÷ΓΔΘΛΞΠΣΦΨΩγδθΛΦЂЃЉЊЋЍЏБДЖЗИЙЛФЦШЧЩЪЫЬЭЮЯ‐–—―‘’‚“”„†‡…•‰‹›‽₂₁₀ⁿ⁾⁽⁼⁻⁺⁹⁸⁷⁶⁵⁴⁰⁄₃₄₅₆₇₈₉₊₋₌₎₍€℅ℓ№℗⅟⅞⅝⅜⅛⅚⅙⅘⅗⅖⅕⅔⅓℮Ω™℠←↑→↓↔↕↖↗↘↙∂∆∏∑−∙√fflffiflfiff◊≥≤≠≈∫∞ѲҐΏГПѝѢ

The natural choice is String's replaceAll. The JDK defines the method as follows:

/**
     * Replaces each substring of this string that matches the given <a
     * href="../util/regex/Pattern.html#sum">regular expression</a> with the
     * given replacement.
     *
     * <p> An invocation of this method of the form
     * <i>str</i>{@code .replaceAll(}<i>regex</i>{@code ,} <i>repl</i>{@code )}
     * yields exactly the same result as the expression
     *
     * <blockquote>
     * <code>
     * {@link java.util.regex.Pattern}.{@link
     * java.util.regex.Pattern#compile compile}(<i>regex</i>).{@link
     * java.util.regex.Pattern#matcher(java.lang.CharSequence) matcher}(<i>str</i>).{@link
     * java.util.regex.Matcher#replaceAll replaceAll}(<i>repl</i>)
     * </code>
     * </blockquote>
     *
     *<p>
     * Note that backslashes ({@code \}) and dollar signs ({@code $}) in the
     * replacement string may cause the results to be different than if it were
     * being treated as a literal replacement string; see
     * {@link java.util.regex.Matcher#replaceAll Matcher.replaceAll}.
     * Use {@link java.util.regex.Matcher#quoteReplacement} to suppress the special
     * meaning of these characters, if desired.
     *
     * @param   regex
     *          the regular expression to which this string is to be matched
     * @param   replacement
     *          the string to be substituted for each match
     *
     * @return  The resulting {@code String}
     *
     * @throws  PatternSyntaxException
     *          if the regular expression's syntax is invalid
     *
     * @see java.util.regex.Pattern
     *
     * @since 1.4
     * @spec JSR-51
     */
    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }
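
The equivalence stated in the Javadoc, and its warning about `$` and `\` in the replacement string, can be checked with a short sketch. The class name `ReplaceAllDemo`, the helper names, and the sample strings are made up for illustration and are not part of the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllDemo {
    // The spelled-out form that the Javadoc describes as equivalent to
    // str.replaceAll(regex, repl).
    public static String viaPattern(String str, String regex, String repl) {
        return Pattern.compile(regex).matcher(str).replaceAll(repl);
    }

    // Treats repl as a literal: quoteReplacement escapes "$" and "\" so they
    // are not read as group references or escapes.
    public static String replaceLiteral(String str, String regex, String repl) {
        return str.replaceAll(regex, Matcher.quoteReplacement(repl));
    }

    public static void main(String[] args) {
        String input = "a#b#c";
        System.out.println(input.replaceAll("#", " "));             // a b c
        System.out.println(viaPattern(input, "#", " "));            // a b c
        System.out.println(replaceLiteral("price: X", "X", "$5"));  // price: $5
    }
}
```

In this article the replacement is always a single space, so quoting the replacement is not needed; only the first argument, the regex, needs care.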

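The first parameter is a regular expression. Put the characters to be replaced inside [] and pass that as the first argument. That is not the whole job, though: any of those characters that are regex special characters must also be escaped.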
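
As a minimal sketch of that idea, the following applies a small character class with each metacharacter backslash-escaped. `CharClassDemo`, `stripSymbols`, and the sample input are hypothetical names chosen for this example.

```java
public class CharClassDemo {
    // Replaces each of $ ( ) * + with a space.
    // In a Java string literal, one regex backslash is written as "\\".
    public static String stripSymbols(String s) {
        return s.replaceAll("[\\$\\(\\)\\*\\+]", " ");
    }

    public static void main(String[] args) {
        System.out.println(stripSymbols("a$b(c)d")); // a b c d
    }
}
```

Inside [] these particular characters would also match literally without the backslash; escaping them is harmless and avoids having to remember which ones are safe.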

For the list of regex special characters, see the following link: "link test"

Pull the special characters out and replace them in a separate call. The code is as follows:

result = result.replaceAll("[\\$\\(\\)\\*\\+\\.\\[\\]\\?\\\\^\\{\\}\\|]", " ");
        result = result.replaceAll("[!#%&-@~¢£¤¥¦§©ª«¬\u00AD®¯°±²³µ¶¹º»¼«½¾¿×~‘’`_þÞ¡¨!<>'˝´\"ſß÷ΓΔΘΛΞΠΣΦΨΩγδθΛΦЂЃЉЊЋЍЏБДЖЗИЙЛФЦШЧЩЪЫЬЭЮЯ‐–—―‘’‚“”„†‡…•‰‹›‽₂₁₀ⁿ⁾⁽⁼⁻⁺⁹⁸⁷⁶⁵⁴⁰⁄₃₄₅₆₇₈₉₊₋₌₎₍€℅ℓ№℗⅟⅞⅝⅜⅛⅚⅙⅘⅗⅖⅕⅔⅓℮Ω™℠←↑→↓↔↕↖↗↘↙∂∆∏∑−∙√fflffiflfiff◊≥≤≠≈∫∞ѲҐΏГПѝѢ]", " ");

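After writing this, testing showed that digits were being replaced as well, which was odd. Bisecting the character class to find the offending part eventually pointed to &-@. It turns out that the hyphen is also a special character: inside [], &-@ denotes a range, so every character whose ASCII code lies between & (38) and @ (64), such as digits, parentheses, the asterisk, and the plus sign, matches the expression. Moving the hyphen into the first call and escaping it fixes the problem, as follows: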

result = result.replaceAll("[\\$\\(\\)\\*\\+\\.\\[\\]\\?\\\\^\\{\\}\\|\\-]", " ");
        result = result.replaceAll("[!#%&@~¢£¤¥¦§©ª«¬\u00AD®¯°±²³µ¶¹º»¼«½¾¿×~‘’`_þÞ¡¨!<>'˝´\"ſß÷ΓΔΘΛΞΠΣΦΨΩγδθΛΦЂЃЉЊЋЍЏБДЖЗИЙЛФЦШЧЩЪЫЬЭЮЯ‐–—―‘’‚“”„†‡…•‰‹›‽₂₁₀ⁿ⁾⁽⁼⁻⁺⁹⁸⁷⁶⁵⁴⁰⁄₃₄₅₆₇₈₉₊₋₌₎₍€℅ℓ№℗⅟⅞⅝⅜⅛⅚⅙⅘⅗⅖⅕⅔⅓℮Ω™℠←↑→↓↔↕↖↗↘↙∂∆∏∑−∙√fflffiflfiff◊≥≤≠≈∫∞ѲҐΏГПѝѢ]", " ");
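
The range behaviour can be reproduced in isolation with a much smaller character class. This is a sketch only; `HyphenRangeDemo`, its method names, and the sample input are invented for the example.

```java
public class HyphenRangeDemo {
    // Unescaped hyphen: &-@ is a range over code points 38..64,
    // which includes the digits and the hyphen itself.
    public static String withRange(String s) {
        return s.replaceAll("[&-@]", " ");
    }

    // Escaped hyphen: the class contains only '&', '-' and '@'.
    public static String withEscapedHyphen(String s) {
        return s.replaceAll("[&\\-@]", " ");
    }

    // A hyphen placed last in the class is also taken literally.
    public static String withTrailingHyphen(String s) {
        return s.replaceAll("[&@-]", " ");
    }

    public static void main(String[] args) {
        String input = "a1&b-c@d";
        System.out.println(withRange(input));          // a  b c d   (the digit 1 is lost)
        System.out.println(withEscapedHyphen(input));  // a1 b c d
        System.out.println(withTrailingHyphen(input)); // a1 b c d
    }
}
```

Escaping the hyphen, as the corrected code above does, is the more robust of the two fixes, because it keeps working if someone later appends more characters to the class.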