Java Commonly Used Class Source Code Analysis: String

Date: 2019-11-07


String

[Figure: String class diagram]

Member variables

    /** Stores the characters; declared final, so the reference cannot be reassigned. */
    private final char value[];

    /** Caches the hash code of the String. */
    private int hash; // Default to 0

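These two fields are the main member variables of the String class. (The char[] field matches the JDK 8 source; since JDK 9, String stores its characters in a byte[] together with a coder field.)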

Common constructors

`public String( )`

public String() {
		this.value = "".value;
}

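The constructor assigns "".value directly, yet value is a private member variable of the String object "". Why is direct access allowed?

Because Java's access modifiers are enforced per class, not per object. Code inside a class can therefore access the private member variables of any other instance of that same class.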
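
The rule above can be tried out with a small sketch. The class `Box` and its field `secret` are hypothetical names invented for this illustration; they are not part of the JDK.

```java
public class AccessDemo {
    // Hypothetical class used only to illustrate class-based access control.
    static final class Box {
        private final int secret;

        Box(int secret) {
            this.secret = secret;
        }

        // Reads the private field of ANOTHER Box instance. This compiles
        // because access is checked against the class, not the object.
        boolean sameSecret(Box other) {
            return this.secret == other.secret;
        }
    }

    public static void main(String[] args) {
        Box a = new Box(42);
        Box b = new Box(42);
        System.out.println(a.sameSecret(b)); // true
    }
}
```

The String constructor relies on the same rule when it reads the value field of the "" literal.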

`public String(String original)`

public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
}

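A string created this way is a copy of original only in the sense that it is a new String object: its value field refers to the very same array object as original's value field. Unless an explicit copy is really needed, there is no reason to create strings this way.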
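
A minimal sketch of the observable effect: the copy is a distinct object with identical content. The helper `sameObject` is a hypothetical name used only here.

```java
public class CopyConstructorDemo {
    // Returns true when both references point at the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String original = "Hello";
        String copy = new String(original);

        System.out.println(sameObject(original, copy)); // false: a distinct String object
        System.out.println(original.equals(copy));      // true: same character content
    }
}
```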

`public String(char value[])`

public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
}

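Creates a string from a character array. Copying with Arrays.copyOf ensures that later modifications to the caller's array cannot affect the value array inside the newly created string.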
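
The effect of the defensive copy can be checked with a short sketch; `buildThenMutate` is a hypothetical helper written for this illustration.

```java
public class DefensiveCopyDemo {
    // Builds a String from the array, then overwrites the array's first element.
    static String buildThenMutate(char[] chars) {
        String s = new String(chars);
        chars[0] = 'J';
        return s;
    }

    public static void main(String[] args) {
        char[] chars = {'H', 'e', 'l', 'l', 'o'};
        String s = buildThenMutate(chars);
        System.out.println(s);                 // Hello (unaffected by the write)
        System.out.println(new String(chars)); // Jello (the array itself did change)
    }
}
```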

`public String(char value[], int offset, int count)`

public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

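This constructor is similar to the previous one. After validating offset and count, it assigns value using Arrays.copyOfRange, which copies only the requested range.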
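
As a rough illustration of the bounds checks, the sketch below exercises a valid range, the empty-range case, and two rejected ranges. The helper `rejects` is a hypothetical name, not a JDK method.

```java
public class RangeConstructorDemo {
    // Returns true if the constructor rejects the given offset/count pair.
    static boolean rejects(char[] chars, int offset, int count) {
        try {
            new String(chars, offset, count);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        char[] chars = {'H', 'e', 'l', 'l', 'o'};
        System.out.println(new String(chars, 1, 3));           // ell
        System.out.println(new String(chars, 5, 0).isEmpty()); // true: count 0 with offset <= length
        System.out.println(rejects(chars, -1, 2));             // true: negative offset
        System.out.println(rejects(chars, 3, 3));              // true: offset + count exceeds the length
    }
}
```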

Common methods

`public String substring(int beginIndex, int endIndex)`

public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

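This method returns a substring covering the half-open range [beginIndex, endIndex): the start index is included and the end index is excluded.

When the range covers the whole string, this is returned; otherwise the new string is built with the public String(char value[], int offset, int count) constructor.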
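
A short sketch of the half-open range and the rejected index combinations; `rejects` is again a hypothetical helper.

```java
public class SubstringDemo {
    // Returns true if substring rejects the given index pair.
    static boolean rejects(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "Hello";
        System.out.println(s.substring(1, 3));           // el (indices 1 and 2 only)
        System.out.println(s.substring(2, 2).isEmpty()); // true: empty range
        System.out.println(s.substring(0, 5));           // Hello
        System.out.println(rejects(s, 3, 2));            // true: endIndex < beginIndex
        System.out.println(rejects(s, 0, 6));            // true: endIndex > length
    }
}
```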

`public boolean equals(Object anObject)`

public boolean equals(Object anObject) {
        // Compare the references directly
        if (this == anObject) {
            return true;
        }
        // Check whether anObject is an instance of String
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            // Compare the lengths
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                // Compare the characters one position at a time
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

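String's equals method is a classic override of the Object method. It consists of four main steps:

1. Compare the two references for identity (this is all that Object's own equals implementation does)
2. Check whether the argument is an instance of String
3. Compare the lengths
4. Loop over both arrays and compare the characters at each index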
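
The same four steps carry over to user-defined classes. The sketch below applies them to `CharBox`, a hypothetical class invented for this illustration; it also overrides hashCode, which any class that overrides equals should do.

```java
import java.util.Arrays;

public class EqualsDemo {
    // Hypothetical value class wrapping a char array.
    static final class CharBox {
        private final char[] value;

        CharBox(char... chars) {
            this.value = Arrays.copyOf(chars, chars.length);
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;                       // step 1: same reference
            }
            if (!(other instanceof CharBox)) {
                return false;                      // step 2: type check
            }
            CharBox that = (CharBox) other;
            if (value.length != that.value.length) {
                return false;                      // step 3: length check
            }
            for (int i = 0; i < value.length; i++) {
                if (value[i] != that.value[i]) {
                    return false;                  // step 4: element-wise comparison
                }
            }
            return true;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(value);         // keeps hashCode consistent with equals
        }
    }

    public static void main(String[] args) {
        System.out.println(new CharBox('a', 'b').equals(new CharBox('a', 'b'))); // true
        System.out.println(new CharBox('a', 'b').equals(new CharBox('a', 'c'))); // false
        System.out.println(new CharBox('a', 'b').equals("ab"));                  // false
    }
}
```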

`public String replace(char oldChar, char newChar)`

public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */

            // Scan for the first character that needs to be replaced
            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            // If such a character was found, build the new string
            if (i < len) {
                // Allocate the new char array buf and copy in the characters already scanned
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                // Continue through the rest of the string, writing newChar wherever val[i] equals oldChar
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }

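The method replaces characters in the following steps:

1. The first while loop scans the original string for an occurrence of oldChar, the character to be replaced
2. If oldChar is found, the method starts building a new string
3. It allocates a new buf[] array and copies in the characters already scanned, none of which equals oldChar
4. It continues through the rest of the string, writing newChar wherever the element at index i equals oldChar
5. It returns a new String object built from buf[]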

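Only the char overload is covered here. The replacements that take String arguments (replaceAll, replaceFirst and, in the JDK 8 source, replace(CharSequence, CharSequence), which compiles its target as a literal pattern) all go through the regular expression engine to match and replace.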
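
The steps of the char overload can be sketched on a plain char array. `replaceChar` is a hypothetical standalone re-implementation written for this illustration, not the JDK method itself; like the original, it returns its input unchanged when nothing needs replacing.

```java
public class ReplaceDemo {
    // Returns the input array itself when no replacement is needed,
    // otherwise a new array with every oldChar replaced by newChar.
    static char[] replaceChar(char[] val, char oldChar, char newChar) {
        if (oldChar == newChar) {
            return val;
        }
        int len = val.length;
        int i = 0;
        // Scan for the first occurrence of oldChar.
        while (i < len && val[i] != oldChar) {
            i++;
        }
        if (i == len) {
            return val; // nothing to replace, so no allocation
        }
        // Copy the prefix that is known to contain no oldChar.
        char[] buf = new char[len];
        System.arraycopy(val, 0, buf, 0, i);
        // Process the remainder, substituting as needed.
        for (; i < len; i++) {
            char c = val[i];
            buf[i] = (c == oldChar) ? newChar : c;
        }
        return buf;
    }

    public static void main(String[] args) {
        char[] src = "banana".toCharArray();
        System.out.println(new String(replaceChar(src, 'a', 'o'))); // bonono
        System.out.println(replaceChar(src, 'x', 'y') == src);      // true: same array returned
        System.out.println("banana".replace('a', 'o'));             // bonono
    }
}
```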

`public String[] split(String regex)`

public String[] split(String regex) {
        return split(regex, 0);
    }

    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
           (1) one-char String and this character is not one of the
               RegEx's meta characters ".$|()[{^?*+\\", or
           (2) two-char String and the first char is the backslash and
               the second is not the ascii digit or ascii letter. */
        char ch = 0;
        if ((
                // Length 1: the character must not be a regex metacharacter
                (regex.value.length == 1 && ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
                // Length 2: the first character is '\' and the second is neither an ASCII digit nor an ASCII letter
                (regex.length() == 2 && regex.charAt(0) == '\\' && (((ch = regex.charAt(1)) - '0') | ('9' - ch)) < 0
                        && ((ch - 'a') | ('z' - ch)) < 0 && ((ch - 'A') | ('Z' - ch)) < 0))
                // The delimiter must not be a surrogate code unit
                && (ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE)) {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            // Walk through the String and add each piece to list
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // The delimiter never matched
            if (off == 0) {
                return new String[]{this};
            }

            // Add the remaining segment
            if (!limited || list.size() < limit) {
                list.add(substring(off, value.length));
            }

            // Construct the result
            int resultSize = list.size();
            if (limit == 0) {
                // Remove trailing empty strings
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        // In all other cases, fall back to the regular expression engine
        return Pattern.compile(regex).split(this, limit);
    }

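The individual steps are described in the code comments. The part that deserves attention is the loop that walks through the String: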

while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }

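When regex is a single character, off is the index up to which the string has been consumed and next is the index of the next occurrence of the delimiter. After each match, off = next + 1. When two delimiter characters are adjacent, the following indexOf call finds the second one at the position immediately after the first, so next == off and substring(off, next) is the empty string.

Consequently, if the delimiter character occurs N times in a row in the middle of the string, the last N - 1 occurrences produce N - 1 empty strings in the result. (With limit == 0, trailing empty strings are removed afterwards.)

When regex is longer than one character, the regular expression path likewise turns consecutive matches into empty strings.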
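
The behavior described above can be observed with a few calls. The helper `show` is a hypothetical name; it only formats the resulting array for printing.

```java
import java.util.Arrays;

public class SplitDemo {
    // Formats the pieces so that empty strings are visible between the commas.
    static String show(String[] parts) {
        return Arrays.toString(parts);
    }

    public static void main(String[] args) {
        System.out.println(show("a,,b".split(",")));      // [a, , b]     one empty string
        System.out.println(show("a,,,b".split(",")));     // [a, , , b]   two empty strings
        System.out.println(show("a,b,,".split(",")));     // [a, b]       trailing empties removed (limit 0)
        System.out.println(show("a,b,,".split(",", -1))); // [a, b, , ]   negative limit keeps them
        System.out.println(show("a,b,c".split(",", 2)));  // [a, b,c]     at most two pieces
        System.out.println(show("a----b".split("--")));   // [a, , b]     regex path, same effect
    }
}
```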

Other methods

`public native String intern()`

public native String intern();

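intern is rarely used in everyday development, but it comes up often when analyzing how code behaves.

What intern does is explained clearly in the JDK's own documentation comment.

If the string pool already contains an equal string, the reference to that pooled string is returned;

otherwise, the string is added to the pool and the reference to the pooled string is returned.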

A short piece of code illustrates this:

        String s1 = "Hello";                    // (1)
        String s2 = "Hello";                    // (2)
        String s3 = new String("Hello");        // (3)
        System.out.println(s1 == s2);           // (4) true
        System.out.println(s1 == s3);           // (5) false
        s3 = s3.intern();                       // (6)
        System.out.println(s1 == s3);           // (7) true

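Step 1 declares a variable s1 on the stack and adds the string "Hello" to the constant pool.

Step 2 declares a variable s2 on the stack, pointing at the same "Hello" in the constant pool.

Step 3 creates a new object on the heap that is backed by the pooled "Hello"; the variable s3 declared on the stack points at this heap object, not at the pooled string.

Therefore s1 == s2 is true, while s1 == s3 is false.

Step 6 calls s3 = s3.intern(), which obtains the reference to the pooled "Hello" and makes s3 point at it.

Therefore step 7 prints true for s1 == s3.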

Immutability of String

public final class String implements java.io.Serializable, Comparable<String>, CharSequence {

    /** Stores the characters; declared final, so the reference cannot be reassigned. */
    private final char value[];

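String is declared final, so the class cannot be subclassed.

The data is held in the char array value, which is also declared final, so once a String has been constructed the field can never be pointed at a different array. The elements of an array can in principle still be changed, but value is private and String exposes no method that modifies it, so after construction its contents cannot be changed from outside either.

These two points together establish the immutability of String.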
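
One visible consequence is that every "modifying" method returns a new String and leaves the receiver untouched. A minimal sketch, where `derive` is a hypothetical helper used only for this illustration:

```java
public class ImmutabilityDemo {
    // Applies two "modifying" operations and returns their results.
    static String[] derive(String s) {
        return new String[] { s.concat(" World"), s.replace('l', 'L') };
    }

    public static void main(String[] args) {
        String s = "Hello";
        String[] derived = derive(s);
        System.out.println(derived[0]); // Hello World
        System.out.println(derived[1]); // HeLLo
        System.out.println(s);          // Hello (the original is unchanged)
    }
}
```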

Summary

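String is simple to use in everyday development, but algorithmic code relies on many of its methods, so the details matter: for example, the half-open range of substring and the behavior of split on consecutive matches.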

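Some String methods return results that still need post-processing on our side, such as the empty strings produced by split. Guava's string utility classes can be used to obtain results that fit the requirement.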
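
For projects without Guava, a similar effect can be approximated with the JDK alone. The sketch below is a plain-JDK stand-in, not Guava's API; `splitNonEmpty` is a hypothetical helper name, and it assumes the delimiter should be treated as literal text rather than as a regular expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class NonEmptySplitDemo {
    // Splits on a literal delimiter and drops every empty piece.
    static List<String> splitNonEmpty(String input, String delimiter) {
        // Pattern.quote makes the delimiter literal; limit -1 keeps trailing
        // pieces so that the filter below is the only place empties are dropped.
        return Arrays.stream(input.split(Pattern.quote(delimiter), -1))
                .filter(part -> !part.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitNonEmpty(",a,,b,", ",")); // [a, b]
        System.out.println(splitNonEmpty("a.b..c", ".")); // [a, b, c]
    }
}
```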