String Source Code Analysis

Date: 2019-12-12

Tags: string, source code analysis


1. Class definition

public final class String implements java.io.Serializable, Comparable<String>, CharSequence {...}

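final: the class cannot be subclassed.
Serializable: instances can be serialized and deserialized.
Comparable<String>: the class must implement compareTo(String s), which defines the natural ordering used for comparison and sorting.
CharSequence: provides length():int, charAt(int):char, subSequence(int,int):CharSequence, toString():String, chars():IntStream and codePoints():IntStream (the last two are default methods added in Java 8).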
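As a small sketch of what the Comparable and CharSequence interfaces buy in practice, the example below passes a String and a StringBuilder to the same CharSequence-typed method and sorts strings by their natural order. The class and method names (StringInterfacesDemo, countUpper, sorted) are illustrative only.

```java
import java.util.Arrays;

public class StringInterfacesDemo {
    // Works with any CharSequence implementation: String, StringBuilder, StringBuffer...
    static long countUpper(CharSequence cs) {
        // chars() is a default method on CharSequence (Java 8+)
        return cs.chars().filter(Character::isUpperCase).count();
    }

    // Relies on String implementing Comparable<String> (lexicographic order)
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(countUpper("Hello World"));            // 2
        System.out.println(countUpper(new StringBuilder("ABc"))); // 2
        System.out.println(Arrays.toString(sorted("pear", "apple", "fig"))); // [apple, fig, pear]
        System.out.println("apple".compareTo("banana"));          // -1 ('a' - 'b')
    }
}
```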

2. Fields

private final char value[];
private int hash;
private static final long serialVersionUID = -6849794470754667710L;
private static final ObjectStreamField[] serialPersistentFields = new ObjectStreamField[0];
public static final Comparator<String> CASE_INSENSITIVE_ORDER = new CaseInsensitiveComparator();

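value: the underlying storage of the String, a char array. (This is the JDK 8 layout; since JDK 9 it is a byte[] plus a coder field, known as compact strings.)

hash: caches the string's hash code; it stays 0 until the hash code is first computed by hashCode().

serialVersionUID: the version identifier checked during serialization and deserialization.

serialPersistentFields: an ObjectStreamField array that declares a class's serializable fields. Here it is empty and not used elsewhere in the class, because String is special-cased by the serialization stream protocol.

CASE_INSENSITIVE_ORDER: a comparator for case-insensitive ordering, created from the private nested class CaseInsensitiveComparator.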
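To make the hash and CASE_INSENSITIVE_ORDER fields concrete, here is a minimal sketch that recomputes the formula documented for String.hashCode(), s[0]*31^(n-1) + ... + s[n-1], and compares natural ordering with case-insensitive ordering. The helper names manualHash and sortIgnoringCase are hypothetical; the real hashCode() additionally stores its result in the hash field.

```java
import java.util.Arrays;

public class HashAndOrderDemo {
    // Re-derives the formula documented for String.hashCode():
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], with int overflow.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Horner's rule
        }
        return h;
    }

    static String[] sortIgnoringCase(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(manualHash(s) == s.hashCode()); // true
        System.out.println("".hashCode());                 // 0

        String[] names = {"bob", "Carol", "alice"};
        String[] natural = names.clone();
        Arrays.sort(natural); // 'C' (67) sorts before 'a' (97)
        System.out.println(Arrays.toString(natural));                 // [Carol, alice, bob]
        System.out.println(Arrays.toString(sortIgnoringCase(names))); // [alice, bob, Carol]
    }
}
```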

3. Methods

3.1 Constructors

(1) A String as the argument

public String() { this.value = "".value; }
public String(String original){
  this.value=original.value; 
  this.hash=original.hash;
}

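Initializes a String from another String object: the value and hash fields of the source are assigned directly to the new String. Sharing the array is safe because neither String ever modifies it; the result is a distinct object with the same contents.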
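A quick sketch of what the copy constructor produces: a new object (so == is false) with equal contents and an equal hash code. CopyConstructorDemo and copyOf are illustrative names, not part of the JDK.

```java
public class CopyConstructorDemo {
    static String copyOf(String original) {
        return new String(original); // a new expression always creates a new object
    }

    public static void main(String[] args) {
        String original = "abc";
        String copy = copyOf(original);
        System.out.println(copy == original);                       // false: different objects
        System.out.println(copy.equals(original));                  // true: same characters
        System.out.println(copy.hashCode() == original.hashCode()); // true
    }
}
```

Because strings are immutable, such a copy is rarely needed in everyday code; it mainly illustrates the difference between identity and equality.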

(2) A char array as the argument

public String(char value[]) {
  this.value = Arrays.copyOf(value, value.length);
}

public String(char value[], int offset, int count) {
  if (offset < 0) {
    throw new StringIndexOutOfBoundsException(offset);
  }
  if (count <= 0) {
    if (count < 0) { throw new StringIndexOutOfBoundsException(count); }
    if (offset <= value.length) { this.value = "".value; return; }
  }
  // Note: offset or count might be near -1>>>1.
  if (offset > value.length - count) {
    throw new StringIndexOutOfBoundsException(offset + count);
  }
  this.value = Arrays.copyOfRange(value, offset, offset + count);
}

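When a String is built from a char array, Arrays.copyOf or Arrays.copyOfRange copies the contents of the source array, element by element, into the String's own array. This defensive copy is what keeps later changes to the caller's array from affecting the String.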
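The effect of the defensive copy, and of the range check in the offset/count constructor, can be seen in a short sketch (DefensiveCopyDemo is an illustrative name):

```java
public class DefensiveCopyDemo {
    public static void main(String[] args) {
        char[] chars = {'j', 'a', 'v', 'a'};
        String s = new String(chars);          // copies via Arrays.copyOf
        chars[0] = 'l';                        // mutate the caller's array afterwards
        System.out.println(s);                 // java: the String has its own copy

        String part = new String(chars, 1, 2); // offset 1, count 2 -> copyOfRange
        System.out.println(part);              // av

        try {
            new String(chars, 3, 5);           // offset + count exceeds the array length
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("out of bounds");
        }
    }
}
```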

(3) An int array of code points as the argument

public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= codePoints.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }
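The constructor above works in two passes: pass 1 counts how many chars are needed (one per BMP code point, two for a supplementary code point), and pass 2 fills the array, writing surrogate pairs with Character.toSurrogates. A sketch of the observable result, with fromCodePoints as a hypothetical helper:

```java
public class CodePointDemo {
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so pass 1 counts it twice (a surrogate pair)
        String s = fromCodePoints('A', 0x1F600, 'B');
        System.out.println(s.length());                             // 4 chars
        System.out.println(s.codePointCount(0, s.length()));        // 3 code points
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        try {
            fromCodePoints(0x110000);                               // above U+10FFFF
        } catch (IllegalArgumentException e) {
            System.out.println("rejected invalid code point");
        }
    }
}
```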

(4) A byte array as the argument

public String(byte bytes[], int offset, int length, String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charsetName, bytes, offset, length);
    }

public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value =  StringCoding.decode(charset, bytes, offset, length);
    }

public String(byte bytes[], String charsetName)
            throws UnsupportedEncodingException {
        this(bytes, 0, bytes.length, charsetName);
    }

public String(byte bytes[], Charset charset) {
        this(bytes, 0, bytes.length, charset);
    }

public String(byte bytes[], int offset, int length) {
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(bytes, offset, length);
    }

public String(byte bytes[]) {
        this(bytes, 0, bytes.length);
    }

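byte[] is the serialized form used for network transmission and storage, so converting between byte[] and String always raises the question of character encoding. String(byte[] bytes, Charset charset) decodes the given byte array with charset into a UTF-16 char[] and builds a new String from it. All of these constructors rely on StringCoding.decode; the overload that takes a charset name looks like this: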

static char[] decode(String charsetName, byte[] ba, int off, int len)
        throws UnsupportedEncodingException
    {
        StringDecoder sd = deref(decoder);
        String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
        if ((sd == null) || !(csn.equals(sd.requestedCharsetName())
                              || csn.equals(sd.charsetName()))) {
            sd = null;
            try {
                Charset cs = lookupCharset(csn);
                if (cs != null)
                    sd = new StringDecoder(cs, csn);
            } catch (IllegalCharsetNameException x) {}
            if (sd == null)
                throw new UnsupportedEncodingException(csn);
            set(decoder, sd);
        }
        return sd.decode(ba, off, len);
    }

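As the code shows, when charsetName is null this overload falls back to ISO-8859-1 (the public constructors reject a null charsetName with a NullPointerException before getting here). The constructors that take no charset at all, String(byte[]) and String(byte[], int, int), call a different decode overload that uses the platform default charset, Charset.defaultCharset(), and fall back to ISO-8859-1 only if that charset is unsupported.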
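Decoding with a charset different from the one used for encoding is the classic source of garbled text. The sketch below encodes the character U+4E2D as UTF-8 and decodes it as ISO-8859-1; misdecode is a hypothetical helper. Because ISO-8859-1 maps each byte one-to-one onto a char, this particular mistake happens to be reversible.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Encodes s as UTF-8 but decodes the bytes as ISO-8859-1: a charset mismatch
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D";                 // the CJK character U+4E2D, 3 bytes in UTF-8
        String garbled = misdecode(zhong);
        System.out.println(garbled.length());    // 3: ISO-8859-1 turns every byte into one char
        // ISO-8859-1 maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF,
        // so the original bytes can be recovered and decoded correctly.
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        String repaired = new String(raw, StandardCharsets.UTF_8);
        System.out.println(repaired.equals(zhong)); // true
    }
}
```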

(5) StringBuffer and StringBuilder as the argument

public String(StringBuffer buffer) {
        synchronized(buffer) {
            this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
        }
    }

public String(StringBuilder builder) {
        this.value = Arrays.copyOf(builder.getValue(), builder.length());
    }

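On efficiency: the Javadoc of String(StringBuilder) notes that obtaining a string from a builder through its toString method is likely to run faster and is generally preferred. Separately, StringBuffer's methods (toString included) are synchronized, which makes it thread-safe at some cost in speed, while StringBuilder is unsynchronized and therefore faster in single-threaded code.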
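A sketch comparing the two ways of turning a builder into a String; both produce an independent copy, so appending to the builder afterwards does not affect them. BuilderToStringDemo and joinDigits are illustrative names.

```java
public class BuilderToStringDemo {
    // Typical single-threaded use: build with StringBuilder, finish with toString()
    static String joinDigits(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("ab");
        String viaConstructor = new String(sb);  // String(StringBuilder)
        String viaToString = sb.toString();      // the generally preferred form
        sb.append('c');                          // both Strings hold their own copy
        System.out.println(viaConstructor + " " + viaToString + " " + sb); // ab ab abc

        StringBuffer buffer = new StringBuffer("xy"); // synchronized counterpart
        System.out.println(new String(buffer).equals(buffer.toString())); // true
        System.out.println(joinDigits(5));            // 01234
    }
}
```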

3.2 Common methods

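length()  returns the length of the string (the number of char values)

isEmpty()  returns whether the string is empty (length() == 0)

charAt(int index)  returns the char at position index, i.e. the (index+1)-th character

char[] toCharArray()  converts the string to a new char array

trim()  removes leading and trailing whitespace (every character whose code is at most U+0020)

toUpperCase()  converts to upper case using the rules of the default locale

toLowerCase()  converts to lower case using the rules of the default locale

String concat(String str) //appends str to the end of this string and returns the result

String replace(char oldChar, char newChar) //replaces every occurrence of oldChar with newChar

//Both methods above build their result with the package-private constructor String(char[] value, boolean share), which adopts the freshly created array without copying it

boolean matches(String regex) //tests whether the whole string matches the regular expression regex

boolean contains(CharSequence s) //tests whether the string contains the char sequence s

String[] split(String regex, int limit)  splits the string around matches of regex; if limit > 0 the result has at most limit elements, if limit is 0 trailing empty strings are discarded, and if limit is negative they are kept

String[] split(String regex)  same as split(regex, 0)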
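A sketch exercising several of these methods, with particular attention to how limit changes the result of split. CommonMethodsDemo is an illustrative name; toUpperCase(Locale.ROOT) is the locale-explicit overload of toUpperCase().

```java
import java.util.Arrays;
import java.util.Locale;

public class CommonMethodsDemo {
    public static void main(String[] args) {
        String t = "  a,b,,c,,  ".trim();                    // "a,b,,c,,"
        System.out.println(Arrays.toString(t.split(",")));     // [a, b, , c]  (limit 0: trailing "" dropped)
        System.out.println(Arrays.toString(t.split(",", 2)));  // [a, b,,c,,] (at most 2 elements)
        System.out.println(Arrays.toString(t.split(",", -1))); // [a, b, , c, , ] (trailing "" kept)

        System.out.println("abc".charAt(1));                   // b
        System.out.println("foo".concat("bar"));               // foobar
        System.out.println("hello".replace('l', 'L'));         // heLLo
        System.out.println("abc123".matches("[a-z]+\\d+"));    // true: the whole string must match
        System.out.println("abc123".matches("[a-z]+"));        // false
        System.out.println("hello".contains("ell"));           // true
        // Passing Locale.ROOT avoids locale-specific rules (e.g. the Turkish dotless i)
        System.out.println("Mixed".toUpperCase(Locale.ROOT));  // MIXED
    }
}
```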

getBytes

public byte[] getBytes(String charsetName)throws UnsupportedEncodingException {
        if (charsetName == null) throw new NullPointerException();
        return StringCoding.encode(charsetName, value, 0, value.length);
    }

public byte[] getBytes(Charset charset) {
        if (charset == null) throw new NullPointerException();
        return StringCoding.encode(charset, value, 0, value.length);
    }
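How many bytes getBytes produces depends entirely on the charset, and characters the charset cannot represent are replaced (with '?' for ISO-8859-1). A sketch, with GetBytesDemo as an illustrative name; the Charset overload is used because, unlike getBytes(String charsetName), it throws no checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesDemo {
    public static void main(String[] args) {
        String s = "abc\u4E2D"; // "abc" plus the CJK character U+4E2D
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 6: 1 byte each for a, b, c and 3 for U+4E2D
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 8: 2 bytes per char
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(latin1)); // [97, 98, 99, 63]: U+4E2D is unmappable, replaced by '?'
        // Round trip with the same charset restores the original text
        String back = new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(back.equals(s)); // true
    }
}
```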

Comparison methods

boolean equals(Object anObject);
boolean contentEquals(StringBuffer sb);
boolean contentEquals(CharSequence cs);
boolean equalsIgnoreCase(String anotherString);
int compareTo(String anotherString);
int compareToIgnoreCase(String str);
boolean regionMatches(int toffset, String other, int ooffset, int len)  //region (partial) match
boolean regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)   //region match, optionally ignoring case

The most interesting of these is equals:

public boolean equals(Object anObject) {  
        if (this == anObject) {  //same object (same reference)?
            return true;
        }
        if (anObject instanceof String) {  //otherwise compare the lengths, then the characters one by one
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }
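A sketch contrasting the comparison methods: == tests identity, equals only accepts another String, contentEquals accepts any CharSequence, and compareTo returns the difference of the first mismatching chars (or of the lengths when one string is a prefix of the other). ComparisonDemo is an illustrative name.

```java
public class ComparisonDemo {
    public static void main(String[] args) {
        String a = "Java";
        String b = new String("Java");
        System.out.println(a == b);                      // false: different objects
        System.out.println(a.equals(b));                 // true: same characters
        System.out.println(a.equalsIgnoreCase("JAVA"));  // true
        System.out.println(a.contentEquals(new StringBuilder("Java"))); // true: any CharSequence
        System.out.println(a.equals(new StringBuilder("Java")));        // false: not a String
        System.out.println("abc".compareTo("abd"));      // -1: 'c' - 'd' at the first difference
        System.out.println("ab".compareTo("abc"));       // -1: prefix, so the length difference
        System.out.println("a".compareToIgnoreCase("B")); // -1: 'a' - 'b' after case folding
    }
}
```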

For how the region-matching methods (regionMatches) are used, see the reference.
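A minimal regionMatches sketch: it compares len chars of this string starting at toffset with other starting at ooffset, and simply returns false if either region falls outside its string. RegionMatchesDemo is an illustrative name.

```java
public class RegionMatchesDemo {
    public static void main(String[] args) {
        String s = "Hello World";
        // Compare 5 chars of s starting at index 6 with "world" starting at index 0
        System.out.println(s.regionMatches(6, "world", 0, 5));       // false: 'W' vs 'w'
        System.out.println(s.regionMatches(true, 6, "world", 0, 5)); // true: case ignored
        System.out.println(s.regionMatches(0, "Help", 0, 3));        // true: "Hel" == "Hel"
        System.out.println(s.regionMatches(8, "World", 0, 5));       // false: region runs past the end of s
    }
}
```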

Checking how a string starts and ends

public boolean startsWith(String prefix, int toffset) {  //prefix: the prefix to test; toffset: the index in this string where the comparison starts
        char ta[] = value;
        int to = toffset;
        char pa[] = prefix.value;
        int po = 0;
        int pc = prefix.value.length;
        // Note: toffset might be near -1>>>1.
        if ((toffset < 0) || (toffset > value.length - pc)) {
            return false;
        }
        while (--pc >= 0) {
            if (ta[to++] != pa[po++]) {
                return false;
            }
        }
        return true;
    }
Similarly:
public boolean startsWith(String prefix) { return startsWith(prefix, 0); }
public boolean endsWith(String suffix) { return startsWith(suffix, value.length - suffix.value.length); }
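Since endsWith is just startsWith with offset length() - suffix.length(), a suffix longer than the string yields a negative offset and therefore false. A sketch, with PrefixSuffixDemo as an illustrative name:

```java
public class PrefixSuffixDemo {
    public static void main(String[] args) {
        String file = "report.txt";
        System.out.println(file.startsWith("rep"));       // true
        System.out.println(file.startsWith("port", 2));   // true: comparison starts at index 2
        System.out.println(file.endsWith(".txt"));        // true: startsWith(".txt", 10 - 4)
        System.out.println(file.startsWith("rep", -1));   // false: negative offset
        System.out.println("txt".endsWith("report.txt")); // false: offset 3 - 10 is negative
        System.out.println(file.startsWith(""));          // true: an empty prefix always matches
    }
}
```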

4. Summary

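String objects are immutable. Assigning a new string to a String reference only changes which object (memory address) the reference points to; the contents of the original object stay unchanged.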
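A sketch of the difference between reassigning a reference and modifying an object: methods such as concat and toUpperCase return new String objects, so another reference to the original still sees the old contents. ImmutabilityDemo is an illustrative name.

```java
public class ImmutabilityDemo {
    public static void main(String[] args) {
        String s = "abc";
        String alias = s;          // two references to the same object
        s = s.concat("def");       // concat returns a new String; only s is re-pointed
        System.out.println(alias); // abc
        System.out.println(s);     // abcdef

        String upper = alias.toUpperCase();
        System.out.println(alias + " " + upper); // abc ABC: alias itself was not changed
    }
}
```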

5. Notes

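The final modifier on the String class only means that the class cannot be subclassed; it does not mean that String instances are "final".
The field final char value[] holds the string's contents; final only means that the field cannot be reassigned to a different array, not that the array's elements cannot change.
String's immutability means that the character sequence a String object holds cannot change (value is private, the class never modifies it, and arrays passed in from outside are copied), whereas a String reference (variable) can be reassigned.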
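To see why final alone would not make String immutable, consider a hypothetical Holder class (not from the JDK) that stores a final char[] without a defensive copy: the reference cannot be reassigned, but the caller can still change the array's elements.

```java
public class FinalArrayDemo {
    // Hypothetical class for illustration: a final field holding a mutable array.
    static final class Holder {
        private final char[] data;

        Holder(char[] data) {
            this.data = data;            // no defensive copy, unlike String(char[])
        }

        String contents() {
            return new String(data);
        }
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b'};
        Holder h = new Holder(chars);
        chars[0] = 'z';                   // legal: final does not freeze array elements
        System.out.println(h.contents()); // zb
        // h.data = new char[0];          // would not compile: a final field cannot be reassigned
    }
}
```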