An In-Depth Look at Java Strings

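When we talk about String, we mostly use it to transmit and store text. In the JDK source it is declared as a final class, and it is not one of Java's primitive data types. A String can be declared as a constant with double quotes, for example String nameStr="Manna Yang";, or created through a constructor, for example String nameStr=new String("Manna Yang");. Below we lift the veil step by step.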

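Structure of the class bytecode file

Before exploring the String constant pool, let's first look at the bytecode of a compiled class as printed by the javap -v command.

  1. The original Java source, compiled to a class file with javac: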
public class TestString{
      private String testStr="Manna Yang";
      public static int TYPE=0;
      
      public static void main(String[] args){
            System.out.println("Manna Yang");
      }
}
  2. The compiled bytecode, as printed by javap -v:
Classfile /C:/Users/15971/Desktop/TestString.class
  Last modified 2019-9-18; size 566 bytes
  MD5 checksum 72f3c93ff8293c97a3da06775fa48ba0
  Compiled from "TestString.java"
public class TestString
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref #8.#22 // java/lang/Object."<init>":()V
   #2 = String #23 // Manna Yang
   #3 = Fieldref #7.#24 // TestString.testStr:Ljava/lang/String;
   #4 = Fieldref #25.#26 // java/lang/System.out:Ljava/io/PrintStream;
   #5 = Methodref #27.#28 // java/io/PrintStream.println:(Ljava/lang/String;)V
   #6 = Fieldref #7.#29 // TestString.TYPE:I
   #7 = Class #30 // TestString
   #8 = Class #31 // java/lang/Object
   #9 = Utf8 testStr
  #10 = Utf8 Ljava/lang/String;
  #11 = Utf8 TYPE
  #12 = Utf8 I
  #13 = Utf8 <init>
  #14 = Utf8 ()V
  #15 = Utf8 Code
  #16 = Utf8 LineNumberTable
  #17 = Utf8 main
  #18 = Utf8 ([Ljava/lang/String;)V
  #19 = Utf8 <clinit>
  #20 = Utf8 SourceFile
  #21 = Utf8 TestString.java
  #22 = NameAndType #13:#14 // "<init>":()V
  #23 = Utf8 Manna Yang
  #24 = NameAndType #9:#10 // testStr:Ljava/lang/String;
  #25 = Class #32 // java/lang/System
  #26 = NameAndType #33:#34 // out:Ljava/io/PrintStream;
  #27 = Class #35 // java/io/PrintStream
  #28 = NameAndType #36:#37 // println:(Ljava/lang/String;)V
  #29 = NameAndType #11:#12 // TYPE:I
  #30 = Utf8 TestString
  #31 = Utf8 java/lang/Object
  #32 = Utf8 java/lang/System
  #33 = Utf8 out
  #34 = Utf8 Ljava/io/PrintStream;
  #35 = Utf8 java/io/PrintStream
  #36 = Utf8 println
  #37 = Utf8 (Ljava/lang/String;)V
{
  public static int TYPE;
    descriptor: I
    flags: ACC_PUBLIC, ACC_STATIC

  public TestString();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1 // Method java/lang/Object."<init>":()V
         4: aload_0
         5: ldc           #2 // String Manna Yang
         7: putfield      #3 // Field testStr:Ljava/lang/String;
        10: return
      LineNumberTable:
        line 1: 0
        line 2: 4

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #4 // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #2 // String Manna Yang
         5: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 6: 0
        line 7: 8

  static {};
    descriptor: ()V
    flags: ACC_STATIC
    Code:
      stack=1, locals=0, args_size=0
         0: iconst_0
         1: putstatic     #6 // Field TYPE:I
         4: return
      LineNumberTable:
        line 3: 0
}
SourceFile: "TestString.java"
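  3. The lines before "Constant pool" are, in order: the path of the class file being inspected, its last-modified time, its MD5 checksum, the source file it was compiled from, the class declaration, the minor version, the major version 52 (decimal, 0x34 in hex, which corresponds to JDK 1.8), and the access flags. ACC_PUBLIC marks the class as public. ACC_SUPER does not mean that the superclass constructor is called; it is a legacy flag that selects the modern semantics of invokespecial for superclass method calls, and current compilers always set it.
  4. Below "Constant pool", entries #1 to #37 form the constant pool. It stores the symbolic references used by the class (classes, fields, methods, names, and descriptors) together with its literals. For example, #2 is a String entry that points to the Utf8 entry #23, the "Manna Yang" literal from private String testStr="Manna Yang";. #29 is a NameAndType entry that points to #11:#12, the name TYPE and the descriptor I from public static int TYPE=0;.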
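  5. The type characters used in field and method descriptors are as follows:
Character    Type
B            byte
C            char
D            double
F            float
I            int
J            long
L            reference type, written as L<class name>; (for example Ljava/lang/String;)
S            short
Z            boolean
[            array type (one [ per dimension)
V            void (return type only)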
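To connect the table with the descriptors seen in the javap output, here is a minimal sketch that asks the JDK to build descriptor strings through java.lang.invoke.MethodType. The class name DescriptorDemo and the helper descriptorOf are made up for illustration and are not part of the original example.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Hypothetical helper: builds the JVM method descriptor for the given
    // return type and parameter types.
    static String descriptorOf(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Same shape as the TestString constructor: no parameters, returns void.
        System.out.println(descriptorOf(void.class));
        // Same shape as main(String[] args): [ marks the array, L...; the reference type.
        System.out.println(descriptorOf(void.class, String[].class));
        // Same shape as PrintStream.println(String).
        System.out.println(descriptorOf(void.class, String.class));
        // Primitive types map to single characters: I for int, Z for boolean, J for long.
        System.out.println(descriptorOf(long.class, int.class, boolean.class));
    }
}
```

The first three lines printed should match the descriptors of <init>, main, and println in the listing above: ()V, ([Ljava/lang/String;)V, and (Ljava/lang/String;)V.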
  6. Bytecode instructions
public TestString();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1 // Method java/lang/Object."<init>":()V
         4: aload_0
         5: ldc           #2 // String Manna Yang
         7: putfield      #3 // Field testStr:Ljava/lang/String;
        10: return
      LineNumberTable:
        line 1: 0
        line 2: 4
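  • descriptor : describes the method's parameter types and return type

  • flags : describes the access modifiers

  • stack : the maximum depth of the operand stack

  • locals : the number of local variable slots

  • args_size : the number of method arguments (for an instance method this includes this)

  • load instructions : the common ones are aload, fload, iload, and dload. Here aload_0 pushes local variable 0 onto the operand stack (in an instance method, slot 0 holds this). The prefix a stands for a reference type, while i, d, and f stand for the primitive types int, double, and float. The mnemonic is basically built as type + action.

  • const instructions : the common ones are iconst, lconst, fconst, and dconst. The part after the underscore is the constant value itself, not a variable index. For example, a field declared as int testType=2; makes the class's own constructor contain iconst_2, and the iconst_0 in static {} above pushes the int constant 0 that is then stored into TYPE.

  • ldc : pushes an int, float, or String constant from the constant pool onto the operand stack

  • putfield : assigns an instance field; its counterpart is getfield

  • return : returns void; the counterparts ireturn and freturn return an int or a float

  • invokespecial : invokes constructors (<init>), private methods, and superclass methods; here it calls the no-argument constructor of the parent class Object

  • putstatic : assigns a static field; its counterpart is getstatic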
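To make the instruction list more concrete, the sketch below is a small class whose comments name the instructions that javac typically emits for each line. The class InstructionDemo and its members are made up for illustration, and the exact output can vary between compiler versions, so treat the comments as expectations to check with javap -c rather than as guaranteed results.

```java
class InstructionDemo {
    // Instance field initializers are compiled into the constructor.
    private int testType = 2;            // typically: aload_0, iconst_2, putfield
    private String name = "Manna Yang";  // typically: aload_0, ldc, putfield

    // Static field initializers are compiled into the static initializer <clinit>.
    static int counter = 0;              // typically: iconst_0, putstatic

    // Static method: the parameters occupy local variable slots 0 and 1.
    static int add(int a, int b) {
        return a + b;                    // typically: iload_0, iload_1, iadd, ireturn
    }

    // Instance method: slot 0 holds this.
    int getTestType() {
        return testType;                 // typically: aload_0, getfield, ireturn
    }

    String getName() {
        return name;                     // typically: aload_0, getfield, areturn
    }

    public static void main(String[] args) {
        InstructionDemo demo = new InstructionDemo();
        System.out.println(add(2, 3));
        System.out.println(demo.getTestType() + " " + demo.getName());
    }
}
```

Compiling this file and running javap -c InstructionDemo is a quick way to compare the comments with what your own JDK produces.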


String's equals, hashCode, and intern methods

1. With the bytecode structure above in mind, let's look at the commonly used string comparison.

public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String) anObject;
            int n = length();
            if (n == anotherString.length()) {
                int i = 0;
                while (n-- != 0) {
                    if (charAt(i) != anotherString.charAt(i))
                            return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

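equals first checks whether the two references point to the same object. If not, it checks that the argument is a String of the same length, and then calls charAt() to compare the two strings character by character. A few common comparison scenarios help make this clear: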

String testStr1="Manna Yang";
String testStr2=new String("Manna Yang");
String testStr3="Manna Yang";

System.out.println(testStr1 == testStr2);           //false
System.out.println(testStr1.equals(testStr2));      //true
System.out.println(testStr1 == testStr3);           //true
System.out.println(testStr1.equals(testStr3));      //true
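Following the JDK equals method: for testStr1 and testStr2 the == check is false (different objects), so equals goes on to compare the characters one by one with charAt. An object created with the new keyword lives on the heap, while a constant declared with double quotes "" is stored in the string constant pool. testStr2 therefore refers to a separate heap object with the same content as the pooled "Manna Yang", not to the pooled instance itself, whereas testStr1 and testStr3 both refer to the pooled instance.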

Next, let's look at the charm of the + operator.

String testStr0 = new String("Test")+new String("Manna")+new String("Yang"); compiles to the following:

0: new           #2 // class java/lang/StringBuilder
3: dup
4: invokespecial #3 // Method java/lang/StringBuilder."<init>":()V
7: new           #4 // class java/lang/String
10: dup
11: ldc           #5 // String Test
13: invokespecial #6 // Method java/lang/String."<init>":(Ljava/lang/String;)V
16: invokevirtual #7 //Method java/lang/StringBuilder.append:
                                (Ljava/lang/String;)Ljava/lang/StringBuilder;
19: new           #4 // class java/lang/String
22: dup
23: ldc           #8 // String Manna
25: invokespecial #6 // Method java/lang/String."<init>":(Ljava/lang/String;)V
28: invokevirtual #7 //Method java/lang/StringBuilder.append:
                                (Ljava/lang/String;)Ljava/lang/StringBuilder;
31: new           #4 // class java/lang/String
34: dup
35: ldc           #9 // String Yang
37: invokespecial #6 // Method java/lang/String."<init>":(Ljava/lang/String;)V
40: invokevirtual #7 //Method java/lang/StringBuilder.append:
                                (Ljava/lang/String;)Ljava/lang/StringBuilder;
43: invokevirtual #10 //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
46: astore_1
47: return

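The bytecode shows that the + operator also creates one StringBuilder object. invokespecial #3 runs the StringBuilder constructor (its own <init>, not a parent-class method), each operand is then passed to append, and finally toString() produces the result. In total the bytecode contains four new instructions (one StringBuilder and three String objects) and three ldc instructions. The compiler also optimizes + for constants, which is very convenient for string concatenation. For example, with String testStr1="Manna"+" Yang"; the bytecode already contains the single string constant "Manna Yang". The same applies to the common new String(""+"") form: the literals are joined at compile time, and only one String object is created with new.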
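The difference between compile-time folding and run-time concatenation can be observed with ==. The following is a minimal sketch; the class ConcatDemo and its method names are made up for illustration. It relies on two rules of the Java language: constant expressions are folded and interned, and a non-constant concatenation creates a new String object.

```java
class ConcatDemo {
    // Two literals joined with + form a constant expression, so the compiler
    // stores a single "Manna Yang" constant and both sides are the pooled instance.
    static boolean literalConcatIsPooled() {
        String folded = "Manna" + " Yang";
        return folded == "Manna Yang";
    }

    // A final local initialized with a literal is a constant variable,
    // so this concatenation is folded at compile time as well.
    static boolean finalLocalConcatIsPooled() {
        final String first = "Manna";
        String joined = first + " Yang";
        return joined == "Manna Yang";
    }

    // Without final, the concatenation happens at run time and yields a new object.
    static boolean runtimeConcatIsPooled() {
        String first = "Manna";
        String joined = first + " Yang";
        return joined == "Manna Yang";
    }

    public static void main(String[] args) {
        System.out.println(literalConcatIsPooled());    // true
        System.out.println(finalLocalConcatIsPooled()); // true
        System.out.println(runtimeConcatIsPooled());    // false
    }
}
```

In all three cases equals would return true; only the object identity differs.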

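2. Next, hashCode. A hash value is mainly used by hash-based storage structures, such as the commonly used HashMap and Hashtable, to locate where an object is stored. If two objects are equal, their hash values must be equal; the reverse does not hold, since two objects with the same hash value are not necessarily equal. When computing hashes we want as few collisions as possible so that lookups stay fast. Here is the source of String's hashCode() method: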

public int hashCode() {
    int h = hash;
    final int len = length();
    if (h == 0 && len > 0) {
        for (int i = 0; i < len; i++) {
            h = 31 * h + charAt(i);
        }
        hash = h;
    }
    return h;
}

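As for the multiplier 31, my understanding is mainly that it spreads the hash values more evenly and so lowers the chance of collisions. The source comment also gives the formula: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]. The value returned by charAt for each character is its UTF-16 code unit, which equals the ASCII code for ASCII characters; the NUL character has the value 0.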
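The formula can be checked by hand. The sketch below recomputes the hash with the same loop and compares it with String.hashCode(); the class HashDemo and its methods are made up for illustration. It also shows that 31 * h can be written as (h << 5) - h, a property often mentioned when the choice of 31 is discussed, although the original source does not state it.

```java
class HashDemo {
    // Recomputes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] with int overflow,
    // using the same loop shape as String.hashCode().
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // 31 * h equals 32 * h - h, and 32 * h is a left shift by 5.
    static int timesThirtyOne(int h) {
        return (h << 5) - h;
    }

    public static void main(String[] args) {
        // 'a' is 97 and 'b' is 98, so the hash of "ab" is 97 * 31 + 98.
        System.out.println(polynomialHash("ab"));                               // 3105
        System.out.println(polynomialHash("Manna Yang") == "Manna Yang".hashCode()); // true
    }
}
```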

3. For intern() in the String class, the source has a detailed comment. The excerpt below is from JDK 1.8:

* When the intern method is invoked, if the pool already contains a
* string equal to this {@code String} object as determined by
* the {@link #equals(Object)} method, then the string from the pool is
* returned. Otherwise, this {@code String} object is added to the
* pool and a reference to this {@code String} object is returned.

// method declaration in the source
public native String intern();
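The behavior described in that comment can be seen with a small experiment. This is a minimal sketch with made-up names (InternDemo and its methods): a String created with new is a separate heap object, while intern() returns the pooled instance that the literal also refers to.

```java
class InternDemo {
    // new String(...) always creates a separate object, so the reference
    // differs from the pooled literal even though the content is equal.
    static boolean heapCopyIsPooledInstance() {
        String literal = "Manna Yang";
        String copy = new String("Manna Yang");
        return copy == literal;
    }

    // The pool already contains an equal string (the literal),
    // so intern() returns that pooled instance.
    static boolean internReturnsPooledInstance() {
        String literal = "Manna Yang";
        String copy = new String("Manna Yang");
        return copy.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(heapCopyIsPooledInstance());    // false
        System.out.println(internReturnsPooledInstance()); // true
    }
}
```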

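When to use String, StringBuffer, and StringBuilder

On concatenation efficiency: if only literals are being joined, simply write ""+""+"", since the optimized + produces a single object. If String objects are being joined at run time, use StringBuffer when the buffer is shared between threads, because its public methods are synchronized; otherwise StringBuilder can be used. Both start with a capacity of 16 characters, or str.length()+16 when constructed from a String, and in JDK 1.8 an expansion grows the array to roughly twice the old length plus 2. Both use the constructor of their abstract parent class AbstractStringBuilder, whose source is as follows: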

AbstractStringBuilder(int var1) {
    this.value = new char[var1];
}
...

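Every expansion allocates a new array and then copies the old contents into it, so it is worth passing the expected final length to the constructor when you start concatenating.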
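As a rough illustration of that advice, the sketch below sums the lengths first and passes the total to the StringBuilder constructor, so no expansion is needed while appending. The class CapacityDemo and the helper join are made up for illustration; capacity() is the standard method for inspecting the current capacity.

```java
class CapacityDemo {
    // Hypothetical helper: joins the parts with a StringBuilder that is
    // sized up front to the exact final length.
    static String join(String[] parts) {
        int expectedLength = 0;
        for (String part : parts) {
            expectedLength += part.length();
        }
        StringBuilder builder = new StringBuilder(expectedLength);
        for (String part : parts) {
            builder.append(part);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(new StringBuilder().capacity());      // 16, the default capacity
        System.out.println(new StringBuilder("abc").capacity()); // 19, length of "abc" plus 16
        System.out.println(new StringBuilder(64).capacity());    // 64, the requested capacity
        System.out.println(join(new String[] {"Test", "Manna", "Yang"}));
    }
}
```

When the final length is not known exactly, a reasonable upper estimate still avoids most of the intermediate array copies.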

All JDK source code referenced above comes from JDK 1.8.
