一个JAVA应用启动缓慢问题排查 --来自jdk securerandom 的问候

  开发某个项目过程当中,就需求,搭建了一套测试环境。很快完成!html

  后来代码中加入了许多新功能,会涉及到反复重启,而后就发现了启动特别慢。这给本来功能就很少的应用增添了许多的负担。java

  我决定改变这一切!找到启动缓慢的根源,加快启动速度!git

思路展开:spring

  1. 启动慢,先看日志,有没有什么异常?
  2. 看下是否是gc 有问题,是否占用很长时间?
  3. 日志卡住,是不是存在网络请求不通状况?
  4. 抓包查看卡住的时候,应用都作什么?
  5. 把线程堆栈打印出来,线程都在作什么?把内存dump 出来瞅瞅吧?
  6. 实在不行,本地debug不行,换一台测试机试试?
  7. 实在不行,远程debug吧?
  8. 定位到点后,再来思考解决好的解决方案?
  9. 除了解决这个问题,咱们还能思考点什么?网络

让我按思路一个个来看解决过程!app

1. 启动慢,先看日志,有没有什么异常?

  这个基本上很快就有答案了,看日志,等待期间没有问题!less

  可是,咱们每每会停留在这个阶段较长一段时间,由于咱们总想经过其余方式,打印点什么出来!dom

  因此,咱们能够会改日志级别,try... catch... 等等一顿操做,而后,才说算了吧,咱们想一想别的招!我我的仍是以为这种思路是ok的!ssh

 

2. 看下是否是gc 有问题,是否占用很长时间?

  这里可能就会涉及,以前并无打印gc日志,而后还得回去在启动参数里加上gc输出!姿式以下:tcp

-Xloggc:/opt/logs/a_gc.log -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/logs/ -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+ParallelRefProcEnabled -XX:ErrorFile=/run/dump/hs_err_pid%p.lo3g -XX:HeapDumpPath=/run/dump

  打开后,你看到的gc 日志多是这样的。 你想从里面发现点什么,好像频繁没有 FullGC 之类的操做,而后又悻悻而归!

10.224: [GC concurrent-root-region-scan-start]
10.249: [GC concurrent-root-region-scan-end, 0.0252691 secs]
10.249: [GC concurrent-mark-start]
10.250: [GC concurrent-mark-end, 0.0005479 secs]
10.250: [GC remark 10.251: [Finalize Marking, 0.0001862 secs] 10.251: [GC ref-proc, 0.0026452 secs] 10.253: [Unloading, 0.0125200 secs], 0.0156381 secs]
 [Times: user=0.06 sys=0.00, real=0.02 secs]
10.266: [GC cleanup 33M->33M(1024M), 0.0012152 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]
13.768: [GC pause (G1 Evacuation Pause) (young), 0.0411037 secs]
   [Parallel Time: 34.5 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 13768.7, Avg: 13769.8, Max: 13773.1, Diff: 4.4]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 1.8, Max: 4.1, Diff: 4.1, Sum: 7.3]
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
      [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.5]
      [Code Root Scanning (ms): Min: 0.0, Avg: 3.1, Max: 8.8, Diff: 8.8, Sum: 12.5]
      [Object Copy (ms): Min: 21.1, Avg: 28.0, Max: 30.5, Diff: 9.4, Sum: 111.9]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
         [Termination Attempts: Min: 1, Avg: 12.2, Max: 22, Diff: 21, Sum: 49]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 29.9, Avg: 33.2, Max: 34.3, Diff: 4.4, Sum: 132.6]
      [GC Worker End (ms): Min: 13803.0, Avg: 13803.0, Max: 13803.0, Diff: 0.0]
   [Code Root Fixup: 0.4 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 5.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 4.6 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.9 ms]
   [Eden: 544.0M(544.0M)->0.0B(573.0M) Survivors: 31.0M->41.0M Heap: 575.0M(1024.0M)->41.0M(1024.0M)]
 [Times: user=0.13 sys=0.02, real=0.04 secs]
17.178: [GC pause (G1 Evacuation Pause) (young), 0.0648366 secs]
   [Parallel Time: 59.7 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 17177.7, Avg: 17181.1, Max: 17184.4, Diff: 6.6]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 3.9, Max: 12.7, Diff: 12.7, Sum: 15.6]
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
      [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.2, Sum: 0.6]
      [Code Root Scanning (ms): Min: 0.0, Avg: 6.5, Max: 19.4, Diff: 19.4, Sum: 26.0]
      [Object Copy (ms): Min: 34.9, Avg: 45.5, Max: 51.2, Diff: 16.3, Sum: 182.1]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.3]
         [Termination Attempts: Min: 1, Avg: 16.5, Max: 24, Diff: 23, Sum: 66]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.2]
      [GC Worker Total (ms): Min: 52.9, Avg: 56.2, Max: 59.5, Diff: 6.6, Sum: 224.8]
      [GC Worker End (ms): Min: 17237.3, Avg: 17237.3, Max: 17237.3, Diff: 0.0]
   [Code Root Fixup: 0.5 ms]
20.648: [GC pause (G1 Evacuation Pause) (young), 0.0653569 secs]
   [Parallel Time: 55.9 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 20648.5, Avg: 20650.4, Max: 20652.3, Diff: 3.7]
      [Ext Root Scanning (ms): Min: 0.1, Avg: 2.9, Max: 9.4, Diff: 9.4, Sum: 11.7]
      [Update RS (ms): Min: 0.0, Avg: 0.2, Max: 0.5, Diff: 0.5, Sum: 0.7]
         [Processed Buffers: Min: 0, Avg: 2.5, Max: 7, Diff: 7, Sum: 10]
      [Scan RS (ms): Min: 0.0, Avg: 0.3, Max: 0.6, Diff: 0.6, Sum: 1.3]
      [Code Root Scanning (ms): Min: 0.0, Avg: 3.2, Max: 10.8, Diff: 10.8, Sum: 12.9]
      [Object Copy (ms): Min: 39.4, Avg: 46.4, Max: 52.1, Diff: 12.8, Sum: 185.5]
      [Termination (ms): Min: 0.0, Avg: 0.9, Max: 1.5, Diff: 1.5, Sum: 3.7]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [GC Worker Total (ms): Min: 52.1, Avg: 54.0, Max: 55.8, Diff: 3.7, Sum: 215.8]
      [GC Worker End (ms): Min: 20704.4, Avg: 20704.4, Max: 20704.4, Diff: 0.0]
   [Code Root Fixup: 0.2 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 9.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 8.5 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.2 ms]
   [Eden: 38.0M(8192.0K)->0.0B(610.0M) Survivors: 43.0M->4096.0K Heap: 83.3M(1024.0M)->54.4M(1024.0M)]
 [Times: user=0.24 sys=0.02, real=0.06 secs]
2826.037: [GC pause (Metadata GC Threshold) (young) (initial-mark), 0.0101504 secs]
   [Parallel Time: 7.1 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 2826037.4, Avg: 2826037.5, Max: 2826037.5, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 1.7, Avg: 2.0, Max: 2.2, Diff: 0.5, Sum: 7.8]
      [Update RS (ms): Min: 0.9, Avg: 1.0, Max: 1.0, Diff: 0.1, Sum: 3.8]
         [Processed Buffers: Min: 7, Avg: 14.2, Max: 25, Diff: 18, Sum: 57]
      [Scan RS (ms): Min: 0.3, Avg: 0.7, Max: 0.9, Diff: 0.6, Sum: 2.8]
      [Code Root Scanning (ms): Min: 0.2, Avg: 0.4, Max: 0.8, Diff: 0.6, Sum: 1.5]
      [Object Copy (ms): Min: 2.7, Avg: 2.9, Max: 3.2, Diff: 0.5, Sum: 11.7]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 6.9, Avg: 7.0, Max: 7.0, Diff: 0.1, Sum: 27.8]
      [GC Worker End (ms): Min: 2826044.4, Avg: 2826044.4, Max: 2826044.4, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 2.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 1.6 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.0 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.5 ms]
   [Eden: 354.0M(438.0M)->0.0B(603.0M) Survivors: 4096.0K->11.0M Heap: 408.3M(1024.0M)->61.4M(1024.0M)]
 [Times: user=0.03 sys=0.00, real=0.01 secs]
2826.047: [GC concurrent-root-region-scan-start]
2826.054: [GC concurrent-root-region-scan-end, 0.0066742 secs]
2826.054: [GC concurrent-mark-start]
2826.116: [GC concurrent-mark-end, 0.0616457 secs]
2826.116: [GC remark 2826.116: [Finalize Marking, 0.0002202 secs] 2826.116: [GC ref-proc, 0.0031923 secs] 2826.120: [Unloading, 0.0131458 secs], 0.0169584 secs]
 [Times: user=0.05 sys=0.00, real=0.01 secs]
2826.134: [GC cleanup 62M->62M(1024M), 0.0010641 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]

  无论怎么样,你知道内存不是问题了。不过通常地,在启动时就存在内存问题的应用可能确实很少!

 

3. 日志卡住,是不是存在网络请求不通状况?

  通常来讲,咱们的应用,每作一些关键操做时,都会有相应的日志输出。因此,当你日志卡住的时候,应该就是哪里出现了问题了!

  而在排除了中间出现full gc 卡顿的问题后,咱们能够认为多是网络出现问题了。

  查询网络问题必备的工具: tcpdump, lsof ...

  用法也很简单,咱们能够在彻底没有任何信息的前提下,进行网络抓包,从而给排查问题一点提示:

tcpdump -iany -XX     # 这样就能够看到应用的网络io状况了, 若是量太大能够先把 -XX 选项去除,只看来往状况:

  大概结果以下:

16:54:25.770463 IP 10.5.3.215.63773 > zt-d-xx.ssh: Flags [.], ack 125340, win 552, length 0
16:54:25.770478 IP zt-d-xx.ssh > 10.5.3.215.63773: Flags [P.], seq 127152:127332, ack 1, win 561, length 180
16:54:25.770482 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 125340, win 552, options [nop,nop,sack 1 {125504:125684}], length 0
16:54:25.770487 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 125684, win 556, length 0
16:54:25.770554 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 127332:127864, ack 1, win 561, length 532
16:54:25.770579 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 127864:127916, ack 1, win 561, length 52
16:54:25.770612 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 127916:128208, ack 1, win 561, length 292
16:54:25.771813 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 125684, win 556, options [nop,nop,sack 1 {125864:126044}], length 0
16:54:25.771822 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128208:128388, ack 1, win 561, length 180
16:54:25.771837 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 126044, win 555, length 0
16:54:25.771840 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 126792, win 552, length 0
16:54:25.771868 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128388:128808, ack 1, win 561, length 420
16:54:25.771884 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128808:128972, ack 1, win 561, length 164
16:54:25.771891 IP 10.1.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 127152, win 556, length 0
16:54:25.771934 IP zt-d-xx.ssh > 10.1.2.215.63773: Flags [P.], seq 128972:129376, ack 1, win 561, length 404
16:54:25.771959 IP zt-d-xx.ssh > 10.5.2.215.63773: Flags [P.], seq 129376:129556, ack 1, win 561, length 180
16:54:25.771982 IP zt-d-xx.ssh > 10.5.2.215.63773: Flags [P.], seq 129556:129736, ack 1, win 561, length 180
16:54:25.779454 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 127864, win 553, length 0
16:54:25.779469 IP zt-d-xx.ssh > 10.5.2.215.63773: Flags [P.], seq 129736:129916, ack 1, win 561, length 180
16:54:25.779474 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 128208, win 552, length 0
16:54:25.779476 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 128808, win 556, length 0
16:54:25.779478 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 129376, win 554, length 0
16:54:25.779480 IP 10.5.2.215.63773 > zt-d-xx.ssh: Flags [.], ack 129736, win 553, length 0
16:54:25.779529 IP zt-d-xx.ssh > 10.5.3.215.63773: Flags [P.], seq 129916:130496, ack 1, win 561, length 580
16:54:25.779567 IP zt-d-xx.ssh > 10.5.3.215.63773: Flags [P.], seq 130496:130660, ack 1, win 561, length 164

  其中,咱们主要关注 Flags 参数. man tcpdump 后能够看到解释:

  Tcpflags are some combination of S (SYN), F (FIN), P (PUSH), R (RST), U (URG), W (ECN CWR), E (ECN-Echo) or '.' (ACK)

  详细说明就是:

    # SYN 表示创建链接;
    # FIN 表示关闭链接;
    # ACK 表示响应;
    # PSH 表示有 DATA数据传输;
    # RST 表示链接重置;
    # URG 表示紧急中止位;

  因此,咱们在这里须要关注这么多吗? 其实不须要的,我以为我们能够先看下是否有 RST 链接重置的状况,若是存在,则说明这个网络是不通的,基本定位网络问题!~ 以下抓包:

17:30:41.635937 IP zt-d-xx.43062 > 172.1.0.17.http: Flags [S], seq 3503889751, win 29200, options [mss 1460,sackOK,TS val 1904549734 ecr 0,nop,wscale 7], length 0
17:30:41.636084 IP 172.1.0.17.http > zt-d-xx.43062: Flags [R.], seq 0, ack 3503889752, win 0, length 0

  另外,要查看下是否是某个网络请求 只有 [S] 信号,没有 [P] 的信号, 则说明只有一端在创建链接,另外一边则无响应,网络问题定位!以下请求一个不存在的地址:

17:13:57.744349 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903545842 ecr 0,nop,wscale 7], length 0
17:13:58.745584 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903546844 ecr 0,nop,wscale 7], length 0
17:14:00.749570 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903548848 ecr 0,nop,wscale 7], length 0
17:14:04.757571 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903552856 ecr 0,nop,wscale 7], length 0
17:14:12.765568 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903560864 ecr 0,nop,wscale 7], length 0
17:14:28.797584 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903576896 ecr 0,nop,wscale 7], length 0
17:15:00.861591 IP zt-d-xx.37604 > 172-1-1-2.lightspeed.hstntx.sbcglobal.net.http: Flags [S], seq 3312915935, win 29200, options [mss 1460,sackOK,TS val 1903608960 ecr 0,nop,wscale 7], length 0

   发了n个 [S] 包,都没有响应; 说明整个网络都是不通的;

  经过这个排查,咱们通常能够排除网络问题!

 

4. 抓包查看卡住的时候,应用都作什么?

  这个点其实能够早点作,使用 top 查看内存,cpu 占用状况!

  使用 vmstat 查看系统虚拟内存状态,也能够报告关于进程、内存、I/O等系统总体运行状态!

  使用 lsof -p <pid> 查看打开的文件状况!应该能够看到一些不同的东西,好比你会看到一些你不知道被打开的文件,你会看到一些你不知道的链接请求!

lsof -i   # 列出全部的网络链接
lsof -p <pid>     # 列出进程对应的文件信息
lsof -i4:8080 # 列表8080 端口对应的 ipv4 的链接

 

5. 把线程堆栈打印出来,线程都在作什么?把内存dump 出来瞅瞅吧?

  若是仍是没有发现什么,那咱们还得回来。看看线程卡在了哪里!

  使用 jstack 把线程堆栈打印出来,使用 jmap 把内存dump 出来,分析!

jstack <pid> | less        # 查看线程信息,是否存在死锁
jmap -dump:format=b,file=/tmp/dumpid.dump <pid>        # 查看堆信息,是否须要特别的对象

 

6. 实在不行,本地debug不行,换一台测试机试试?

  对于本地debug, 我只能说,通常环境都不通的,并且本地通常也很复现场景!

  换台机器测试的目的,实际上是想确认问题是否在全部机器上复现,由于若是是和机器自己相关的问题,每每是很难排查的。(我说的没有相关经验的同窗)

  因此,找一台另外机器,代码跑起来,若是其余地方正常,则该问题是机器相关的。

  那么机器相关的bug又该怎么办呢?其实,还得具体问题,具体分析!

  不过幸亏,咱们还有 remote debug 功能,直接链接上去就能够调试了。问题天然也就清晰起来!

 

7. 实在不行,远程debug吧?

  remote debug 请查看上一篇文章: 排障利器之远程调试与监控 --jmx & remote debug 

  链接上以后,先大概猜想可能会出问题的地方,进行断点。判断依据,应该是以前的排查结果,线程状况等等!

  进行大概估计后,在可能的地方进行单步,卡住的地方,其中必然出现了问题。只需再进入内部重复刚才的作法便可!

  优势是这样调试下来,一定能找到问题出现的地方。缺点是:可能须要反复重现问题(可能就是反复重启)。可是这样显得没有技术含量,还有就是在问题难以复现的场景,很难抓住机会解决问题。

  因此,经验真的很重要!

  卡住时的堆栈以下:

"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at java.io.FileInputStream.readBytes(FileInputStream.java:-1)
      at java.io.FileInputStream.read(FileInputStream.java:255)
      at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedBytes(SeedGenerator.java:539)
      at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:144)
      at sun.security.provider.SecureRandom$SeederHolder.<clinit>(SecureRandom.java:203)
      at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:221)
      - locked <0x1af3> (a sun.security.provider.SecureRandom)
      at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
      at java.security.SecureRandom.next(SecureRandom.java:491)
      at java.util.Random.nextLong(Random.java:424)
      at com.taobao.notify.utils.StrongRandom.<init>(StrongRandom.java:45)
      at com.taobao.notify.utils.UniqId.<init>(UniqId.java:38)
      at com.taobao.notify.utils.UniqId.<clinit>(UniqId.java:36)
      at com.taobao.notify.client.impl.DefaultNotifyClient.<init>(DefaultNotifyClient.java:91)
      at com.taobao.notify.client.impl.DefaultNotifyClient.<init>(DefaultNotifyClient.java:97)
      at com.taobao.notify.client.NotifyClientFactory.<clinit>(NotifyClientFactory.java:13)
      at com.taobao.notify.remotingclient.DefaultNotifyManager.<init>(DefaultNotifyManager.java:38)
      at com.taobao.notify.remotingclient.NotifyManagerBean.init(NotifyManagerBean.java:90)
      - locked <0x1a78> (a com.alipay.common.event.tbnotify.adapter.UniformEventPublisherAdapter)
      at com.alipay.common.event.tbnotify.client.NotifyManagerBeanEx.init(NotifyManagerBeanEx.java:64)
      at com.alipay.common.event.tbnotify.adapter.EventSendFacadeTBNotifyImpl.init(EventSendFacadeTBNotifyImpl.java:99)
      at com.alipay.common.event.tbnotify.client.NotifyManagerBeanEx.init(NotifyManagerBeanEx.java:84)
      at com.alipay.common.event.tbnotify.adapter.UniformEventPublisherAdapter.init(UniformEventPublisherAdapter.java:80)
      at com.alipay.boot.dms.client.util.InitializeUtil.initializePublisher(InitializeUtil.java:14)
      at com.alipay.boot.dms.spring.factory.DmsPublisherFactoryBean.afterPropertiesSet(DmsPublisherFactoryBean.java:59)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1642)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1579)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:553)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      - locked <0x1af4> (a java.util.concurrent.ConcurrentHashMap)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
      at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:351)
      at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:108)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1486)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1231)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:551)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.autowireResource(CommonAnnotationBeanPostProcessor.java:522)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.getResource(CommonAnnotationBeanPostProcessor.java:496)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor$ResourceElement.getResourceToInject(CommonAnnotationBeanPostProcessor.java:627)
      at org.springframework.beans.factory.annotation.InjectionMetadata$InjectedElement.inject(InjectionMetadata.java:169)
      at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:88)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.postProcessPropertyValues(CommonAnnotationBeanPostProcessor.java:318)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1219)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:551)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
      at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:207)
      at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1131)
      at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1059)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.autowireResource(CommonAnnotationBeanPostProcessor.java:518)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.getResource(CommonAnnotationBeanPostProcessor.java:496)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor$ResourceElement.getResourceToInject(CommonAnnotationBeanPostProcessor.java:627)
      at org.springframework.beans.factory.annotation.InjectionMetadata$InjectedElement.inject(InjectionMetadata.java:169)
      at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:88)
      at org.springframework.context.annotation.CommonAnnotationBeanPostProcessor.postProcessPropertyValues(CommonAnnotationBeanPostProcessor.java:318)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1219)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:551)
      at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
      at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
      at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
      at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
      at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
      at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:754)
      at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:866)
      at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:542)
      - locked <0x1af5> (a java.lang.Object)
      at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.refresh(EmbeddedWebApplicationContext.java:122)
      at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:761)
      at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:371)
      at org.springframework.boot.SpringApplication.run(SpringApplication.java:315)
      at org.springframework.boot.SpringApplication.run(SpringApplication.java:1186)
      at org.springframework.boot.SpringApplication.run(SpringApplication.java:1175)
      at com.xx.SOFABootSpringApplication.main(SOFABootSpringApplication.java:15)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
      at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
      at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
      at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:58)

  而根据堆栈进行的大概猜测,开始排查,过程以下:

    // 初始代码位置:
    // com.taobao.notify.remotingclient.NotifyManagerBean
    public synchronized void init() {
        if (inited.get()) {
            return;
        }

        if (StringUtil.isEmpty(this.name) || StringUtil.isEmpty(this.description)) {
            throw new RuntimeException("The name and description property is required.");
        }
        if (null == notifyManager) {

            AllNotifyClientProperties properties = new AllNotifyClientProperties();
            properties.setCheckMessageTPConfig(checkMessageTPConfig);
            properties.setMessageTPConfig(messageTPConfig);
            properties.setMessageProperties(messageProperties);
            properties.setAntVipConfig(antVipConfig);
            
            // 问题出在这里,new 这个对象时,会触发不少的其余类加载,而这里面就包含了有问题的类
            notifyManager = new DefaultNotifyManager(groupId, appName, name, description,
                messageListener, checkMessageListener, 1, enableCrc, properties);

            notifyManager.subscribeAllMessages(this.subscribeMessages, this.bindingList);

            /** ������ظ���groupId������ɾ��֮ǰ��publish topic*/
            //notifyManager.resetPublishTopics(publishTopics);
            for (String publishTopic : publishTopics) {
                notifyManager.addPublishTopic(publishTopic);
            }

            // ��ʼ��ʱ����servertag��url�Ķ�Ӧ�б�
            notifyManager.resetServerTagUrls();
        }

        inited.set(true);
    }
    
    // com.taobao.notify.remotingclient.DefaultNotifyManager  这个构造方法是 ok 的
    public DefaultNotifyManager(String groupId, String appName, String name, String description,
                                MessageListener msgListener, CheckMessageListener checkMsgListener,
                                int connCount, boolean enableCrc,
                                AllNotifyClientProperties properties) {
        initNotifyClient(properties);

        if (StringUtil.isBlank(groupId)) {
            throw new IllegalArgumentException("GroupID cannot be blank.");
        }
        if (StringTools.containsWhitespace(groupId)) {
            throw new IllegalArgumentException("GroupID cannot contain the blank character. ["
                                               + groupId + "]");
        }
        this.groupId = groupId.trim();
        this.appName = StringUtil.trim(appName);

        if (null == properties) {
            properties = new AllNotifyClientProperties();
        }
        this.notifyclient.addGroup(this.groupId, this.appName, name, description, msgListener,
            checkMsgListener, properties.getMessageProperties(), properties.getWaitForConnTime(),
            enableCrc, properties.getAntVipConfig());
        this.notifyclient.setConnectionCount(connCount);
    }
    // 可是问题在静态字段的初始化问题 
    // com.taobao.notify.remotingclient.DefaultNotifyManager
    public class DefaultNotifyManager implements NotifyManager, DefaultNotifyManagerMBean {
        //��־
        private static final Logger logger                   = MsgBrokerClientLoggerFactory
                                                                .getLogger(DefaultNotifyManager.class);

        // 这看起来像是个单例的 client
        private NotifyClient        notifyclient             = NotifyClientFactory
                                                               .getSingletonNotifyClient();
    }
    // com.taobao.notify.client.NotifyClientFactory
    public class NotifyClientFactory implements FactoryBean {
    
        // 初始化类时,会先初始化一个单例
        static NotifyClient notifyClient = new DefaultNotifyClient();

        /**
         * 
         * @return
         */
        public static NotifyClient getSingletonNotifyClient() {
            return notifyClient;
        }
    }
    // 到此,一切看起来都很美好,让我继续往下看
    // com.taobao.notify.client.impl.DefaultNotifyClient
    public class DefaultNotifyClient implements NotifyClient {
        private static final String               LogPrefix                 = LoggerPrefix
                                                                                .makeLogPrefix(DefaultNotifyClient.class);
        static final Logger                       logger                    = MsgBrokerClientLoggerFactory
                                                                                .getLogger(DefaultNotifyClient.class);

        private volatile NotifyClientConfig       notifyClientConfig        = null;

        private static final int                  MAX_TIMES                 = 3;

        protected final NotifyGroupManager        notifyGroupManager        = new NotifyGroupManager();

        private final RemotingService             remotingService;

        protected final ClientSubscriptionManager clientSubscriptionManager = new ClientSubscriptionManager();

        protected final PublishTopicsManager      publishTopicsManager      = new PublishTopicsManager();

        private final ThreadPoolExecutor          asynSendMessageWorkTP;

        private final LoggingService              loggingService            = LoggingService
                                                                                .getInstance();

        private final ReliableAsynSendManager     reliableAsynSendManager;

        private final MessageProperties           reliableMessageProperties = new MessageProperties();

        // 问题在于 UniqId 的生成
        private final UniqId                      uniqIdInstance            = UniqId.getInstance();

        /**
         * 
         */
        public DefaultNotifyClient() {
            this(new NotifyClientConfig());
        }

        /**
         * @param notifyClientConfig
         */
        public DefaultNotifyClient(final NotifyClientConfig notifyClientConfig) {
            if (null == this.notifyClientConfig) {
                this.notifyClientConfig = notifyClientConfig;
            }
            if (notifyClientConfig.isDebug()) {
                this.remotingService = new IntegratedMockRemotingService(
                    this.clientSubscriptionManager, notifyClientConfig.getDebugRemoteSubscribers(),
                    this.notifyGroupManager, notifyClientConfig.getDebugLocalPort());
                logger.warn(LogPrefix + "Note that, this is a MsgbrokerClient in the DEBUG mode.");
                logger.warn(LogPrefix + "In the DEBUG mode, local port ["
                            + notifyClientConfig.getDebugLocalPort() + "]");
                logger.warn(LogPrefix + "In the DEBUG mode, connections ["
                            + notifyClientConfig.getDebugRemoteSubscribers() + "]");
            } else {
                this.remotingService = new DefaultRemotingService(
                    this.notifyClientConfig.getRemotingType(), this.notifyGroupManager,
                    notifyClientConfig);
            }

            // ��ֹѹ��
            this.reliableMessageProperties.setCompressSize(Integer.MAX_VALUE);
            this.reliableMessageProperties.setMaxStringMessageSize(Integer.MAX_VALUE);

            this.asynSendMessageWorkTP = new ManagedThreadPoolExecutor(notifyClientConfig
                .getAsynSendMessageTPConfig().getCorePoolSize(), notifyClientConfig
                .getAsynSendMessageTPConfig().getMaxPoolSize(), notifyClientConfig
                .getAsynSendMessageTPConfig().getKeepAliveTime(), notifyClientConfig
                .getAsynSendMessageTPConfig().getMaxQueueSize(), "asynSendMsgTP-" + this.hashCode(),
                new ThreadPoolExecutor.AbortPolicy());

            this.reliableAsynSendManager = new ReliableAsynSendManager(this, notifyGroupManager,
                notifyClientConfig);
            if (notifyClientConfig.isPreInitializeReliableAsynSendManager()) {
                this.reliableAsynSendManager.init();
            }

            // DefaultNotifyClientʵ�������ж������Ҫ���䲻ͬ��ID
            try {
                ThreadPoolConfig messageTPConfig = notifyClientConfig.getMessageTPConfig();
                Lookout.registry().info(
                    Lookout.registry().createId("msgbroker.client.group.infos")
                        .withTag("hashcode", String.valueOf(this.hashCode()))
                        .withTag("corePoolSize", String.valueOf(messageTPConfig.getCorePoolSize()))
                        .withTag("maxPoolSize", String.valueOf(messageTPConfig.getMaxPoolSize()))
                        .withTag("queueSize", String.valueOf(messageTPConfig.getMaxQueueSize()))
                        .withTag("channelSize", String.valueOf(messageTPConfig.getChannelSize())),
                    new Info<Set<String>>() {
                        public Set<String> value() {
                            return notifyGroupManager.getGroupIds();
                        }
                    });
            } catch (Exception e) {
                logger.warn("register msgbroker.client.group.infos failed, " + e.getMessage());
            }
        }
    }
    // UniqId 生成以下
    // com.taobao.notify.utils.UniqId
    public class UniqId {
        private static final Logger            log       = MsgBrokerClientLoggerFactory.getLogger(UniqId.class);
        private static final String            LogPrefix = LoggerPrefix.makeLogPrefix(UniqId.class);

        private static char[]                  digits    = { '0', '1', '2', '3', '4', '5', '6', '7',
                '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'  };

        private static Map<Character, Integer> rDigits   = new HashMap<Character, Integer>(16);
        static {
            for (int i = 0; i < digits.length; ++i) {
                rDigits.put(digits[i], i);
            }
        }

        private static UniqId                  me        = new UniqId();
        private String                         hostAddr;
        // 实际问题在于这里了, StrongRandom 的问题
        private Random                         random    = new StrongRandom();
        private MessageDigest                  mHasher;
        private UniqTimer                      timer     = new UniqTimer();

        private ReentrantLock                  opLock    = new ReentrantLock();

        private UniqId() {
            try {
                InetAddress addr = InetAddress.getLocalHost();

                hostAddr = addr.getHostAddress();
            } catch (IOException e) {
                log.error(LogPrefix + "Get HostAddr Error", e);
                hostAddr = String.valueOf(System.currentTimeMillis());
            }

            if (Util.isBlank(hostAddr) || "127.0.0.1".equals(hostAddr)) {
                hostAddr = String.valueOf(System.currentTimeMillis());
            }

            if (log.isDebugEnabled()) {
                log.debug(LogPrefix + "hostAddr is:" + hostAddr);
            }

            try {
                mHasher = MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException nex) {
                mHasher = null;
                log.error(LogPrefix + "new MD5 Hasher error", nex);
            }
        }

        /**
         * ��ȡUniqIDʵ��
         * 
         * @return UniqId
         */
        public static UniqId getInstance() {
            return me;
        }
    }
    
    // StrongRandom 生成缓慢
    // com.taobao.notify.utils.StrongRandom
    public StrongRandom() {
        try {
            // This operation may block on some systems with low entropy. See
            // this page
            // for workaround suggestions:
            // http://docs.codehaus.org.display.JETTY.Connectors+slow+to+startup
            random = SecureRandom.getInstance(SESSION_ID_RANDOM_ALGORITHM);
        } catch (NoSuchAlgorithmException e) {
            try {
                random = SecureRandom.getInstance(SESSION_ID_RANDOM_ALGORITHM_ALT);
                weakRandom = false;
            } catch (NoSuchAlgorithmException e_alt) {
                random = new Random();
                weakRandom = true;
            }
        }
        // 就是这里缓慢了,卡住大约一分钟,且没法调试
        random.setSeed(random.nextLong() ^ System.currentTimeMillis() ^ hashCode()
                       ^ Runtime.getRuntime().freeMemory());
    }
    

  最后,定位到是 随机数生成有问题了!

 

8. 定位到点后,再来思考解决好的解决方案?

  如上,咱们知道了是由于 SecureRandom 生成随机数时出现了问题! 

  那么,如何解决?我能想到的,就是网上搜索答案了。由于这时候已经没有什么能够参考的了。

  搜索内容就是, SecureRandom 生成随机缓慢解决方案? 而后就有答案了!

  固然,你能够咨询遇到过这些问题的前辈们,那样更快速!

  缘由以下:

java生成随机码时,会使用两个文件:
1. /dev/random , 随机性高,和真实的物理环境有关,阻塞模式。(本文出现的启动缓慢问题,就出在这个上面)
2. /dev/urandom ,伪随机模式,非阻塞,随机性不如 random。
因此,既然是生成random模式有问题,那么想办法换掉这个模式就ok了。

  jdk中,提供相应的替换方法,咱们先看 $JAVA_HOME/jre/lib/security/java.security 这个里面有个随机数的配置,其说明以下:

#
# Sun Provider SecureRandom seed source.
#
# Select the primary source of seed data for the "SHA1PRNG" and
# "NativePRNG" SecureRandom implementations in the "Sun" provider.
# (Other SecureRandom implementations might also use this property.)
#
# On Unix-like systems (for example, Solaris/Linux/MacOS), the
# "NativePRNG" and "SHA1PRNG" implementations obtains seed data from
# special device files such as file:/dev/random.
#
# On Windows systems, specifying the URLs "file:/dev/random" or
# "file:/dev/urandom" will enable the native Microsoft CryptoAPI seeding
# mechanism for SHA1PRNG.
#
# By default, an attempt is made to use the entropy gathering device
# specified by the "securerandom.source" Security property.  If an
# exception occurs while accessing the specified URL:
#
#     SHA1PRNG:
#         the traditional system/thread activity algorithm will be used.
#
#     NativePRNG:
#         a default value of /dev/random will be used.  If neither
#         are available, the implementation will be disabled.
#         "file" is the only currently supported protocol type.
#
# The entropy gathering device can also be specified with the System
# property "java.security.egd". For example:
#
#   % java -Djava.security.egd=file:/dev/random MainClass
#
# Specifying this System property will override the
# "securerandom.source" Security property.
#
# In addition, if "file:/dev/random" or "file:/dev/urandom" is
# specified, the "NativePRNG" implementation will be more preferred than
# SHA1PRNG in the Sun provider.
#
securerandom.source=file:/dev/random

  即 能够直接改这个文件,也能够经过命令行加参数修改,且命令行优先级更高,因此咱们使用命令行修改便可:

    -Djava.security.egd=file:/dev/./urandom # 在项目中添加java_opts,里面添加启动参数 urandom 

  如此,再次验证后,启动正常了。问题done。

 

9. 除了解决这个问题,咱们还能思考点什么?

  关于 SecureRandom 为何慢的缘由,能够看下这篇文章:  https://blog.csdn.net.zengdeqing2012/article.details/78133370

 

老话:说了不必定有机会,但不说必定没机会。

相关文章
相关标签/搜索