分析Thread Dump

当Java Web程序运行变慢,或者发生故障时,须要使用Thread Dumps. 若是你以为ThreadDumps很是复杂,这篇文章极可能帮助你。将会分析Java中的线程,线程如何建立的,如何管理线程,怎么从运行中的程序中dump 线程,最后怎么分析他们获得阻塞和存在瓶颈的线程。本文是在应用程序调试下获得的结果。java

    Java和线程linux

    一个web server使用几十到几百条线程去处理大量的并发用户。若是多条线程使用共享的资源,没法避免线程之间对数据的竞争,有时候还会发生死锁。web

    线程竞争是web程序上不一样的线程去访问共享资源,一条线程等待另外线程释放锁。例如,在记录log的时候,线程记录log时,必须先得到锁,而后去再访问共享资源。数据库

    死锁是一种特殊的线程竞争,两个或多个线程要完成本身的任务,都要必需要等待其余的线程完成他们的任务。apache

    线程竞争会带来各类不一样的问题,为了分析这些问题,须要使用Thread Dump。Thread Dump记录了每一个线程真正的状态。多线程

    Java线程的背景并发

    线程同步oracle

     多条线程之间能够同时执行,为了确保多线程在使用共享资源上面的通用性,使用线程同步保证在同一时间只能有一条线程能够访问共享资源。app

     线程同步在Java中可使用监视器。每一个Java对象都有一个监视器,这个监视器只能被一个线程拥有。当一个线程要得到另外线程拥有的监视器时,须要进入等待队列直到线程释放监视器。socket

    线程的状态

    为了分析Thread Dump ,须要先了解线程的状态。线程的状态是在java.lang.Thread.State中。

NEW:线程被建立可是尚未被执行

RUNNABLE:线程正在占用cpu而且在执行任务

BLOCKED:线程为了得到监视器须要等待其余线程释放锁

WAITING:调用了wait,join,park方法使线程等待-无限期等待

TIMED_WAITING:调用了sleep,wait,join,park方法使线程等待--有限期等待

线程类型

java中线程能够分为两种:

    1.    后台线程

    2.    非后台线程

当没有其余的非后台线程运行时后台线程将会终止。即便你不建立线程,java应用默认也会建立不少线程。这些大多数都是后台线程,主要为了执行gc或者jmx等类型的任务

从 'static void main(String[] args)’方法中开启的线程叫作非后台线程,当这些线程中止时,其余的全部后台线程也会中止()

得到一个Thread Dump

将会介绍三种经常使用的方法。请注意还会有其余不少方法能够获取Thread Dump。一个Thread dump仅仅能够显示测量时的线程状态。因此为了查看线程状态的变化,建议5到10次,每次间隔5秒。

使用jstack得到Thread Dump

经过使用jps命令来得到当前正在运行的Java程序的PID

[user@linux ~]$ jps -v
 
25780 RemoteTestRunner -Dfile.encoding=UTF-8
25590 sub.rmi.registry.RegistryImpl 2999 -Dapplication.home=/home1/user/java/jdk.1.6.0_24 -Xms8m
26300 sun.tools.jps.Jps -mlvV -Dapplication.home=/home1/user/java/jdk.1.6.0_24 -Xms8m

使用PID做为jstack的参数得到Thread Dump

[user@linux ~]$ jstack -f 5824

在Linux终端中生成

经过使用ps -ef命令去得到当前正在运行的Java进程

[user@linux ~]$ ps - ef | grep java

user      2477          1    0 Dec23 ?         00:10:45 ...

user    25780 25361   0 15:02 pts/3    00:00:02 ./jstatd -J -Djava.security.policy=jstatd.all.policy -p 2999

user    26335 25361   0 15:49 pts/3    00:00:00 grep java

Use the extracted pid as the parameter of kill –SIGQUIT(3) to obtain a thread dump.

Thread Information from the Thread Dump File

"pool-1-thread-13" prio=6 tid=0x000000000729a000 nid=0x2fb4 runnable [0x0000000007f0f000] java.lang.Thread.State: RUNNABLE
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.read(SocketInputStream.java:129)
              at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
              at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
              at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
              - locked <0x0000000780b7e688> (a java.io.InputStreamReader)
              at java.io.InputStreamReader.read(InputStreamReader.java:167)
              at java.io.BufferedReader.fill(BufferedReader.java:136)
              at java.io.BufferedReader.readLine(BufferedReader.java:299)
              - locked <0x0000000780b7e688> (a java.io.InputStreamReader)
              at java.io.BufferedReader.readLine(BufferedReader.java:362)

Thread name:当使用 Java.lang.Thread类去生成一个线程,将被命名为Thre-(Number),然而当使用java.util.concurrent,ThreadFactory类,将会被命名为pool-(Number)-thread-(Number)

Priority:表示线程的优先级

Thread ID:表明线程的惟一id。(经过线程id能够得到一些有用的信息,包括cpu使用率,或者内存使用率)

Thread status:表明线程的状态

Thread callstack:表明线程调用的堆栈信息

Thread Dump模式的类型

当没法得到一个锁(阻塞)

当一个线程占领住锁而其余线程没法得到这个锁,而致使应用程序全部的性能都降低。在下面的例子中,

BLOCKED_TEST pool-1-thread-1 线程运行时得到<0x0000000780a000b0>锁, 同时BLOCKED_TEST pool-1-thread-2 和 BLOCKED_TEST pool-1-thread-3正在等待得到<0x0000000780a000b0>锁

"BLOCKED_TEST pool-1-thread-1" prio=6 tid=0x0000000006904800 nid=0x28f4 runnable [0x000000000785f000]
   java.lang.Thread.State: RUNNABLE
                at java.io.FileOutputStream.writeBytes(Native Method)
                at java.io.FileOutputStream.write(FileOutputStream.java:282)
                at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
                at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
                - locked <0x0000000780a31778> (a java.io.BufferedOutputStream)
                at java.io.PrintStream.write(PrintStream.java:432)
                - locked <0x0000000780a04118> (a java.io.PrintStream)
                at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
                at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
                at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:85)
                - locked <0x0000000780a040c0> (a java.io.OutputStreamWriter)
                at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:168)
                at java.io.PrintStream.newLine(PrintStream.java:496)
                - locked <0x0000000780a04118> (a java.io.PrintStream)
                at java.io.PrintStream.println(PrintStream.java:687)
                - locked <0x0000000780a04118> (a java.io.PrintStream)
                at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:44)
                - locked <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState)
                at com.nbp.theplatform.threaddump.ThreadBlockedState$1.run(ThreadBlockedState.java:7)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)
   Locked ownable synchronizers:
                - <0x0000000780a31758> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
"BLOCKED_TEST pool-1-thread-2" prio=6 tid=0x0000000007673800 nid=0x260c waiting for monitor entry [0x0000000008abf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:43)
                - waiting to lock <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState)
                at com.nbp.theplatform.threaddump.ThreadBlockedState$2.run(ThreadBlockedState.java:26)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)
   Locked ownable synchronizers:
                - <0x0000000780b0c6a0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
"BLOCKED_TEST pool-1-thread-3" prio=6 tid=0x00000000074f5800 nid=0x1994 waiting for monitor entry [0x0000000008bbf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:42)
                - waiting to lock <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState)
                at com.nbp.theplatform.threaddump.ThreadBlockedState$3.run(ThreadBlockedState.java:34)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)
   Locked ownable synchronizers:
                - <0x0000000780b0e1b8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

当是死锁的状态

线程A须要得到线程B的锁才能继续执行任务,同时线程B须要得到线程A的锁才能继续执行任务。在Thread Dump中,能够发现 DEADLOCK_TEST-1 线程拥有0x00000007d58f5e48锁,而且试着去获取0x00000007d58f5e60这把锁。另外  DEADLOCK_TEST-2 线程拥有0x00000007d58f5e60锁,而且尝试获取0x00000007d58f5e78锁。,DEADLOCK_TEST-3 线程拥有0x00000007d58f5e78锁,而且尝试得到0x00000007d58f5e48锁。能够看得出来,每一个线程都在等待另外线程的锁,这种状态知道一个线程放弃锁以前都不会被改变。

"DEADLOCK_TEST-1" daemon prio=6 tid=0x000000000690f800 nid=0x1820 waiting for monitor entry [0x000000000805f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197)
                - waiting to lock <0x00000007d58f5e60> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182)
                - locked <0x00000007d58f5e48> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135)
   Locked ownable synchronizers:
                - None
"DEADLOCK_TEST-2" daemon prio=6 tid=0x0000000006858800 nid=0x17b8 waiting for monitor entry [0x000000000815f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197)
                - waiting to lock <0x00000007d58f5e78> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182)
                - locked <0x00000007d58f5e60> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135)
   Locked ownable synchronizers:
                - None
"DEADLOCK_TEST-3" daemon prio=6 tid=0x0000000006859000 nid=0x25dc waiting for monitor entry [0x000000000825f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197)
                - waiting to lock <0x00000007d58f5e48> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182)
                - locked <0x00000007d58f5e78> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135)
   Locked ownable synchronizers:
                - None

持续等待来自远程服务的信息

线程看起来是正常的,由于它的状态一直都是RUNNABLE,然而当你将thread dump按时间有序的排列,你能够看出来socketReadThread线程一直在读socket

"socketReadThread" prio=6 tid=0x0000000006a0d800 nid=0x1b40 runnable [0x00000000089ef000]
   java.lang.Thread.State: RUNNABLE
                at java.net.SocketInputStream.socketRead0(Native Method)
                at java.net.SocketInputStream.read(SocketInputStream.java:129)
                at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
                at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
                at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
                - locked <0x00000007d78a2230> (a java.io.InputStreamReader)
                at sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:107)
                - locked <0x00000007d78a2230> (a java.io.InputStreamReader)
                at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:93)
                at java.io.InputStreamReader.read(InputStreamReader.java:151)
                at com.nbp.theplatform.threaddump.ThreadSocketReadState$1.run(ThreadSocketReadState.java:27)
                at java.lang.Thread.run(Thread.java:662)

当等待

线程保持wait状态,在thread dump中,ioWaitThread线程等待从LinkedBlockingQueue中接收信息,若是LinkedBlockQueue一直没有信息,那么线程的状态将不会改变。

"IoWaitThread" prio=6 tid=0x0000000007334800 nid=0x2b3c waiting on condition [0x000000000893f000]
   java.lang.Thread.State: WAITING (parking)
                at sun.misc.Unsafe.park(Native Method)
                - parking to wait for  <0x00000007d5c45850> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
                at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
                at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:440)
                at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:629)
                at com.nbp.theplatform.threaddump.ThreadIoWaitState$IoWaitHandler2.run(ThreadIoWaitState.java:89)
                at java.lang.Thread.run(Thread.java:662)

当线程资源不能正常组织

当线程的资源没法正常组织,不用的线程将会堆积起来。若是发生了,建议监听下线程组织的过程或者检查线程终止的条件

如何利用Thread Dump解决问题

例子1:当cpu使用率不正常的偏高

    1 提取最高cpu使用率的线程

[user@linux ~]$ ps -mo pid.lwp.stime.time.cpu -C java
 
     PID         LWP          STIME                  TIME        %CPU
10029               -         Dec07          00:02:02           99.5
         -       10039        Dec07          00:00:00              0.1
         -       10040        Dec07          00:00:00           95.5

从应用中找出cpu使用率的线程

得到轻量级锁使用的cpu率,并将LWP对应的数字(10039) 转换成十六进制(0x2737)

    2 得到thread dump后,检测线程的状态

    提取出应用中thread dump中pid为10029 ,再找出nid为0x2737的线程

"NioProcessor-2" prio=10 tid=0x0a8d2800 nid=0x2737 runnable [0x49aa5000]
java.lang.Thread.State: RUNNABLE
                at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
                at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
                at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
                at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
                - locked <0x74c52678> (a sun.nio.ch.Util$1)
                - locked <0x74c52668> (a java.util.Collections$UnmodifiableSet)
                - locked <0x74c501b0> (a sun.nio.ch.EPollSelectorImpl)
                at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
                at external.org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
                at external.org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:708)
                at external.org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)

    每隔一小时提取几回,而后分析线程的状态以肯定问题。

例2:当性能不正常的降低

    在得到thread dump后,查询下面一组线程的状态是BLOCKED

" DB-Processor-13" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f000]
java.lang.Thread.State: BLOCKED (on object monitor)
                at beans.ConnectionPool.getConnection(ConnectionPool.java:102)
                - waiting to lock <0xe0375410> (a beans.ConnectionPool)
                at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111)
                at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43)
"DB-Processor-14" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f020]
java.lang.Thread.State: BLOCKED (on object monitor)
                at beans.ConnectionPool.getConnection(ConnectionPool.java:102)
                - waiting to lock <0xe0375410> (a beans.ConnectionPool)
                at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111)
                at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43)
" DB-Processor-3" daemon prio=5 tid=0x00928248 nid=0x8b waiting for monitor entry [0x000000000825d080]
java.lang.Thread.State: RUNNABLE
                at oracle.jdbc.driver.OracleConnection.isClosed(OracleConnection.java:570)
                - waiting to lock <0xe03ba2e0> (a oracle.jdbc.driver.OracleConnection)
                at beans.ConnectionPool.getConnection(ConnectionPool.java:112)
                - locked <0xe0386580> (a java.util.Vector)
                - locked <0xe0375410> (a beans.ConnectionPool)
                at beans.cus.Cue_1700c.GetNationList(Cue_1700c.java:66)
                at org.apache.jsp.cue_1700c_jsp._jspService(cue_1700c_jsp.java:120)

If the threads are BLOCKED, extract the threads related to the lock that the threads are trying to obtain.

经过thread dump,能够确认处在BLOCKED状态的线程,由于<0xe0375410>锁没法被得到,这种问题能够目前持有锁的堆栈跟踪中解决问题。

在使用dbms时,有两个缘由会致使上面描述的问题频繁发生。第一个缘由是:

不适当的配置:尽管这些线程仍然在工做,由于数据库链接池的不适当配置,使得没法显示出最好的性能。经过屡次收集分析thread dumps,将会发现一些以前处于BLOCKED状态的线程会处于不一样的状态。

第二个缘由是:不正当的链接。若是和数据库的链接保持异常,那么线程会一直等待超时。在这个例子中,经过屡次收集分析thread dumps,将会看到和数据库有关的线程会一直保持BLOCKED状态。经过适当的参数,好比超时时间,能够缩短问题发生的时间

简单的Thread Dump编码:

线程的命名

使用java.lang.Thread 类来建立一个线程对象,将被命名为Thread-(Number)。当使用java.util.concurrent.DefaultThreadFactory对象建立线程,将被命名为pool-(Number)-thread-(Number)。在分析应用中几十到几百的线程时,若是全部的线程都使用默认的名字,分析会变得很是困难,很难区别出线程。

所以,能够养成在建立线程时为线程命名的好习惯

经过java.lang.Thread建立线程,能够给线程构造参数中传入自定义的名

public Thread(Runnable target, String name);
public Thread(ThreadGroup group, String name);
public Thread(ThreadGroup group, Runnable target, String name);
public Thread(ThreadGroup group, Runnable target, String name, long stackSize);

经过java.util.concurrent.ThreadFactory建立线程,能够取本身ThredFactory的名。若是不须要特殊的功能,可使用下面的MyThreadFactory

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;
 
public class MyThreadFactory implements ThreadFactory {
  private static final ConcurrentHashMap<String, AtomicInteger> POOL_NUMBER =
                                                       new ConcurrentHashMap<String, AtomicInteger>();
  private final ThreadGroup group;
  private final AtomicInteger threadNumber = new AtomicInteger(1);
  private final String namePrefix;
  
  public MyThreadFactory(String threadPoolName) {
       
      if (threadPoolName == null) {
          throw new NullPointerException("threadPoolName");
      }
            POOL_NUMBER.putIfAbsent(threadPoolName, new AtomicInteger());
       
      SecurityManager securityManager = System.getSecurityManager();
      group = (securityManager != null) ? securityManager.getThreadGroup() :
                                                    Thread.currentThread().getThreadGroup();
       
      AtomicInteger poolCount = POOL_NUMBER.get(threadPoolName);
 
      if (poolCount == null) {
            namePrefix = threadPoolName + " pool-00-thread-";
      } else {
            namePrefix = threadPoolName + " pool-" + poolCount.getAndIncrement() + "-thread-";
      }
  }
  
  public Thread newThread(Runnable runnable) {
      Thread thread = new Thread(group, runnable, namePrefix + threadNumber.getAndIncrement(), 0);
 
      if (thread.isDaemon()) {
            thread.setDaemon(false);
      }
 
      if (thread.getPriority() != Thread.NORM_PRIORITY) {
            thread.setPriority(Thread.NORM_PRIORITY);
      }
 
      return thread;
  }
}

经过MBean来得到更多的信息

能够经过MBean来得到更多的ThreadInfo的信息。一样使用ThreadInfo也能够得到更多的在thread dumps中难以获取的信息。

ThreadMXBean mxBean = ManagementFactory.getThreadMXBean();
long[] threadIds = mxBean.getAllThreadIds();
ThreadInfo[] threadInfos =
                mxBean.getThreadInfo(threadIds);
 
for (ThreadInfo threadInfo : threadInfos) {
  System.out.println(
      threadInfo.getThreadName());
  System.out.println(
      threadInfo.getBlockedCount());
  System.out.println(
      threadInfo.getBlockedTime());
  System.out.println(
      threadInfo.getWaitedCount());
  System.out.println(
      threadInfo.getWaitedTime());
}

使用ThreaInfo中的信息能够得到等待线程或者阻塞线程所用的时间,也能够得到已经被停用了的异常线程的列表

相关文章
相关标签/搜索