Problem description
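We run several Tomcat instances in production, and at one point every request suddenly stopped getting a response. Our web server is nginx, which reverse-proxies requests to Tomcat, so at first I suspected that nginx was not receiving the requests at all. The nginx logs, however, showed a large number of 499 responses. In nginx, 499 means the client closed the connection before the upstream answered. So nginx was receiving requests, and the problem was on the Tomcat side.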
Troubleshooting
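My first thought was that the CPU might be maxed out. There was no CPU alert, but out of habit I ran top to check the system load anyway. The load was only 0.x, and CPU and memory usage were both normal.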
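Since the CPU showed nothing unusual, GC was unlikely to be the culprit, because heavy GC activity normally shows up as high CPU usage. I checked the GC log anyway, and GC was indeed fine.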
At this point it was time to bring in jstack. Sure enough, jstack showed many threads in the WAITING state:
"http-nio-127.0.0.1-801-exec-498" daemon prio=10 tid=0x00002ada7c14f800 nid=0x16a6 waiting on condition [0x00002ada9c905000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007873e6990> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at org.apache.http.pool.PoolEntryFuture.await(PoolEntryFuture.java:133)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:282)
at org.apache.http.pool.AbstractConnPool.access$000(AbstractConnPool.java:64)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:177)
at org.apache.http.pool.AbstractConnPool$2.getPoolEntry(AbstractConnPool.java:170)
at org.apache.http.pool.PoolEntryFuture.get(PoolEntryFuture.java:102)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:240)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:227)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:173)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:85)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at com.weimai.utils.HttpClientUtil.doGet(HttpClientUtil.java:105)
at com.weimai.utils.HttpClientUtil.doGet(HttpClientUtil.java:87)
at com.weimai.utils.WeiBoUtil.checkUser(WeiBoUtil.java:214)
at com.weimai.web.UserInfoController.newWeiboLogin(UserInfoController.java:1223)
at sun.reflect.GeneratedMethodAccessor390.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
...
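The trace shows request threads parked inside PoolingHttpClientConnectionManager.leaseConnection, which means they are waiting for a free connection from HttpClient's pool. The minimal sketch below reproduces that symptom using only the JDK. It rests on a loud assumption: a java.util.concurrent.Semaphore stands in for the connection pool, and it is not HttpClient's implementation. The class and method names are hypothetical. Once every permit has been taken and never returned, further callers park and report Thread.State.WAITING, the same state jstack showed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Sketch only: a Semaphore stands in for HttpClient's connection pool (an assumption).
final class PoolExhaustionDemo {

    /**
     * Takes every permit and never returns it (the "leaked" connections), then starts
     * {@code callers} threads that each try to lease one more. Returns how many of them
     * were observed parked in Thread.State.WAITING.
     */
    static int countParkedWaiters(int poolSize, int callers) {
        Semaphore pool = new Semaphore(poolSize);
        pool.acquireUninterruptibly(poolSize); // leased and never released

        List<Thread> waiters = new ArrayList<>();
        for (int i = 0; i < callers; i++) {
            Thread t = new Thread(() -> {
                pool.acquireUninterruptibly(); // parks via LockSupport.park, like getPoolEntryBlocking
                pool.release();
            }, "exec-" + i);
            t.setDaemon(true); // never keep the JVM alive if something goes wrong
            t.start();
            waiters.add(t);
        }

        int parked = 0;
        try {
            // Poll until every caller is parked, bounded so the demo cannot hang.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (System.nanoTime() < deadline) {
                parked = 0;
                for (Thread t : waiters) {
                    if (t.getState() == Thread.State.WAITING) parked++;
                }
                if (parked == callers) break;
                Thread.sleep(10);
            }
            pool.release(poolSize); // return the leaked permits so the waiters can finish
            for (Thread t : waiters) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing waiters", e);
        }
        return parked;
    }

    public static void main(String[] args) {
        int parked = countParkedWaiters(2, 4);
        System.out.println(parked + " of 4 callers were parked in WAITING");
    }
}
```

Calling countParkedWaiters(2, 4) first leases both permits. It then reports how many of the four callers were seen in WAITING before the permits are handed back and the threads are allowed to finish.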
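At that point I realized the problem was most likely in the HTTP connections. I immediately ran netstat to check the connection states on port 801, and sure enough many connections were in CLOSE_WAIT. A brief explanation of CLOSE_WAIT: if our client program is in CLOSE_WAIT, its socket was closed passively, meaning the other side closed first. The whole sequence goes like this.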
If the server actively closes the connection, the two sides need four packets in total to tear down the TCP connection:
server -> FIN -> client
server <- ACK <- client
At this point the server is in FIN_WAIT_2 and our program is in CLOSE_WAIT.
server <- FIN <- client
The client sends FIN to the server and moves to LAST_ACK.
server -> ACK -> client
Only when the server replies with ACK does the client's socket actually move to CLOSED.
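You can observe the passive-close side of this sequence from Java with plain loopback sockets. In the minimal sketch below, the accepting side closes first, so the client's read() returns -1 once the FIN arrives, yet the client socket stays open until close() is called. The class and method names are hypothetical. That window is exactly when netstat would list the client side as CLOSE_WAIT. Calling close() is what sends the client's own FIN and moves it on to LAST_ACK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch: observe the passive-close side of a TCP connection over loopback.
final class CloseWaitDemo {

    /**
     * Returns true if the client saw EOF (the peer's FIN) while its own socket was
     * still open, the window in which netstat shows the client in CLOSE_WAIT.
     */
    static boolean peerClosedButWeDidNot() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort())) {

            // The "server" side accepts and closes immediately: an active close, which sends FIN.
            try (Socket accepted = listener.accept()) {
                // nothing to do; try-with-resources closes it
            }

            client.setSoTimeout(5_000);              // safety net so the read cannot block forever
            InputStream in = client.getInputStream();
            int firstByte = in.read();              // returns -1 once the FIN has arrived
            boolean stillOpen = !client.isClosed(); // we have not called close() yet

            // Until client.close() runs (at the end of this try block) the client stays in
            // CLOSE_WAIT; close() sends our own FIN and moves the socket to LAST_ACK.
            return firstByte == -1 && stillOpen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("peer closed first, our socket still open: " + peerClosedButWeDidNot());
    }
}
```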
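Our connections were stuck in CLOSE_WAIT rather than LAST_ACK, which means our side had not yet sent its FIN to the server. The next step was simple: look at how HttpClientUtil handles its connections. Sure enough, the HttpClientUtil code did not abort HTTP connections that ended abnormally. Connections that are never released cannot go back to HttpClient's pool, and this fits the jstack output: once the pool ran dry, request threads parked in AbstractConnPool.getPoolEntryBlocking waiting for a free connection. After I completed the try/catch/finally block, the problem was resolved.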
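The shape of the fix can be sketched without HttpClient itself. In this hypothetical model, a Semaphore again plays the connection pool and doRequest() plays an HTTP call that fails every third time. Neither is the original HttpClientUtil code. The leaky version returns its lease only on the success path, so each failure permanently removes one connection from the pool. The fixed version releases in finally. In real HttpClient 4.x code, the corresponding cleanup would be closing the CloseableHttpResponse or calling abort() on the request. RequestConfig's setConnectionRequestTimeout bounds the wait for a pooled connection, much as tryLease does here.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model: a Semaphore plays HttpClient's connection pool and doRequest()
// plays an HTTP call that sometimes fails. Neither is the original HttpClientUtil code.
final class LeaseReleaseDemo {

    /** Simulated request: every third call fails, like a timed-out or reset connection. */
    static void doRequest(int call) {
        if (call % 3 == 0) {
            throw new IllegalStateException("simulated I/O failure on call " + call);
        }
    }

    /** Bounded wait for a lease instead of parking forever. */
    static boolean tryLease(Semaphore pool) {
        try {
            return pool.tryAcquire(50, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Buggy pattern: the lease goes back only on the success path. */
    static int leakyCalls(Semaphore pool, int calls) {
        int served = 0;
        for (int call = 1; call <= calls; call++) {
            if (!tryLease(pool)) break; // pool exhausted: an unbounded wait would park here
            try {
                doRequest(call);
                pool.release(); // skipped whenever doRequest throws
                served++;
            } catch (IllegalStateException e) {
                // failure swallowed; the lease is never returned
            }
        }
        return served;
    }

    /** Fixed pattern: the lease goes back in finally, whatever happens. */
    static int safeCalls(Semaphore pool, int calls) {
        int served = 0;
        for (int call = 1; call <= calls; call++) {
            if (!tryLease(pool)) break;
            try {
                doRequest(call);
                served++;
            } catch (IllegalStateException e) {
                // handle/log the failure; with HttpClient this is where the request would be aborted
            } finally {
                pool.release(); // always hand the connection back
            }
        }
        return served;
    }

    public static void main(String[] args) {
        Semaphore leaky = new Semaphore(2);
        Semaphore safe = new Semaphore(2);
        System.out.println("leaky: served " + leakyCalls(leaky, 10) + ", free " + leaky.availablePermits());
        System.out.println("safe:  served " + safeCalls(safe, 10) + ", free " + safe.availablePermits());
    }
}
```

Consider a pool of 2 and 10 calls. The leaky version serves 4 calls and finds the pool empty after its second failure. The fixed version serves all 7 non-failing calls and ends with both permits available.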