支付清结算系统"外部对帐"部分任务在早上6:00~7:00时间段内对帐速度缓慢,且常常形成对帐统计结果不许确问题,致使结转记帐不许确。java
因为外部对帐是通过性能优化的,且在压测环境表现良好,因此最开始初步认为是因为数据库性能瓶颈致使,初步处理方式为将线上6:00~7:00以前的慢SQL进行了一番排查,并对涉及外部系统的慢SQL进行了优化,这个步骤理论上是能够解决此类问题,但RDS(阿里云Mysql可能存在慢SQL统计不许确问题)。mysql
特定时间段线程栈分析,因为SQL优化部分并无解决此类问题,为了查看在早上6:00~7:00这个时间段内程序JVM内部到底发生了什么,因此采起了定时抓取线程栈日志的手段(具体方式为在Linux服务器添加cron任务,并累计输出日志的方式,脚本如👇)。git
#clearjstack.shgithub
#!/bin/sh
DATE='/bin/date'
timestamp() {
$DATE +'%Y-%m-%d %H:%M:%S'
}
ts=`timestamp`
echo "start log $ts" >> /tmp/clearjstack.log
/usr/java/jdk1.7.0_67/bin/jstack -l `/usr/java/jdk1.7.0_67/bin/jps -lvm | grep tomcat-clear | awk '{print $1}'` >> /tmp/clearjstack.log
ts=`timestamp`
echo "end log $ts" >> /tmp/clearjstack.log
复制代码
Linux下添加定时任务的方式:spring
//添加计划任务
[root@123.57.28.241/10.172.233.8 tmp]cd /var/spool/cron/
[root@123.57.28.241/10.172.233.8 cron]pwd
/var/spool/cron/具体用户名(如为root用户)
[root@123.57.28.241/10.172.233.8 cron]vim root
*/5 * * * * /bin/sh /tmp/clearjstack.sh > /dev/null 2>&1
复制代码
//查看计划任务
[root@123.57.28.241/10.172.233.8 cron]crontab -l
0 0 * * * /usr/local/script/Cutlog.sh >/dev/null 2>&1
*/5 * * * * /bin/sh /tmp/clearjstack.sh > /dev/null 2>&1
复制代码
#clearjstack.log
start log 2016-11-24 06:20:01
2016-11-24 06:20:02
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode):
"Thread-70" prio=10 tid=0x00007f21f4022800 nid=0x12da runnable [0x00007f216cace000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
- locked <0x0000000786facbc8> (a com.mysql.jdbc.util.ReadAheadInputStream)
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.alibaba.druid.pool.DruidPooledPreparedStatement.execute(DruidPooledPreparedStatement.java:493)
at sun.reflect.GeneratedMethodAccessor169.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:55)
at com.sun.proxy.$Proxy84.execute(Unknown Source)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:56)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:70)
at org.apache.ibatis.executor.ReuseExecutor.doQuery(ReuseExecutor.java:54)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:259)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:132)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:115)
at sun.reflect.GeneratedMethodAccessor165.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:46)
at com.github.miemiedev.mybatis.paginator.OffsetLimitInterceptor.intercept(OffsetLimitInterceptor.java:66)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:57)
at com.sun.proxy.$Proxy82.query(Unknown Source)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:104)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:98)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:62)
at sun.reflect.GeneratedMethodAccessor203.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:358)
at com.sun.proxy.$Proxy29.selectOne(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.selectOne(SqlSessionTemplate.java:163)
at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:63)
at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:43)
at com.sun.proxy.$Proxy87.countOrgSettleFlow(Unknown Source)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl.isExistOrgSettleFlow(OrgSettleFlowServiceImpl.java:45)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$FastClassBySpringCGLIB$$c921c9ca.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:700)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:96)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:260)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:633)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$EnhancerBySpringCGLIB$$6b2e2fc1.isExistOrgSettleFlow(<generated>)
at com.ninefbank.smallpay.clear.util.FeeCountUtil.createOrgSettleFlow(FeeCountUtil.java:235)
- locked <0x000000078c795ed8> (a java.lang.Class for com.ninefbank.smallpay.clear.util.FeeCountUtil)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveTaskErrorInfo(OuterReconWriter.java:126)
- locked <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:70)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x000000078deebbf0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"Thread-69" prio=10 tid=0x00007f21f401c000 nid=0x12d9 waiting for monitor entry [0x00007f216ccd1000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)
- waiting to lock <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:68)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x000000078defb298> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"Thread-68" prio=10 tid=0x00007f21f401b000 nid=0x12d8 waiting for monitor entry [0x00007f216c7cc000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)
- waiting to lock <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:68)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x000000078deec880> (a java.util.concurrent.ThreadPoolExecutor$Worker)
复制代码
从上述日志分析线程Thread-6八、Thread-69都为阻塞状态,且阻塞状态sql
smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)
复制代码
所对应的代码主要体如今对帐结果写入部分,另外从日志数据库
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
复制代码
因此此时很容易以为问题出如今链接池与MySQL数据库自己的问题上。按照这个思路(JVM日志很丰富,去伪存真,每每须要尝试与不断试错),开始了对链接池自己及MySQL自己问题的排查(Google),如下为几种Google结果连接及缘由概述。apache
1)、网络缘由&MySQL设置&换链接池vim
在以上连接所述的缘由中,重点是说了MySQL及网络方面的缘由。针对这种叙述,与运维大哥沟通后调取了阿里云RDS MySQL实例的参数设置,以下:tomcat
show global variables like '%timeout%'
复制代码
最后一个参数wait_timeout代表Mysql服务器与客户端链接空闲自动断开的超时时间为24小时,也就是说只有链接在与MySQL服务器链接后24小时内无任何操做,MySQL服务器才会主动断开与客户端(这里为应用链接池中与MySQL数据库创建的链接),然而对于应用来讲,这种可能性基本不存在,何况大多数链接池配置都有对链接是否可用以及自动探测链接的配置。
查看应用链接池配置:
<property name="timeBetweenEvictionRunsMillis" value="3000" />
<property name="minEvictableIdleTimeMillis" value="300000" />
<property name="validationQuery" value="SELECT 'x'" />
<property name="testWhileIdle" value="true" />
<property name="testOnBorrow" value="true" />
<property name="testOnReturn" value="true" />
<property name="poolPreparedStatements" value="false" />
<property name="maxPoolPreparedStatementPerConnectionSize" value="-1" />
复制代码
按照上述配置基本上不会出现链接长时间与MySQL服务器无任何交互的可能性,除非链接池自己的处理存在问题(druid链接池为阿里高性能数据库链接池),可是出于怀疑的态度,按照连接中的说法druid在处理长时间链接等待方面可能不如c3p0好,也不排除druid自己存在问题。但出于谨慎角度,并无马上采用对链接池进行替换的措施,而是将焦点转移到应用服务器与MySQL服务器网络链接上(也确实有人是由于网络防火墙设置致使出现此类问题的)。
2)、网络链接参数调整
因为阿里云底层网络结构并不对用户透明,且遇到问题的应用存在批量大数据插入的使用场景,确实是存在TCP缓冲区不够的可能性,经过对MySQL官方文档相关章节的阅读,MySQL链接串参数中确实提供了tcpRcvBuf这样的参数,其对该参数的说明是:“connecting using TCP/IP, should the driver set SO_RCV_BUF to the given value? The default value of '0', means use the platform default value for this property)”。 因而以为这种可能性会相对大些,因此在线上调整了链接参数而并无更换链接池(如若不行,可再尝试替换链接池),参数调整以下:
jdbc:mysql://xxx.xxx.xx.xx:3306/xx?useUnicode=true&characterEncoding=UTF-8&tcpRcvBuf=1024000&autoReconnect=true&failOverReadOnly=false&connectTimeout=0
复制代码
链接TCP缓冲区的大小被调整为1M。
在4中经过对网络链接参数的调整后,并无起到做用,因而对线程栈日志再次进行了调取,并进行了地毯式阅读,终于发现了问题的症结:
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$EnhancerBySpringCGLIB$$5775a83.isExistOrgSettleFlow(<generated>)
at com.ninefbank.smallpay.clear.util.FeeCountUtil.createOrgSettleFlow(FeeCountUtil.java:235)
- locked <0x000000078b4d8e98> (a java.lang.Class for com.ninefbank.smallpay.clear.util.FeeCountUtil)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveTaskErrorInfo(OuterReconWriter.java:126)
- locked <0x00000007880399f0> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:70)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterPara
复制代码
经过对比多日问题时段的线程栈日志,发现都存在两处代码都指向同一个方法,即isExistOrgSettleFlow,因而找到代码所对应的方法,发现是因为该方法高频访问了一个未加相关索引的大数据表,致使方法自己的执行存在阻塞及锁住的状况,占用了大量链接资源,从而致使4中所看到的其余方法拿不到数据库链接出现长时间等待,甚至超时的状况。加上索引后,隔日观察问题消失。
对陌生问题的分析,须要经过各种分析手段从而找到问题所在并进行解决(这其中须要运用各种分析工具Linux命令、JDK命令、网络命令等等);
要心存思辨之心,对传统的权威的东西在敬畏的同时也要大胆的怀疑并进行分析、尝试,切不可畏惧不前;
在编码方面,对代码类、方法的命名必定要合理、规范,否则往后出现疑难问题在排查时,会给人形成错觉从而延长解决时间;