咱们线上5台solr读服务器,配置同样,可是相同的请求,其中一台响应时间明显比其余4台慢,咱们想经过arthas来定位具体哪里执行慢。html
阿里开源的java调试工具,能解决以下的问题:java
从咱们实际使用经验来讲,能够解决两个常见的问题:git
咱们分别在2台测试环境和1台线上环境测试arthas,发如今2台测试环境中出现以下两个问题:github
建议: 上述2问题,在线上生产环境没遇到,可是不建议在线上正常运行环境中使用arthas;但线上java程序遇到重大问题,能够考虑使用arthas快速定位问题,可是前提是能接受咱们遇到的两个坑。apache
# 安装 curl -L https://alibaba.github.io/arthas/install.sh | sh # 若是无法正常安装,提示SSL connect error,执行:yum -y update nss,更新nss # 打开 ./as.sh # 若是java进程使用非root帐户启动,如solr用户启动,则使用以下启动脚本 sudo -u solr -EH ./as.sh # 查看cpu占用高的3个线程 相似top -H,可是能看到堆栈信息 thread -n 3 # 跟踪某个方法的调用时间 trace 包名.类名 方法名 # 退出arthas exit
solr全部请求入口都是org.apache.solr.handler.StandardRequestHandler,咱们跟踪此入口,可是arthas无法跟踪到完整代码堆栈,尤为是遇到抽象方法之类,得从新跟踪。tomcat
# 启动 [root@ arthas]# sudo -u solr -EH ./as.sh Arthas script version: 3.0.4.2 ERROR: '' does not exist. JAVA_HOME: /usr/local/java/jdk1.8.0_05 Found existing java process, please choose one and hit RETURN. * [1]: 10433 org.apache.catalina.startup.Bootstrap [2]: 10786 org.apache.catalina.startup.Bootstrap [3]: 11170 org.apache.catalina.startup.Bootstrap [4]: 11372 org.apache.catalina.startup.Bootstrap [5]: 10606 org.apache.catalina.startup.Bootstrap [6]: 10383 org.apache.catalina.startup.Bootstrap [7]: 64118 org.apache.catalina.startup.Bootstrap [8]: 10967 org.apache.catalina.startup.Bootstrap [9]: 11037 org.apache.catalina.startup.Bootstrap [10]: 77756 org.apache.catalina.startup.Bootstrap [11]: 11646 org.apache.catalina.startup.Bootstrap [12]: 11487 org.apache.catalina.startup.Bootstrap # 选择定位问题的进程 7 Calculating attach execution time... Attaching to 64118 using version 3.0.4... real 0m0.294s user 0m1.354s sys 0m0.143s Attach success. telnet connecting to arthas server... current timestamp is 1542786758 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. ,---. ,------. ,--------.,--. ,--. ,---. ,---. / O \ | .--. ''--. .--'| '--' | / O \ ' .-' | .-. || '--'.' | | | .--. || .-. |`. `-. | | | || |\ \ | | | | | || | | |.-' | `--' `--'`--' '--' `--' `--' `--'`--' `--'`-----' wiki: https://alibaba.github.io/arthas version: 3.0.4 pid: 64118 timestamp: 1542786758978 $ trace org.apache.solr.handler.StandardRequestHandler The argument 'method-pattern' is required # 跟踪org.apache.solr.handler.StandardRequestHandler handleRequest $ trace org.apache.solr.handler.StandardRequestHandler handleRequest No class or method is affected, try: 1. sm CLASS_NAME METHOD_NAME to make sure the method you are tracing actually exists (it might be in your parent class). 2. reset CLASS_NAME and try again, your method body might be too large. 3. visit middleware-container/arthas/issues/278 for more detail $ trace org.apache.solr.handler.RequestHandlerBase handleRequest Press Ctrl+C to abort. Affect(class-cnt:21 , method-cnt:1) cost in 8197 ms. `---ts=2018-11-21 15:53:08;thread_name=http-nio-10070-exec-51;id=3a973;is_daemon=true;priority=5;TCCL=org.apache.catalina.loader.WebappClassLoader@67c5a31d `---[193.784245ms] org.apache.solr.handler.RequestHandlerBase:handleRequest() +---[0.026761ms] java.util.concurrent.atomic.AtomicLong:incrementAndGet() +---[0.010342ms] org.apache.solr.util.stats.Timer:time() +---[0.010381ms] org.apache.solr.util.SolrPluginUtils:setDefaults() +---[0.010746ms] org.apache.solr.response.SolrQueryResponse:setHttpCaching() # 该方法最耗时,须要进一步跟踪 +---[193.214175ms] org.apache.solr.handler.RequestHandlerBase:handleRequestBody() +---[0.01285ms] org.apache.solr.response.SolrQueryResponse:getResponseHeader() +---[0.036761ms] org.apache.solr.common.util.NamedList:get() `---[0.063972ms] org.apache.solr.util.stats.TimerContext:stop() # 继续跟踪 $ trace org.apache.solr.handler.RequestHandlerBase handleRequestBody Press Ctrl+C to abort. Affect(class-cnt:21 , method-cnt:16) cost in 8933 ms. `---ts=2018-11-21 15:53:35;thread_name=http-nio-10070-exec-35;id=2f44f;is_daemon=true;priority=5;TCCL=org.apache.catalina.loader.WebappClassLoader@67c5a31d `---[175.443917ms] org.apache.solr.handler.component.SearchHandler:handleRequestBody() +---[0.019261ms] org.apache.solr.request.SolrQueryRequest:getContentStreams() +---[0.009544ms] org.apache.solr.handler.component.ResponseBuilder:<init>() +---[0.015031ms] org.apache.solr.request.SolrRequestInfo:setResponseBuilder() +---[min=6.36E-4ms,max=0.003173ms,total=0.005898ms,count=3] org.apache.solr.request.SolrQueryRequest:getParams() +---[min=7.02E-4ms,max=0.024157ms,total=0.024859ms,count=2] org.apache.solr.common.params.SolrParams:getBool() +---[0.004623ms] org.apache.solr.handler.component.ResponseBuilder:setDebug() +---[0.012386ms] org.apache.solr.common.params.SolrParams:getParams() +---[0.006289ms] org.apache.solr.util.SolrPluginUtils:getDebugInterests() +---[min=7.88E-4ms,max=0.003469ms,total=0.004257ms,count=2] org.apache.solr.handler.component.ResponseBuilder:isDebug() +---[0.021305ms] org.apache.solr.handler.component.ShardHandlerFactory:getShardHandler() +---[0.040045ms] org.apache.solr.handler.component.ShardHandler:checkDistributed() +---[min=7.07E-4ms,max=0.015151ms,total=0.015858ms,count=2] java.util.List:iterator() +---[min=5.35E-4ms,max=0.019616ms,total=0.029199ms,count=16] java.util.Iterator:hasNext() +---[min=4.93E-4ms,max=0.004724ms,total=0.011896ms,count=14] java.util.Iterator:next() +---[min=9.53E-4ms,max=0.69593ms,total=0.735595ms,count=7] org.apache.solr.handler.component.SearchComponent:prepare() `---[min=8.73E-4ms,max=173.993692ms,total=174.025895ms,count=7] org.apache.solr.handler.component.SearchComponent:process() ...... 重复上面的工做
最后定位到是org.apache.lucene.search.Weight类的scorer方法慢,该方法功能是计算排序分。bash