欢迎关注微信公众号:石杉的架构笔记(id:shishan100)面试

个人新课**《C2C 电商系统微服务架构120天实战训练营》在公众号儒猿技术窝**上线了,感兴趣的同窗,能够点击下方连接了解详情:算法
《C2C 电商系统微服务架构120天实战训练营》性能优化
“ 这篇文章给你们聊一次线上生产系统事故的解决经历,其背后表明的是线上生产系统的JVM FullGC可能引起的严重故障。微信
1、业务场景介绍
先简单说说线上生产系统的一个背景,由于仅仅是文章做为案例来说,因此弱化大量的业务背景。markdown
简单来讲,这是一套分布式系统,系统A须要将一个很是核心以及关键的数据经过网络请求,传输给另一个系统B。网络
因此这里其实就考虑到了一个问题,若是系统A刚刚将核心数据传递给了系统B,结果系统B莫名其妙宕机了,岂不是会致使数据丢失?架构
因此在这个分布式系统的架构设计中,采起了很是经典的一个Quorum算法。并发
这个算法简单来讲,就是系统B必需要部署奇数个节点,好比说至少部署3台机器,或者是5台机器,7台机器,相似这样子。app
而后系统A每次传输一个数据给系统,都必需要对系统B部署的所有机器都发送请求,将一份数据传输给系统B部署的全部机器。异步
要断定系统A对系统B的一次数据写是成功的,要求系统A必须在指定时间范围内对超过Quorum数量的系统B所在机器传输成功。
举个例子,假设系统B部署了3台机器,那么他的Quorum数量就是:3 / 2 + 1 = 2,也就是说系统B的Quorum数量就是:全部机器数量 / 2 + 1。
因此系统A要断定一个核心数据是否写成功,若是系统B一共部署了3台机器的话,那么系统A必须在指定时间内收到2台系统B所在机器返回的写成功的响应。
此时系统A才能认为这条数据对系统B是写成功了。这个就是所谓的Quorum机制。
也就是说,分布式架构下,系统之间传输数据,一个系统要确保本身给另一个系统传输的数据不会丢失,必需要在指定时间内,收到另一个系统Quorum(大多数)数量的机器响应说写成功。
这套机制实际上在不少分布式系统、中间件系统中都有很是普遍的使用,咱们线上的分布式系统也是采用了这个Quorum机制在两个系统之间传输数据。
给你们上一张图,一块儿来看一下这套架构长啥样。
如上图所示,图中很清晰的展现了系统A和系统B之间传输一份数据时的Quorum机制。
接下来,咱们用代码给你们展现一下,上面的Quorum写机制在代码层面大概是什么样子的。
PS:由于实际这套机制涉及大量的底层网络传输、通讯、容错、优化的东西,因此下面代码通过了大幅度简化,仅仅表达出了一个核心的意思。
上面就是通过大幅精简后的代码,不过核心的意思是表达清晰了。你们能够仔细看两遍,其实仍是很容易弄懂的。
这段代码其实含义很简单,说白了就是异步开启线程发送数据给系统B全部的机器,同时进入一个while循环等待系统B的Quorum数量的机器返回响应结果。
若是超过指定超时时间还没收到预期数量的机器返回结果,那么就断定系统B部署的集群出现故障,接着让系统A直接退出,至关于系统A宕机。
整个代码,就是这么个意思!
2、问题凸现
光是看代码其实没啥难的,可是问题就在于线上运行的时候,可不是跟你写代码的时候想的同样简单。
有一次线上生产系统运行的过程当中,总体系统负载都很平稳,原本是不该该有什么问题,可是结果忽然收到报警,说系统A忽然宕机了。
而后就开始进行排查,左排查右排查,发现系统B集群都好好的,不该该有问题。
而后再查查系统A,发现系统A别的地方也没什么问题。
最后结合系统A自身的日志,以及系统A的JVM FullGC进行垃圾回收的日志,咱们才算是搞清楚了具体的故障缘由。
3、定位问题
其实缘由很是的简单,就是系统A在线上运行一段时间后,会偶发性的进行长时间Stop the World的JVM FullGC,也就是大面积垃圾回收。
可是,此时会形成系统A内部的工做线程大量的卡顿,再也不工做。要等JVM FullGC结束以后,工做线程才会恢复运做。
咱们来看下面那个代码片断:

可是这种系统A的莫名宕机是不正确的,由于若是没有JVM FullGC,原本上面那个if语句是不会成立的。
他会停顿1秒钟进入下一轮while循环,接着就能够收到系统B返回的Quorum数量的结果,这个while循环就能够中断,继续运行了。
结果由于出现了JVM FullGC卡顿了几十秒,致使莫名其妙就触发了if判断的执行,系统A莫名其妙就退出宕机了。
因此,线上的JVM FullGC致使的系统长时间卡顿,真是形成系统不稳定运行的隐形杀手之一啊!
4、解决问题
至于上述代码稳定性的优化,也很简单。咱们只要在代码里加入一些东西,监控一下上述代码中是否发生了JVM FullGC。
若是发生了JVM FullGC,就自动延长expireTime就能够了。
好比下面代码的改进:
经过上述代码的改进,就能够有效的优化线上系统的稳定性,保证其在JVM FullGC发生的状况下,也不会随意出现异常宕机退出的状况了。
END
若有收获,请帮忙转发,您的鼓励是做者最大的动力,谢谢!
一大波微服务、分布式、高并发、高可用的原创系列文章正在路上
欢迎扫描下方二维码,持续关注:
石杉的架构笔记(id:shishan100)
十余年BAT架构经验倾囊相授
**> **推荐阅读:** > > 一、[拜托!面试请不要再问我Spring Cloud底层原理](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Flink.juejin.im%2525252525252F%2525252525253Ftarget%2525252525253Dhttps%252525252525253A%252525252525252F%252525252525252Flink.juejin.im%252525252525252F%252525252525253Ftarget%252525252525253Dhttps%25252525252525253A%25252525252525252F%25252525252525252Fjuejin.im%25252525252525252Fpost%25252525252525252F5be13b83f265da6116393fc7) > > 二、[【双11狂欢的背后】微服务注册中心如何承载大型系统的千万级访问?](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Flink.juejin.im%2525252525252F%2525252525253Ftarget%2525252525253Dhttps%252525252525253A%252525252525252F%252525252525252Flink.juejin.im%252525252525252F%252525252525253Ftarget%252525252525253Dhttps%25252525252525253A%25252525252525252F%25252525252525252Fjuejin.im%25252525252525252Fpost%25252525252525252F5be3f8dcf265da613a5382ca) > > 三、[【性能优化之道】每秒上万并发下的Spring Cloud参数优化实战](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Flink.juejin.im%2525252525252F%2525252525253Ftarget%2525252525253Dhttps%252525252525253A%252525252525252F%252525252525252Flink.juejin.im%252525252525252F%252525252525253Ftarget%252525252525253Dhttps%25252525252525253A%25252525252525252F%25252525252525252Fjuejin.im%25252525252525252Fpost%25252525252525252F5be83e166fb9a049a7115580) > > 四、[微服务架构如何保障双11狂欢下的99.99%高可用](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Flink.juejin.im%2525252525252F%2525252525253Ftarget%2525252525253Dhttps%252525252525253A%252525252525252F%252525252525252Flink.juejin.im%252525252525252F%252525252525253Ftarget%252525252525253Dhttps%25252525252525253A%25252525252525252F%25252525252525252Fjuejin.im%25252525252525252Fpost%25252525252525252F5be99a68e51d4511a8090440) > > 五、[兄弟,用大白话告诉你小白都能听懂的Hadoop架构原理](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Flink.juejin.im%2525252525252F%2525252525253Ftarget%2525252525253Dhttps%252525252525253A%252525252525252F%252525252525252Flink.juejin.im%252525252525252F%252525252525253Ftarget%252525252525253Dhttps%25252525252525253A%25252525252525252F%25252525252525252Fjuejin.im%25252525252525252Fpost%25252525252525252F5beaf02ce51d457e90196069) > > 六、[大规模集群下Hadoop NameNode如何承载每秒上千次的高并发访问](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Flink.juejin.im%2525252525252F%2525252525253Ftarget%2525252525253Dhttps%252525252525253A%252525252525252F%252525252525252Flink.juejin.im%252525252525252F%252525252525253Ftarget%252525252525253Dhttps%25252525252525253A%25252525252525252F%25252525252525252Fjuejin.im%25252525252525252Fpost%25252525252525252F5bec278c5188253e64332c76) > > 七、【[性能优化的秘密】Hadoop如何将TB级大文件的上传性能优化上百倍](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Flink.juejin.im%2525252525252F%2525252525253Ftarget%2525252525253Dhttps%252525252525253A%252525252525252F%252525252525252Flink.juejin.im%252525252525252F%252525252525253Ftarget%252525252525253Dhttps%25252525252525253A%25252525252525252F%25252525252525252Fjuejin.im%25252525252525252Fpost%25252525252525252F5bed82a9e51d450f9461cfc7) > > [八、](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Fjuejin.im%2525252525252Fpost%2525252525252F5bf2c6b6e51d456693549af4)[拜托,面试请不要再问我TCC分布式事务的实现原理坑爹呀!](https://juejin.cn/post/6844903716089233416) [](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Fjuejin.im%2525252525252Fpost%2525252525252F5bf2c6b6e51d456693549af4) > > 九、[【坑爹呀!】最终一致性分布式事务如何保障实际生产中99.99%高可用?](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Fjuejin.im%2525252525252Fpost%2525252525252F5bf2c6b6e51d456693549af4) > > 十、[拜托,面试请不要再问我Redis分布式锁的实现原理!](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Fjuejin.im%2525252525252Fpost%2525252525252F5bf3f15851882526a643e207) > > **十一、****[【眼前一亮!】看Hadoop底层算法如何优雅的将大规模集群性能提高10倍以上?](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Flink.juejin.im%25252525253Ftarget%25252525253Dhttps%2525252525253A%2525252525252F%2525252525252Fjuejin.im%2525252525252Fpost%2525252525252F5bf5396f51882509a768067e)** > > **十二、****[亿级流量系统架构之如何支撑百亿级数据的存储与计算](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Flink.juejin.im%252525252F%252525253Ftarget%252525253Dhttps%25252525253A%25252525252F%25252525252Fjuejin.im%25252525252Fpost%25252525252F5bfab59fe51d4551584c7bcf)** > > 1三、[亿级流量系统架构之如何设计高容错分布式计算系统](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Flink.juejin.im%2525253Ftarget%2525253Dhttps%252525253A%252525252F%252525252Fjuejin.im%252525252Fpost%252525252F5bfbeeb9f265da61407e9679) > > 1四、[亿级流量系统架构之如何设计承载百亿流量的高性能架构](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Flink.juejin.im%25253Ftarget%25253Dhttps%2525253A%2525252F%2525252Fjuejin.im%2525252Fpost%2525252F5bfd2df1e51d4574b133dd3a) > > 1五、[亿级流量系统架构之如何设计每秒十万查询的高并发架构](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Flink.juejin.im%252F%253Ftarget%253Dhttps%25253A%25252F%25252Fjuejin.im%25252Fpost%25252F5bfe771251882509a7681b3a) > > 1六、[亿级流量系统架构之如何设计全链路99.99%高可用架构](https://link.juejin.im/?target=https%3A%2F%2Flink.juejin.im%3Ftarget%3Dhttps%253A%252F%252Fjuejin.im%252Fpost%252F5bffab686fb9a04a102f0022) > > 1七、[七张图完全讲清楚ZooKeeper分布式锁的实现原理](https://link.juejin.im/?target=https%3A%2F%2Fjuejin.im%2Fpost%2F5c01532ef265da61362232ed) > > 1八、[大白话聊聊Java并发面试问题之volatile究竟是什么?](https://juejin.cn/post/6844903730303746061) > > 1九、[大白话聊聊Java并发面试问题之Java 8如何优化CAS性能?](https://juejin.cn/post/6844903731234865160) > > 20、[大白话聊聊Java并发面试问题之谈谈你对AQS的理解?](https://juejin.cn/post/6844903732061159437) > > 2一、[大白话聊聊Java并发面试问题之公平锁与非公平锁是啥?](https://juejin.cn/post/6844903732883226637) > > 2二、[大白话聊聊Java并发面试问题之微服务注册中心的读写锁优化](https://juejin.cn/post/6844903734267510798) > > 2三、[互联网公司的面试官是如何360°无死角考察候选人的?(上篇)](https://juejin.cn/post/6844903734930046989) > > 2四、[互联网公司面试官是如何360°无死角考察候选人的?(下篇)](https://juejin.cn/post/6844903735655661581) > > 2五、[Java进阶面试系列之一:哥们,大家的系统架构中为何要引入消息中间件?](https://juejin.cn/post/6844903736444207117) > > 2六、[【Java进阶面试系列之二】:哥们,那你说说系统架构引入消息中间件有什么缺点?](https://juejin.cn/post/6844903737123667975) > > 2七、[【行走的Offer收割机】记一位朋友斩获BAT技术专家Offer的面试经历](https://juejin.cn/post/6844903741213130765) > > 2八、[【Java进阶面试系列之三】哥们,消息中间件在大家项目里是如何落地的?](https://juejin.cn/post/6844903742114906125) > > 2九、[【Java进阶面试系列之四】扎心!线上服务宕机时,如何保证数据100%不丢失?](https://juejin.cn/post/6844903742928601095)**