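I suddenly received an alert saying that MySQL was down; the server in question is a replica (slave). I tried logging in to the server to see whether it was reachable, and it was. The mysql process was also still running, but trying to log in to MySQL itself failed with: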
ERROR 1040 (HY000): Too many connections
The maximum number of connections is set to 3000, so how could the server have run out? I used gdb to raise the limit on the running process:
gdb -p $(cat pid_mysql.pid) -ex "set max_connections=5000" -batch
After the change I was able to log in, so I ran show processlist to see what was going on:
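It turned out that every show slave status issued by the monitoring program was stuck. These sessions eventually used up all available connections, which caused the Too many connections error. Replication itself was stuck in Waiting for commit lock. After some research I found that this is a known bug, https://bugs.mysql.com/bug.php?id=70307, which was fixed in 5.6.23. My version is 5.6.17.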
mysql> SELECT a.trx_id, trx_state, trx_started, b.id AS thread_id, b.info, b.user, b.host, b.db, b.command, b.state
    -> FROM information_schema.`INNODB_TRX` a, information_schema.`PROCESSLIST` b
    -> WHERE a.trx_mysql_thread_id = b.id
    -> ORDER BY a.trx_started;
+----------+-----------+---------------------+-----------+------+-------------+------+------+---------+-------------------------+
| trx_id   | trx_state | trx_started         | thread_id | info | user        | host | db   | command | state                   |
+----------+-----------+---------------------+-----------+------+-------------+------+------+---------+-------------------------+
| 51455154 | RUNNING   | 2017-08-02 02:20:07 |      6404 | NULL | system user |      | NULL | Connect | Waiting for commit lock |
+----------+-----------+---------------------+-----------+------+-------------+------+------+---------+-------------------------+
1 row in set (0.03 sec)
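The transaction has been stuck since about 2 a.m., and that is exactly when xtrabackup takes its backup. During a backup, xtrabackup runs flush tables with read lock followed by show slave status, and this sequence can deadlock with the SQL thread, leaving the SQL thread stuck indefinitely. The cause is that once the SQL thread has finished its DML operation it still holds the rli->data_lock lock and, at commit time, waits for MDL_COMMIT, while the show slave status executed after flush tables with read lock waits for rli->data_lock. The fix is to hold rli->data_lock only for the duration of the DML operation.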
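The circular wait described above can be modeled as a tiny wait-for graph. The following is only an illustrative sketch, not MySQL code: the session labels ("SQL thread", "backup session"), the lock label "commit lock (FTWRL)", and the helpers build_wait_graph and find_cycle are all hypothetical names chosen for this example. It assumes each session waits for at most one lock.

```python
def build_wait_graph(holders, waiting):
    """Map each waiting session to the session that holds the lock it wants.

    holders: lock name -> session currently holding it
    waiting: session -> lock name it is blocked on
    Locks that nobody holds do not block anyone, so they add no edge.
    """
    return {
        session: holders[lock]
        for session, lock in waiting.items()
        if lock in holders
    }


def find_cycle(waits_for):
    """Return one cycle as a list of sessions, or None if there is none."""
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path  # walked back to the start: deadlock
            if node in seen:
                break  # reached a cycle that does not include start
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


# Situation from the post (labels are illustrative):
# the SQL thread keeps rli->data_lock while waiting to commit, and the
# backup session holds the commit lock while SHOW SLAVE STATUS waits.
buggy = build_wait_graph(
    holders={"rli->data_lock": "SQL thread",
             "commit lock (FTWRL)": "backup session"},
    waiting={"SQL thread": "commit lock (FTWRL)",
             "backup session": "rli->data_lock"},
)

# After the fix, rli->data_lock is released before commit, so nobody
# holds it when SHOW SLAVE STATUS asks for it.
fixed = build_wait_graph(
    holders={"commit lock (FTWRL)": "backup session"},
    waiting={"SQL thread": "commit lock (FTWRL)"},
)

print(find_cycle(buggy))
print(find_cycle(fixed))
```

In the first case the two sessions wait on each other, so neither can make progress; in the second the SQL thread simply waits until the backup session releases its lock.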
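stop slave had no effect, and a normal shutdown did not work either; in the end the only option was kill -9. This is a fairly serious problem, and the solution is to upgrade to a newer version.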
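To decide which 5.6 servers still need that upgrade, a version string can be compared against the fix release. This is a minimal sketch that relies only on the 5.6.23 fix version mentioned in the bug report; parse_version and is_affected are hypothetical helper names, and the sketch deliberately refuses to answer for release series other than 5.6.

```python
FIXED_IN = (5, 6, 23)  # fix release for the 5.6 series, per the bug report


def parse_version(text):
    """Turn '5.6.17' or '5.6.17-log' into a tuple of ints like (5, 6, 17)."""
    numeric = text.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def is_affected(version_text):
    """Return True if a 5.6.x server predates the fix release."""
    version = parse_version(version_text)
    if version[:2] != FIXED_IN[:2]:
        # Other release series are outside the scope of this sketch.
        raise ValueError("only 5.6.x versions are covered here")
    return version < FIXED_IN


print(is_affected("5.6.17"))
```

The version string can be taken from the output of SELECT VERSION() on each server.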