MySQL Deadlocks

Date: 2019-12-20

https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks.html

What is a deadlock in MySQL?

A deadlock is a situation where different transactions are unable to proceed because each holds a lock that the other needs. Because both transactions are waiting for a resource to become available, neither ever release the locks it holds.

In short, the definition boils down to two ideas: circular wait (each holds a lock that the other needs) and no preemption (neither ever release the locks it holds).

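In fact, the four necessary conditions for deadlock in the general sense can be reduced to these two. The remaining two, mutual exclusion and hold-and-wait, are just well-known supplements.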
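The circular-wait condition can be pictured as a cycle in a "wait-for" graph, where an edge from one transaction to another means the first is waiting on a lock the second holds. The following is a minimal sketch of that idea only. The function name `find_wait_cycle` and the dictionary layout are assumptions made for illustration, and this is not how InnoDB implements its detector.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of transactions, or None.

    waits_for maps a transaction name to the transactions it is waiting on.
    (Hypothetical helper for illustration; not part of MySQL.)
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {txn: WHITE for txn in waits_for}
    path: List[str] = []

    def visit(txn: str) -> Optional[List[str]]:
        color[txn] = GREY
        path.append(txn)
        for blocker in waits_for.get(txn, []):
            state = color.get(blocker, WHITE)
            if state == GREY:
                # The blocker is already on the current path, so the part of
                # the path starting at the blocker forms a cycle.
                return path[path.index(blocker):]
            if state == WHITE:
                cycle = visit(blocker)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(waits_for):
        if color[txn] == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


# Two transactions that each wait on the other form a cycle.
print(find_wait_cycle({"A": ["B"], "B": ["A"]}))  # ['A', 'B']
```

If no transaction is waiting on a chain that leads back to itself, the function returns None, which corresponds to ordinary lock waiting rather than a deadlock.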

1. A simple deadlock example:

Session A:

mysql> CREATE TABLE t (i INT) ENGINE = InnoDB;
Query OK, 0 rows affected (1.07 sec)

mysql> INSERT INTO t (i) VALUES(1);
Query OK, 1 row affected (0.09 sec)

mysql> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t WHERE i = 1 LOCK IN SHARE MODE;
+------+
| i    |
+------+
| 1    |
+------+

Session B:

mysql> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

mysql> DELETE FROM t WHERE i = 1;

At this point session B blocks (until the lock request times out).

Session A then continues:

DELETE FROM t WHERE i = 1;

Session B is rolled back immediately because a deadlock has occurred. The most recent deadlock information can be viewed with SHOW ENGINE INNODB STATUS\G.

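Once the innodb_print_all_deadlocks parameter is enabled, deadlock information is also written to the error log. This example is so simple that I will not spend space analyzing its deadlock output.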

set @@global.innodb_print_all_deadlocks=on;

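InnoDB chooses the transaction that is cheaper to roll back as the victim (this depends on the number and size of the rows touched by its DML).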

2. A real-world deadlock example:

The deadlock log shown in the error log was:

InnoDB: transactions deadlock detected, dumping detailed information.
*** (1) TRANSACTION:
TRANSACTION 209262583957, ACTIVE 1 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 129183854, OS thread handle 0x7f1aeae7a700, query id 68320628504 <server A info> updating
update  tb_authorize_info set account_balance=account_balance-  100.00 
     where (SELECT a.account_balance from 
(select account_balance from tb_authorize_info a where appId =  '49E5BD695F853DC3' )a)  -  100.00 > 0 
 and appId = '49E5BD695F853DC3'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1845 page no 4 n bits 96 index `PRIMARY` of table `xxx`.`tb_authorize_info` trx id 209262583957 lock_mode X locks rec but not gap waiting
Record lock, heap no 18 PHYSICAL RECORD: n_fields 32; compact format; info bits 0
......

*** (2) TRANSACTION:
TRANSACTION 209262584968, ACTIVE 1 sec starting index read
mysql tables in use 2, locked 2
4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 129183879, OS thread handle 0x7f198b208700, query id 68320632234 <server B info> updating
update  tb_authorize_info set account_balance=account_balance-  100.00 
     where (SELECT a.account_balance from 
(select account_balance from tb_authorize_info a where appId =  '49E5BD695F853DC3' )a)  -  100.00 > 0 
 and appId = '49E5BD695F853DC3'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1845 page no 4 n bits 96 index `PRIMARY` of table `xxx`.`tb_authorize_info` trx id 209262584968 lock mode S locks rec but not gap
Record lock, heap no 18 PHYSICAL RECORD: n_fields 32; compact format; info bits 0
......

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1845 page no 4 n bits 96 index `PRIMARY` of table `xxx`.`tb_authorize_info` trx id 209262584968 lock_mode X locks rec but not gap waiting
Record lock, heap no 18 PHYSICAL RECORD: n_fields 32; compact format; info bits 0
......

*** WE ROLL BACK TRANSACTION (2)

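This is a simple deadlock. Because of network or other delays, the application request was sent to both of the load-balanced application servers, and the two applications asked the database to run the same SQL at the same time. Both first acquired an S lock according to the WHERE condition and then tried to upgrade it to an X lock to perform the update, but each was blocked by the other's S lock, so a deadlock formed. MySQL resolved it quickly: transaction 1 completed normally and transaction 2 was rolled back. Apart from a slight delay, the front-end business was barely affected.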
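The S-to-X upgrade conflict described above can be reproduced on paper with a tiny compatibility table. This is a simplified sketch that models only shared and exclusive record locks on a single row. The names `COMPATIBLE`, `blockers`, `trx1`, and `trx2` are made up for illustration and do not correspond to any MySQL API.

```python
# Simplified S/X compatibility for record locks, keyed by
# (mode already held, mode being requested). Only S with S is compatible.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def blockers(held, requester, mode):
    """Return the transactions whose held lock conflicts with the request.

    held maps a transaction name to the lock mode it holds on the row.
    A transaction never blocks itself, which is why an S -> X upgrade
    succeeds when nobody else holds a lock on the row.
    """
    return sorted(
        txn
        for txn, held_mode in held.items()
        if txn != requester and not COMPATIBLE[(held_mode, mode)]
    )


# Both transactions evaluate the WHERE condition first; each gets an S lock.
held = {"trx1": "S", "trx2": "S"}

# Each then asks for X on the same row in order to update it.
waits = {txn: blockers(held, txn, "X") for txn in held}
print(waits)  # {'trx1': ['trx2'], 'trx2': ['trx1']}
```

Each transaction ends up waiting on the other, which is exactly the two-node cycle reported in the log.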

3. Another deadlock, from Stack Overflow:

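Someone posted the deadlock output below on Stack Overflow. Working through this kind of output directly helps when analyzing SQL stalls under high concurrency, so I tried to interpret it myself. This was a long time ago and I can no longer find the link.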

LATEST DETECTED DEADLOCK
------------------------
130409  0:40:58
*** (1) TRANSACTION:
TRANSACTION 3D61D41F, ACTIVE 3 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 43 lock struct(s), heap size 6960, 358 row lock(s), undo log entries 43
MySQL thread id 17241690, OS thread handle 0x7ffd3469a700, query id 860259163 localhost root update
#############
INSERT INTO `notification` (`other_grouped_notifications_count`, `user_id`, `notifiable_type`, `action_item`, `action_id`, `created_at`, `status`, `updated_at`) 
VALUES (0, 4442, 'MATCH', 'MATCH', 224716, 1365448255, 1, 1365448255)
#############
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 272207 n bits 1272 index `user_id` of table `notification` trx id 3D61D41F lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 69 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 8000115b; asc    [;;
 1: len 4; hex 0005e0bb; asc     ;;
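-- Transaction 1 wants to insert a row with user_id=4442, so it first acquired the insert intention lock on the corresponding primary key range (lower_bound,4443], then tried to place an insert intention lock on the secondary index range (lower_bound,4443] but was blocked; we infer that another transaction already holds a row lock on this range
-- Transaction 1 must acquire both insert intention locks before the insert can start; acquiring the two locks is indivisible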
*** (2) TRANSACTION:
TRANSACTION 3D61C472, ACTIVE 15 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1248, 2 row lock(s)
MySQL thread id 17266704, OS thread handle 0x7ffd34b01700, query id 860250374 localhost root Updating
#############
UPDATE `notification` SET `status`=0 WHERE user_id = 4443 and status=1
#############
*** (2) HOLDS THE LOCK(S):
-- Transaction 2's UPDATE modifies the rows with user_id=4443, so it first placed an X-mode next-key lock on the (lower_bound,4443] range of the user_id index; this next-key lock is what blocks transaction 1
RECORD LOCKS space id 0 page no 272207 n bits 1272 index `user_id` of table `notification` trx id 3D61C472 lock_mode X
Record lock, heap no 69 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 8000115b; asc    [;;
 1: len 4; hex 0005e0bb; asc     ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
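-- When transaction 2 tries to update the primary key data, it must acquire the record lock on the primary key row for user_id=4443, but it finds that transaction 1 has already placed an insert intention lock on the primary key range (lower_bound,4443], so it is blocked
-- Likewise, for transaction 2 the next-key lock on the secondary index and the record lock on the primary key are indivisible; the UPDATE can proceed only once both are acquired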
RECORD LOCKS space id 0 page no 261029 n bits 248 index `PRIMARY` of table `notification` trx id 3D61C472 lock_mode X locks rec but not gap waiting
Record lock, heap no 161 PHYSICAL RECORD: n_fields 16; compact format; info bits 0
 0: len 4; hex 0005e0bb; asc     ;;
 1: len 6; hex 00000c75178f; asc    u  ;;
 2: len 7; hex 480007c00c1d10; asc H      ;;
 3: len 4; hex 8000115b; asc    [;;
 4: len 8; hex 5245474953544552; asc REGISTER;;
 5: SQL NULL;
 6: SQL NULL;
 7: SQL NULL;
 8: len 4; hex d117dd91; asc     ;;
 9: len 4; hex d117dd91; asc     ;;
 10: len 1; hex 80; asc  ;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;
 14: SQL NULL;
 15: len 4; hex 80000000; asc     ;;

*** WE ROLL BACK TRANSACTION (2)

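So this deadlock is easy to understand. Transaction 1 first acquired the insert intention lock on the primary key at the position for 4442. When it then tried to acquire the insert intention lock on the secondary index, it was blocked by the next-key lock held by transaction 2's UPDATE. Meanwhile transaction 2, having acquired the next-key lock on the index, tried to update the primary key (that is, to place a non-gap record lock on it) but was blocked by transaction 1's insert intention lock.

Neither transaction can give up what it already holds, and each requests a lock that is incompatible with the other's. With no preemption and a circular wait, the result is a deadlock.

The root cause is that transaction 2's UPDATE ran for too long, which stalled the subsequent INSERT.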
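When reading many such dumps, it can help to pull out just the "RECORD LOCKS" lines and tabulate which transaction holds or waits for which lock. The following is a rough sketch under the assumption that the lines look like the ones quoted in this post. The regular expression and the `parse_record_locks` helper are my own illustration, not an official parser, and other MySQL versions may format these lines differently.

```python
import re

# Matches the "RECORD LOCKS ..." lines as they appear in the logs above.
# Both "lock_mode X" and "lock mode S" spellings occur, hence lock[_ ]mode.
LOCK_LINE = re.compile(
    r"RECORD LOCKS .*?index `(?P<index>[^`]+)` of table (?P<table>\S+) "
    r"trx id (?P<trx>\S+) lock[_ ]mode (?P<mode>.+)$"
)


def parse_record_locks(log_text):
    """Extract index, table, trx id, lock mode and waiting flag per lock line."""
    locks = []
    for line in log_text.splitlines():
        match = LOCK_LINE.search(line.strip())
        if not match:
            continue  # skip everything that is not a RECORD LOCKS line
        mode = match.group("mode").strip()
        waiting = mode.endswith("waiting")
        if waiting:
            mode = mode[: -len("waiting")].strip()
        locks.append(
            {
                "trx": match.group("trx"),
                "table": match.group("table").replace("`", ""),
                "index": match.group("index"),
                "mode": mode,
                "waiting": waiting,
            }
        )
    return locks


# The three lock lines from the Stack Overflow dump above.
SAMPLE = """\
RECORD LOCKS space id 0 page no 272207 n bits 1272 index `user_id` of table `notification` trx id 3D61D41F lock_mode X locks gap before rec insert intention waiting
RECORD LOCKS space id 0 page no 272207 n bits 1272 index `user_id` of table `notification` trx id 3D61C472 lock_mode X
RECORD LOCKS space id 0 page no 261029 n bits 248 index `PRIMARY` of table `notification` trx id 3D61C472 lock_mode X locks rec but not gap waiting
"""

for lock in parse_record_locks(SAMPLE):
    state = "WAITS FOR" if lock["waiting"] else "HOLDS"
    print(lock["trx"], state, lock["index"], "/", lock["mode"])
```

Laid out this way, it is easy to see that 3D61C472 holds a plain X (next-key) lock on `user_id` while 3D61D41F waits for an insert intention lock on the same index.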

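4. How to avoid deadlocks?

The official documentation has a complete treatment: https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks-handling.html

It is rather long, though, so I prefer to sum it up in a few sentences:

First, optimize SQL query performance as far as possible so that transactions stay short.

Second, if phantom reads are acceptable, use the READ COMMITTED isolation level, which turns off gap (range) locking for searches and index scans.

Third, if neither of the above is feasible, or there is little room left for SQL optimization, split tables and databases where possible, adding (or rather spreading out) resources to reduce the chance of conflicts.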
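The official handling page linked above also advises always being prepared to re-issue a transaction that fails because of a deadlock. The rolled-back session receives MySQL error 1213 (ER_LOCK_DEADLOCK). The following is a minimal sketch of such a retry loop. `DeadlockError` is a hypothetical stand-in for whatever exception your database driver raises for error 1213, and `run_with_retry` and its parameters are illustrative names only.

```python
import random
import time

DEADLOCK_ERRNO = 1213  # MySQL ER_LOCK_DEADLOCK


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver exception carrying errno 1213."""

    errno = DEADLOCK_ERRNO


def run_with_retry(txn, max_attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call txn() and re-run it if it is chosen as a deadlock victim.

    txn must perform the whole transaction (begin, statements, commit),
    because InnoDB has already rolled back all of its work when the
    deadlock error is reported.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
            # Back off a little, with jitter, so that the two competing
            # transactions are less likely to collide again right away.
            sleep(base_delay * attempt + random.uniform(0, base_delay))
```

Retrying does not remove the cause of the deadlock. It only keeps an occasional victim from surfacing as a user-visible failure, so the three points above still apply.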

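5. Summary:

Because of InnoDB's particular row-locking mechanism, deadlocks usually involve insert intention locks and next-key locks. Both are range locks, designed to prevent phantom reads, and they end up locking records that the transaction does not itself need to touch.

That said, deadlocks have never been a big problem in MySQL. They are usually a consequence of a slow database, not its cause. And because deadlock detection is common in database systems, a deadlock is resolved soon after it forms.

What can really cause database performance problems is long transactions under high concurrency. Such transactions cause contention for undo and other resources, and they occupy the binlog commit queue so that later transactions are stuck in the commit phase and cannot complete. Even a forced KILL triggers a lengthy rollback.

So long transactions and slow SQL under high concurrency are the main cause of deadlocks: they are slow and, as a single unit, release nothing until they finish, which produces the circular wait.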