本文针对MySQL InnoDB中在Repeatable Read的隔离级别下使用select for update可能引起的死锁问题进行分析。html
业务中须要对各类类型的实体进行编号,例如对于x类实体的编号多是x201712120001,x201712120002,x201712120003相似于这样。能够观察到这类编号有两个部分组成:x+日期做为前缀,以及流水号(这里是四位的流水号)。java
若是用数据库表实现一个可以分配流水号的需求,无外乎就能够创建一个相似于下面的表:mysql
CREATE TABLE number ( prefix VARCHAR(20) NOT NULL DEFAULT '' COMMENT '前缀码', value BIGINT NOT NULL DEFAULT 0 COMMENT '流水号', UNIQUE KEY uk_prefix(prefix) );
那么在业务层,根据业务规则获得编号的前缀好比x20171212,接下去就能够在代码中起事务,用select for update进行以下的控制。sql
@Transactional long acquire(String prefix) { SerialNumber current = dao.selectAndLock(prefix); if (current == null) { dao.insert(new Record(prefix, 1)); return 1; } else { current.number++; dao.update(current); return current.number; } }
这段代码作的事情其实就是加锁筛选,有则更新,无则插入,然而在Repeatable Read的隔离级别下这段代码是有潜在死锁问题的。(另外一处与事务传播行为相关的问题也会在下文说起)。数据库
当能够经过select for update的where条件筛出记录时,上面的代码是不会有deadlock问题的。然而当select for update中的where条件没法筛选出记录时,这时在有多个线程执行上面的acquire方法时是可能会出现死锁的。session
下面经过一个比较简单的例子复现一下这个场景
首先给表里初始化3条数据。并发
insert into number select 'bbb',2; insert into number select 'hhh',8; insert into number select 'yyy',25;
接着按照以下的时序进行操做:ui
session 1 | session 2 |
---|---|
begin; | |
begin; | |
select * from number where prefix='ddd' for update; | |
select * from number where prefix='fff' for update | |
insert into number select 'ddd',1 | |
锁等待中 | insert into number select 'fff',1 |
锁等待解除 | 死锁,session 2的事务被回滚 |
经过查看show engine innodb status
的信息,咱们慢慢地观察每一步的状况:spa
------------
TRANSACTIONS
------------
Trx id counter 238435
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479459588792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238434, ACTIVE 3 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69153 localhost root
TABLE LOCK tabletest
.number
trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;线程
事务238434拿到了hhh前的gap锁,也就是('bbb', 'hhh')的gap锁。
------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238435, ACTIVE 3 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69155 localhost root
TABLE LOCK tabletest
.number
trx id 238435 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
---TRANSACTION 238434, ACTIVE 30 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69153 localhost root
TABLE LOCK tabletest
.number
trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
事务238435也拿到了hhh前的gap锁。
截自InnoDB的lock_rec_has_to_wait
方法实现,能够看到的LOCK_GAP类型的锁只要不带有插入意向标识,没必要等待其它锁(表锁除外)
------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238435, ACTIVE 28 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69155 localhost root
TABLE LOCK tabletest
.number
trx id 238435 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
---TRANSACTION 238434, ACTIVE 55 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root executing
insert into number select 'ddd',1
------- TRX HAS BEEN WAITING 2 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
------------------
TABLE LOCK tabletest
.number
trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
能够看到,这时候事务238434在尝试插入'ddd',1时,因为发现其余事务(238435)已经有这个区间的gap锁,所以innodb给事务238434上了插入意向锁,锁的模式为LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION
,等待事务238435释放掉gap锁。
截取自InnoDB的lock_rec_insert_check_and_lock
方法实现
------------------------
LATEST DETECTED DEADLOCK
------------------------
2017-12-21 22:50:40 0x70001028a000
*** (1) TRANSACTION:
TRANSACTION 238434, ACTIVE 81 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root executing
insert into number select 'ddd',1
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
*** (2) TRANSACTION:
TRANSACTION 238435, ACTIVE 54 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69159 localhost root executing
insert into number select 'fff',1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238435 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479459588792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238434, ACTIVE 84 sec
3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root
TABLE LOCK tabletest
.number
trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 646464; asc ddd;;
1: len 6; hex 00000003a362; asc b;;
2: len 7; hex de000001e60110; asc ;;
3: len 8; hex 8000000000000001; asc ;;
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of tabletest
.number
trx id 238434 lock_mode X locks gap before rec insert intention
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 3; hex 686868; asc hhh;;
1: len 6; hex 00000003a350; asc P;;
2: len 7; hex d2000001ff0110; asc ;;
3: len 8; hex 8000000000000008; asc ;;
到了这里,咱们能够从死锁信息中看出,因为事务238435在插入时也发现了事务238434的gap锁,一样加上了插入意向锁,等待事务238434释放掉gap锁。所以出现死锁的状况。
接下来经过debug MySQL的源码来从新复现上面的场景。
这里session2的事务4445加锁的type_mode为515,也即(LOCK_X | LOCK_GAP),与session1事务的锁4444的gap锁lock2->type_mode=547(LOCK_X | LOCK_REC | LOCK_GAP)的lock_mode是不兼容的(二者皆为LOCK_X)。然而因为type_mode知足LOCK_GAP且不带有LCK_INSERT_INTENTION的标识位,这里会断定为不须要等待。所以,第二个session执行select for update也一样成功加上gap锁了。
这里sesion1事务4444执行insert时type_mode为2563(LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION),因为带有LOCK_INSERT_INTENTION标识位,所以须要等待session2事务释放4445的gap锁。后续session1事务4444得到了一个插入意向锁,而且在等待session2事务4445释放gap锁。
这里session2事务4445一样执行了insert操做,插入意向锁须要等待session1的事务4444的gap锁释放。在死锁检测时,被探测到造成等待环。所以InnoDB会选择一个事务做为victim进行回滚。
其过程大体以下:
咱们已经知道,这种状况出现的缘由是:两个session同时经过select for update,而且未命中任何记录的状况下,是有可能获得相同gap的锁的(要看where筛选条件是否落在同一个区间。若是上面的案例若是一个session准备插入'ddd'另外一个准备插入'kkk'则不会出现冲突,由于不是同一个gap)。此时再进行并发插入,其中一个会进入锁等待,待第二个session进行插入时,会出现死锁。MySQL会根据事务权重选择一个事务进行回滚。
那么如何避免这个状况呢?
一种解决办法是将事务隔离级别下降到Read Committed,这时不会有gap锁,对于上述场景,若是where中条件不一样即最终要插入的键不一样,则不会有问题。若是业务代码中可能不一样线程会尝试对相同键进行select for update,则可在业务代码中捕获索引冲突异常进行重试。
此外,上面代码示例中的代码还有一处值得注意的地方是事务注解@Transactional的传播机制,对于这类与主流程事务关系不大的方法,应当将事务传播行为改成REQUIRES_NEW
。
缘由有两点:
InnoDB手册
数据库内核月报 - 2016 / 01 MySQL InnoDB源码