delete\update where… in语句导致死锁问题分析 – 因为跑不快不敢删库跑路的DBA

1、问题描述

测试中发现delete\update语句在并发测试时经常会导致发生死锁。

2、问题分析

在mariadb中，update或delete使用 where(…,…,…) in (…,…,..) 的写法会导致全表扫描，加大锁冲突的概率，造成死锁。

例1：

MariaDB [lzk]> show create table b\G
*************************** 1. row ***************************
Table: b
Create Table: CREATE TABLE `b` (
`a` int(11) NOT NULL DEFAULT '0',
`b` int(11) NOT NULL DEFAULT '0',
`c` int(11) NOT NULL DEFAULT '0',
`d` int(11) NOT NULL DEFAULT '0',
`e` int(11) NOT NULL DEFAULT '0',
`f` int(11) DEFAULT NULL,
PRIMARY KEY (`a`,`b`,`c`,`d`,`e`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
&nbsp;
MariaDB [lzk]> explain delete from b where a=1 and b=2 and c=1 and d=1 and e=1;
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | b | range | PRIMARY | PRIMARY | 20 | NULL | 1 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec) --直接根据主键定位到记录
&nbsp;
MariaDB [lzk]> explain delete from b where (a,b,c,d,e) in ((1,1,1,1,1));
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 18 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec) --全表扫描

MariaDB [lzk]> show create table b\G

*************************** 1. row ***************************

Table: b

Create Table: CREATE TABLE `b` (

`a` int(11) NOT NULL DEFAULT '0',

`b` int(11) NOT NULL DEFAULT '0',

`c` int(11) NOT NULL DEFAULT '0',

`d` int(11) NOT NULL DEFAULT '0',

`e` int(11) NOT NULL DEFAULT '0',

`f` int(11) DEFAULT NULL,

PRIMARY KEY (`a`,`b`,`c`,`d`,`e`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin

MariaDB [lzk]> explain delete from b where a=1 and b=2 and c=1 and d=1 and e=1;

+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

1 row in set (0.00 sec) --直接根据主键定位到记录

MariaDB [lzk]> explain delete from b where (a,b,c,d,e) in ((1,1,1,1,1));

+------+-------------+-------+------+---------------+------+---------+------+------+-------------+

+------+-------------+-------+------+---------------+------+---------+------+------+-------------+

+------+-------------+-------+------+---------------+------+---------+------+------+-------------+

1 row in set (0.00 sec) --全表扫描

场景1：

当会话1：start transaction;

delete from b where (a,b,c,d,e) in ((1,1,1,1,1));

会话2：start transaction;

delete from b where (a,b,c,d,e) in ((1,1,2,1,1));

此时会话2会产生锁等待。

而场景2：

当会话1：start transaction;

delete from b where where a=1 and b=1 and c=1 and d=1 and e=1;

会话2：start transaction;

delete from b where where a=1 and b=1 and c=2 and d=1 and e=1;

不会产生锁等待。

例2：

MariaDB [test]> explain update b set c='qwe' where (a,b) in ((1,10));
+------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| id&nbsp;&nbsp; | select_type | table | type&nbsp; | possible_keys | key&nbsp;&nbsp;&nbsp;&nbsp; | key_len | ref&nbsp; | rows&nbsp; &nbsp;&nbsp;| Extra&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
+------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
|&nbsp;&nbsp;&nbsp; 1 | SIMPLE&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | b&nbsp;&nbsp;&nbsp;&nbsp; | index | NULL&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | PRIMARY | 8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | NULL | 9278400 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
1 row in set (0.00 sec)--全索引扫描
&nbsp;
MariaDB [test]> explain update b set c='qwe' where a=1 and b=10;
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id&nbsp;&nbsp; | select_type | table | type&nbsp; | possible_keys | key&nbsp;&nbsp;&nbsp;&nbsp; | key_len | ref&nbsp; | rows | Extra&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|&nbsp;&nbsp;&nbsp; 1 | SIMPLE&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | b&nbsp; &nbsp;&nbsp;&nbsp;| range | PRIMARY&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | PRIMARY | 8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | NULL |&nbsp;&nbsp;&nbsp; 1 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)--直接根据主键定位到记录

MariaDB [test]> explain update b set c='qwe' where (a,b) in ((1,10));

+------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+

+------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+

+------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+

1 row in set (0.00 sec)--全索引扫描

MariaDB [test]> explain update b set c='qwe' where a=1 and b=10;

+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

1 row in set (0.00 sec)--直接根据主键定位到记录

3、解决办法

不使用delete\update where… in的写法，改为delete\update where … and …or

4、半一致性读仍会死锁问题

在read-committed隔离级别下，update会使用半一致性读，为什么还是有几率发生死锁？

根据场景复现后，show engine innodb status查看死锁信息，发现发生死锁的事务会占有很多锁，不符合innodb半一致读的特性：

------------------------
LATEST DETECTED DEADLOCK
------------------------
2016-03-30 15:19:46 7f79e7fec700
*** (1) TRANSACTION:
TRANSACTION 132876, ACTIVE 9 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 252 lock struct(s), heap size 96696, 226 row lock(s), undo log entries 1
MySQL thread id 5, OS thread handle 0x7f79e8bb5700, query id 15 redhat64-26 10.47.160.26 root updating
update test.c set c = '94414' where (a,b) in ((1,'5167'))
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6 page no 6 n bits 496 index `PRIMARY` of table `test`.`c` trx id 132876 lock_mode X locks rec but not gap waiting
Record lock, heap no 211 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
&nbsp;0: len 4; hex 80000001; asc&nbsp;&nbsp;&nbsp;&nbsp; ;;
&nbsp;1: len 4; hex 80000358; asc&nbsp;&nbsp;&nbsp; X;;
&nbsp;2: len 6; hex 00000000c7fb; asc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ;;
&nbsp;3: len 7; hex 06000001671e10; asc&nbsp;&nbsp;&nbsp;&nbsp; g&nbsp; ;;
&nbsp;4: len 5; hex 3533343732; asc 53472;;
&nbsp;5: len 3; hex 383536; asc 856;;
&nbsp;
*** (2) TRANSACTION:
TRANSACTION 132878, ACTIVE 9 sec fetching rows
mysql tables in use 1, locked 1
98 lock struct(s), heap size 96696, 72 row lock(s), undo log entries 1
MySQL thread id 6, OS thread handle 0x7f79e7fec700, query id 17 redhat64-26 10.47.160.26 root updating
update test.c set c = '33764' where (a,b) in ((1,'5250'))
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 6 page no 6 n bits 496 index `PRIMARY` of table `test`.`c` trx id 132878 lock_mode X locks rec but not gap
Record lock, heap no 211 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
&nbsp;0: len 4; hex 80000001; asc&nbsp;&nbsp;&nbsp;&nbsp; ;;
&nbsp;1: len 4; hex 80000358; asc&nbsp;&nbsp;&nbsp; X;;
&nbsp;2: len 6; hex 00000000c7fb; asc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ;;
&nbsp;3: len 7; hex 06000001671e10; asc&nbsp;&nbsp;&nbsp;&nbsp; g&nbsp; ;;
&nbsp;4: len 5; hex 3533343732; asc 53472;;
&nbsp;5: len 3; hex 383536; asc 856;;
&nbsp;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6 page no 6 n bits 496 index `PRIMARY` of table `test`.`c` trx id 132878 lock_mode X locks rec but not gap waiting
Record lock, heap no 166 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
&nbsp;0: len 4; hex 80000001; asc&nbsp;&nbsp;&nbsp;&nbsp; ;;
&nbsp;1: len 4; hex 8000032b; asc&nbsp;&nbsp;&nbsp; +;;
&nbsp;2: len 6; hex 00000000c5e1; asc&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;;
&nbsp;3: len 7; hex 790000019f170a; asc y&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ;;
&nbsp;4: len 5; hex 3734313133; asc 74113;;
&nbsp;5: len 3; hex 383131; asc 811;;
&nbsp;
*** WE ROLL BACK TRANSACTION (2)

------------------------

LATEST DETECTED DEADLOCK

------------------------

2016-03-30 15:19:46 7f79e7fec700

*** (1) TRANSACTION:

TRANSACTION 132876, ACTIVE 9 sec fetching rows

mysql tables in use 1, locked 1

LOCK WAIT 252 lock struct(s), heap size 96696, 226 row lock(s), undo log entries 1

MySQL thread id 5, OS thread handle 0x7f79e8bb5700, query id 15 redhat64-26 10.47.160.26 root updating

update test.c set c = '94414' where (a,b) in ((1,'5167'))

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 6 page no 6 n bits 496 index `PRIMARY` of table `test`.`c` trx id 132876 lock_mode X locks rec but not gap waiting

Record lock, heap no 211 PHYSICAL RECORD: n_fields 6; compact format; info bits 0

 0: len 4; hex 80000001; asc     ;;

 1: len 4; hex 80000358; asc    X;;

 2: len 6; hex 00000000c7fb; asc       ;;

 3: len 7; hex 06000001671e10; asc     g  ;;

 4: len 5; hex 3533343732; asc 53472;;

 5: len 3; hex 383536; asc 856;;

*** (2) TRANSACTION:

TRANSACTION 132878, ACTIVE 9 sec fetching rows

mysql tables in use 1, locked 1

98 lock struct(s), heap size 96696, 72 row lock(s), undo log entries 1

MySQL thread id 6, OS thread handle 0x7f79e7fec700, query id 17 redhat64-26 10.47.160.26 root updating

update test.c set c = '33764' where (a,b) in ((1,'5250'))

*** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 6 page no 6 n bits 496 index `PRIMARY` of table `test`.`c` trx id 132878 lock_mode X locks rec but not gap

Record lock, heap no 211 PHYSICAL RECORD: n_fields 6; compact format; info bits 0

 0: len 4; hex 80000001; asc     ;;

 1: len 4; hex 80000358; asc    X;;

 2: len 6; hex 00000000c7fb; asc       ;;

 3: len 7; hex 06000001671e10; asc     g  ;;

 4: len 5; hex 3533343732; asc 53472;;

 5: len 3; hex 383536; asc 856;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 6 page no 6 n bits 496 index `PRIMARY` of table `test`.`c` trx id 132878 lock_mode X locks rec but not gap waiting

Record lock, heap no 166 PHYSICAL RECORD: n_fields 6; compact format; info bits 0

 0: len 4; hex 80000001; asc     ;;

 1: len 4; hex 8000032b; asc    +;;

 2: len 6; hex 00000000c5e1; asc       ;;

 3: len 7; hex 790000019f170a; asc y      ;;

 4: len 5; hex 3734313133; asc 74113;;

 5: len 3; hex 383131; asc 811;;

*** WE ROLL BACK TRANSACTION (2)

5、Mariadb处理逻辑

在RC隔离级别下，在执行计划为全表扫描或全索引扫描的情况下innodb处理update的流程如下图：

6、源码分析

遍历记录的主函数为row_search_for_mysql()

调用sel_set_rec_lock()尝试加锁

如果加锁时产生锁等待，即：

case DB_LOCK_WAIT:
    /* Never unlock rows that were part of a conflict. */
    prebuilt->new_rec_locks = 0;   //设置锁定的行数为0

case DB_LOCK_WAIT:

/* Never unlock rows that were part of a conflict. */

prebuilt->new_rec_locks = 0; //设置锁定的行数为0

进行半一致性读：

row_sel_build_committed_vers_for_mysql()

1	row_sel_build_committed_vers_for_mysql()

如果在半一致读时，等待的锁已被释放，则锁读当前版本

/* The lock was granted while we were
    searching for the last committed version.
    Do a normal locking read. */

/* The lock was granted while we were

searching for the last committed version.

Do a normal locking read. */

如果读取到历史版本则设置did_semi_consistent_read = TRUE;

根据did_semi_consistent_read的值设置prebuilt->row_read_type

 if (UNIV_UNLIKELY(did_semi_consistent_read)) {
//读取历史版本，即使用了半一致性读
    prebuilt->row_read_type = ROW_READ_DID_SEMI_CONSISTENT;
 } else {
//锁读当前版本
    prebuilt->row_read_type = ROW_READ_TRY_SEMI_CONSISTENT;
 }

if (UNIV_UNLIKELY(did_semi_consistent_read)) {

//读取历史版本，即使用了半一致性读

prebuilt->row_read_type = ROW_READ_DID_SEMI_CONSISTENT;

} else {

//锁读当前版本

prebuilt->row_read_type = ROW_READ_TRY_SEMI_CONSISTENT;

}

调用解锁函数ha_innobase::unlock_row()

switch (prebuilt->row_read_type)
 
case ROW_READ_TRY_SEMI_CONSISTENT:
    //未使用半一致性读
    row_unlock_for_mysql(prebuilt, FALSE);
    break;
case ROW_READ_DID_SEMI_CONSISTENT:
    prebuilt->row_read_type = ROW_READ_TRY_SEMI_CONSISTENT;
    break;
 
row_unlock_for_mysql()
 
if (prebuilt->new_rec_locks >= 1) {
//解锁记录
lock_rec_unlock()
}

switch (prebuilt->row_read_type)

case ROW_READ_TRY_SEMI_CONSISTENT:

//未使用半一致性读

row_unlock_for_mysql(prebuilt, FALSE);

break;

case ROW_READ_DID_SEMI_CONSISTENT:

prebuilt->row_read_type = ROW_READ_TRY_SEMI_CONSISTENT;

break;

row_unlock_for_mysql()

if (prebuilt->new_rec_locks >= 1) {

//解锁记录

lock_rec_unlock()

}

但是由于在上面的代码中prebuilt->new_rec_locks的值为0，所以不会对记录解锁！！！

测试MySQL不存在上述问题，应该是这块逻辑与MariaDB不同。

发表评论 取消回复

发表评论取消回复