梳理下MySQL崩溃恢复过程

阅读目录(Content)php

基于MySQL5.7版本,5.7版本在恢复过程作了优化,本文描述不考虑以前版本。html

1 初始化崩溃恢复

    数据库关闭只有2种状况,正常关闭,非正常关闭(包括数据库实例crash及服务器crash)。mysql

    正常关闭状况,全部buffer pool里边的脏页都会都会刷新一遍到磁盘,同时记录最新LSN到ibdata文件的第一个page中。而非正常关闭来不及作这些操做,也就是没及时把脏数据flush到磁盘,也没有记录最新LSN到ibdata file。sql

    当咱们重启数据库实例的时候,数据库作2个阶段性操做:redo log处理,undo log及binlog 处理。数据库

1.1 redo log处理

  • 打开系统表空间ibdata,读取第一个page中的LSN,若第一个页损坏,则依次日后面的page读,知道有个完整的page可以提供LSN,这个LSN看成上次shutdown时的checkpoint点,后续恢复,从这个LSN点开始恢复
  • 进入redo log文件,读取第一个redo log文件头的checkpoint LSN, 并根据该LSN定位到redo日志文件中对应的位置,从该checkpoint点开始扫描,进行3次redo log文件扫描:
    • 第一次,找 MLOG_CHECKPOINT日志
      • 若是是正常关闭,这个日志是不作记录的,也就是扫描的过程当中不回找到对应的MLOG_CHECKPOINT日志,不会进行接下来的两次扫描,由于属于正常关闭数据库服务,不须要考虑奔溃恢复状况;
      • 若是是非正常关闭,则会查找到 MLOG_CHECKPOINT (若是是多个,则说明redo文件已损坏,恢复报错),获取MLOG_FILE_NAME中指定后续须要恢复的ibd文件;
    • 第二次,从redo log读到的LSN,找到checkpoint点开始重复扫描存储日志对象
      • 根据MLOG_CHECKPOINT日志,读取对应LSN以后的日志解析到hash表中,若是剩下的日志解析结束后尚未填满hash表格,则不须要进行第三次扫描;
      • 进行到这里,则说明数据库是非正常关闭,会在errorlog中提示:Database was not shutdown normally!详见下图。
    • 第三次,若第二次扫描hash表空间不足,则发起第三次扫描,清空hash表空间,从新重新的checkpoint点开始扫描
      • 若是第二次扫扫描就把hash表填满,那么会先把hash表里边的记录重作到buffer pool中的数据页,而后再来加载redo log 记录到被清空的hash表中,hash表满后当即执行恢复操做,知道全部须要redo 的redo log 被应用结束。

    恢复的过程当中,注意两个点:打开ibd文件形式,读取数据库到buffer pool的改进。vim

    根据hash表中的相应信息读取数据页, 读数据页的时候,5.7以前版本采用把全部表空间都打开,全部表格仅执行ReadOnly,5.7版本作了优化,新增了 MLOG_FILE_NAME 记录在checkpoint以后,全部被修改过的信息,根据这些信息,在恢复过程当中,只须要打开相应的ibd文件便可,不涉及恢复的表格支持正常DML跟DDL操做,涉及恢复的表格则仅执行ReadOnly功能。安全

    当把数据页读取到buffer pool中,以往版本是只读取对应的单个页面,而如今的是直接读取与该页面相邻的32个data page 也一块儿加载的buffer pool,由于一个数据页的修改,可能周围的页面也被修改,一次性读取,能够避免后面根据hash表中再从新读取其相邻的页面。服务器

1.2 undo log及binlog 处理

      上一阶段中,把redo log中的操做都apply到数据页中,可是对于prepare状态的事务却尚未进行回滚处理,这个阶段则是针对prepare状态的事务进行处理,须要使用到binlog和undo log。app

  • 根据最后一个binlog文件,为啥不是全部binlog文件呢?由于每个binlog文件切换的时候,都会确保当前binlog文件的全部操做已落盘,因此只须要考虑最后一个binlog文件。跟进最后一个binlog文件,获取全部可能没有提交事务的xid列表;
  • 根据undo log中的 insert_undo_list,upddate_undo_list事务链,构建undo_list,在根据undo_list构建未提交事务链表;
  • 从未提交事务链表中,提取出xid,凡是存在于xid列表中的,则须要提交,不存在的,则回滚。

      整个恢复过程,能够参考下来自 www.sysdb.cn 网站做者  boyce 画的说明图,图片版权属于该做者,本处仅为引用分享给你们,做图很详细,一言不合开源码分析!遗憾的是,做者只写了2篇博文就中止更新了,心疼默哀十分钟.....socket

2 innodb关闭恢复涉及参数

 

2.1 关闭参数:innodb_fast_shutdown

    数据库关闭的时候,innodb须要完成全部full purge何insert buffer操做,这须要一些时间,甚至几个小时完成

  • 0
    • 关闭时,须要完成全部full purge,insert buffer, flush dirty pages操做
    • 作 innodb plugin升级时,会设置为0,再正常关闭数据库。
  • 1
    • 默认值,表示不须要完成full purge和insert buffer操做,可是缓冲池的脏数据须要刷新到磁盘
  • 2
    • 既不完成full purge和insert buffer操做,也不把缓冲池的脏数据刷新到磁盘,而是将日志写入日志文件,这样不回有数据丢失,可是下次mysql启动的时候,须要根据日志文件执行恢复操做

 

2.2 恢复参数:innodb_force_recovery

    这里注意,数字越小,则忽略检查的内容越少,每一个大的数字都包含了前面小数字忽略检查的内容。当参数设置大于0后,能够对表格进行DML操做,可是DDL操做时不容许的。

    当innodb_force_recovery 值为1-3时,容许SELECT TABLE ,DROP or CREATE tables;innodb_force_recovery 值为>=4时,5.7.17以前版本支持DROP TABLE,5.7.18后版本不支持。

    参数说明以下图:

3 测试状况

    测试内容:

    begin;insert into orders select * from orders.orders limit 100000;

    插入结束后不提交事务,执行 kill mysql进程

    tailf error.log查看

 

3.1 默认配置测试:innodb_fast_shutdown=1,innodb_force_recovery=0

   配置说明:异常中止DB服务时,不须要完成 full purge 和 insert buffer 操做,可是缓冲池的脏数据须要刷新到磁盘;服务启动的时候,作全部须要的检查跟事务操做。

 

3.1.1 操做过程

复制代码

1 #1 tailf error.log查看mysql错误日志,动态滚动查看
 2 mysql> show global variables like 'log_error';
 3 tailf /data/mysql/mysql3310/data/error.log
 4 
 5 #2 测试库中开启事务,insert 10w行记录,不提交事务
 6 mysql> begin;insert into orders select * from orders.orders limit 100000;
 7 Query OK, 0 rows affected (0.00 sec)
 8 
 9 Query OK, 100000 rows affected (37.55 sec)
10 Records: 100000  Duplicates: 0  Warnings: 0
11 
12 #3 查找mysql进程号,杀进程
13 [root@localhost opt]# ps axu | grep mysql
14 [root@localhost opt]# kill -9 mysql的进程号
15 
16 #4 启动mysql服务
17 mysqld --defaults-file=/data/mysql/mysql3310/mysql3310.cnf &
18 
19 #5 到tailf error.log窗口查看错误

复制代码

 

3.1.2 error log解析

启动mysqld进程

2017-03-16T16:28:13.812074Z 0 [Note] mysqld (mysqld 5.7.14-log) starting as process 28091 ...

 

读取ibdata的LSN

读取redo文件的checkpoint,从该点的LSN开始查找MLOG_CHECKPOINT

发现MLOG_CHECKPOINT日志,提示:Database was not shutdown normally !  

2017-03-16T16:28:16.572276Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 898634772645

2017-03-16T16:28:16.572418Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 898634772654

2017-03-16T16:28:16.693771Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 898634772654

2017-03-16T16:28:16.693880Z 0 [Note] InnoDB: Database was not shutdown normally!

2017-03-16T16:28:16.693913Z 0 [Note] InnoDB: Starting crash recovery.

 

检测须要重作跟回滚的事务

因为没有须要redo的事务,可是有undo事务,开始undo操做

加载redo log中的96个系统回滚段 rollbak segment,32个不须要redo 的临时表空间的回滚段

2017-03-16T16:28:17.360953Z 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 100000 row operations to undo

 

获取binlog回滚xid列表,结合undo log作回滚操做

2017-03-16T16:28:17.361122Z 0 [Note] InnoDB: Trx id counter is 5716480

2017-03-16T16:28:17.625441Z 0 [Note] InnoDB: Last MySQL binlog file position 0 37774, file name bin_log.000016

2017-03-16T16:28:17.787684Z 0 [Note] InnoDB: Starting in background the rollback of uncommitted transactions

2017-03-16T16:28:17.788022Z 0 [Note] InnoDB: Rolling back trx with id 5715975, 100000 rows to undo

 

启动innodb,加载buffer pool

2017-03-16T16:28:17.838323Z 0 [Note] InnoDB: Waiting for purge to start

2017-03-16T16:28:18.015792Z 0 [Note] InnoDB: 5.7.14 started; log sequence number 898634772654

 

mysqld正常运行提供用户使用

2017-03-16T16:28:18.772035Z 0 [Note] mysqld: ready for connections.

Version: '5.7.14-log'  socket: '/tmp/mysql3310.sock'  port: 3310  MySQL Community Server (GPL)

 

这个时候,recovery可能还没结束,好比这个例子中,recovery就还在进行,其涉及的表仅提供ReadOnly操做,其余表格可正常操做 ,慢慢恢复,最后undo回滚事务完成,Rollback of non-prepared transactions completed

 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 1002017-03-16T16:29:24.353056Z 0 [Note] InnoDB: Rollback of trx with id 5715975 completed

2017-03-16T16:29:24.353267Z 0 [Note] InnoDB: Rollback of non-prepared transactions completed

 

整个过程16:28:13启动,16:28:18 DB提供部分服务,16:29:24全库恢复,db部分提供服务耗时5s,全库提供服务耗时71s 。

1 2017-03-16T16:28:13.812074Z 0 [Note] mysqld (mysqld 5.7.14-log) starting as process 28091 ...
 2 2017-03-16T16:28:14.250956Z 0 [Note] InnoDB: PUNCH HOLE support not available
 3 2017-03-16T16:28:14.251212Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
 4 2017-03-16T16:28:14.251311Z 0 [Note] InnoDB: Uses event mutexes
 5 2017-03-16T16:28:14.251359Z 0 [Note] InnoDB: GCC builtin __sync_synchronize() is used for memory barrier
 6 2017-03-16T16:28:14.251401Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.3
 7 2017-03-16T16:28:14.251311Z 0 [Note] InnoDB: Uses event mutexes
 8 2017-03-16T16:28:14.251359Z 0 [Note] InnoDB: GCC builtin __sync_synchronize() is used for memory barrier
 9 2017-03-16T16:28:14.251401Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.3
10 2017-03-16T16:28:14.251440Z 0 [Note] InnoDB: Using Linux native AIO
11 2017-03-16T16:28:14.252570Z 0 [Note] InnoDB: Number of pools: 1
12 2017-03-16T16:28:14.253079Z 0 [Note] InnoDB: Not using CPU crc32 instructions
13 2017-03-16T16:28:14.258163Z 0 [Note] InnoDB: Initializing buffer pool, total size = 9G, instances = 8, chunk size = 128M
14 2017-03-16T16:28:16.101862Z 0 [Note] InnoDB: Completed initialization of buffer pool
15 2017-03-16T16:28:16.406890Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
16 2017-03-16T16:28:16.447145Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
17 2017-03-16T16:28:16.572276Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 898634772645
18 2017-03-16T16:28:16.572418Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 898634772654
19 2017-03-16T16:28:16.693771Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 898634772654
20 2017-03-16T16:28:16.693880Z 0 [Note] InnoDB: Database was not shutdown normally!
21 2017-03-16T16:28:16.693913Z 0 [Note] InnoDB: Starting crash recovery.
22 2017-03-16T16:28:17.360953Z 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 100000 row operations to undo
23 2017-03-16T16:28:17.361122Z 0 [Note] InnoDB: Trx id counter is 5716480
24 2017-03-16T16:28:17.625441Z 0 [Note] InnoDB: Last MySQL binlog file position 0 37774, file name bin_log.000016
25 2017-03-16T16:28:17.787684Z 0 [Note] InnoDB: Starting in background the rollback of uncommitted transactions
26 2017-03-16T16:28:17.788022Z 0 [Note] InnoDB: Rolling back trx with id 5715975, 100000 rows to undo
27 
28 InnoDB: Progress in percents: 12017-03-16T16:28:17.788272Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
29 2017-03-16T16:28:17.788603Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
30 2017-03-16T16:28:17.788603Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
31 2017-03-16T16:28:17.788937Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
32 2017-03-16T16:28:17.831078Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
33 2017-03-16T16:28:17.834673Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
34 2017-03-16T16:28:17.834822Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
35 2017-03-16T16:28:17.838323Z 0 [Note] InnoDB: Waiting for purge to start
36 2017-03-16T16:28:18.015792Z 0 [Note] InnoDB: 5.7.14 started; log sequence number 898634772654
37 2017-03-16T16:28:18.018091Z 0 [Note] InnoDB: Loading buffer pool(s) from /data/mysql/mysql3310/data/ib_buffer_pool
38 2017-03-16T16:28:18.018464Z 0 [Note] Plugin 'FEDERATED' is disabled.
39 2017-03-16T16:28:18.119301Z 0 [Note] Recovering after a crash using /data/mysql/mysql3310/logs/bin_log
40 2017-03-16T16:28:18.119442Z 0 [Note] Starting crash recovery...
41 2017-03-16T16:28:18.119442Z 0 [Note] Starting crash recovery...
42 2017-03-16T16:28:18.119624Z 0 [Note] Crash recovery finished.
43 2017-03-16T16:28:18.203723Z 0 [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key
44 2017-03-16T16:28:18.203835Z 0 [Note] Server hostname (bind-address): '*'; port: 3310
45 2017-03-16T16:28:18.204188Z 0 [Note] IPv6 is available.
46 2017-03-16T16:28:18.204339Z 0 [Note]   - '::' resolves to '::';
47 2017-03-16T16:28:18.204667Z 0 [Note] Server socket created on IP: '::'.
48 2017-03-16T16:28:18.377789Z 0 [Warning] 'user' entry 'mysql.sys@localhost' ignored in --skip-name-resolve mode.
49 2017-03-16T16:28:18.378054Z 0 [Warning] 'db' entry 'sys mysql.sys@localhost' ignored in --skip-name-resolve mode.
50 2017-03-16T16:28:18.378170Z 0 [Warning] 'proxies_priv' entry '@ root@localhost' ignored in --skip-name-resolve mode.
51 2017-03-16T16:28:18.392680Z 0 [Warning] 'tables_priv' entry 'sys_config mysql.sys@localhost' ignored in --skip-name-resolve mode.
52 2017-03-16T16:28:18.771122Z 0 [Note] Event Scheduler: Loaded 0 events
53 2017-03-16T16:28:18.772035Z 0 [Note] mysqld: ready for connections.
54 Version: '5.7.14-log'  socket: '/tmp/mysql3310.sock'  port: 3310  MySQL Community Server (GPL)
55 2017-03-16T16:28:21.404135Z 0 [Note] InnoDB: Buffer pool(s) load completed at 170317  0:28:21
56  2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 1002017-03-16T16:29:24.353056Z 0 [Note] InnoDB: Rollback of trx with id 5715975 completed
57 2017-03-16T16:29:24.353267Z 0 [Note] InnoDB: Rollback of non-prepared transactions completed

3.2 innodb_fast_shutdown=1,innodb_force_recovery=3

    配置说明:异常中止DB服务时,不须要完成full purge和insert buffer操做,可是缓冲池的脏数据须要刷新到磁盘;服务启动的时候,忽略检查corrupt页,也阻止主线程的运行,而且不执行事务回滚操做。

    当innodb_force_recovery 值为1-3时,容许SELECT TABLE ,DROP or CREATE tables;innodb_force_recovery 值为>=4时,5.7.17以前版本支持DROP TABLE,5.7.18后版本不支持。

    这个配置适用于这种状况:有某些大表执行alter或者大量数据修改操做生成大量undo宕机,那么这个状况下,能够进入cnf配置文件修改innodb_force_recovery=3,而后把再作undo操做的表格数据导出来或者利用备份文件的数据,再drop掉这个表格,进入cnf配置文件,修改参数为0,重启DB服务,会发现,rollback过程很是快执行结束,由于找不到对应表格,因此rollback很是快。

 

    Tips: 如何查找在undo操做的表格?

    数据库crash后,第一个正常启动恢复链接,连进数据库里边,操做全部表格,不支持insert update 的表格都属于正在undo的表。

 

 

3.2.1 操做过程

1 #1 tailf error.log查看mysql错误日志,动态滚动查看
 2 mysql> show global variables like 'log_error';
 3 tailf /data/mysql/mysql3310/data/error.log
 4 
 5 #2 测试库中开启事务,insert 10w行记录,不提交事务
 6 mysql> begin;insert into orders select * from orders.orders limit 100000;
 7 Query OK, 0 rows affected (0.00 sec)
 8 
 9 Query OK, 100000 rows affected (37.55 sec)
10 Records: 100000  Duplicates: 0  Warnings: 0
11 
12 #3 查找mysql进程号,杀进程
13 [root@localhost opt]# ps axu | grep mysql
14 [root@localhost opt]# kill -9 mysql的进程号
15 
16 #4 在cnf文件中指定innodb_force_recovery=3,启动服务,检查是否修改为功
17 [root@localhost ~]# vim /data/mysql/mysql3310.cnf
18 #添加innodb_force_recovery参数设置
19 [mysqld]
20 innodb_force_recovery=3
21 
22 #5 启动mysql服务
23 [root@localhost ~]# mysqld --defaults-file=/data/mysql/mysql3310.cnf &
24 
25 #6 到tailf error.log窗口查看错误
26 #发现没有进行undo操做,同时全库仅支持select drop create,不支持其余全部操做

 

3.2.2 error log解析

启动mysqld进程

2017-03-17T15:12:14.317391Z 0 [Note] mysqld (mysqld 5.7.14-log) starting as process 32094 ...

 

读取ibdata的LSN

读取redo文件的checkpoint,从该点的LSN开始查找MLOG_CHECKPOINT

发现MLOG_CHECKPOINT日志,提示:Database was not shutdown normally !  

2017-03-17T15:12:16.478403Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 898935661420

2017-03-17T15:12:16.478579Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 898935661429

2017-03-17T15:12:16.532923Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 898935661429

2017-03-17T15:12:16.533063Z 0 [Note] InnoDB: Database was not shutdown normally!

2017-03-17T15:12:16.533105Z 0 [Note] InnoDB: Starting crash recovery.

 

检测须要重作跟回滚的事务

因为没有须要redo的事务,可是有undo事务,开始undo操做

加载redo log中的96个系统回滚段 rollbak segment,32个不须要redo 的临时表空间的回滚段

2017-03-17T15:12:17.052734Z 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 100000 row operations to undo

 

获取binlog回滚xid列表,可是这里注意,没有作 回滚操做

2017-03-17T15:12:17.053009Z 0 [Note] InnoDB: Trx id counter is 5718016

2017-03-17T15:12:17.321987Z 0 [Note] InnoDB: Last MySQL binlog file position 0 37774, file name bin_log.000016

 

启动innodb,加载buffer pool ,警告当前 innodb_force_recovery 设置为3

2017-03-17T15:12:17.518732Z 0 [Note] InnoDB: 5.7.14 started; log sequence number 898935661429

2017-03-17T15:12:17.518789Z 0 [Note] InnoDB: !!! innodb_force_recovery is set to 3 !!!

2017-03-17T15:12:17.519305Z 0 [Note] InnoDB: Loading buffer pool(s) from /data/mysql/mysql3310/data/ib_buffer_pool

 

mysqld正常运行提供用户使用,可是,全库仅提供readonly

2017-03-17T15:12:18.521680Z 0 [Note] mysqld: ready for connections.

Version: '5.7.14-log'  socket: '/tmp/mysql3310.sock'  port: 3310  MySQL Community Server (GPL)

2017-03-17T15:13:03.125686Z 2 [ERROR] InnoDB: innodb_force_recovery is on. We do not allow database modifications by the user. Shut down mysqld and edit my.cnf to set innodb_force_recovery=0

 

    整个过程15:12:14启动,15:12:18 DB提供部分服务,耗时4s。可是因为没有作undo操做,错误日志提醒 We do not allow database modifications by the user. 不支持当前用户进行数据库修改操做,仅支持select table,drop table,create table操做,其余全部操做都不支持。

    这种状况下启动数据库后,能够支持你dump出正在作undo的表格数据,而后drop掉它,若是你有备份文件恢复,那么则不须要dump,直接drop,而后修改cnf配置文件,设置innodb_force_recovery=0 ,重启数据库服务,这个时候,innodb 的undo recovery很是快,由于找不到对应的表格操做,因此都是直接挑过处理。正常恢复后,赶忙把drop掉的表格从备份文件或者dump出来的sql文件中恢复回来。注意注意!!!!这种状况要权衡利弊,从时间耗时跟操做安全数据安全考虑,是undo recovery快仍是你如此操做快,而后必定必定要跟你的上级或者领导报备操做的风险性,不要默默干,要报告!!!

 实验2 详细error log记录

 

参考文章:

https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html

http://www.sysdb.cn/index.php/2016/01/14/innodb-recovery/?utm_source=tuicool&utm_medium=referral

http://mysql.taobao.org/monthly/2015/06/01/

相关文章
相关标签/搜索