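Recently I ran into a rather strange problem. While everyone was taking a midday nap, my phone suddenly rang. To avoid waking the others I picked it up and checked the monitoring alert: the database was down. This was a database server that had been running for a long time. I logged in to the server and tried to restart MySQL, but it failed with the error (Starting MySQL..... ERROR! The server quit without updating PID file (/usr/local/mysql/data/BigData_ZT_PY_92.pid).). I then turned to the error log and other troubleshooting steps. In the middle of that, another alert arrived saying that host xxx has just been restarted. I tried to ping the host and got no reply. I was completely stunned: the server had rebooted itself for no apparent reason, and it went on to reboot several more times in a row. In the end I contacted the data center staff and asked them to connect a monitor and see what was going on.
After a lot of back and forth the machine finally came up, and we started investigating. The error log showed: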
InnoDB: End of page dump
2018-05-23 21:10:08 7f6786710700 InnoDB: uncompressed page, stored checksum in field1 2222046951, calculated checksums for field1: crc32 2624418990, innodb 1255280539, none 3735928559, stored checksum in field2 1914065653, calculated checksums for field2: crc32 2624418990, innodb 3045085343, none 3735928559, page LSN 555 2748030571, low 4 bytes of LSN at page end 2748030571, page number (if stored to page already) 84692, space id (if created with >= MySQL-4.1.1 and stored already) 2618
InnoDB: Page may be an index page where index id is 8005
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 84692.
InnoDB: You may have to recover from a backup.
InnoDB: It is also possible that your operating
InnoDB: system has corrupted its own file cache
InnoDB: and rebooting your computer removes the
InnoDB: error.
InnoDB: If the corrupt page is an index page
InnoDB: you can also try to fix the corruption
InnoDB: by dumping, dropping, and reimporting
InnoDB: the corrupt table. You can use CHECK
InnoDB: TABLE to scan your table for corruption.
InnoDB: See also http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
InnoDB: Ending processing because of a corrupt database page.
2018-05-23 21:10:08 7f6786710700 InnoDB: Assertion failure in thread 140082613913344 in file buf0buf.cc line 4201
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
13:10:08 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=1024
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 415416 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x2c)[0x8f339c]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x364)[0x66e3e4]
/lib64/libpthread.so.0(+0xf5e0)[0x7f6b9c5b45e0]
/lib64/libc.so.6(gsignal+0x37)[0x7f6b9b3ba1f7]
/lib64/libc.so.6(abort+0x148)[0x7f6b9b3bb8e8]
/usr/local/mysql/bin/mysqld[0xa9c5c5]
/usr/local/mysql/bin/mysqld[0xadecd6]
/usr/local/mysql/bin/mysqld[0xa400c8]
/lib64/libpthread.so.0(+0x7e25)[0x7f6b9c5ace25]
/lib64/libc.so.6(clone+0x6d)[0x7f6b9b47d34d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
180523 21:10:09 mysqld_safe mysqld from pid file /usr/local/mysql/data/BigData_ZT_PY_92.pid ended
180523 21:44:59 mysqld_safe Starting mysqld daemon with databases from /usr/local/mysql/data
2018-05-23 21:44:59 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
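From the log above, the takeaway is that InnoDB hit a corrupt page (page 84692 in space id 2618) while recovering at startup and aborted with an assertion failure. After reading up on it, I concluded that a file on disk had probably been damaged.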
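The crash dump is long, and the useful identifiers are buried in it. As a rough sketch (not something I ran during the incident), a few grep patterns can pull them out. The function name summarize_corruption is made up for illustration, and the patterns assume the MySQL 5.6 message wording shown in the log above.

```shell
# summarize_corruption LOGFILE
# Hypothetical helper: lists the page number, tablespace id and index id
# that InnoDB reported as corrupt. Patterns assume the 5.6 wording above.
summarize_corruption() {
    # "file read of page N" names the page InnoDB could not read cleanly.
    grep -o 'file read of page [0-9][0-9]*' "$1" | awk '{print "corrupt_page=" $5}' | sort -u
    # The tablespace id follows "stored already)" on the checksum line.
    grep -o 'stored already) [0-9][0-9]*' "$1" | awk '{print "space_id=" $3}' | sort -u
    # "index id is N" is printed when the page looks like an index page.
    grep -o 'index id is [0-9][0-9]*' "$1" | awk '{print "index_id=" $4}' | sort -u
}
```

Called as summarize_corruption /path/to/error.log (the path is a placeholder), it should report page 84692, space id 2618 and index id 8005 for an error log containing the lines in this post. On MySQL 5.6 the space id can usually be mapped to a table name by looking up the SPACE column in information_schema.INNODB_SYS_TABLES.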
I then decided to use forced InnoDB recovery.
Here is how it works:
[mysqld]
innodb_force_recovery = 1
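Warning
Only set innodb_force_recovery to a value greater than 0 in an emergency, so that you can start InnoDB and dump your tables. Before doing so, make sure you have a backup copy of your database in case you need to recreate it. Values of 4 or greater can permanently corrupt data files. Only use an innodb_force_recovery setting of 4 or greater on a production server instance after you have successfully tested the setting on a separate physical copy of your database. When forcing InnoDB recovery, always start with innodb_force_recovery=1 and increase the value only as necessary.
innodb_force_recovery is 0 by default (normal startup without forced recovery). The permissible nonzero values for innodb_force_recovery are 1 to 6. A larger value includes the functionality of the smaller values. For example, a value of 3 includes all of the functionality of values 1 and 2.
If you can dump your tables with an innodb_force_recovery value of 3 or less, you are relatively safe: only some data on the individual corrupt pages is lost. A value of 4 or greater is considered dangerous because data files can be permanently corrupted. A value of 6 is considered drastic because database pages are left in an obsolete state, which in turn may introduce more corruption into B-trees and other database structures.
As a safety measure, InnoDB prevents INSERT, UPDATE, or DELETE operations when innodb_force_recovery is greater than 0. As of MySQL 5.6.15, an innodb_force_recovery setting of 4 or greater places InnoDB in read-only mode.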
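The warning above asks for a backup copy before any forced recovery. A minimal sketch of taking a cold copy of the data directory could look like this. The helper name backup_datadir and the destination layout are my own assumptions, and it only makes sense while mysqld is stopped.

```shell
# backup_datadir SRC DEST
# Hypothetical helper: copies a stopped MySQL data directory to
# DEST/mysql-data-<timestamp> and prints the path of the copy.
# Run it only while mysqld is stopped, otherwise the copy is inconsistent.
backup_datadir() {
    src=$1
    dest=$2/mysql-data-$(date +%Y%m%d-%H%M%S)
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$2" || return 1
    # -R recurses, -p preserves mode, timestamps and (as root) ownership.
    cp -Rp "$src" "$dest" || return 1
    echo "$dest"
}
```

For this server the call would be something like backup_datadir /usr/local/mysql/data /backup, where /backup is a placeholder for any location with enough free space, ideally on a different disk.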
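1 (SRV_FORCE_IGNORE_CORRUPT)
Lets the server run even if it detects a corrupt page. Tries to make SELECT * FROM tbl_name jump over corrupt index records and pages, which helps in dumping tables.
2 (SRV_FORCE_NO_BACKGROUND)
Prevents the master thread and any purge threads from running. If a crash would occur during the purge operation, this recovery value prevents it.
3 (SRV_FORCE_NO_TRX_UNDO)
Does not run transaction rollbacks after crash recovery.
4 (SRV_FORCE_NO_IBUF_MERGE)
Prevents insert buffer merge operations. If they would cause a crash, does not do them. Does not calculate table statistics. This value can permanently corrupt data files. After using this value, be prepared to drop and recreate all secondary indexes. As of MySQL 5.6.15, sets InnoDB to read-only.
5 (SRV_FORCE_NO_UNDO_LOG_SCAN)
Does not look at undo logs when starting the database: InnoDB treats even incomplete transactions as committed. This value can permanently corrupt data files. As of MySQL 5.6.15, sets InnoDB to read-only.
6 (SRV_FORCE_NO_LOG_REDO)
Does not do the redo log roll-forward in connection with recovery. This value can permanently corrupt data files. Database pages are left in an obsolete state, which in turn may introduce more corruption into B-trees and other database structures. As of MySQL 5.6.15, sets InnoDB to read-only.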
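Since the advice is to start at 1 and raise the level only when needed, it helps to change the setting in a repeatable way instead of hand-editing the file each time. The following is a sketch under my own assumptions: set_force_recovery is a made-up helper, and it expects the config file to contain a [mysqld] section header on a line of its own.

```shell
# set_force_recovery CNF LEVEL
# Hypothetical helper: writes "innodb_force_recovery = LEVEL" under [mysqld]
# in CNF, replacing any earlier (or commented-out) setting. Keeps CNF.bak.
set_force_recovery() {
    cnf=$1
    level=$2
    case $level in
        [0-6]) ;;
        *) echo "level must be between 0 and 6" >&2; return 1 ;;
    esac
    cp "$cnf" "$cnf.bak" || return 1
    awk -v lvl="$level" '
        # Drop any existing setting, active or commented out.
        /^[ \t]*#?[ \t]*innodb_force_recovery[ \t]*=/ { next }
        { print }
        # Re-add the setting right after the [mysqld] section header.
        /^\[mysqld\]/ { print "innodb_force_recovery = " lvl }
    ' "$cnf.bak" > "$cnf"
}
```

A cautious routine would be set_force_recovery /etc/my.cnf 1, try to start mysqld, and move to 2 and then 3 only if the previous level still crashes. Going to 4 or above is best rehearsed on a separate copy of the data first, as the warning says.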
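You can SELECT from tables to dump them. With an innodb_force_recovery value of 3 or less you can DROP or CREATE tables. As of MySQL 5.6.27, DROP TABLE is also supported with an innodb_force_recovery value greater than 3.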
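When dumping under forced recovery, one corrupt table can abort a whole-database dump, so dumping table by table is safer. Here is a minimal sketch of that idea. The helper dump_tables and the MYSQLDUMP override are my own inventions, and credentials are assumed to come from a client option file such as ~/.my.cnf.

```shell
# dump_tables DB OUTDIR   (table names are read from stdin, one per line)
# Hypothetical helper: dumps each table to its own file so that one corrupt
# table does not abort the rest. MYSQLDUMP may point at another binary.
dump_tables() {
    db=$1
    outdir=$2
    failed=0
    mkdir -p "$outdir" || return 1
    while IFS= read -r tbl; do
        [ -n "$tbl" ] || continue
        if ${MYSQLDUMP:-mysqldump} "$db" "$tbl" \
            > "$outdir/$tbl.sql" 2>> "$outdir/errors.log" < /dev/null; then
            echo "ok      $tbl"
        else
            echo "FAILED  $tbl"
            failed=$((failed + 1))
        fi
    done
    # Non-zero exit status if any table could not be dumped.
    [ "$failed" -eq 0 ]
}
```

A typical call would be mysql -N -e 'SHOW TABLES' mydb | dump_tables mydb /backup/dump, with mydb and /backup/dump as placeholders. Tables reported as FAILED are the ones to look at more closely.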
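If you know that a given table is causing a crash on rollback, you can drop it. If you run into a runaway rollback caused by a failed mass import or ALTER TABLE, you can kill the mysqld process and set innodb_force_recovery to 3 to bring the database up without the rollback, and then DROP the table that is causing the runaway rollback.
If corruption within the table data prevents you from dumping the entire table contents, a query with an ORDER BY primary_key DESC clause might be able to dump the portion of the table after the corrupted part.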
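To make the ORDER BY primary_key DESC trick concrete, here is a tiny sketch that builds such a query. The helper name reverse_dump_sql and the table and column names in the usage line are placeholders, not anything from my server.

```shell
# reverse_dump_sql TABLE PK_COLUMN
# Hypothetical helper: prints a query that scans TABLE from the highest
# primary key downwards, i.e. starting on the far side of the corrupt region.
reverse_dump_sql() {
    printf 'SELECT * FROM `%s` ORDER BY `%s` DESC;\n' "$1" "$2"
}
```

It could be used as reverse_dump_sql orders id | mysql --quick --batch mydb > orders_tail.tsv. The --quick option makes the client print each row as it arrives instead of buffering the whole result, so the rows read before the query reaches the corrupt page should still end up in the output file.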
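If a high innodb_force_recovery value is required to start InnoDB, there may be corrupted data structures that could cause complex queries (queries containing WHERE, ORDER BY, or other clauses) to fail. In that case, you may only be able to run basic SELECT * FROM t queries.
Then start the database: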
[root@databases ~]# /etc/init.d/mysql start
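Once the database was up, I logged in and ran show slave status\G; and saw that replication on the replica was not running. I then commented out innodb_force_recovery = 1 in /etc/my.cnf and restarted the database, and everything was back to normal.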
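Checking show slave status\G by eye works, but the same check can be scripted. This is a small sketch, assuming the field names Slave_IO_Running and Slave_SQL_Running that MySQL 5.6 prints. The helper name replica_running is made up.

```shell
# replica_running
# Hypothetical helper: reads "SHOW SLAVE STATUS\G" output on stdin and
# succeeds only if both replication threads report "Yes".
replica_running() {
    awk '
        $1 == "Slave_IO_Running:"  { io = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

For example: mysql -e 'SHOW SLAVE STATUS\G' | replica_running && echo "replication is running". Empty output, which is what a server that is not configured as a replica returns, counts as not running.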
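Later investigation suggested that a hardware fault on the server had probably brought the database down, and may also have damaged the file on disk.
In addition, the /etc/my.cnf configuration file contained: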
innodb_flush_log_at_trx_commit = 1
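# Key parameter. 0: the log is written and flushed to disk about once per second, so a database crash can lose roughly one second of transactions. 1: the log is written and flushed to disk at every transaction commit; the I/O cost is high because each commit has to wait for the log write, so throughput is lower. 2: the log is only written to the operating system cache at commit and flushed to disk about once per second; this is fast, and transactions are lost only if the server itself fails (operating system crash or power loss).
With a value of 1 the I/O performance on this host is poor, so it can only be set to 2.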
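As a quick reference for the three values, here is a small lookup sketch. The helper flush_policy is made up, and the descriptions are my summary of the MySQL 5.6 documentation, not output from the server.

```shell
# flush_policy VALUE
# Hypothetical helper: summarizes what each innodb_flush_log_at_trx_commit
# value does with the redo log (my summary of the MySQL 5.6 manual).
flush_policy() {
    case $1 in
        0) echo "write and flush about once per second; a mysqld crash can lose about 1s of transactions" ;;
        1) echo "write and flush at every commit; full durability, highest I/O cost" ;;
        2) echo "write at every commit, flush about once per second; only an OS crash or power loss can lose about 1s of transactions" ;;
        *) echo "unknown value: $1" >&2; return 1 ;;
    esac
}
```

The variable is dynamic, so it can also be changed on a running server with SET GLOBAL innodb_flush_log_at_trx_commit = 2; and then made permanent in /etc/my.cnf.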