Source: 陶老师运维笔记 (WeChat official account)
- MHA architecture overview: github.com/yoshinorim/…
- GitHub download page: github.com/yoshinorim/…
MHA overview:
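MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. Developed by Yoshinori Matsunobu in Japan, it is a tool for master failover and slave promotion in MySQL high-availability environments. When the master fails, MHA can complete the failover automatically within 0 to 30 seconds, and during the failover it preserves data consistency as far as possible.
MHA advantages:
Scenarios MHA does not support: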
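MHA's working principle can be summarized as follows:
(1) Save binary log events (binlog events) from the crashed master;
(2) Identify the slave that has the latest updates;
(3) Apply the differential relay logs to the other slaves;
(4) Apply the binary log events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.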
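Step (2) can be sketched in shell as follows. This is only an illustration and not MHA's actual implementation (MHA is written in Perl and handles many more cases). It assumes that Master_Log_File and Read_Master_Log_Pos from SHOW SLAVE STATUS have already been collected for each slave into lines of the form `host file pos`; the function name `latest_slave` and the sample positions are made up.

```bash
#!/bin/sh
# Hypothetical helper: read "host master_log_file read_master_log_pos" lines
# on stdin and print the host that has received the most binlog data.
latest_slave() {
    # Sort by binlog file name, then numerically by position; the last
    # line is the most advanced slave. LC_ALL=C keeps the ordering stable.
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{ print $1 }'
}

# Sample input (made-up positions for the two slaves used in this article).
printf '%s\n' \
    '192.124.64.213 mysql-bin.000012 4521' \
    '192.124.64.214 mysql-bin.000012 9876' | latest_slave
```

The sketch relies on binlog file names having a fixed-width numeric suffix, so that a plain string sort orders them correctly.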
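During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it fails over anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it. As long as at least one slave has received the latest binary log events, MHA can apply them to all the other slaves and so keep every node consistent.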
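As a rough sketch of what enabling semi-synchronous replication involves, the helper below prints the SQL for each role. The plugin and variable names are the ones used by MySQL 5.5 to 5.7; the function name `semisync_sql` is hypothetical, and the settings would also need to go into my.cnf to survive a restart.

```bash
#!/bin/sh
# Hypothetical helper: print the SQL that enables semi-synchronous
# replication for the given role ("master" or "slave").
semisync_sql() {
    case "$1" in
        master)
            echo "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';"
            echo "SET GLOBAL rpl_semi_sync_master_enabled = 1;"
            ;;
        slave)
            echo "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';"
            echo "SET GLOBAL rpl_semi_sync_slave_enabled = 1;"
            # Restart the I/O thread so the slave registers as semi-sync.
            echo "STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;"
            ;;
        *)
            echo "usage: semisync_sql master|slave" >&2
            return 1
            ;;
    esac
}

semisync_sql master
semisync_sql slave
```

The output can be piped into the client, for example `semisync_sql master | mysql -h 192.124.64.212 -P 3307`.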
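Master failure:
This is the ideal case, but things are rarely that lucky.
Scenario 2: the master has transactions that were not replicated to the slaves. Semi-synchronous replication avoids this risk.
Scenario 3: some slaves are missing binlog events
The hard part of master failover: even the most up-to-date slave may still be missing binlog events from the master.
Goals:
Save the binlog events
Find the slave closest to the master
Identify the events each slave is missing
Perform the recovery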
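MHA manager: the management node. It is usually deployed on a dedicated server and can manage multiple master/slave clusters; it can also run on one of the slave nodes. Each master/slave cluster is called an application. MHA Manager probes the master of each cluster at regular intervals; when it finds that the master has failed, it automatically promotes the slave with the most recent data to be the new master and then repoints all the other slaves to the new master.
MHA node: the data node. It runs on every MySQL/MariaDB server (manager/master/slave) and speeds up failover through scripts that can parse and purge logs.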
Manager tools:
masterha_manager          Start MHA
masterha_check_ssh        Check MHA's SSH configuration
masterha_check_repl       Check MySQL replication status
masterha_master_monitor   Detect whether the master is down
masterha_check_status     Check the current MHA running state
masterha_master_switch    Control failover (automatic or manual)
masterha_conf_host        Add or remove configured server entries
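Node tools (installed on every cluster node):
These tools are normally invoked by the MHA Manager scripts and need no manual operation.
save_binary_logs        Save and copy the master's binary logs
apply_diff_relay_logs   Identify differential relay log events and apply them to the other slaves
purge_relay_logs        Purge relay logs (without blocking the SQL thread)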
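====== Monitoring (monitor node) ======
(1) Monitor all nodes, the master above all
(2) Detect a master failure: either the instance is down (SSH still works) or the host is down (SSH cannot connect)
(3) Monitor replication status
====== Failover ======
(4) Compare the GTID positions of the nodes.
(5) Data compensation 1: if SSH works, the slaves immediately save the part of the binary logs they are missing
(6) Master election: comparing the GTID positions is enough; pick the slave whose data is closest to the master, recover the missing logs, and switch it from slave to master (stop slave; reset slave all)
(7) Data compensation 2: if SSH does not work, compute the relay log difference between the two slaves and apply it to the slave with less data.
(8) The second slave runs change master to against the new master, starting the new replication relationship
====== Application transparency ======
(9) Use a VIP mechanism so that the switch is transparent to applications
====== Additional features ======
(10) Automatic repair of the old master (rejoining the cluster): not yet developed...
(11) Second-round data compensation (binlog server)
(12) Notification (send_report)
(13) Candidate weighting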
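The choice between the two data-compensation paths can be pictured with a small decision function. This is a simplified sketch, not MHA's own logic; the function name `compensation_plan` is hypothetical, and it takes the exit statuses of two probes as arguments rather than running them itself.

```bash
#!/bin/sh
# Hypothetical helper: decide which data-compensation path applies.
# $1 = exit status of a MySQL liveness probe against the master
# $2 = exit status of an SSH probe against the master host
compensation_plan() {
    mysql_rc=$1
    ssh_rc=$2
    if [ "$mysql_rc" -eq 0 ]; then
        echo "master alive: no failover"
    elif [ "$ssh_rc" -eq 0 ]; then
        echo "instance down, host reachable: save binlogs from the dead master"
    else
        echo "host unreachable: apply relay log differences between slaves"
    fi
}

compensation_plan 1 0
compensation_plan 1 255
```

In practice the two statuses might come from something like `mysqladmin -h 192.124.64.212 -P 3307 ping` and `ssh -o BatchMode=yes -o ConnectTimeout=5 root@192.124.64.212 true`.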
This walkthrough builds a simple MHA environment on three machines, using MHA version mha-0.56.
IP | Port | DB role | MHA role | Software version |
---|---|---|---|---|
192.124.64.212 | 3307 | DB1 master | mha-node | centos6,mha-0.56 |
192.124.64.213 | 3307 | DB2 slave | mha-node | centos6,mha-0.56 |
192.124.64.214 | 3307 | DB3 slave | mha-node, mha-manager | centos6,mha-0.56 |
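Installation recommendations:
1. The manager can be installed separately on any machine;
2. One manager can manage several MySQL clusters;
3. Avoid installing the manager on the master (in case the master loses power or network);
4. Every database server must have the node package installed;
5. The manager depends on the node package.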
# Run the following on every node
ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa
#
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.124.64.213
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.124.64.212
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.124.64.214
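With more nodes, the three ssh-copy-id calls can be written as a loop. The sketch below is one possible way to do that; the helper name `distribute_key` and the `DRY_RUN` switch are inventions for this example, and by default it only prints the commands instead of running them.

```bash
#!/bin/sh
# Hosts from the environment table in this article.
HOSTS="192.124.64.212 192.124.64.213 192.124.64.214"

# Hypothetical helper: copy this node's public key to every host.
# With DRY_RUN=1 (the default here) it only prints the commands.
distribute_key() {
    for host in $HOSTS; do
        cmd="ssh-copy-id -i /root/.ssh/id_rsa.pub root@$host"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

distribute_key
```

Setting `DRY_RUN=0` before calling `distribute_key` would actually run the copies; the result can then be verified with masterha_check_ssh once MHA is configured.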
1. Install MySQL
# Install MySQL with the author's own script
mysql_install -P 3307 -r m -b 2G -v 5.6.27
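2. Set up replication
DB1 is the master; DB2 and DB3 are slaves.
# Grant privileges (the host pattern must match the subnet the nodes connect from)
grant replication client,replication slave on *.* to 'repl'@'192.124.64.%' IDENTIFIED BY 'repl123';
grant all privileges on *.* to mha@'192.124.64.%' identified by 'mha123';
# On DB2 and DB3, set up replication from DB1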
CHANGE MASTER TO
MASTER_HOST='192.124.64.212',
MASTER_PORT=3307,
MASTER_USER='repl',
MASTER_PASSWORD='repl123',
MASTER_AUTO_POSITION = 1;
#
start slave ;
show slave status\G
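The `show slave status\G` output can be turned into a simple pass/fail check. The following is a minimal sketch; the function name `slave_healthy` is hypothetical, and it looks only at the two thread-state fields.

```bash
#!/bin/sh
# Hypothetical helper: read "SHOW SLAVE STATUS\G" output on stdin and
# succeed only if both replication threads report "Yes".
slave_healthy() {
    awk -F': *' '
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the field name
            gsub(/[ \t\r]+$/, "", val)         # trim trailing whitespace
        }
        key == "Slave_IO_Running"  { io = val }
        key == "Slave_SQL_Running" { sql = val }
        END { if (io == "Yes" && sql == "Yes") exit 0; exit 1 }
    '
}

# Demo on two sample lines.
printf '%s\n' \
    '  Slave_IO_Running: Yes' \
    '  Slave_SQL_Running: Yes' | slave_healthy && echo "replication OK"
```

Against a live slave it could be used as `mysql -h 192.124.64.213 -P 3307 -e 'show slave status\G' | slave_healthy`.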
Notes:
Download and install the software: every node (master, slaves, and the MHA manager node) needs MHA node installed, because MHA manager itself depends on MHA node.
# Software download
MHA project site: https://code.google.com/archive/p/mysql-master-ha/
GitHub download page: https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
# Install the Node dependency and the Node package on every node
yum install perl-DBD-MySQL -y
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
# Install mha-manager on the DB3 node
yum install mha4mysql-manager-0.56-0.el6.noarch.rpm
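For MHA to work properly you need to write its configuration file and give each parameter a sensible value: server IPs, database user name and password, working directory, log location, and so on. If MHA is installed from source, two configuration templates are provided under $MHA_BASE/samples/conf/: app1.cnf and masterha_default.cnf.
Create the script directory:
mkdir /etc/mha/script -p
Create the log directory (it must exist before the manager's output is redirected into it):
mkdir -p /var/log/mha/mysql3307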
Edit the MHA configuration file:
vim /etc/mha/mysql3307.cnf
[server default]
manager_log=/var/log/mha/mysql3307/manager
manager_workdir=/var/log/mha/mysql3307
master_binlog_dir=/data1/mysql_3307/
user=mha
password=mha123
ping_interval=2
repl_user=repl
repl_password=repl123
ssh_user=root
#master_ip_failover_script=/etc/mha/script/master_ip_failover
#shutdown_script= /etc/mha/script/power_manager
#report_script= /etc/mha/script/send_master_failover_mail
[server1]
hostname=192.124.64.212
port=3307
[server2]
hostname=192.124.64.213
port=3307
[server3]
hostname=192.124.64.214
port=3307
1. SSH trust check
$ masterha_check_ssh --conf=/etc/mha/mysql3307.cnf
Sat Mar 21 23:14:28 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 21 23:14:28 2020 - [info] Reading application default configuration from /etc/mha/mysql3307.cnf..
Sat Mar 21 23:14:28 2020 - [info] Reading server configuration from /etc/mha/mysql3307.cnf..
Sat Mar 21 23:14:28 2020 - [info] Starting SSH connection tests..
...
Sat Mar 21 23:14:29 2020 - [info] All SSH connection tests passed successfully.
2. Replication check
$ masterha_check_repl --conf=/etc/mha/mysql3307.cnf
Sat Mar 21 23:17:00 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 21 23:17:00 2020 - [info] Reading application default configuration from /etc/mha/mysql3307.cnf..
Sat Mar 21 23:17:00 2020 - [info] Reading server configuration from /etc/mha/mysql3307.cnf..
Sat Mar 21 23:17:00 2020 - [info] MHA::MasterMonitor version 0.56.
Sat Mar 21 23:17:01 2020 - [info] GTID failover mode = 1
Sat Mar 21 23:17:01 2020 - [info] Dead Servers:
Sat Mar 21 23:17:01 2020 - [info] Alive Servers:
Sat Mar 21 23:17:01 2020 - [info] 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info] 192.124.64.213(192.124.64.213:3307)
Sat Mar 21 23:17:01 2020 - [info] 192.124.64.214(192.124.64.214:3307)
Sat Mar 21 23:17:01 2020 - [info] Alive Slaves:
Sat Mar 21 23:17:01 2020 - [info] 192.124.64.213(192.124.64.213:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Mar 21 23:17:01 2020 - [info] GTID ON
Sat Mar 21 23:17:01 2020 - [info] Replicating from 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info] 192.124.64.214(192.124.64.214:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Mar 21 23:17:01 2020 - [info] GTID ON
Sat Mar 21 23:17:01 2020 - [info] Replicating from 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info] Current Alive Master: 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:17:01 2020 - [info] Checking slave configurations..
Sat Mar 21 23:17:01 2020 - [info] read_only=1 is not set on slave 192.124.64.213(192.124.64.213:3307).
Sat Mar 21 23:17:01 2020 - [info] read_only=1 is not set on slave 192.124.64.214(192.124.64.214:3307).
Sat Mar 21 23:17:01 2020 - [info] Checking replication filtering settings..
Sat Mar 21 23:17:01 2020 - [info] binlog_do_db= , binlog_ignore_db=
Sat Mar 21 23:17:01 2020 - [info] Replication filtering check ok.
Sat Mar 21 23:17:01 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Sat Mar 21 23:17:01 2020 - [info] Checking SSH publickey authentication settings on the current master..
Warning: Permanently added '192.124.64.212' (RSA) to the list of known hosts.
Sat Mar 21 23:17:01 2020 - [info] HealthCheck: SSH to 192.124.64.212 is reachable.
Sat Mar 21 23:17:01 2020 - [info]
192.124.64.212(192.124.64.212:3307) (current master)
+--192.124.64.213(192.124.64.213:3307)
+--192.124.64.214(192.124.64.214:3307)
Sat Mar 21 23:17:01 2020 - [info] Checking replication health on 192.124.64.213..
Sat Mar 21 23:17:01 2020 - [info] ok.
Sat Mar 21 23:17:01 2020 - [info] Checking replication health on 192.124.64.214..
Sat Mar 21 23:17:01 2020 - [info] ok.
Sat Mar 21 23:17:01 2020 - [warning] master_ip_failover_script is not defined.
Sat Mar 21 23:17:01 2020 - [warning] shutdown_script is not defined.
Sat Mar 21 23:17:01 2020 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
Start MHA:
# Check the MHA manager monitoring status; here it is not running
$ masterha_check_status --conf=/etc/mha/mysql3307.cnf
mysql3307 is stopped(2:NOT_RUNNING).
# Start MHA monitoring with --remove_dead_master_conf --ignore_last_failover
$ nohup masterha_manager --conf=/etc/mha/mysql3307.cnf --remove_dead_master_conf --ignore_last_failover >> /var/log/mha/mysql3307/mha-3307.log 2>&1 &
# Check the status
$ masterha_check_status --conf=/etc/mha/mysql3307.cnf
mysql3307 (pid:10265) is running(0:PING_OK), master:192.124.64.212
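For scripting, the state name can be pulled out of the masterha_check_status line. The sketch below assumes the output format shown above; the function name `mha_state` is hypothetical.

```bash
#!/bin/sh
# Hypothetical helper: extract the state name (e.g. PING_OK, NOT_RUNNING)
# from one line of masterha_check_status output read on stdin.
mha_state() {
    sed -n 's/.*(\([0-9][0-9]*\):\([A-Z_]*\)).*/\2/p'
}

# The two sample lines are copied from the output shown in this article.
echo 'mysql3307 (pid:10265) is running(0:PING_OK), master:192.124.64.212' | mha_state
echo 'mysql3307 is stopped(2:NOT_RUNNING).' | mha_state
```

A monitoring job could call `masterha_check_status --conf=/etc/mha/mysql3307.cnf | mha_state` and raise an alert whenever the result is not PING_OK.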
Stop MHA monitoring:
$ masterha_stop --conf=/etc/mha/mysql3307.cnf
$ masterha_check_status --conf=/etc/mha/mysql3307.cnf
mysql3307 (pid:10265) is running(0:PING_OK), master:192.124.64.212
$ mysql -h 192.124.64.214 -P 3307 -e "set global relay_log_purge=0"
$ mysql -h 192.124.64.214 -P 3307 -e "show global variables like '%relay_log_purge%'"
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| relay_log_purge | OFF |
+-----------------+-------+
$ mysql -h 192.124.64.214 -P 3307 -e "show slave status\G" | egrep -i 'Master_Host|Master_Port|Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'
Master_Host: 192.124.64.212
Master_Port: 3307
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
1. Test a master failure with automatic failover.
# Kill the master DB1
$ kill mysql_pid
2. Check the detailed log:
Watch the manager log; the failover counts as successful only if the log ends with "successfully". To follow it live: tail -f /var/log/mha/mysql3307/manager
$ cat /var/log/mha/mysql3307/manager
Sat Mar 21 23:24:59 2020 - [info] MHA::MasterMonitor version 0.56.
Sat Mar 21 23:25:01 2020 - [info] GTID failover mode = 1
Sat Mar 21 23:25:01 2020 - [info] Dead Servers:
Sat Mar 21 23:25:01 2020 - [info] Alive Servers:
Sat Mar 21 23:25:01 2020 - [info] 192.124.64.212(192.124.64.212:3307)
Sat Mar 21 23:25:01 2020 - [info] 192.124.64.213(192.124.64.213:3307)
Sat Mar 21 23:25:01 2020 - [info] 192.124.64.214(192.124.64.214:3307)
Sat Mar 21 23:25:01 2020 - [info] Alive Slaves:
Sat Mar 21 23:25:01 2020 - [info] 192.124.64.213(192.124.64.213:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Sat Mar 21 23:25:01 2020 - [info] GTID ON
...
----- Failover Report -----
mysql3307: MySQL Master failover 192.124.64.212(192.124.64.212:3307) to 192.124.64.213(192.124.64.213:3307) succeeded
Master 192.124.64.212(192.124.64.212:3307) is down!
Check MHA Manager logs at LeDB-VM-124064214:/var/log/mha/mysql3307/manager for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.124.64.212(192.124.64.212:3307)
Selected 192.124.64.213(192.124.64.213:3307) as a new master.
192.124.64.213(192.124.64.213:3307): OK: Applying all logs succeeded.
192.124.64.213(192.124.64.213:3307): OK: Activated master IP address.
192.124.64.214(192.124.64.214:3307): OK: Slave started, replicating from 192.124.64.213(192.124.64.213:3307)
192.124.64.213(192.124.64.213:3307): Resetting slave info succeeded.
Master failover to 192.124.64.213(192.124.64.213:3307) completed successfully.
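The "ends with successfully" check can also be automated. A minimal sketch, assuming the final report line has the form shown above; the function name `failover_succeeded` is hypothetical, and the demo runs on a temporary file rather than a real manager log.

```bash
#!/bin/sh
# Hypothetical helper: succeed if the manager log given as $1 records a
# completed failover.
failover_succeeded() {
    grep -q 'Master failover to .* completed successfully' "$1"
}

# Demo on a temporary file holding the last line of the report above.
log=$(mktemp)
echo 'Master failover to 192.124.64.213(192.124.64.213:3307) completed successfully.' > "$log"
failover_succeeded "$log" && echo "failover OK"
rm -f "$log"
```

On the manager node it would be called as `failover_succeeded /var/log/mha/mysql3307/manager`.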
3. Check the result
DB2 has become the master, and DB3 is now a slave of DB2. The manager process exits once a failover completes, which is why masterha_check_status reports NOT_RUNNING.
$ masterha_check_status --conf=/etc/mha/mysql3307.cnf
mysql3307 is stopped(2:NOT_RUNNING).
$ mysql -h 192.124.64.213 -P 3307 -e "show slave status\G"
$ mysql -h 192.124.64.214 -P 3307 -e "show slave status\G" | egrep -i 'Master_Host|Master_Port|Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'
Master_Host: 192.124.64.213
Master_Port: 3307
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
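Once the old master DB1 has been repaired and restarted, it has to rejoin the cluster as a slave of the new master before it can take part in another failover. The helper below is a sketch of that step under the assumptions of this article (GTID auto-positioning, the repl account created earlier); the function name `rejoin_sql` is hypothetical. It only prints the statements, and it does not deal with transactions that exist on the old master but never reached the slaves.

```bash
#!/bin/sh
# Hypothetical helper: print the statements that turn a repaired old
# master into a slave of the new master, using GTID auto-positioning.
# $1 = new master host, $2 = new master port
rejoin_sql() {
    cat <<EOF
CHANGE MASTER TO
  MASTER_HOST='$1',
  MASTER_PORT=$2,
  MASTER_USER='repl',
  MASTER_PASSWORD='repl123',
  MASTER_AUTO_POSITION=1;
START SLAVE;
EOF
}

rejoin_sql 192.124.64.213 3307
```

The output could be piped to the old master, for example `rejoin_sql 192.124.64.213 3307 | mysql -h 192.124.64.212 -P 3307`. Because the manager was started with --remove_dead_master_conf, the [server1] block also has to be added back to /etc/mha/mysql3307.cnf before monitoring is restarted.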
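MHA Manager must not be running. Manual failover applies when automatic switching is not enabled for the service: when the master fails, an operator invokes MHA by hand to perform the failover, using the command below. Note: if MHA manager finds no dead server, it reports an error and aborts the failover.
# DB2 is currently the master. Stop MHA and kill the master 192.124.64.213:3307.
# Manual switch
$ masterha_master_switch --master_state=dead --conf=/etc/mha/mysql3307.cnf --dead_master_host=192.124.64.213 --dead_master_port=3307 --new_master_host=192.124.64.212 --new_master_port=3307 --ignore_last_failover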
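The output is interactive and asks whether to proceed with the switch. Reading it through is a good way to understand the manual failover process.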
Sat Mar 21 23:55:43 2020 - [info] MHA::MasterFailover version 0.56.
Sat Mar 21 23:55:43 2020 - [info] Starting master failover.
Sat Mar 21 23:55:43 2020 - [info]
Sat Mar 21 23:55:43 2020 - [info] * Phase 1: Configuration Check Phase..
...
Sat Mar 21 23:55:46 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Mar 21 23:55:46 2020 - [info]
Sat Mar 21 23:55:46 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
----- Failover Report -----
mysql3307: MySQL Master failover 192.124.64.213(192.124.64.213:3307) to 192.124.64.212(192.124.64.212:3307) succeeded
Master 192.124.64.213(192.124.64.213:3307) is down!
Check MHA Manager logs at LeDB-VM-124064214 for details.
Started manual(interactive) failover.
Invalidated master IP address on 192.124.64.213(192.124.64.213:3307)
Selected 192.124.64.212(192.124.64.212:3307) as a new master.
192.124.64.212(192.124.64.212:3307): OK: Applying all logs succeeded.
192.124.64.212(192.124.64.212:3307): OK: Activated master IP address.
192.124.64.214(192.124.64.214:3307): OK: Slave started, replicating from 192.124.64.212(192.124.64.212:3307)
192.124.64.212(192.124.64.212:3307): Resetting slave info succeeded.
Master failover to 192.124.64.212(192.124.64.212:3307) completed successfully.