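MHA (Master High Availability) is a relatively mature high-availability solution for MySQL. It was developed by Yoshinori Matsunobu at DeNA in Japan (he later joined Facebook) and handles failover and slave-to-master promotion in MySQL high-availability environments.
When a MySQL master fails, MHA can complete the failover automatically within roughly 0 to 30 seconds, and during the failover it preserves data consistency as far as it can.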
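The software consists of two parts: MHA Manager and MHA Node.
MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or it can be deployed on one of the slave nodes.
MHA Node runs on every MySQL server. MHA Manager periodically probes the master of the cluster; when the master fails, it automatically promotes the slave holding the most recent data to be the new master and then repoints all other slaves to the new master.
The whole failover process is completely transparent to applications.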
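How MHA works can be summarized as follows:
(1) Save the binary log events from the crashed master.
(2) Identify the slave that has the latest updates.
(3) Apply the differential relay log events to the other slaves.
(4) Apply the binary log events saved from the master.
(5) Promote one slave to be the new master.
(6) Point the other slaves at the new master for replication.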
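One part of this process, finding the slave that has received the most data from the master, can be sketched in shell. This is only an illustration and not MHA's actual selection code: the function name `pick_latest_slave` and the three-column input format are assumptions, and MHA itself also takes settings such as `candidate_master` into account.

```shell
#!/bin/sh
# Hypothetical helper (not part of MHA): pick the slave that has read the
# furthest into the master's binary log.
# Input on stdin, one slave per line: "host master_log_file read_master_log_pos"
pick_latest_slave() {
    # Sort by binlog file name, then numerically by position; the last line wins.
    sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}

# Sample data, made up for illustration.
printf '%s\n' \
    '10.0.20.202 mysql-bin.000004 194' \
    '10.0.20.203 mysql-bin.000004 120' \
    '10.0.20.204 mysql-bin.000003 980' | pick_latest_slave
```

Comparing file names as strings works here only because the binlog sequence numbers are zero-padded to the same width.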
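Manager toolkit
Component | Description |
---|---|
masterha_check_ssh | Checks MHA's SSH configuration |
masterha_check_repl | Checks the MySQL replication status |
masterha_manager | Starts MHA |
masterha_check_status | Reports the current MHA running state |
masterha_master_monitor | Detects whether the master is down |
masterha_master_switch | Controls failover (automatic or manual) |
masterha_conf_host | Adds or removes configured server entries |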
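Node toolkit
These tools are normally invoked by MHA Manager's scripts and need no manual operation.
Component | Description |
---|---|
save_binary_logs | Saves and copies the master's binary logs |
apply_diff_relay_logs | Identifies differential relay log events and applies them to the other slaves |
filter_mysqlbinlog | Strips unnecessary ROLLBACK events (MHA no longer uses this tool) |
purge_relay_logs | Purges relay logs (without blocking the SQL thread) |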
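Note:
To minimize data loss when the master goes down because of hardware failure, it is recommended (but not required) to enable semi-synchronous replication, available since MySQL 5.5, when configuring MHA. Please look up how semi-synchronous replication works on your own.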
OS | Kernel version | Hostname | MySQL version | IP address | Role |
---|---|---|---|---|---|
CentOS 7.5 | 5.1.3-1.el7 | manager.mha | MySQL 5.7.18 | 10.0.20.200 | Manager |
CentOS 7.5 | 5.1.3-1.el7 | node01.mha | MySQL 5.7.18 | 10.0.20.201 | node01 mysql-master |
CentOS 7.5 | 5.1.3-1.el7 | node02.mha | MySQL 5.7.18 | 10.0.20.202 | node02 mysql-slave |
CentOS 7.5 | 5.1.3-1.el7 | node03.mha | MySQL 5.7.18 | 10.0.20.203 | node03 mysql-slave |
CentOS 7.5 | 5.1.3-1.el7 | node04.mha | MySQL 5.7.18 | 10.0.20.204 | node04 mysql-slave |
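MHA Manager version | GitHub download | Baidu Netdisk download |
---|---|---|
v0.58 | GitHub download link | Baidu Netdisk link (extraction code: lzb0) |
MHA Node version | GitHub download | Baidu Netdisk download |
---|---|---|
v0.58 | GitHub download link | Baidu Netdisk link (extraction code: 4e6h) |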
Set up mutual SSH key trust for the root user between all machines.
Run on all machines:
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.20.200
ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.20.201
ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.20.202
ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.20.203
ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.20.204
All machines now trust each other, so root can log in between them over SSH without a password.
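The key distribution and a follow-up login test can be generated in a loop instead of being typed per host. The sketch below only prints the commands so that it is safe to run anywhere; the function names are hypothetical, and the host list is taken from the environment table in this article.

```shell
#!/bin/sh
# Hosts from the environment table; adjust to your own addresses.
MHA_HOSTS="10.0.20.200 10.0.20.201 10.0.20.202 10.0.20.203 10.0.20.204"

# Print (rather than run) one ssh-copy-id command per host.
print_copy_id_cmds() {
    for host in $MHA_HOSTS; do
        echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$host"
    done
}

# Print one non-interactive login test per host. BatchMode=yes makes ssh fail
# instead of prompting for a password when key authentication does not work.
print_verify_cmds() {
    for host in $MHA_HOSTS; do
        echo "ssh -o BatchMode=yes -o ConnectTimeout=5 root@$host hostname"
    done
}

print_copy_id_cmds
print_verify_cmds
```

To execute the printed commands instead of just reviewing them, pipe the output to a shell, for example `print_verify_cmds | sh`.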
Run on all machines:
yum install -y perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-CPAN perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
Run on all nodes:
[root@node01 ~]# cd /opt/soft
[root@node01 soft]# ll
total 639152
-rw-r--r-- 1 root root     56220 Jun 12 17:59 mha4mysql-node-0.58.tar.gz
-rw-r--r-- 1 root root 654430368 Jun 11 11:21 mysql-5.7.18-linux-glibc2.5-x86_64.tar.gz
Extract and install
(the output of the commands is not reproduced here)
[root@node01 soft]# tar xf mha4mysql-node-0.58.tar.gz
[root@node01 soft]# cd mha4mysql-node-0.58
[root@node01 mha4mysql-node-0.58]# perl Makefile.PL
[root@node01 mha4mysql-node-0.58]# make && make install
After the Node package is installed, four tools are available:
[root@node01 mha4mysql-node-0.58]# ll /usr/local/bin/
total 48
-r-xr-xr-x 1 root root 17639 Jun 13 15:00 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Jun 13 15:00 filter_mysqlbinlog
-r-xr-xr-x 1 root root  8337 Jun 13 15:00 purge_relay_logs
-r-xr-xr-x 1 root root  7525 Jun 13 15:00 save_binary_logs
Install the Manager package on the Manager node
(it does not need to be installed on the Node machines)
[root@manager soft]# tar xf mha4mysql-manager-0.58.tar.gz
[root@manager soft]# cd mha4mysql-manager-0.58
[root@manager mha4mysql-manager-0.58]# ls
AUTHORS bin COPYING debian inc lib Makefile.PL MANIFEST META.yml README rpm samples t tests
[root@manager mha4mysql-manager-0.58]# perl Makefile.PL
[root@manager mha4mysql-manager-0.58]# make && make install
List the Manager tools
[root@manager mha4mysql-manager-0.58]# ll /usr/local/bin/
total 88
-r-xr-xr-x 1 root root 17639 Jun 13 15:10 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 Jun 13 15:10 filter_mysqlbinlog
-r-xr-xr-x 1 root root  1995 Jun 13 15:13 masterha_check_repl
-r-xr-xr-x 1 root root  1779 Jun 13 15:13 masterha_check_ssh
-r-xr-xr-x 1 root root  1865 Jun 13 15:13 masterha_check_status
-r-xr-xr-x 1 root root  3201 Jun 13 15:13 masterha_conf_host
-r-xr-xr-x 1 root root  2517 Jun 13 15:13 masterha_manager
-r-xr-xr-x 1 root root  2165 Jun 13 15:13 masterha_master_monitor
-r-xr-xr-x 1 root root  2373 Jun 13 15:13 masterha_master_switch
-r-xr-xr-x 1 root root  5172 Jun 13 15:13 masterha_secondary_check
-r-xr-xr-x 1 root root  1739 Jun 13 15:13 masterha_stop
-r-xr-xr-x 1 root root  8337 Jun 13 15:10 purge_relay_logs
-r-xr-xr-x 1 root root  7525 Jun 13 15:10 save_binary_logs
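This article focuses on building the MHA cluster, so for the MySQL cluster only the commands and the my.cnf configuration are listed.
On the four Node machines, node01 is set up as the master and the remaining three nodes as slaves.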
[root@node01 mysql-5.7]# rpm -qa |grep mariadb | xargs rpm -e --nodeps
[root@node01 soft]# useradd -s /sbin/nologin -M mysql
[root@node01 soft]# tar xf mysql-5.7.18-linux-glibc2.5-x86_64.tar.gz
[root@node01 soft]# mv mysql-5.7.18-linux-glibc2.5-x86_64 mysql-5.7
[root@node01 soft]# mv mysql-5.7 /usr/local/
[root@node01 soft]# ln -s /usr/local/mysql-5.7 /usr/local/mysql
[root@node01 soft]# cd /usr/local/mysql-5.7
[root@node01 mysql-5.7]# echo 'export PATH=$PATH:/usr/local/mysql-5.7/bin' >> /etc/profile
[root@node01 mysql-5.7]# source /etc/profile
[root@node01 mysql-5.7]# mysql -V
mysql Ver 14.14 Distrib 5.7.18, for linux-glibc2.5 (x86_64) using EditLine wrapper
[root@node01 mysql-5.7]# cp support-files/mysql.server /etc/init.d/mysqld
[root@node01 mysql-5.7]# sed -i 's@/etc/my.cnf@/usr/local/mysql-5.7/my.cnf@g' /etc/init.d/mysqld
[root@node01 mysql-5.7]# sed -i 's@/usr/local/mysql/data@/opt/mysql_data@g' /etc/init.d/mysqld
[root@node01 mysql-5.7]# chkconfig mysqld on
[root@node01 mysql-5.7]# mkdir /opt/mysql_data
[root@node01 mysql-5.7]# chown -R mysql.mysql /usr/local/mysql-5.7
[root@node01 mysql-5.7]# chown -R mysql.mysql /opt/mysql_data
[root@node01 mysql-5.7]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
[root@node01 mysql-5.7]# ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
The my.cnf configuration file
Note: the server-id value in my.cnf must be different on each of the four nodes, otherwise replication will fail to start.
[root@node04 mysql-5.7]# cat my.cnf
[client]
socket = /tmp/mysql.sock
port=3306
[mysql]
default-character-set=utf8
socket = /tmp/mysql.sock
[mysqld]
socket = /tmp/mysql.sock
character-set-server=utf8
basedir=/usr/local/mysql-5.7
datadir=/opt/mysql_data
port=3306
pid-file=/opt/mysql_data/mysqld.pid
# Must be unique on each of the four nodes
server-id=204
# Binary logging; the base name matches the mysql-bin.* files shown by "show master status" below
log-bin = mysql-bin
skip-name-resolve
default-storage-engine=INNODB
explicit_defaults_for_timestamp = true
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates = 1
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1
loose_rpl_semi_sync_master_timeout = 5000
relay-log = mysql-relay-bin
replicate-wild-ignore-table=mysql.%
replicate-wild-ignore-table=test.%
replicate-wild-ignore-table=information_schema.%
max_connections=2000
query_cache_size=0
table_open_cache=2000
tmp_table_size=246M
thread_cache_size=300
thread_stack = 192k
key_buffer_size=512M
read_buffer_size=4M
read_rnd_buffer_size=32M
innodb_data_home_dir = /opt/mysql_data
innodb_flush_log_at_trx_commit=0
innodb_log_buffer_size=16M
# Set this to 60%-80% of the memory of the machine that actually runs MySQL
innodb_buffer_pool_size=13G
innodb_log_file_size=128M
innodb_thread_concurrency=128
innodb_autoextend_increment=1000
innodb_buffer_pool_instances=8
innodb_concurrency_tickets=5000
innodb_old_blocks_time=1000
innodb_open_files=300
innodb_stats_on_metadata=0
innodb_file_per_table=1
innodb_checksum_algorithm=0
back_log = 80
flush_time = 0
join_buffer_size = 128M
max_allowed_packet = 1024M
max_connect_errors = 2000
open_files_limit = 4161
query_cache_type = 0
sort_buffer_size = 32M
table_definition_cache = 1400
binlog_row_event_max_size = 8K
sync_master_info = 10000
sync_relay_log = 10000
sync_relay_log_info = 10000
bulk_insert_buffer_size = 64M
interactive_timeout = 120
wait_timeout = 120
log-bin-trust-function-creators=1
sql_mode = NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
[mysqld_safe]
log-error = /opt/mysql_data/error.log
pid-file = /opt/mysql_data/mysqld.pid
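One simple way to keep server-id unique is to derive it from the last octet of each node's IP address, which is consistent with the server-id=204 used for node04 (10.0.20.204) above. The sketch below assumes that convention for the other nodes as well; both helper names are hypothetical.

```shell
#!/bin/sh
# Derive a server-id from the last octet of an IPv4 address (an assumed
# convention; any scheme works as long as the ids are unique).
server_id_from_ip() {
    echo "${1##*.}"
}

# Read one id per line on stdin and print any id that occurs more than once.
duplicate_ids() {
    sort -n | uniq -d
}

for ip in 10.0.20.201 10.0.20.202 10.0.20.203 10.0.20.204; do
    echo "$ip -> server-id=$(server_id_from_ip "$ip")"
done
```

`duplicate_ids` prints nothing when all ids are distinct, so an empty result means the four values are safe to use.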
node01
[root@node01 mysql-5.7]# mysqld --initialize --user=mysql --basedir=/usr/local/mysql-5.7 --datadir=/opt/mysql_data
2019-06-13T07:59:00.947482Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-06-13T07:59:01.056859Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.076218Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.129463Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae29152-8db1-11e9-9d54-005056990727.
2019-06-13T07:59:01.129873Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.130247Z 1 [Note] A temporary password is generated for root@localhost: 1qGoEiI7ga#U
node02
[root@node02 mysql-5.7]# mysqld --initialize --user=mysql --basedir=/usr/local/mysql-5.7 --datadir=/opt/mysql_data
2019-06-13T07:59:00.952176Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-06-13T07:59:01.092736Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.116696Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.171324Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae8f47b-8db1-11e9-b8bb-0050569972c0.
2019-06-13T07:59:01.171711Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.172126Z 1 [Note] A temporary password is generated for root@localhost: qTwtKAOue7:o
node03
[root@node03 mysql-5.7]# mysqld --initialize --user=mysql --basedir=/usr/local/mysql-5.7 --datadir=/opt/mysql_data
2019-06-13T07:59:00.949924Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-06-13T07:59:01.090890Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.116166Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.171335Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae8f4ef-8db1-11e9-b6ae-0050569975f7.
2019-06-13T07:59:01.171753Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.172159Z 1 [Note] A temporary password is generated for root@localhost: XIu,h#*HQ5&M
node04
[root@node04 mysql-5.7]# mysqld --initialize --user=mysql --basedir=/usr/local/mysql-5.7 --datadir=/opt/mysql_data
2019-06-13T07:59:00.955598Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-06-13T07:59:01.090420Z 0 [Warning] InnoDB: New log files created, LSN=45790
2019-06-13T07:59:01.113972Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2019-06-13T07:59:01.166754Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 1ae84210-8db1-11e9-b6fe-005056992c6b.
2019-06-13T07:59:01.167145Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2019-06-13T07:59:01.167537Z 1 [Note] A temporary password is generated for root@localhost: 26jvaV)XAy>G
When initialization finishes, the last line of output gives the temporary password for root. After logging in with it, change the root password right away; until you do, no other database operations are allowed.
# /etc/init.d/mysqld start
Starting MySQL.Logging to '/opt/mysql_data/error.log'.
.. SUCCESS!
Log in to MySQL and change the password
[root@node01 mysql-5.7]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.18
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> alter user user() identified by "123456";
Query OK, 0 rows affected (0.00 sec)
Add the replication user on every MySQL instance
mysql> grant replication slave on *.* to 'repl'@'10.0.20.%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> grant all on *.* to 'root'@'%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
Run on node01's MySQL
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |      154 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
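Run the following statements on node02, node03 and node04, using the File and Position values that show master status reports on node01
change master to master_host='10.0.20.201',master_user='repl',master_password='123456',master_log_file='mysql-bin.000002',master_log_pos=154;
start slave;
# Check that Slave_IO_Running and Slave_SQL_Running are both Yes
show slave status\G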
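Checking the two replication threads by eye on three slaves is easy to get wrong, so the check can be scripted. The following is a minimal sketch: the function name `slave_threads_ok` is hypothetical, and it only inspects text in the `show slave status\G` format fed to it on standard input.

```shell
#!/bin/sh
# Succeeds (exit status 0) only if the "show slave status\G" text on stdin
# reports both Slave_IO_Running and Slave_SQL_Running as Yes.
slave_threads_ok() {
    awk -F': *' '
        /^ *Slave_IO_Running:/  { io  = $2 }
        /^ *Slave_SQL_Running:/ { sql = $2 }
        END {
            # Strip trailing whitespace before comparing.
            sub(/[ \t\r]+$/, "", io); sub(/[ \t\r]+$/, "", sql)
            exit !(io == "Yes" && sql == "Yes")
        }
    '
}

# Demo on sample text; on a real slave the input would come from something like
#   mysql -uroot -p -e 'show slave status\G' | slave_threads_ok
printf '%s\n' \
    '             Slave_IO_Running: Yes' \
    '            Slave_SQL_Running: Yes' \
    '      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates' \
    | slave_threads_ok && echo "replication threads are running"
```

The patterns end in a colon on purpose, so that the Slave_SQL_Running_State line is not mistaken for the Slave_SQL_Running line.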
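Next, configure the Manager machine. All of my machines use bonded NICs, so the interface is named bond0 on every machine; change it to match your own interface name. The mailbox used for sending mail and the WeChat official-account settings also need to be replaced with your own.
The VIP used here is 10.0.20.199.
Adjust these values to your own environment.
All of the configuration below is done on the manager machine.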
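# Create the MHA configuration directory
mkdir /etc/mha
# Create the MHA scripts directory
mkdir /etc/mha/scripts
# Create the MHA log directory
mkdir /var/log/mha/
# Create the log directory for app1
mkdir /var/log/mha/app1 -p
# Create the log file
touch /var/log/mha/app1/manager.log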
[root@manager mha]# cat /etc/masterha_default.cnf
[server default]
user=root
password=123456
repl_user=repl
repl_password=123456
ssh_user=root
ping_interval=1
master_binlog_dir=/opt/mysql_data
manager_workdir=/var/log/mha/app1.log
manager_log=/var/log/mha/manager.log
master_ip_failover_script="/etc/mha/scripts/master_ip_failover"
master_ip_online_change_script="/etc/mha/scripts/master_ip_online_change"
report_script="/etc/mha/scripts/send_report"
remote_workdir=/tmp
secondary_check_script= /usr/local/bin/masterha_secondary_check -s 10.0.20.201 -s 10.0.20.202 -s 10.0.20.203 -s 10.0.20.204
shutdown_script=""
[root@manager ~]# cat /etc/mha/app1.cnf
[server1]
hostname=10.0.20.201
port=3306
[server2]
hostname=10.0.20.202
port=3306
candidate_master=1
check_repl_delay=0
[server3]
hostname=10.0.20.203
port=3306
[server4]
hostname=10.0.20.204
port=3306
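Before starting MHA it can help to list which hosts app1.cnf defines and which of them are flagged with candidate_master=1. The sketch below does this with awk; `list_mha_servers` is a hypothetical helper, and it assumes the simple `key=value` layout shown above rather than every syntax MHA's own parser accepts.

```shell
#!/bin/sh
# Print "<hostname> candidate|normal" for every [serverN] block in an MHA
# application config file given as the first argument.
list_mha_servers() {
    awk -F' *= *' '
        function emit() { if (host != "") print host, (cand ? "candidate" : "normal") }
        /^\[/                    { emit(); host = ""; cand = 0; next }
        $1 == "hostname"         { host = $2 }
        $1 == "candidate_master" { cand = ($2 == 1) }
        END                      { emit() }
    ' "$1"
}

# Demo on a throwaway copy of the app1.cnf content from this article.
demo_cnf=$(mktemp)
cat > "$demo_cnf" <<'EOF'
[server1]
hostname=10.0.20.201
port=3306
[server2]
hostname=10.0.20.202
port=3306
candidate_master=1
check_repl_delay=0
[server3]
hostname=10.0.20.203
port=3306
[server4]
hostname=10.0.20.204
port=3306
EOF
list_mha_servers "$demo_cnf"
rm -f "$demo_cnf"
```

On the real machine the call would simply be `list_mha_servers /etc/mha/app1.cnf`.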
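Notes on the main MHA configuration files
# To prevent split-brain, it is recommended to manage the virtual IP with a script in production rather than with keepalived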
vim /etc/mha/scripts/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '10.0.20.199/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig bond0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig bond0:$key down";
GetOptions(
    'command=s' => \$command,
    'ssh_user=s' => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s' => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s' => \$new_master_host,
    'new_master_ip=s' => \$new_master_ip,
    'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
    print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
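# Install the tool used for sending mail
yum install mailx -y
The mail program needs its sender settings configured first
vim /etc/mail.rc
set from=*****@163.com
set smtp=smtp.163.com
set smtp-auth-user=*****
# For a 163 mailbox this is the authorization code, not the account password
set smtp-auth-password=*****
set smtp-auth=login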
This is the script that sends the e-mail and WeChat notifications
vim /etc/mha/scripts/send_report
#!/bin/bash
source /root/.bash_profile
# Parse the arguments
orig_master_host=`echo "$1" | awk -F = '{print $2}'`
new_master_host=`echo "$2" | awk -F = '{print $2}'`
new_slave_hosts=`echo "$3" | awk -F = '{print $2}'`
subject=`echo "$4" | awk -F = '{print $2}'`
body=`echo "$5" | awk -F = '{print $2}'`
# Recipient address
email="***@***.com"
# The next two values must be obtained from your own WeChat official account
CropID='******************'
Secret='***************************************'
GURL="https://qyapi.weixin.qq.com/cgi-bin/gettoken?corpid=$CropID&corpsecret=$Secret"
Gtoken=$(/usr/bin/curl -s -G $GURL | awk -F\" '{print $10}')
PURL="https://qyapi.weixin.qq.com/cgi-bin/message/send?access_token=$Gtoken"
function body() {
    # Application id within the enterprise account
    local AppID=1000002
    # Department member id
    local UserID=$1
    # Department id; it defines the scope, and every member of the group receives the message
    local PartyID='2|3'
    # Take the message text: everything from the third argument onward
    local Msg=$(echo "$@" | cut -d" " -f3-)
    printf '{\n'
    printf '\t"touser": "'"$UserID"\"",\n"
    printf '\t"toparty": "'"$PartyID"\"",\n"
    printf '\t"msgtype": "text",\n'
    printf '\t"agentid": "'" $AppID "\"",\n"
    printf '\t"text": {\n'
    printf '\t\t"content": "'"$Msg"\""\n"
    printf '\t},\n'
    printf '\t"safe":"0"\n'
    printf '}\n'
}
tac /var/log/mha/app1/manager.log | sed -n 2p | grep 'successfully' > /dev/null
if [ $? -eq 0 ]
then
    messages=`echo -e "MHA $subject failover succeeded\n master:$orig_master_host --> $new_master_host \n $body \n current slaves:$new_slave_hosts"`
    echo "$messages" | mail -s "MySQL instance down, MHA $subject failover succeeded" $email >>/tmp/mailx.log 2>&1
    /usr/bin/curl --data-ascii "$(body 1 1 ${messages})" ${PURL}
else
    messages=`echo -e "MHA $subject failover failed\n master:$orig_master_host --> $new_master_host \n $body" `
    echo "$messages" | mail -s "MySQL instance down, MHA $subject failover failed" $email >>/tmp/mailx.log 2>&1
    /usr/bin/curl --data-ascii "$(body 1 1 ${messages})" ${PURL}
fi
vim /etc/mha/scripts/master_ip_online_change
#!/bin/bash
source /root/.bash_profile
vip=`echo '10.0.20.199/24'` # Set the VIP
key=`echo '1'`
command=`echo "$1" | awk -F = '{print $2}'`
orig_master_host=`echo "$2" | awk -F = '{print $2}'`
new_master_host=`echo "$7" | awk -F = '{print $2}'`
orig_master_ssh_user=`echo "${12}" | awk -F = '{print $2}'`
new_master_ssh_user=`echo "${13}" | awk -F = '{print $2}'`
# The service NIC must have the same name on every server
stop_vip=`echo "ssh root@$orig_master_host /usr/sbin/ifconfig bond0:$key down"`
start_vip=`echo "ssh root@$new_master_host /usr/sbin/ifconfig bond0:$key $vip"`
if [ $command = 'stop' ]
then
    echo -e "\n\n\n****************************\n"
    echo -e "Disabled the VIP - $vip on old master: $orig_master_host \n"
    $stop_vip
    if [ $? -eq 0 ]
    then
        echo "Disabled the VIP successfully"
    else
        echo "Disabled the VIP failed"
    fi
    echo -e "***************************\n\n\n"
fi
if [ $command = 'start' -o $command = 'status' ]
then
    echo -e "\n\n\n*************************\n"
    echo -e "Enabling the VIP - $vip on new master: $new_master_host \n"
    $start_vip
    if [ $? -eq 0 ]
    then
        echo "Enabled the VIP successfully"
    else
        echo "Enabled the VIP failed"
    fi
    echo -e "***************************\n\n\n"
fi
Finally, make the three scripts just created executable
chmod +x /etc/mha/scripts/master_ip_failover
chmod +x /etc/mha/scripts/master_ip_online_change
chmod +x /etc/mha/scripts/send_report
Verify with the masterha_check_ssh command
[root@manager scripts]# masterha_check_ssh --conf=/etc/mha/app1.cnf
# The check has passed if the output ends with the following line
Thu Jun 13 17:19:34 2019 - [info] All SSH connection tests passed successfully.
Verify with the masterha_check_repl command
[root@manager mha]# masterha_check_repl --conf=/etc/mha/app1.cnf
# The check has passed if the output ends with the following line
MySQL Replication Health is OK.
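Both checks print a lot of output, and only the closing success line matters. A small filter can look for that line so the checks can be used in a script. This is a sketch with a hypothetical function name; the two success strings are the ones quoted in this article for masterha_check_ssh and masterha_check_repl.

```shell
#!/bin/sh
# Succeeds if the text on stdin contains the success line of the named check.
#   $1 = ssh   -> looks for the masterha_check_ssh success line
#   $1 = repl  -> looks for the masterha_check_repl success line
mha_check_passed() {
    case "$1" in
        ssh)  mha_pattern='All SSH connection tests passed successfully.' ;;
        repl) mha_pattern='MySQL Replication Health is OK.' ;;
        *)    echo "usage: mha_check_passed ssh|repl" >&2; return 2 ;;
    esac
    # -F: match the pattern as a fixed string, -q: no output, status only.
    grep -qF "$mha_pattern"
}

# Demo on a sample line; on the manager the input would come from, for example,
#   masterha_check_repl --conf=/etc/mha/app1.cnf 2>&1 | mha_check_passed repl
echo 'MySQL Replication Health is OK.' | mha_check_passed repl && echo "replication check passed"
```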
This step is done on node01.
First bind the VIP on node01, the MySQL master.
It only has to be bound on the master this one time; after that it is moved automatically.
[root@node01 mysql-5.7]# ip a | grep 20
inet 10.0.20.201/24 brd 10.0.20.255 scope global bond0
inet 10.0.20.199/24 brd 10.0.20.255 scope global secondary bond0:1
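The listing above shows the result of the bind but not the command that produced it. A plausible way to do the initial bind, assuming the same ifconfig alias that the master_ip_failover script uses (interface bond0, alias key 1, VIP 10.0.20.199/24), is sketched below. The helper `vip_command` is hypothetical and only prints the command, so nothing is changed until you run the printed line as root on node01.

```shell
#!/bin/sh
# Values taken from the failover script in this article; adjust to your NIC and VIP.
VIP='10.0.20.199/24'
IFACE='bond0'
KEY='1'

# Print the ifconfig command that adds ("start") or removes ("stop") the VIP alias.
vip_command() {
    case "$1" in
        start) echo "/sbin/ifconfig $IFACE:$KEY $VIP" ;;
        stop)  echo "/sbin/ifconfig $IFACE:$KEY down" ;;
        *)     echo "usage: vip_command start|stop" >&2; return 1 ;;
    esac
}

vip_command start
```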
This step is done on the manager.
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
Check the MHA status
[root@manager mha]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4745) is running(0:PING_OK), master:10.0.20.201
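For monitoring it is handy to turn that status line into separate fields. The sketch below parses only the "is running" format shown above; the function name is hypothetical, and any line in another format simply produces no output.

```shell
#!/bin/sh
# Parse a masterha_check_status "is running" line from stdin into key=value pairs.
parse_mha_status() {
    sed -n 's/^\([^ ]*\) (pid:\([0-9]*\)) is running(\([0-9]*\):\([A-Z_]*\)), master:\(.*\)$/app=\1 pid=\2 state=\4 master=\5/p'
}

# Demo with the line from this article; on the manager the input would be
#   masterha_check_status --conf=/etc/mha/app1.cnf | parse_mha_status
echo 'app1 (pid:4745) is running(0:PING_OK), master:10.0.20.201' | parse_mha_status
```

An empty result can then be treated as "MHA is not in the running state" by the calling script.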
The MHA log is written to /var/log/mha/manager.log (the manager_log setting)
[root@manager mha]# tailf /var/log/mha/manager.log
# Startup succeeded if the last lines look like this
Thu Jun 13 17:31:41 2019 - [info] Starting ping health check on 10.0.20.201(10.0.20.201:3306)..
Thu Jun 13 17:31:41 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
If MHA is already monitoring and needs to be stopped:
masterha_stop --conf=/etc/mha/app1.cnf
Manually stop the MySQL master on node01, then check the other nodes.
[root@node01 ~]# /etc/init.d/mysqld stop
Shutting down MySQL............ SUCCESS!
[root@node01 ~]# ip a | grep 20
inet 10.0.20.201/24 brd 10.0.20.255 scope global bond0
Check the VIP on node02
[root@node02 ~]# ip a | grep 20
inet 10.0.20.202/24 brd 10.0.20.255 scope global bond0
inet 10.0.20.199/24 brd 10.0.20.255 scope global secondary bond0:1
Check the replication status and master address on node03
[root@node03 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.202
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Check the replication status and master address on node04
[root@node04 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.202
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Check the Manager log
[root@manager mha]# tailf manager.log Fri Jun 14 10:01:03 2019 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) Fri Jun 14 10:01:03 2019 - [info] Executing SSH check script: exit 0 Fri Jun 14 10:01:03 2019 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 10.0.20.201 -s 10.0.20.202 -s 10.0.20.203 -s 10.0.20.204 --user=root --master_host=10.0.20.201 --master_ip=10.0.20.201 --master_port=3306 --master_user=root --master_password=123456 --ping_type=SELECT Fri Jun 14 10:01:03 2019 - [info] HealthCheck: SSH to 10.0.20.201 is reachable. Monitoring server 10.0.20.201 is reachable, Master is not reachable from 10.0.20.201. OK. Monitoring server 10.0.20.202 is reachable, Master is not reachable from 10.0.20.202. OK. Monitoring server 10.0.20.203 is reachable, Master is not reachable from 10.0.20.203. OK. Fri Jun 14 10:01:04 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.20.201' (111)) Fri Jun 14 10:01:04 2019 - [warning] Connection failed 2 time(s).. Monitoring server 10.0.20.204 is reachable, Master is not reachable from 10.0.20.204. OK. Fri Jun 14 10:01:04 2019 - [info] Master is not reachable from all other monitoring servers. Failover should start. Fri Jun 14 10:01:05 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.20.201' (111)) Fri Jun 14 10:01:05 2019 - [warning] Connection failed 3 time(s).. Fri Jun 14 10:01:06 2019 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.20.201' (111)) Fri Jun 14 10:01:06 2019 - [warning] Connection failed 4 time(s).. Fri Jun 14 10:01:06 2019 - [warning] Master is not reachable from health checker! Fri Jun 14 10:01:06 2019 - [warning] Master 10.0.20.201(10.0.20.201:3306) is not reachable! Fri Jun 14 10:01:06 2019 - [warning] SSH is reachable. Fri Jun 14 10:01:06 2019 - [info] Connecting to a master server failed. 
Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status.. Fri Jun 14 10:01:06 2019 - [info] Reading default configuration from /etc/masterha_default.cnf.. Fri Jun 14 10:01:06 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf.. Fri Jun 14 10:01:06 2019 - [info] Reading server configuration from /etc/mha/app1.cnf.. Fri Jun 14 10:01:07 2019 - [info] GTID failover mode = 1 Fri Jun 14 10:01:07 2019 - [info] Dead Servers: Fri Jun 14 10:01:07 2019 - [info] 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:07 2019 - [info] Alive Servers: Fri Jun 14 10:01:07 2019 - [info] 10.0.20.202(10.0.20.202:3306) Fri Jun 14 10:01:07 2019 - [info] 10.0.20.203(10.0.20.203:3306) Fri Jun 14 10:01:07 2019 - [info] 10.0.20.204(10.0.20.204:3306) Fri Jun 14 10:01:07 2019 - [info] Alive Slaves: Fri Jun 14 10:01:07 2019 - [info] 10.0.20.202(10.0.20.202:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:07 2019 - [info] GTID ON Fri Jun 14 10:01:07 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:07 2019 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 14 10:01:07 2019 - [info] 10.0.20.203(10.0.20.203:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:07 2019 - [info] GTID ON Fri Jun 14 10:01:07 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:07 2019 - [info] 10.0.20.204(10.0.20.204:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:07 2019 - [info] GTID ON Fri Jun 14 10:01:07 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:07 2019 - [info] Checking slave configurations.. Fri Jun 14 10:01:07 2019 - [info] read_only=1 is not set on slave 10.0.20.202(10.0.20.202:3306). 
Fri Jun 14 10:01:07 2019 - [info] read_only=1 is not set on slave 10.0.20.203(10.0.20.203:3306). Fri Jun 14 10:01:07 2019 - [info] read_only=1 is not set on slave 10.0.20.204(10.0.20.204:3306). Fri Jun 14 10:01:07 2019 - [info] Checking replication filtering settings.. Fri Jun 14 10:01:07 2019 - [info] Replication filtering check ok. Fri Jun 14 10:01:07 2019 - [info] Master is down! Fri Jun 14 10:01:07 2019 - [info] Terminating monitoring script. Fri Jun 14 10:01:07 2019 - [info] Got exit code 20 (Master dead). Fri Jun 14 10:01:07 2019 - [info] MHA::MasterFailover version 0.58. Fri Jun 14 10:01:07 2019 - [info] Starting master failover. Fri Jun 14 10:01:07 2019 - [info] Fri Jun 14 10:01:07 2019 - [info] * Phase 1: Configuration Check Phase.. Fri Jun 14 10:01:07 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] GTID failover mode = 1 Fri Jun 14 10:01:08 2019 - [info] Dead Servers: Fri Jun 14 10:01:08 2019 - [info] 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Checking master reachability via MySQL(double check)... Fri Jun 14 10:01:08 2019 - [info] ok. 
Fri Jun 14 10:01:08 2019 - [info] Alive Servers: Fri Jun 14 10:01:08 2019 - [info] 10.0.20.202(10.0.20.202:3306) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.203(10.0.20.203:3306) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.204(10.0.20.204:3306) Fri Jun 14 10:01:08 2019 - [info] Alive Slaves: Fri Jun 14 10:01:08 2019 - [info] 10.0.20.202(10.0.20.202:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.203(10.0.20.203:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.204(10.0.20.204:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Starting GTID based failover. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] ** Phase 1: Configuration Check Phase completed. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] * Phase 2: Dead Master Shutdown Phase.. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] Forcing shutdown so that applications never connect to the current master.. 
Fri Jun 14 10:01:08 2019 - [info] Executing master IP deactivation script: Fri Jun 14 10:01:08 2019 - [info] /etc/mha/scripts/master_ip_failover --orig_master_host=10.0.20.201 --orig_master_ip=10.0.20.201 --orig_master_port=3306 --command=stopssh --ssh_user=root IN SCRIPT TEST====/sbin/ifconfig bond0:1 down==/sbin/ifconfig bond0:1 10.0.20.199/24=== Disabling the VIP on old master: 10.0.20.201 Fri Jun 14 10:01:08 2019 - [info] done. Fri Jun 14 10:01:08 2019 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Fri Jun 14 10:01:08 2019 - [info] * Phase 2: Dead Master Shutdown Phase completed. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] * Phase 3: Master Recovery Phase.. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] The latest binary log file/position on all slaves is mysql-bin.000004:194 Fri Jun 14 10:01:08 2019 - [info] Retrieved Gtid Set: 6211616e-8db3-11e9-be15-005056990727:3-5 Fri Jun 14 10:01:08 2019 - [info] Latest slaves (Slaves that received relay log files to the latest): Fri Jun 14 10:01:08 2019 - [info] 10.0.20.202(10.0.20.202:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.203(10.0.20.203:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.204(10.0.20.204:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 
10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] The oldest binary log file/position on all slaves is mysql-bin.000004:194 Fri Jun 14 10:01:08 2019 - [info] Retrieved Gtid Set: 6211616e-8db3-11e9-be15-005056990727:3-5 Fri Jun 14 10:01:08 2019 - [info] Oldest slaves: Fri Jun 14 10:01:08 2019 - [info] 10.0.20.202(10.0.20.202:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.203(10.0.20.203:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] 10.0.20.204(10.0.20.204:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] * Phase 3.3: Determining New Master Phase.. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] Searching new master from slaves.. 
Fri Jun 14 10:01:08 2019 - [info] Candidate masters from the configuration file: Fri Jun 14 10:01:08 2019 - [info] 10.0.20.202(10.0.20.202:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 10:01:08 2019 - [info] GTID ON Fri Jun 14 10:01:08 2019 - [info] Replicating from 10.0.20.201(10.0.20.201:3306) Fri Jun 14 10:01:08 2019 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 14 10:01:08 2019 - [info] Non-candidate masters: Fri Jun 14 10:01:08 2019 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Fri Jun 14 10:01:08 2019 - [info] New master is 10.0.20.202(10.0.20.202:3306) Fri Jun 14 10:01:08 2019 - [info] Starting master failover.. Fri Jun 14 10:01:08 2019 - [info] From: 10.0.20.201(10.0.20.201:3306) (current master) +--10.0.20.202(10.0.20.202:3306) +--10.0.20.203(10.0.20.203:3306) +--10.0.20.204(10.0.20.204:3306) To: 10.0.20.202(10.0.20.202:3306) (new master) +--10.0.20.203(10.0.20.203:3306) +--10.0.20.204(10.0.20.204:3306) Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] * Phase 3.3: New Master Recovery Phase.. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] Waiting all logs to be applied.. Fri Jun 14 10:01:08 2019 - [info] done. Fri Jun 14 10:01:08 2019 - [info] Getting new master's binlog name and position.. Fri Jun 14 10:01:08 2019 - [info] mysql-bin.000002:194 Fri Jun 14 10:01:08 2019 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.20.202', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Fri Jun 14 10:01:08 2019 - [info] Master Recovery succeeded. 
File:Pos:Exec_Gtid_Set: mysql-bin.000002, 194, 6211616e-8db3-11e9-be15-005056990727:4-5 Fri Jun 14 10:01:08 2019 - [info] Executing master IP activate script: Fri Jun 14 10:01:08 2019 - [info] /etc/mha/scripts/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.0.20.201 --orig_master_ip=10.0.20.201 --orig_master_port=3306 --new_master_host=10.0.20.202 --new_master_ip=10.0.20.202 --new_master_port=3306 --new_master_user='root' --new_master_password=xxx Unknown option: new_master_user Unknown option: new_master_password IN SCRIPT TEST====/sbin/ifconfig bond0:1 down==/sbin/ifconfig bond0:1 10.0.20.199/24=== Enabling the VIP - 10.0.20.199/24 on the new master - 10.0.20.202 Fri Jun 14 10:01:08 2019 - [info] OK. Fri Jun 14 10:01:08 2019 - [info] ** Finished master recovery successfully. Fri Jun 14 10:01:08 2019 - [info] * Phase 3: Master Recovery Phase completed. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] * Phase 4: Slaves Recovery Phase.. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] * Phase 4.1: Starting Slaves in parallel.. Fri Jun 14 10:01:08 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] -- Slave recovery on host 10.0.20.203(10.0.20.203:3306) started, pid: 2838. Check tmp log /var/log/mha/10.0.20.203_3306_20190614100107.log if it takes time.. Fri Jun 14 10:01:08 2019 - [info] -- Slave recovery on host 10.0.20.204(10.0.20.204:3306) started, pid: 2839. Check tmp log /var/log/mha/10.0.20.204_3306_20190614100107.log if it takes time.. Fri Jun 14 10:01:09 2019 - [info] Fri Jun 14 10:01:09 2019 - [info] Log messages from 10.0.20.204 ... Fri Jun 14 10:01:09 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] Resetting slave 10.0.20.204(10.0.20.204:3306) and starting replication from the new master 10.0.20.202(10.0.20.202:3306).. Fri Jun 14 10:01:08 2019 - [info] Executed CHANGE MASTER. Fri Jun 14 10:01:08 2019 - [info] Slave started. 
Fri Jun 14 10:01:08 2019 - [info] gtid_wait(6211616e-8db3-11e9-be15-005056990727:4-5) completed on 10.0.20.204(10.0.20.204:3306). Executed 0 events. Fri Jun 14 10:01:09 2019 - [info] End of log messages from 10.0.20.204. Fri Jun 14 10:01:09 2019 - [info] -- Slave on host 10.0.20.204(10.0.20.204:3306) started. Fri Jun 14 10:01:10 2019 - [info] Fri Jun 14 10:01:10 2019 - [info] Log messages from 10.0.20.203 ... Fri Jun 14 10:01:10 2019 - [info] Fri Jun 14 10:01:08 2019 - [info] Resetting slave 10.0.20.203(10.0.20.203:3306) and starting replication from the new master 10.0.20.202(10.0.20.202:3306).. Fri Jun 14 10:01:08 2019 - [info] Executed CHANGE MASTER. Fri Jun 14 10:01:09 2019 - [info] Slave started. Fri Jun 14 10:01:09 2019 - [info] gtid_wait(6211616e-8db3-11e9-be15-005056990727:4-5) completed on 10.0.20.203(10.0.20.203:3306). Executed 0 events. Fri Jun 14 10:01:10 2019 - [info] End of log messages from 10.0.20.203. Fri Jun 14 10:01:10 2019 - [info] -- Slave on host 10.0.20.203(10.0.20.203:3306) started. Fri Jun 14 10:01:10 2019 - [info] All new slave servers recovered successfully. Fri Jun 14 10:01:10 2019 - [info] Fri Jun 14 10:01:10 2019 - [info] * Phase 5: New master cleanup phase.. Fri Jun 14 10:01:10 2019 - [info] Fri Jun 14 10:01:10 2019 - [info] Resetting slave info on the new master.. Fri Jun 14 10:01:10 2019 - [info] 10.0.20.202: Resetting slave info succeeded. Fri Jun 14 10:01:10 2019 - [info] Master failover to 10.0.20.202(10.0.20.202:3306) completed successfully. Fri Jun 14 10:01:10 2019 - [info] Deleted server1 entry from /etc/mha/app1.cnf . Fri Jun 14 10:01:10 2019 - [info] ----- Failover Report ----- app1: MySQL Master failover 10.0.20.201(10.0.20.201:3306) to 10.0.20.202(10.0.20.202:3306) succeeded Master 10.0.20.201(10.0.20.201:3306) is down! Check MHA Manager logs at manager.mha:/var/log/mha/manager.log for details. Started automated(non-interactive) failover. 
Invalidated master IP address on 10.0.20.201(10.0.20.201:3306) Selected 10.0.20.202(10.0.20.202:3306) as a new master. 10.0.20.202(10.0.20.202:3306): OK: Applying all logs succeeded. 10.0.20.202(10.0.20.202:3306): OK: Activated master IP address. 10.0.20.204(10.0.20.204:3306): OK: Slave started, replicating from 10.0.20.202(10.0.20.202:3306) 10.0.20.203(10.0.20.203:3306): OK: Slave started, replicating from 10.0.20.202(10.0.20.202:3306) 10.0.20.202(10.0.20.202:3306): Resetting slave info succeeded. Master failover to 10.0.20.202(10.0.20.202:3306) completed successfully. Fri Jun 14 10:01:10 2019 - [info] Sending mail.. % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 347 100 45 100 302 133 897 --:--:-- --:--:-- --:--:-- 898
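As the log above and the state of each node show, the VIP has automatically floated to node02, node02 has been promoted to master, and node03 and node04 now replicate from node02.
WeChat and email alerts were received as well.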
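The promoted host can also be read straight from the manager log rather than by eye. The following is a minimal sketch, assuming the log contains the "Master failover to ... completed successfully." line shown in the output above; the function name `mha_new_master` is hypothetical.

```shell
# Print the host that MHA reports as the new master.
# Reads a manager log on stdin and relies on the line
# "Master failover to <host>(<host>:<port>) completed successfully."
# If the line occurs more than once, the last occurrence wins.
mha_new_master() {
    sed -n 's/.*Master failover to \([^(]*\)(.*completed successfully.*/\1/p' | tail -n 1
}

# Usage (log path taken from the failover report above):
#   mha_new_master < /var/log/mha/manager.log
```

If no failover has completed, the function prints nothing, which makes it easy to use in a condition.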
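The output above shows the entire MHA failover process, which consists of the following steps:
After the failover completes, note the following changes:
The crash was simulated by stopping the MySQL process on node01. Now restart MySQL there and add the node as a slave of node02.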
[root@node02 ~]# mysql -uroot -p123456 -e 'show master status\G'
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
File: mysql-bin.000002
Position: 194
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 6211616e-8db3-11e9-be15-005056990727:4-5
mysql> change master to master_host='10.0.20.202',master_user='repl',master_password='123456',master_log_file='mysql-bin.000002',master_log_pos=194;
Query OK, 0 rows affected, 2 warnings (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
[root@node01 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.202
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
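The manual step above uses a binlog file and position, while the MHA failover log suggests a GTID-based statement with MASTER_AUTO_POSITION=1. As a sketch only, the helper below prints a statement in the form MHA logged; the function name `gtid_change_master_sql` and the `REPL_PASSWORD` variable are hypothetical, and it assumes GTID is enabled on every node, as the log reports.

```shell
# Print a GTID-based CHANGE MASTER statement in the form MHA logged above.
# Arguments: master host, master port, replication user, replication password.
gtid_change_master_sql() {
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%s, MASTER_AUTO_POSITION=1, MASTER_USER='%s', MASTER_PASSWORD='%s';\n" \
        "$1" "$2" "$3" "$4"
}

# Usage on the repaired node (hypothetical; mysql prompts for the root password):
#   { gtid_change_master_sql 10.0.20.202 3306 repl "$REPL_PASSWORD"; echo "START SLAVE;"; } | mysql -uroot -p
```

Auto-positioning may fail if the repaired node holds transactions that the new master never received, so it is worth comparing the executed GTID sets before relying on it.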
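Note that after a failover the MHA process on the manager stops automatically; once the failed node has been repaired, it must be started again by hand.
During a failover MHA also removes the failed server's entry from app1.cnf (the effect of starting the manager with --remove_dead_master_conf). After the machine is repaired, its entry must be written back into app1.cnf.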
[root@manager mha]# pwd
/etc/mha
[root@manager mha]# cat app1.cnf
[server2]
candidate_master=1
check_repl_delay=0
hostname=10.0.20.202
port=3306
[server3]
hostname=10.0.20.203
port=3306
[server4]
hostname=10.0.20.204
port=3306
[root@manager mha]# cat app1.cnf
[server1]
candidate_master=1
check_repl_delay=0
hostname=10.0.20.201
[server2]
hostname=10.0.20.202
port=3306
[server3]
hostname=10.0.20.203
port=3306
[server4]
hostname=10.0.20.204
port=3306
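Restoring the removed entry can be scripted so that it is not forgotten after a repair. This is a minimal sketch, assuming the config uses the `hostname=<ip>` form without spaces as in the listing above; the function name `mha_add_server` is hypothetical, and options such as candidate_master still have to be added by hand.

```shell
# Append a [serverN] block to an MHA config unless the hostname is already listed.
# Arguments: config file, section name, hostname, port.
mha_add_server() {
    conf=$1 section=$2 host=$3 port=$4
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if grep -qxF "hostname=$host" "$conf"; then
        echo "$host already present in $conf"
        return 0
    fi
    printf '\n[%s]\nhostname=%s\nport=%s\n' "$section" "$host" "$port" >> "$conf"
}

# Usage:
#   mha_add_server /etc/mha/app1.cnf server1 10.0.20.201 3306
```

The existence check makes the function safe to run twice, which matters because a duplicated server block would leave the config ambiguous.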
Once the configuration file has been updated, start MHA again:
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
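The repair is now complete.
In many situations an existing master has to be migrated to another server, for example when the master has a hardware fault, its RAID controller needs rebuilding, or it is being moved to a more powerful machine. Maintenance on the master degrades performance and causes downtime during which, at the very least, no data can be written. In addition, blocking or killing the currently running sessions can leave the old and new masters with inconsistent data. MHA offers a fast switch with graceful write blocking: the switch takes only about 0.5-2 s, and writes are blocked during that time. In many cases 0.5-2 s of blocked writes is acceptable, so switching the master does not require scheduling a maintenance window.
The rough process of an MHA online switch:
Note that the application architecture has to account for the following two issues during an online switch:
To guarantee full data consistency and finish the switch as quickly as possible, an MHA online switch succeeds only if the following conditions are met; otherwise the switch fails.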
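[root@manager mha]# masterha_stop --conf=/etc/mha/app1.cnf
Stopped app1 successfully.
[1]+ Exit 1 nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1
Perform the online switch
This simulates an online master switch: the current master 10.0.20.202 becomes a slave, and 10.0.20.201 is promoted to be the new master.
The earlier crash test started with 201 as the master and failed over to 202, so 202 is the master at this point.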
[root@manager mha]# masterha_master_switch --conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=10.0.20.201 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
The command produces the following log output:
Fri Jun 14 11:30:26 2019 - [info] MHA::MasterRotate version 0.58. Fri Jun 14 11:30:26 2019 - [info] Starting online master switch.. Fri Jun 14 11:30:26 2019 - [info] Fri Jun 14 11:30:26 2019 - [info] * Phase 1: Configuration Check Phase.. Fri Jun 14 11:30:26 2019 - [info] Fri Jun 14 11:30:26 2019 - [info] Reading default configuration from /etc/masterha_default.cnf.. Fri Jun 14 11:30:26 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf.. Fri Jun 14 11:30:26 2019 - [info] Reading server configuration from /etc/mha/app1.cnf.. Fri Jun 14 11:30:27 2019 - [info] GTID failover mode = 1 Fri Jun 14 11:30:27 2019 - [info] Current Alive Master: 10.0.20.202(10.0.20.202:3306) Fri Jun 14 11:30:27 2019 - [info] Alive Slaves: Fri Jun 14 11:30:27 2019 - [info] 10.0.20.201(10.0.20.201:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 11:30:27 2019 - [info] GTID ON Fri Jun 14 11:30:27 2019 - [info] Replicating from 10.0.20.202(10.0.20.202:3306) Fri Jun 14 11:30:27 2019 - [info] Primary candidate for the new Master (candidate_master is set) Fri Jun 14 11:30:27 2019 - [info] 10.0.20.203(10.0.20.203:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 11:30:27 2019 - [info] GTID ON Fri Jun 14 11:30:27 2019 - [info] Replicating from 10.0.20.202(10.0.20.202:3306) Fri Jun 14 11:30:27 2019 - [info] 10.0.20.204(10.0.20.204:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled Fri Jun 14 11:30:27 2019 - [info] GTID ON Fri Jun 14 11:30:27 2019 - [info] Replicating from 10.0.20.202(10.0.20.202:3306) Fri Jun 14 11:30:27 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time.. Fri Jun 14 11:30:27 2019 - [info] ok. Fri Jun 14 11:30:27 2019 - [info] Checking MHA is not monitoring or doing failover.. Fri Jun 14 11:30:27 2019 - [info] Checking replication health on 10.0.20.201.. Fri Jun 14 11:30:27 2019 - [info] ok. 
Fri Jun 14 11:30:27 2019 - [info] Checking replication health on 10.0.20.203.. Fri Jun 14 11:30:27 2019 - [info] ok. Fri Jun 14 11:30:27 2019 - [info] Checking replication health on 10.0.20.204.. Fri Jun 14 11:30:27 2019 - [info] ok. Fri Jun 14 11:30:27 2019 - [info] 10.0.20.201 can be new master. Fri Jun 14 11:30:27 2019 - [info] From: 10.0.20.202(10.0.20.202:3306) (current master) +--10.0.20.201(10.0.20.201:3306) +--10.0.20.203(10.0.20.203:3306) +--10.0.20.204(10.0.20.204:3306) To: 10.0.20.201(10.0.20.201:3306) (new master) +--10.0.20.203(10.0.20.203:3306) +--10.0.20.204(10.0.20.204:3306) +--10.0.20.202(10.0.20.202:3306) Fri Jun 14 11:30:27 2019 - [info] Checking whether 10.0.20.201(10.0.20.201:3306) is ok for the new master.. Fri Jun 14 11:30:27 2019 - [info] ok. Fri Jun 14 11:30:27 2019 - [info] 10.0.20.202(10.0.20.202:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host. Fri Jun 14 11:30:27 2019 - [info] 10.0.20.202(10.0.20.202:3306): Resetting slave pointing to the dummy host. Fri Jun 14 11:30:27 2019 - [info] ** Phase 1: Configuration Check Phase completed. Fri Jun 14 11:30:27 2019 - [info] Fri Jun 14 11:30:27 2019 - [info] * Phase 2: Rejecting updates Phase.. 
Fri Jun 14 11:30:27 2019 - [info] Fri Jun 14 11:30:27 2019 - [info] Executing master ip online change script to disable write on the current master: Fri Jun 14 11:30:27 2019 - [info] /etc/mha/scripts/master_ip_online_change --command=stop --orig_master_host=10.0.20.202 --orig_master_ip=10.0.20.202 --orig_master_port=3306 --orig_master_user='root' --new_master_host=10.0.20.201 --new_master_ip=10.0.20.201 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx **************************** Disabled thi VIP - 10.0.20.199/24 on old master: 10.0.20.202 Disabled the VIP successfully *************************** Fri Jun 14 11:30:27 2019 - [info] ok. Fri Jun 14 11:30:27 2019 - [info] Locking all tables on the orig master to reject updates from everybody (including root): Fri Jun 14 11:30:27 2019 - [info] Executing FLUSH TABLES WITH READ LOCK.. Fri Jun 14 11:30:27 2019 - [info] ok. Fri Jun 14 11:30:27 2019 - [info] Orig master binlog:pos is mysql-bin.000002:194. Fri Jun 14 11:30:27 2019 - [info] Waiting to execute all relay logs on 10.0.20.201(10.0.20.201:3306).. Fri Jun 14 11:30:27 2019 - [info] master_pos_wait(mysql-bin.000002:194) completed on 10.0.20.201(10.0.20.201:3306). Executed 0 events. Fri Jun 14 11:30:27 2019 - [info] done. Fri Jun 14 11:30:27 2019 - [info] Getting new master's binlog name and position.. Fri Jun 14 11:30:27 2019 - [info] mysql-bin.000005:194 Fri Jun 14 11:30:27 2019 - [info] All other slaves should start replication from here. 
Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.20.201', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Fri Jun 14 11:30:27 2019 - [info] Executing master ip online change script to allow write on the new master: Fri Jun 14 11:30:27 2019 - [info] /etc/mha/scripts/master_ip_online_change --command=start --orig_master_host=10.0.20.202 --orig_master_ip=10.0.20.202 --orig_master_port=3306 --orig_master_user='root' --new_master_host=10.0.20.201 --new_master_ip=10.0.20.201 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx ************************* Enabling the VIP - 10.0.20.199/24 on new master: 10.0.20.201 Enabled the VIP successfully *************************** Fri Jun 14 11:30:27 2019 - [info] ok. Fri Jun 14 11:30:27 2019 - [info] Fri Jun 14 11:30:27 2019 - [info] * Switching slaves in parallel.. Fri Jun 14 11:30:27 2019 - [info] Fri Jun 14 11:30:27 2019 - [info] -- Slave switch on host 10.0.20.203(10.0.20.203:3306) started, pid: 7081 Fri Jun 14 11:30:27 2019 - [info] Fri Jun 14 11:30:27 2019 - [info] -- Slave switch on host 10.0.20.204(10.0.20.204:3306) started, pid: 7082 Fri Jun 14 11:30:27 2019 - [info] Fri Jun 14 11:30:29 2019 - [info] Log messages from 10.0.20.203 ... Fri Jun 14 11:30:29 2019 - [info] Fri Jun 14 11:30:27 2019 - [info] Waiting to execute all relay logs on 10.0.20.203(10.0.20.203:3306).. Fri Jun 14 11:30:27 2019 - [info] master_pos_wait(mysql-bin.000002:194) completed on 10.0.20.203(10.0.20.203:3306). Executed 0 events. Fri Jun 14 11:30:27 2019 - [info] done. Fri Jun 14 11:30:27 2019 - [info] Resetting slave 10.0.20.203(10.0.20.203:3306) and starting replication from the new master 10.0.20.201(10.0.20.201:3306).. Fri Jun 14 11:30:27 2019 - [info] Executed CHANGE MASTER. Fri Jun 14 11:30:28 2019 - [info] Slave started. 
Fri Jun 14 11:30:29 2019 - [info] End of log messages from 10.0.20.203 ... Fri Jun 14 11:30:29 2019 - [info] Fri Jun 14 11:30:29 2019 - [info] -- Slave switch on host 10.0.20.203(10.0.20.203:3306) succeeded. Fri Jun 14 11:30:29 2019 - [info] Log messages from 10.0.20.204 ... Fri Jun 14 11:30:29 2019 - [info] Fri Jun 14 11:30:27 2019 - [info] Waiting to execute all relay logs on 10.0.20.204(10.0.20.204:3306).. Fri Jun 14 11:30:27 2019 - [info] master_pos_wait(mysql-bin.000002:194) completed on 10.0.20.204(10.0.20.204:3306). Executed 0 events. Fri Jun 14 11:30:27 2019 - [info] done. Fri Jun 14 11:30:27 2019 - [info] Resetting slave 10.0.20.204(10.0.20.204:3306) and starting replication from the new master 10.0.20.201(10.0.20.201:3306).. Fri Jun 14 11:30:27 2019 - [info] Executed CHANGE MASTER. Fri Jun 14 11:30:28 2019 - [info] Slave started. Fri Jun 14 11:30:29 2019 - [info] End of log messages from 10.0.20.204 ... Fri Jun 14 11:30:29 2019 - [info] Fri Jun 14 11:30:29 2019 - [info] -- Slave switch on host 10.0.20.204(10.0.20.204:3306) succeeded. Fri Jun 14 11:30:29 2019 - [info] Unlocking all tables on the orig master: Fri Jun 14 11:30:29 2019 - [info] Executing UNLOCK TABLES.. Fri Jun 14 11:30:29 2019 - [info] ok. Fri Jun 14 11:30:29 2019 - [info] Starting orig master as a new slave.. Fri Jun 14 11:30:29 2019 - [info] Resetting slave 10.0.20.202(10.0.20.202:3306) and starting replication from the new master 10.0.20.201(10.0.20.201:3306).. Fri Jun 14 11:30:29 2019 - [info] Executed CHANGE MASTER. Fri Jun 14 11:30:30 2019 - [info] Slave started. Fri Jun 14 11:30:30 2019 - [info] All new slave servers switched successfully. Fri Jun 14 11:30:30 2019 - [info] Fri Jun 14 11:30:30 2019 - [info] * Phase 5: New master cleanup phase.. Fri Jun 14 11:30:30 2019 - [info] Fri Jun 14 11:30:30 2019 - [info] 10.0.20.201: Resetting slave info succeeded. Fri Jun 14 11:30:30 2019 - [info] Switching master to 10.0.20.201(10.0.20.201:3306) completed successfully.
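Because the online switch depends on healthy replication, the command can be guarded with a replication check first. The following is a sketch, assuming `masterha_check_repl` (listed in the Manager toolkit) exits non-zero when replication is unhealthy; the wrapper name `safe_online_switch` is hypothetical, and the switch flags are copied from the command used above.

```shell
# Run an online master switch only if the replication check passes first.
# Arguments: MHA config file, host to promote.
safe_online_switch() {
    conf=$1 new_master=$2
    # Assumption: masterha_check_repl returns a non-zero status on failure.
    if ! masterha_check_repl --conf="$conf" >/dev/null 2>&1; then
        echo "replication check failed; not switching" >&2
        return 1
    fi
    masterha_master_switch --conf="$conf" --master_state=alive \
        --new_master_host="$new_master" --orig_master_is_new_slave \
        --running_updates_limit=10000 --interactive=0
}

# Usage (the manager must already be stopped with masterha_stop):
#   safe_online_switch /etc/mha/app1.cnf 10.0.20.201
```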
node01
[root@node01 ~]# mysql -uroot -p123456 -e 'show slave status\G'
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@node01 ~]# ip a | grep 20
inet 10.0.20.201/24 brd 10.0.20.255 scope global bond0
inet 10.0.20.199/24 brd 10.0.20.255 scope global secondary bond0:1
node02
[root@node02 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.201
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
[root@node02 ~]# ip a | grep 20
inet 10.0.20.202/24 brd 10.0.20.255 scope global bond0
node03
[root@node03 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.201
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
node04
[root@node04 ~]# mysql -uroot -p123456 -e "show slave status\G" | egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running'
mysql: [Warning] Using a password on the command line interface can be insecure.
Master_Host: 10.0.20.201
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
The state of each database above shows that node01 is now the master and that the VIP has floated to node01 as well.
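The per-node checks above can be reduced to a pass/fail test. This is a minimal sketch that reads `show slave status\G` output and succeeds only when both replication threads report Yes; the function name `replication_ok` is hypothetical. A node with an empty result, such as the new master, is reported as not replicating.

```shell
# Read `show slave status\G` output on stdin.
# Exit 0 only if Slave_IO_Running and Slave_SQL_Running are both "Yes".
replication_ok() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Usage on a slave (mysql prompts for the root password):
#   mysql -uroot -p -e 'show slave status\G' | replication_ok && echo "replication OK"
```

Matching on the exact field name keeps the Slave_SQL_Running_State line from being mistaken for the thread status.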