During a MySQL failover, MHA can complete the switchover automatically within 0 to 30 seconds, and while doing so it preserves data consistency to the greatest extent possible, achieving high availability in the true sense.
Advantages | Disadvantages |
---|---|
Open-source tool written in Perl | Requires custom scripts or third-party tools to manage the VIP |
Supports GTID-based replication | Once started, MHA monitors only the master database |
One monitoring node can manage multiple clusters | Depends on passwordless SSH, which is a potential security risk |
Less likely to lose data during a failover | Provides no read load balancing for the slaves |
GTIDs greatly simplify replication: a GTID is fully transaction-based, so any transaction committed on the master is guaranteed to be executed on the slaves.
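The walkthrough below uses classic binlog-coordinate replication, but since GTID support is listed as an advantage, here is a minimal sketch of what GTID mode would look like instead (assuming MySQL 5.6+; the account and IPs match the environment set up later):

```
# my.cnf on every node (in addition to the binlog settings used below):
#   gtid_mode=ON
#   enforce_gtid_consistency=ON

# On a slave, GTID auto-positioning replaces the explicit file/position coordinates:
mysql -uroot -p123456 -e "
  CHANGE MASTER TO
    MASTER_HOST='192.168.111.3',
    MASTER_USER='myslave',
    MASTER_PASSWORD='123456',
    MASTER_AUTO_POSITION=1;
  START SLAVE;"
```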
MHA consists of two components: the MHA manager node and the MHA node. The manager can run on a dedicated machine and manage several clusters, or on one of the slave nodes and manage only the cluster it lives in. The MHA node component runs on every MySQL server as well as on the manager server. The manager probes the master's state at regular intervals; when the master fails, it automatically promotes the slave holding the most recent data to be the new master.
During a master switchover, MHA tries to save the binary logs from the crashed master to minimize data loss, but loss is still possible: if the master suffers a hardware failure or is unreachable over SSH, MHA cannot save the binlogs and performs the failover without the latest data. Semi-synchronous replication, available since MySQL 5.5, reduces this risk, and MHA can be combined with it: as long as at least one slave has received the latest binary log events, MHA can apply them to all the other slaves, keeping every node consistent.
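One caveat worth knowing: the master waits for a slave ACK only up to rpl_semi_sync_master_timeout (10000 ms by default) and then silently falls back to asynchronous replication. Once the plugins are installed (done later in this walkthrough), a quick check looks like this; the timeout value shown is just an example:

```
# ON = semi-sync is active; OFF = it has degraded to async
mysql -uroot -p123456 -e "SHOW STATUS LIKE 'Rpl_semi_sync_master_status';"
# Inspect and, if desired, raise the ACK timeout (milliseconds)
mysql -uroot -p123456 -e "SHOW VARIABLES LIKE 'rpl_semi_sync_master_timeout';"
mysql -uroot -p123456 -e "SET GLOBAL rpl_semi_sync_master_timeout = 10000;"
```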
Since this lab environment is a single cluster, the manager node is deployed on one of the slaves and manages only that cluster.
To save resources, the setup uses one master (which also serves reads when idle), one standby master, and one additional slave.
Because it is awkward to pin a specific version with yum, I install MySQL from the binary tarball; LVS schedules reads across the slaves, with keepalived providing high availability for LVS itself.
The firewall and SELinux are disabled for the duration of the test.
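For reference, the usual commands on CentOS 7 (assuming firewalld and the stock SELinux configuration):

```
systemctl stop firewalld && systemctl disable firewalld
setenforce 0                                                          # SELinux off for the current boot
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config   # and across reboots
```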
Hostname | IP address | MHA role | MySQL role |
---|---|---|---|
master | 192.168.111.3 | primary master | master |
node1 | 192.168.111.4 | MHA manager node, slave | slave |
node2 | 192.168.111.5 | standby master | slave |
lvs1 | 192.168.111.6 | lvs, keepalived | |
lvs2 | 192.168.111.7 | lvs, keepalived | |
MHA VIP | 192.168.111.100 | | |
keepalived VIP | 192.168.111.200 | | |
```
[root@localhost ~]# vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.111.3 master
192.168.111.4 node1
192.168.111.5 node2
192.168.111.6 lvs1
192.168.111.7 lvs2
[root@localhost ~]# scp /etc/hosts root@node1:/etc/
[root@localhost ~]# scp /etc/hosts root@node2:/etc/
[root@localhost ~]# scp /etc/hosts root@lvs1:/etc/
[root@localhost ~]# scp /etc/hosts root@lvs2:/etc/
[root@localhost ~]# hostname master
[root@localhost ~]# bash
[root@master ~]# uname -n
master
[root@localhost ~]# hostname node1
[root@localhost ~]# bash
[root@node1 ~]# uname -n
node1
[root@localhost ~]# hostname node2
[root@localhost ~]# bash
[root@node2 ~]# uname -n
node2
[root@localhost ~]# hostname lvs1
[root@localhost ~]# bash
[root@lvs1 ~]# uname -n
lvs1
[root@localhost ~]# hostname lvs2
[root@localhost ~]# bash
[root@lvs2 ~]# uname -n
lvs2
```
Download MHA-manager and MHA-node from http://downloads.mariadb.com/MHA/. My versions are mha4mysql-manager-0.56.tar.gz and mha4mysql-node-0.56.tar.gz. Configure the epel repo yourself.

```
[root@master ~]# yum install -y perl-DBD-MySQL.x86_64 perl-DBI.x86_64 perl-CPAN perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
# dependencies
[root@master ~]# rpm -q perl-DBD-MySQL.x86_64 perl-DBI.x86_64 perl-CPAN perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
# verify: every package must be installed

# Install the node package on every node
[root@master ~]# tar xf mha4mysql-node-0.56.tar.gz
[root@master ~]# cd mha4mysql-node-0.56/
[root@master mha4mysql-node-0.56]# perl Makefile.PL
[root@master mha4mysql-node-0.56]# make && make install
[root@master ~]# ls -l /usr/local/bin/
total 40
-r-xr-xr-x 1 root root 16346 May 19 16:21 apply_diff_relay_logs
-r-xr-xr-x 1 root root  4807 May 19 16:21 filter_mysqlbinlog
-r-xr-xr-x 1 root root  7401 May 19 16:21 purge_relay_logs
-r-xr-xr-x 1 root root  7395 May 19 16:21 save_binary_logs
# the generated binaries
```

Installing the manager. Some dependencies were installed above; add the missing ones:

```
[root@node1 mha4mysql-node-0.56]# yum -y install perl perl-Log-Dispatch perl-Parallel-ForkManager perl-DBD-MySQL perl-DBI perl-Time-HiRes perl-Config-Tiny
[root@node1 mha4mysql-node-0.56]# rpm -q perl perl-Log-Dispatch perl-Parallel-ForkManager perl-DBD-MySQL perl-DBI perl-Time-HiRes perl-Config-Tiny
perl-5.16.3-294.el7_6.x86_64
perl-Log-Dispatch-2.41-1.el7.1.noarch
perl-Parallel-ForkManager-1.18-2.el7.noarch
perl-DBD-MySQL-4.023-6.el7.x86_64
perl-DBI-1.627-4.el7.x86_64
perl-Time-HiRes-1.9725-3.el7.x86_64
perl-Config-Tiny-2.14-7.el7.noarch
[root@node1 ~]# tar xf mha4mysql-manager-0.56.tar.gz
[root@node1 ~]# cd mha4mysql-manager-0.56/
[root@node1 mha4mysql-manager-0.56]# perl Makefile.PL && make && make install
[root@node1 mha4mysql-manager-0.56]# ls /usr/local/bin/
apply_diff_relay_logs  masterha_check_ssh     masterha_manager         masterha_secondary_check  save_binary_logs
filter_mysqlbinlog     masterha_check_status  masterha_master_monitor  masterha_stop
masterha_check_repl    masterha_conf_host     masterha_master_switch   purge_relay_logs
```
On the manager:

```
[root@node1 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:qvOacPMWP+iO4BcxPtHJkVDdJXE4HNklCxlPkWShdMM root@node1
The key's randomart image is:
+---[RSA 2048]----+
|  .o.o oB%%=.    |
|   o ..BXE+      |
|    o o ..o      |
|     + +         |
|    . + S        |
|     +  ..       |
|    o oo.+       |
|   . +o*o o      |
|    ..=B= .      |
+----[SHA256]-----+
# I pressed Enter for everything and took the defaults; you can set the options yourself
[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@master
[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
# node1 is this very machine, so logically this changes nothing, but for consistency
# node1 should receive the public key too; if you are curious, dig into how ssh
# authentication actually works
[root@node1 ~]# ssh node1
Last login: Sat Apr 27 13:33:49 2019 from 192.168.111.1
[root@node1 ~]# exit
logout
Connection to node1 closed.
[root@node1 ~]# ssh node2
Last login: Thu Apr 18 22:55:10 2019 from 192.168.111.1
[root@node2 ~]# exit
logout
Connection to node2 closed.
[root@node1 ~]# ssh master
Last login: Sun May 19 16:00:20 2019 from 192.168.111.1
# to be safe, I tried each host in turn
```

Every node also needs to distribute its public key. Normally only the node servers would need this, but in this setup the manager is not on a dedicated server, it sits on one of the nodes, so it gets a copy as well:

```
[root@master ~]# ssh-keygen -t rsa
[root@master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
[root@master ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
[root@node2 ~]# ssh-keygen -t rsa
[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@master
[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
```
Install MySQL from the binary tarball on all three nodes:

```
yum -y install libaio
wget http://mirrors.sohu.com/mysql/MySQL-5.7/mysql-5.7.24-linux-glibc2.12-x86_64.tar.gz
useradd -M -s /sbin/nologin mysql
tar zxf mysql-5.7.24-linux-glibc2.12-x86_64.tar.gz
mv mysql-5.7.24-linux-glibc2.12-x86_64 /usr/local/mysql
chown -R mysql:mysql /usr/local/mysql
ln -s /usr/local/mysql/bin/* /usr/local/bin/
/usr/local/mysql/bin/mysqld --user=mysql --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --initialize
cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysqld
```

Record the randomly generated initial password:

```
2019-05-18T08:43:11.094845Z 1 [Note] A temporary password is generated for root@localhost: 2Gk75Zvp&!-y
# after starting the service, change it with: mysqladmin -u root -p'old password' password 'new password'
```

master:

```
vim /etc/my.cnf
[mysqld]
datadir=/usr/local/mysql/data
socket=/usr/local/mysql/data/mysql.sock
server-id=1
log-bin=mysql-binlog
log-slave-updates=true
symbolic-links=0
[mysqld_safe]
log-error=/usr/local/mysql/data/mysql.log
pid-file=/usr/local/mysql/data/mysql.pid
```

node1:

```
[root@node1 ~]# vim /etc/my.cnf
[mysqld]
datadir=/usr/local/mysql/data
socket=/tmp/mysql.sock
server-id=2
relay-log=relay-log-bin
relay-log-index=slave-relay-bin.index
symbolic-links=0
[mysqld_safe]
log-error=/usr/local/mysql/data/mysql.log
pid-file=/usr/local/mysql/data/mysql.pid
```

node2:

```
[root@node2 mysql]# vim /etc/my.cnf
[mysqld]
datadir=/usr/local/mysql/data
socket=/tmp/mysql.sock
server-id=3
relay-log=relay-log-bin
relay-log-index=slave-relay-bin.index
symbolic-links=0
[mysqld_safe]
log-error=/usr/local/mysql/data/mysql.log
pid-file=/usr/local/mysql/data/mysql.pid
```

On the master:

```
mysql> grant replication slave on *.* to 'myslave'@'192.168.111.%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.10 sec)
mysql> show master status;
+---------------------+----------+--------------+------------------+-------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------------+----------+--------------+------------------+-------------------+
| mysql-binlog.000002 |      864 |              |                  |                   |
+---------------------+----------+--------------+------------------+-------------------+
```

Since this is a test environment with no pre-existing data, I skip backing up the existing data.

node1, node2:

```
mysql> change master to
    -> master_host='192.168.111.3',
    -> master_user='myslave',
    -> master_password='123456',
    -> master_log_file='mysql-binlog.000002',
    -> master_log_pos=864;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
```

Then create a database and some tables on the master and verify on the slaves that they replicated; not shown here.
```
[root@master plugin]# ll -lh semisync_*
-rwxr-xr-x 1 mysql mysql 692K Oct  4 2018 semisync_master.so
-rwxr-xr-x 1 mysql mysql 149K Oct  4 2018 semisync_slave.so
# these are the semi-synchronous replication plugins
[root@master plugin]# mysql -u root -p123456
# install on all nodes as follows
mysql> install plugin rpl_semi_sync_master soname 'semisync_master.so';
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
# install the plugins
mysql> SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME LIKE '%semi%';
+----------------------+---------------+
| PLUGIN_NAME          | PLUGIN_STATUS |
+----------------------+---------------+
| rpl_semi_sync_slave  | ACTIVE        |
| rpl_semi_sync_master | ACTIVE        |
+----------------------+---------------+
set global rpl_semi_sync_master_enabled=on;    # enable the master side on the master
set global rpl_semi_sync_slave_enabled=on;     # enable the slave side on node1
mysql> set global rpl_semi_sync_slave_enabled=on;
mysql> set global rpl_semi_sync_master_enabled=on;
# on node2 enable both, since it is the standby master
```
The settings above are lost when MySQL restarts; adding them to the config file makes them persistent. (Note: repeated plugin-load lines override one another, so both plugins are loaded with a single semicolon-separated plugin-load directive.)
Master:

```
vim /etc/my.cnf
plugin-load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
rpl_semi_sync_master_enabled=on
```

node1:

```
vim /etc/my.cnf
plugin-load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
rpl_semi_sync_slave_enabled=on
```

node2:

```
vim /etc/my.cnf
plugin-load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
rpl_semi_sync_slave_enabled=on
rpl_semi_sync_master_enabled=on
```
```
mysql> create database qiao;
Query OK, 1 row affected (0.50 sec)
mysql> SHOW GLOBAL STATUS LIKE '%semi%';
+--------------------------------------------+--------+
| Variable_name                              | Value  |
+--------------------------------------------+--------+
| Rpl_semi_sync_master_yes_tx                | 1      |
# after test writes like the one above, this counter grows; then check on the slaves that the data replicated
```
node2 is configured the same as node1:

```
mysql> set global read_only=1;
# this only restricts ordinary users, not root or other accounts with the SUPER privilege;
# to block every user you would need "flush tables with read lock", after which nobody can write at all
```
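A quick sketch of that distinction; appuser is a hypothetical account with write privileges (the myslave account created earlier has only REPLICATION SLAVE, so it cannot demonstrate this):

```
# A SUPER account still writes successfully despite read_only=1
mysql -uroot -p123456 -e "SET GLOBAL read_only=1; CREATE DATABASE IF NOT EXISTS t_super_ok;"

# The same write from an ordinary account is rejected with:
#   ERROR 1290 (HY000): The MySQL server is running with the --read-only option ...
mysql -uappuser -papppass -e "CREATE DATABASE t_should_fail;"
```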
On the master:

```
mysql> grant all privileges on *.* to 'root'@'192.168.111.%' identified by '123456';
mysql> flush privileges;
```

On the other nodes, verify:

```
mysql> show grants for root@'192.168.111.%';
+-------------------------------------------------------+
| Grants for root@192.168.111.%                         |
+-------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'root'@'192.168.111.%' |
+-------------------------------------------------------+
# the grant itself changed data in the mysql system database, so seeing it here confirms replication works
```
```
[root@node1 ~]# vim /etc/masterha/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1
# absolute path for the manager's state files; defaults to /var/tmp if unset
manager_log=/var/log/masterha/app1/manager.log
# absolute path of the manager log; if unset, the manager prints to stdout/stderr,
# e.g. the messages emitted while it performs a failover
master_binlog_dir=/usr/local/mysql/data
# absolute path where the master writes its binlogs. Used when the master is dead but still
# reachable over SSH, to read and copy the needed binlog events from this path. Required,
# because once mysqld on the master is gone, MHA cannot discover the binlog directory by
# itself. The default is /var/lib/mysql,/var/log/mysql (/var/lib/mysql is the default binlog
# directory for most MySQL distributions, /var/log/mysql for the Ubuntu packages). Multiple
# values may be given, separated by commas.
master_ip_failover_script=/usr/local/bin/master_ip_failover
# a script you write yourself to transparently repoint applications at the new master;
# used here to move the VIP
password=123456
user=root
# management account (and password) on the target MySQL instances; preferably root, since it
# runs all the admin statements (stop slave, change master, reset slave). Default is root.
ping_interval=1
# how often, in seconds, the manager pings the master (by executing a "select ping" SQL
# statement). After three consecutive lost pings the master is declared dead, so a failure
# is detected within at most four ping intervals. Default is 3 seconds.
remote_workdir=/usr/local/mysql/data
# absolute working path where each MHA node (i.e. each MySQL server) generates its log
# files. Created automatically if missing; if it cannot be accessed, MHA aborts. Make sure
# the filesystem underneath has enough space. Default is /var/tmp.
repl_password=123456
repl_user=myslave
# replication user and password used when running "change master" on all slaves; this user
# should hold the REPLICATION SLAVE privilege on the master
[server1]
hostname=master    # hostname or IP address
port=3306          # mysql port
[server2]
hostname=node2
candidate_master=1
# promotes a more reliable machine among the slaves to be the new master (for example, a
# slave on RAID 10 is more reliable than one on RAID 0). The slave must have binlog enabled
# and no significant replication lag; otherwise it will not become the new master even with
# this flag. The flag only raises the priority, it does not guarantee promotion.
port=3306
check_repl_delay=0
# by default MHA will not pick a slave that is more than 100MB of relay logs behind the
# master, since that lengthens recovery. Setting this to 0 makes MHA ignore replication
# delay when choosing the new master; use it together with candidate_master=1 when you
# explicitly want that slave promoted.
[server3]
hostname=node1
port=3306
```

On node2:

```
[root@node2 ~]# vim /etc/my.cnf
log-bin=mysql-binlog    # enable binary logging, then restart the service
```
```
[root@node1 ~]# vim /usr/local/bin/masterha_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port,
);

my $vip = '192.168.111.100';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig ens32:$key down";
$ssh_user = "root";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        #eval {
        #    print "Disabling the VIP on old master: $orig_master_host \n";
        #    &stop_vip();
        #    $exit_code = 0;
        #};
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            #my $ping=`ping -c 1 10.0.0.13 | grep "packet loss" | awk -F',' '{print $3}' | awk '{print $1}'`;
            #if ( $ping le "90.0%" && $ping gt "0.0%" ) {
            #    $exit_code = 0;
            #}
            #else {
            &stop_vip();
            # updating global catalog, etc
            $exit_code = 0;
            #}
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_ip \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
```

In the script above, only the VIP and the network interface need to be adjusted to match your environment.

```
[root@node1 ~]# chmod +x /usr/local/bin/masterha_ip_failover
```
In a MySQL replication setup, a slave's relay logs are deleted automatically by default once the SQL thread has executed them. In this MHA scenario, however, recovering a lagging slave may depend on another slave's relay logs, so automatic purging is disabled and a scheduled job cleans the logs up periodically instead.
The scheduled job below uses hard links. In the filesystem, a file without extra hard links maps to a single inode, and deleting it frees its data blocks; a hard-linked file simply has several directory entries pointing at the same inode, so a delete only removes one of those pointers. The same trick is often used when removing very large files, such as big database tables.
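A minimal shell demonstration of why the hard-link trick works (file names here are arbitrary):

```
dd if=/dev/zero of=/tmp/bigfile bs=1M count=100                  # create a 100 MB file
ln /tmp/bigfile /tmp/bigfile.link                                # second directory entry, same inode
stat -c '%n inode=%i links=%h' /tmp/bigfile /tmp/bigfile.link    # identical inode number, link count 2
rm /tmp/bigfile        # removes one name; the data blocks survive (link count drops to 1)
rm /tmp/bigfile.link   # last link removed; only now are the blocks actually freed
```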
==But disabling relay log purging has its downsides too==; see this article.
For details on the purge_relay_logs tool used in the script below, see this document.
```
mysql> set global relay_log_purge=0;    # same on both slave nodes
```

Configure the cleanup script and add a cron job (same on both slaves):

```
[root@node1 ~]# vim /opt/purge_relay_log.sh
#!/bin/bash
user=root
passwd=123456                      # monitoring account and password
port=3306                          # port
log_dir='/usr/local/mysql/data'    # relay log directory
work_dir='/tmp'                    # where the relay log hard links are created; the default is
                                   # /var/tmp. Hard links cannot span partitions, so set this
                                   # explicitly. After the script succeeds, the hard-linked
                                   # relay log files are removed.
purge='/usr/local/bin/purge_relay_logs'
if [ ! -d $log_dir ]
then
    mkdir $log_dir -p
fi
$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
[root@node1 ~]# chmod +x /opt/purge_relay_log.sh
[root@node1 ~]# crontab -e
0 4 * * * /bin/bash /opt/purge_relay_log.sh
```

Run it once manually:

```
[root@node1 data]# purge_relay_logs --user=root --password=123456 --disable_relay_log_purge --port=3306 --workdir=/tmp
2019-04-27 23:50:55: purge_relay_logs script started.
 Found relay_log.info: /usr/local/mysql/data/relay-log.info
 Removing hard linked relay log files relay-log-bin* under /tmp.. done.
 Current relay log file: /usr/local/mysql/data/relay-log-bin.000007
 Archiving unused relay log files (up to /usr/local/mysql/data/relay-log-bin.000006) ...
 Creating hard link for /usr/local/mysql/data/relay-log-bin.000006 under /tmp/relay-log-bin.000006 .. ok.
 Creating hard links for unused relay log files completed.
 Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
 Removing hard linked relay log files relay-log-bin* under /tmp.. done.
2019-04-27 23:50:58: All relay log purging operations succeeded.
```
```
[root@node1 data]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sat Apr 27 23:55:13 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Apr 27 23:55:13 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sat Apr 27 23:55:13 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sat Apr 27 23:55:13 2019 - [info] Starting SSH connection tests..
Sat Apr 27 23:55:15 2019 - [debug]
Sat Apr 27 23:55:14 2019 - [debug]  Connecting via SSH from root@node1(192.168.111.4:22) to root@master(192.168.111.3:22)..
Sat Apr 27 23:55:15 2019 - [debug]   ok.
Sat Apr 27 23:55:15 2019 - [debug]  Connecting via SSH from root@node1(192.168.111.4:22) to root@node2(192.168.111.5:22)..
Sat Apr 27 23:55:15 2019 - [debug]   ok.
Sat Apr 27 23:55:15 2019 - [debug]
Sat Apr 27 23:55:13 2019 - [debug]  Connecting via SSH from root@node2(192.168.111.5:22) to root@master(192.168.111.3:22)..
Sat Apr 27 23:55:14 2019 - [debug]   ok.
Sat Apr 27 23:55:14 2019 - [debug]  Connecting via SSH from root@node2(192.168.111.5:22) to root@node1(192.168.111.4:22)..
Sat Apr 27 23:55:14 2019 - [debug]   ok.
Sat Apr 27 23:55:15 2019 - [debug]
Sat Apr 27 23:55:13 2019 - [debug]  Connecting via SSH from root@master(192.168.111.3:22) to root@node2(192.168.111.5:22)..
Sat Apr 27 23:55:13 2019 - [debug]   ok.
Sat Apr 27 23:55:13 2019 - [debug]  Connecting via SSH from root@master(192.168.111.3:22) to root@node1(192.168.111.4:22)..
Sat Apr 27 23:55:14 2019 - [debug]   ok.
Sat Apr 27 23:55:15 2019 - [info] All SSH connection tests passed successfully.
```
```
[root@node1 data]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sat Apr 27 23:57:21 2019 - [error][/usr/local/share/perl5/MHA/Server.pm, ln383] node2(192.168.111.5:3306): User myslave does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.
Sat Apr 27 23:57:21 2019 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln401] Error happend on checking configurations. at /usr/local/share/perl5/MHA/ServerManager.pm line 1354.
Sat Apr 27 23:57:21 2019 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln500] Error happened on monitoring servers.
# skipping the healthy output; look at the errors
# the first log line points to a problem with the replication user, so check on 111.5
```

```
mysql> show grants for myslave@'192.168.111.%';
+-------------------------------------------------------------+
| Grants for myslave@192.168.111.%                            |
+-------------------------------------------------------------+
| GRANT REPLICATION SLAVE ON *.* TO 'myslave'@'192.168.111.%' |
+-------------------------------------------------------------+
# the master has the grant but the slaves do not, because I created it on the master
# before enabling binary logging; replication only started from the position shown
# by "show master status" taken after that
mysql> grant replication slave on *.* to myslave@'192.168.111.%' identified by '123456';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
# grant again on the master; now that replication is running, the two slaves see it as well
mysql> show grants for myslave@'192.168.111.%';
+-------------------------------------------------------------+
| Grants for myslave@192.168.111.%                            |
+-------------------------------------------------------------+
| GRANT REPLICATION SLAVE ON *.* TO 'myslave'@'192.168.111.%' |
+-------------------------------------------------------------+
```

Run the check again:

```
[root@node1 data]# masterha_check_repl --conf=/etc/masterha/app1.cnf
```

Still an error; keep fixing:

```
[error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln401] Error happend on checking configurations.  Can't exec "/usr/local/bin/master_ip_failover": No such file or directory at /usr/local/share/perl5/MHA/ManagerUtil.pm line 68.
# the failover script is not where the config points
[root@node1 data]# ll /usr/local/bin/masterha_ip_failover
# its filename differs from the one defined in the config, so rename it
[root@node1 data]# mv /usr/local/bin/masterha_ip_failover /usr/local/bin/master_ip_failover
```

Retry:

```
[root@node1 data]# masterha_check_repl --conf=/etc/masterha/app1.cnf
MySQL Replication Health is OK
```

That final line only means there are no problems for now; let's keep going.
There are two ways to manage VIP failover: through keepalived or heartbeat, or via a script-driven (command) approach.
本案例采起命令方式
```
[root@node1 data]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
# a healthy manager reports "PING_OK"; "NOT_RUNNING" means MHA monitoring is not started.
# Everything so far was manual pre-flight testing; monitoring had not been switched on yet.
[root@node1 data]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover > /var/log/masterha/app1/manager.log 2>&1 &
[1] 67827
# --remove_dead_master_conf: after a successful failover, the manager automatically removes
#   the dead master's section from the config file.
# --ignore_last_failover: by default MHA refuses to fail over again if the previous failover
#   happened less than 8 hours earlier, to avoid ping-pong flapping. After a switchover the
#   manager writes app1.failover.complete into the log directory (/var/log/masterha/app1
#   here), and the next failover is blocked while that file exists unless it is deleted
#   first; this flag makes MHA ignore that file, which is convenient here.
[root@node1 data]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:67827) is running(0:PING_OK), master:master
# check the status again
```
The VIP is now visible on master. To test VIP failover, stop MySQL on master with /etc/init.d/mysqld stop.
node2 is the standby master; check whether the VIP has moved:
```
[root@node2 ~]# ip a | grep ens32
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.111.5/24 brd 192.168.111.255 scope global noprefixroute ens32
    inet 192.168.111.100/24 brd 192.168.111.255 scope global secondary ens32:1
# failover succeeded
```
node1 is the slave; check its replication status:
```
[root@node1 ~]# mysql -uroot -p123456
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.111.5
                  Master_User: myslave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-binlog.000004
          Read_Master_Log_Pos: 154
               Relay_Log_File: relay-log-bin.000002
                Relay_Log_Pos: 323
        Relay_Master_Log_File: mysql-binlog.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
# the configured master has changed to 192.168.111.5
[root@node1 ~]# jobs -l
[2]+ 73464 Stopped     vim /usr/local/bin/master_ip_failover
# having completed the switchover, the manager has done its job and exited; it can be started again
[root@node1 ~]# vim /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/usr/local/mysql/data
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=123456
ping_interval=1
remote_workdir=/usr/local/mysql/data
repl_password=123456
repl_user=myslave
user=root

[server2]
candidate_master=1
check_repl_delay=0
hostname=node2
port=3306

[server3]
hostname=node1
port=3306
# server1 has been removed from the config file because of the failure
```
```
[root@master ~]# /etc/init.d/mysqld start    # master repaired; start MySQL
[root@master ~]# mysql -uroot -p123456
mysql> stop slave;
mysql> change master to
    -> master_host='192.168.111.5',
    -> master_user='myslave',
    -> master_password='123456';
# point at the new master's IP; the binlog file and position parameters are omitted here
mysql> start slave;
mysql> show slave status\G
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
```

On the manager:

```
[root@node1 ~]# vim /etc/masterha/app1.cnf    # add the following back
[server1]
hostname=master
port=3306
[root@node1 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf    # re-check the cluster
MySQL Replication Health is OK.
[root@node1 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover > /var/log/masterha/app1/manager.log 2>&1 &
[3] 75013
[root@node1 ~]# jobs -l
[3]- 75013 Running     nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover > /var/log/masterha/app1/manager.log 2>&1 &
```

node2: stop the service to test that the VIP switches back automatically:

```
[root@node2 ~]# /etc/init.d/mysqld stop
Shutting down MySQL............ SUCCESS!
[root@node2 ~]# ip a | grep ens32
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.111.5/24 brd 192.168.111.255 scope global noprefixroute ens32
```

On master:

```
[root@master ~]# ip a | grep ens32
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.111.3/24 brd 192.168.111.255 scope global noprefixroute ens32
    inet 192.168.111.100/24 brd 192.168.111.255 scope global secondary ens32:1
```

node1: the slave's replication status shows the master IP has switched back automatically:

```
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.111.3
                  Master_User: myslave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-binlog.000004
          Read_Master_Log_Pos: 398
               Relay_Log_File: relay-log-bin.000002
                Relay_Log_Pos: 323
        Relay_Master_Log_File: mysql-binlog.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
```
Then, following the same steps and configuration as above, rejoin node2 to the MHA cluster; a condensed sketch follows.
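Concretely, the rejoin is the same three moves as before, run on node2 and on the manager respectively (IPs and accounts as used throughout):

```
# On node2: re-point it at the current master as a slave
mysql -uroot -p123456 -e "
  STOP SLAVE;
  CHANGE MASTER TO MASTER_HOST='192.168.111.3',
                   MASTER_USER='myslave',
                   MASTER_PASSWORD='123456';
  START SLAVE;"

# On the manager: restore the [server2] section (hostname=node2, candidate_master=1,
# check_repl_delay=0, port=3306) in /etc/masterha/app1.cnf, then re-check and restart monitoring
masterha_check_repl --conf=/etc/masterha/app1.cnf
nohup masterha_manager --conf=/etc/masterha/app1.cnf \
      --remove_dead_master_conf --ignore_last_failover \
      > /var/log/masterha/app1/manager.log 2>&1 &
```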
If you hit an error like this:

```
mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)
mysql> change master to master_host='192.168.111.3', master_user='myslave', master_password='123456';
Query OK, 0 rows affected, 2 warnings (0.11 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.111.3
                  Master_User: myslave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-binlog.000004
          Read_Master_Log_Pos: 842
               Relay_Log_File: relay-log-bin.000003
                Relay_Log_Pos: 1392
        Relay_Master_Log_File: mysql-binlog.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1007
                   Last_Error: Error 'Can't create database 'qiao'; database exists' on query. Default database: 'qiao'. Query: 'create database qiao'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1173
              Relay_Log_Space: 4470
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1007
               Last_SQL_Error: Error 'Can't create database 'qiao'; database exists' on query. Default database: 'qiao'. Query: 'create database qiao'
```

Fix:

```
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> set global sql_slave_skip_counter=1;
Query OK, 0 rows affected (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.111.3
                  Master_User: myslave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-binlog.000004
          Read_Master_Log_Pos: 842
               Relay_Log_File: relay-log-bin.000006
                Relay_Log_Pos: 323
        Relay_Master_Log_File: mysql-binlog.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 842
              Relay_Log_Space: 1191
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 9da60612-7a17-11e9-b288-000c2935c4a6
             Master_Info_File: /usr/local/mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)
```
Install the packages on both machines first:

```
[root@lvs2 ~]# yum -y install ipvsadm kernel-devel openssl-devel keepalived
[root@lvs2 ~]# vim /etc/keepalived/keepalived.conf
# the keepalived/LVS config file is not explained here; plenty of documentation is available elsewhere
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

vrrp_instance VI_1 {
    state MASTER
    interface ens32
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.111.200
    }
}

virtual_server 192.168.111.200 3306 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    protocol TCP

    real_server 192.168.111.4 3306 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 3306
        }
    }
    real_server 192.168.111.5 3306 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 3306
        }
    }
}
[root@lvs2 ~]# scp /etc/keepalived/keepalived.conf root@lvs1:/etc/keepalived/    # copy to the other machine
```

Modify on lvs1 as follows (the leading numbers are the line positions in the file):

```
[root@lvs1 ~]# vim /etc/keepalived/keepalived.conf
 12     router_id LVS_DEVEL1
 20     state BACKUP
        priority 90
```

```
[root@lvs2 ~]# systemctl start keepalived.service    # start the service on each machine
[root@lvs2 ~]# ip a | grep ens32
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.111.7/24 brd 192.168.111.255 scope global noprefixroute ens32
    inet 192.168.111.200/32 scope global ens32
[root@lvs2 ~]# ipvsadm -ln    # check status
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.111.200:3306 rr
  -> 192.168.111.4:3306           Route   1      0          0
  -> 192.168.111.5:3306           Route   1      0          0
```
node1:

```
[root@node1 ~]# vim /opt/realserver.sh
#!/bin/bash
# Bind the VIP to the loopback interface and suppress ARP for it
# (the standard LVS-DR real-server setup)
SNS_VIP=192.168.111.200
ifconfig lo:0 $SNS_VIP netmask 255.255.255.255 broadcast $SNS_VIP
/sbin/route add -host $SNS_VIP dev lo:0
echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
sysctl -p >/dev/null 2>&1
```

node2 (same script):

```
[root@node2 ~]# vim /opt/realserver.sh
#!/bin/bash
SNS_VIP=192.168.111.200
ifconfig lo:0 $SNS_VIP netmask 255.255.255.255 broadcast $SNS_VIP
/sbin/route add -host $SNS_VIP dev lo:0
echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
sysctl -p >/dev/null 2>&1
```

Run the script on both nodes, then connect to the VIP from the manager (node1) to test:

```
[root@node1 ~]# sh /opt/realserver.sh
[root@node1 ~]# mysql -uroot -p123456 -h192.168.111.200
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 749
Server version: 5.7.24 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
```
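To confirm that LVS really alternates reads between the two real servers, one simple check is to query @@server_id through the VIP several times; with the rr scheduler and no persistence, the answers should alternate between 2 (node1) and 3 (node2), the server-ids set in my.cnf earlier:

```
for i in 1 2 3 4; do
  mysql -uroot -p123456 -h192.168.111.200 -N -e "SELECT @@server_id;"
done
```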