MySQL High Availability: MaxScale HA with Corosync + Pacemaker

The previous article covered MySQL master-slave replication and MHA high availability. In this one we set up MaxScale for read/write splitting and make MaxScale itself highly available. MaxScale HA can be built with keepalived or Heartbeat, but the official recommendation is corosync + pacemaker. Anyone familiar with HA stacks knows corosync + pacemaker is more powerful and more flexible to configure: corosync lets you assign different primary servers to different resource groups, synchronizes its configuration files between nodes by itself, supports clusters of more than two nodes, and supports grouping resources so they are managed, given a primary, and started and stopped as a group. Corosync does come with a certain amount of complexity, so the configuration takes some patience. In short, the usual choice is corosync for heartbeat detection, paired with pacemaker's resource management layer, to build a highly available system.

# Initial setup

ntpdate 120.25.108.11
/root/init_system_centos7.sh

# Configure the hosts file (maxscale61, maxscale62)

cat >> /etc/hosts << EOF
192.168.5.61 maxscale61.blufly.com
192.168.5.62 maxscale62.blufly.com
192.168.5.51 db51.blufly.com
192.168.5.52 db52.blufly.com
192.168.5.53 db53.blufly.com
EOF

# Set up SSH trust between the two nodes

[root@maxscale61 ~]# ssh-keygen -t rsa
[root@maxscale61 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub -p 65535 root@192.168.5.62
[root@maxscale62 ~]# ssh-keygen -t rsa
[root@maxscale62 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub -p 65535 root@192.168.5.61
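A quick way to confirm the trust works in both directions (sshd listens on port 65535 in this environment):

# Each command should print the peer's hostname without prompting for a password
ssh -p 65535 root@192.168.5.62 hostname   # run on maxscale61
ssh -p 65535 root@192.168.5.61 hostname   # run on maxscale62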

#####---------------- 1. Install MaxScale -----------------#####

# Create the monitoring and routing account on the MySQL master (db52: after the earlier failover, db52 is now the master)

CREATE USER maxscale@'%' IDENTIFIED BY "balala369";
GRANT replication slave, replication client ON *.* TO maxscale@'%';
GRANT SELECT ON mysql.* TO maxscale@'%';
GRANT ALL ON maxscale_schema.* TO maxscale@'%';
GRANT SHOW DATABASES ON *.* TO maxscale@'%';
flush privileges;
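A quick way to confirm the account and its grants before wiring them into MaxScale (plain SHOW GRANTS, run on the master):

mysql -uroot -p -e "SHOW GRANTS FOR maxscale@'%';"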

# Install MaxScale (maxscale61, maxscale62)

[root@maxscale61 opt]# yum -y install libcurl libaio openssl
[root@maxscale61 opt]# cd /opt
[root@maxscale61 opt]# wget downloads.mariadb.com/MaxScale/la…
[root@maxscale61 opt]# yum -y localinstall maxscale-2.2.13-1.centos.7.x86_64.rpm
[root@maxscale61 opt]# maxkeys
[root@maxscale61 opt]# maxpasswd balala369
47794130FFBA029760829CD50C10ABAC
chown -R maxscale:maxscale /var/lib/maxscale/
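maxkeys generates the encryption key file and maxpasswd uses it to encrypt the plaintext password; the hash printed above is what goes into the passwd= fields of maxscale.cnf. A quick sanity check, assuming the default key location:

# The key file created by maxkeys; it must be readable by the maxscale user
ls -l /var/lib/maxscale/.secrets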

# MaxScale configuration file (maxscale61, maxscale62)

cat /etc/maxscale.cnf

[maxscale]

# Number of worker threads; the default is 1, and "auto" matches the CPU core count
threads=auto

# Millisecond precision for log timestamps
ms_timestamp=1

# Also write logs to syslog
syslog=1

# Write logs to MaxScale's own log file
maxlog=1

# Do not write logs to shared memory (enabling it can speed things up in debug mode)
log_to_shm=0

# Log warnings
log_warning=1

# Log notice messages
log_notice=1

# Log info messages
log_info=1

# Leave debug logging off
log_debug=0

# Augment log entries with extra detail
log_augmentation=1

[server1]
type=server
address=192.168.5.51
port=9106
protocol=MariaDBBackend
serv_weight=3

[server2]
type=server
address=192.168.5.52
port=9106
protocol=MariaDBBackend
serv_weight=1

[server3]
type=server
address=192.168.5.53
port=9106
protocol=MariaDBBackend
serv_weight=3

[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxscale
passwd=47794130FFBA029760829CD50C10ABAC
monitor_interval=2000
detect_stale_master=true

[Read-Only-Service]
type=service
router=readconnroute
servers=server1,server2,server3
user=maxscale
passwd=47794130FFBA029760829CD50C10ABAC
router_options=slave
enable_root_user=1
weightby=serv_weight

[Read-Write-Service]
type=service
router=readwritesplit
servers=server1,server2,server3
user=maxscale
passwd=47794130FFBA029760829CD50C10ABAC
enable_root_user=1

[MaxAdmin-Service]
type=service
router=cli

[Read-Only-Listener]
type=listener
service=Read-Only-Service
protocol=MariaDBClient
port=4008

[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
port=4006

[MaxAdmin-Listener]
type=listener
service=MaxAdmin-Service
protocol=maxscaled
socket=default

# Set up starting MaxScale via systemctl

vi /usr/lib/systemd/system/maxscale.service

[Unit]
Description=MariaDB MaxScale Database Proxy
After=network.target

[Service]
Type=forking
Restart=on-abort
PIDFile=/var/run/maxscale/maxscale.pid
ExecStartPre=/usr/bin/install -d /var/run/maxscale -o maxscale -g maxscale
ExecStart=/usr/bin/maxscale --user=maxscale -f /etc/maxscale.cnf
TimeoutStartSec=120
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target

# Test starting and stopping MaxScale

systemctl start maxscale.service
systemctl status maxscale.service
systemctl stop maxscale.service
systemctl status maxscale.service

# Enable at boot

systemctl enable maxscale.service

# Start MaxScale

[root@maxscale61 opt]# maxscale --user=maxscale -f /etc/maxscale.cnf
[root@maxscale61 opt]# netstat -tnlup|grep maxscale
tcp    0    0 127.0.0.1:8989    0.0.0.0:*    LISTEN    31708/maxscale
tcp6   0    0 :::4008           :::*         LISTEN    31708/maxscale
tcp6   0    0 :::4006           :::*         LISTEN    31708/maxscale

# Log in to the MaxScale admin console and check the backend database status

[root@maxscale61 ~]# maxadmin -S /tmp/maxadmin.sock
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1            | 192.168.5.51    |  9106 |           0 | Slave, Running
server2            | 192.168.5.52    |  9106 |           0 | Master, Running
server3            | 192.168.5.53    |  9106 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------
MaxScale> list services
Services.
--------------------------+-------------------+--------+----------------+-------------------
Service Name              | Router Module     | #Users | Total Sessions | Backend databases
--------------------------+-------------------+--------+----------------+-------------------
Read-Only-Service         | readconnroute     |      1 |              1 | server1, server2, server3
Read-Write-Service        | readwritesplit    |      1 |              1 | server1, server2, server3
MaxAdmin-Service          | cli               |      2 |              2 |
--------------------------+-------------------+--------+----------------+-------------------

### Verify MaxScale's monitor plugin: stop the database service on db51

[root@db51 ~]# /etc/init.d/mysqld stop
Stopping mysqld (via systemctl):                           [  OK  ]
[root@maxscale61 opt]# maxadmin -S /tmp/maxadmin.sock
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1            | 192.168.5.51    |  9106 |           0 | Down
server2            | 192.168.5.52    |  9106 |           0 | Master, Running
server3            | 192.168.5.53    |  9106 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

# Start the database service on db51 again

[root@db51 ~]# /etc/init.d/mysqld start
Starting mysqld (via systemctl):                           [  OK  ]
MaxScale> list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
server1            | 192.168.5.51    |  9106 |           0 | Slave, Running
server2            | 192.168.5.52    |  9106 |           0 | Master, Running
server3            | 192.168.5.53    |  9106 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------

### Verify read/write splitting (run from db51; maxscale61 has no mysql client installed, so there is no mysql command there)

[root@db51 ~]# mysql -ublufly -p852741 -h192.168.5.61 -P4006
# Note: log in with an ordinary MySQL user here, not the maxscale user

MySQL [(none)]> select @@hostname;
+-----------------+
| @@hostname      |
+-----------------+
| db51.blufly.com |
+-----------------+
1 row in set (0.001 sec)
MySQL [mysql]> use test;
Database changed
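The Read-Only listener on port 4008 can be spot-checked too. A sketch (run from db51 with the same blufly user): readconnroute weights new connections by serv_weight and, with router_options=slave, only considers slaves, so db51 and db53 (weight 3 each) should each take roughly half of the connections.

# Open ten fresh connections to the read-only port and count the backends
for i in $(seq 1 10); do
  mysql -ublufly -p852741 -h192.168.5.61 -P4008 -N -e 'SELECT @@hostname;'
done | sort | uniq -c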

# Create a table

MySQL [test]> CREATE TABLE bf_staff(
    -> staff_id INT NOT NULL AUTO_INCREMENT,
    -> staff_name VARCHAR(40) NOT NULL,
    -> staff_title VARCHAR(100) NOT NULL,
    -> entry_date DATE,
    -> PRIMARY KEY ( staff_id )
    -> )ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.167 sec)

MySQL [test]> show tables;
+----------------+
| Tables_in_test |
+----------------+
| bf_staff       |
+----------------+
1 row in set (0.001 sec)

# Insert data

MySQL [test]> insert into bf_staff (staff_name,staff_title,entry_date) values('张森','软件工程师','1988-10-11'),('王梅','人事专员','1993-3-20');
Query OK, 2 rows affected (0.012 sec)
Records: 2  Duplicates: 0  Warnings: 0
MySQL [test]> select * from bf_staff;
+----------+------------+-----------------+------------+
| staff_id | staff_name | staff_title     | entry_date |
+----------+------------+-----------------+------------+
|        1 | 张森       | 软件工程师      | 1988-10-11 |
|        2 | 王梅       | 人事专员        | 1993-03-20 |
+----------+------------+-----------------+------------+
2 rows in set (0.001 sec)
MySQL [test]> insert into bf_staff (staff_name,staff_title,entry_date) values('李自在','产品经理','1979-11-19'),('王衡','测试工程师','1995-6-2');

# Watch the read/write split happen in the MaxScale log on maxscale61

[root@maxscale61 ~]# cat /var/log/maxscale/maxscale.log

# SELECT queries are routed to db51
2018-09-12 16:51:46.262   info : (5) [readwritesplit] (log_transaction_status): > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 16, type: QUERY_TYPE_SHOW_TABLES, stmt: show tables
2018-09-12 16:51:46.262   info : (5) [readwritesplit] (handle_got_target): Route query to slave [192.168.5.51]:9106 <
2018-09-12 16:51:46.262   info : (5) [readwritesplit] (clientReply): Reply complete, last reply from server1
2018-09-12 16:51:58.842   info : (5) [readwritesplit] (log_transaction_status): > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 27, type: QUERY_TYPE_READ, stmt: select * from bf_staff
2018-09-12 16:51:58.842   info : (5) [readwritesplit] (handle_got_target): Route query to slave [192.168.5.51]:9106 <
2018-09-12 16:51:58.843   info : (5) [readwritesplit] (clientReply): Reply complete, last reply from server1

# INSERT queries are routed to db52

2018-09-12 16:59:52.066   info : (5) [readwritesplit] (log_transaction_status): > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 149, type: QUERY_TYPE_WRITE, stmt: insert into bf_staff (staff_name,staff_title,entry_date) values('李自在','产品经理','1979-11-19'),('王衡','测试工程师','1995-6-2')
2018-09-12 16:59:52.066   info : (5) [readwritesplit] (handle_got_target): Route query to master [192.168.5.52]:9106 <
2018-09-12 16:59:52.071   info : (5) [readwritesplit] (clientReply): Reply complete, last reply from server2

##------- MaxScale caveats --------##

# Full details of the limitations: mariadb.com/kb/en/maria…

# Here are the main points that need attention:

1) The compressed protocol is not supported for client connections.

2) The router cannot dynamically follow a migration of the master role to another node.

3) LONGBLOB columns are not supported.

4) The following are routed to the master node to keep transactions consistent (see the sketch after this list):

statements inside an explicitly started transaction;

prepared statements;

statements that invoke a stored procedure or user-defined function;

multi-statement queries, e.g.: INSERT INTO ... ; SELECT LAST_INSERT_ID();
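As a small illustration of the first case, a sketch run from db51 with the blufly account from earlier: a SELECT issued inside an explicit transaction should be pinned to the master, so it returns db52's hostname rather than a slave's.

# The SELECT runs inside an open transaction, so readwritesplit sends it to the master
mysql -ublufly -p852741 -h192.168.5.61 -P4006 test \
  -e "START TRANSACTION; SELECT @@hostname; COMMIT;"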

5) Some statements are sent to all backend servers by default, but this can be limited with

use_sql_variables_in=[master|all] (default: all)

When set to master, these statements run only on the master. Autocommit changes and prepared statements are still sent to every backend server, however.
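For reference, a sketch of where this option would sit, using the service section from the config above (the use_sql_variables_in line itself was not part of the original config):

[Read-Write-Service]
type=service
router=readwritesplit
use_sql_variables_in=master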

The statements concerned are:

COM_INIT_DB (USE creates this)
COM_CHANGE_USER
COM_STMT_CLOSE
COM_STMT_SEND_LONG_DATA
COM_STMT_RESET
COM_STMT_PREPARE
COM_QUIT (no response, session is closed)
COM_REFRESH
COM_DEBUG
COM_PING
SQLCOM_CHANGE_DB (USE ... statements)
SQLCOM_DEALLOCATE_PREPARE
SQLCOM_PREPARE
SQLCOM_SET_OPTION
SELECT ..INTO variable|OUTFILE|DUMPFILE
SET autocommit=1|0

6) MaxScale does not support hostname-matched authentication; only IP-address-based host resolution works, so use an appropriate host pattern when adding users.
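For example, grant by IP pattern rather than by hostname (illustrative account name and password):

# On the master: an IP-based host pattern that MaxScale can authenticate against
mysql -uroot -p -e "CREATE USER 'app'@'192.168.5.%' IDENTIFIED BY 'secret';"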

7) Cross-database queries are not supported; they are resolved against the first database explicitly specified.

8) Changing session variables through a SELECT statement is not supported.

#####------------ 2. Install and configure pacemaker + corosync --------------#####

# The officially recommended way to make MaxScale highly available is pacemaker + corosync

yum install pcs pacemaker corosync fence-agents-all -y

# Start the pcsd service and enable it at boot (maxscale61, maxscale62)

systemctl start pcsd.service
systemctl enable pcsd.service

# Set a password for hacluster. The packages create the hacluster user, which is used locally to start pcs processes, so it needs a password, and it must be the same on every node (maxscale61, maxscale62)

passwd hacluster    # set the password to balala369 on both nodes
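If you prefer to script this instead of typing the password interactively, CentOS's passwd accepts the password on stdin; a sketch:

# Non-interactive alternative (CentOS/RHEL only)
echo balala369 | passwd --stdin hacluster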

# Authenticate the cluster nodes to each other

[root@maxscale61 ~]# pcs cluster auth 192.168.5.61 192.168.5.62
Username: hacluster
Password:
192.168.5.62: Authorized
192.168.5.61: Authorized

# Create the maxscalecluster cluster

[root@maxscale61 ~]# pcs cluster setup --name maxscalecluster 192.168.5.61 192.168.5.62
Destroying cluster on nodes: 192.168.5.61, 192.168.5.62...
192.168.5.62: Stopping Cluster (pacemaker)...
192.168.5.61: Stopping Cluster (pacemaker)...
192.168.5.62: Successfully destroyed cluster
192.168.5.61: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to '192.168.5.61', '192.168.5.62'
192.168.5.61: successful distribution of the file 'pacemaker_remote authkey'
192.168.5.62: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
192.168.5.61: Succeeded
192.168.5.62: Succeeded
Synchronizing pcsd certificates on nodes 192.168.5.61, 192.168.5.62...
192.168.5.62: Success
192.168.5.61: Success
Restarting pcsd on the nodes in order to reload the certificates...
192.168.5.62: Success
192.168.5.61: Success

# View the corosync configuration file

cat /etc/corosync/corosync.conf

# Enable the cluster at boot

[root@maxscale61 ~]# pcs cluster enable --all
192.168.5.61: Cluster Enabled
192.168.5.62: Cluster Enabled

# Check the cluster status

[root@maxscale61 ~]# pcs cluster status
Error: cluster is not currently running on this node

# In the background, "pcs cluster start" triggers the following commands on each cluster node
[root@maxscale61 ~]# systemctl start corosync.service
[root@maxscale61 ~]# systemctl start pacemaker.service
[root@maxscale61 ~]# systemctl enable corosync
[root@maxscale61 ~]# systemctl enable pacemaker
[root@maxscale62 ~]# systemctl start corosync.service
[root@maxscale62 ~]# systemctl start pacemaker.service
[root@maxscale62 ~]# systemctl enable corosync
[root@maxscale62 ~]# systemctl enable pacemaker
[root@maxscale61 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: maxscale61.blufly.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
 Last updated: Tue Sep 18 16:05:30 2018
 Last change: Tue Sep 18 15:47:57 2018 by hacluster via crmd on maxscale61.blufly.com
 2 nodes configured
 0 resources configured
PCSD Status:
 maxscale62.blufly.com (192.168.5.62): Online
 maxscale61.blufly.com (192.168.5.61): Online

# Check the ring status on each node

[root@maxscale61 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
    id      = 192.168.5.61
    status  = ring 0 active with no faults
[root@maxscale62 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
    id      = 192.168.5.62
    status  = ring 0 active with no faults

# Check the pacemaker processes

[root@maxscale61 ~]# ps axf | grep pacemaker
17859 pts/0    S+     0:00  |   \_ grep --color=auto pacemaker
17699 ?        Ss     0:00 /usr/sbin/pacemakerd -f
17700 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib
17701 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
17702 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
17703 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
17704 ?        Ss     0:02  \_ /usr/libexec/pacemaker/pengine
17705 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd

# View the cluster membership

[root@maxscale61 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.5.61)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.5.62)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined

# Disable STONITH (no fencing devices in this setup)

pcs property set stonith-enabled=false

# When quorum cannot be reached, ignore it (needed for a two-node cluster)

pcs property set no-quorum-policy=ignore

# Check that the configuration is valid

crm_verify -L -V

# Add the cluster resources with crm

[root@maxscale61 ~]# crm
-bash: crm: command not found
[root@maxscale61 ~]# rpm -qa pacemaker
pacemaker-1.1.18-11.el7_5.3.x86_64

# Starting with pacemaker 1.1.8, crm was split out into a separate project, crmsh. Installing pacemaker therefore no longer gives you the crm command; to manage cluster resources we have to install crmsh separately, and it depends on several packages such as pssh.

[root@maxscale61 ~]# wget -O /etc/yum.repos.d/network:ha-clustering:Stable.repo download.opensuse.org/repositorie…
[root@maxscale61 ~]# yum -y install crmsh
[root@maxscale62 ~]# wget -O /etc/yum.repos.d/network:ha-clustering:Stable.repo download.opensuse.org/repositorie…
[root@maxscale62 ~]# yum -y install crmsh

# If the yum install fails, download the rpm packages and install them directly (maxscale61, maxscale62)

cd /opt
wget download.opensuse.org/repositorie…
wget download.opensuse.org/repositorie…
wget download.opensuse.org/repositorie…
wget mirror.yandex.ru/opensuse/re…
wget download.opensuse.org/repositorie…
yum -y install crmsh-3.0.0-6.2.noarch.rpm crmsh-scripts-3.0.0-6.2.noarch.rpm pssh-2.3.1-7.3.noarch.rpm python-parallax-1.0.1-29.1.noarch.rpm python-pssh-2.3.1-7.3.noarch.rpm

# Configure the VIP and the monitored service (on maxscale61 only)

crm
crm(live)# status
# List the systemd services crm can manage; maxscale is among them
crm(live)ra# list systemd
crm(live)# configure
# Use 192.168.5.60 as the floating IP, name it maxscalevip, and have the cluster check it every 60 seconds with a 30-second timeout
crm(live)configure# primitive maxscalevip ocf:IPaddr params ip=192.168.5.60 op monitor timeout=30s interval=60s
# Add the monitored service (maxscale.service)
crm(live)configure# primitive maxscaleserver systemd:maxscale op monitor timeout=30s interval=60s
# Put the VIP (maxscalevip) and the monitored service (maxscaleserver) in the same group
crm(live)configure# group maxscalegroup maxscalevip maxscaleserver
# Verify and commit the changes
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
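For readers who would rather stay with pcs than install crmsh, roughly the same resources can be created like this; a sketch using the same names and addresses as above (note it uses the ocf:heartbeat:IPaddr2 agent):

# Create the floating IP and the systemd-managed maxscale service in one group
pcs resource create maxscalevip ocf:heartbeat:IPaddr2 ip=192.168.5.60 \
    op monitor interval=60s timeout=30s --group maxscalegroup
pcs resource create maxscaleserver systemd:maxscale \
    op monitor interval=60s timeout=30s --group maxscalegroup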

# Check the resulting status

crm(live)# status
Stack: corosync
Current DC: maxscale61.blufly.com (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Tue Sep 18 16:50:13 2018
Last change: Tue Sep 18 16:48:12 2018 by root via cibadmin on maxscale61.blufly.com
2 nodes configured
2 resources configured
Online: [ maxscale61.blufly.com maxscale62.blufly.com ]
Full list of resources:
 Resource Group: maxscalegroup
     maxscalevip    (ocf::heartbeat:IPaddr):    Started maxscale61.blufly.com
     maxscaleserver (systemd:maxscale):         Started maxscale61.blufly.com
crm(live)# quit

# Check the started resources

[root@maxscale61 opt]# ip addr | grep 192.168.5.60
    inet 192.168.5.60/24 brd 192.168.5.255 scope global secondary eno16777984
[root@maxscale61 opt]# ps -ef | grep maxscale
maxscale 22159     1  0 16:48 ?        00:00:01 /usr/bin/maxscale
root     22529 13940  0 16:51 pts/0    00:00:00 grep --color=auto maxscale

# Failover test

# Stop the maxscale service on maxscale61

[root@maxscale61 opt]# systemctl stop maxscale.service

# When failing over with systemctl stop maxscale.service, the VIP does not drift immediately. The cluster first tries to restart the maxscale service on the local node (maxscale61: 192.168.5.61), and only after several failed attempts does it move the VIP and the service.
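This retry-then-move behavior is governed by the resource fail count together with the migration-threshold property. A sketch of inspecting and tuning it (the threshold value here is illustrative, not from the original setup):

# Show accumulated failures for the maxscale resource
pcs resource failcount show maxscaleserver
# Move the resource to the other node after 3 local failures (illustrative value)
pcs resource defaults migration-threshold=3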

# I have to give this resource-management behavior credit: it is actually very sensible, much more so than MHA. On a database under heavy load you should not fail over the instant a crash happens; it is better to first try restarting the service on the original server and only fail over afterwards, because if the crash was caused by load, the restarted service first needs to warm the hot data back into memory.

# Now simulate a maxscale61 outage and check whether the maxscale service and the VIP fail over to maxscale62 (192.168.5.62).

[root@maxscale61 opt]# shutdown -h now

# As soon as maxscale61 is powered off, the VIP switches to maxscale62. Ping shows no packet loss: a seamless switchover.

# Bring maxscale61 back up, then shut down maxscale62, and watch the VIP and the maxscale service switch back

[root@maxscale62 opt]# shutdown -h now

# Check the status of all cluster components

pcs status

# Kernel parameter tuning

# Enable IP forwarding

net.ipv4.ip_forward = 1

# Allow binding to an IP address not assigned to this machine (needed so the service can bind the floating VIP)

net.ipv4.ip_nonlocal_bind = 1
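Both settings are ordinary sysctl keys; to apply them persistently on both nodes (standard sysctl usage):

cat >> /etc/sysctl.conf << EOF
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
EOF
sysctl -p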

And with that, a complete MySQL high-availability stack is up and running!
