MHA Official Documentation (Translation)

Date: 2019-11-30

Tags: mha, official documentation translation


English official documentation:

http://code.google.com/p/mysql-master-ha/wiki/TableOfContents?tm=6

Please credit the source when reposting.

Overview

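MHA can detect failures and fail over automatically in a short time, typically within 10-30 seconds. In a replication setup, MHA handles the data-consistency problems that arise during failover well. It needs no extra servers added to the existing replication, only a manager node, and one manager can manage multiple replication setups, so it greatly reduces the number of servers required. It is also easy to install, has no performance penalty, and does not require changes to the existing replication deployment.

MHA also provides online master switching. It can safely switch the currently running master to a new master (by promoting a slave to master), typically within 0.5-2 seconds.

These features make MHA suitable for environments that demand high availability and data integrity, and for master maintenance that must be almost non-stop.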

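◎ Automated master failure detection and failover

MHA can monitor MySQL in an existing replication environment. When it detects a master failure, it fails over automatically. It identifies the slave with the most recent relay log and applies the differences to all the other slaves. This keeps all slaves consistent even if some of them had not received the latest relay log events when the master crashed.

Typically MHA detects a master failure in 9-12 seconds. It can optionally spend 7-10 seconds shutting down the failed master host to keep the failure from spreading. It then applies the relay logs to the new master within a few seconds. Total downtime is usually 10-30 seconds.

Whether a slave may become the candidate master is controlled by its priority in the configuration file. Because MHA keeps the slaves consistent, any slave can be promoted to the new master. The consistency problems that easily arise in ordinary replication when replication breaks do not occur with MHA.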

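◎ Interactive (manual) master failover

MHA can also be used for manual failover. In this mode it does not monitor the master. Once you have confirmed a failure, you trigger the failover through MHA.

◎ Non-interactive master failover

MHA does not monitor the master, but when a failure occurs it can be invoked to fail over automatically. This is useful when other software, such as Pacemaker, already monitors the master.

◎ Online switching of the master to a different host

You often need to move the current master to another host. Common reasons are a broken RAID controller or RAM, or a master server that needs an upgrade. This is not a master crash, but it still requires a manual switch.

The switch should be as fast as possible, because writes to the master are blocked during it. You also need to block or kill ongoing sessions, because writes that are not blocked cause data inconsistency. For example: updating master1, updating master2, committing on master1, and getting an error when committing on master2 leaves the data inconsistent. So you need both a fast switch and graceful write blocking.

MHA can complete the switch within 0.5-2 seconds. A write block of 0.5-2 seconds is usually acceptable, so you can switch the master online even outside maintenance windows. Tasks such as upgrading to a newer version or moving to a faster server become much easier.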

Architecture of MHA

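When the master crashes, MHA recovers the rest of the replication setup. For details on how MHA resolves the data-consistency problem, see the slides below. They are not covered in detail here: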

http://www.slideshare.net/matsunobu/automated-master-failover

MHA Components

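MHA consists of the Manager and Node components.

Manager module: can manage multiple master-slave replication setups

masterha_manager: the command that performs automatic failure detection and failover

Other helper scripts: manual failover, online master switching, checks (of SSH, replication and so on), etc.

Node module: deployed on every MySQL server

save_binary_logs: copies the master's binary logs if necessary

apply_diff_relay_logs: identifies the differential relay log events on the most up-to-date slave and applies them, including saved binlog events, to the other slaves

purge_relay_logs: purges relay logs

The MHA manager node runs the programs that monitor MySQL and control master failover.

The MHA node package contains the helper scripts used during automatic failover. They parse MySQL binary and relay logs, identify which relay log events must be applied to the other slaves and from which position, and apply those events to the target slaves. The node package must be installed on every MySQL server.

When MHA Manager performs a failover, it connects to the MHA Node hosts via SSH and runs Node commands as needed.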

Advantages of MHA

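This section is brief, since most of it was already covered above.

1 Master failover and slave promotion can be done very quickly

2 Master crash does not result in data inconsistency

3 No need to modify current MySQL settings (MHA works with regular MySQL (5.0 or later))

4 No need to increase lots of servers

The only extra server is the manager, and a single manager can manage more than a hundred replication setups.

5 No performance penalty

MHA works with both semi-synchronous and asynchronous replication. While monitoring, it only sends a ping to the master every N seconds (3 seconds by default), so the performance impact is negligible. Performance is the same as a plain master-slave setup.

6 Works with any storage engine

MHA supports every storage engine that MySQL replication supports, not only InnoDB.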

Typical Use cases

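How to deploy the Manager node

◎ A dedicated manager server with multiple replication environments

MHA Manager uses very little CPU and memory. A single manager can therefore manage many replication environments, even more than 100.

◎ Running the manager on one of the slaves

If you have only one replication environment and would rather not pay for dedicated manager hardware, you can run the manager on one of the slaves. Even when the manager and a slave share a machine, the manager still connects to that slave via SSH. You therefore still need passwordless SSH login.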

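Replication configurations (summarized briefly)

Single master, multiple slaves

One master with several slaves. This is the most common case.

Single master, multiple slaves (one on remote datacenter)

One master with several slaves, one of which is in a remote datacenter and should never become master.

Single master, multiple slaves, one candidate master

One master with several slaves, only one of which is configured as the candidate master.

Multiple masters, multiple slaves

Three tier replication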

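Managing the master IP address

Many HA solutions assign a virtual IP to the master. When the master crashes, software such as Keepalived moves the virtual IP to a healthy server.

A common alternative to a VIP is a global catalog database that stores the mapping between applications and IP addresses. With this approach, you need to update the catalog database when the master crashes.

Both approaches have pros and cons, and MHA does not force either one. MHA can call an external script to deactivate or activate the writer IP address. You set the script with the master_ip_failover_script parameter, and a sample script ships with the manager package. In the script you can update the catalog database, move a VIP, or do anything else you need.

You can also use existing HA software such as Pacemaker for IP failover. In that case MHA does not do IP failover.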

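Working with MySQL semi-synchronous replication

MHA tries to save the binary logs from the crashed master, but this is not always possible. If the master failed because of a hardware fault, or SSH is unreachable, MHA cannot save the binlogs. It then cannot apply events that exist only on the master, and the most recent data is lost.

Semi-synchronous replication greatly reduces this risk. Because it is built on MySQL's replication mechanism, MHA works with it. As long as at least one slave has received the latest binlog events, MHA applies them to all slaves and keeps the data consistent.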

Tutorial

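Creating a common replication environment

MHA does not build the replication environment itself, so you have to set it up manually. In other words, MHA can be deployed on an existing replication environment. For example, suppose there are four hosts, host1 to host4. host1 is configured as the master, host2 and host3 as slaves, and host4 as the manager.

Installing the node package on host1-host4

RHEL/CentOS

  ## If you have not installed DBD::mysql, install it like below, or install from source.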
  # yum install perl-DBD-MySQL

  ## Get MHA Node rpm package from "Downloads" section.
  # rpm -ivh mha4mysql-node-X.Y-0.noarch.rpm

Ubuntu/Debian

  ## If you have not installed DBD::mysql, install it like below, or install from source.
  # apt-get install libdbd-mysql-perl

  ## Get MHA Node deb package from "Downloads" section.
  # dpkg -i mha4mysql-node_X.Y_all.deb

From source

  ## Install DBD::mysql if not installed
  $ tar -zxf mha4mysql-node-X.Y.tar.gz
  $ cd mha4mysql-node-X.Y
  $ perl Makefile.PL
  $ make
  $ sudo make install

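Installing the manager package on host4

The MHA manager package provides command-line programs such as masterha_manager and masterha_master_switch, and it depends on several Perl modules. Install the following Perl modules before installing the manager. Also install the node package on the manager host.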

MHA Node package
DBD::mysql
Config::Tiny
Log::Dispatch
Parallel::ForkManager
Time::HiRes (included from Perl v5.7.3)

RHEL/CentOS

  ## Install dependent Perl modules
  # yum install perl-DBD-MySQL
  # yum install perl-Config-Tiny
  # yum install perl-Log-Dispatch
  # yum install perl-Parallel-ForkManager

  ## Install MHA Node, since MHA Manager uses some modules provided by MHA Node.
  # rpm -ivh mha4mysql-node-X.Y-0.noarch.rpm

  ## Finally you can install MHA Manager
  # rpm -ivh mha4mysql-manager-X.Y-0.noarch.rpm

Ubuntu/Debian

  ## Install dependent Perl modules
  # apt-get install libdbd-mysql-perl
  # apt-get install libconfig-tiny-perl
  # apt-get install liblog-dispatch-perl
  # apt-get install libparallel-forkmanager-perl

  ## Install MHA Node, since MHA Manager uses some modules provided by MHA Node.
  # dpkg -i mha4mysql-node_X.Y_all.deb

  ## Finally you can install MHA Manager
  # dpkg -i mha4mysql-manager_X.Y_all.deb

From source

  ## Install dependent Perl modules
  ## - MHA Node (see above)
  ## - Config::Tiny
  # perl -MCPAN -e "install Config::Tiny"
  ## - Log::Dispatch
  # perl -MCPAN -e "install Log::Dispatch"
  ## - Parallel::ForkManager
  # perl -MCPAN -e "install Parallel::ForkManager"

  ## Install MHA Manager
  $ tar -zxf mha4mysql-manager-X.Y.tar.gz
  $ cd mha4mysql-manager-X.Y
  $ perl Makefile.PL
  $ make
  $ sudo make install

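Creating the configuration file

The next step is to create the manager's configuration file. Its parameters mainly cover the MySQL user name and password, the replication account's user name and password, and the working directories. See the Parameters section for the full list.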

manager_host$ cat /etc/app1.cnf
  
  [server default]
  # mysql user and password
  user=root
  password=mysqlpass
  ssh_user=root
  # working directory on the manager
  manager_workdir=/var/log/masterha/app1
  # working directory on MySQL servers
  remote_workdir=/var/log/masterha/app1
  
  [server1]
  hostname=host1
  
  [server2]
  hostname=host2
  
  [server3]
  hostname=host3

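host1 is the current master, but nothing in the file marks it as such. MHA detects it automatically.

Checking SSH connections

MHA Manager accesses all node hosts via SSH, and the nodes send differential relay log files to each other via SSH. Passwordless SSH login must therefore be configured between every node and the manager. You can check SSH connectivity with the masterha_check_ssh command: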

# masterha_check_ssh --conf=/etc/app1.cnf
  
  Sat May 14 14:42:19 2011 - [warn] Global configuration file /etc/masterha_default.cnf not found. Skipping.
  Sat May 14 14:42:19 2011 - [info] Reading application default configurations from /etc/app1.cnf..
  Sat May 14 14:42:19 2011 - [info] Reading server configurations from /etc/app1.cnf..
  Sat May 14 14:42:19 2011 - [info] Starting SSH connection tests..
  Sat May 14 14:42:19 2011 - [debug]  Connecting via SSH from root@host1(192.168.0.1) to root@host2(192.168.0.2)..
  Sat May 14 14:42:20 2011 - [debug]   ok.
  Sat May 14 14:42:20 2011 - [debug]  Connecting via SSH from root@host1(192.168.0.1) to root@host3(192.168.0.3)..
  Sat May 14 14:42:20 2011 - [debug]   ok.
  Sat May 14 14:42:21 2011 - [debug]  Connecting via SSH from root@host2(192.168.0.2) to root@host1(192.168.0.1)..
  Sat May 14 14:42:21 2011 - [debug]   ok.
  Sat May 14 14:42:21 2011 - [debug]  Connecting via SSH from root@host2(192.168.0.2) to root@host3(192.168.0.3)..
  Sat May 14 14:42:21 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [debug]  Connecting via SSH from root@host3(192.168.0.3) to root@host1(192.168.0.1)..
  Sat May 14 14:42:22 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [debug]  Connecting via SSH from root@host3(192.168.0.3) to root@host2(192.168.0.2)..
  Sat May 14 14:42:22 2011 - [debug]   ok.
  Sat May 14 14:42:22 2011 - [info] All SSH connection tests passed successfully.

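If any test fails, SSH is misconfigured and MHA will not work. Fix the problem and try again. The usual cause is that SSH public key authentication is not set up correctly.

Checking the replication configuration

For MHA to work, all masters and slaves defined in the configuration file must be replicating correctly. You can check the replication configuration with the masterha_check_repl command: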

  manager_host$ masterha_check_repl --conf=/etc/app1.cnf
  ...
  MySQL Replication Health is OK.

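If it reports errors, check the logs and fix them. The current master must not be a slave, and every other slave must be replicating correctly from the master. For common errors, see the TypicalErrors page.

Starting the manager

Once MySQL replication is configured correctly, the manager and node packages are installed, and SSH is set up, start the manager with the masterha_manager command: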

  manager_host$ masterha_manager --conf=/etc/app1.cnf
  ....
  Sat May 14 15:58:29 2011 - [info] Connecting to the master host1(192.168.0.1:3306) and sleeping until it doesn't respond..

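If everything is configured correctly, masterha_manager keeps checking whether the master is available until the master crashes. If masterha_manager reports an error before it starts monitoring the master, check the logs and fix the configuration.

By default all logs are printed to standard error. You can also specify a log file in the manager configuration file. Typical errors are replication configuration problems and an SSH user that lacks permission to access relay logs.

By default masterha_manager does not run in the background. Pressing Ctrl+C terminates it.

Checking the manager status

Once MHA Manager has started monitoring, it prints nothing unless something goes wrong. You can check the manager's status with the masterha_check_status command, for example: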

manager_host$ masterha_check_status --conf=/etc/app1.cnf
  app1 (pid:5057) is running(0:PING_OK), master:host1

app1 is MHA's internal application name, which can be set in the manager configuration file. If the manager is stopped or misconfigured, you see the following:

  manager_host$ masterha_check_status --conf=/etc/app1.cnf
  app1 is stopped(1:NOT_RUNNING).

Stopping the manager

You can stop the manager with the masterha_stop command:

manager_host$ masterha_stop --conf=/etc/app1.cnf
  Stopped app1 successfully.

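If it cannot be stopped, try adding the --abort option. Now that you know how to stop the manager, start it again.

Testing automatic master failover

The master and the manager are now both running normally. The next step is to stop the master and test automatic failover. You can simply kill the mysqld processes on the master: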

  host1$  killall -9 mysqld mysqld_safe

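Then check the manager log to confirm that host2 has become the new master and that host3 now replicates from host2.

After a failover completes, the manager process exits. To run the manager in the background, use the command below, or use daemontools (not covered here):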

manager_host$ nohup masterha_manager --conf=/etc/app1.cnf < /dev/null > /var/log/masterha/app1/app1.log 2>&1 &

Writing an application configuration file

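For MHA to work, you need to create a configuration file and set its parameters. These are mainly the MySQL user name and password for each server and the working directories. See the Parameters section for the full list.

Here is an example configuration file: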

manager_host$ cat /etc/app1.cnf

  [server default]
  # mysql user and password
  user=root
  password=mysqlpass
  # working directory on the manager
  manager_workdir=/var/log/masterha/app1
  # manager log file
  manager_log=/var/log/masterha/app1/app1.log
  # working directory on MySQL servers
  remote_workdir=/var/log/masterha/app1

  [server1]
  hostname=host1

  [server2]
  hostname=host2

  [server3]
  hostname=host3

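All settings must be written in "param=value" format. For example, the following setting is incorrect:

[server1]
hostname=host1
# incorrect: must be "no_master=1"
no_master

Application-scope parameters must be written under the [server default] block. The [serverN] blocks hold local-scope parameters. For example, hostname is a local-scope parameter, so it must go under a [serverN] block. Block names must start with the letters "server".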
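The two syntax rules above can be checked mechanically: every setting is param=value, and block names start with "server". Below is a minimal shell sketch of such a check. `check_mha_conf` is a hypothetical helper written for this translation, not an MHA command. It validates syntax only, not parameter names or scopes.

```shell
#!/bin/sh
# Hypothetical syntax check for an MHA application config file (not part of MHA).
# It only enforces the two rules described above:
#   1. every setting is written as param=value
#   2. every block name starts with "server"
# Blank lines and lines starting with # are ignored.

check_mha_conf() {
  file=$1
  rc=0
  lineno=0
  while IFS= read -r line || [ -n "$line" ]; do
    lineno=$((lineno + 1))
    # Strip leading and trailing whitespace.
    trimmed=$(printf '%s' "$line" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    case $trimmed in
      ''|'#'*) ;;                 # blank line or comment
      '[server'*']') ;;           # valid block header, e.g. [server default]
      '['*']')
        echo "$file:$lineno: block name must start with 'server': $trimmed"
        rc=1 ;;
      *=*) ;;                     # param=value
      *)
        echo "$file:$lineno: expected param=value: $trimmed"
        rc=1 ;;
    esac
  done < "$file"
  return $rc
}

# Run only when a file name is given on the command line.
if [ "$#" -gt 0 ]; then
  check_mha_conf "$1"
  exit $?
fi
```

For example, `sh check_mha_conf.sh /etc/app1.cnf` (the script file name is hypothetical) prints one line per offending line. It exits non-zero if it found any.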

Writing a global configuration file

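If you plan to manage two or more master-slave pairs from a single manager, create a global configuration file. Then you don't have to repeat the same parameters for every replication setup. If the file /etc/masterha_default.cnf exists, it is used as the global configuration file by default.

You can set application-scope parameters in the global configuration file. For example, if all MySQL servers share the same administrative account and password, you can set user and password there.

Here is an example global configuration file:

Global configuration file (/etc/masterha_default.cnf)

[server default]
user=root
password=rootpass
ssh_user=root
master_binlog_dir= /var/lib/mysql
remote_workdir=/data/log/masterha
secondary_check_script= masterha_secondary_check -s remote_host1 -s remote_host2
ping_interval=3
master_ip_failover_script=/script/masterha/master_ip_failover
shutdown_script= /script/masterha/power_manager
report_script= /script/masterha/send_master_failover_mail

These parameters apply to all applications.

Each application still gets its own configuration file. Below are examples for app1 (host1-4) and app2 (host11-14):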

app1:

manager_host$ cat /etc/app1.cnf

  [server default]
  manager_workdir=/var/log/masterha/app1
  manager_log=/var/log/masterha/app1/app1.log

  [server1]
  hostname=host1
  candidate_master=1

  [server2]
  hostname=host2
  candidate_master=1

  [server3]
  hostname=host3

  [server4]
  hostname=host4
  no_master=1

app2:

manager_host$ cat /etc/app2.cnf

  [server default]
  manager_workdir=/var/log/masterha/app2
  manager_log=/var/log/masterha/app2/app2.log

  [server1]
  hostname=host11
  candidate_master=1

  [server2]
  hostname=host12
  candidate_master=1

  [server3]
  hostname=host13

  [server4]
  hostname=host14
  no_master=1

Requirements and Limitations

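This section briefly summarizes MHA's requirements and limitations.

1 SSH public key authentication is required.

2 Only Linux is supported.

3 Only one master may be writable (read_only=0). All others should be read-only.

4 For three-tier replication such as Master1 -> Master2 -> Slave3, configure only the two-tier part (master1 and master2) in the configuration file. Then set multi_tier_slave=1 to support the three-tier structure.

5 MHA supports only MySQL 5.0 and later.

6 mysqlbinlog must be version 3.3 or later.

7 log-bin must be enabled on every MySQL server that can become master.

8 Replication filtering rules must be the same on all MySQL servers.

9 The replication account must exist on every server that can become master.

10 Set relay_log_purge=0 on all MySQL servers so that relay logs are kept for a while, and purge them periodically with purge_relay_logs instead.

11 With statement-based replication, do not use LOAD DATA INFILE.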

What MHA does on monitoring and failover

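Much of this section repeats what was said above, so it is summarized briefly. During monitoring and failover, MHA mainly does the following:

Verifying replication settings and identifying the current master
Monitoring the master server
Detecting the master server failure
Verifying slave configurations again
Shutting down the failed master server (optional)
Recovering a new master
Activating the new master
Recovering the remaining slaves
Notifications (optional)

MHA monitors the master server until the master crashes. During this stage the manager does not monitor the slaves. If you need to add or remove slaves, update the manager configuration file and restart MHA.

Detecting the master failure

MHA re-reads the configuration file, reconnects, and verifies that the master really has crashed. If the previous failover ended with the same error, or finished only a very short time ago, MHA aborts instead of continuing.

Shutting down the crashed host (optional) to keep the failure from spreading

Recovering a new master. If the crashed host is still reachable via SSH, MHA copies its binlog events to the latest slave, starting from that slave's end_log_pos.

The choice of new master follows the manager configuration file. Set candidate_master=1 on slaves that may become master, and no_master=1 on slaves that must never become master. MHA identifies the latest slave, meaning the one that received the most recent relay log events, and promotes it to the new master.

Activating the new master

Reconfiguring the remaining slaves to replicate from the new master

Sending notifications (optional), such as emails or disabling backup jobs on the new master, through report_script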

What MHA does on online(fast) master switch

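A brief summary. During an online master switch, MHA mainly does the following:

Verifying replication settings and identifying the current master
Identifying the new master
Rejecting writes on the current master
Waiting for all slaves to catch up with replication
Granting writes on the new master
Switching replication on all the remaining slaves

While verifying the replication settings and identifying the current master, MHA also checks that these conditions hold:

The IO thread is running on every slave

The SQL thread is running on every slave

Replication delay on every slave is less than 2 seconds

No update on the master has been running for more than 2 seconds

Identifying the new master

Running FLUSH TABLES WITH READ LOCK on the current master to block writes and prevent data inconsistency

Waiting for all slaves to catch up with the master

Running SHOW MASTER STATUS on the new master to record the binlog file name and position, then SET GLOBAL read_only=0 to allow writes

Running CHANGE MASTER and START SLAVE in parallel on the other slaves so that they replicate from the new master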
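To make the order of these steps concrete, the following shell sketch prints, host by host, the statements that correspond to them. It does not connect to MySQL. The function `print_switch_plan`, its argument order, and the STOP SLAVE issued before CHANGE MASTER are assumptions of this sketch. The real masterha_master_switch also waits for catch-up, handles errors, and passes replication credentials.

```shell
# Hypothetical sketch: print, per host, the statements an online master
# switch issues, in order. Nothing is executed against MySQL.
# Arguments: ORIG_MASTER NEW_MASTER BINLOG_FILE BINLOG_POS [SLAVE...]
# BINLOG_FILE/BINLOG_POS stand for the new master's SHOW MASTER STATUS result.
print_switch_plan() {
  orig=$1 new=$2 file=$3 pos=$4
  shift 4
  # Block writes on the current master.
  echo "[$orig] FLUSH TABLES WITH READ LOCK;"
  # Wait for every slave, including the new master, to catch up.
  for s in "$new" "$@"; do
    echo "[$s] -- wait until caught up with $orig"
  done
  # Record coordinates on the new master, then allow writes there.
  echo "[$new] SHOW MASTER STATUS;"
  echo "[$new] SET GLOBAL read_only=0;"
  # Repoint the remaining slaves (MHA does this in parallel).
  for s in "$@"; do
    echo "[$s] STOP SLAVE;"
    echo "[$s] CHANGE MASTER TO MASTER_HOST='$new', MASTER_LOG_FILE='$file', MASTER_LOG_POS=$pos;"
    echo "[$s] START SLAVE;"
  done
}
```

For example, `print_switch_plan host1 host2 mysql-bin.000010 120 host3` prints the plan for moving writes from host1 to host2 with host3 as the remaining slave. The file name and position are made-up placeholders.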

Parameters

The MHA manager configuration parameters are listed below:

Parameter Name	Required?	Parameter Scope	Default Value	Example
hostname	Yes	Local Only	-	hostname=mysql_server1, hostname=192.168.0.1, etc
ip	No	Local Only	gethostbyname($hostname)	ip=192.168.1.3
port	No	Local/App/Global	3306	port=3306
ssh_host	No	Local Only	same as hostname	ssh_host=mysql_server1, ssh_host=192.168.0.1, etc
ssh_ip	No	Local Only	gethostbyname($ssh_host)	ssh_ip=192.168.1.3
ssh_port	No	Local/App/Global	22	ssh_port=22
ssh_connection_timeout	No	Local/App/Global	5	ssh_connection_timeout=20
ssh_options	No	Local/App/Global	""(empty string)	ssh_options="-i /root/.ssh/id_dsa2"
candidate_master	No	Local Only	0	candidate_master=1
no_master	No	Local Only	0	no_master=1
ignore_fail	No	Local Only	0	ignore_fail=1
skip_init_ssh_check	No	Local Only	0	skip_init_ssh_check=1
skip_reset_slave	No	Local/App/Global	0	skip_reset_slave=1
user	No	Local/App/Global	root	user=mysql_root
password	No	Local/App/Global	""(empty string)	password=rootpass
repl_user	No	Local/App/Global	Master_User value from SHOW SLAVE STATUS	repl_user=repl
repl_password	No	Local/App/Global	- (current replication password)	repl_password=replpass
disable_log_bin	No	Local/App/Global	0	disable_log_bin=1
master_pid_file	No	Local/App/Global	""(empty string)	master_pid_file=/var/lib/mysql/master1.pid
ssh_user	No	Local/App/Global	current OS user	ssh_user=root
remote_workdir	No	Local/App/Global	/var/tmp	remote_workdir=/var/log/masterha/app1
master_binlog_dir	No	Local/App/Global	/var/lib/mysql	master_binlog_dir=/data/mysql1,/data/mysql2
log_level	No	App/Global	info	log_level=debug
manager_workdir	No	App	/var/tmp	manager_workdir=/var/log/masterha
client_bindir	No	App	-	client_bindir=/usr/mysql/bin
client_libdir	No	App	-	client_libdir=/usr/lib/mysql
manager_log	No	App	STDERR	manager_log=/var/log/masterha/app1.log
check_repl_delay	No	App/Global	1	check_repl_delay=0
check_repl_filter	No	App/Global	1	check_repl_filter=0
latest_priority	No	App/Global	1	latest_priority=0
multi_tier_slave	No	App/Global	0	multi_tier_slave=1
ping_interval	No	App/Global	3	ping_interval=5
ping_type	No	App/Global	SELECT	ping_type=CONNECT
secondary_check_script	No	App/Global	null	secondary_check_script= masterha_secondary_check -s remote_dc1 -s remote_dc2
master_ip_failover_script	No	App/Global	null	master_ip_failover_script=/usr/local/custom_script/master_ip_failover
master_ip_online_change_script	No	App/Global	null	master_ip_online_change_script= /usr/local/custom_script/master_ip_online_change
shutdown_script	No	App/Global	null	shutdown_script= /usr/local/custom_script/master_shutdown
report_script	No	App/Global	null	report_script= /usr/local/custom_script/report
init_conf_load_script	No	App/Global	null	init_conf_load_script= /usr/local/custom_script/init_conf_loader

Local Scope: Per-server scope parameters. Local scope parameters should be set under [server_xxx] blocks within application configuration file.
App Scope: Parameters for each {master, slaves} pair. These parameters should be set under a [server default] block within the application configuration file.
Global Scope: Parameters for all {master, slaves} pairs. Global scope parameters are useful only when you manage multiple {master, slaves} pairs from single manager server. These parameters should be set in a global configuration file.

hostname

Hostname or IP address of the target MySQL server. This parameter is mandatory. Set it under the [server_xxx] blocks of the application configuration file, one block per MySQL server.

ip

IP address of the target MySQL server. Default is gethostbyname($hostname). MHA Manager and MHA Node internally use this IP address to connect via MySQL and SSH. Normally you don't need to configure this parameter because it is automatically resolved from the hostname parameter.

port

Port number of the target MySQL server. Default is 3306. MHA connects to MySQL servers by using the IP address and port.

ssh_host

Host used for SSH connections. Defaults to the value of hostname, so usually there is no need to set it.

ssh_ip

(Supported from 0.53) IP address of the target MySQL server that is used via SSH. Default is gethostbyname($ssh_host). Usually there is no need to set it.

ssh_port

(Supported from 0.53) Port number of the target MySQL server used via SSH. Default is 22.

ssh_connection_timeout

(Supported from 0.54) Default is 5 seconds. Before this parameter was added, the timeout was hard-coded.

ssh_options

(Supported from 0.53) Additional SSH command line options.

candidate_master

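Set under [server_xxx]. A value of 1 marks this MySQL server as a candidate for the new master. If two or more servers are set to 1, the one whose block appears first has higher priority.

no_master

A value of 1 means this MySQL server never becomes master. Set no_master=1 on servers using RAID0, on servers in a remote datacenter, and on a slave that shares its host with the manager.

ignore_fail

By default, the manager does not fail over if any slave has failed, for example if the slave is unreachable via SSH or its SQL thread has a problem. If ignore_fail is set to 1 for a slave, failover proceeds even when that slave has failed.

skip_init_ssh_check

Skips the SSH check during initialization.

skip_reset_slave

(Supported from 0.56) Skips executing RESET SLAVE during master failover.

user

MySQL administrative account. root is recommended and is also the default.

password

Password of the MySQL account set in user.

repl_user

Replication account. Usually there is no need to set it.

repl_password

Password of the replication account. Usually there is no need to set it.

disable_log_bin

If set, the slaves do not write binary logs while the differential relay logs are being applied to them.

master_pid_file

PID file of the master. Usually there is no need to set it.

ssh_user

Defaults to the OS user currently running the manager. This user needs permission to read the MySQL binlogs and relay logs.

remote_workdir

Directory where each MHA node writes its log files. MHA creates it if it does not exist, so the user needs the corresponding permissions. Defaults to /var/tmp. Set it explicitly.

master_binlog_dir

Directory where the master writes its binlog files. Defaults to /var/lib/mysql. Set it explicitly: when the master crashes but is still reachable via SSH, MHA copies its binlogs from this directory.

log_level

Log level. Usually there is no need to set it.

manager_workdir

Directory where the manager writes its status files. Defaults to /var/tmp.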

client_bindir

If MySQL command line utilities are installed under a non-standard directory, use this option to set the directory.

client_libdir

If MySQL libraries are installed under a non-standard directory, use this option to set the directory.

manager_log

Full path of the file where MHA Manager writes its logs. If not set, MHA Manager prints to STDOUT/STDERR. During a manual (interactive) failover, MHA Manager ignores manager_log and always prints to STDOUT/STDERR.

check_repl_delay

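By default, MHA does not promote a slave that is more than 100MB of relay logs behind, because recovering it would take a long time. If set to 0, MHA ignores replication delay when selecting the new master. Set check_repl_delay=0 when a server has candidate_master=1 and you want to make sure it becomes the new master.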

check_repl_filter

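Checks replication filters. By default, MHA reports an error if the master and slaves have different filtering rules. Setting check_repl_filter=0 skips this check. Be very careful, and make sure that doing so is safe.

latest_priority

By default, when the master crashes, MHA promotes the latest slave (the one with the least replication delay) to the new master. If you want to control the promotion order yourself, set latest_priority=0. The order then follows the order of the servers with candidate_master=1 in the configuration file.

multi_tier_slave

Since 0.52, MHA supports multi-tier replication. By default, replication with three or more tiers is not allowed. For example, if h2 replicates from h1 and h3 replicates from h2, MHA reports an error. With multi_tier_slave=1, when h1 crashes, h2 is promoted to the new master and h3 keeps replicating from h2.

ping_interval

How often, in seconds, MHA Manager pings the master (executes a ping SQL statement). If the manager fails to connect to the master three times in a row, it decides that the master is dead. The default interval is 3 seconds, so the maximum detection time through pings is four intervals, about 12 seconds. Some connection errors, such as too many connections, are not counted toward the master-down decision.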
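Put as a formula, the maximum time to detect a failure through pings is four times ping_interval. That gives the roughly 12 seconds mentioned above for the default 3-second interval. The helper below is only that arithmetic. It is a sketch that ignores secondary_check_script and connection timeouts, and `max_detect_seconds` is a hypothetical name.

```shell
# Upper bound on ping-based failure detection: 4 * ping_interval seconds.
max_detect_seconds() {
  interval=${1:-3}   # ping_interval; MHA's default is 3 seconds
  echo $((interval * 4))
}
```

For example, `max_detect_seconds 5` corresponds to ping_interval=5.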

ping_type

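(Supported from 0.53) By default MHA connects to the master and runs SELECT 1 (ping_type=SELECT). In some cases it is better to open a new connection and then close it (ping_type=CONNECT). That check is stricter and detects TCP connection problems faster. Since 0.56, ping_type=INSERT is also supported.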

secondary_check_script

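In general, we recommend checking whether the master is alive over two or more network routes. By default, however, the manager checks over a single route, from the Manager node to the master node. MHA can check over multiple routes by calling an external script, masterha_secondary_check, as in this example:

  secondary_check_script = masterha_secondary_check -s remote_host1 -s remote_host2

The masterha_secondary_check script is included with the manager and works well in most cases.

In this example, MHA checks the master through Manager-(A)->remote_host1-(B)->master_host and Manager-(A)->remote_host2-(B)->master_host:

- If A succeeds and B fails on both routes, MHA concludes that the master really is dead. The script returns 0 and failover proceeds.
- If A fails, the script returns 2. MHA assumes the problem may be its own network and does not fail over.
- If B succeeds, the master is actually alive.

In practice, remote_host1 and remote_host2 should be on different network segments.

The script suits general use, but you can also write your own script to do more. The arguments passed to it are listed below.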

--user=(SSH username of the remote hosts. ssh_user parameter value will be passed)
--master_host=(master's hostname)
--master_ip=(master's ip address)
--master_port=(master's port number)

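The script requires the IO::Socket::INET Perl module, which is included by default since Perl v5.6.0. It connects to each remote server via SSH, so SSH public key authentication must be configured. It then tries to open a TCP connection from the remote server to the master. A successful TCP connection does not count against MySQL's max_connections setting, but the Aborted_connects status counter is incremented by 1.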
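The decision rule for the two routes can be summarized as a small shell function. This is a hypothetical sketch, not the masterha_secondary_check source. Each argument describes one route as `A:B`, with `ok` or `fail` for each hop. Exit statuses 0 and 2 follow the text above. The 3 for "master still reachable" is an arbitrary code chosen for this sketch.

```shell
# Hypothetical sketch of the secondary-check decision rule (not the
# masterha_secondary_check source). Each argument is one route "A:B":
#   A = manager -> remote host, B = remote host -> master, each ok|fail.
# Return: 0 master dead (failover may proceed)
#         2 a manager -> remote hop failed (possible manager-side network issue)
#         3 master reachable from a remote host (code chosen for this sketch)
secondary_decision() {
  # No routes at all: nothing was verified, so treat it like a network problem.
  [ "$#" -gt 0 ] || return 2
  # Any failed A hop means we cannot trust the result.
  for route in "$@"; do
    [ "${route%%:*}" = ok ] || return 2
  done
  # Any successful B hop means the master is still alive.
  for route in "$@"; do
    if [ "${route#*:}" = ok ]; then
      return 3
    fi
  done
  return 0
}
```

For example, `secondary_decision ok:fail ok:fail` returns 0, which matches the "master really is dead" case with two remote hosts.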

master_ip_failover_script

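As described in "Managing the master IP address", many HA setups assign a virtual IP to the master, while others use a global catalog database. Both approaches have pros and cons. MHA does not force either one and lets you use any IP failover technique. The master_ip_failover_script parameter exists for this purpose: you write your own script that makes applications connect to the new master, and set it with this parameter, for example:

  master_ip_failover_script= /usr/local/sample/bin/master_ip_failover

MHA Manager calls this script three times:

1. Before it starts monitoring the master, to check that the script is usable.
2. Before it calls shutdown_script.
3. After the new master has applied all relay logs.

MHA Manager passes the following arguments; you do not configure them yourself: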

Checking phase
- --command=status
- --ssh_user=(current master's ssh username)
- --orig_master_host=(current master's hostname)
- --orig_master_ip=(current master's ip address)
- --orig_master_port=(current master's port number)

Current master shutdown phase
- --command=stop or stopssh
- --ssh_user=(dead master's ssh username, if reachable via ssh)
- --orig_master_host=(current(dead) master's hostname)
- --orig_master_ip=(current(dead) master's ip address)
- --orig_master_port=(current(dead) master's port number)

New master activation phase
- --command=start
- --ssh_user=(new master's ssh username)
- --orig_master_host=(dead master's hostname)
- --orig_master_ip=(dead master's ip address)
- --orig_master_port=(dead master's port number)
- --new_master_host=(new master's hostname)
- --new_master_ip=(new master's ip address)
- --new_master_port=(new master's port number)
- --new_master_user=(new master's user)
- --new_master_password=(new master's password)

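If you use a VIP on the master, you may not need to do anything in the master shutdown phase, as long as the VIP can be moved to the new master. If you use the catalog database approach, you may need to delete or update the master's record. In the new master activation phase, you can insert or update a record for the new master. You can also do whatever else lets applications write to the new master, such as setting read_only=0 or creating users with write privileges.

MHA Manager checks the script's exit code. If it is 0 or 10, MHA Manager continues; otherwise, it aborts. The parameter is empty by default, so MHA Manager does nothing by default.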
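Below is a minimal shell skeleton of such a script, assuming a VIP setup. It shows only the argument parsing and the exit-code contract described above: 0 (or 10) lets MHA Manager continue, anything else aborts. `release_vip` and `acquire_vip` are hypothetical placeholders that just print what they would do. Replace them with your own VIP or catalog-database logic. This is not the sample script shipped with MHA.

```shell
#!/bin/sh
# Hypothetical master_ip_failover_script skeleton (not the MHA sample).
# MHA passes --key=value arguments; exiting 0 (or 10) lets it continue,
# any other exit code makes MHA Manager abort.

# Placeholders: replace with your own VIP / catalog-database logic.
release_vip() { echo "would remove VIP from $1"; }
acquire_vip() { echo "would add VIP on $1"; }

master_ip_failover() {
  cmd='' orig_master_host='' new_master_host=''
  for arg in "$@"; do
    case $arg in
      --command=*)          cmd=${arg#*=} ;;
      --orig_master_host=*) orig_master_host=${arg#*=} ;;
      --new_master_host=*)  new_master_host=${arg#*=} ;;
      *) ;;  # other arguments (ips, ports, users) are ignored here
    esac
  done

  case $cmd in
    status)
      # Before monitoring starts: only confirm the script is usable.
      return 0 ;;
    stop|stopssh)
      # Before shutdown_script: take the writer IP away from the old master.
      # With a floating VIP this step may be a no-op.
      release_vip "$orig_master_host"
      return 0 ;;
    start)
      # After the new master has applied all relay logs.
      [ -n "$new_master_host" ] || return 1
      acquire_vip "$new_master_host"
      return 0 ;;
    *)
      echo "unknown --command: $cmd" >&2
      return 1 ;;
  esac
}

# Run only when invoked with arguments (as MHA does).
if [ "$#" -gt 0 ]; then
  master_ip_failover "$@"
  exit $?
fi
```

Returning 1 for an unknown command or a missing new master host is deliberate, so MHA aborts instead of continuing in an unexpected state.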

master_ip_online_change_script

Similar to master_ip_failover_script, but used by the online master switch command rather than by master failover. The arguments passed are:

Current master write freezing phase
- --command=stop or stopssh
- --orig_master_host=(current master's hostname)
- --orig_master_ip=(current master's ip address)
- --orig_master_port=(current master's port number)
- --orig_master_user=(current master's user)
- --orig_master_password=(current master's password)
- --orig_master_ssh_user=(from 0.56, current master's ssh user)
- --orig_master_is_new_slave=(from 0.56, notifying whether the orig master will be new slave or not)

New master granting write phase
- --command=start
- --orig_master_host=(orig master's hostname)
- --orig_master_ip=(orig master's ip address)
- --orig_master_port=(orig master's port number)
- --new_master_host=(new master's hostname)
- --new_master_ip=(new master's ip address)
- --new_master_port=(new master's port number)
- --new_master_user=(new master's user)
- --new_master_password=(new master's password)
- --new_master_ssh_user=(from 0.56, new master's ssh user)

shutdown_script

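You may want to forcibly shut down the master's host to keep the failure from spreading. Example:

  shutdown_script= /usr/local/sample/bin/power_manager

A sample script is included in the MHA manager package. Before calling this command, MHA checks internally whether the master is reachable via SSH. If it is (the OS is alive but mysqld has stopped), MHA Manager passes the following arguments: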

--command=stopssh
--ssh_user=(ssh username so that you can connect to the master)
--host=(master's hostname)
--ip=(master's ip address)
--port=(master's port number)
--pid_file=(master's pid file)

If the master is not reachable via SSH, MHA Manager passes the following arguments.

--command=stop
--host=(master's hostname)
--ip=(master's ip address)

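The sample script works as follows:

- If --command=stopssh is passed, it runs kill -9 on the mysqld and mysqld_safe processes over SSH.
- If --pid_file is also passed, it tries to kill only that instance's processes rather than all MySQL processes. This is useful when running multiple instances on one master host.
- If it stops the service over SSH, the script exits with 10. The manager then connects to the master via SSH to save the necessary binlogs.
- If the script cannot connect to the master via SSH, or --command=stop is passed, it tries to power off the machine. How to do this depends on the hardware. If the power-off succeeds, the script returns 0; otherwise it returns 1.

When MHA receives 0, it starts the failover. If the exit code is neither 0 nor 10, MHA aborts the failover. The parameter is empty by default, so MHA does nothing by default.

MHA also calls the script right after it starts monitoring, passing the arguments below, so you can verify the script's setup at that point. Whether power control works depends heavily on the hardware, so checking the power status here is strongly recommended. If something is misconfigured, watch for it carefully when monitoring starts.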

--command=status
--host=(master's hostname)
--ip=(master's ip address)
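The exit-code contract is the part that is easy to get wrong: 10 after killing mysqld over SSH, 0 after a successful power-off, and anything else aborts. The shell skeleton below shows only that contract and the argument parsing. `ssh_kill` and `power_off` are hypothetical placeholders that just print what they would do. This is not the sample script shipped with MHA.

```shell
#!/bin/sh
# Hypothetical shutdown_script skeleton showing MHA's exit-code contract:
#   10 -> mysqld was killed over SSH (MHA then saves binlogs over SSH)
#    0 -> the host was powered off (MHA starts failover)
#   other (1 here) -> failure; MHA aborts the failover

# Placeholders: wire these to your own SSH and power-control tooling.
ssh_kill()  { echo "would ssh $1@$2 and kill -9 mysqld (pid file: ${3:-none})"; }
power_off() { echo "would power off $1"; }

shutdown_master() {
  cmd='' ssh_user='' host='' pid_file=''
  for arg in "$@"; do
    case $arg in
      --command=*)  cmd=${arg#*=} ;;
      --ssh_user=*) ssh_user=${arg#*=} ;;
      --host=*)     host=${arg#*=} ;;
      --pid_file=*) pid_file=${arg#*=} ;;
      *) ;;
    esac
  done

  case $cmd in
    status)
      # Called after monitoring starts: a good place to test power control.
      return 0 ;;
    stopssh)
      # Master reachable via SSH: kill mysqld there, else fall back to power-off.
      if ssh_kill "$ssh_user" "$host" "$pid_file"; then
        return 10
      fi
      if power_off "$host"; then return 0; fi
      return 1 ;;
    stop)
      # Master not reachable via SSH: power the host off.
      if power_off "$host"; then return 0; fi
      return 1 ;;
    *)
      return 1 ;;
  esac
}

# Run only when invoked with arguments (as MHA does).
if [ "$#" -gt 0 ]; then
  shutdown_master "$@"
  exit $?
fi
```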

report_script

When a failover completes or ends with an error, you may want a report sent to you. report_script is meant for this. MHA Manager passes the following arguments:

--orig_master_host=(dead master's hostname)
--new_master_host=(new master's hostname)
--new_slave_hosts=(new slaves' hostnames, delimited by commas)
--subject=(mail subject)
--body=(body)

The parameter is empty by default, so MHA Manager does not invoke anything by default. A sample is available under /samples/scripts/send_report in the MHA Manager package.
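Below is a minimal shell sketch of a report_script that turns the arguments above into a plain-text report. `format_report` is a hypothetical name. Actually sending the text, for example by piping it to a mail command, depends on your environment and is left out.

```shell
#!/bin/sh
# Hypothetical report_script sketch: collect the arguments MHA passes and
# print a plain-text report. Delivery (mail, chat, ...) is left to you.

format_report() {
  orig='' new='' slaves='' subject='' body=''
  for arg in "$@"; do
    case $arg in
      --orig_master_host=*) orig=${arg#*=} ;;
      --new_master_host=*)  new=${arg#*=} ;;
      --new_slave_hosts=*)  slaves=${arg#*=} ;;
      --subject=*)          subject=${arg#*=} ;;
      --body=*)             body=${arg#*=} ;;
      *) ;;
    esac
  done
  printf 'Subject: %s\n' "$subject"
  printf 'Dead master: %s\n' "$orig"
  printf 'New master: %s\n' "$new"
  # new_slave_hosts is comma-separated; print one slave per line.
  if [ -n "$slaves" ]; then
    printf '%s\n' "$slaves" | tr ',' '\n' | sed 's/^/New slave: /'
  fi
  printf '\n%s\n' "$body"
}

# Run only when invoked with arguments (as MHA does).
if [ "$#" -gt 0 ]; then
  format_report "$@"
  exit 0
fi
```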

init_conf_load_script

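Use this script when you do not want sensitive information, such as the MySQL password and the replication password, written in plain text in the configuration file. The name=value pairs printed by this script override the values in the global configuration file. Example:

  #!/usr/bin/perl
  
  # Read the secrets from environment variables (ROOT_PASS and REPL_PASS are example names).
  print "password=$ENV{ROOT_PASS}\n";
  print "repl_password=$ENV{REPL_PASS}\n";

The parameter is empty by default, so MHA does nothing by default.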

Command reference

This section only lists typical example commands. See the official documentation for a detailed description of each option.

masterha_manager: starts MHA Manager

  # masterha_manager --conf=/etc/conf/masterha/app1.cnf

masterha_master_switch: switches the master, either as failover of a dead master or as an online switch of a running master

Interactive failover of a dead master

$ masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host1

Specifying the new master

$ masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=host1 --new_master_host=host5

Non-interactive failover

$ masterha_master_switch --master_state=dead --conf=/etc/conf/masterha/app1.cnf --dead_master_host=host1 --new_master_host=host2 --interactive=0

Online master switch

$ masterha_master_switch --master_state=alive --conf=/etc/app1.cnf --new_master_host=host2

masterha_check_status: checks the MHA running status

$ masterha_check_status --conf=/path/to/app1.cnf
  app1 (pid:8368) is running(0:PING_OK), master:host1
  $ echo $?
  0

Status Code(Exit code)	Status String	Description
0	PING_OK	Master is running and MHA Manager is monitoring. Master state is alive.
1	---	Unexpected error happened. For example, config file does not exist. If this error happens, check arguments are valid or not.
2	NOT_RUNNING	MHA Manager is not running. Master state is unknown.
3	PARTIALLY_RUNNING	MHA Manager main process is not running, but child processes are running. This should not happen and should be investigated. Master state is unknown.
10	INITIALIZING_MONITOR	MHA Manager is just after startup and initializing. Wait for a while and see how the status changes. Master state is unknown.
20	PING_FAILING	MHA Manager detects ping to master is failing. Master state is maybe down.
21	PING_FAILED	MHA Manager detects either a) ping to master failed three times, b) preparing for starting master failover. Master state is maybe down.
30	RETRYING_MONITOR	MHA Manager internal health check program detected that master was not reachable from manager, but after double check MHA Manager verified the master is alive, and currently waiting for retry. Master state is very likely alive.
31	CONFIG_ERROR	There are some configuration problems and MHA Manager can't monitor the target master. Check a logfile for detail. Master state is unknown.
32	TIMESTAMP_OLD	MHA Manager detects that ping to master is ok but status file is not updated for a long time. Check whether MHA Manager itself hangs or not. Master state is unknown.
50	FAILOVER_RUNNING	MHA Manager confirms that master is down and running failover. Master state is dead.
51	FAILOVER_ERROR	MHA Manager confirms that master is down and running failover, but failed during failover. Master state is dead.
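masterha_check_status reports its state through the exit code, so a monitoring job can branch on it. The helper below maps the codes in the table to their status strings. The name `mha_status_name` and the commented usage are illustrative assumptions, not part of MHA.

```shell
# Map a masterha_check_status exit code to the status string in the table above.
mha_status_name() {
  case $1 in
    0)  echo PING_OK ;;
    1)  echo "---" ;;          # unexpected error; the table has no string for it
    2)  echo NOT_RUNNING ;;
    3)  echo PARTIALLY_RUNNING ;;
    10) echo INITIALIZING_MONITOR ;;
    20) echo PING_FAILING ;;
    21) echo PING_FAILED ;;
    30) echo RETRYING_MONITOR ;;
    31) echo CONFIG_ERROR ;;
    32) echo TIMESTAMP_OLD ;;
    50) echo FAILOVER_RUNNING ;;
    51) echo FAILOVER_ERROR ;;
    *)  echo UNKNOWN ;;
  esac
}

# Example use in a cron job (requires MHA to be installed):
#   masterha_check_status --conf=/etc/app1.cnf >/dev/null
#   code=$?
#   [ "$code" -eq 0 ] || echo "MHA app1: status $code $(mha_status_name "$code")"
```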

masterha_check_repl: checks replication health

manager_host$ masterha_check_repl --conf=/etc/app1.cnf
  ...
  MySQL Replication Health is OK.

masterha_stop: stops MHA Manager

manager_host$ masterha_stop --conf=/etc/app1.cnf
  Stopped app1 successfully.

masterha_conf_host: adds or removes a host in the configuration file

# masterha_conf_host --command=add --conf=/etc/conf/masterha/app1.cnf --hostname=db101

Then the following lines will be added to the conf file.

[server_db101]
hostname=db101

You can add several parameters to the config file by passing the --params option, with parameters separated by semicolons (;).

# masterha_conf_host --command=add --conf=/etc/conf/masterha/app1.cnf --hostname=db101 --block=server100 --params="no_master=1;ignore_fail=1"

The following lines will be added to the conf file.

[server100]
hostname=db101
no_master=1
ignore_fail=1

You can also remove a specified block. The command below removes the entire block server100.

# masterha_conf_host --command=delete --conf=/etc/conf/masterha/app1.cnf --block=server100

masterha_conf_host takes the arguments shown in the examples above: --command, --conf, --hostname, --block and --params.

masterha_check_ssh: checks SSH authentication

# masterha_check_ssh --conf=/etc/app1.cnf

purge_relay_logs script: deletes old relay logs

[app@slave_host1]$ cat /etc/cron.d/purge_relay_logs
  # purge relay logs at 5am
  0 5 * * * app /usr/bin/purge_relay_logs --user=root --password=PASSWORD --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1

Monitoring multiple applications

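You may want to monitor several master-slave replication setups from one machine. This is easy: create a new configuration file for the second application and start another manager: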

  # masterha_manager --conf=/etc/conf/masterha/app1.cnf
  # masterha_manager --conf=/etc/conf/masterha/app2.cnf

If app1 and app2 share some parameters, you can put them in the global configuration file.

Using with clustering software

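If you use a virtual IP on the master, you may already be using clustering software such as Pacemaker. In that case you may prefer to manage the virtual IP with that tool rather than have MHA do everything. MHA only handles master failover and slave promotion, so combine it with other clustering tools to achieve high availability.

Below is a brief Pacemaker configuration (Heartbeat v1 mode):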

# /etc/ha.d/haresources on host2
host2 failover_start IPaddr::192.168.0.3

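#!/bin/sh
# failover_start script example: Heartbeat runs it with "start" or "stop".
# /etc/app1.cnf is the example application config used in this document.
case "$1" in
  start)
    masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --interactive=0 \
      --wait_on_failover_error=0 --dead_master_host=host1 --new_master_host=host2
    exit $?
    ;;
  stop)
    # do nothing
    ;;
esac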

# Application configuration file:

  [server1]
  hostname=host1
  candidate_master=1

  [server2]
  hostname=host2
  candidate_master=1

  [server3]
  hostname=host3
  no_master=1

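Because the data files are not shared, the data resources do not need to be managed by the clustering tool or by DRBD. In this setup the clustering tool only needs to run masterha_master_switch and move the virtual IP. You can also do both with your own scripts.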