Keepalived集群软件高级使用(工做原理和状态通知)

时间 2019-11-06

原文原文链接

一、介绍html

Keeaplived主要有两种应用场景，一个是经过配置keepalived结合ipvs作到负载均衡（LVS+Keepalived），有此需求者可参考以往博文：http://lizhenliang.blog.51cto.com/7876557/1343734。另外一个是经过自身健康检查、资源接管功能作高可用（双机热备），实现故障转移。mysql

如下内容主要针对Keepalived+MySQL双主实现双机热备为根据，主要讲解keepalived的状态转换通知功能，利用此功能可有效增强对MySQL数据库监控。此文再也不讲述Keepalived+MySQL双主部署过程，有需求者可参考以往博文：http://lizhenliang.blog.51cto.com/7876557/1362313linux

二、keepalived主要做用nginx

keepalived采用VRRP（virtual router redundancy protocol），虚拟路由冗余协议，以软件的形式实现服务器热备功能。一般状况下是将两台linux服务器组成一个热备组（master-backup），同一时间热备组内只有一台主服务器（master）提供服务，同时master会虚拟出一个共用IP地址（VIP），这个VIP只存在master上并对外提供服务。若是keepalived检测到master宕机或服务故障，备服务器（backup）会自动接管VIP成为master，keepalived并将master从热备组移除，当master恢复后，会自动加入到热备组，默认再抢占成为master，起到故障转移功能。sql

三、工做在三层、四层和七层原理shell

Layer3：工做在三层时，keepalived会按期向热备组中的服务器发送一个ICMP数据包，来判断某台服务器是否故障，若是故障则将这台服务器从热备组移除。数据库

Layer4：工做在四层时，keepalived以TCP端口的状态判断服务器是否故障，好比检测mysql 3306端口，若是故障则将这台服务器从热备组移除。bash

示例：
! Configuration File for keepalived
global_defs {
   notification_email {
     example@163.com
   }
   notification_email_from  example@example.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id MYSQL_HA
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 50
    nopreempt                   #当主down时，备接管，主恢复，不自动接管
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        ahth_pass 123
    }
    virtual_ipaddress {
        192.168.1.200          #虚拟IP地址
    }
}
virtual_server 192.168.1.200 3306 {        
    delay_loop 6
#    lb_algo rr 
#    lb_kind NAT
    persistence_timeout 50
    protocol TCP
    real_server 192.168.1.201 3306 {       #监控本机3306端口
        weight 1
        notify_down /etc/keepalived/kill_keepalived.sh   #检测3306端口为down状态就执行此脚本（只有keepalived关闭，VIP才漂移 ） 
        TCP_CHECK {         #健康状态检测方式，可针对业务需求调整（TTP_GET|SSL_GET|TCP_CHECK|SMTP_CHECK|MISC_CHECK）
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}

Layer7：工做在七层时，keepalived根据用户设定的策略判断服务器上的程序是否正常运行，若是故障则将这台服务器从热备组移除。服务器

示例：
! Configuration File for keepalived
global_defs {
   notification_email {
     example@163.com
   }
   notification_email_from  example@example.com
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id MYSQL_HA
}
vrrp_script check_nginx {
    script /etc/keepalived/check_nginx.sh    #检测脚本
    interval 2   #执行间隔时间
}
vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 50
    nopreempt                   #当主down时，备接管，主恢复，不自动接管
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        ahth_pass 123
    }
    virtual_ipaddress {
        192.168.1.200          #虚拟IP地址
    }
    track_script {          #在实例中引用脚本
        check_nginx
    }
}

脚本内容以下：
# cat /etc/keepalived/check_nginx.sh
Count1=`netstat -antp |grep -v grep |grep nginx |wc -l`
if [ $Count1 -eq 0 ]; then
    /usr/local/nginx/sbin/nginx
    sleep 2
    Count2=`netstat -antp |grep -v grep |grep nginx |wc -l`
    if [ $Count2 -eq 0 ]; then
        service keepalived stop
    else
        exit 0
    fi 
else
    exit 0
fi

四、健康状态检测方式负载均衡

4.1 HTTP服务状态检测

HTTP_GET或SSL_GET {    
      url {
          path /index.html        #检测url，可写多个
          digest  24326582a86bee478bac72d5af25089e    #检测效验码
          #digest效验码获取方法：genhash -s IP -p 80 -u http://IP/index.html 
          status_code 200         #检测返回http状态码
      }
      connect_port 80 #链接端口
      connect_timeout 3  #链接超时时间
      nb_get_retry 3  #重试次数
      delay_before_retry 2 #链接间隔时间
   }

4.2 TCP端口状态检测（使用TCP端口服务基本上均可以使用）

TCP_CHECK {    
      connect_port 80     #健康检测端口，默认为real_server后跟端口
      connect_timeout 5
      nb_get_retry 3
      delay_before_retry 3
  }

4.3 邮件服务器SMTP检测

SMTP_CHECK {            #健康检测邮件服务器smtp    
      host {
          connect_ip
          connect_port
      }
      connect_timeout 5
      retry 2
      delay_before_retry 3
      hello_name "mail.domain.com"
  }

4.4 用户自定义脚本检测real_server服务状态

MISC_CHECK {    
      misc_path /script.sh    #指定外部程序或脚本位置
      misc_timeout 3      #执行脚本超时时间
      !misc_dynamic       #不动态调整服务器权重（weight），若是启用将经过退出状态码动态调整real_server权重值
  }

五、状态转换通知功能

keepalived主配置邮件通知功能，默认当real_server宕机或者恢复时才会发出邮件。有时咱们更想知道keepalived的主服务器故障切换后，VIP是否顺利漂移到备服务器，MySQL服务器是否正常？那写个监控脚本吧，能够，但不必，由于keepalived具有状态检测功能，因此咱们直接使用就好了。

主配置默认邮件通知配置模板以下：
global_defs           # Block id
    {
    notification_email    # To:
        {
        admin@example1.com
        ...
         }
    # From: from address that will be in header
    notification_email_from admin@example.com
    smtp_server 127.0.0.1   # IP
    smtp_connect_timeout 30 # integer, seconds
    router_id my_hostname   # string identifying the machine,
                            # (doesn't have to be hostname).
    enable_traps            # enable SNMP traps
        }

5.1 实例状态通知

a) notify_master ：节点变为master时执行

b) notify_backup ：节点变为backup时执行

c) notify_fault ：节点变为故障时执行

5.2 虚拟服务器检测通知

a) notify_up ：虚拟服务器up时执行

b) notify_down ：虚拟服务器down时执行

示例：
    ! Configuration File for keepalived
    global_defs {
       notification_email {
         example@163.com
       }
       notification_email_from example@example.com 
       smtp_server 127.0.0.1
       smtp_connect_timeout 30
       router_id MYSQL_HA
    }
    vrrp_instance VI_1 {
        state BACKUP
        interface eth1
        virtual_router_id 50
        nopreempt           #当主down时，备接管，主恢复，不自动接管
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            ahth_pass 123
        }
        virtual_ipaddress {
            192.168.1.200
        }
            notify_master /etc/keepalived/to_master.sh
            notify_backup /etc/keepalived/to_backup.sh
            notify_fault /etc/keepalived/to_fault.sh
    }
    virtual_server 192.168.1.200 3306 {
        delay_loop 6
        persistence_timeout 50
        protocol TCP
        real_server 192.168.1.201 3306 {
            weight 1
            notify_up /etc/keepalived/mysql_up.sh
            notify_down /etc/keepalived/mysql_down.sh    
            TCP_CHECK {
                connect_timeout 3
                nb_get_retry 3
                delay_before_retry 3
            }
        }
    }

状态参数后能够是bash命令，也能够是shell脚本，内容根据本身需求定义，以上示例中所涉及状态脚本以下：

1) 当服务器改变为主时执行此脚本

# cat to_master.sh 
#!/bin/bash
Date=$(date +%F" "%T)
IP=$(ifconfig eth0 |grep "inet addr" |cut -d":" -f2 |awk '{print $1}')
Mail="baojingtongzhi@163.com"
echo "$Date $IP change to master." |mail -s "Master-Backup Change Status" $Mail

2) 当服务器改变为备时执行此脚本

# cat to_backup.sh
#!/bin/bash
Date=$(date +%F" "%T)
IP=$(ifconfig eth0 |grep "inet addr" |cut -d":" -f2 |awk '{print $1}')
Mail="baojingtongzhi@163.com"
echo "$Date $IP change to backup." |mail -s "Master-Backup Change Status" $Mail

3) 当服务器改变为故障时执行此脚本

# cat to_fault.sh
#!/bin/bash
Date=$(date +%F" "%T)
IP=$(ifconfig eth0 |grep "inet addr" |cut -d":" -f2 |awk '{print $1}')
Mail="baojingtongzhi@163.com"
echo "$Date $IP change to fault." |mail -s "Master-Backup Change Status" $Mail

4) 当检测TCP端口3306为不可用时，执行此脚本，杀死keepalived，实现切换

# cat mysql_down.sh
#!/bin/bash
Date=$(date +%F" "%T)
IP=$(ifconfig eth0 |grep "inet addr" |cut -d":" -f2 |awk '{print $1}')
Mail="baojingtongzhi@163.com"
pkill keepalived
echo "$Date $IP The mysql service failure,kill keepalived." |mail -s "Master-Backup MySQL Monitor" $Mail

5) 当检测TCP端口3306可用时，执行此脚本

# cat mysql_up.sh
#!/bin/bash
Date=$(date +%F" "%T)
IP=$(ifconfig eth0 |grep "inet addr" |cut -d":" -f2 |awk '{print $1}')
Mail="baojingtongzhi@163.com"
echo "$Date $IP The mysql service is recovery." |mail -s "Master-Backup MySQL Monitor" $Mail