Keepalived is a high-performance high-availability (hot-standby) solution for servers. It is used to prevent single points of failure and, combined with Nginx, can provide high availability for the web front end.
Keepalived is built on the VRRP protocol and uses VRRP to implement high availability (HA). VRRP (Virtual Router Redundancy Protocol) is a protocol for router redundancy: it groups two or more routers into one virtual device that exposes a virtual router IP (one or more) to the outside. Within the group, the router that actually holds that external IP, provided it is working normally, is the MASTER (or the MASTER is chosen by election). The MASTER performs all network functions for the virtual router IP, such as answering ARP requests, ICMP, and forwarding traffic. The other devices do not own the virtual IP and stay in the BACKUP state: apart from receiving the MASTER's VRRP advertisements, they perform no external network functions. When the MASTER fails, a BACKUP takes over its network functions.

VRRP transmits its data over multicast, using a special virtual source MAC address rather than the NIC's own MAC. At runtime only the MASTER periodically sends VRRP advertisements, announcing that it is working and which virtual router IP(s) it holds; the BACKUPs only receive VRRP data and do not send. If no advertisement is received from the MASTER within a certain time, each BACKUP declares itself MASTER, starts sending advertisements, and a new MASTER election takes place.
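To watch these advertisements on the wire, one option (assuming the ens33 interface used later in this setup) is to capture the VRRP multicast traffic:

# VRRP advertisements go to multicast 224.0.0.18 (IP protocol 112);
# only the current MASTER should be sending them, once per advert_int.
tcpdump -i ens33 -nn host 224.0.0.18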
The IP plan is as follows, with the VIP defined as 172.16.23.132:

nginx1: 172.16.23.129    keepalived: 172.16.23.129
nginx2: 172.16.23.130    keepalived: 172.16.23.130
httpd1: 172.16.23.128
httpd2: 172.16.23.131
In the plan above the two nginx servers only provide load balancing; they do not serve the web content themselves. Their Ansible inventory group is:
[root@master ~]# cat /etc/ansible/hosts|grep "^\[nodes" -A 2
[nodes]
172.16.23.129
172.16.23.130
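The nginx configuration on these two nodes is not shown until the end of this article; it is assumed to be a plain reverse proxy along these lines (the upstream block is the one quoted later, the server block is illustrative):

upstream webserver {
    server 172.16.23.128 weight=1;
    server 172.16.23.131 weight=1;
}

server {
    listen 80;
    location / {
        proxy_pass http://webserver;    # hand every request to the httpd backends
    }
}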
Check the nginx service status:
[root@master ~]# ansible nodes -m shell -a "systemctl status nginx"|grep running
Active: active (running) since 二 2018-12-18 16:33:04 CST; 12min ago
Active: active (running) since 二 2018-12-18 16:35:51 CST; 10min ago
nginx is up and running on both nodes; next check the backend httpd service:
[root@master ~]# cat /etc/ansible/hosts|grep "^\[backend_nodes" -A 2
[backend_nodes]
172.16.23.128
172.16.23.131
Check the httpd service status:
[root@master ~]# ansible backend_nodes -m shell -a "systemctl status httpd"|grep running
Active: active (running) since 二 2018-12-18 16:29:36 CST; 22min ago
Active: active (running) since 二 2018-12-18 16:30:03 CST; 21min ago
Then test the load-balancing behaviour on each of the two nginx servers:
[root@master ~]# ansible 172.16.23.129 -m get_url -a "url=http://172.16.23.129/index.html dest=/tmp"|grep status_code
    "status_code": 200,
[root@master ~]# ansible 172.16.23.129 -m shell -a "cat /tmp/index.html"
172.16.23.129 | CHANGED | rc=0 >>
172.16.23.128
[root@master ~]# ansible 172.16.23.129 -m get_url -a "url=http://172.16.23.129/index.html dest=/tmp"|grep status_code
    "status_code": 200,
[root@master ~]# ansible 172.16.23.129 -m shell -a "cat /tmp/index.html"
172.16.23.129 | CHANGED | rc=0 >>
172.16.23.131
From the above, requests made on nginx1 (172.16.23.129) return the web pages of both backend httpd servers, 172.16.23.128 and 172.16.23.131, so access and load balancing through nginx1 work as expected.
[root@master ~]# ansible 172.16.23.130 -m get_url -a "url=http://172.16.23.130/index.html dest=/tmp"|grep status_code
    "status_code": 200,
[root@master ~]# ansible 172.16.23.130 -m shell -a "cat /tmp/index.html"
172.16.23.130 | CHANGED | rc=0 >>
172.16.23.128
[root@master ~]# ansible 172.16.23.130 -m get_url -a "url=http://172.16.23.130/index.html dest=/tmp"|grep status_code
    "status_code": 200,
[root@master ~]# ansible 172.16.23.130 -m shell -a "cat /tmp/index.html"
172.16.23.130 | CHANGED | rc=0 >>
172.16.23.131
Likewise, nginx2 reaches both backend httpd servers without problems, so load balancing works on both nginx nodes. Now install keepalived on the two nginx servers:
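The installation step itself is not shown in the transcript; assuming CentOS/RHEL 7 hosts with the distribution's keepalived package, it could be done roughly as follows with Ansible's yum and systemd modules:

# install keepalived on both nginx nodes, then start and enable it
ansible nodes -m yum -a "name=keepalived state=present"
ansible nodes -m systemd -a "name=keepalived state=started enabled=yes"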
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running Active: active (running) since 二 2018-12-18 16:06:38 CST; 52min ago Active: active (running) since 二 2018-12-18 16:05:04 CST; 54min ago
Check the VIP: it is currently on node1.
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=\/)'" 172.16.23.129 | CHANGED | rc=0 >> node1 172.16.23.129 172.16.23.132 172.16.23.130 | CHANGED | rc=0 >> node2 172.16.23.130
The VIP sits on nginx1, i.e. node1. Now access the VIP and check the load-balancing behaviour:
[root@master ~]# curl http://172.16.23.132
172.16.23.131
[root@master ~]# curl http://172.16.23.132
172.16.23.128
The results look fine. Now take one nginx server down and see what happens to keepalived and to access through the VIP:
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl stop nginx" 172.16.23.130 | CHANGED | rc=0 >>
Check the keepalived service status and the VIP:
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running
Active: active (running) since 二 2018-12-18 16:05:04 CST; 1h 4min ago
Active: active (running) since 二 2018-12-18 16:06:38 CST; 1h 3min ago
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=\/)'"
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
The VIP has not moved and keepalived is still running normally on both nodes. Now access the VIP:
[root@master ~]# curl http://172.16.23.132
172.16.23.128
[root@master ~]# curl http://172.16.23.132
172.16.23.131
Accessing the web service through the VIP still works (the stopped nginx is on node2, while the VIP, and therefore the nginx actually serving the requests, is on node1).
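Note that keepalived in this configuration does not monitor the nginx process at all, so stopping nginx never triggers a VIP failover by itself; if nginx died on the node holding the VIP, the service would break even though keepalived stays healthy. A common way to tie the VIP to nginx's health is keepalived's vrrp_script / track_script mechanism; a minimal sketch, not part of this article's configuration:

vrrp_script chk_nginx {
    script "pidof nginx"    # any command; a non-zero exit counts as a failure
    interval 2              # run the check every 2 seconds
    weight -20              # lower this node's priority while nginx is down
}

vrrp_instance VI_1 {
    ...                     # existing settings as shown later in this article
    track_script {
        chk_nginx
    }
}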
Now start nginx again and stop the keepalived service on one node:
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl start nginx" 172.16.23.130 | CHANGED | rc=0 >> [root@master ~]# ansible nodes -m shell -a "systemctl status nginx"|grep running Active: active (running) since 二 2018-12-18 17:15:48 CST; 18s ago Active: active (running) since 二 2018-12-18 16:33:04 CST; 43min ago
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl stop keepalived" 172.16.23.130 | CHANGED | rc=0 >>
Then watch the log on that node with tail -f /var/log/messages:
Dec 18 17:16:50 node2 systemd: Stopping LVS and VRRP High Availability Monitor...
Dec 18 17:16:50 node2 Keepalived[12981]: Stopping
Dec 18 17:16:50 node2 Keepalived_healthcheckers[12982]: Stopped
Dec 18 17:16:51 node2 Keepalived_vrrp[12983]: Stopped
Dec 18 17:16:51 node2 Keepalived[12981]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:16:52 node2 systemd: Stopped LVS and VRRP High Availability Monitor.
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running
Active: active (running) since 二 2018-12-18 16:06:38 CST; 1h 10min ago
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=\/)'"
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
Since the keepalived that was stopped is the one on nginx2 (node2), the VIP stays on node1 and does not move to node2. Now look at the keepalived configuration files on node1 and node2:
[root@master ~]# ansible nodes -m shell -a "cat /etc/keepalived/keepalived.conf" 172.16.23.129 | CHANGED | rc=0 >> ! Configuration File for keepalived global_defs { notification_email { 346165580@qq.com } notification_email_from json_hc@163.com smtp_server smtp.163.com smtp_connect_timeout 30 router_id test } vrrp_instance VI_1 { state BACKUP interface ens33 virtual_router_id 51 priority 100 nopreempt # 非抢占模式 advert_int 1 authentication { auth_type PASS auth_pass 1111 } virtual_ipaddress { 172.16.23.132/24 dev ens33 } } 172.16.23.130 | CHANGED | rc=0 >> ! Configuration File for keepalived global_defs { notification_email { 346165580@qq.com } notification_email_from json_hc@163.com smtp_server smtp.163.com smtp_connect_timeout 30 router_id test } vrrp_instance VI_1 { state BACKUP interface ens33 virtual_router_id 51 priority 99 advert_int 1 authentication { auth_type PASS auth_pass 1111 } virtual_ipaddress { 172.16.23.132/24 dev ens33 } }
The two configurations differ only in priority, plus node1 has nopreempt (non-preemptive mode) set. Now start keepalived on node2, then stop keepalived on node1, and watch the VIP:
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl start keepalived" 172.16.23.130 | CHANGED | rc=0 >>
Check the node2 log:
Dec 18 17:23:14 node2 systemd: Starting LVS and VRRP High Availability Monitor...
Dec 18 17:23:14 node2 Keepalived[15994]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:23:14 node2 Keepalived[15994]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:23:14 node2 Keepalived[15995]: Starting Healthcheck child process, pid=15996
Dec 18 17:23:14 node2 Keepalived_healthcheckers[15996]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:23:14 node2 Keepalived[15995]: Starting VRRP child process, pid=15997
Dec 18 17:23:14 node2 systemd: Started LVS and VRRP High Availability Monitor.
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Registering Kernel netlink reflector
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Registering Kernel netlink command channel
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Registering gratuitous ARP shared channel
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) removing protocol VIPs.
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: Using LinkWatch kernel netlink reflector...
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Entering BACKUP STATE
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
keepalived service status on both nodes, and the VIP:
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running
Active: active (running) since 二 2018-12-18 17:23:14 CST; 56s ago
Active: active (running) since 二 2018-12-18 16:06:38 CST; 1h 17min ago
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=\/)'"
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
Now stop keepalived on node1 and check the VIP:
[root@master ~]# ansible 172.16.23.129 -m shell -a "systemctl stop keepalived" 172.16.23.129 | CHANGED | rc=0 >>
Check the logs on each node:
Dec 18 17:27:41 node1 systemd: Stopping LVS and VRRP High Availability Monitor...
Dec 18 17:27:41 node1 Keepalived[24483]: Stopping
Dec 18 17:27:41 node1 Keepalived_vrrp[24485]: VRRP_Instance(VI_1) sent 0 priority
Dec 18 17:27:41 node1 Keepalived_vrrp[24485]: VRRP_Instance(VI_1) removing protocol VIPs.
Dec 18 17:27:41 node1 Keepalived_healthcheckers[24484]: Stopped
Dec 18 17:27:42 node1 Keepalived_vrrp[24485]: Stopped
Dec 18 17:27:42 node1 Keepalived[24483]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:27:42 node1 systemd: Stopped LVS and VRRP High Availability Monitor.
Dec 18 17:27:42 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Transition to MASTER STATE
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Entering MASTER STATE
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) setting protocol VIPs.
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 172.16.23.132
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:43 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:27:48 node2 Keepalived_vrrp[15997]: Sending gratuitous ARP on ens33 for 172.16.23.132
The log shows the VIP failing over. Now check the VIP:
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=\/)'"
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
172.16.23.132
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
The VIP has indeed moved to node2. Now start keepalived on node1 again and see whether the VIP floats back to node1:
[root@master ~]# ansible 172.16.23.129 -m shell -a "systemctl start keepalived" 172.16.23.129 | CHANGED | rc=0 >>
Check the node1 log:
Dec 18 17:30:18 node1 systemd: Starting LVS and VRRP High Availability Monitor...
Dec 18 17:30:18 node1 Keepalived[28009]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:30:18 node1 Keepalived[28009]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:30:18 node1 Keepalived[28010]: Starting Healthcheck child process, pid=28011
Dec 18 17:30:18 node1 Keepalived_healthcheckers[28011]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:30:18 node1 Keepalived[28010]: Starting VRRP child process, pid=28012
Dec 18 17:30:18 node1 systemd: Started LVS and VRRP High Availability Monitor.
Dec 18 17:30:18 node1 Keepalived_vrrp[28012]: Registering Kernel netlink reflector
Dec 18 17:30:18 node1 Keepalived_vrrp[28012]: Registering Kernel netlink command channel
Dec 18 17:30:18 node1 Keepalived_vrrp[28012]: Registering gratuitous ARP shared channel
Dec 18 17:30:18 node1 Keepalived_vrrp[28012]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:30:28 node1 Keepalived_vrrp[28012]: VRRP_Instance(VI_1) removing protocol VIPs.
Dec 18 17:30:28 node1 Keepalived_vrrp[28012]: Using LinkWatch kernel netlink reflector...
Dec 18 17:30:28 node1 Keepalived_vrrp[28012]: VRRP_Instance(VI_1) Entering BACKUP STATE
Dec 18 17:30:28 node1 Keepalived_vrrp[28012]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
Check the node2 log:
Dec 18 17:30:01 node2 systemd: Started Session 1328 of user root.
Dec 18 17:30:01 node2 systemd: Starting Session 1328 of user root.
Dec 18 17:30:05 node2 systemd-logind: Removed session 1327.
Dec 18 17:31:02 node2 systemd: Started Session 1329 of user root.
Dec 18 17:31:02 node2 systemd: Starting Session 1329 of user root.
The node2 log shows no failover activity. Now check the VIP:
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=\/)'"
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
172.16.23.132
This confirms that the VIP did not float back to node1, which is exactly what nopreempt (non-preemptive mode) is for.
The steps above show:
If you do not want the VIP to float back when a keepalived node comes back online, set nopreempt (non-preemptive mode); the configuration is as in the example above, where the two nodes differ only in priority plus nopreempt on the higher-priority node. Note that both instances are declared state BACKUP, which nopreempt requires in order to take effect.
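For reference, the only lines that differ between the two keepalived.conf files shown above are:

# node1 (172.16.23.129)
priority 100
nopreempt    # non-preemptive mode

# node2 (172.16.23.130)
priority 99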
The VIP is currently on node2. Now stop keepalived on node2 again and see whether the VIP fails over:
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl stop keepalived"
172.16.23.130 | CHANGED | rc=0 >>
Check the node2 log:
Dec 18 17:35:59 node2 systemd: Stopping LVS and VRRP High Availability Monitor...
Dec 18 17:35:59 node2 Keepalived[15995]: Stopping
Dec 18 17:35:59 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) sent 0 priority
Dec 18 17:35:59 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) removing protocol VIPs.
Dec 18 17:35:59 node2 Keepalived_healthcheckers[15996]: Stopped
Dec 18 17:36:00 node2 Keepalived_vrrp[15997]: Stopped
Dec 18 17:36:00 node2 Keepalived[15995]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:36:00 node2 systemd: Stopped LVS and VRRP High Availability Monitor.
Check the node1 log:
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 172.16.23.132
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: Sending gratuitous ARP on ens33 for 172.16.23.132
Dec 18 17:36:05 node1 Keepalived_vrrp[28012]: Sending gratuitous ARP on ens33 for 172.16.23.132
The VIP has floated back to node1, which is exactly the expected behaviour:
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=\/)'"
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
The tests above show:
With node1 at a higher priority than node2 and nopreempt (non-preemptive mode) set on node1, when keepalived on node1 dies and later comes back up, the VIP does not float back to node1; it only moves back to node1 when keepalived on node2 dies.
Now test the backend httpd service:
If one of the backend httpd servers goes down, access looks like this:
[root@master ~]# ansible 172.16.23.131 -m shell -a "systemctl stop httpd"
172.16.23.131
| CHANGED | rc=0 >> [root@master ~]# ansible 172.16.23.131 -m shell -a "systemctl status httpd" 172.16.23.131 | FAILED | rc=3 >>
Accessing through the VIP:
[root@master ~]# curl http://172.16.23.132
172.16.23.128
[root@master ~]# curl http://172.16.23.132
172.16.23.128
Access still works, with all requests now going to 172.16.23.128. Now suppose we want to bring httpd on 172.16.23.131 back up for manual testing only, without exposing it to traffic arriving through the VIP, and put it back into service only after it has been verified:
Modify the configuration on both nginx servers as follows:
upstream webserver {
    server 172.16.23.128 weight=1;
    server 172.16.23.131 weight=1;
}
Remove the line "server 172.16.23.131 weight=1;" from the upstream block. Since there are two nginx servers, do them one at a time so the application never goes down (do this before bringing 172.16.23.131 back up).
With both nginx servers now balancing only to 172.16.23.128, traffic will not be scheduled to 172.16.23.131 even after it comes back up. When 172.16.23.131 should serve traffic again, simply add it back to the upstream on the nginx servers, again one node at a time, as sketched below.
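One possible way to apply this on each nginx node in turn (the validation and reload commands are an assumption, they are not part of the original transcript):

# upstream block while 172.16.23.131 is only being tested manually
upstream webserver {
    server 172.16.23.128 weight=1;
    # server 172.16.23.131 weight=1;    # temporarily removed from rotation
}

# then, on that node:
nginx -t           # check the edited configuration
nginx -s reload    # graceful reload; in-flight requests are not dropped

Marking the backend with nginx's down parameter ("server 172.16.23.131 weight=1 down;") instead of commenting it out achieves the same effect and makes the intent explicit.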