1. Solving the LVS server single point of failure
If the cluster has only one LVS server distributing packets, its failure brings the entire service down, because no request can reach the Real servers behind it.
We can form a master/backup pair of LVS servers to remove this single point of failure. Two details need to be worked out:
1) How does a backup learn that the master is down?
There are two common approaches: either the backups actively probe the master, or the master periodically broadcasts a heartbeat and a missed heartbeat is taken as failure.
2) When the master is down, which backup takes over?
There are two strategies: the backups can hold a contention-based election, or each backup is assigned a priority in advance and the others yield to the highest-priority survivor.
Keepalived's design uses master broadcast plus priority-based yielding to solve the LVS server single point of failure.
2. Detecting Real server failures
When a Real server goes down, it must be removed from the LVS server's list dynamically, so a monitoring mechanism is required.
1) Checking host state with ping: this only proves that the host's layer-3 network is up; it cannot prove that the service itself is healthy.
2) Each Real server exposes a dedicated page that a third-party program fetches periodically; an HTTP 200 response means the service is healthy.
Keepalived adopts the second approach to check service health accurately.
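Such a third-party check is easy to sketch in shell. The host, port, and /index.html path below are placeholders matching this article's later setup, not a fixed keepalived interface:

```shell
# Minimal health-check sketch: a Real server counts as healthy
# only if its test page answers with HTTP 200.
check_health() {
    # $1 = host, $2 = port; the /index.html path is an assumption
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "http://$1:$2/index.html")
    [ "$code" = "200" ]
}

# Example use:
# check_health 192.168.1.201 80 && echo "real-server-1 healthy"
```

keepalived's HTTP_GET checker, shown later, does essentially this on a timer, with retries.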
3. How Keepalived works
VRRP — Virtual Router Redundancy Protocol
When an LVS server goes down, VRRP floats the VIP over to a backup, restoring service after only a brief interruption.
Keepalived is a user-space program that drives the kernel's LVS (IPVS) interface directly in place of ipvsadm, so installing ipvsadm is not strictly required. ipvsadm is still handy for inspecting LVS state (e.g. ipvsadm -lnc), so installing it is recommended.
yum install keepalived -y
Keepalived is driven entirely by its configuration file; there is no need to configure LVS by hand. Once keepalived starts, the whole LVS configuration is applied automatically.
# Configuration file
vi /etc/keepalived/keepalived.conf

# View the logs
tail /var/log/messages
# on systemd systems, journalctl -u keepalived also works
4. Keepalived experiment
1. System preparation
Prepare four clean virtual machines on the same LAN, subnet 192.168.1.0/24.
The four VMs' IPs are:
LVS server 1:192.168.1.199 (DIP)
LVS server 2:192.168.1.200 (DIP)
Real server 1:192.168.1.201 (RIP)
Real server 2:192.168.1.202 (RIP)
2. Install ipvsadm on both LVS servers (optional)
yum install ipvsadm -y
# If LVS was configured by hand previously, clear that configuration
ipvsadm -C
3. Configure the two Real servers
Adjust kernel parameters so the VIP bound to lo stays hidden:
# Reply only to ARP requests for the address on the receiving interface
# (do not answer on behalf of the VIP bound to lo)
echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_ignore
# Announce only addresses that match the sending interface
echo 2 > /proc/sys/net/ipv4/conf/eth0/arp_announce
# Apply the same settings to all interfaces, including NICs added later
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
Bind the VIP to lo:
ifconfig lo:2 192.168.1.10 netmask 255.255.255.255
Install httpd:
yum install httpd -y
Create a simple index.html page in httpd's document root:
cd /var/www/html
vi index.html
# contents: from real-server-1
Start the httpd service:
systemctl start httpd
4. Install keepalived
Install keepalived on both LVS servers:
yum install keepalived -y
5. Configure keepalived
Back up the configuration file:
cd /etc/keepalived
cp keepalived.conf keepalived.conf.bak
Inspect the default configuration:
[root@lvs-server-2 keepalived]# more keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.200.16
        192.168.200.17
        192.168.200.18
    }
}

virtual_server 192.168.200.100 443 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    persistence_timeout 50
    protocol TCP

    real_server 192.168.201.100 443 {
        weight 1
        SSL_GET {
            url {
              path /
              digest ff20ad2481f97b1754ef3e12ecd3a9cc
            }
            url {
              path /mrtg/
              digest 9b3a0c85a887a256d6939da88aabd8cd
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
As you can see, the file is organized into sections; there are three main ones:
global_defs: global settings, e.g. email notification details so administrators can be alerted when something goes wrong.
vrrp_instance: the VRRP redundancy configuration.
virtual_server: the virtual service configuration (the LVS side of the redundancy, including the backend Real servers).
The vrrp_instance section:
# This is the template
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.200.16
        192.168.200.17
        192.168.200.18
    }
}
Explanation:
1. state is either MASTER or BACKUP. In practice, when the LVS machines have different specs, give the best one MASTER and the rest BACKUP. When the MASTER fails, one BACKUP takes over; once the MASTER is repaired, it preempts and reclaims the master role.
Note: this preempt-on-recovery behavior suits a layer-4 load balancer like LVS, because LVS keeps no application context, so a master/backup switch is cheap. Apart from which machine currently holds the VIP, master and backups are identical: all of them monitor the Real servers and keep the Real server list current in real time, so any of them can serve correctly at any moment.
A layer-7 load balancer like nginx, by contrast, must synchronize a lot of metadata around a switchover. If a repaired master immediately preempted, the backup would have to be locked for metadata synchronization before handover and could not serve traffic in the meantime; the switchover cost is high.
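For cases where preempt-on-recovery is undesirable, keepalived offers a `nopreempt` option so a recovered node does not reclaim the VIP; keepalived only honors it when the instance's initial state is BACKUP. A sketch under those assumptions, not this article's configuration:

```
vrrp_instance VI_1 {
    state BACKUP        # nopreempt requires state BACKUP, even on the preferred node
    nopreempt           # a recovered higher-priority node will not take the VIP back
    interface eth0
    virtual_router_id 51
    priority 100        # still the highest priority in the group
    advert_int 1
}
```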
2. interface selects the NIC used for heartbeat broadcasts; it can share the business NIC (e.g. eth0) or use a dedicated network (e.g. eth1).
3. virtual_router_id identifies one keepalived group; a company may run several keepalived clusters, and this ID keeps them apart.
4. priority is the election priority. Give every machine, master or backup, a distinct value, e.g. 100 on the master and 99, 98, 97, ... on the backups; it drives the yielding election when a backup takes over from the master.
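The yielding election amounts to: among the surviving nodes, the highest priority wins. A toy illustration using the example values above (not keepalived code):

```shell
# Master (priority 100) is down; the surviving backups elect
# the highest remaining priority as the new master.
surviving="99 98 97"
new_master=$(printf '%s\n' $surviving | sort -nr | head -n1)
echo "$new_master"   # prints 99
```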
5. advert_int is the VRRP advertisement interval and authentication protects the instance; the defaults are fine here.
6. virtual_ipaddress: 192.168.1.10/24 dev eth0 label eth0:8
virtual_ipaddress {
    192.168.1.10/24 dev eth0 label eth0:8
}
If you are unsure how to write an option, consult the man page:
man 5 keepalived.conf
Search it for the keyword, e.g. virtual_ipaddress.
The finished configuration:
# Configuration used in this experiment (LVS server 1)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.1.10/24 dev eth0 label eth0:8
    }
}
This section lets Keepalived configure the LVS server's VIP automatically. Only the MASTER actually gets the VIP, since the LVS holding the VIP is the one serving traffic.
The virtual_server section:
virtual_server 192.168.200.100 443 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    persistence_timeout 50
    protocol TCP

    real_server 192.168.201.100 443 {
        weight 1
        SSL_GET {
            url {
              path /
              digest ff20ad2481f97b1754ef3e12ecd3a9cc
            }
            url {
              path /mrtg/
              digest 9b3a0c85a887a256d6939da88aabd8cd
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
Explanation:
The following options configure the in-kernel LVS:
1. virtual_server is followed by the VIP and port; here we use 192.168.1.10 80.
2. lb_algo is the scheduling algorithm; rr is round robin.
3. lb_kind is the forwarding mode; we choose DR.
4. persistence_timeout: suppose a user hits the service repeatedly. With a timeout of 0, each request may land on a different Real server, and every one of those servers may allocate memory and create objects for that user, wasting resources across the cluster. With a longer timeout, LVS remembers which Real server the client was first assigned to and keeps sending that client's requests there while the window lasts. Around 180 s is reasonable in production; we set 0 here so round-robin behavior is visible during testing.
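The effect of persistence_timeout can be illustrated with a toy scheduler. This is plain shell, not LVS itself, and the server names rs1/rs2 are made up:

```shell
# Toy simulation contrasting pure round robin (persistence_timeout 0)
# with persistence (a client sticks to its first Real server).
servers="rs1 rs2"
i=0
sticky=""            # models the persistence entry for one client

schedule() {         # $1 = "rr" or "persist"
    if [ "$1" = "persist" ] && [ -n "$sticky" ]; then
        echo "$sticky"   # within the timeout window: same server again
        return
    fi
    set -- $servers      # plain round robin over the server list
    shift $(( i % $# ))
    sticky=$1
    i=$((i + 1))
    echo "$1"
}

schedule rr          # rs1
schedule rr          # rs2  -- timeout 0: consecutive requests alternate
schedule persist     # rs2  -- with persistence the client now sticks
schedule persist     # rs2
```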
That completes the LVS side; the real_server entries that follow configure the Real servers:
1. real_server is followed by the IP and port of a machine actually serving requests, here 192.168.1.201 80 (httpd's port).
2. weight is the Real server's weight; with weights 1 and 2, the second server receives twice as many connections as the first.
3. The SSL_GET block is the Real server health check; SSL_GET corresponds to https. We use plain http, so change it to HTTP_GET.
4. url names a page that keepalived fetches purely to judge whether the Real server's service is healthy; here we reuse index.html and treat an HTTP 200 response as the pass criterion.
url {
    path /
    status_code 200
}
5. connect_timeout is the health-check connection timeout, nb_get_retry the number of retries, and delay_before_retry the delay before each retry; the defaults are fine.
The finished configuration:
virtual_server 192.168.1.10 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 0
    protocol TCP

    real_server 192.168.1.201 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }

    real_server 192.168.1.202 80 {
        weight 1
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
This configures the in-kernel LVS plus the health checks for both Real servers.
Everything after these three sections in the template file can be deleted.
Configure the other LVS server:
Copy the finished file to the second LVS server and change only two things in the vrrp_instance block: set state to BACKUP and priority to 99.
6. Start keepalived
Start keepalived on the MASTER:
systemctl start keepalived
After it starts, check ifconfig:
[root@lvs-server-1 keepalived]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.199  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 240e:398:c0:ddc0:20c:29ff:fe86:f385  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::20c:29ff:fe86:f385  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:86:f3:85  txqueuelen 1000  (Ethernet)
        RX packets 62538  bytes 65619264 (62.5 MiB)
        RX errors 0  dropped 6988  overruns 0  frame 0
        TX packets 4348  bytes 615699 (601.2 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth0:8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.10  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 00:0c:29:86:f3:85  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
keepalived has created the VIP for us automatically (on the MASTER).
Start keepalived on the BACKUP:
systemctl start keepalived
[root@lvs-server-2 keepalived]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.200  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 240e:398:c0:ddc0:20c:29ff:fea5:7756  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::20c:29ff:fea5:7756  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:a5:77:56  txqueuelen 1000  (Ethernet)
        RX packets 28544  bytes 23854439 (22.7 MiB)
        RX errors 0  dropped 7251  overruns 0  frame 0
        TX packets 3701  bytes 669310 (653.6 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
The BACKUP did not create the VIP after starting keepalived, exactly as expected.
7. Verify load balancing
Run ipvsadm -ln on both LVS servers to inspect the balancer state:
Both show the same result:
[root@lvs-server-1 keepalived]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.1.10:80 rr
  -> 192.168.1.201:80             Route   1      0          1
  -> 192.168.1.202:80             Route   1      0          1
Both the VIP and the Real servers behind it are being monitored correctly.
Now open the VIP in a browser:
The page fails to load. Ping the VIP:
The ping fails as well.
Inspecting keepalived.conf, we find the line vrrp_strict in global_defs (it enforces strict VRRP compliance and installs firewall rules that drop traffic addressed to the VIP). Comment it out on both LVS servers:
global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   # vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}
Ping now succeeds; try the browser again:
This time the page loads correctly.
8. Test failure scenarios
We test four scenarios:
1) the MASTER goes down
2) the MASTER recovers
3) a Real server goes down
4) a Real server recovers
When the MASTER LVS server goes down (simulated by downing its NIC):
Check the NICs on the BACKUP, LVS server 2:
[root@lvs-server-2 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.200  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 240e:398:c0:ddc0:20c:29ff:fea5:7756  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::20c:29ff:fea5:7756  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:a5:77:56  txqueuelen 1000  (Ethernet)
        RX packets 4668  bytes 417590 (407.8 KiB)
        RX errors 0  dropped 1135  overruns 0  frame 0
        TX packets 2830  bytes 292478 (285.6 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth0:8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.100  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 00:0c:29:a5:77:56  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
The BACKUP has automatically acquired the VIP; keepalived is working correctly.
When the MASTER recovers (NIC brought back up):
Check the BACKUP's NICs:
[root@lvs-server-2 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.200  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 240e:398:c0:ddc0:20c:29ff:fea5:7756  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::20c:29ff:fea5:7756  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:a5:77:56  txqueuelen 1000  (Ethernet)
        RX packets 5009  bytes 449806 (439.2 KiB)
        RX errors 0  dropped 1244  overruns 0  frame 0
        TX packets 3127  bytes 315912 (308.5 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
The eth0:8 alias is gone.
Check the recovered MASTER's NICs:
[root@lvs-server-1 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.199  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 240e:398:c0:ddc0:20c:29ff:fe86:f385  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::20c:29ff:fe86:f385  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:86:f3:85  txqueuelen 1000  (Ethernet)
        RX packets 3271  bytes 313970 (306.6 KiB)
        RX errors 0  dropped 1063  overruns 0  frame 0
        TX packets 3090  bytes 274434 (268.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth0:7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.100  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 00:0c:29:86:f3:85  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 30  bytes 2520 (2.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30  bytes 2520 (2.4 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
The eth0:7 alias has been re-created, i.e. the VIP has floated back to the MASTER. keepalived is working correctly.
Now stop the httpd service on Real server 1:
[root@real-server-1 ~]# systemctl stop httpd
Check the balancer state on both LVS servers:
# MASTER
[root@lvs-server-1 ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.1.100:80 rr
  -> 192.168.1.202:80             Route   1      1          0

# BACKUP
[root@lvs-server-2 ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.1.100:80 rr
  -> 192.168.1.202:80             Route   1      1          0
Both LVS servers have removed Real server 1 from their Real server lists; keepalived is working correctly.
Bring httpd on Real server 1 back up:
[root@real-server-1 ~]# systemctl start httpd
Check the balancer state on both LVS servers again:
# MASTER
[root@lvs-server-1 ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.1.100:80 rr
  -> 192.168.1.201:80             Route   1      1          0
  -> 192.168.1.202:80             Route   1      1          0

# BACKUP
[root@lvs-server-2 ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.1.100:80 rr
  -> 192.168.1.201:80             Route   1      0          0
  -> 192.168.1.202:80             Route   1      1          0
Real server 1 has been correctly added back to the list; keepalived is working correctly.