Redis In-Depth Analysis 03: Redis Read/Write Sentinel Mode

Date: 2020-02-16

Tags: redis, in-depth analysis, read/write, sentinel mode

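What does it mean for a system to be unavailable:

What is 99.99% high availability:

High availability is calculated as: the time the system was available over the whole year / the whole year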
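As a rough illustration of the ratio above, the sketch below computes availability and the yearly downtime budget for a target such as 99.99%. The function names are made up for this example, and a 365-day year is assumed.

```python
# Illustrative helpers (names are assumptions, not part of any Redis tooling).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def availability(uptime_seconds: float, total_seconds: float = SECONDS_PER_YEAR) -> float:
    """Availability = time the system was usable / total time in the period."""
    return uptime_seconds / total_seconds


def downtime_budget_minutes(target: float) -> float:
    """Minutes per year the system may be down while still meeting the target."""
    return (1.0 - target) * SECONDS_PER_YEAR / 60.0


if __name__ == "__main__":
    # 99.99% leaves a budget of about 52.56 minutes per year.
    print(round(downtime_budget_minutes(0.9999), 2))
```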

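What does Redis unavailability mean?

High availability for Redis master-slave based on sentinel mode:

Main functions of the sentinel:

(1) Cluster monitoring: monitors whether the Redis master and slave processes are working properly

(2) Notification: if a Redis instance fails, the sentinel sends a message to the administrator as an alert

(3) Failover: if the master node goes down, the master role is automatically moved to a slave node

(4) Configuration center: if a failover happens, clients are notified of the new master address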

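Core facts about the sentinel:

(1) At least 3 sentinel instances are needed so that the sentinel cluster itself is robust

(2) Sentinel + Redis does not guarantee zero data loss; it only guarantees high availability of the Redis cluster

(3) With this sentinel + Redis replication architecture, run disaster-recovery drills as often as possible

Why are at least three sentinels needed?

2 sentinels: majority = 2 (after a Redis server fails, both sentinels must agree before the failover can run)

3 sentinels: majority = 2 (two sentinels agreeing is enough)

4 sentinels: majority = 3 (three sentinels must agree)

5 sentinels: majority = 3 (three sentinels agreeing is enough)

If there are only two sentinels and one of them fails, then even if the other detects that the Redis master is down, it cannot run the failover (both must agree before the failover can run)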
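The majority figures above follow from "more than half". A minimal sketch, with helper names chosen only for this example, shows why two sentinels tolerate no sentinel failure while three tolerate one:

```python
def majority(sentinels: int) -> int:
    """Smallest number of sentinels that is more than half of the group."""
    return sentinels // 2 + 1


def tolerated_failures(sentinels: int) -> int:
    """How many sentinels may be down while a majority can still be reached."""
    return sentinels - majority(sentinels)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} sentinels: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```

Going from 3 to 4 sentinels does not increase the number of tolerated failures, which is one reason odd group sizes are the usual choice.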

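Data loss during a sentinel master-slave switchover:

1) Asynchronous replication:

2) Cluster split-brain

Limiting the data loss caused by asynchronous replication and split-brain (configured in redis.conf):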

min-slaves-to-write 1

min-slaves-max-lag 10

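This requires at least one slave whose replication and sync lag does not exceed 10 seconds

As soon as the replication and sync lag of every slave exceeds 10 seconds, the master stops accepting any write requests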
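The rule behind these two settings can be modeled in a few lines. This is a simplified sketch of the decision only, not Redis source code; the function name and the list-of-lags input are assumptions made for illustration.

```python
def master_accepts_writes(slave_lags_seconds, min_slaves_to_write=1, min_slaves_max_lag=10):
    """Return True if enough slaves are within the allowed lag.

    slave_lags_seconds: seconds since each connected slave last acknowledged the master.
    """
    good_slaves = sum(1 for lag in slave_lags_seconds if lag <= min_slaves_max_lag)
    return good_slaves >= min_slaves_to_write


if __name__ == "__main__":
    print(master_accepts_writes([3, 25]))   # one slave is within 10 seconds
    print(master_accepts_writes([12, 25]))  # every slave lags by more than 10 seconds
```

With the values from the configuration above, a master cut off from all of its slaves refuses writes after about 10 seconds, which bounds how much data can be lost in a split-brain or after an asynchronous-replication failover.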

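How Redis Sentinel works under the hood:

1. sdown and odown

sdown and odown are two failure states

sdown is subjective down: if a single sentinel itself believes a master is down, that is subjective down

odown is objective down: if a quorum of sentinels all believe a master is down, that is objective down

The condition for sdown is simple: if a sentinel pings a master and gets no valid reply for longer than the number of milliseconds set by down-after-milliseconds, it subjectively considers the master down

The condition for moving from sdown to odown is also simple: if a sentinel learns within the specified time that the quorum-specified number of other sentinels also consider that master sdown, the master is considered odown, that is, objectively down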
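The two states can be sketched as two small predicates. This is an illustrative model only; the function names are invented here, and counting the local sentinel's own opinion toward the quorum is an assumption of this sketch.

```python
def is_sdown(ms_since_last_valid_reply: int, down_after_ms: int) -> bool:
    """Subjective down: this sentinel alone has waited too long for a valid reply."""
    return ms_since_last_valid_reply > down_after_ms


def is_odown(local_sdown: bool, other_sentinels_say_down, quorum: int) -> bool:
    """Objective down: enough sentinels agree that the master is down.

    Assumption: the local sentinel's own sdown view counts as one vote.
    """
    if not local_sdown:
        return False
    votes = 1 + sum(1 for says_down in other_sentinels_say_down if says_down)
    return votes >= quorum


if __name__ == "__main__":
    local = is_sdown(ms_since_last_valid_reply=6200, down_after_ms=5000)
    print(local, is_odown(local, [True, False], quorum=2))
```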

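2. Automatic discovery of the sentinel cluster

Sentinels discover each other through Redis pub/sub: every sentinel publishes a message to the __sentinel__:hello channel, and all other sentinels can consume that message and become aware of the other sentinels

Every two seconds, each sentinel publishes a message to the __sentinel__:hello channel of each master+slaves it monitors, containing its own host, IP and run ID along with its monitoring configuration for that master

Each sentinel also listens on the __sentinel__:hello channel of every master+slaves it monitors, and so becomes aware of the other sentinels monitoring the same master+slaves

Each sentinel also exchanges its monitoring configuration for the master with the other sentinels, so that the monitoring configurations stay in sync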
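To make the discovery step more concrete, here is a hedged sketch that parses hello payloads and builds a table of peer sentinels. The eight-field comma-separated layout (sentinel IP, port, run ID, current epoch, then master name, IP, port, config epoch) is how I understand the payload to be laid out and should be treated as an assumption; the class and function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Hello:
    sentinel_ip: str
    sentinel_port: int
    sentinel_run_id: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split one comma-separated hello payload into named fields."""
    fields = payload.split(",")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    s_ip, s_port, run_id, epoch, m_name, m_ip, m_port, m_epoch = fields
    return Hello(s_ip, int(s_port), run_id, int(epoch), m_name, m_ip, int(m_port), int(m_epoch))


def known_sentinels(payloads, my_run_id: str) -> dict:
    """Collect the other sentinels seen on the channel, keyed by run ID."""
    peers = {}
    for payload in payloads:
        hello = parse_hello(payload)
        if hello.sentinel_run_id != my_run_id:
            peers[hello.sentinel_run_id] = (hello.sentinel_ip, hello.sentinel_port)
    return peers
```

A sentinel that feeds every message it receives on the channel through something like `known_sentinels` ends up with the same kind of peer list that later appears in its rewritten configuration file.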

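3. Automatic correction of slave configuration

The sentinel automatically corrects some of the slave configuration. For example, if a slave is to be a potential master candidate, the sentinel makes sure it is replicating data from the current master; if slaves are connected to the wrong master, for example after a failover, the sentinel makes sure they connect to the correct master

The sentinel automatically modifies the Redis configuration file

4. slave->master election algorithm

If a master is considered odown and a majority of sentinels have authorized the switchover, one sentinel performs the switchover, and it first has to elect a slave

It considers the following information about each slave

(1) How long the slave has been disconnected from the master

(2) Slave priority

(3) Replication offset

(4) run id

If a slave has been disconnected from the master for more than 10 times down-after-milliseconds, plus the time the master has been down, the slave is considered unsuitable for election as master

(down-after-milliseconds * 10) + milliseconds_since_master_is_in_SDOWN_state

The remaining slaves are then sorted:

(1) By slave priority: the lower the slave priority value, the higher the rank

(2) If slave priority is equal, by replica offset: the slave that has replicated more data, and so has the larger offset, ranks higher

(3) If both of the above are equal, the slave with the smaller run id is chosen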
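The filter and the three-level ordering described above can be sketched as follows. The `Slave` class and `pick_new_master` function are illustrative names, not Sentinel internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    run_id: str
    priority: int         # slave priority: lower value ranks higher
    repl_offset: int      # replication offset: larger means more data replicated
    disconnected_ms: int  # time since the link to the master was lost


def pick_new_master(slaves: List[Slave], down_after_ms: int, master_sdown_ms: int) -> Optional[Slave]:
    """Drop slaves disconnected for too long, then rank the rest."""
    max_disconnect = down_after_ms * 10 + master_sdown_ms
    candidates = [s for s in slaves if s.disconnected_ms <= max_disconnect]
    if not candidates:
        return None
    # Lower priority first, then larger offset, then smaller run id.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.run_id))
```

For example, with down-after-milliseconds at 5000 and a master that has been sdown for 3000 ms, any slave disconnected for more than 53000 ms is excluded before the ranking is applied.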

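5. quorum and majority

Every time a sentinel wants to perform a switchover, a quorum of sentinels must first consider the master odown; then one sentinel is elected to perform the switchover, and that sentinel must also be authorized by a majority of sentinels before it can actually run it

If quorum < majority, for example 5 sentinels where majority is 3 and quorum is set to 2, then authorization from 3 sentinels is enough to run the switchover

But if quorum >= majority, then quorum sentinels must all authorize it; for example with 5 sentinels and quorum 5, all 5 sentinels must authorize before the switchover can run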
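Both cases reduce to taking the larger of the two thresholds. A minimal sketch, with a function name chosen only for this example:

```python
def required_authorizations(sentinels: int, quorum: int) -> int:
    """Number of sentinels that must authorize a switchover."""
    majority = sentinels // 2 + 1
    return max(quorum, majority)


if __name__ == "__main__":
    print(required_authorizations(5, 2))  # quorum < majority, so the majority (3) applies
    print(required_authorizations(5, 5))  # quorum >= majority, so the quorum (5) applies
```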

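6. configuration epoch

A sentinel monitors a set of Redis master+slaves and holds the corresponding monitoring configuration

The sentinel that performs the switchover obtains a configuration epoch from the new master it is switching to (slave->master). This is a version number, and the version number of every switchover must be unique

If the first elected sentinel fails to complete the switchover, the other sentinels wait for failover-timeout and then take over and continue the switchover, obtaining a new configuration epoch as the new version number

7. Configuration propagation

After a sentinel completes the switchover, it updates its local copy with the latest master configuration and then syncs it to the other sentinels through the pub/sub messaging described earlier

This is where the version number matters: all messages are published and received through a single channel, so when a sentinel completes a new switchover, the new master configuration carries the new version number

The other sentinels update their own master configuration based on which version number is larger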
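The "larger version wins" rule can be written as a tiny merge function. This is a sketch of the idea with invented names (`MasterConfig`, `merge`), not the actual Sentinel data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterConfig:
    host: str
    port: int
    config_epoch: int  # the version number attached to this configuration


def merge(local: MasterConfig, announced: MasterConfig) -> MasterConfig:
    """Adopt an announced configuration only if its version is newer than ours."""
    return announced if announced.config_epoch > local.config_epoch else local


if __name__ == "__main__":
    before = MasterConfig("127.0.0.1", 16379, 0)
    after_failover = MasterConfig("127.0.0.1", 26379, 1)
    print(merge(before, after_failover))  # the epoch-1 configuration replaces epoch 0
```

Because stale announcements carry a smaller or equal version number, a sentinel that was partitioned during the switchover cannot overwrite the newer configuration when it reconnects.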

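Configuring sentinel mode (master-slave state monitoring)

1. Setting up Redis Sentinel

1.1. Redis Sentinel deployment notes

1. A robust Redis Sentinel cluster should use at least three Sentinel instances, placed on different machines or even in different physical regions.

2. Sentinel cannot guarantee strong consistency.

3. Common client libraries support Sentinel.

4. Sentinel needs continuous testing and observation to ensure high availability.

1.2. Redis Sentinel configuration file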

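# Port the sentinel instance runs on, default 26379
port 26379
# Working directory of the sentinel
dir ./
# The Redis master monitored by the sentinel
## ip: IP address of the master
## redis-port: port of the master
## master-name: a name of your choice for the master (only letters A-z, digits 0-9 and the three characters ".-_" are allowed)
## quorum: once this many sentinels consider the master unreachable, it is objectively considered unreachable
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
sentinel monitor mymaster 127.0.0.1 6379 2
# When requirepass <foobared> is enabled on the Redis instance, every client connecting to it must provide the password.
# sentinel auth-pass <master-name> <password>
sentinel auth-pass mymaster 123456
# Maximum time the master may go without answering the sentinel; beyond it the sentinel subjectively considers the master down. Default 30 seconds
# sentinel down-after-milliseconds <master-name> <milliseconds>
sentinel down-after-milliseconds mymaster 30000
# Maximum number of slaves that may sync with the new master at the same time during a failover. The smaller the number, the longer the failover takes to complete; the larger the number, the more slaves are unavailable at once because of replication. Setting it to 1 guarantees that only one slave at a time is unable to serve commands.
# sentinel parallel-syncs <master-name> <numslaves>
sentinel parallel-syncs mymaster 1
# Failover timeout (failover-timeout), default three minutes. It is used in the following ways:
## 1. The interval between two failovers of the same master by the same sentinel.
## 2. The time counted from when a slave starts syncing from a wrong master until it is corrected to sync from the right master.
## 3. The time needed to cancel a failover that is in progress.
## 4. The maximum time during a failover for reconfiguring all slaves to point to the new master. Even after this timeout the slaves are still reconfigured to point to the master, but no longer according to the parallel-syncs rule.
# sentinel failover-timeout <master-name> <milliseconds>
sentinel failover-timeout mymaster 180000
# This script is called whenever a warning-level event occurs on the sentinel (for example subjective or objective failure of a Redis instance). A script may run for at most 60 seconds; beyond that it is terminated with SIGKILL and then executed again.
# Rules for the script's exit status:
## 1. If the script exits with 1, it is executed again later; the retry count currently defaults to 10.
## 2. If the script exits with 2 or a higher value, it is not retried.
## 3. If the script is terminated by a signal while running, the behavior is the same as for exit status 1.
# sentinel notification-script <master-name> <script-path>
sentinel notification-script mymaster /var/redis/notify.sh
# This script should be generic and safe to call multiple times, not tied to one specific case.
# sentinel client-reconfig-script <master-name> <script-path>
sentinel client-reconfig-script mymaster /var/redis/reconfig.sh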

1.3. Redis Sentinel node plan

Role	IP address	Port
Redis Master	10.206.20.231	16379
Redis Slave1	10.206.20.231	26379
Redis Slave2	10.206.20.231	36379
Redis Sentinel1	10.206.20.231	16380
Redis Sentinel2	10.206.20.231	26380
Redis Sentinel3	10.206.20.231	36380

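1.4. Redis Sentinel configuration and setup

Setting up the redis-server cluster:

1.4.1. Redis-Server configuration management

Copy redis.conf three times into the /usr/local/redis-sentinel directory. The three configuration files are the startup configurations of the three Redis nodes: master, slave1 and slave2.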

$ sudo cp /usr/local/redis-4.0.11/redis.conf /usr/local/redis-sentinel/redis-16379.conf

$ sudo cp /usr/local/redis-4.0.11/redis.conf /usr/local/redis-sentinel/redis-26379.conf

$ sudo cp /usr/local/redis-4.0.11/redis.conf /usr/local/redis-sentinel/redis-36379.conf

Edit the three configuration files as follows:

Master: redis-16379.conf

daemonize yes
pidfile /var/run/redis-16379.pid
logfile /var/log/redis/redis-16379.log
port 16379
bind 0.0.0.0
timeout 300
databases 16
dbfilename dump-16379.db
dir ./redis-workdir
masterauth 123456
requirepass 123456

Slave 1: redis-26379.conf

daemonize yes
pidfile /var/run/redis-26379.pid
logfile /var/log/redis/redis-26379.log
port 26379
bind 0.0.0.0
timeout 300
databases 16
dbfilename dump-26379.db
dir ./redis-workdir
masterauth 123456
requirepass 123456
slaveof 127.0.0.1 16379

Slave 2: redis-36379.conf

daemonize yes
pidfile /var/run/redis-36379.pid
logfile /var/log/redis/redis-36379.log
port 36379
bind 0.0.0.0
timeout 300
databases 16
dbfilename dump-36379.db
dir ./redis-workdir
masterauth 123456
requirepass 123456
slaveof 127.0.0.1 16379

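If you want automatic failover, set masterauth in every redis.conf. Automatic failover only rewrites the master-slave relationship, that is slaveof, and does not write masterauth automatically. If Redis has no password set, this can be ignored.

1.4.2. Redis-Server startup verification

Start the three Redis nodes 16379, 26379 and 36379 in order. The startup commands and startup logs are as follows:

Redis startup commands: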

$ sudo redis-server /usr/local/redis-sentinel/redis-16379.conf

$ sudo redis-server /usr/local/redis-sentinel/redis-26379.conf

$ sudo redis-server /usr/local/redis-sentinel/redis-36379.conf

Check the running Redis processes:

$ ps -ef | grep redis-server

0 7127 1 0 2:16下午 ?? 0:01.84 redis-server 0.0.0.0:16379

0 7133 1 0 2:16下午 ?? 0:01.73 redis-server 0.0.0.0:26379

0 7137 1 0 2:16下午 ?? 0:01.70 redis-server 0.0.0.0:36379

Check the Redis startup logs:

Node redis-16379

$ cat /var/log/redis/redis-16379.log 
C 22 Aug 14:16:38.907 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
C 22 Aug 14:16:38.908 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7126, just started
C 22 Aug 14:16:38.908 # Configuration loaded
M 22 Aug 14:16:38.910 * Increased maximum number of open files to 10032 (it was originally set to 256).
M 22 Aug 14:16:38.912 * Running mode=standalone, port=16379.
M 22 Aug 14:16:38.913 # Server initialized
M 22 Aug 14:16:38.913 * Ready to accept connections
M 22 Aug 14:16:48.416 * Slave 127.0.0.1:26379 asks for synchronization
M 22 Aug 14:16:48.416 * Full resync requested by slave 127.0.0.1:26379
M 22 Aug 14:16:48.416 * Starting BGSAVE for SYNC with target: disk
M 22 Aug 14:16:48.416 * Background saving started by pid 7134
C 22 Aug 14:16:48.433 * DB saved on disk
M 22 Aug 14:16:48.487 * Background saving terminated with success
M 22 Aug 14:16:48.494 * Synchronization with slave 127.0.0.1:26379 succeeded
M 22 Aug 14:16:51.848 * Slave 127.0.0.1:36379 asks for synchronization
M 22 Aug 14:16:51.849 * Full resync requested by slave 127.0.0.1:36379
M 22 Aug 14:16:51.849 * Starting BGSAVE for SYNC with target: disk
M 22 Aug 14:16:51.850 * Background saving started by pid 7138
C 22 Aug 14:16:51.862 * DB saved on disk
M 22 Aug 14:16:51.919 * Background saving terminated with success
M 22 Aug 14:16:51.923 * Synchronization with slave 127.0.0.1:36379 succeeded

The following two log lines show that redis-16379 is the Redis master, and that redis-26379 and redis-36379 are slaves syncing data from it.

7127:M 22 Aug 14:16:48.416 * Slave 127.0.0.1:26379 asks for synchronization

7127:M 22 Aug 14:16:51.848 * Slave 127.0.0.1:36379 asks for synchronization

Node redis-26379

$ cat /var/log/redis/redis-26379.log 
C 22 Aug 14:16:48.407 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
C 22 Aug 14:16:48.408 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7132, just started
C 22 Aug 14:16:48.408 # Configuration loaded
S 22 Aug 14:16:48.410 * Increased maximum number of open files to 10032 (it was originally set to 256).
S 22 Aug 14:16:48.412 * Running mode=standalone, port=26379.
S 22 Aug 14:16:48.413 # Server initialized
S 22 Aug 14:16:48.413 * Ready to accept connections
S 22 Aug 14:16:48.413 * Connecting to MASTER 127.0.0.1:16379
S 22 Aug 14:16:48.413 * MASTER <-> SLAVE sync started
S 22 Aug 14:16:48.414 * Non blocking connect for SYNC fired the event.
S 22 Aug 14:16:48.414 * Master replied to PING, replication can continue...
S 22 Aug 14:16:48.415 * Partial resynchronization not possible (no cached master)
S 22 Aug 14:16:48.417 * Full resync from master: 211d3b4eceaa3af4fe5c77d22adf06e1218e0e7b:0
S 22 Aug 14:16:48.494 * MASTER <-> SLAVE sync: receiving 176 bytes from master
S 22 Aug 14:16:48.495 * MASTER <-> SLAVE sync: Flushing old data
S 22 Aug 14:16:48.496 * MASTER <-> SLAVE sync: Loading DB in memory
S 22 Aug 14:16:48.498 * MASTER <-> SLAVE sync: Finished with success

Node redis-36379

$ cat /var/log/redis/redis-36379.log 
C 22 Aug 14:16:51.839 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
C 22 Aug 14:16:51.840 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7136, just started
C 22 Aug 14:16:51.841 # Configuration loaded
S 22 Aug 14:16:51.843 * Increased maximum number of open files to 10032 (it was originally set to 256).
S 22 Aug 14:16:51.845 * Running mode=standalone, port=36379.
S 22 Aug 14:16:51.845 # Server initialized
S 22 Aug 14:16:51.846 * Ready to accept connections
S 22 Aug 14:16:51.846 * Connecting to MASTER 127.0.0.1:16379
S 22 Aug 14:16:51.847 * MASTER <-> SLAVE sync started
S 22 Aug 14:16:51.847 * Non blocking connect for SYNC fired the event.
S 22 Aug 14:16:51.847 * Master replied to PING, replication can continue...
S 22 Aug 14:16:51.848 * Partial resynchronization not possible (no cached master)
S 22 Aug 14:16:51.850 * Full resync from master: 211d3b4eceaa3af4fe5c77d22adf06e1218e0e7b:14
S 22 Aug 14:16:51.923 * MASTER <-> SLAVE sync: receiving 176 bytes from master
S 22 Aug 14:16:51.923 * MASTER <-> SLAVE sync: Flushing old data
S 22 Aug 14:16:51.924 * MASTER <-> SLAVE sync: Loading DB in memory
S 22 Aug 14:16:51.927 * MASTER <-> SLAVE sync: Finished with success

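Configuring Sentinel:

1.4.3. Sentinel configuration management

Copy sentinel.conf three times into the /usr/local/redis-sentinel directory. The three configuration files are the configurations of the three Sentinel nodes that watch over the master, slave1 and slave2 Redis nodes.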

$ sudo cp /usr/local/redis-4.0.11/sentinel.conf /usr/local/redis-sentinel/sentinel-16380.conf

$ sudo cp /usr/local/redis-4.0.11/sentinel.conf /usr/local/redis-sentinel/sentinel-26380.conf

$ sudo cp /usr/local/redis-4.0.11/sentinel.conf /usr/local/redis-sentinel/sentinel-36380.conf

Node 1: sentinel-16380.conf

protected-mode no
bind 0.0.0.0
port 16380
daemonize yes
sentinel monitor master 127.0.0.1 16379 2
sentinel down-after-milliseconds master 5000
sentinel failover-timeout master 180000
sentinel parallel-syncs master 1
sentinel auth-pass master 123456
logfile /var/log/redis/sentinel-16380.log

Node 2: sentinel-26380.conf

protected-mode no
bind 0.0.0.0
port 26380
daemonize yes
sentinel monitor master 127.0.0.1 16379 2
sentinel down-after-milliseconds master 5000
sentinel failover-timeout master 180000
sentinel parallel-syncs master 1
sentinel auth-pass master 123456
logfile /var/log/redis/sentinel-26380.log

Node 3: sentinel-36380.conf

protected-mode no
bind 0.0.0.0
port 36380
daemonize yes
sentinel monitor master 127.0.0.1 16379 2
sentinel down-after-milliseconds master 5000
sentinel failover-timeout master 180000
sentinel parallel-syncs master 1
sentinel auth-pass master 123456
logfile /var/log/redis/sentinel-36380.log
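
The three Sentinel files above differ only in `port` and `logfile`, so they could be rendered from one template instead of being edited by hand. The sketch below only builds the file contents as strings; the template and function names are made up for this example, and writing the files to /usr/local/redis-sentinel is left out.

```python
# Template mirroring the three Sentinel files above; only the port varies.
SENTINEL_TEMPLATE = """\
protected-mode no
bind 0.0.0.0
port {port}
daemonize yes
sentinel monitor master 127.0.0.1 16379 2
sentinel down-after-milliseconds master 5000
sentinel failover-timeout master 180000
sentinel parallel-syncs master 1
sentinel auth-pass master 123456
logfile /var/log/redis/sentinel-{port}.log
"""


def render_sentinel_conf(port: int) -> str:
    """Return the configuration text for one Sentinel instance."""
    return SENTINEL_TEMPLATE.format(port=port)


def render_all(ports=(16380, 26380, 36380)) -> dict:
    """Map each file name to its rendered configuration text."""
    return {f"sentinel-{port}.conf": render_sentinel_conf(port) for port in ports}


if __name__ == "__main__":
    for name, text in render_all().items():
        print(f"# {name}\n{text}")
```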

1.4.4. Sentinel startup verification

Start the three Sentinel nodes 16380, 26380 and 36380 in order. The startup commands and startup logs are as follows:

$ sudo redis-sentinel /usr/local/redis-sentinel/sentinel-16380.conf

$ sudo redis-sentinel /usr/local/redis-sentinel/sentinel-26380.conf

$ sudo redis-sentinel /usr/local/redis-sentinel/sentinel-36380.conf

Check the running Sentinel processes:

$ ps -ef | grep redis-sentinel

0 7954 1 0 3:30下午 ?? 0:00.05 redis-sentinel 0.0.0.0:16380 [sentinel]

0 7957 1 0 3:30下午 ?? 0:00.05 redis-sentinel 0.0.0.0:26380 [sentinel]

0 7960 1 0 3:30下午 ?? 0:00.04 redis-sentinel 0.0.0.0:36380 [sentinel]

Check the Sentinel startup logs:

Node sentinel-16380

$ cat /var/log/redis/sentinel-16380.log 
X 22 Aug 15:30:27.245 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
X 22 Aug 15:30:27.245 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7953, just started
X 22 Aug 15:30:27.245 # Configuration loaded
X 22 Aug 15:30:27.247 * Increased maximum number of open files to 10032 (it was originally set to 256).
X 22 Aug 15:30:27.249 * Running mode=sentinel, port=16380.
X 22 Aug 15:30:27.250 # Sentinel ID is 69d05b86a82102a8919231fd3c2d1f21ce86e000
X 22 Aug 15:30:27.250 # +monitor master master 127.0.0.1 16379 quorum 2
X 22 Aug 15:30:32.286 # +sdown sentinel fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 127.0.0.1 36380 @ master 127.0.0.1 16379
X 22 Aug 15:30:34.588 # -sdown sentinel fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 127.0.0.1 36380 @ master 127.0.0.1 16379

The Sentinel ID of node sentinel-16380 is 69d05b86a82102a8919231fd3c2d1f21ce86e000, and the node joins the sentinel cluster under this Sentinel ID.

Node sentinel-26380

$ cat /var/log/redis/sentinel-26380.log 
X 22 Aug 15:30:30.900 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
X 22 Aug 15:30:30.901 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7956, just started
X 22 Aug 15:30:30.901 # Configuration loaded
X 22 Aug 15:30:30.904 * Increased maximum number of open files to 10032 (it was originally set to 256).
X 22 Aug 15:30:30.905 * Running mode=sentinel, port=26380.
X 22 Aug 15:30:30.906 # Sentinel ID is 21e30244cda6a3d3f55200bcd904d0877574e506
X 22 Aug 15:30:30.906 # +monitor master master 127.0.0.1 16379 quorum 2
X 22 Aug 15:30:30.907 * +slave slave 127.0.0.1:26379 127.0.0.1 26379 @ master 127.0.0.1 16379
X 22 Aug 15:30:30.911 * +slave slave 127.0.0.1:36379 127.0.0.1 36379 @ master 127.0.0.1 16379
X 22 Aug 15:30:36.311 * +sentinel sentinel fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 127.0.0.1 36380 @ master 127.0.0.1 16379

The Sentinel ID of node sentinel-26380 is 21e30244cda6a3d3f55200bcd904d0877574e506, and the node joins the sentinel cluster under this Sentinel ID. At this point the sentinel cluster contains two nodes, sentinel-16380 and sentinel-26380.

Node sentinel-36380

$ cat /var/log/redis/sentinel-36380.log 
X 22 Aug 15:30:34.273 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
X 22 Aug 15:30:34.274 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=7959, just started
X 22 Aug 15:30:34.274 # Configuration loaded
X 22 Aug 15:30:34.276 * Increased maximum number of open files to 10032 (it was originally set to 256).
X 22 Aug 15:30:34.277 * Running mode=sentinel, port=36380.
X 22 Aug 15:30:34.278 # Sentinel ID is fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7
X 22 Aug 15:30:34.278 # +monitor master master 127.0.0.1 16379 quorum 2
X 22 Aug 15:30:34.279 * +slave slave 127.0.0.1:26379 127.0.0.1 26379 @ master 127.0.0.1 16379
X 22 Aug 15:30:34.283 * +slave slave 127.0.0.1:36379 127.0.0.1 36379 @ master 127.0.0.1 16379
X 22 Aug 15:30:34.993 * +sentinel sentinel 21e30244cda6a3d3f55200bcd904d0877574e506 127.0.0.1 26380 @ master 127.0.0.1 16379

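The Sentinel ID of node sentinel-36380 is fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7, and the node joins the sentinel cluster under this Sentinel ID. At this point the sentinel cluster contains three nodes: sentinel-16380, sentinel-26380 and sentinel-36380.

1.4.5. Sentinel configuration refresh (note that Sentinel rewrites its configuration file automatically)

The following configuration items are newly generated in sentinel-16380.conf: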

# Generated by CONFIG REWRITE
dir "/usr/local/redis-sentinel"
sentinel config-epoch master 0
sentinel leader-epoch master 0
sentinel known-slave master 127.0.0.1 36379
sentinel known-slave master 127.0.0.1 26379
sentinel known-sentinel master 127.0.0.1 26380 21e30244cda6a3d3f55200bcd904d0877574e506
sentinel known-sentinel master 127.0.0.1 36380 fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7
sentinel current-epoch 0

Note that sentinel-16380.conf was rewritten to include all slaves of the Redis master, redis-26379 and redis-36379, as well as the IP address, port and Sentinel ID of the other two Sentinel nodes, sentinel-26380 and sentinel-36380.

# Generated by CONFIG REWRITE
dir "/usr/local/redis-sentinel"
sentinel config-epoch master 0
sentinel leader-epoch master 0
sentinel known-slave master 127.0.0.1 26379
sentinel known-slave master 127.0.0.1 36379
sentinel known-sentinel master 127.0.0.1 36380 fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7
sentinel known-sentinel master 127.0.0.1 16380 69d05b86a82102a8919231fd3c2d1f21ce86e000
sentinel current-epoch 0

Note that sentinel-26380.conf was rewritten to include all slaves of the Redis master, redis-26379 and redis-36379, as well as the IP address, port and Sentinel ID of the other two Sentinel nodes, sentinel-36380 and sentinel-16380.

# Generated by CONFIG REWRITE
dir "/usr/local/redis-sentinel"
sentinel config-epoch master 0
sentinel leader-epoch master 0
sentinel known-slave master 127.0.0.1 36379
sentinel known-slave master 127.0.0.1 26379
sentinel known-sentinel master 127.0.0.1 16380 69d05b86a82102a8919231fd3c2d1f21ce86e000
sentinel known-sentinel master 127.0.0.1 26380 21e30244cda6a3d3f55200bcd904d0877574e506
sentinel current-epoch 0

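Note that sentinel-36380.conf was rewritten to include all slaves of the Redis master, redis-26379 and redis-36379, as well as the IP address, port and Sentinel ID of the other two Sentinel nodes, sentinel-16380 and sentinel-26380.

1.5. Sentinel client commands

Check the state of a Sentinel node; a PONG reply means it is healthy.

> PING

Show all monitored masters and their state.

> SENTINEL masters

Show the information and state of the specified master.

> SENTINEL master <master_name>

Show all slaves of the specified master and their state.

> SENTINEL slaves <master_name>

Return the IP address and port of the specified master. If a failover is in progress or has completed, the IP address and port of the slave promoted to master are shown.

> SENTINEL get-master-addr-by-name <master_name>

Reset the state of every master whose name matches the given (glob-style) pattern, clearing its previous state as well as its slave information.

> SENTINEL reset <pattern>

Force the current Sentinel node to run a failover without needing agreement from the other Sentinel nodes. After the failover, the latest configuration is still sent to the other Sentinel nodes.

> SENTINEL failover <master_name>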

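2. Redis Sentinel failover and recovery

2.1. Tracking with the Redis CLI client

The logs above show that redis-16379 is the master and that its process ID is 7127. To simulate a master failure, kill this process forcibly.

$ kill -9 7127

Use the redis-cli client to connect to node sentinel-16380 and check the state of the Redis nodes.

$ redis-cli -p 16380

Check the master of the Redis master-slave cluster. redis-26379 has been promoted to the new master.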

127.0.0.1:16380> SENTINEL master master
 1) "name"
 2) "master"
 3) "ip"
 4) "127.0.0.1"
 5) "port"
 6) "26379"
 7) "runid"
 8) "b8ca3b468a95d1be5efe1f50c50636cafe48c59f"
 9) "flags"
10) "master"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "588"
19) "last-ping-reply"
20) "588"
21) "down-after-milliseconds"
22) "5000"
23) "info-refresh"
24) "9913"
25) "role-reported"
26) "master"
27) "role-reported-time"
28) "663171"
29) "config-epoch"
30) "1"
31) "num-slaves"
32) "2"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
37) "failover-timeout"
38) "180000"
39) "parallel-syncs"
40) "1"
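
The reply above is a flat list of alternating field names and values. A client that receives it in that raw form could fold it into a dictionary as sketched below; `pairs_to_dict` is a name invented for this example, and many client libraries already perform this conversion themselves.

```python
def pairs_to_dict(flat_reply):
    """Turn [k1, v1, k2, v2, ...] into {k1: v1, k2: v2, ...}."""
    if len(flat_reply) % 2 != 0:
        raise ValueError("expected an even number of elements")
    return dict(zip(flat_reply[0::2], flat_reply[1::2]))


if __name__ == "__main__":
    # A shortened copy of the reply shown above.
    reply = ["name", "master", "ip", "127.0.0.1", "port", "26379",
             "flags", "master", "num-slaves", "2", "quorum", "2"]
    info = pairs_to_dict(reply)
    print(info["ip"], info["port"], info["num-slaves"])
```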

2.2. Tracking the Redis Sentinel logs

The log of any Sentinel node looks like this:

X 22 Aug 18:40:22.504 # +tilt #tilt mode entered
X 22 Aug 18:40:32.197 # +tilt #tilt mode entered
X 22 Aug 18:41:02.241 # -tilt #tilt mode exited
X 22 Aug 18:48:24.550 # +sdown master master 127.0.0.1 16379
X 22 Aug 18:48:24.647 # +new-epoch 1
X 22 Aug 18:48:24.651 # +vote-for-leader fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 1
X 22 Aug 18:48:25.678 # +odown master master 127.0.0.1 16379 #quorum 3/2
X 22 Aug 18:48:25.678 # Next failover delay: I will not start a failover before Wed Aug 22 18:54:24 2018
X 22 Aug 18:48:25.709 # +config-update-from sentinel fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 127.0.0.1 36380 @ master 127.0.0.1 16379
X 22 Aug 18:48:25.710 # +switch-master master 127.0.0.1 16379 127.0.0.1 26379
X 22 Aug 18:48:25.710 * +slave slave 127.0.0.1:36379 127.0.0.1 36379 @ master 127.0.0.1 26379
X 22 Aug 18:48:25.711 * +slave slave 127.0.0.1:16379 127.0.0.1 16379 @ master 127.0.0.1 26379
X 22 Aug 18:48:30.738 # +sdown slave 127.0.0.1:16379 127.0.0.1 16379 @ master 127.0.0.1 26379
X 22 Aug 19:38:23.479 # -sdown slave 127.0.0.1:16379 127.0.0.1 16379 @ master 127.0.0.1 26379

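Analyzing the log, node redis-16379 first enters the sdown (subjectively down) state.

+sdown master master 127.0.0.1 16379

The sentinel detects that redis-16379 has failed, and Sentinel enters a new epoch, going from 0 to 1.

+new-epoch 1

The three Sentinel nodes start negotiating the state of the master to decide whether it should be marked objectively down; this Sentinel also votes for the Sentinel that will lead the failover.

+vote-for-leader fd166dc66425dc1d9e2670e1f17cb94fe05f5fc7 1

Once at least quorum Sentinel nodes consider the master failed, node redis-16379 enters the odown (objectively down) state.

+odown master master 127.0.0.1 16379 #quorum 3/2

Sentinel performs the automatic failover and, by agreement, selects node redis-26379 as the new master.

+switch-master master 127.0.0.1 16379 127.0.0.1 26379

Node redis-36379 and the objectively down node redis-16379 become slaves of redis-26379.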

7954:X 22 Aug 18:48:25.710 * +slave slave 127.0.0.1:36379 127.0.0.1 36379 @ master 127.0.0.1 26379

7954:X 22 Aug 18:48:25.711 * +slave slave 127.0.0.1:16379 127.0.0.1 16379 @ master 127.0.0.1 26379
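
The events walked through above can also be extracted from the log mechanically. The sketch below assumes the line layout shown in this article (a `#` or `*` marker followed by an event name that starts with `+` or `-`); the regular expression and function names are illustrative.

```python
import re

# Marker, then an event such as +sdown or -tilt, then its space-separated arguments.
EVENT_RE = re.compile(r"[#*] (?P<event>[+-][a-z-]+) ?(?P<args>.*)$")


def parse_event(line: str):
    """Return (event, [args]) for a Sentinel event line, or None for other lines."""
    match = EVENT_RE.search(line.strip())
    if match is None:
        return None
    return match.group("event"), match.group("args").split()


def new_master_from(line: str):
    """Return (ip, port) of the promoted node if the line is a +switch-master event."""
    parsed = parse_event(line)
    if parsed is None or parsed[0] != "+switch-master":
        return None
    _name, _old_ip, _old_port, new_ip, new_port = parsed[1]
    return new_ip, int(new_port)


if __name__ == "__main__":
    line = "X 22 Aug 18:48:25.710 # +switch-master master 127.0.0.1 16379 127.0.0.1 26379"
    print(new_master_from(line))
```

A monitoring job could tail the Sentinel log and react to `+odown` or `+switch-master` this way, although the notification-script hook described earlier is the mechanism Sentinel itself provides for alerts.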

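2.3. Redis configuration files (rewritten automatically)

Look at the configuration files of the three Redis nodes. When a master-slave switchover happens, the redis.conf files are rewritten automatically.

Node redis-16379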

daemonize yes
pidfile "/var/run/redis-16379.pid"
logfile "/var/log/redis/redis-16379.log"
port 16379
bind 0.0.0.0
timeout 300
databases 16
dbfilename "dump-16379.db"
dir "/usr/local/redis-sentinel/redis-workdir"
masterauth "123456"
requirepass "123456"

Node redis-26379

daemonize yes
pidfile "/var/run/redis-26379.pid"
logfile "/var/log/redis/redis-26379.log"
port 26379
bind 0.0.0.0
timeout 300
databases 16
dbfilename "dump-26379.db"
dir "/usr/local/redis-sentinel/redis-workdir"
masterauth "123456"
requirepass "123456"

Node redis-36379

daemonize yes
pidfile "/var/run/redis-36379.pid"
logfile "/var/log/redis/redis-36379.log"
port 36379
bind 0.0.0.0
timeout 300
databases 16
dbfilename "dump-36379.db"
dir "/usr/local/redis-sentinel/redis-workdir"
masterauth "123456"
requirepass "123456"
slaveof 127.0.0.1 26379

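Analysis: the slaveof line of node redis-26379 was removed and the node was promoted to master. Node redis-16379 is down. The slaveof setting of redis-36379 was updated to 127.0.0.1 26379, making it a slave of redis-26379.

Restart node redis-16379. Once it is up, look at its redis.conf file again; the configuration is as follows: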

daemonize yes
pidfile "/var/run/redis-16379.pid"
logfile "/var/log/redis/redis-16379.log"
port 16379
bind 0.0.0.0
timeout 300
databases 16
dbfilename "dump-16379.db"
dir "/usr/local/redis-sentinel/redis-workdir"
masterauth "123456"
requirepass "123456"
# Generated by CONFIG REWRITE
slaveof 127.0.0.1 26379

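The configuration file of node redis-16379 gains a slaveof line pointing to redis-26379, so the node becomes a slave of the new master.

Summary

This article first described the modes Redis offers for high availability and pointed out the shortcomings of plain Redis master-slave replication. It then introduced the concepts behind Redis Sentinel and explained in detail its functions, basic principles, high-availability setup and automatic failover verification.

Of course, Redis Sentinel only solves the high-availability problem. Problems such as all writes going to a single master and a single node that cannot scale out still require Redis Cluster.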

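3. Sentinel node management

3.1 Adding and removing sentinel nodes

Adding a sentinel: it is discovered automatically

Steps for removing a sentinel

(1) Stop the sentinel process

(2) Run SENTINEL RESET * on all sentinels to clear all master state

(3) Run SENTINEL MASTER mastername on all sentinels and check whether they all agree on the number of sentinels

3.2 Taking a slave offline permanently

To have a slave that has gone offline removed from the master's slave list, run SENTINEL RESET mastername on all sentinels

3.3 Priority for promoting a slave to master

slave->master election priority: slave-priority; the smaller the value, the higher the priority

3.4 Authentication in a sentinel cluster architecture

Any slave may be switched to master, so every instance must be configured with both directives

requirepass: enables authentication on the master

masterauth: the password used to connect to the master

On the sentinel: sentinel auth-pass <master-group-name> <pass>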

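3.5 Disaster-recovery drill

Check the current master through a sentinel: SENTINEL get-master-addr-by-name mymaster

Kill the master node with kill -9 and delete its pid file as well

Check the sentinel logs for +sdown, which means the master's failure was detected; then for +odown, which means the quorum-specified number of sentinels all consider the master down

(1) All three sentinel processes consider the master sdown

(2) Once the quorum-specified number of sentinel processes consider it sdown, it becomes odown

(3) Sentinel 1 is elected as the sentinel that performs the subsequent switchover

(4) Sentinel 1 obtains a new config version from the new master (the slave)

(5) It attempts the failover

(6) A slave is elected by vote to be switched to master; every sentinel casts one vote

(7) The slave is sent SLAVEOF NO ONE so that it is no longer a slave of any node; the slave is promoted to master; the old master is no longer regarded as master

(8) The sentinels automatically regard the former 187:6379 as a slave and 19:6379 as the master

(9) The sentinels probe the state of slave 187:6379 and consider it sdown

All the sentinels elect one of themselves to perform the switchover

If a majority of the sentinels are alive, the switchover is performed

Check the master through a sentinel again: SENTINEL get-master-addr-by-name mymaster

Try connecting to the new master

For recovery, restart the old master and check whether the sentinels automatically turn it into a slave

(1) Kill the master manually

(2) Can the sentinels perform the switchover and switch a slave to master

(3) After the switchover completes, is the new master usable

(4) For recovery, restart the old master

(5) Can the sentinels automatically turn the old master into a slave attached to the new master, and is it usable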

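Bottleneck of a master-slave architecture with a single master:

The amount of data that can be stored is limited by the master

How to scale masters horizontally:

We can run several Redis master-slave clusters and split traffic by business, so that different requests go to different masters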
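One simple way to picture this kind of traffic splitting is a client-side router that maps each key to one of several independent master-slave groups. The sketch below is an illustration under assumptions: the group names are hypothetical, and CRC32 modulo the group count is just one possible routing rule, not the slot scheme Redis Cluster uses.

```python
import zlib

# Hypothetical names for three independent master-slave groups.
MASTER_GROUPS = ["orders-master", "users-master", "cart-master"]


def master_for(key: str, groups=MASTER_GROUPS) -> str:
    """Pick a master group for a key using a stable hash of the key."""
    return groups[zlib.crc32(key.encode("utf-8")) % len(groups)]


if __name__ == "__main__":
    for key in ("order:1001", "user:42", "cart:7"):
        print(key, "->", master_for(key))
```

The same key always lands on the same group, but changing the number of groups remaps most keys, which is one of the operational problems a managed scheme has to address.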

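redis cluster provides this for us:

It can support N Redis master nodes, and each master node can have multiple slave nodes attached

redis cluster (multiple masters + read/write splitting + high availability)

redis cluster is meant to support massive amounts of data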