This article describes how to configure replication and failover for PostgreSQL 12 using the open-source repmgr tool.
Install PostgreSQL 12 and the repmgr package on all nodes.
[root@hwd04 ~]# dnf -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
[root@hwd04 ~]# dnf -qy module disable postgresql
[root@hwd04 ~]# dnf install postgresql12-server postgresql12-contrib repmgr12
[root@hwd04 ~]# /usr/pgsql-12/bin/postgresql-12-setup initdb
Initializing database ... OK
[root@hwd04 ~]# vi /var/lib/pgsql/12/data/postgresql.conf
listen_addresses = '*'
max_wal_senders = 10
max_replication_slots = 10
wal_level = 'replica'
wal_log_hints = on
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
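Before restarting, it can help to confirm that the edited file really contains the replication settings listed above. The following is a minimal sketch, not part of repmgr: the helper names `pg_conf_get` and `check_replication_settings` are made up for illustration, and the parsing only handles simple `key = value` lines (no `include` directives).

```shell
#!/bin/sh
# Print the effective value of a parameter from a postgresql.conf-style file.
# The last uncommented assignment wins; trailing comments and single quotes are stripped.
# $1 = config file, $2 = parameter name
pg_conf_get() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([^#]*\).*/\1/p" "$1" |
        tail -n 1 | sed "s/[[:space:]]*\$//; s/^'//; s/'\$//"
}

# Compare a few of the settings from this article against the file.
# Prints every mismatch and returns non-zero if there is at least one.
# $1 = config file
check_replication_settings() {
    rc=0
    while read -r key want; do
        have=$(pg_conf_get "$1" "$key")
        if [ "$have" != "$want" ]; then
            echo "MISMATCH $key: have '${have:-unset}', want '$want'"
            rc=1
        fi
    done <<EOF
wal_level replica
wal_log_hints on
hot_standby on
archive_mode on
EOF
    return $rc
}

# Example (path as used in this article):
#   check_replication_settings /var/lib/pgsql/12/data/postgresql.conf
```

A mismatch on `wal_log_hints` is worth catching early, since pg_rewind needs either that setting or data checksums.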
Restart the PostgreSQL service:
[root@hwd04 ~]# systemctl enable postgresql-12.service
[root@hwd04 ~]# systemctl restart postgresql-12.service
[root@hwd04 ~]# su - postgres
[postgres@hwd04 ~]$ createuser --superuser repmgr
[postgres@hwd04 ~]$ createdb --owner=repmgr repmgr
[postgres@hwd04 ~]$ psql -c "ALTER USER repmgr SET search_path TO repmgr, public;"
Edit postgresql.conf and add the following line so that the repmgr library is loaded when PostgreSQL starts:
[root@hwd04 ~]# vi /var/lib/pgsql/12/data/postgresql.conf
shared_preload_libraries = 'repmgr'
The default repmgr configuration file is /etc/repmgr/12/repmgr.conf. Add the following content on the primary and standby nodes respectively.
--hwd04(primary)
[root@hwd04 ~]# vi /etc/repmgr/12/repmgr.conf
node_id=1
node_name='hwd04'
conninfo='host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'

--hwd05(standby)
[root@hwd05 ~]# vi /etc/repmgr/12/repmgr.conf
node_id=2
node_name='hwd05'
conninfo='host=192.168.120.26 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'

--hwd06(standby)
[root@hwd06 ~]# vi /etc/repmgr/12/repmgr.conf
node_id=3
node_name='hwd06'
conninfo='host=192.168.120.27 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'
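The three files differ only in node ID, node name and IP address, so they can be generated from one template. This is a small sketch under that assumption; `gen_repmgr_conf` is a hypothetical helper name, and the fixed values (user, database, data directory) are simply the ones used in this article.

```shell
#!/bin/sh
# Emit the minimal repmgr.conf used in this article for one node.
# $1 = node_id, $2 = node_name, $3 = node IP address
gen_repmgr_conf() {
    cat <<EOF
node_id=$1
node_name='$2'
conninfo='host=$3 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'
EOF
}

# Example: write the file for the first standby.
#   gen_repmgr_conf 2 hwd05 192.168.120.26 > /etc/repmgr/12/repmgr.conf
```

Keeping `node_id` unique per node matters more than anything else here, since repmgr identifies nodes by that number.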
#For Replication
local   replication   repmgr                      trust
host    replication   repmgr   127.0.0.1/32       trust
host    replication   repmgr   192.168.120.0/24   trust
local   repmgr        repmgr                      trust
host    repmgr        repmgr   127.0.0.1/32       trust
host    repmgr        repmgr   192.168.120.0/24   trust
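These are pg_hba.conf entries: one group for replication connections and one for ordinary connections to the repmgr database. The sketch below generates the same six rules for an arbitrary subnet. `gen_repmgr_hba` is a hypothetical helper, and it keeps the `trust` method only because this article uses it; on a network you do not fully control, a password-based method would be the safer choice.

```shell
#!/bin/sh
# Print pg_hba.conf rules for the repmgr role: local socket, loopback and
# the cluster subnet, once for replication and once for the repmgr database.
# $1 = role (and database) name used by repmgr, $2 = cluster subnet in CIDR form
gen_repmgr_hba() {
    for db in replication "$1"; do
        echo "local   $db   $1                  trust"
        echo "host    $db   $1   127.0.0.1/32   trust"
        echo "host    $db   $1   $2   trust"
    done
}

# Example: append the rules for this article's subnet.
#   gen_repmgr_hba repmgr 192.168.120.0/24 >> /var/lib/pgsql/12/data/pg_hba.conf
```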
Restart the PostgreSQL service:
[root@hwd04 ~]# systemctl restart postgresql-12.service
On the standby nodes, verify that the primary node is reachable:
[postgres@hwd05 ~]$ psql 'host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2'
psql (12.3)
Type "help" for help.
repmgr=# \q

[postgres@hwd06 ~]$ psql 'host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2'
psql (12.3)
Type "help" for help.
repmgr=# \q
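The same check can be scripted so that it returns an exit status instead of opening an interactive session. This is only a sketch: `check_primary_reachable` is a made-up helper, and the `PSQL` variable exists purely so that the client binary can be swapped out.

```shell
#!/bin/sh
# Client binary; override PSQL to point at a specific psql (or a stub when testing).
PSQL=${PSQL:-psql}

# Return 0 if a trivial query succeeds against the given host as the repmgr user.
# $1 = IP address of the node to test
check_primary_reachable() {
    $PSQL -Atc 'SELECT 1' "host=$1 user=repmgr dbname=repmgr connect_timeout=2" >/dev/null 2>&1
}

# Example, run on a standby:
#   check_primary_reachable 192.168.120.25 && echo "primary reachable"
```

Using the exact conninfo string from repmgr.conf tests the same path that repmgr itself will use later.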
[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
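After registration, verify the cluster status with the following command:
Before cloning for real, you can perform a dry run first. If it reports no errors, proceed with the actual clone; otherwise, resolve the problems reported by the dry run and then clone.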
[postgres@hwd05 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met

[postgres@hwd06 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met
If there are N standby nodes, run the standby clone operation N times, once on each standby.
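With more than a couple of standbys, building the clone command in one place avoids typos between hosts. The sketch below only prints the commands. It assumes, beyond what the article states at this point, that the `postgres` user can SSH to each standby; the host list and the helper name `clone_command` are illustrative.

```shell
#!/bin/sh
PRIMARY_IP=192.168.120.25
STANDBYS="192.168.120.26 192.168.120.27"

# Print the repmgr clone command used in this article.
# $1 = IP of the node to clone from, $2 = optional extra flag such as --dry-run
clone_command() {
    echo "/usr/pgsql-12/bin/repmgr -h $1 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone${2:+ $2}"
}

# Print (not execute) one dry-run invocation per standby.
for host in $STANDBYS; do
    echo "ssh postgres@$host \"$(clone_command "$PRIMARY_IP" --dry-run)\""
done
```

Printing first and executing later keeps the destructive step (the real clone) a deliberate action.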
[postgres@hwd05 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/var/lib/pgsql/12/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/pgsql/12/data -h 192.168.120.25 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/pgsql/12/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"

[postgres@hwd06 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/var/lib/pgsql/12/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/pgsql/12/data -h 192.168.120.25 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/pgsql/12/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
After cloning, start the PostgreSQL service on each standby node:
[root@hwd05 ~]# systemctl enable postgresql-12.service
[root@hwd05 ~]# systemctl restart postgresql-12.service
[postgres@hwd05 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby register
INFO: connecting to local node "hwd05" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "hwd05" (ID: 2) successfully registered

[postgres@hwd06 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby register
INFO: connecting to local node "hwd06" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "hwd06" (ID: 3) successfully registered
After registration, check the cluster status:
[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster show --compact
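For monitoring scripts, a machine-readable form of this status is easier to work with than the table. The sketch below assumes the three-column layout of `repmgr cluster show --csv` as I understand it from the repmgr documentation (node ID, availability where 0 means available, recovery state); please verify that against your repmgr version. `unavailable_nodes` is a hypothetical helper name.

```shell
#!/bin/sh
# Read "node_id,availability,recovery_state" lines on stdin and print the IDs
# of nodes whose availability column is not 0 (assumed meaning: not reachable).
unavailable_nodes() {
    awk -F, '$2 != 0 { print $1 }'
}

# Example:
#   /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster show --csv | unavailable_nodes
```

An empty result means every registered node answered, which is the state expected right after the setup above.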
At this point, the streaming replication setup is complete.
[root@hwd12 ~]# /usr/pgsql-12/bin/postgresql-12-setup initdb
Initializing database ... OK
[root@hwd12 ~]# vi /var/lib/pgsql/12/data/postgresql.conf
listen_addresses = '*'
shared_preload_libraries = 'repmgr'
[root@hwd12 ~]# vi /var/lib/pgsql/12/data/pg_hba.conf
local   replication   repmgr                      trust
host    replication   repmgr   127.0.0.1/32       trust
host    replication   repmgr   192.168.120.0/24   trust
local   repmgr        repmgr                      trust
host    repmgr        repmgr   127.0.0.1/32       trust
host    repmgr        repmgr   192.168.120.0/24   trust
[root@hwd12 ~]# systemctl enable postgresql-12.service
[root@hwd12 ~]# systemctl restart postgresql-12.service
[root@hwd12 ~]# su - postgres
[postgres@hwd12 ~]$ createuser --superuser repmgr
[postgres@hwd12 ~]$ createdb --owner=repmgr repmgr
[postgres@hwd12 ~]$ psql -c "ALTER USER repmgr SET search_path TO repmgr, public;"
Test the connection from the primary node to the witness node:
[postgres@hwd04 ~]$ psql 'host=192.168.120.50 user=repmgr dbname=repmgr connect_timeout=2'
psql (12.3)
Type "help" for help.
repmgr=# \q
[root@hwd12 ~]# vi /etc/repmgr/12/repmgr.conf
node_id=4
node_name='hwd12'
conninfo='host=192.168.120.50 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'
[postgres@hwd12 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf witness register -h 192.168.120.25
INFO: connecting to witness node "hwd12" (ID: 4)
INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "hwd12" (ID: 4) successfully registered
After registration, the cluster status is as shown in the figure below:
Add the following content:
[root@hwd12 ~]# vi /etc/sudoers
Defaults:postgres !requiretty
postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql-12.service, /usr/bin/systemctl start postgresql-12.service, /usr/bin/systemctl restart postgresql-12.service, /usr/bin/systemctl reload postgresql-12.service, /usr/bin/systemctl start repmgr12.service, /usr/bin/systemctl stop repmgr12.service
Edit repmgr.conf on all nodes and add the following content:
failover='automatic'
priority=60
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10
promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
monitor_interval_secs=2
standby_disconnect_on_failover=true
primary_visibility_consensus=true
log_status_interval=60
service_start_command = 'sudo /usr/bin/systemctl start postgresql-12.service'
service_stop_command = 'sudo /usr/bin/systemctl stop postgresql-12.service'
service_restart_command = 'sudo /usr/bin/systemctl restart postgresql-12.service'
service_reload_command = 'sudo /usr/bin/systemctl reload postgresql-12.service'
repmgrd_service_start_command = 'sudo /usr/bin/systemctl start repmgr12.service'
repmgrd_service_stop_command = 'sudo /usr/bin/systemctl stop repmgr12.service'
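Note: the priority value must be changed on the standbys, because the default is 100 and the primary uses the default. Here hwd05 is given priority 60 and hwd06 priority 40. The witness node hwd12 does not need a priority setting. The higher the priority value, the more preferred the node is for promotion to primary.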
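To make the effect of these values concrete, the sketch below picks the preferred candidate from a list of node names and priorities. It is a deliberate simplification: it looks at priority only, whereas repmgrd, as far as I understand it, also takes each standby's replication position into account. It treats a priority of 0 as "never promote", which matches the documented meaning of that value. `pick_candidate` is a made-up name.

```shell
#!/bin/sh
# stdin: one "node_name priority" pair per line.
# Prints the name with the highest positive priority; nodes with priority 0 are skipped.
pick_candidate() {
    awk '$2 > 0 && $2 > best { best = $2; name = $1 }
         END { if (name != "") print name }'
}

# The two standbys from this article:
printf '%s\n' 'hwd05 60' 'hwd06 40' | pick_candidate   # prints hwd05
```

With the values above, hwd05 is the preferred successor when hwd04 fails, which is what the failover test later in this article shows.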
After editing, start the repmgr service on each node:
[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon start --dry-run
INFO: prerequisites for starting repmgrd met
DETAIL: following command would be executed: sudo /usr/bin/systemctl start repmgr12.service
[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon start
NOTICE: executing: "sudo /usr/bin/systemctl start repmgr12.service"
NOTICE: repmgrd was successfully started
Once the daemons are running, you can query the cluster events on the primary or on a standby node, as follows:
[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster event --event=repmgrd_start
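You can also look up repmgr-related information in the operating system log files.
Here the PostgreSQL service on hwd04 is stopped. The logs then show whether a standby is automatically promoted to primary and whether the remaining healthy nodes reconnect to the new primary.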
[postgres@hwd04 ~]$ sudo systemctl stop postgresql-12.service
After the service is stopped, the cluster information shows the primary node's status as unreachable.
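How long repmgrd keeps retrying before it starts a failover follows from the `reconnect_attempts` and `reconnect_interval` values set in repmgr.conf earlier. The arithmetic below is a rough estimate only: it ignores connection timeouts and the time the promotion itself takes, and `failover_delay` is a hypothetical helper.

```shell
#!/bin/sh
# Approximate time (in seconds) spent retrying the primary before failover begins.
# $1 = reconnect_attempts, $2 = reconnect_interval (seconds)
failover_delay() {
    echo $(( $1 * $2 ))
}

# Values from this article's repmgr.conf:
failover_delay 6 10   # prints 60
```

Lowering either value shortens the outage, at the cost of a higher chance of failing over on a brief network glitch.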
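After one minute, the witness node's log shows that hwd05 has become the new primary and that the other nodes have reconnected to hwd05. The witness log is as follows:
When the old primary recovers from the failure, it is not automatically converted to a standby. It runs on its own as a primary, so it has to be rejoined to the cluster, as follows: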
[postgres@hwd04 ~]$ repmgr node service --action=stop --checkpoint
NOTICE: issuing CHECKPOINT on node "hwd04" (ID: 1)
DETAIL: executing server command "sudo /usr/bin/systemctl stop postgresql-12.service"
[postgres@hwd04 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf -d 'host=192.168.120.26 user=repmgr dbname=repmgr' node rejoin --force-rewind
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 3 forked off current database system timeline 2 before current recovery point F/EB000028
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/usr/pgsql-12/bin/pg_rewind -D '/var/lib/pgsql/12/data' --source-server='host=192.168.120.26 user=repmgr dbname=repmgr connect_timeout=2'"
pg_rewind: servers diverged at WAL location F/EA0000A0 on timeline 2
pg_rewind: rewinding from last common checkpoint at F/EA000028 on timeline 2
pg_rewind: Done!
NOTICE: 0 files copied to /var/lib/pgsql/12/data
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "sudo /usr/bin/systemctl start postgresql-12.service"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
If the node cannot be rejoined, the old primary can be forcibly (-F) re-cloned as a standby, as follows:
[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.26 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone -F
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.26 user=repmgr dbname=repmgr
DETAIL: current installation size is 15 GB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: directory "/var/lib/pgsql/12/data" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/var/lib/pgsql/12/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing: /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup" -D /var/lib/pgsql/12/data -h 192.168.120.26 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: sudo /usr/bin/systemctl start postgresql-12.service
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
[postgres@hwd04 ~]$ sudo systemctl start postgresql-12.service
[postgres@hwd04 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf standby register -F
INFO: connecting to local node "hwd04" (ID: 1)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "hwd04" (ID: 1) successfully registered
You can also get the relevant information by querying the pg_stat_replication view, as follows:
postgres=# select pid,usesysid,usename,application_name,client_addr,client_port,state,sent_lsn,write_lsn,flush_lsn,sync_state from pg_stat_replication;
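The `sent_lsn`, `write_lsn` and `flush_lsn` columns are WAL positions written as two hexadecimal numbers separated by a slash, so the distance between two of them is a byte count. On the server, `pg_wal_lsn_diff()` computes this directly. The sketch below does the same arithmetic in shell for values copied out of a log, using two positions that appear in the switchover output in this article; the helper names are illustrative.

```shell
#!/bin/sh
# Convert an LSN of the form "HI/LO" (both hexadecimal) to a byte position.
# The high part counts units of 2^32 bytes.
lsn_to_int() {
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Print how many bytes the first LSN is ahead of the second.
lsn_diff() {
    echo $(( $(lsn_to_int "$1") - $(lsn_to_int "$2") ))
}

lsn_diff F/EB0000A0 F/EB000028   # prints 120
```

A difference that keeps growing between `sent_lsn` and `flush_lsn` for one standby is a sign that the standby is falling behind.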
Here hwd06 is promoted to primary. The current cluster information is shown in the figure below:
First, perform a dry run:
[postgres@hwd06 ~]$ repmgr standby switchover --siblings-follow --dry-run
NOTICE: checking switchover on node "hwd06" (ID: 3) in --dry-run mode
INFO: SSH connection to host "192.168.120.25" succeeded
INFO: able to execute "repmgr" on remote host "192.168.120.25"
INFO: all sibling nodes are reachable via SSH
INFO: 3 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
INFO: would pause repmgrd on node "hwd04" (ID 1)
INFO: would pause repmgrd on node "hwd05" (ID 2)
INFO: would pause repmgrd on node "hwd06" (ID 3)
INFO: would pause repmgrd on node "hwd12" (ID 4)
NOTICE: local node "hwd06" (ID: 3) would be promoted to primary; current primary "hwd04" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "hwd04": "sudo /usr/bin/systemctl stop postgresql-12.service"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
The dry run reports no errors, so perform the actual switchover:
[postgres@hwd06 ~]$ repmgr standby switchover --siblings-follow
NOTICE: executing switchover on node "hwd06" (ID: 3)
NOTICE: local node "hwd06" (ID: 3) will be promoted to primary; current primary "hwd04" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "hwd04" (ID: 1)
NOTICE: issuing CHECKPOINT on node "hwd04" (ID: 1)
DETAIL: executing server command "sudo /usr/bin/systemctl stop postgresql-12.service"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location F/EB000028
NOTICE: promoting standby to primary
DETAIL: promoting server "hwd06" (ID: 3) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "hwd06" (ID: 3) was successfully promoted to primary
INFO: local node 1 can attach to rejoin target node 3
DETAIL: local node's recovery point: F/EB000028; rejoin target node's fork point: F/EB0000A0
NOTICE: setting node 1's upstream to node 3
WARNING: unable to ping "host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "sudo /usr/bin/systemctl start postgresql-12.service"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 3
NOTICE: node "hwd06" (ID: 3) promoted to primary, node "hwd04" (ID: 1) demoted to standby
NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings
INFO: node 4 received notification to follow node 3
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
NOTICE: switchover was successful
DETAIL: node "hwd06" is now primary and node "hwd04" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
After the operation completes, the cluster information is as shown in the figure below: