生产上选用repmgr给PostgreSQL数据库作高可用集群,在给生产上一套库作高可用改造时发现standby clone时报错,没法复制备库,报错内容以下:node
先说缘由,是由于对PG和pg_basebackup比较了解的同窗可能本身就能够想出解决方案,不须要再继续往下看了。缘由是因为建立的独立表空间指定的目录放在$PGDATA目录下,repmgr的standby clone调用的是pg_basebackup,并且没有指定输出格式,默认为plain,会复制主库目录时把PGDATA目录下全部文件、目录和独立表空间目录,因此会报错File exists。sql
因为方案一涉及到对主库作操做,不建议在生产上操做,除非不介意对应用的影响。数据库
添加表空间、建立database、写表 app
postgres=# create user pguser login password 'pguser'; CREATE ROLE postgres=# create tablespace tbs_mydb owner pguser location '/home/postgres/data/pg_tbs/tbs_mydb'; WARNING: tablespace location should not be inside the data directory CREATE TABLESPACE postgres=# create database mydb with owner=pguser template=template0 encoding='UTF8' tablespace =tbs_mydb; CREATE DATABASE postgres=# grant all on database mydb to pguser with grant option; GRANT postgres=# grant all on tablespace tbs_mydb to pguser; GRANT postgres=# \c mydb pguser You are now connected to database "mydb" as user "pguser". mydb=> create table t1 (id int); CREATE TABLE mydb=> insert into t1 values(1); INSERT 0 1 mydb=> select * from t1; id ---- 1 (1 row)
第一次尝试 standby clone,出现与生产上一致的报错,报错信息与生产一致socket
INFO: checking and correcting permissions on existing directory "/home/postgres/data" NOTICE: starting backup (using pg_basebackup)... HINT: this may take some time; consider using the -c/--fast-checkpoint option INFO: executing: /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup" -D /home/postgres/data -h 192.168.56.111 -p 6000 -U repmgr -X stream pg_basebackup: could not create directory "/home/postgres/data/pg_tbs": File exists pg_basebackup: removing contents of data directory "/home/postgres/data" pg_basebackup: changes to tablespace directories will not be undone ERROR: unable to take a base backup of the source server HINT: data directory ("/home/postgres/data") may need to be cleaned up manually
修改repmgr.conf中的data_directory='/home/postgres/repmgr'ide
再次尝试 standby clone,成功post
[postgres@repmgr2 ~]$ repmgr -h 192.168.56.111 -U repmgr -d repmgr -f ~/repmgr.conf standby clone -p6000 NOTICE: destination directory "/home/postgres/repmgr" provided INFO: connecting to source node DETAIL: connection string is: host=192.168.56.111 user=repmgr port=6000 dbname=repmgr DETAIL: current installation size is 45 MB DEBUG: 1 node records returned by source node DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path=" DEBUG: upstream_node_id determined as 111 INFO: replication slot usage not requested; no replication slot will be set up for this standby NOTICE: checking for available walsenders on the source node (2 required) NOTICE: checking replication connections can be made to the source server (2 required) INFO: checking and correcting permissions on existing directory "/home/postgres/repmgr" NOTICE: starting backup (using pg_basebackup)... HINT: this may take some time; consider using the -c/--fast-checkpoint option INFO: executing: /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup" -D /home/postgres/repmgr -h 192.168.56.111 -p 6000 -U repmgr -X stream DEBUG: create_recovery_file(): creating "/home/postgres/repmgr/recovery.conf"... DEBUG: recovery.conf line: standby_mode = 'on' DEBUG: recovery.conf line: primary_conninfo = 'host=192.168.56.111 user=repmgr port=6000 application_name=repmgr2 connect_timeout=2' DEBUG: recovery.conf line: recovery_target_timeline = 'latest' NOTICE: standby clone (using pg_basebackup) complete NOTICE: you can now start your PostgreSQL server HINT: for example: pg_ctl -D /home/postgres/repmgr start HINT: after starting the server, you need to register this standby with "repmgr standby register"
修改repmgr.conf为原来的配置,并把repmgr目录下的全部文件mv到data目录下测试
data_directory='/home/postgres/data' [postgres@repmgr2 repmgr]$ mv * ~/data/ mv: cannot move ‘pg_tbs’ to ‘/home/postgres/data/pg_tbs’: File exists
修改配置文件中的cluster_name参数并启动数据库ui
[postgres@repmgr2 data]$ pg_ctl -D /home/postgres/data/ start waiting for server to start....2021-02-28 10:09:15.905 CST [3498] LOG: listening on IPv4 address "0.0.0.0", port 6000 2021-02-28 10:09:15.912 CST [3498] LOG: listening on Unix socket "/tmp/.s.PGSQL.6000" 2021-02-28 10:09:15.949 CST [3498] LOG: redirecting log output to logging collector process 2021-02-28 10:09:15.949 CST [3498] HINT: Future log output will appear in directory "log". . done server started
注册备库成功this
[postgres@repmgr2 data]$ repmgr -f ../repmgr.conf standby register INFO: connecting to local node "repmgr2" (ID: 113) DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path=" INFO: connecting to primary database DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path=" WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 111) INFO: standby registration complete NOTICE: standby node "repmgr2" (ID: 113) successfully registered
检查集群状态
[postgres@repmgr2 data]$ repmgr -f ../repmgr.conf cluster show DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path=" DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path=" DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path=" DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path=" ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string -----+---------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------- 111 | repmgr1 | primary | * running | | default | 100 | 5 | host=192.168.56.111 port=6000 user=repmgr dbname=repmgr connect_timeout=2 113 | repmgr2 | standby | running | repmgr1 | default | 100 | 5 | host=192.168.56.113 port=6000 user=repmgr dbname=repmgr connect_timeout=2
主库测试添加数据
mydb=> insert into t1 values(2); INSERT 0 1 mydb=> select * from t1; id ---- 1 2 (2 rows)
从库查询
[postgres@repmgr2 data]$ psql psql (10.11) Type "help" for help. postgres=# \c mydb pguser You are now connected to database "mydb" as user "pguser". mydb=> select * from t1; id ---- 1 2 (2 rows)
其实在建立独立表空间时PG已经作了提示表空间不该用在DATA目录,因此出现上面的报错就是掉进了前人的坑。
WARNING: tablespace location should not be inside the data directory
若是想尝试方案一的能够提供一下思路
#新建立一个表空间 postgres=# create tablespace zhijian owner pguser location '/data/pgdata/11/pg_tbs/tbs_zhijian'; CREATE TABLESPACE #更改数据库的表空间 mydb=> \c postgres postgres postgres=# alter database mydb set tablespace zhijian; ALTER DATABASE