repmgr配置备库报错File exists处理一例

时间 2021-03-02

标签 node sql 数据库 app socket ide post 测试 this 栏目 SQL 繁體版

原文原文链接

背景

生产上选用repmgr给PostgreSQL数据库作高可用集群，在给生产上一套库作高可用改造时发现standby clone时报错，没法复制备库，报错内容以下：
node

缘由

先说缘由，是由于对PG和pg_basebackup比较了解的同窗可能本身就能够想出解决方案，不须要再继续往下看了。缘由是因为建立的独立表空间指定的目录放在$PGDATA目录下，repmgr的standby clone调用的是pg_basebackup，并且没有指定输出格式，默认为plain，会复制主库目录时把PGDATA目录下全部文件、目录和独立表空间目录，因此会报错File exists。sql

解决方案

迁移主库独立表空间到PGDATA之外的目录(会阻塞写)
指定新目录作standby clone，clone完后把把文件移到实际PGDATA目录

因为方案一涉及到对主库作操做，不建议在生产上操做，除非不介意对应用的影响。数据库

测试方案二

节点1操做

添加表空间、建立database、写表 app

postgres=# create user pguser login  password 'pguser';
CREATE ROLE
postgres=# create tablespace tbs_mydb owner pguser location '/home/postgres/data/pg_tbs/tbs_mydb';
WARNING:  tablespace location should not be inside the data directory
CREATE TABLESPACE
postgres=# create  database mydb with owner=pguser template=template0 encoding='UTF8' tablespace =tbs_mydb;
CREATE DATABASE
postgres=# grant all on database mydb to pguser with grant option;
GRANT
postgres=# grant all on tablespace tbs_mydb to pguser;
GRANT
postgres=# \c mydb pguser
You are now connected to database "mydb" as user "pguser".
mydb=> create table t1 (id int);
CREATE TABLE
mydb=> insert into t1 values(1);
INSERT 0 1
mydb=> select * from t1;
 id 
----
  1
(1 row)

节点2操做

第一次尝试 standby clone，出现与生产上一致的报错，报错信息与生产一致socket

INFO: checking and correcting permissions on existing directory "/home/postgres/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /home/postgres/data -h 192.168.56.111 -p 6000 -U repmgr -X stream 
pg_basebackup: could not create directory "/home/postgres/data/pg_tbs": File exists
pg_basebackup: removing contents of data directory "/home/postgres/data"
pg_basebackup: changes to tablespace directories will not be undone
ERROR: unable to take a base backup of the source server
HINT: data directory ("/home/postgres/data") may need to be cleaned up manually

修改repmgr.conf中的data_directory='/home/postgres/repmgr'ide

再次尝试 standby clone，成功post

[postgres@repmgr2 ~]$ repmgr -h 192.168.56.111 -U repmgr -d repmgr -f ~/repmgr.conf standby clone -p6000 
NOTICE: destination directory "/home/postgres/repmgr" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.56.111 user=repmgr port=6000 dbname=repmgr
DETAIL: current installation size is 45 MB
DEBUG: 1 node records returned by source node
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path="
DEBUG: upstream_node_id determined as 111
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/home/postgres/repmgr"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /home/postgres/repmgr -h 192.168.56.111 -p 6000 -U repmgr -X stream 
DEBUG: create_recovery_file(): creating "/home/postgres/repmgr/recovery.conf"...
DEBUG: recovery.conf line: standby_mode = 'on'

DEBUG: recovery.conf line: primary_conninfo = 'host=192.168.56.111 user=repmgr port=6000 application_name=repmgr2 connect_timeout=2'

DEBUG: recovery.conf line: recovery_target_timeline = 'latest'

NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /home/postgres/repmgr start
HINT: after starting the server, you need to register this standby with "repmgr standby register"

修改repmgr.conf为原来的配置，并把repmgr目录下的全部文件mv到data目录下测试

data_directory='/home/postgres/data'

[postgres@repmgr2 repmgr]$ mv * ~/data/
mv: cannot move ‘pg_tbs’ to ‘/home/postgres/data/pg_tbs’: File exists

修改配置文件中的cluster_name参数并启动数据库ui

[postgres@repmgr2 data]$ pg_ctl -D /home/postgres/data/ start
waiting for server to start....2021-02-28 10:09:15.905 CST [3498] LOG:  listening on IPv4 address "0.0.0.0", port 6000
2021-02-28 10:09:15.912 CST [3498] LOG:  listening on Unix socket "/tmp/.s.PGSQL.6000"
2021-02-28 10:09:15.949 CST [3498] LOG:  redirecting log output to logging collector process
2021-02-28 10:09:15.949 CST [3498] HINT:  Future log output will appear in directory "log".
. done
server started

注册备库成功this

[postgres@repmgr2 data]$ repmgr -f ../repmgr.conf standby register
INFO: connecting to local node "repmgr2" (ID: 113)
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path="
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path="
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 111)
INFO: standby registration complete
NOTICE: standby node "repmgr2" (ID: 113) successfully registered

检查集群状态

[postgres@repmgr2 data]$ repmgr -f ../repmgr.conf cluster show
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path="
 ID  | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                        
-----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------
 111 | repmgr1 | primary | * running |          | default  | 100      | 5        | host=192.168.56.111 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 113 | repmgr2 | standby |   running | repmgr1  | default  | 100      | 5        | host=192.168.56.113 port=6000 user=repmgr dbname=repmgr connect_timeout=2

测试数据同步

主库测试添加数据

mydb=> insert into t1 values(2);
INSERT 0 1
mydb=> select * from t1;
 id 
----
  1
  2
(2 rows)

从库查询

[postgres@repmgr2 data]$ psql
psql (10.11)
Type "help" for help.

postgres=# \c mydb pguser
You are now connected to database "mydb" as user "pguser".
mydb=> select * from t1;
 id 
----
  1
  2
(2 rows)

写在最后

其实在建立独立表空间时PG已经作了提示表空间不该用在DATA目录，因此出现上面的报错就是掉进了前人的坑。

WARNING:  tablespace location should not be inside the data directory

若是想尝试方案一的能够提供一下思路

#新建立一个表空间
postgres=# create tablespace zhijian  owner pguser location '/data/pgdata/11/pg_tbs/tbs_zhijian';
CREATE TABLESPACE
#更改数据库的表空间
mydb=> \c postgres postgres
postgres=# alter database mydb set  tablespace zhijian;
ALTER DATABASE