Greenplum: Common Problems and Solutions

1. Error during database initialization: gpinitsystem -c gpconfigs/gpinitsystem_config -h list

Error message:
2018-08-29 16:51:01.338476 CST,,,p21229,th406714176,,,,0,,,seg-999,,,,,"FATAL","XX000","could not create semaphores: No space left on device (pg_sema.c:129)","Failed system call was semget(127, 17, 03600).","This error does *not* mean that you have run out of disk space.
It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded.  You need to raise the respective kernel parameter.  Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter (currently 753).
The PostgreSQL documentation contains more information about configuring your system for PostgreSQL.",,,,,,"InternalIpcSemaphoreCreate","pg_sema.c",129,1    0x95661b postgres errstart (elog.c:521)

Solution: check the current semaphore limits.
[root@bj-ksy-g1-mongos-02 primary]# cat /proc/sys/kernel/sem
250	32000	32	128

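Change kernel.sem in /etc/sysctl.conf to the following (the four fields are SEMMSL, SEMMNS, SEMOPM, SEMMNI) and reload it with sysctl -p: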
[root@bj-ksy-g1-mongos-02 primary]# cat /etc/sysctl.conf
kernel.sem = 250 512000 100 2048
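
To see why the default limits run out, here is a rough sizing sketch. It assumes PostgreSQL's usual rule of one semaphore set per 16 connections with 17 semaphores per set (the 17 matches the semget call in the log above). The function names and the instance count of 3 are only illustrative, and auxiliary processes are ignored, so treat the result as a lower bound.

```shell
# Estimate the semaphore sets needed on one host (compare with SEMMNI).
# $1 = max_connections, $2 = number of postgres instances on the host
sem_sets_needed() {
    echo $(( ( ($1 + 15) / 16 ) * $2 ))
}

# Estimate the total semaphores needed on one host (compare with SEMMNS).
sems_needed() {
    echo $(( $(sem_sets_needed "$1" "$2") * 17 ))
}

# Example with max_connections = 753 and a hypothetical 3 instances per host:
#   sem_sets_needed 753 3     # prints 144, above the SEMMNI of 128 shown above
#   cat /proc/sys/kernel/sem  # fields: SEMMSL SEMMNS SEMOPM SEMMNI
#   sysctl -p                 # as root, reload /etc/sysctl.conf after editing
```

With several segment instances per host the estimate passes 128 sets quickly, which is consistent with raising SEMMNI to 2048.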

2. Error when running the check: gpcheck -f list

Error message:
XFS filesystem on device /dev/vdb1 is missing the recommended mount option 'allocsize=16m'

Solution: add the allocsize option (16m, i.e. 16384k) to the mount options in /etc/fstab:
[gpadmin@bj-ksy-g1-mongos-01 ~]$ cat /etc/fstab
/dev/vdb1 /opt  xfs  defaults,allocsize=16384k,inode64,noatime        1 1

3. Error: gpadmin-[CRITICAL]:-gpstate failed. (Reason='Environment Variable MASTER_DATA_DIRECTORY not set!') exiting...

Error message:

[gpadmin@bj-ksy-g1-mongos-01 ~]$ gpstop
20180830:09:11:42:011904 gpstop:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-Starting gpstop with args:
20180830:09:11:42:011904 gpstop:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-Gathering information and validating the environment...
20180830:09:11:42:011904 gpstop:bj-ksy-g1-mongos-01:gpadmin-[CRITICAL]:-gpstop failed. (Reason='Environment Variable MASTER_DATA_DIRECTORY not set!') exiting...
[gpadmin@bj-ksy-g1-mongos-01 ~]$ gpstop -M fast
20180830:09:12:07:011962 gpstop:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-Starting gpstop with args: -M fast
20180830:09:12:07:011962 gpstop:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-Gathering information and validating the environment...
20180830:09:12:07:011962 gpstop:bj-ksy-g1-mongos-01:gpadmin-[CRITICAL]:-gpstop failed. (Reason='Environment Variable MASTER_DATA_DIRECTORY not set!') exiting...
[gpadmin@bj-ksy-g1-mongos-01 ~]$ gpstate
20180830:09:13:03:012093 gpstate:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-Starting gpstate with args:
20180830:09:13:03:012093 gpstate:bj-ksy-g1-mongos-01:gpadmin-[CRITICAL]:-gpstate failed. (Reason='Environment Variable MASTER_DATA_DIRECTORY not set!') exiting...

Solution:

[gpadmin@bj-ksy-g1-mongos-01 ~]$ vim ~/.bashrc
Add:
MASTER_DATA_DIRECTORY=/opt/data/master/gpseg-1
export MASTER_DATA_DIRECTORY
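
After editing ~/.bashrc, the variable only exists in new shells, or after running source ~/.bashrc in the current one. A small check like the one below can confirm it is set before retrying gpstate. The helper name is hypothetical and is not part of Greenplum.

```shell
# Report whether MASTER_DATA_DIRECTORY is set and points at a directory.
check_master_data_directory() {
    if [ -z "${MASTER_DATA_DIRECTORY:-}" ]; then
        echo "MASTER_DATA_DIRECTORY is not set"
        return 1
    fi
    if [ ! -d "$MASTER_DATA_DIRECTORY" ]; then
        echo "MASTER_DATA_DIRECTORY=$MASTER_DATA_DIRECTORY is not a directory"
        return 1
    fi
    echo "MASTER_DATA_DIRECTORY=$MASTER_DATA_DIRECTORY"
}

# Usage:
#   source ~/.bashrc
#   check_master_data_directory && gpstate
```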

4. Error: Reason='[Errno 12] Cannot allocate memory'

gpstart, gpstate, and gpstop all report the same error.

Error message:

[gpadmin@bj-ksy-g1-mongos-01 ~]$ gpstate -s
20180830:09:22:01:013309 gpstate:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-Starting gpstate with args: -s
20180830:09:22:01:013309 gpstate:bj-ksy-g1-mongos-01:gpadmin-[CRITICAL]:-gpstate failed. (Reason='[Errno 12] Cannot allocate memory') exiting...

Solution:

As the root user, add a swap file:

[root@bj-ksy-g1-mongos-01 ~]# swapon -s   # check current swap usage
[root@bj-ksy-g1-mongos-01 ~]# dd if=/dev/zero of=/swapfile bs=1024 count=1024k
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 3.20053 s, 335 MB/s
[root@bj-ksy-g1-mongos-01 ~]# mkswap /swapfile
Setting up swapspace version 1, size = 1048572 KiB
no label, UUID=3e8ef2b3-5d9e-4e04-9718-36caefbfc21d
[root@bj-ksy-g1-mongos-01 ~]# swapon /swapfile
swapon: /swapfile: insecure permissions 0644, 0600 suggested.

[root@bj-ksy-g1-mongos-01 ~]# vim /etc/fstab   # make the swap file persistent across reboots
Add:
/swapfile none swap sw 0 0
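
The swapon output above warns about the 0644 permissions on the swap file. A slightly tidier version of the same steps, with chmod 600 added before mkswap, could look like the sketch below. The helper only prints the commands so they can be reviewed before being run as root. Its name and the size argument are illustrative.

```shell
# Print (do not run) the commands that create and enable a swap file.
# $1 = swap file path, $2 = size in MiB
swap_setup_commands() {
    printf 'dd if=/dev/zero of=%s bs=1M count=%s\n' "$1" "$2"
    printf 'chmod 600 %s\n' "$1"   # avoids the "insecure permissions" warning
    printf 'mkswap %s\n' "$1"
    printf 'swapon %s\n' "$1"
}

# Usage: review the output, then run the printed commands as root.
#   swap_setup_commands /swapfile 1024
```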

Switch to the gpadmin user and verify the result:
[gpadmin@bj-ksy-g1-mongos-01 ~]$ gpstate -s
20180830:09:34:56:015816 gpstate:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-Starting gpstate with args: -s
20180830:09:34:56:015816 gpstate:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.0 build commit:1971b301f52979ac74fb3d0a141bbaae06b70857'
20180830:09:34:56:015816 gpstate:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.0 build commit:1971b301f52979ac74fb3d0a141bbaae06b70857) on x86_64-pc-linux-gnu, compiled by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 12 2018 21:15:36'
20180830:09:34:56:015816 gpstate:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-Obtaining Segment details from master...
20180830:09:34:56:015816 gpstate:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-Gathering data from segments...
20180830:09:34:57:015816 gpstate:bj-ksy-g1-mongos-01:gpadmin-[INFO]:-----------------------------------------------------
20180830:09:34:57:015816 gpstate:bj-ksy-g1-mongos-01:gpadmin-[INFO]:--Master Configuration & Status

5. ERROR: permission denied: "gp_segment_configuration" is a system catalog

Error:

ERROR: permission denied: "gp_segment_configuration" is a system catalog

Solution:

postgres=# delete from gp_segment_configuration where role='m';
ERROR:  permission denied: "gp_segment_configuration" is a system catalog
postgres=# set allow_system_table_mods='dml';
SET
postgres=# delete from gp_segment_configuration where role='m';
DELETE 9
postgres=#

6. Error: "FATAL","XX000","could not create shared memory segment: Cannot allocate memory (pg_shmem.c:183)"

2018-10-15 19:45:37.841672 CST,,,p10296,th624441152,,,,0,,,seg-1,,,,,"FATAL","XX000","could not create shared memory segment: Cannot allocate memory (pg_shmem.c:183)","Failed system call was shmget(key=40002001, size=267762784, 03600).","This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory or swap space. To reduce the request size (currently 267762784 bytes), reduce PostgreSQL's shared_buffers parameter (currently 4000) and/or its max_connections parameter (currently 753).
The PostgreSQL documentation contains more information about shared memory configuration.",,,,,,"InternalIpcMemoryCreate","pg_shmem.c",183,1    0x95661b postgres errstart (elog.c:521)
2    0x7bc723 postgres <symbol not found> (pg_shmem.c:145)
3    0x7bc9ba postgres PGSharedMemoryCreate (pg_shmem.c:387)
4    0x812d69 postgres CreateSharedMemoryAndSemaphores (ipci.c:242)
5    0x7d47dc postgres PostmasterMain (postmaster.c:3996)
6    0x4c8af7 postgres main (main.c:206)
7    0x7f372083ab15 libc.so.6 __libc_start_main + 0xf5
8    0x4c904c postgres <symbol not found> + 0x4c904c

Solution:

As the root user, add a swap file:

[root@bj-ksy-g1-mongos-01 ~]# swapon -s   # check current swap usage
[root@bj-ksy-g1-mongos-01 ~]# dd if=/dev/zero of=/swapfile bs=1024 count=1024k
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 3.20053 s, 335 MB/s
[root@bj-ksy-g1-mongos-01 ~]# mkswap /swapfile
Setting up swapspace version 1, size = 1048572 KiB
no label, UUID=3e8ef2b3-5d9e-4e04-9718-36caefbfc21d
[root@bj-ksy-g1-mongos-01 ~]# swapon /swapfile
swapon: /swapfile: insecure permissions 0644, 0600 suggested.

[root@bj-ksy-g1-mongos-01 ~]# vim /etc/fstab   # make the swap file persistent across reboots
Add:
/swapfile none swap sw 0 0

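7. Database fails to start after changing shared_buffers

gpconfig -c shared_buffers -v "8192MB"
After shared_buffers was changed with the command above, Greenplum could no longer start.
Cause: kernel.shmmax was set to 500000000 (about 476 MB), so the database cannot start once shared_buffers exceeds 476 MB. The kernel.shmmax value is too small.

Solution: increase kernel.shmmax, ideally to 50% of total memory.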
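
The 50% figure can be computed from /proc/meminfo. The following is a minimal sketch. The function name is made up for illustration, and the result should be checked against the shared memory the database actually requests.

```shell
# Print 50% of total memory in bytes.
# $1 = MemTotal in kB, as reported by /proc/meminfo
half_of_memory_bytes() {
    echo $(( $1 * 1024 / 2 ))
}

# Usage (the last two commands need root):
#   mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   echo "kernel.shmmax = $(half_of_memory_bytes "$mem_kb")" >> /etc/sysctl.conf
#   sysctl -p
```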

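8. Connections fail after Greenplum has been running for a while

After Greenplum has been running for a while, new connections fail even though the number of connections in pg_stat_activity has not reached the configured limit. Raise the following kernel parameters: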

net.core.somaxconn=65535

net.core.rmem_max=16777216

net.core.wmem_max=16777216

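net.core.somaxconn is a Linux kernel parameter that caps the backlog of a listening socket. The backlog is the socket's listen queue: a request that has not yet been handled or established waits there. The socket server can process all requests in the backlog at once, and processed requests leave the queue. When the server handles requests so slowly that the listen queue fills up, new requests are rejected.
The default value of net.core.somaxconn is 128. For a busy server such as a NameNode or JobTracker, 128 is far too small, so the backlog has to be raised. For example, our 3000-node cluster sets ipc.server.listen.queue.size to 32768. For that setting to take full effect, the kernel parameter net.core.somaxconn must also be set to at least 32768.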

9. File "/home/gpadmin/greenplum-db/lib/python/gppylib/commands/base.py", line 243, in run

Error message:
gpstate -s
shows every segment as failed.

Stopping Greenplum:
gpstop -a
Error output:

20181227:10:18:11:2243549 gpstop:hrdskf-k:gpadmin-[ERROR]:-ExecutionError: 'non-zero rc: 1' occured.  Details: 'ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=60 hrdskf-k ". /home/gpadmin/greenplum-db/./greenplum_path.sh; $GPHOME/sbin/gpoperation.py"'  cmd had rc=1 completed=True halted=False
  stdout=''
  stderr='\S
Kernel \r on an \m
Warm tips :Authorized for Haier Utility's Uses only. All activity may be monitored and reported.
If you have any questions,please contact us.
Mailbox:dts.jxjg@haier.com
Phone:68066686 / 1000 / 8173
WARNING: Your password has expired.
Password change required but no TTY available.
'
Traceback (most recent call last):
  File "/home/gpadmin/greenplum-db/lib/python/gppylib/commands/base.py", line 243, in run
    self.cmd.run()
  File "/home/gpadmin/greenplum-db/lib/python/gppylib/operations/__init__.py", line 53, in run
    self.ret = self.execute()
  File "/home/gpadmin/greenplum-db/lib/python/gppylib/operations/utils.py", line 48, in execute
    cmd.run(validateAfter=True)
  File "/home/gpadmin/greenplum-db/lib/python/gppylib/commands/base.py", line 717, in run
    self.validate()
  File "/home/gpadmin/greenplum-db/lib/python/gppylib/commands/base.py", line 764, in validate
    raise ExecutionError("non-zero rc: %d" % self.results.rc, self)
ExecutionError: ExecutionError: 'non-zero rc: 1' occured.  Details: 'ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=60 hrdskf-k ". /home/gpadmin/greenplum-db/./greenplum_path.sh; $GPHOME/sbin/gpoperation.py"'  cmd had rc=1 completed=True halted=False
  stdout=''
  stderr='\S
Kernel \r on an \m
Warm tips :Authorized for Haier Utility's Uses only. All activity may be monitored and reported.
If you have any questions,please contact us.
Mailbox:dts.jxjg@haier.com
Phone:68066686 / 1000 / 8173
WARNING: Your password has expired.
Password change required but no TTY available.

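Troubleshooting approach:
The log points to an ssh problem.
1) Check whether passwordless login still works.
2) It turns out the password has to be reset.
3) ssh hostname prompts for a password change.

Under the server's standard settings, passwords expire after a default period.
View and change the password validity period: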

[root@hrdskf-m ~]# chage -l gpadmin
Last password change                                    : Dec 27, 2018
Password expires                                        : Feb 25, 2019
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 1
Maximum number of days between password change          : 60
Number of days of warning before password expires       : 14
[root@hrdskf-m ~]# chage -l root
Last password change                                    : Dec 24, 2018
Password expires                                        : never
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 0
Maximum number of days between password change          : 99999
Number of days of warning before password expires       : 7
[root@hrdskf-m ~]# chage -M 99999 gpadmin   # with this setting the password never expires
[root@hrdskf-m ~]# chage -l gpadmin
Last password change                                    : Dec 27, 2018
Password expires                                        : never
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 1
Maximum number of days between password change          : 99999
Number of days of warning before password expires       : 14
[root@hrdskf-m ~]#
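
Because the expired password blocked ssh on one host, it may be worth checking the expiry on every host. The sketch below pulls the "Password expires" field out of chage -l output. The helper name is hypothetical, and list in the usage note is the host file used with gpcheck earlier.

```shell
# Read `chage -l <user>` output on stdin and print the "Password expires" value.
password_expiry() {
    awk -F': ' '/^Password expires/ {print $2}'
}

# Usage on one host:
#   chage -l gpadmin | password_expiry
# Raw output from every host in the host file:
#   gpssh -f list -e 'chage -l gpadmin'
```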

10. could not create shared memory segment: Invalid argument (pg_shmem.c:136), Failed

Error: "could not create shared memory segment: Invalid argument (pg_shmem.c:136),Failed "

Solution: reduce the value of the max_connections parameter.

11. "failed to acquire resources on one or more segments","connection pointer is NULL"

Error: 2018-11-09 10:08:13.279910 CST,"gpadmin","xn_report",p119553,th-1821042816,"172.23.0.74","16532",2018-11-09 10:08:13 CST,0,con10783,,seg-1,,dx2364872,,sx1,"ERROR","58M01","failed to acquire resources on one or more segments","connection pointer is NULL

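This is related to the Query Dispatcher (QD) process on the master. It indicates that the QD process on the master had a problem connecting to the postmaster process on the master.
The parameter gp_reject_internal_tcp_connection can be changed to "off". Its default value is "on". The parameter controls whether internal TCP connections to the master are allowed. Ideally UNIX domain sockets should be used instead of TCP connections, which is why the default is "on".
This is a restricted parameter, so the --skipvalidation option is required when setting it. To set the parameter, run the following command:
gpconfig -c gp_reject_internal_tcp_connection -v off --skipvalidation
Note: the database must be restarted after setting this parameter.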

https://community.pivotal.io/s/article/Error-Failed-to-acquire-resources-on-one-or-more-segments-in-Pivotal-Greenplum

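12. max_connections and max_prepared_transactions

max_connections: the maximum number of concurrent connections to the database server. In a Greenplum system, user client connections go through the Greenplum master instance only. Segment instances should allow 5-10 times as many connections as the master. When you increase this parameter, you must also increase max_prepared_transactions.
max_prepared_transactions: the maximum number of transactions that can be in the prepared state at the same time. Greenplum uses prepared transactions internally to ensure data integrity across the segments. This value must be at least as large as max_connections on the master. Segment instances should be set to the same value as the master.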
gpconfig -c max_prepared_transactions -v 500
gpconfig -c max_connections -v 2500 -m 500
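
The two rules above (segments at 5-10 times the master's max_connections, and max_prepared_transactions at least the master's max_connections) can be sanity-checked before running gpconfig. This is only a sketch: the function name is invented, and it checks the lower bound of 5x only.

```shell
# $1 = master max_connections
# $2 = segment max_connections
# $3 = max_prepared_transactions
check_connection_settings() {
    _status=0
    if [ "$3" -lt "$1" ]; then
        echo "max_prepared_transactions ($3) is below master max_connections ($1)"
        _status=1
    fi
    if [ "$2" -lt $(( $1 * 5 )) ]; then
        echo "segment max_connections ($2) is below 5x master max_connections ($1)"
        _status=1
    fi
    if [ "$_status" -eq 0 ]; then
        echo "ok"
    fi
    return "$_status"
}

# The values from the gpconfig commands above pass the check:
#   check_connection_settings 500 2500 500   # prints "ok"
```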

13. VM protect failed to allocate 131080 bytes from system, VM Protect 8098 MB available

VM protect failed to allocate 131080 bytes from system, VM Protect 8098 MB available

gpconfig -c gp_max_plan_size -v "200MB"

14. psql: FATAL: DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1602)

psql: FATAL:  DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1602)

After startup, all segments are reported as up.

Solution: connect to the master in utility mode.

PGOPTIONS='-c gp_session_role=utility' psql -d postgres
