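Put simply, Greenplum Database (GPDB) is distributed database software that manages and processes large volumes of data spread across multiple hosts. A GPDB database instance is actually made up of several independent PostgreSQL instances running on different physical hosts; they work together and appear to the user as a single database. The Master is the entry point to a GPDB system: it handles client connections and SQL commands and coordinates the other instances (Segments) in the system. The Segments store and process the user data.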
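Environment preparation:
Operating system: CentOS Linux release 7.6.1810 (Core), 64-bit
One Master (the primary node in the architecture diagram), one Standby (the secondary node in the architecture diagram), and two Segment hosts: four servers in total.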
1. Edit /etc/hosts and add the following entries (note: same configuration on all four servers)
vim /etc/hosts
192.168.18.130 gp-master
192.168.18.131 gp-standby
192.168.18.132 gp-node1
192.168.18.133 gp-node2
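Since the same four entries have to be added on every server, a small helper can make the step repeatable. The following is a minimal sketch, not part of the original procedure: the function name add_host_entry is hypothetical, and it only appends a line when the hostname is not already present in the file.

```shell
# Append "IP hostname" to a hosts file unless the hostname is already listed.
# Third argument (optional): the file to edit; defaults to /etc/hosts.
add_host_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    # Look for the hostname as a whole field (column 2 or later) on a non-comment line.
    if ! awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Example, run as root on each of the four servers:
#   add_host_entry 192.168.18.130 gp-master
#   add_host_entry 192.168.18.131 gp-standby
#   add_host_entry 192.168.18.132 gp-node1
#   add_host_entry 192.168.18.133 gp-node2
```

Running the helper twice leaves the file unchanged, which is convenient when the same script is pushed to all hosts.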
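2. Disable SELinux on every server and open the firewall between the four servers; in a test environment you can simply turn the firewall off. (Note: same configuration on all four servers)
Stop firewalld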
systemctl stop firewalld
systemctl disable firewalld
Disable SELinux permanently
vim /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three two values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
Note: check the current SELinux mode with getenforce. To change the mode for the running system only (not persistent across reboots), use setenforce 0 (0 = permissive, 1 = enforcing).
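The edit to the SELinux configuration file can also be scripted instead of done in vim. This is a rough sketch under the assumption that GNU sed is available (as on CentOS 7); the function name disable_selinux_in_config is made up for illustration.

```shell
# Set SELINUX=disabled in an SELinux config file.
# Argument (optional): path to the file; defaults to /etc/selinux/config.
disable_selinux_in_config() {
    local file="${1:-/etc/selinux/config}"
    # Only the active "SELINUX=" line is rewritten; comment lines and
    # the SELINUXTYPE= line do not match the pattern and stay untouched.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$file"
}

# Example, run as root on each server (takes effect after a reboot):
#   disable_selinux_in_config
#   setenforce 0    # switch the running system to permissive until then
```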
3. Operating system parameters
vim /etc/sysctl.conf (note: same configuration on all four servers)
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.default.arp_filter = 1
net.core.netdev_max_backlog = 10000
vm.overcommit_memory = 2
kernel.msgmni = 2048
net.ipv4.ip_local_port_range = 1025 65535
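Values written to /etc/sysctl.conf are not active until they are loaded (for example with sysctl -p or a reboot), so it can help to compare the file with the running kernel. The sketch below is only an illustration: sysctl_conf_value and check_sysctl are hypothetical helper names, and the three keys checked are just a sample from the list above.

```shell
# Print the value assigned to a key in a sysctl-style file ("key = value"),
# with surrounding whitespace removed and inner whitespace collapsed.
sysctl_conf_value() {
    local key="$1" file="${2:-/etc/sysctl.conf}"
    awk -F= -v k="$key" '
        /^[ \t]*[#;]/ { next }
        {
            name = $1
            gsub(/[ \t]/, "", name)
            if (name == k) {
                val = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                gsub(/[ \t]+/, " ", val)
                result = val
            }
        }
        END { if (result != "") print result }
    ' "$file"
}

# Report keys whose running value differs from the file (run on the host itself).
check_sysctl() {
    local file="${1:-/etc/sysctl.conf}" key want have
    for key in kernel.shmmax kernel.shmmni vm.overcommit_memory; do
        want=$(sysctl_conf_value "$key" "$file")
        have=$(sysctl -n "$key" 2>/dev/null | tr -s '[:space:]' ' ' | sed 's/ $//')
        [ "$want" = "$have" ] || echo "MISMATCH $key: file=$want running=$have"
    done
}
```

If a key appears more than once in the file, the helper reports the last assignment, which is also the one that wins when the file is loaded.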
vim /etc/security/limits.conf (note: same configuration on all four servers)
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
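Set the disk read-ahead value and switch to the deadline I/O scheduler (note: same configuration on all four servers)
blockdev --setra 65536 /dev/sda
echo deadline > /sys/block/sda/queue/scheduler
Note: adjust the device name sda to match your own system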
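To confirm that the scheduler change took effect, the scheduler file can be read back: the kernel lists the available schedulers and marks the active one with square brackets. A small sketch follows; the helper name active_scheduler is hypothetical.

```shell
# Print the scheduler currently selected for a block device.
# The file content looks like "noop [deadline] cfq"; the bracketed name is active.
# Argument (optional): path to the scheduler file; defaults to the sda device.
active_scheduler() {
    local file="${1:-/sys/block/sda/queue/scheduler}"
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$file"
}

# Example, run on each server:
#   active_scheduler                 # expected to print "deadline" after the change
#   blockdev --getra /dev/sda        # reads back the read-ahead value
```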
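Software download: https://network.pivotal.io/products/pivotal-gpdb, download greenplum-db-5.21.1-rhel7-x86_64.rpm
Install the GP binaries on the Master host, i.e. the server whose hostname is gp-master. (Note: install on the master only; the remaining servers are installed in batch later.)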
rpm -ivh greenplum-db-5.21.1-rhel7-x86_64.rpm
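Note: the default installation directory is under /usr/local
Add the gpadmin user on the Master
adduser gpadmin
echo gpadmin | passwd --stdin gpadmin
Note: the password is set because gpssh-exkeys -f hostfile_allhosts needs it later
Grant the gpadmin user sudo privileges on the Master
[root@gp-master ~]# visudo
gpadmin ALL=(ALL) ALL
gpadmin ALL=(ALL) NOPASSWD:ALL
On the Master host, give the gpadmin user ownership of the Greenplum directory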
chown -R gpadmin.gpadmin /usr/local/greenplum-db*
Prepare the host files used for the batch software installation and the later cluster initialization: hostfile_allhosts, hostfile_segments and hostfile_mshosts, stored in /home/gpadmin
su - gpadmin
vim hostfile_allhosts
gp-master
gp-standby
gp-node1
gp-node2
vim hostfile_segments
gp-node1
gp-node2
vim hostfile_mshosts
gp-master
gp-standby
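The three host files contain overlapping lists, so they can be generated from one place to keep them consistent. This is a minimal sketch using the hostnames from this article; the function name write_hostfiles is an assumption made for the example.

```shell
# Write hostfile_mshosts, hostfile_segments and hostfile_allhosts,
# one hostname per line, into the given directory (default: /home/gpadmin).
write_hostfiles() {
    local dir="${1:-/home/gpadmin}"
    local masters="gp-master gp-standby"
    local segments="gp-node1 gp-node2"
    # The variables are left unquoted on purpose so that each hostname
    # becomes a separate argument and printf emits one line per host.
    printf '%s\n' $masters > "$dir/hostfile_mshosts"
    printf '%s\n' $segments > "$dir/hostfile_segments"
    printf '%s\n' $masters $segments > "$dir/hostfile_allhosts"
}

# Example, run as gpadmin on the Master:
#   write_hostfiles
```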
Set up passwordless SSH login between all hosts
gpssh-exkeys -f hostfile_allhosts
Note: you will be prompted for the gpadmin user's password, which here is gpadmin
Set ownership of the directory Greenplum is installed into
gpssh -f hostfile_allhosts
=> sudo chown gpadmin.gpadmin /usr/local
=> exit
Create the metadata directory on the master/standby hosts and assign ownership
gpssh -f hostfile_mshosts
=> sudo mkdir -p /data/greenplum_data/gpmaster
=> sudo chown -R gpadmin.gpadmin /data
=> exit
Create the data directories on the Segment hosts and assign ownership
gpssh -f hostfile_segments
=> sudo mkdir -p /data/greenplum_data/{primary,mirror}
=> sudo chown -R gpadmin.gpadmin /data
=> exit
Install the software (GP) on all hosts in one batch
cd /home/gpadmin/
source /usr/local/greenplum-db/greenplum_path.sh
gpseginstall -f hostfile_allhosts -u gpadmin -p gpadmin
Set up NTP synchronization
Install the NTP server with yum; skip this if it is already installed
sudo yum install ntp -y
If you see the following error, apply the fix in the next step
There was a problem importing one of the Python modules
required to run yum. The error leading to this problem was:
No module named yum
Please install a package which provides this module, or
verify that the module is installed correctly.
It's possible that the above module doesn't match the
current version of Python, which is:
2.7.13 (r266:84292, Jan 22 2014, 09:37:14)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
If you cannot solve this problem yourself, please go to
the yum faq at:
http://yum.baseurl.org/wiki/Faq
Fix:
unset PYTHONHOME
unset PYTHONPATH
unset LD_LIBRARY_PATH
After the yum installation finishes, restore the environment so that GP works normally again
source /usr/local/greenplum-db/greenplum_path.sh
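Note: cause of the error: once the GP cluster software is installed, the environment on the master node gains the variables PYTHONHOME, PYTHONPATH and LD_LIBRARY_PATH, and the original PATH is modified as well.
Additional note: LD_LIBRARY_PATH specifies extra directories, besides the default ones, to search for shared (dynamically linked) libraries.
On each Segment host, edit /etc/ntp.conf. Point the first server entry at the Master host and the second server entry at the Standby host, as follows: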
sudo vim /etc/ntp.conf
server gp-master prefer
server gp-standby
On the Standby host, edit /etc/ntp.conf. Point the first server entry at the Master host and the second at your data center's time server.
sudo vim /etc/ntp.conf
server gp-master prefer
On the Master host, use the NTP daemon to synchronize the system clocks of all Segment hosts. For example, with gpssh:
gpssh -f hostfile_allhosts -v -e 'ntpd'
Output like the following indicates success (the host file name and hostnames in this sample differ from the ones used above):
[root@gp-master gpadmin]# gpssh -f all_hosts -v -e 'ntpd'
[WARN] Reference default values as $MASTER_DATA_DIRECTORY/gpssh.conf could not be found
Using delaybeforesend 0.05 and prompt_validation_timeout 1.0
[Reset ...]
[INFO] login mdw
[INFO] login smdw
[INFO] login sdw1
[INFO] login sdw2
[ mdw] ntpd
[smdw] ntpd
[sdw1] ntpd
[sdw2] ntpd
[INFO] completed successfully
[Cleanup...]
Configure the Greenplum initialization file
cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpinitsystem_config
chmod 775 gpinitsystem_config
The relevant settings are as follows:
[gpadmin@gp-master ~]$ cat gpinitsystem_config
# FILE NAME: gpinitsystem_config
# Configuration file needed by the gpinitsystem
################################################
#### REQUIRED PARAMETERS
################################################
#### Name of this Greenplum system enclosed in quotes.
ARRAY_NAME="Greenplum Data Platform"
#### Naming convention for utility-generated data directories.
SEG_PREFIX=gpseg
#### Base number by which primary segment port numbers
#### are calculated.
PORT_BASE=40000
#### File system location(s) where primary segment data directories
#### will be created. The number of locations in the list dictate
#### the number of primary segments that will get created per
#### physical host (if multiple addresses for a host are listed in
#### the hostfile, the number of segments will be spread evenly across
#### the specified interface addresses).
declare -a DATA_DIRECTORY=(/data/greenplum_data/primary)
#### OS-configured hostname or IP address of the master host.
MASTER_HOSTNAME=gp-master
#### File system location where the master data directory
#### will be created.
MASTER_DIRECTORY=/data/greenplum_data/gpmaster
#### Port number for the master instance.
MASTER_PORT=5432
#### Shell utility used to connect to remote hosts.
TRUSTED_SHELL=ssh
#### Maximum log file segments between automatic WAL checkpoints.
CHECK_POINT_SEGMENTS=8
#### Default server-side character set encoding.
ENCODING=UTF-8
################################################
#### OPTIONAL MIRROR PARAMETERS
################################################
#### Base number by which mirror segment port numbers
#### are calculated.
MIRROR_PORT_BASE=43000
#### Base number by which primary file replication port
#### numbers are calculated.
REPLICATION_PORT_BASE=34000
#### Base number by which mirror file replication port
#### numbers are calculated.
MIRROR_REPLICATION_PORT_BASE=44000
#### File system location(s) where mirror segment data directories
#### will be created. The number of mirror locations must equal the
#### number of primary locations as specified in the
#### DATA_DIRECTORY parameter.
declare -a MIRROR_DATA_DIRECTORY=(/data/greenplum_data/mirror)
################################################
#### OTHER OPTIONAL PARAMETERS
################################################
#### Create a database of this name after initialization.
DATABASE_NAME=testDB
#### Specify the location of the host address file here instead of
#### with the -h option of gpinitsystem.
MACHINE_LIST_FILE=/home/gpadmin/hostfile_segments
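The comments in the configuration file state that the number of mirror locations must equal the number of primary locations. When mirrors are configured, that rule can be checked before running the initialization. The sketch below is an illustration only: check_segment_dirs is a hypothetical helper, and it relies on the file being valid bash, since it sources the file (and therefore executes it) in a separate bash process.

```shell
# Compare the number of entries in DATA_DIRECTORY and MIRROR_DATA_DIRECTORY.
# Argument: path to a gpinitsystem configuration file.
# Returns 0 when both arrays have the same length, non-zero otherwise.
check_segment_dirs() {
    # A separate bash process is used because the config file uses
    # bash arrays ("declare -a") and should not alter the calling shell.
    bash -c '
        source "$1" || exit 2
        p=${#DATA_DIRECTORY[@]}
        m=${#MIRROR_DATA_DIRECTORY[@]}
        echo "primary locations: $p, mirror locations: $m"
        [ "$p" -eq "$m" ]
    ' _ "$1"
}

# Example, run as gpadmin on the Master:
#   check_segment_dirs /home/gpadmin/gpinitsystem_config
```

The primary count printed here is also the number of primary segments created per segment host, as described in the DATA_DIRECTORY comment.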
Run the initialization utility to initialize the database
source /usr/local/greenplum-db/greenplum_path.sh
gpinitsystem -c gpinitsystem_config
Initialization log:
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Review options for gpinitstandby
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-------------------------------------------------------
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-The Master /data/master/gpseg-1/pg_hba.conf post gpinitsystem
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-has been configured to allow all hosts within this new
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-new array must be explicitly added to this file
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-located in the /usr/local/greenplum-db/./docs directory
20160827:16:23:11:002458 gpinitsystem:mdw:gpadmin-[INFO]:-------------------------------------------------------
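At this point the cluster has one master and two segments but no standby, so the next step is to add the standby to the cluster.
Run the following on the Master server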
gpinitstandby -s gp-standby
The output looks like this:
[gpadmin@mdw ~]$ gpinitstandby -s smdw
20160827:16:59:24:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Checking for filespace directory /data/master/gpseg-1 on smdw
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master hostname = mdw
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master data directory = /data/master/gpseg-1
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum master port = 5432
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master hostname = smdw
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master port = 5432
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum standby master data directory = /data/master/gpseg-1
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Greenplum update system catalog = On
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:- Filespace locations
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:------------------------------------------------------
20160827:16:59:25:023346 gpinitstandby:mdw:gpadmin-[INFO]:-pg_system -> /data/master/gpseg-1
Do you want to continue with standby master initialization? Yy|Nn (default=N):
> y
20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Syncing Greenplum Database extensions to standby
20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-The packages on smdw are consistent.
20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Adding standby master to catalog...
20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Database catalog updated successfully.
20160827:16:59:31:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Updating pg_hba.conf file...
20160827:16:59:37:023346 gpinitstandby:mdw:gpadmin-[INFO]:-pg_hba.conf files updated successfully.
20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Updating filespace flat files...
20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Filespace flat file updated successfully.
20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Starting standby master
20160827:16:59:39:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Checking if standby master is running on host: smdw in directory: /data/master/gpseg-1
20160827:16:59:40:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Cleaning up pg_hba.conf backup files...
20160827:16:59:46:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Backup files of pg_hba.conf cleaned up successfully.
20160827:16:59:46:023346 gpinitstandby:mdw:gpadmin-[INFO]:-Successfully created standby master on gp-standby
Check the running processes:
[gpadmin@gp-master ~]$ ps -ef | grep postgres
gpadmin 10975 1 0 00:57 ? 00:00:00 /usr/local/greenplum-db-5.21.1/bin/postgres -D /data/greenplum_data/gpmaster/gpseg-1 -p 5432 --gp_dbid=1 --gp_num_contents_in_cluster=2 --silent-mode=true -i -M master --gp_contentid=-1 -x 0 -E
gpadmin 10976 10975 0 00:57 ? 00:00:00 postgres: 5432, master logger process
gpadmin 10979 10975 0 00:57 ? 00:00:00 postgres: 5432, stats collector process
gpadmin 10980 10975 0 00:57 ? 00:00:01 postgres: 5432, writer process
gpadmin 10981 10975 0 00:57 ? 00:00:00 postgres: 5432, checkpointer process
gpadmin 10982 10975 0 00:57 ? 00:00:00 postgres: 5432, seqserver process
gpadmin 10983 10975 0 00:57 ? 00:00:00 postgres: 5432, ftsprobe process
gpadmin 10984 10975 0 00:57 ? 00:00:00 postgres: 5432, sweeper process
gpadmin 10985 10975 0 00:57 ? 00:00:05 postgres: 5432, stats sender process
gpadmin 10986 10975 0 00:57 ? 00:00:01 postgres: 5432, wal writer process
gpadmin 11279 10975 0 00:59 ? 00:00:00 postgres: 5432, wal sender process gpadmin 192.168.18.131(53573) streaming 0/C05A028
gpadmin 16800 16608 0 04:15 pts/0 00:00:00 grep --color=auto postgres
Set the environment variables for the gpadmin user. This is needed on both the Master and the Standby.
vim /home/gpadmin/.bashrc
[gpadmin@gp-master ~]$ cat .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
# User specific aliases and functions
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/greenplum_data/gpmaster/gpseg-1
export PGPORT=5432
export PGDATABASE=testDB
[gpadmin@gp-master ~]$ scp .bashrc gp-standby:`pwd`
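A misspelled variable name in .bashrc is easy to miss, because the shell accepts any export without complaint. A quick check after logging in again can catch that. The following is a small sketch; check_gp_env is a hypothetical helper, and the list of variables is taken from the .bashrc above plus GPHOME, which greenplum_path.sh is expected to set.

```shell
# Report which of the expected Greenplum environment variables are unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_gp_env() {
    local var val missing=0
    for var in GPHOME MASTER_DATA_DIRECTORY PGPORT PGDATABASE; do
        # Indirect lookup of the variable named in $var (POSIX-compatible).
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "not set: $var"
            missing=1
        fi
    done
    return "$missing"
}

# Example, run as gpadmin on the Master and on the Standby:
#   check_gp_env && echo "environment looks complete"
```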
Start and stop the database to check that it starts up and shuts down cleanly, using the following commands
gpstart
gpstop
Greenplum is now deployed. A few simple tests follow.
Log in to the database: psql -d postgres
Create a table, insert a row, and query it
postgres=# create table student ( no int primary key,student_name varchar(40),age int);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "student_pkey" for table "student"
CREATE TABLE
postgres=# insert into student values(1,'yayun',18);
INSERT 0 1
postgres=# select * from student;
no | student_name | age
----+--------------+-----
1 | yayun | 18
(1 row)