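When configuring OGG in a RAC environment, two conditions must be met for OGG to fail over automatically to a surviving node when a RAC node fails:
1. OGG's checkpoint, trail and BR (bounded recovery) files are placed on a shared cluster file system that every RAC node can access.
2. Cluster software must monitor the OGG processes and, when a failure occurs, automatically restart OGG on a surviving node (failover).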
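Requirement 1 is easy to get subtly wrong, for example when dirdat is a symlink to local disk. Below is a minimal Python sketch for checking it on each node. It assumes the default OGG subdirectory names dirchk (checkpoints), dirdat (trails) and BR (bounded recovery); if you have relocated any of them, pass the real names via `subdirs`. The function name is illustrative and not part of OGG or XAG.

```python
import os


def dirs_outside_mount(gg_home, mount_point, subdirs=("dirchk", "dirdat", "BR")):
    """Return the OGG subdirectories that do not resolve to a path under mount_point.

    Symlinks are resolved with os.path.realpath, so a dirdat that points to
    local disk is reported even though gg_home itself sits on the shared mount.
    Paths that do not exist yet are simply normalized.
    """
    mount = os.path.realpath(mount_point)
    outside = []
    for name in subdirs:
        path = os.path.realpath(os.path.join(gg_home, name))
        # commonpath equals the mount only if path is the mount or lies below it;
        # this also rejects look-alike prefixes such as .../data_vol10
        if os.path.commonpath([mount, path]) != mount:
            outside.append(name)
    return outside
```

With the source layout used later in this article, `dirs_outside_mount("/u01/app/grid/acfsmounts/data_vol1/ogg12", "/u01/app/grid/acfsmounts/data_vol1")` should return an empty list on every node.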
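Oracle Grid Infrastructure Standalone Agents (XAG), combined with an Oracle-supported cluster file system, can provide automatic OGG failover. This article describes the configuration steps.
To use XAG for automatic failover, the versions of the software involved must meet XAG's requirements.
As for the cluster file system, the official Oracle documentation recommends ACFS, DBFS or OCFS. I believe other cluster file systems, such as the Veritas cluster file system, should also work.
The example in this article uses ACFS.
Source database: 11.2.0.4 RAC (ASM)
Target database: 12.1.0.2 RAC (ASM)
GoldenGate: 12.2.0.1.1
Operating system: Oracle Enterprise Linux 6.5 (64-bit) on both source and target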
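XAG must be downloaded separately from the Oracle website: http://www.oracle.com/technetwork/database/database-technologies/clusterware/downloads/index.html
At the time of writing, the current version is 7 and the file is xagpack_7b.zip.
Unzip the file, then run xagsetup.sh as the Grid Infrastructure owner (usually "grid") to install it: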
[grid@rac1 xag]$ ./xagsetup.sh --install --directory /u01/app/grid/xaghome --all_nodes
Installing Oracle Grid Infrastructure Agents on: rac1
Installing Oracle Grid Infrastructure Agents on: rac2
Done.
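Install XAG on the target in the same way as on the source.
On OEL, using ACFS with 11.2.0.4 requires applying PSU 11.2.0.4.4 or later. The patching process is not covered here.
The COMPATIBLE.ASM and COMPATIBLE.ADVM attributes of the disk group used for ACFS must be set to 11.2 or higher.
Create the ACFS volume with ASMCMD or ASMCA, then create a general-purpose ACFS on it.
At this point the ACFS is not yet managed by CRS. You can view its information with the ASMCMD volinfo command or with /sbin/acfsutil registry: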
ASMCMD> volinfo -a
Diskgroup Name: DATA
Volume Name: VOLOGG1
Volume Device: /dev/asm/vologg1-426
State: ENABLED
Size (MB): 3072
Resize Unit (MB): 32
Redundancy: UNPROT
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /u01/app/grid/acfsmounts/data_vol1
[root@rac1 ~]# /sbin/acfsutil registry
Mount Object:
Device: /dev/asm/vologg1-426
Mount Point: /u01/app/grid/acfsmounts/data_vol1
Disk Group: DATA
Volume: VOLOGG1
Options: none
Nodes: all
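Before deleting the registry entry, it is worth confirming exactly which mount point is registered. Below is a minimal Python sketch that parses the `acfsutil registry` output shown above. It assumes the `Key: Value` layout of this 11.2.0.4 output, and the function names are illustrative.

```python
def parse_acfs_registry(text):
    """Parse `/sbin/acfsutil registry` output into one dict per "Mount Object:" block."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Mount Object:":
            current = {}
            entries.append(current)
        elif current is not None and ":" in line:
            # split on the first colon only, so values keep any later colons
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return entries


def registered_mount_points(text):
    """List the mount points currently in the general-purpose ACFS registry."""
    return [entry.get("Mount Point") for entry in parse_acfs_registry(text)]
```

The text can come from `subprocess.run(["/sbin/acfsutil", "registry"], capture_output=True, text=True).stdout`, run as root.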
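First, remove the entry for the ACFS we just created from the general-purpose ACFS registry, since the file system is about to be registered as a CRS resource instead: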
[root@rac1 ~]# /sbin/acfsutil registry -d /u01/app/grid/acfsmounts/data_vol1
acfsutil registry: successfully removed ACFS mount point /u01/app/grid/acfsmounts/data_vol1 from Oracle Registry
Then register it as a CRS resource with SRVCTL:
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl add filesystem -d /dev/asm/vologg1-426 -v VOLOGG1 -g DATA -m /u01/app/grid/acfsmounts/data_vol1 -u grid
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.data.vologg1.acfs
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
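To check the result of `srvctl add filesystem` and `srvctl start filesystem` from a script rather than by eye, the tabular output can be parsed. Below is a rough Python sketch, assuming the layout shown above: a resource name on its own line followed by `TARGET STATE SERVER [details]` rows. Real output is column-aligned and this simply splits on whitespace; the function names are illustrative.

```python
STATES = {"ONLINE", "OFFLINE", "INTERMEDIATE", "UNKNOWN"}


def parse_crs_table(text):
    """Map each resource in `crsctl status resource -t` output to its rows.

    Each row becomes (target, state, server, details). Assumes every row names
    a server, which holds for the 11.2 "Local Resources" section shown above.
    """
    table, current = {}, None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or set(line.strip()) == {"-"}:
            continue                          # blank line or separator
        if len(tokens) == 1:
            current = tokens[0]               # resource name line
            table[current] = []
        elif current and tokens[0] in STATES and len(tokens) >= 3:
            target, state, server = tokens[:3]
            table[current].append((target, state, server, " ".join(tokens[3:])))
    return table


def nodes_in_state(table, resource, state):
    """Return the servers on which resource is currently in the given state."""
    return [server for _, st, server, _ in table.get(resource, []) if st == state]
```

Right after `srvctl add filesystem`, `nodes_in_state(table, "ora.data.vologg1.acfs", "OFFLINE")` lists both nodes; after the resource is started it should move to the ONLINE list on each node.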
Start the resource manually (this mounts the ACFS):
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl start filesystem -d /dev/asm/vologg1-426
[root@rac1 ~]#
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.data.vologg1.acfs
ONLINE ONLINE rac1 mounted on /u01/app/grid/acfsmounts/data_vol1
ONLINE ONLINE rac2 mounted on /u01/app/grid/acfsmounts/data_vol1
[root@rac1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rac1-lv_root 45G 32G 12G 74% /
tmpfs 2.0G 437M 1.6G 23% /dev/shm
/dev/sda1 477M 55M 397M 13% /boot
/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
[root@rac2 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rac1-lv_root 45G 25G 19G 58% /
tmpfs 2.0G 440M 1.6G 23% /dev/shm
/dev/sda1 477M 55M 397M 13% /boot
/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
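The mount should show the same ADVM volume device on every node. Below is a minimal Python sketch that extracts the device for a given mount point from `df` output. It assumes one line per filesystem, which is why it is meant for `df -hP` (plain `df -h` can wrap long device names onto a second line). The function name is illustrative.

```python
def mount_source(df_output, mount_point):
    """Return the filesystem column for mount_point in `df -hP` output, or None.

    Assumes one line per filesystem and a mount point without spaces.
    """
    for line in df_output.splitlines()[1:]:   # skip the header line
        fields = line.split()
        if len(fields) >= 6 and fields[-1] == mount_point:
            return fields[0]
    return None
```

Run it against `subprocess.run(["df", "-hP"], capture_output=True, text=True).stdout` on rac1 and rac2; both should report `/dev/asm/vologg1-426` for `/u01/app/grid/acfsmounts/data_vol1`.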
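The main difference when creating ACFS on 12c compared with 11g is that there is no longer a choice between a general-purpose file system and a database home file system; instead, creation generates a script that registers the file system with CRS.
Run the generated script to complete registration and mounting: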
[root@oel65vm11 scripts]# ./acfs_script.sh
ACFS file system /u01/app/grid/acfsmounts/ogg_vol1 is mounted on nodes oel65vm11,oel65vm12
Check the resource information:
[root@oel65vm11 bin]# ./crsctl status resource -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.VOLOGG2.advm
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.DATA.dg
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.asm
ONLINE ONLINE oel65vm11 Started,STABLE
ONLINE ONLINE oel65vm12 Started,STABLE
ora.data.vologg2.acfs
ONLINE ONLINE oel65vm11 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE
ONLINE ONLINE oel65vm12 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE
ora.net1.network
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.ons
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
Note: SELinux must be disabled on all nodes; otherwise ACFS write-permission errors will occur.
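Because the symptom is an ACFS write error that can show up much later, it is worth checking the SELinux setting on every node before installing OGG. Below is a minimal Python sketch that reads the SELINUX= line from the text of /etc/selinux/config. Note that this file only takes effect at the next boot, while `getenforce` shows the mode currently in effect. The function names are illustrative.

```python
def selinux_config_mode(config_text):
    """Return the SELINUX= value from the text of /etc/selinux/config, or None.

    This is the mode applied at the next boot; `getenforce` reports the
    mode currently in effect.
    """
    mode = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "SELINUX":          # SELINUXTYPE= is a different key
            mode = value.strip().strip('"').lower()   # last assignment wins
    return mode


def selinux_ok_for_acfs(config_text):
    """True when SELinux is configured as disabled, as this article requires."""
    return selinux_config_mode(config_text) == "disabled"
```

On each node, `selinux_ok_for_acfs(open("/etc/selinux/config").read())` should be True, and `getenforce` should report Disabled after the reboot.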
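This OGG release supports both 11g and 12c databases; in the graphical installer, you can choose the OGG build that matches your database version.
Install OGG on the ACFS created earlier:
Source installation location: /u01/app/grid/acfsmounts/data_vol1/ogg12
Target installation location: /u01/app/grid/acfsmounts/ogg_vol1/ogg12
Select the option to start the Manager process automatically.
- Put the source database in ARCHIVELOG mode (steps omitted).
- On the source database, enable supplemental logging and force logging, and set the required parameter: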
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
Database altered.
SQL> ALTER DATABASE FORCE LOGGING;
Database altered.
SQL> SELECT supplemental_log_data_min, force_logging FROM v$database;
SUPPLEME FORCE_LOGGING
-------- ---------------------------------------
YES YES
SQL> ALTER SYSTEM SWITCH LOGFILE;
System altered.
SQL> alter system set ENABLE_GOLDENGATE_REPLICATION=true;
System altered.
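- Create the OGG database user on both the source and the target and grant it privileges. In my example the user is GGADM.
For the privileges the OGG user needs, see section 4.1.4.1, "Oracle 11.2.0.4 or Later Database Privileges", of the online guide "Installing and Configuring Oracle GoldenGate for Oracle Database 12c (12.2.0.1)". For convenience in this test, we grant the user the DBA role, plus the privileges granted through a dedicated system package: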
SQL> BEGIN
  dbms_goldengate_auth.grant_admin_privilege
  (
    grantee                 => 'GGADM',
    privilege_type          => 'CAPTURE',
    grant_select_privileges => TRUE
  );
END;
/
PL/SQL procedure successfully completed.
- Log in to the database:
GGSCI (rac1.hthorizontest.com) 1> dblogin userid ggadm password ggadm
Successfully logged into database.
- Register the integrated Extract:
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 2> register extract ext1 database;
2016-04-07 23:44:38 INFO OGG-02003 Extract EXT1 successfully registered with database at SCN 1291634.
- Add the Extract process:
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 3> ADD EXTRACT ext1 INTEGRATED TRANLOG, BEGIN NOW
EXTRACT (Integrated) added.
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 4> ADD EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et, EXTRACT ext1
EXTTRAIL added.
- Add the data pump process:
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 5> ADD EXTRACT pump1 EXTTRAILSOURCE /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et
EXTRACT added.
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 6> EDIT PARAMS EXT1
Add the following:
EXTRACT ext1
USERID ggadm, PASSWORD ggadm
TRANLOGOPTIONS INTEGRATED PARAMS (MAX_SGA_SIZE 100)
EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et
TABLE test.*;
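The EXTTRAIL in the Extract parameter file should name exactly the trail registered with ADD EXTTRAIL on the shared ACFS. Below is a minimal Python sketch for checking this from the parameter file text. It only handles one keyword per line and `--` comments, and the helper names are hypothetical.

```python
def param_args(params_text, keyword):
    """Return the argument text of every line whose first word is keyword.

    Keywords are matched case-insensitively; lines starting with "--"
    (OGG parameter-file comments) are ignored and a trailing ";" is dropped.
    """
    results = []
    for raw in params_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("--"):
            continue
        parts = line.split(None, 1)
        if parts[0].upper() == keyword.upper():
            rest = parts[1] if len(parts) > 1 else ""
            results.append(rest.rstrip().rstrip(";").rstrip())
    return results


def trail_matches(params_text, added_trail):
    """True if the file names exactly one EXTTRAIL and it equals added_trail."""
    return param_args(params_text, "EXTTRAIL") == [added_trail]
```

For EXT1 above, `trail_matches(text, "/u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et")` should be True.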
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 7> EDIT PARAMS PUMP1
Add the following:
EXTRACT pump1
USERID ggadm, PASSWORD ggadm
RMTHOST 192.168.0.11, MGRPORT 7809
RMTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt
TABLE TEST.*;
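The host the pump sends to matters for failover: it should point at a VIP rather than a physical node, which is what the RMTHOST change later in this article does for the target. Below is a minimal Python sketch that reads the host and manager port from the RMTHOST line. The regular expression assumes the `RMTHOST host, MGRPORT port` form used here, and the function name is illustrative.

```python
import re

# RMTHOST <host>[,] MGRPORT <port>; matched case-insensitively because
# OGG parameter keywords are not case-sensitive.
_RMTHOST = re.compile(
    r"^\s*RMTHOST\s+([^\s,]+)\s*,?\s*MGRPORT\s+(\d+)",
    re.IGNORECASE | re.MULTILINE,
)


def rmthost(params_text):
    """Return (host, manager_port) from the first RMTHOST line, or None."""
    match = _RMTHOST.search(params_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

With the PUMP1 file above this returns `("192.168.0.11", 7809)`; once the pump is repointed to the target VIP it should return `("192.168.0.26", 7809)`.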
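Then start all the processes.
On 11.2.0.4, when using integrated capture, starting the Extract reports that patch 17030189 must be installed, mainly because integrated capture needs to modify data dictionary tables.
However, after a PSU has been applied, this patch sometimes conflicts with other patches; in that case you can run prvtlmpg.plb manually instead to solve the problem.
(See EXTRACT Abending With OGG-02912 (Doc ID 2091679.1).)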
GGSCI (oel65vm11.hthorizon.com) 8> dblogin userid ggadm password ggadm
Successfully logged into database.
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 9> ADD CHECKPOINTTABLE ggadm.checkpointtab
Successfully created checkpoint table ggadm.checkpointtab
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 10> ADD REPLICAT rep1, EXTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt checkpointtable ggadm.checkpointtab
REPLICAT added.
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 11> EDIT PARAMS REP1
Add the following:
REPLICAT rep1
USERID ggadm, PASSWORD ggadm
ASSUMETARGETDEFS
DISCARDFILE /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirrpt/rep1.dsc, PURGE
MAP TEST.* TARGET TEST.*;
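Then start the process and check that OGG replication works.
To let the OGG Manager process start the replication processes automatically, add the following to the Manager parameter file: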
AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60
AUTOSTART ER *
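Restart the Manager process for the change to take effect. AUTOSTART ER * starts all Extract and Replicat groups when Manager starts; AUTORESTART ER * lets Manager restart a failed group up to 5 times, waiting 1 minute between attempts, and resets the retry count after 60 minutes.
Make this change on both the source and the target.
- Add the application VIP (as root)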
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.36 -vipname=xag.gg_1-vip.vip -user=oracle
- Allow the grid user to start the resource (as root)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x
- Start the VIP (as grid)
[root@rac1 ~]# su - grid
[grid@rac1 ~]$ /u01/app/11.2.0/grid/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'rac1'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'rac1' succeeded
- Check the status
[grid@rac1 ~]$ crsctl status resource xag.gg_1-vip.vip
NAME=xag.gg_1-vip.vip
TYPE=app.appvip_net1.type
TARGET=ONLINE
STATE=ONLINE on rac1
- Create the CRS resource for OGG (as root)
[root@rac1 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_1 --gg_home /u01/app/grid/acfsmounts/data_vol1/ogg12 --instance_type source --nodes rac1,rac2 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg1.acfs --databases ora.tdb.db --oracle_home /u01/app/oracle/product/11.2.0/dbhome_1 --monitor_extracts ext1,pump1
[root@rac1 ~]# cd /u01/app/grid/xaghome/bin
[root@rac1 bin]# ./agctl status goldengate gg_1
Goldengate instance 'gg_1' is not running
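Both `agctl add goldengate` commands in this article are long single lines that are easy to mistype. Below is a minimal Python sketch that assembles the same command as an argument list and, by default, only prints it (dry run). It uses only the agctl options that appear in this article; the helper names and the AGCTL path constant are assumptions taken from this setup. Run it as root, as in the transcript.

```python
import shlex
import subprocess

AGCTL = "/u01/app/grid/xaghome/bin/agctl"   # XAG home used in this article


def agctl_add_goldengate(name, gg_home, instance_type, nodes, vip_name,
                         filesystem, database, oracle_home,
                         extracts=(), replicats=()):
    """Build the `agctl add goldengate` argument list used in this article."""
    cmd = [AGCTL, "add", "goldengate", name,
           "--gg_home", gg_home,
           "--instance_type", instance_type,
           "--nodes", ",".join(nodes),
           "--vip_name", vip_name,
           "--filesystems", filesystem,
           "--databases", database,
           "--oracle_home", oracle_home]
    if extracts:
        cmd += ["--monitor_extracts", ",".join(extracts)]
    if replicats:
        cmd += ["--monitor_replicats", ",".join(replicats)]
    return cmd


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)     # raises on a non-zero exit status
```

Keeping the source (`instance_type="source"`, `extracts=["ext1", "pump1"]`) and target (`instance_type="target"`, `replicats=["rep1"]`) definitions side by side makes differences such as the VIP name or the ACFS resource easy to review before anything is registered.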
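- Grant grid permission to manage the resource
The command above automatically creates a CRS resource for OGG; the grid user must be granted permission to manage it:
[root@rac1 bin]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1.goldengate -u user:grid:r-x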
On the target, the process is similar to the source.
- Create the VIP resource:
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.26 -vipname=xag.gg_1-vip.vip -user=oracle
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm12'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl relocate resource xag.gg_1-vip.vip -n oel65vm11
CRS-2673: Attempting to stop 'xag.gg_1-vip.vip' on 'oel65vm12'
CRS-2677: Stop of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm11'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm11' succeeded
- Create the CRS resource for OGG
[root@oel65vm11 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_2 --gg_home /u01/app/grid/acfsmounts/ogg_vol1/ogg12 --instance_type target --nodes oel65vm11,oel65vm12 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg2.acfs --databases ora.racdb.db --oracle_home /u01/app/oracle/product/12.1.0/dbhome_1 --monitor_replicats rep1
- Grant permission
[root@oel65vm11 bin]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_2.goldengate -u user:grid:r-x
On the source, change the PUMP process's remote host (RMTHOST) to the target VIP we just created:
RMTHOST 192.168.0.26, MGRPORT 7809
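Restart the PUMP process.
In GGSCI, stop all processes on both the source and the target, so that XAG can start them.
- Start the target resource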
[grid@oel65vm11 ~]$ cd $ORACLE_BASE
[grid@oel65vm11 grid]$ cd xaghome/bin
[grid@oel65vm11 bin]$ ./agctl start goldengate gg_2 --node oel65vm11
[grid@oel65vm11 bin]$ crsctl status resource xag.gg_2.goldengate
NAME=xag.gg_2.goldengate
TYPE=xag.goldengate.type
TARGET=ONLINE
STATE=ONLINE on oel65vm11
- Start the source resource
[grid@rac1 bin]$ cd $ORACLE_BASE
[grid@rac1 grid]$ cd xaghome/bin
[grid@rac1 bin]$ ./agctl start goldengate gg_1 --node rac1
[grid@rac1 bin]$ crsctl status resource xag.gg_1.goldengate
NAME=xag.gg_1.goldengate
TYPE=xag.goldengate.type
TARGET=ONLINE
STATE=ONLINE on rac1
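The per-resource `crsctl status resource <name>` output is a set of KEY=VALUE lines, which makes it easy to check from a script where XAG started OGG. Below is a minimal Python sketch, assuming a single-instance resource (a multi-instance resource can list several "ONLINE on" entries in STATE). The function names are illustrative.

```python
def crs_attributes(text):
    """Parse KEY=VALUE lines from `crsctl status resource <name>` into a dict."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:                               # ignore lines without "="
            attrs[key] = value
    return attrs


def online_node(text):
    """Return the node a single-instance resource is ONLINE on, else None."""
    state = crs_attributes(text).get("STATE", "")
    prefix = "ONLINE on "
    return state[len(prefix):] if state.startswith(prefix) else None
```

For the source output above, `online_node(...)` returns `"rac1"`; for the target it should return `"oel65vm11"`.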
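After startup, check the process status in GGSCI. If all the processes have started automatically, the configuration is working.
Test switching the source over to the other node with the following command:
[grid@rac1 bin]$ ./agctl relocate goldengate gg_1 --node rac2
[grid@rac1 bin]$ crsctl status resource -t
......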
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
......
xag.gg_1-vip.vip
1 ONLINE ONLINE rac2
xag.gg_1.goldengate
1 ONLINE ONLINE rac2
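When the switchover test is scripted, it is convenient to wait until CRS reports the resource on the expected node instead of rerunning `crsctl` by hand. Below is a minimal Python polling sketch. It assumes `crsctl` is on the PATH (as in the grid user's shell above) and that the resource is single-instance, so its state line reads `STATE=ONLINE on <node>`. The timeout and interval defaults are arbitrary, and the function names are illustrative.

```python
import subprocess
import time


def current_node(resource, crsctl="crsctl"):
    """Return the node resource is ONLINE on, per `crsctl status resource`."""
    out = subprocess.run([crsctl, "status", "resource", resource],
                         capture_output=True, text=True, check=True).stdout
    prefix = "STATE=ONLINE on "
    for line in out.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def wait_until_on(resource, node, timeout=300, interval=5, probe=current_node):
    """Poll until resource is ONLINE on node; return False once timeout (s) passes."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(resource) == node:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

After the relocate above, both `wait_until_on("xag.gg_1.goldengate", "rac2")` and `wait_until_on("xag.gg_1-vip.vip", "rac2")` should return True. The `probe` parameter exists so the loop can be exercised without a cluster.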
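Next, a power-failure test: we shut down the target host oel65vm11 by cutting its power.
On host oel65vm12, we can see that the RAC VIP has failed over to this node, and that the OGG VIP and the gg_2 resource have also failed over to it automatically:
[grid@oel65vm12 ~]$ crsctl status resource -t
......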
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
......
ora.oel65vm11.vip
1 ONLINE INTERMEDIATE oel65vm12 FAILED OVER,STABLE
ora.oel65vm12.vip
1 ONLINE ONLINE oel65vm12 STABLE
ora.racdb.db
1 ONLINE OFFLINE STABLE
2 ONLINE ONLINE oel65vm12 Open,STABLE
ora.scan1.vip
1 ONLINE ONLINE oel65vm12 STABLE
xag.gg_1-vip.vip
1 ONLINE ONLINE oel65vm12 STABLE
xag.gg_2.goldengate
1 ONLINE ONLINE oel65vm12 STABLE
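After a hard power-off, the thing to confirm is that every resource OGG depends on (the database instance, the OGG VIP and the XAG resource) is now ONLINE on the surviving node. Below is a rough Python sketch that summarizes the "Cluster Resources" section of `crsctl status resource -t`. It splits on whitespace, which is only a heuristic because the real output is column-aligned, and it assumes OFFLINE rows in this section carry no server name, as in the output above. The function names are illustrative.

```python
STATES = {"ONLINE", "OFFLINE", "INTERMEDIATE", "UNKNOWN"}


def summarize_cluster_resources(text):
    """Map each resource to [(state, server_or_None, details), ...].

    Written for the "Cluster Resources" section, where each row starts with an
    instance number. Separator lines and "......" elisions are skipped.
    """
    summary, current = {}, None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or set(line.strip()) <= set("-."):
            continue
        if len(tokens) == 1:
            current = tokens[0]               # resource name line
            summary[current] = []
            continue
        if tokens[0].isdigit():
            tokens = tokens[1:]               # drop the instance number
        if current is None or len(tokens) < 2 or tokens[0] not in STATES or tokens[1] not in STATES:
            continue
        state, rest = tokens[1], tokens[2:]
        # an OFFLINE instance in this section has no server column
        server = rest.pop(0) if rest and state != "OFFLINE" else None
        summary[current].append((state, server, " ".join(rest)))
    return summary


def failed_over(summary):
    """Resources with at least one instance reported as FAILED OVER."""
    return [name for name, rows in summary.items()
            if any("FAILED OVER" in details for _, _, details in rows)]
```

For the output above, every `xag.*` resource should show a single ONLINE row on oel65vm12, and `failed_over(...)` should list only the node VIP of the host that was powered off.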
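The above is only the simplest possible example and does not consider more complex situations, such as also deploying the JAgent monitoring agent, or downstream replication. Real production environments are usually much more complex than this example.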