OGG ora-01403错误的快捷处理方案

OGG运维中有一个经典错误-1403。现象是目标端复制update或者delete操作导致复制进程abended,原因是update或delete时找不到目标数据。至于该数据为什么不在目标端有很多可能,比如人为删除、trigger没有禁用导致删除、级联外键删除没有禁用导致删除等等。通常我们的排查手段是确认目标端的trigger、级联外键删除、job是否启动了?如果启动了禁用它。然后再排查源端表是否有主键,主键在trandata中是否生效。上述排查都没有问题的话就开始做表级初始化吧,数据泵导出导入,同步变化…

但是有时候我们也可以不这么折腾,可以采取“补缺”的方式让复制进程迅速恢复。思路如下:

  1. 通过目标端ggserr日志和replcat.dsc文件来定位丢失的数据
  2. 在源端使用database link执行insert into 目标端 select * from 源表 where=(步骤一确认的条件)的方式来手工补缺。
  3. 启动复制进程,复制进程会重新操作abended之前失败的操作。

下面通过一个实验来演示上述过程

  1. source插入第一条测试数据
    Insert into FM_TAX_RATE_TEST (TEST_ID,COUNTRY, STATE, TAX_TYPE, TAX_RATE)
    Values (1,‘CN’, ‘68’, ‘WT3’, 0.0015);

commit;

  1. target确认同步
    select * from fm_tax_rate_test;

COUNTR STAT TAX_TY TAX_RATE TEST_ID


CN 68 WT3 .0015 1

  1. target删除复制记录,人为制造1403错误
    delete from fm_tax_rate_test where test_id=1;

commit;

  1. source对第一条测记录执行update操作会导致target复制进程中断。中断原因是update语句中的where字句定位的数据在target端不存在,因为我刚刚手工删除了这条记录。
    update FM_TAX_RATE_TEST set country=‘US’ where test_id=1;

commit;

此时target端已经中断,在source增加数据变化,期待target重启后会应用这些故障后产生的变化。
Insert into FM_TAX_RATE_TEST (TEST_ID,COUNTRY, STATE, TAX_TYPE, TAX_RATE)
Values (2,‘TW’, ‘68’, ‘WT3’, 0.0015);

Insert into FM_TAX_RATE_TEST (TEST_ID,COUNTRY, STATE, TAX_TYPE, TAX_RATE)
Values (3,‘JP’, ‘68’, ‘WT3’, 0.0015); 2

commit;

target复制进程中断
GGSCI (cdbsym3) 6> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
REPLICAT RUNNING REPSYM 00:00:00 00:00:02
REPLICAT ABENDED REPSYM_T 00:10:20 00:00:01

target端ggserr.log中错误信息片段
2015-03-31 13:50:26 WARNING OGG-01004 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: Aborted grouped transaction on ‘OGG_TEST.FM_TAX_RATE_TEST’, Database error 1403 (OCI Error ORA-01403: no data found, SQL <UPDATE “OGG_TEST”.“FM_TAX_RATE_TEST” SET “COUNTRY” = :a1,“STATE” = :a2,“TAX_TYPE” = :a3,“TAX_RATE” = :a4,“TEST_ID” = :a5 WHERE “TEST_ID” = :b0>).
2015-03-31 13:50:26 WARNING OGG-01003 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: Repositioning to rba 170249512 in seqno 12.
2015-03-31 13:50:26 WARNING OGG-01154 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: SQL error 1403 mapping OGG_TEST.FM_TAX_RATE_TEST to OGG_TEST.FM_TAX_RATE_TEST OCI Error ORA-01403: no data found, SQL .
2015-03-31 13:50:26 WARNING OGG-01003 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: Repositioning to rba 170249512 in seqno 12.
2015-03-31 13:50:26 ERROR OGG-01296 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: Error mapping from OGG_TEST.FM_TAX_RATE_TEST to OGG_TEST.FM_TAX_RATE_TEST.
2015-03-31 13:50:26 ERROR OGG-01668 Oracle GoldenGate Delivery for Oracle, repsym_t.prm: PROCESS ABENDING.

target端discard文件中记录了test_id=1的数据执行udpate失败

more repsym_t.dsc
Oracle GoldenGate Delivery for Oracle process started, group REPSYM_T discard file opened: 2015-03-31 13:50:25

Current time: 2015-03-31 13:50:26
Discarded record from action ABEND on error 1403

OCI Error ORA-01403: no data found, SQL
RATE" = :a4,“TEST_ID” = :a5 WHERE “TEST_ID” = :b0>
Aborting transaction on ./dirdat/yt beginning at seqno 12 rba 170249512
error at seqno 12 rba 170249512
Problem replicating OGG_TEST.FM_TAX_RATE_TEST to OGG_TEST.FM_TAX_RATE_TEST
Record not found
Mapping problem with compressed key update record (target format)…
*
TEST_ID = 1
COUNTRY = US
STATE = 68
TAX_TYPE = WT3
TAX_RATE = .00150000
TEST_ID = 1

这时候很多运维人员最常用的就是按照csn一致性导出source表,重新初始化target端数据不一致的表。在使用下面的方式来修改复制进程参数文件,重启复制进程追进度。
map schema.table, target schema.table, filter (@GETENV (“TRANSACTION”, “CSN”) > 9527);
如果同步的表比较大,这个过程会很漫长。

如果只是缺少那么几条数据,别人被认为误删除了造成的,也需要这么大动干戈处理么?其实可以用个简单的方法来处理,在源库创建一个database link,将target端缺少的数据手工insert过去补全这个漏洞,然后启动复制进程。复制进程会再次尝试失败的update语句,where字句锁定刚才手工插入的数据,修改成功。 复制进程继续应用source端数据变化。

  1. 源端创建database link。其中SERVICE_NAME = data为target数据库的SID
    5-1 在tnsnames.ora中添加target端数据库的字符串
    to19 =
    (DESCRIPTION =
    (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.78.2.19)(PORT = 1553))
    )
    (CONNECT_DATA =
    (SERVICE_NAME = data)
    )
    )

5-2 创建database link指向target数据库; 其中ogg_test为target数据库的schema。
create public databbase link to19 connect to ogg_test identified by ogg_test;

5-3 通过database link手工同步丢失语句。其中select语句是源表的数据,insert into是目标数据库。
insert into [email protected] select * from ogg_test.fm_tax_rate_test where test_id=1;

  1. target启动复制进程
    GGSCI (cdbsym3) 4> start repsym

Sending START request to MANAGER …
REPLICAT REPSYM starting

GGSCI (cdbsym3) 5> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
REPLICAT RUNNING REPSYM 00:00:00 00:00:00
REPLICAT RUNNING REPSYM_T 00:00:00 00:00:01

数据变化已经被应用到复制端了
GGSCI (cdbsym3) 8> stats repsym total table dbp.rb_restraints

Sending STATS request to REPLICAT REPSYM …

Start of Statistics at 2015-03-31 11:09:14.

Replicating from SYMBOLS.RB_RESTRAINTS to DBP.RB_RESTRAINTS:

*** Total statistics since 2015-03-31 11:08:13 ***
Total inserts 1.00
Total updates 4.00
Total deletes 0.00
Total discards 0.00
Total operations 5.00

End of Statistics.

  1. 在数据库中查看复制进程启动后的数据变化
    [email protected]> select * from ogg_test.fm_tax_rate_test;

COUNTR STAT TAX_TY TAX_RATE TEST_ID


US 68 WT3 .0015 1
TW 68 WT3 .0015 2
JP 68 WT3 .0015 3

其中第一条数据就是我们通过手工同步的数据,后面两条数据是故障之后的数据变化。

注意:如果手工同步之前源表的数据也执行delete操作就无法通过isnert into select 的方式获取并同步到target端了。祈祷吧

该方案在特定条件下还是很好用的,节省了很多时间。祝同学们好运

还有一个办法

复制进程配置文件增加
HANDLECOLLISIONS
在这里插入图片描述