"[Oracle][ODBC SQL Server Wire Protocol driver][SQL Server] 'RECOVER'" errors appear in the database alert.log

Date: 2019-12-11

Symptoms:

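(1). The database ran distributed transactions through a transparent gateway. After the gateway was decommissioned, the failed distributed transactions were never cleaned up.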

(2). Database alert log

Thu Sep 06 06:53:00 2018

Errors in file /u01/app/oracle/diag/rdbms/zszdb/ZSZDB/trace/ZSZDB_reco_12245.trc:

ORA-01017: invalid username/password; logon denied

[Oracle][ODBC SQL Server Wire Protocol driver][SQL Server] 'RECOVER' ʧ {28000,NativeErr = 18456}

ORA-02063: preceding 2 lines from MSQL

(3). Trace file of the database RECO process

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

With the Partitioning, Automatic Storage Management, OLAP, Data Mining

and Real Application Testing options

ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1

System name: Linux

Node name: zszdb

Release: 2.6.27.19-5-default

Version: #1 SMP 2009-02-28 04:40:21 +0100

Machine: x86_64

Instance name: ZSZDB

Redo thread mounted by this instance: 1

Oracle process number: 19

Unix process pid: 12245, image: oracle@zszdb (RECO)

*** 2018-09-06 06:06:59.158

*** SESSION ID:(325.1) 2018-09-06 06:06:59.158

*** CLIENT ID:() 2018-09-06 06:06:59.158

*** SERVICE NAME:(SYS$BACKGROUND) 2018-09-06 06:06:59.158

*** MODULE NAME:() 2018-09-06 06:06:59.158

*** ACTION NAME:() 2018-09-06 06:06:59.158

ERROR, tran=9.13.220456, session#=1, ose=0:

ORA-01017: invalid username/password; logon denied

[Oracle][ODBC SQL Server Wire Protocol driver][SQL Server] 'RECOVER' ʧ {28000,NativeErr = 18456}

ORA-02063: preceding 2 lines from MSQL
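
When the trace file is long, the affected transaction IDs and the database link involved can be pulled out with a small script. The following is only a minimal sketch in Python: the regular expressions are modelled on the two lines shown in the trace above, and the function name summarize_reco_trace is made up for this illustration. Other Oracle versions may format these lines differently.

```python
import re

# Modelled on the RECO error header shown above, e.g.
# "ERROR, tran=9.13.220456, session#=1, ose=0:"
TRAN_RE = re.compile(r"ERROR, tran=(\d+\.\d+\.\d+),")
# Modelled on the ORA-02063 line, which names the database link.
LINK_RE = re.compile(r"ORA-02063: preceding \d+ lines? from (\S+)")


def summarize_reco_trace(text):
    """Return (transaction IDs, database link names) found in the trace text.

    Both lists are de-duplicated and sorted. The function name is hypothetical.
    """
    trans = sorted(set(TRAN_RE.findall(text)))
    links = sorted(set(LINK_RE.findall(text)))
    return trans, links


# Sample input copied from the trace excerpt above.
sample = """\
ERROR, tran=9.13.220456, session#=1, ose=0:
ORA-01017: invalid username/password; logon denied
ORA-02063: preceding 2 lines from MSQL
"""

trans, links = summarize_reco_trace(sample)
print("transactions:", trans)
print("database links:", links)
```

The transaction IDs found this way can then be compared with LOCAL_TRAN_ID in dba_2pc_pending.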

Root cause:

In an Oracle database, the RECO process automatically resolves failed distributed transactions. The RECO process on a node connects automatically to the databases that hold an in-doubt distributed transaction. Once RECO has established the database connection, it resolves the in-doubt distributed transactions and deletes the resolved transactions from the pending transaction table.

(In a distributed database, the recoverer process (RECO) automatically resolves failures in distributed transactions. The RECO process of a node automatically connects to other databases involved in an in-doubt distributed transaction. When RECO reestablishes a connection between the databases, it automatically resolves all in-doubt transactions, removing from each database's pending transaction table any rows that correspond to the resolved transactions.)

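In this failure scenario, the database ran distributed transactions through the transparent gateway, but after the gateway was decommissioned the failed distributed transactions were not cleaned up.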

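For example, suppose a distributed transaction fails during the PREPARE phase.

On the local side, the query SQL> select local_tran_id,state from dba_2pc_pending; then returns a result similar to the following: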

LOCAL_TRAN_ID STATE

---------------------- ----------------

2.12.64845 collecting

On the remote side, the query SQL> select local_tran_id,state from dba_2pc_pending; returns a result similar to the following:

no rows selected

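This means that the local database asked the other nodes to prepare for a commit or rollback and was still "collecting" their responses when the error occurred, so the state on the remote database is unknown (in doubt).

RECO keeps retrying automatically to resolve a distributed transaction that can no longer be resolved. Every attempt to connect over the MSQL database link fails with ORA-01017, which is why the same errors are written to the database alert.log over and over.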

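Resolution steps:

To stop RECO from endlessly retrying a distributed transaction that cannot be resolved, the pending transaction must be purged on the local side, which acts as the global coordinator.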

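In the example scenario above, the distributed transaction failed during the PREPARE phase. Log in to the local database as the SYS user and run the following cleanup command.

SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');

Here, local_tran_id is the transaction ID on the local side, that is, the LOCAL_TRAN_ID value returned by dba_2pc_pending (2.12.64845 in the example above).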
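
If several entries are pending, the cleanup commands can be generated from the list of transaction IDs instead of being typed by hand. The sketch below is only an illustration in Python: the helper name purge_statements is made up, the script only prints SQL*Plus commands and does not connect to any database, and the COMMIT after each purge is an assumption based on common practice rather than something stated in this article.

```python
import re

# LOCAL_TRAN_ID values look like "2.12.64845": three dot-separated integers.
TRAN_ID_RE = re.compile(r"\d+\.\d+\.\d+")


def purge_statements(local_tran_ids):
    """Build SQL*Plus commands that purge each pending entry.

    Hypothetical helper: it validates the ID format and returns text only.
    """
    lines = []
    for tran_id in local_tran_ids:
        tran_id = tran_id.strip()
        if not TRAN_ID_RE.fullmatch(tran_id):
            raise ValueError(f"unexpected LOCAL_TRAN_ID format: {tran_id!r}")
        lines.append(
            f"execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('{tran_id}');"
        )
        # Assumption: commit after each purge so the cleanup is made permanent.
        lines.append("commit;")
    return lines


# Example using the LOCAL_TRAN_ID from the dba_2pc_pending output above.
for line in purge_statements(["2.12.64845"]):
    print(line)
```

The generated commands should be reviewed and run as SYS on the local database, and only for entries that RECO cannot resolve on its own.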

References:

(1).http://blog.sina.com.cn/s/blog_6cfadffb0100m48t.html