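Environment: 3 virtual machines, RHEL 7.3 + Oracle RAC 11.2.0.4
Symptom: RAC is running normally and the ASM disk groups use Normal redundancy. After a node host is rebooted, the ASM disks that went offline can be brought back online directly, as long as this is done within a short window.
An earlier article, 《测试一体机ASM failgroup的相关问题处理》 (on handling ASM failgroup problems on a test appliance), described the scenario where the disks have to be re-added. In fact, if the failure is noticed in time (within 3.6 hours by default, which is the default value of the disk_repair_time attribute), the affected disks can simply be brought online. The situation then looks like this: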
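To see how much of that window is left, the repair timer can be queried from the ASM instance. The following is a minimal sketch, not the author's procedure: the function names repair_timer_sql and show_repair_timer are made up for illustration, and it assumes the script runs as the Grid Infrastructure owner with ORACLE_HOME and ORACLE_SID pointing at the local ASM instance. v$asm_attribute only lists attributes for disk groups whose compatible.asm is 11.1 or higher.

```shell
#!/usr/bin/env bash
# Sketch: show the configured disk_repair_time per disk group and the
# seconds remaining (REPAIR_TIMER) for each offline ASM disk.
# Assumption: environment is set for the local ASM instance (e.g. +ASM1).

# Hypothetical helper: print the SQL so it can be reviewed or reused.
repair_timer_sql() {
  cat <<'EOF'
set linesize 200 pagesize 100
col dg_name     format a12
col repair_time format a12
col disk_name   format a15
col failgroup   format a15
-- Configured repair window per disk group (default is 3.6h).
select dg.name as dg_name, a.value as repair_time
  from v$asm_diskgroup dg
  join v$asm_attribute a on a.group_number = dg.group_number
 where a.name = 'disk_repair_time';
-- Seconds left before each offline disk is dropped automatically.
select name as disk_name, failgroup, mode_status, repair_timer
  from v$asm_disk
 where mode_status = 'OFFLINE';
exit
EOF
}

# Hypothetical helper: run the SQL against the local ASM instance.
show_repair_timer() {
  repair_timer_sql | sqlplus -S / as sysasm
}
```

Calling show_repair_timer on any surviving node prints both result sets. A REPAIR_TIMER value close to 0 means the disk is about to be dropped, after which it would have to be re-added and rebalanced instead of simply onlined.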
SQL> select group_number, disk_number, name, path, failgroup, mode_status, voting_file from v$asm_disk order by 1, 2;

GROUP_NUMBER DISK_NUMBER NAME       PATH               FAILGROUP  MODE_STATUS VO
------------ ----------- ---------- ------------------ ---------- ----------- --
           0           0            /dev/CELL01-data2             ONLINE      N
           0           1            /dev/CELL01-data1             ONLINE      N
           0           2            /dev/CELL01-crs1              ONLINE      Y
           1           0 CRS_0000                      CRS_0000   OFFLINE     N
           1           1 CRS_0001   /dev/CELL02-crs2   CRS_0001   ONLINE      Y
           1           2 CRS_0002   /dev/CELL03-crs3   CRS_0002   ONLINE      Y
           2           0 DATA_0000  /dev/CELL03-data1  CELL03     ONLINE      N
           2           1 DATA_0001  /dev/CELL03-data2  CELL03     ONLINE      N
           2           2 DATA_0002  /dev/CELL02-data1  CELL02     ONLINE      N
           2           3 DATA_0003  /dev/CELL02-data2  CELL02     ONLINE      N
           2           4 DATA_0004                     CELL01     OFFLINE     N
           2           5 DATA_0005                     CELL01     OFFLINE     N

12 rows selected.
In this situation the affected disks can be brought online directly:
SQL> alter diskgroup CRS online disk CRS_0000;

Diskgroup altered.

SQL> alter diskgroup DATA online disk DATA_0004,DATA_0005;
alter diskgroup DATA online disk DATA_0004,DATA_0005
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15282: ASM disk "DATA_0005" is not visible cluster-wide
ORA-15282: ASM disk "DATA_0004" is not visible cluster-wide
If the online command fails with the error above, it is because some other node cannot see the disks being brought online. Check the other nodes:
[root@db02 ~]# ls -l /dev/CELL*
lrwxrwxrwx 1 root root 3 Dec  2 21:16 /dev/CELL01-crs1 -> sdc
lrwxrwxrwx 1 root root 3 Dec  2 21:16 /dev/CELL02-crs2 -> sdi
lrwxrwxrwx 1 root root 3 Dec  2 21:16 /dev/CELL02-data1 -> sdj
lrwxrwxrwx 1 root root 3 Dec  2 21:16 /dev/CELL02-data2 -> sdk
lrwxrwxrwx 1 root root 3 Dec  2 21:16 /dev/CELL03-crs3 -> sdd
lrwxrwxrwx 1 root root 3 Dec  2 21:16 /dev/CELL03-data1 -> sdf
lrwxrwxrwx 1 root root 3 Dec  2 21:16 /dev/CELL03-data2 -> sdh
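The links for CELL01-data1 and CELL01-data2 are missing on this node. Before going further, confirm with lsscsi that the underlying devices are healthy (if they are not, fix the iSCSI layer first; here they are fine):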
[root@db02 ~]# lsscsi
[1:0:0:0]    cd/dvd  VBOX     CD-ROM           1.0   /dev/sr0
[2:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sda
[3:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sdb
[4:0:0:0]    disk    LIO-ORG  disk1            4.0   /dev/sdc
[4:0:0:1]    disk    LIO-ORG  disk2            4.0   /dev/sde
[4:0:0:2]    disk    LIO-ORG  disk3            4.0   /dev/sdg
[5:0:0:0]    disk    LIO-ORG  disk1            4.0   /dev/sdd
[5:0:0:1]    disk    LIO-ORG  disk2            4.0   /dev/sdf
[5:0:0:2]    disk    LIO-ORG  disk3            4.0   /dev/sdh
[6:0:0:0]    disk    LIO-ORG  disk1            4.0   /dev/sdi
[6:0:0:1]    disk    LIO-ORG  disk2            4.0   /dev/sdj
[6:0:0:2]    disk    LIO-ORG  disk3            4.0   /dev/sdk
Reload the udev rules and confirm that all disks are now identified correctly:
[root@db02 ~]# udevadm control --reload
[root@db02 ~]# udevadm trigger
[root@db02 ~]# ls -l /dev/CELL*
lrwxrwxrwx 1 root root 3 Dec  2 21:18 /dev/CELL01-crs1 -> sdc
lrwxrwxrwx 1 root root 3 Dec  2 21:18 /dev/CELL01-data1 -> sde
lrwxrwxrwx 1 root root 3 Dec  2 21:18 /dev/CELL01-data2 -> sdg
lrwxrwxrwx 1 root root 3 Dec  2 21:18 /dev/CELL02-crs2 -> sdi
lrwxrwxrwx 1 root root 3 Dec  2 21:18 /dev/CELL02-data1 -> sdj
lrwxrwxrwx 1 root root 3 Dec  2 21:18 /dev/CELL02-data2 -> sdk
lrwxrwxrwx 1 root root 3 Dec  2 21:18 /dev/CELL03-crs3 -> sdd
lrwxrwxrwx 1 root root 3 Dec  2 21:18 /dev/CELL03-data1 -> sdf
lrwxrwxrwx 1 root root 3 Dec  2 21:18 /dev/CELL03-data2 -> sdh
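Comparing the listing by eye works for nine links, but the same check can be scripted and run on every node. This is only a sketch: missing_links is a hypothetical helper name, and the list of expected link names is whatever your own udev rules are supposed to create.

```shell
#!/usr/bin/env bash
# Sketch: report which expected ASM device links are missing on this node.
# Usage: missing_links <directory> <link-name>...

missing_links() {
  local dir=$1
  shift
  local name
  for name in "$@"; do
    # -e follows symlinks, so a dangling link is reported as missing too.
    [ -e "$dir/$name" ] || echo "$name"
  done
}
```

For the environment in this post, a call such as `missing_links /dev CELL01-crs1 CELL01-data1 CELL01-data2` should print nothing once the udev rules have been re-applied; any name it prints is a disk that ASM on that node still cannot see.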
After that, bringing the disks online again succeeds:
SQL> alter diskgroup DATA online disk DATA_0004,DATA_0005;

Diskgroup altered.
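When the disks can be brought online directly like this, the lengthy ASM disk group rebalance is avoided, so once this kind of problem is discovered it is best to contact an engineer promptly to deal with it.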