一个客户的RAC环境出现了故障,一个节点操作系统崩溃,重装系统后,CLUSTER添加成功,但是ASM实例添加报错。

当时通过电话简单了解了一下情况:Oracle 10204 RAC for Linux X86-64的环境,前一段时间操作系统出现了故障,导致一个节点重装。重装过程到了添加ASM实例的时候出现了错误。

一般来说,重装过程最复杂的是CLUSTER的清除和添加,如果CLUSTER已经添加成功,那么剩下的就是数据库级的操作,相对来说会简单得多。根据上面的信息初步判断,问题可能发生在几点,新节点的ORACLE_HOME的版本和旧节点不一致,或者10204补丁没有在CLUSTER上安装,或者是在新节点上没有给oracle用户授权,还有可能就是共享存储在新节点上挂载存在问题。

到了现场后才发现,问题和我的判断大相径庭,数据库的版本,权限等都不存在问题,事实上,新节点上的ASM实例已经启动,且两个磁盘组中的一个已经MOUNT,但另外一个磁盘组无法MOUNT

SQL> alter diskgroup datag1 mount;
alter diskgroup datag1 mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATAG1"

ORA-15032只是一个描述性错误,而关键的ORA-15063错误比较少见。

由于另外一个节点从RAC环境出现故障后一直没有停机,因此可以从这个节点的ASM实例上检查磁盘组和磁盘的状态:

SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME STATE TOTAL_MB FREE_MB
------------ ------------------------------------ ---------- ----------
1 DATAG1 MOUNTED 512000 217396
2 DATAG2 MOUNTED 1024000 347457

SQL> select group_number, disk_number, mount_status, header_status, name, path from v$asm_disk where group_number = 1;

GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS NAME PATH
------------ ----------- ------------ -------------- ----------- -------------
1 0 CACHED PROVISIONED DATAG1_0000 /dev/raw/raw4
1 1 CACHED MEMBER DATAG1_0001 /dev/raw/raw5

SQL> select instance_number, instance_name from v$instance;

INSTANCE_NUMBER INSTANCE_NAME
--------------- --------------------------------
2 +ASM2

可以看到,磁盘组DATAG1中的磁盘DATAG1_0000的文件头状态不正确。从新添加的实例1上检查ASM磁盘信息:

SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME STATE TOTAL_MB FREE_MB
------------ ------------------------------------ ---------- ----------
1 DATAG1 DISMOUNTED 0 0
2 DATAG2 MOUNTED 1024000 347457

SQL> select group_number, disk_number, mount_status, header_status, name, path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS NAME PATH
------------ ----------- ------------ -------------- ----------- --------------
0 0 CLOSED FOREIGN /dev/raw/raw1
0 1 CLOSED FOREIGN /dev/raw/raw2
0 2 CLOSED FOREIGN /dev/raw/raw8
0 3 CLOSED FOREIGN /dev/raw/raw10
0 4 CLOSED FOREIGN /dev/raw/raw9
0 8 CLOSED PROVISIONED /dev/raw/raw4
0 5 CLOSED MEMBER /dev/raw/raw5
2 1 CACHED MEMBER DATAG2_0001 /dev/raw/raw7
2 0 CACHED MEMBER DATAG2_0000 /dev/raw/raw6

SQL> select instance_number, instance_name from v$instance;

INSTANCE_NUMBER INSTANCE_NAME
--------------- --------------------------------
1 +ASM1

由于节点1上的ASM磁盘组1无法MOUNT,因此可以看到/dev/raw/raw4/dev/raw/raw5两个磁盘对应的GROUP_NUMBER0,且MOUNT_STATUSCLOSED,而raw4对应的HEADER_STATUS也是PROVISIONED,显然就是这个错误的状态值导致当前节点无法MOUNT磁盘组。

由于另外一个实例一直没有关闭过,因此虽然磁盘头的状态不正常,但是并不影响这个磁盘组的正常使用,而如果这时对ASM实例进行重启,则这个磁盘组必然无法再次MOUNT

根据Oracle的文档的描述,PROVISIONEDCANDIDATE状态有所区别,二者虽然表示的都是这个磁盘为候选磁盘,不属于任何一个磁盘组,不过CANDIDATE是正常的候选状态,而PROVISIONED则说明磁盘经过特殊的处理,比如操作系统工具对磁盘进行过特殊的操作。

既然这个磁盘的状态是PROVISIONED,那么能否通过再次添加这个磁盘到磁盘组,使得磁盘头的状态变成正常。于是尝试在问题节点再次添加这个磁盘,但是由于磁盘组处于NOMOUNT状态,导致添加操作失败,而尝试在目前正常工作的节点添加磁盘,结果同样报错:

SQL> alter diskgroup datag1 add disk '/dev/raw/raw4';
alter diskgroup datag1 add disk '/dev/raw/raw4'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15029: disk '/dev/raw/raw4' is already mounted by this instance

看来常规手段已经无法解决这个问题,那么现在只剩两个办法,一是重建ASM磁盘,二是直接修改ASM磁盘头信息。

重建意味着较长的停机时间,既是客户当前环境中配置了DATA GUARD,重建ASM磁盘组也不是一个轻松的事情。当前的PRIMARY数据库和STANDBY数据库都是两节点的RAC环境,而在数据库中还配置STREAMCAPTURE,这本身使得数据库架构异常复杂,进行SWITCHOVER的难度相对也比较大。而且对于当前运行的节点而言,一旦关闭,很有可能导致ASM磁盘无法MOUNT,这就会导致原本计划中的SWITCHOVER变成FAILOVER,甚至有可能损失ONLINE REDO LOGFILE的风险,因此无论从哪个角度看,重建都不是一个最佳的解决方案,显然如果能直接将ASM磁盘的头信息改写正确,这无疑是最直接代价最小的解决方案。

想要修改ASM磁盘头信息,要借助Oracle的工具kfed,而默认这个工具是没有编译的,进入$ORACLE_HOME/lib目录,对kfed工具信息编译:

[oracle@node1 lib]$ make -f ins_rdbms.mk ikfed

Linking KFED utility (kfed)
rm -f /u01/app/oracle/product/10.2.0/db_1/rdbms/lib/kfed
gcc -o /u01/app/oracle/product/10.2.0/db_1/rdbms/lib/kfed -L/u01/app/oracle/product/10.2.0/db_1/rdbms/lib/ -L/u01/app/oracle/product/10.2.0/db_1/lib/ -L/u01/app/oracle/product/10.2.0/db_1/lib/stubs/ /u01/app/oracle/product/10.2.0/db_1/lib/s0main.o /u01/app/oracle/product/10.2.0/db_1/rdbms/lib/sskfeded.o /u01/app/oracle/product/10.2.0/db_1/rdbms/lib/skfedpt.o /u01/app/oracle/product/10.2.0/db_1/rdbms/lib/defopt.o -ldbtools10 -lclntsh `cat /u01/app/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/app/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10 -lvsn10 -lcommon10 -lgeneric10 -lmm -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /u01/app/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/app/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10 -lvsn10 -lcommon10 -lgeneric10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lclient10 -lnnetd10 -lvsn10 -lcommon10 -lgeneric10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /u01/app/oracle/product/10.2.0/db_1/lib/sysliblist` -Wl,-rpath,/u01/app/oracle/product/10.2.0/db_1/lib -lm `cat /u01/app/oracle/product/10.2.0/db_1/lib/sysliblist` -ldl -lm -L/u01/app/oracle/product/10.2.0/db_1/lib
mv -f /u01/app/oracle/product/10.2.0/db_1/bin/kfed /u01/app/oracle/product/10.2.0/db_1/bin/kfedO
mv: cannot stat `/u01/app/oracle/product/10.2.0/db_1/bin/kfed': No such file or directory
make: [ikfed] Error 1 (ignored)
mv /u01/app/oracle/product/10.2.0/db_1/rdbms/lib/kfed /u01/app/oracle/product/10.2.0/db_1/bin/kfed
chmod 751 /u01/app/oracle/product/10.2.0/db_1/bin/kfed

下面就可以用kfed来检查ASM磁盘裸设备的文件头信息了:

[oracle@node1 data]$ kfed read /dev/raw/raw4 > raw4.txt
[oracle@node1 data]$ kfed read /dev/raw/raw5 > raw5.txt
[oracle@node1 data]$ kfed read /dev/raw/raw6 > raw6.txt

将磁盘组DATAG1的两个磁盘头以及另一个磁盘组DATAG2的第一个文件的头信息输出到文本文件,将三个文件进行对比,找到raw4文件中异常的部分。

由于每个磁盘组中有两个文件,且有两个磁盘组,使得比对的工作非常顺利,定位到raw4中存在两个标志位异常:

[oracle@nccpxdb1 lib]$ kfed read /dev/raw/raw4
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check: 4024565597 ; 0x00c: 0xefe1ff5d
kfbh.fcn.base: 952047 ; 0x010: 0x000e86ef
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATAG1_0000 ; 0x028: length=11
kfdhdb.grpname: DATAG1 ; 0x048: length=6
kfdhdb.fgname: DATAG1_0000 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
kfdhdb.crestmp.hi: 32928501 ; 0x0a8: HOUR=0x15 DAYS=0x17 MNTH=0xc YEAR=0x7d9
kfdhdb.crestmp.lo: 2195144704 ; 0x0ac: USEC=0x0 MSEC=0x1d0 SECS=0x2d MINS=0x20
kfdhdb.mntstmp.hi: 32940275 ; 0x0b0: HOUR=0x13 DAYS=0x7 MNTH=0x8 YEAR=0x7da
kfdhdb.mntstmp.lo: 3201116160 ; 0x0b4: USEC=0x0 MSEC=0x34a SECS=0x2c MINS=0x2f
kfdhdb.secsize: 512 ; 0x0b8: 0x0200
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize: 512000 ; 0x0c4: 0x0007d000
kfdhdb.pmcnt: 6 ; 0x0c8: 0x00000006
kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001
kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]: 0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]: 65535 ; 0x0da: 0xffff
kfdhdb.redomirrors[2]: 65535 ; 0x0dc: 0xffff
kfdhdb.redomirrors[3]: 65535 ; 0x0de: 0xffff
kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi: 32928501 ; 0x0e4: HOUR=0x15 DAYS=0x17 MNTH=0xc YEAR=0x7d9
kfdhdb.grpstmp.lo: 2195053568 ; 0x0e8: USEC=0x0 MSEC=0x177 SECS=0x2d MINS=0x20
kfdhdb.ub4spare[0]: 0 ; 0x0ec: 0x00000000
kfdhdb.ub4spare[1]: 0 ; 0x0f0: 0x00000000
kfdhdb.ub4spare[2]: 0 ; 0x0f4: 0x00000000
kfdhdb.ub4spare[3]: 0 ; 0x0f8: 0x00000000
kfdhdb.ub4spare[4]: 0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[5]: 0 ; 0x100: 0x00000000
kfdhdb.ub4spare[6]: 0 ; 0x104: 0x00000000
kfdhdb.ub4spare[7]: 0 ; 0x108: 0x00000000
kfdhdb.ub4spare[8]: 0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[9]: 0 ; 0x110: 0x00000000
kfdhdb.ub4spare[10]: 0 ; 0x114: 0x00000000
kfdhdb.ub4spare[11]: 0 ; 0x118: 0x00000000
kfdhdb.ub4spare[12]: 0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[13]: 0 ; 0x120: 0x00000000
kfdhdb.ub4spare[14]: 0 ; 0x124: 0x00000000
kfdhdb.ub4spare[15]: 0 ; 0x128: 0x00000000
kfdhdb.ub4spare[16]: 0 ; 0x12c: 0x00000000
kfdhdb.ub4spare[17]: 0 ; 0x130: 0x00000000
kfdhdb.ub4spare[18]: 0 ; 0x134: 0x00000000
kfdhdb.ub4spare[19]: 0 ; 0x138: 0x00000000
kfdhdb.ub4spare[20]: 0 ; 0x13c: 0x00000000
kfdhdb.ub4spare[21]: 0 ; 0x140: 0x00000000
kfdhdb.ub4spare[22]: 0 ; 0x144: 0x00000000
kfdhdb.ub4spare[23]: 0 ; 0x148: 0x00000000
kfdhdb.ub4spare[24]: 0 ; 0x14c: 0x00000000
kfdhdb.ub4spare[25]: 0 ; 0x150: 0x00000000
kfdhdb.ub4spare[26]: 0 ; 0x154: 0x00000000
kfdhdb.ub4spare[27]: 0 ; 0x158: 0x00000000
kfdhdb.ub4spare[28]: 0 ; 0x15c: 0x00000000
kfdhdb.ub4spare[29]: 0 ; 0x160: 0x00000000
kfdhdb.ub4spare[30]: 0 ; 0x164: 0x00000000
kfdhdb.ub4spare[31]: 0 ; 0x168: 0x00000000
kfdhdb.ub4spare[32]: 0 ; 0x16c: 0x00000000
kfdhdb.ub4spare[33]: 0 ; 0x170: 0x00000000
kfdhdb.ub4spare[34]: 0 ; 0x174: 0x00000000
kfdhdb.ub4spare[35]: 0 ; 0x178: 0x00000000
kfdhdb.ub4spare[36]: 0 ; 0x17c: 0x00000000
kfdhdb.ub4spare[37]: 0 ; 0x180: 0x00000000
kfdhdb.ub4spare[38]: 0 ; 0x184: 0x00000000
kfdhdb.ub4spare[39]: 0 ; 0x188: 0x00000000
kfdhdb.ub4spare[40]: 0 ; 0x18c: 0x00000000
kfdhdb.ub4spare[41]: 0 ; 0x190: 0x00000000
kfdhdb.ub4spare[42]: 0 ; 0x194: 0x00000000
kfdhdb.ub4spare[43]: 104436 ; 0x198: 0x000197f4
kfdhdb.ub4spare[44]: 0 ; 0x19c: 0x00000000
kfdhdb.ub4spare[45]: 0 ; 0x1a0: 0x00000000
kfdhdb.ub4spare[46]: 0 ; 0x1a4: 0x00000000
kfdhdb.ub4spare[47]: 0 ; 0x1a8: 0x00000000
kfdhdb.ub4spare[48]: 0 ; 0x1ac: 0x00000000
kfdhdb.ub4spare[49]: 0 ; 0x1b0: 0x00000000
kfdhdb.ub4spare[50]: 0 ; 0x1b4: 0x00000000
kfdhdb.ub4spare[51]: 0 ; 0x1b8: 0x00000000
kfdhdb.ub4spare[52]: 0 ; 0x1bc: 0x00000000
kfdhdb.ub4spare[53]: 0 ; 0x1c0: 0x00000000
kfdhdb.ub4spare[54]: 0 ; 0x1c4: 0x00000000
kfdhdb.ub4spare[55]: 0 ; 0x1c8: 0x00000000
kfdhdb.ub4spare[56]: 0 ; 0x1cc: 0x00000000
kfdhdb.ub4spare[57]: 0 ; 0x1d0: 0x00000000
kfdhdb.acdb.aba.seq: 0 ; 0x1d4: 0x00000000
kfdhdb.acdb.aba.blk: 0 ; 0x1d8: 0x00000000
kfdhdb.acdb.ents: 0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare: 43605 ; 0x1de: 0xaa55

raw5磁盘的标识位全为0

[oracle@nccpxdb1 lib]$ kfed read /dev/raw/raw5
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483649 ; 0x008: TYPE=0x8 NUMB=0x1
kfbh.check: 1802212223 ; 0x00c: 0x6b6b937f
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATAG1_0001 ; 0x028: length=11
kfdhdb.grpname: DATAG1 ; 0x048: length=6
kfdhdb.fgname: DATAG1_0001 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
kfdhdb.crestmp.hi: 32928501 ; 0x0a8: HOUR=0x15 DAYS=0x17 MNTH=0xc YEAR=0x7d9
kfdhdb.crestmp.lo: 2195144704 ; 0x0ac: USEC=0x0 MSEC=0x1d0 SECS=0x2d MINS=0x20
kfdhdb.mntstmp.hi: 32940275 ; 0x0b0: HOUR=0x13 DAYS=0x7 MNTH=0x8 YEAR=0x7da
kfdhdb.mntstmp.lo: 3201116160 ; 0x0b4: USEC=0x0 MSEC=0x34a SECS=0x2c MINS=0x2f
kfdhdb.secsize: 512 ; 0x0b8: 0x0200
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize: 512000 ; 0x0c4: 0x0007d000
kfdhdb.pmcnt: 6 ; 0x0c8: 0x00000006
kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001
kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000
kfdhdb.redomirrors[0]: 0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]: 0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]: 0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]: 0 ; 0x0de: 0x0000
kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi: 32928501 ; 0x0e4: HOUR=0x15 DAYS=0x17 MNTH=0xc YEAR=0x7d9
kfdhdb.grpstmp.lo: 2195053568 ; 0x0e8: USEC=0x0 MSEC=0x177 SECS=0x2d MINS=0x20
kfdhdb.ub4spare[0]: 0 ; 0x0ec: 0x00000000
kfdhdb.ub4spare[1]: 0 ; 0x0f0: 0x00000000
kfdhdb.ub4spare[2]: 0 ; 0x0f4: 0x00000000
kfdhdb.ub4spare[3]: 0 ; 0x0f8: 0x00000000
kfdhdb.ub4spare[4]: 0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[5]: 0 ; 0x100: 0x00000000
kfdhdb.ub4spare[6]: 0 ; 0x104: 0x00000000
kfdhdb.ub4spare[7]: 0 ; 0x108: 0x00000000
kfdhdb.ub4spare[8]: 0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[9]: 0 ; 0x110: 0x00000000
kfdhdb.ub4spare[10]: 0 ; 0x114: 0x00000000
kfdhdb.ub4spare[11]: 0 ; 0x118: 0x00000000
kfdhdb.ub4spare[12]: 0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[13]: 0 ; 0x120: 0x00000000
kfdhdb.ub4spare[14]: 0 ; 0x124: 0x00000000
kfdhdb.ub4spare[15]: 0 ; 0x128: 0x00000000
kfdhdb.ub4spare[16]: 0 ; 0x12c: 0x00000000
kfdhdb.ub4spare[17]: 0 ; 0x130: 0x00000000
kfdhdb.ub4spare[18]: 0 ; 0x134: 0x00000000
kfdhdb.ub4spare[19]: 0 ; 0x138: 0x00000000
kfdhdb.ub4spare[20]: 0 ; 0x13c: 0x00000000
kfdhdb.ub4spare[21]: 0 ; 0x140: 0x00000000
kfdhdb.ub4spare[22]: 0 ; 0x144: 0x00000000
kfdhdb.ub4spare[23]: 0 ; 0x148: 0x00000000
kfdhdb.ub4spare[24]: 0 ; 0x14c: 0x00000000
kfdhdb.ub4spare[25]: 0 ; 0x150: 0x00000000
kfdhdb.ub4spare[26]: 0 ; 0x154: 0x00000000
kfdhdb.ub4spare[27]: 0 ; 0x158: 0x00000000
kfdhdb.ub4spare[28]: 0 ; 0x15c: 0x00000000
kfdhdb.ub4spare[29]: 0 ; 0x160: 0x00000000
kfdhdb.ub4spare[30]: 0 ; 0x164: 0x00000000
kfdhdb.ub4spare[31]: 0 ; 0x168: 0x00000000
kfdhdb.ub4spare[32]: 0 ; 0x16c: 0x00000000
kfdhdb.ub4spare[33]: 0 ; 0x170: 0x00000000
kfdhdb.ub4spare[34]: 0 ; 0x174: 0x00000000
kfdhdb.ub4spare[35]: 0 ; 0x178: 0x00000000
kfdhdb.ub4spare[36]: 0 ; 0x17c: 0x00000000
kfdhdb.ub4spare[37]: 0 ; 0x180: 0x00000000
kfdhdb.ub4spare[38]: 0 ; 0x184: 0x00000000
kfdhdb.ub4spare[39]: 0 ; 0x188: 0x00000000
kfdhdb.ub4spare[40]: 0 ; 0x18c: 0x00000000
kfdhdb.ub4spare[41]: 0 ; 0x190: 0x00000000
kfdhdb.ub4spare[42]: 0 ; 0x194: 0x00000000
kfdhdb.ub4spare[43]: 0 ; 0x198: 0x00000000
kfdhdb.ub4spare[44]: 0 ; 0x19c: 0x00000000
kfdhdb.ub4spare[45]: 0 ; 0x1a0: 0x00000000
kfdhdb.ub4spare[46]: 0 ; 0x1a4: 0x00000000
kfdhdb.ub4spare[47]: 0 ; 0x1a8: 0x00000000
kfdhdb.ub4spare[48]: 0 ; 0x1ac: 0x00000000
kfdhdb.ub4spare[49]: 0 ; 0x1b0: 0x00000000
kfdhdb.ub4spare[50]: 0 ; 0x1b4: 0x00000000
kfdhdb.ub4spare[51]: 0 ; 0x1b8: 0x00000000
kfdhdb.ub4spare[52]: 0 ; 0x1bc: 0x00000000
kfdhdb.ub4spare[53]: 0 ; 0x1c0: 0x00000000
kfdhdb.ub4spare[54]: 0 ; 0x1c4: 0x00000000
kfdhdb.ub4spare[55]: 0 ; 0x1c8: 0x00000000
kfdhdb.ub4spare[56]: 0 ; 0x1cc: 0x00000000
kfdhdb.ub4spare[57]: 0 ; 0x1d0: 0x00000000
kfdhdb.acdb.aba.seq: 0 ; 0x1d4: 0x00000000
kfdhdb.acdb.aba.blk: 0 ; 0x1d8: 0x00000000
kfdhdb.acdb.ents: 0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare: 0 ; 0x1de: 0x0000

而磁盘组的DATAG2的两个磁盘raw6raw7的标识位也都是全0

确定了问题后,手工修改这个文本文件,将kfdhdb.ub4spare[43]kfdhdb.acdb.ub2spare均改为0

kfdhdb.ub4spare[43]: 0 ; 0x198: 0x00000000
kfdhdb.ub4spare[44]: 0 ; 0x19c: 0x00000000
.
.
.
kfdhdb.acdb.ents: 0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare: 0 ; 0x1de: 0x0000

然后利用kfedmerge将其写回磁盘头:

[oracle@node1 data]$ kfed merge /dev/raw/raw4 text=raw4_new.txt
[oracle@node1 data]$ export ORACLE_SID=+ASM1
[oracle@node1 data]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Jun 13 18:45:54 2011

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options

SQL> set pages 100 lines 120
SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME STATE TOTAL_MB FREE_MB
------------ ------------------------------------ ---------- ----------
1 DATAG1 DISMOUNT 0 0
2 DATAG2 MOUNTED 1024000 347457

SQL> select group_number, disk_number, mount_status, header_status, name, path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS NAME PATH
------------ ----------- ------------ -------------- ----------- --------------
0 0 CLOSED FOREIGN /dev/raw/raw1
0 1 CLOSED FOREIGN /dev/raw/raw2
0 2 CLOSED FOREIGN /dev/raw/raw8
0 3 CLOSED FOREIGN /dev/raw/raw10
0 4 CLOSED FOREIGN /dev/raw/raw9
0 8 CLOSED MEMBER /dev/raw/raw4
0 5 CLOSED MEMBER /dev/raw/raw5
2 1 CACHED MEMBER DATAG2_0001 /dev/raw/raw7
2 0 CACHED MEMBER DATAG2_0000 /dev/raw/raw6

SQL> alter diskgroup datag1 mount;

Diskgroup altered.

SQL> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown
SQL> startup
ASM instance started

Total System Global Area 130023424 bytes
Fixed Size 2082208 bytes
Variable Size 102775392 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> exit

利用MERGE修改磁盘头后,磁盘状态正常,磁盘组顺利挂载。此时检查节点1上的磁盘头信息,状态也恢复了正常。

在运行MERGE命令的时候,整个磁盘组并没有卸载,数据库也在正常运行,也就是说,在整个操作过程中,并没有1秒钟的停机时间,所有的修改都是ONLINE进行的。

只要磁盘组恢复了正常,剩下的操作就很简单了,手工将第二个实例添加到数据库中就可以了。



转自:http://blog.itpub.net/4227/viewspace-703092/   http://blog.itpub.net/4227/viewspace-703174/

Logo

更多推荐