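Background: during an IDC relocation to another city, the storage was trucked to the new data center and racked. Quite a few disks had either failed on their own or been shaken to death on the road, so I picked a machine whose disk had been swapped but not yet rebuilt to play around with~
Note: run the following operations with as little I/O on the array as possible.
1. List the state of all disks, nothing much to say here: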
./MegaCli64 -PDList -a0
2. One disk has Firmware state Unconfigured(bad); this is the one to rescue today:
Enclosure Device ID: 0
Slot Number: 9
Device Id: 8
Sequence Number: 7
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Firmware state: Unconfigured(bad)
SAS Address(0): 0x5001c4500077d8a9
Connected Port Number: 0(path0)
Inquiry Data: (manually redacted)
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
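With many drives, reading the full listing by eye gets tedious. The following is a minimal sketch of a filter that condenses it; `list_pd_states` is a hypothetical helper name (not part of MegaCli), and it assumes the listing has one `Field: value` pair per line, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `MegaCli64 -PDList -a0` output on stdin and
# print one line per drive in the form "<enclosure>:<slot> <firmware state>".
# Assumes each field sits on its own line as "Name: value".
list_pd_states() {
    awk -F': *' '
        /^Enclosure Device ID/ { encl = $2 }   # remember the current enclosure
        /^Slot Number/         { slot = $2 }   # remember the current slot
        /^Firmware state/      { print encl ":" slot " " $2 }
    '
}
```

Piping the listing through it, for example `./MegaCli64 -PDList -a0 | list_pd_states`, would reduce the drive above to a single line, `0:9 Unconfigured(bad)`, which already has the `enclosure:slot` form that `-PhysDrv[...]` expects.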
3. First, bring this disk back to a good state:
./MegaCli64 -PDMakeGood -PhysDrv[0:9] -a0
Here -PhysDrv[0:9] corresponds to the Enclosure Device ID and Slot Number above, and -a0 is of course Adapter #0. Check the disk state again:
Enclosure Device ID: 0
Slot Number: 9
Device Id: 8
Sequence Number: 8
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Firmware state: Unconfigured(good), Spun Up
SAS Address(0): 0x5001c4500077d8a9
Connected Port Number: 0(path0)
Inquiry Data: (manually redacted)
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: Foreign
Foreign Secure: Drive is not secured by a foreign lock key
Device Speed: Unknown
Link Speed: Unknown
Media Type: Hard Disk Device
4. Now see which member dropped out of the original RAID array, i.e. the position in the array that the replaced bad disk used to occupy:
./MegaCli64 -pdgetmissing -a0
Adapter 0 - Missing Physical drives
    No.   Array   Row   Size Expected
    0     1       0     3814912 MB
Exit Code: 0x00
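The Array and Row numbers can also be pulled out of that table mechanically. This is only a sketch: `missing_slots` is a made-up helper name, and it assumes the table rows look like the one above (index, array, row, size, then the unit `MB`).

```shell
#!/bin/sh
# Hypothetical helper: read `MegaCli64 -pdgetmissing -a0` output on stdin and
# print "-array<N> -row<M>" for every missing-drive row.
# Assumes rows have the shape: <No.> <Array> <Row> <Size> MB
missing_slots() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $5 == "MB" {
        print "-array" $2 " -row" $3
    }'
}
```

For the output shown above, `./MegaCli64 -pdgetmissing -a0 | missing_slots` would print `-array1 -row0`. Header and footer lines are skipped because their first three fields are not all numeric.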
5. Remember: Array 1, Row 0. Now put the new disk into that position:
./MegaCli64 -PdReplaceMissing -physdrv[0:9] -array1 -row0 -a0
Adapter: 0: Missing PD at Array 1, Row 0 is replaced.
Exit Code: 0x00
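6. It succeeded, but the RAID is still not usable: we only put an empty disk where the bad disk with the data used to be, so the data has to be restored first. How? RAID5 can reconstruct the failed disk's data from the data and parity on the other disks; this process is called Rebuild. Start the Rebuild:
./MegaCli64 -PDRbld -Start -PhysDrv[0:9] -a0
Started rebuild progress on device(Encl-0 Slot-9)
Exit Code: 0x00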
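To make the parity idea concrete: RAID5 parity is the XOR of the data blocks in a stripe, so XOR-ing the surviving blocks with the parity yields the lost block. The toy below illustrates this with single bytes; the values and the `xor_all` helper are made up for illustration and have nothing to do with what the controller actually runs internally.

```shell
#!/bin/sh
# Toy illustration of RAID5-style XOR parity on single bytes.
# xor_all is a hypothetical helper: it XORs all of its arguments together
# and prints the result in decimal.
xor_all() {
    acc=0
    for v in "$@"; do
        acc=$(( acc ^ $v ))
    done
    echo "$acc"
}

# Three made-up data blocks, one per data drive in the stripe.
d0=0xA5; d1=0x3C; d2=0x0F

# Parity as it would be written when the stripe is stored.
parity=$(xor_all "$d0" "$d1" "$d2")

# Pretend the drive holding d1 died: XOR of the survivors and the parity
# reproduces the lost block.
rebuilt=$(xor_all "$d0" "$d2" "$parity")
```

Here `rebuilt` comes out as 60, which is 0x3C, the value of the "lost" block `d1`. A real Rebuild does the same thing for every stripe on the disk, which is why every surviving drive has to be read end to end.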
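7. The rebuild has started. It takes a very long time and puts heavy I/O load on the disks, so avoid reading or writing data as much as possible. I have also had the bad luck of a Rebuild that was still not finished after 2 days and ended up killing other disks instead. So if you have the time, go pray to the Buddha and burn some incense; it might improve your odds. How do you check the Rebuild progress?
./MegaCli64 -pdrbld -showprog -physdrv[0:9] -a0
Rebuild Progress on Device at Enclosure 0, Slot 9 Completed 1% in 6 Minutes.
Exit Code: 0x00
This means 6 minutes have elapsed and 1% is done... At this rate it will take about 10 hours to finish, so heading off after work to burn incense and checking the result tomorrow morning is actually quite scientific~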
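The 10-hour figure is a straight-line extrapolation: 6 minutes for 1% gives 6 × 100 / 1 = 600 minutes in total. A rough sketch of the same arithmetic follows; `rebuild_eta` is a hypothetical helper, and it assumes both that the progress line has exactly the wording shown above and that the rebuild speed stays constant, which it may well not.

```shell
#!/bin/sh
# Hypothetical helper: read `MegaCli64 -pdrbld -showprog ...` output on stdin
# and print a linear estimate of total and remaining rebuild time.
# Assumes a line of the form "... Completed <P>% in <M> Minutes."
rebuild_eta() {
    sed -n 's/.*Completed \([0-9][0-9]*\)% in \([0-9][0-9]*\) Minutes.*/\1 \2/p' |
    while read -r pct mins; do
        # At 0% there is nothing to extrapolate from yet.
        [ "$pct" -gt 0 ] || continue
        total=$(( mins * 100 / pct ))
        echo "total=${total}min remaining=$(( total - mins ))min"
    done
}
```

Fed the progress line above, it would print `total=600min remaining=594min`, matching the estimate of roughly 10 hours.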