A Beginner's Guide to MegaCli

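MegaCli is the official management tool LSI provided for its SCSI RAID cards. LSI was acquired and is now part of Broadcom, so to download MegaCli today you need to go to the Broadcom website, look under Legacy product support, and search for MegaRAID.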

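The official tool is now storcli, which covers all LSI and 3ware products. Personally, though, I find MegaCli handier. Our production servers come from several Chinese vendors, and MegaCli manages the RAID on all of them without trouble, so switching is optional.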

View adapter information:

./MegaCli64 -AdpAllInfo -aALL

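The output is long and much of it is hard to follow, but that doesn't matter. As a beginner, just remember the first line, Adapter #0, which means my machine has an adapter numbered 0. Many MegaCli64 commands need -a at the end to specify the adapter. I only have Adapter #0, so from here on I simply write -a0. You can also write -a0,1,2 or -aALL.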

Adapter #0
==============================================================================
                    Versions
                ================
Product Name    : PERC H710 Adapter
Serial No       : 31P003R
FW Package Build: 21.1.0-0007
                    Mfg. Data
                ================
Mfg. Date       : 01/26/13
Rework Date     : 01/26/13
Revision No     : A00
Battery FRU     : N/A
...
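The -a rule above can be illustrated with a small Python sketch. This is not part of MegaCli. The helper names adapter_ids and adapter_arg are hypothetical, and the sketch only assumes that each adapter section starts with a line such as "Adapter #0", as in the output above.

```python
import re


def adapter_ids(adp_all_info: str) -> list[int]:
    """Collect adapter numbers from lines like 'Adapter #0' in -AdpAllInfo output."""
    return [int(n) for n in re.findall(r"^Adapter #(\d+)", adp_all_info, re.MULTILINE)]


def adapter_arg(ids: list[int]) -> str:
    """Build MegaCli's -a argument in the forms described above: -a0 or -a0,1,2."""
    if not ids:
        raise ValueError("no adapters found")
    return "-a" + ",".join(str(i) for i in ids)


# Trimmed copy of the -AdpAllInfo output shown above.
sample = """Adapter #0
==============================================================================
                    Versions
                ================
Product Name    : PERC H710 Adapter
"""

print(adapter_ids(sample))               # [0]
print(adapter_arg(adapter_ids(sample)))  # -a0
print(adapter_arg([0, 1, 2]))            # -a0,1,2
```

To use it on a real machine, you would pass in the text captured from ./MegaCli64 -AdpAllInfo -aALL, for example via subprocess.run(..., capture_output=True, text=True). If you want every adapter, -aALL remains the simpler choice.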

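View the adapter's detailed configuration. This machine has 12 disks installed: one is set up as RAID0 for the operating system, and the remaining 11 form a RAID5: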

./MegaCli64 -CfgDsply -aALL
==============================================================================
Adapter: 0
Product Name: PERC H710 Adapter
Memory: 512MB
BBU: Present
Serial No: 31P003R
==============================================================================
Number of DISK GROUPS: 2 # two disk groups
DISK GROUPS: 0 # disk group 0
Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 1
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-0, Secondary-0, RAID Level Qualifier-0 # configured as RAID0
Size:2.728 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:1
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0) 
Inquiry Data:             [redacted] # this is the serial number
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device

DISK GROUPS: 1 # disk group 1
Number of Spans: 1
SPAN: 0
Span Reference: 0x01
Number of PDs: 11 # 11 physical disks
Number of VDs: 1 # combined into 1 virtual disk
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 1)
Name:
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3 # configured as RAID5
Size:27.285 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:11
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0 # first physical disk
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0) 
Inquiry Data:             [redacted] # the disk's serial number, matching the label on the disk
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
...
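The RAID Level line encodes the level as a Primary/Secondary/Qualifier triple. The output above shows Primary-0, Secondary-0, Qualifier-0 for RAID0 and Primary-5, Secondary-0, Qualifier-3 for RAID5. Below is a minimal Python sketch that summarizes each disk group from -CfgDsply output. It assumes the "Key: value" layout shown above. The RAID1, RAID6 and RAID10 rows in the table are commonly cited mappings that this output does not confirm, and summarize_cfgdsply is a hypothetical name.

```python
import re

# (Primary, Secondary, Qualifier) -> RAID level.
# Only the RAID0 and RAID5 rows are confirmed by the output above; the rest are assumptions.
RAID_LEVELS = {
    (0, 0, 0): "RAID0",
    (1, 0, 0): "RAID1",
    (5, 0, 3): "RAID5",
    (6, 0, 3): "RAID6",
    (1, 3, 0): "RAID10",
}


def summarize_cfgdsply(text: str) -> list[dict]:
    """Return one summary dict per disk group (first virtual disk of each group only)."""
    groups: list[dict] = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "DISK GROUPS":
            groups.append({"group": int(value)})
        elif not groups:
            continue  # adapter header lines that come before the first disk group
        elif key == "RAID Level" and "raid" not in groups[-1]:
            triple = tuple(int(n) for n in re.findall(r"-(\d+)", value))
            groups[-1]["raid"] = RAID_LEVELS.get(triple, f"unknown {triple}")
        elif key in ("Size", "State", "Number Of Drives") and key not in groups[-1]:
            groups[-1][key] = value
    return groups


# Trimmed copy of the -CfgDsply output shown above (comments removed).
sample = """Number of DISK GROUPS: 2
DISK GROUPS: 0
RAID Level: Primary-0, Secondary-0, RAID Level Qualifier-0
Size:2.728 TB
State: Optimal
Number Of Drives:1
DISK GROUPS: 1
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:27.285 TB
State: Optimal
Number Of Drives:11
"""

for g in summarize_cfgdsply(sample):
    print(g)
```

Exact key matching keeps lines such as "Raw Size", "Stripe Size", "Firmware state" and "Number of DISK GROUPS" out of the summary.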

View the information and status of each physical disk. The per-disk details are the same as above, just without the adapter information:

./MegaCli64 -PDList -a0
 
Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0) 
Inquiry Data:             [redacted] # the disk's serial number, matching the label on the disk
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0) 
Inquiry Data:             [redacted] # the disk's serial number, matching the label on the disk
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None 
Device Speed: 3.0Gb/s 
Link Speed: 3.0Gb/s 
Media Type: Hard Disk Device
...
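Every disk in the -PDList output is a block of "Key: value" lines that begins with Enclosure Device ID, so the output is easy to turn into per-disk records. Below is a minimal Python sketch under that assumption. parse_pdlist is a hypothetical helper, and the serial numbers in the sample are placeholders because the real ones are redacted above.

```python
def parse_pdlist(text: str) -> list[dict[str, str]]:
    """Split MegaCli64 -PDList output into one dict per physical disk.

    A new record starts at each 'Enclosure Device ID' line; lines without a
    colon (such as 'Adapter #0') are skipped.
    """
    disks: list[dict[str, str]] = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            disks.append({})
        if disks:
            disks[-1][key] = value
    return disks


# Trimmed, made-up sample in the layout shown above.
sample = """Adapter #0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Media Error Count: 0
Firmware state: Online
Inquiry Data:             SERIAL-A
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Media Error Count: 0
Firmware state: Online
Inquiry Data:             SERIAL-B
"""

for d in parse_pdlist(sample):
    print(d["Slot Number"], d["Firmware state"], d["Inquiry Data"])
```

Splitting on the first colon only is enough for these fields. Values keep their original text (for example "2.728 TB [0x15d50a3b0 Sectors]"), so convert them yourself if you need numbers.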

This gives you a lot of useful information:

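1. Slot Number: the slot number, which should match the marking on the server chassis. If the machine has many disks, just tell the on-site engineer that the disk in slot X is faulty, and the engineer will swap it directly.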

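2. Inquiry Data: the disk's serial number, which matches the label on the disk. The label is only visible once the disk is pulled. When you pull a disk by slot, the serial number on it should match Inquiry Data.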

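3. Firmware state: the disk's state. Online is the best state, and the one we hope to see. Other states include Unconfigured, Offline, Failed and so on, and most of them convey one sad fact: you are about to work overtime reporting or repairing them...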

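4. Pay special attention to these metrics: Media Error Count, Other Error Count, Predictive Failure Count and Last Predictive Failure Event Seq Number. Any of them may be non-zero. If one is, the disk still works but can no longer be trusted. It very likely has bad clusters, bad sectors or similar problems, and it must be replaced as soon as possible. If you keep using it, the disk is not far from failing completely. A claim circulating online says that the larger the first three counts, the worse the disk's condition, but that is not actually the case, as the following two screenshots show.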
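Points 3 and 4 amount to a simple check. Flag any disk whose Firmware state is not Online or whose watched counters are not 0, and report it to the field engineer by slot number. Below is a minimal Python sketch of that rule, assuming -PDList output in the layout shown above. disks_to_replace is a hypothetical helper, and the sample data is made up.

```python
# The four counters that point 4 says should all be 0.
WATCHED = (
    "Media Error Count",
    "Other Error Count",
    "Predictive Failure Count",
    "Last Predictive Failure Event Seq Number",
)


def disks_to_replace(pdlist_text: str) -> list[str]:
    """Return 'slot N: reasons' for each disk that fails the checks in points 3 and 4."""
    disks: list[dict[str, str]] = []
    for line in pdlist_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":  # each disk record starts here
            disks.append({})
        if disks:
            disks[-1][key] = value

    findings = []
    for d in disks:
        # A counter missing from the record is treated as 0.
        reasons = [k for k in WATCHED if d.get(k, "0") != "0"]
        state = d.get("Firmware state", "")
        # startswith also accepts variants such as "Online, Spun Up".
        if not state.startswith("Online"):
            reasons.append(f"Firmware state={state}")
        if reasons:
            findings.append(f"slot {d.get('Slot Number', '?')}: " + ", ".join(reasons))
    return findings


sample = """Enclosure Device ID: 32
Slot Number: 0
Media Error Count: 0
Firmware state: Online
Enclosure Device ID: 32
Slot Number: 3
Media Error Count: 12
Firmware state: Online
Enclosure Device ID: 32
Slot Number: 5
Firmware state: Failed
"""

print(disks_to_replace(sample))
# ['slot 3: Media Error Count', 'slot 5: Firmware state=Failed']
```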

[Two screenshots]

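A colleague took this question directly to the server, RAID card and disk vendors. Their feedback was as follows. According to the material they found, the absolute values of Media Error and Other Error do not directly reflect a disk's condition. After discussions with the RAID card and disk vendors, the recommended practice is to monitor the Predictive Failure value, because a non-zero value means the disk has a problem. In addition, if a disk is Failed, you can simply file a repair request. To get the Predictive Failure Count, run storcli /c0/eall/sall show all and monitor the keyword Predictive Failure Count. The criterion is that it must not be greater than 0. If there is any count, replace that disk.

Predictive Failure already covers media errors, and its scope is broader and more comprehensive than media error alone. The disk's SMART subsystem already has a complete set of algorithms for assessing disk health. These algorithms consider many aspects of the disk's runtime behavior, and media error is one of them. SMART evaluates media errors based on how much they increase per unit of time. When any assessment item in the SMART subsystem reaches its threshold, the disk reports Sense Code: 01 5D 00 (FAILURE PREDICTION THRESHOLD EXCEEDED). Any host that follows the SCSI standard (the OS SCSI subsystem, a SAS controller, a RAID card and so on) can correctly decode this Sense Code.

In summary, media errors are already covered by the disk's SMART subsystem, which reports a predictive failure according to the SCSI standard. On the disk side, therefore, you only need to monitor Predictive Failure under the RAID card, and the criterion is that it must not be greater than 0.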
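The vendors' rule is easy to check automatically. Below is a minimal Python sketch. It assumes the keyword appears as "Predictive Failure Count: N" in MegaCli -PDList output, as shown earlier, or as "Predictive Failure Count = N" in storcli /c0/eall/sall show all output, with each storcli drive section headed by a line like "Drive /c0/e32/s0 State :". The storcli layout is an assumption, so verify it against your storcli version. The function name and all sample values are hypothetical.

```python
import re

# "Predictive Failure Count: 3" (MegaCli) or "Predictive Failure Count = 3" (storcli, assumed).
PFC = re.compile(r"^\s*Predictive Failure Count\s*[:=]\s*(\d+)")
# Lines that identify which disk the following counters belong to.
SLOT = re.compile(r"^\s*Slot Number\s*:\s*(\d+)")     # MegaCli -PDList
DRIVE = re.compile(r"^\s*Drive (/c\d+/e\d+/s\d+)\b")  # storcli show all (assumed layout)


def predictive_failures(text: str) -> dict[str, int]:
    """Map disk identifier -> Predictive Failure Count, keeping only counts above 0."""
    current = "?"
    alerts: dict[str, int] = {}
    for line in text.splitlines():
        if m := SLOT.match(line):
            current = f"slot {m.group(1)}"
        elif m := DRIVE.match(line):
            current = m.group(1)
        elif m := PFC.match(line):
            count = int(m.group(1))
            if count > 0:  # vendor criterion: must not be greater than 0
                alerts[current] = count
    return alerts


megacli_sample = """Enclosure Device ID: 32
Slot Number: 0
Predictive Failure Count: 0
Enclosure Device ID: 32
Slot Number: 7
Predictive Failure Count: 2
"""

storcli_sample = """Drive /c0/e32/s4 State :
Media Error Count = 15
Predictive Failure Count = 0
Drive /c0/e32/s9 State :
Predictive Failure Count = 1
"""

print(predictive_failures(megacli_sample))  # {'slot 7': 2}
print(predictive_failures(storcli_sample))  # {'/c0/e32/s9': 1}
```

Following the vendors' advice, the non-zero Media Error Count on /c0/e32/s4 does not raise an alert here. Failed disks are better caught by checking the drive state separately.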
