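MegaCli is the official management tool from LSI for its RAID cards. LSI has since been acquired and is now part of Broadcom, so to download MegaCli today you need to go to the Broadcom website, look under Legacy product support, and search for MegaRAID.
The current official tool is storcli, which covers the whole LSI and 3ware product range. Personally I find MegaCli more comfortable to use, and on the servers from several domestic vendors that we run in production, MegaCli manages the RAID just fine, so whether to switch does not matter much.
View adapter information: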
./MegaCli64 -AdpAllInfo -aALL
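The output is long and much of it is hard to follow, but that is fine. As a beginner, just note the first line, which says my machine has adapter number 0. Many MegaCli64 commands take -a at the end to select the adapter. I only have Adapter #0, so from here on I simply write -a0; -a0,1,2 and -aALL also work.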
Adapter #0

==============================================================================
                    Versions
                ================
Product Name    : PERC H710 Adapter
Serial No       : 31P003R
FW Package Build: 21.1.0-0007

                    Mfg. Data
                ================
Mfg. Date       : 01/26/13
Rework Date     : 01/26/13
Revision No     : A00
Battery FRU     : N/A
...
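To make the -a selector concrete, here is a minimal sketch that builds the argument list for a MegaCli64 call. The helper names `adapter_flag` and `megacli_args` and the default `./MegaCli64` path are assumptions for illustration; only the -a0, -a0,1,2 and -aALL forms come from the text above.

```python
def adapter_flag(adapters="ALL"):
    """Build the -a selector from "ALL", one adapter index, or a list of indexes."""
    if isinstance(adapters, str):
        if adapters.upper() != "ALL":
            raise ValueError("the only string selector handled here is ALL")
        return "-aALL"
    if isinstance(adapters, int):
        adapters = [adapters]
    return "-a" + ",".join(str(int(a)) for a in adapters)


def megacli_args(command, adapters="ALL", binary="./MegaCli64"):
    """Return an argv list (hypothetical helper), e.g. for subprocess.run."""
    return [binary, command, adapter_flag(adapters)]


print(megacli_args("-AdpAllInfo"))
print(megacli_args("-PDList", 0))
print(megacli_args("-PDList", [0, 1, 2]))
```

Passing the result to subprocess.run would execute the command; that part is left out here because it needs the binary and a RAID card.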
View the adapter's detailed configuration. This machine has 12 disks: one is a RAID0 that holds the operating system, and the remaining 11 form a RAID5:
./MegaCli64 -CfgDsply -aALL

==============================================================================
Adapter: 0
Product Name: PERC H710 Adapter
Memory: 512MB
BBU: Present
Serial No: 31P003R
==============================================================================
Number of DISK GROUPS: 2        # two disk groups

DISK GROUPS: 0                  # disk group 0
Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 1
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 0)
Name:
RAID Level: Primary-0, Secondary-0, RAID Level Qualifier-0        # RAID0
Size:2.728 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:1
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0
Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0)
Inquiry Data: (redacted by hand)        # this is the serial number
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: 3.0Gb/s
Link Speed: 3.0Gb/s
Media Type: Hard Disk Device

DISK GROUPS: 1                  # disk group 1
Number of Spans: 1
SPAN: 0
Span Reference: 0x01
Number of PDs: 11               # 11 physical disks
Number of VDs: 1                # combined into 1 virtual disk
Number of dedicated Hotspares: 0
Virtual Disk Information:
Virtual Disk: 0 (Target Id: 1)
Name:
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3        # RAID5
Size:27.285 TB
State: Optimal
Stripe Size: 64 KB
Number Of Drives:11
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Physical Disk Information:
Physical Disk: 0                # the first physical disk
Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0)
Inquiry Data: (redacted by hand)        # the disk's serial number, same as on the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: 3.0Gb/s
Link Speed: 3.0Gb/s
Media Type: Hard Disk Device
...
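The sizes in this output can be cross-checked with a little arithmetic. The sketch below converts the Coerced Size sector count to capacity and applies the usual RAID5 rule that one drive's worth of space goes to parity. It assumes 512-byte sectors and that the "TB" printed by MegaCli is a binary unit (2^40 bytes); neither is stated in the output, and the function names are made up for illustration.

```python
SECTOR_BYTES = 512   # assumption: 512-byte logical sectors
TIB = 2 ** 40        # assumption: the reported "TB" is 2**40 bytes


def sectors_to_tib(sectors):
    """Convert a sector count to binary terabytes."""
    return sectors * SECTOR_BYTES / TIB


def raid5_usable_tib(drive_count, per_drive_tib):
    """RAID5 keeps one drive's worth of parity, so N drives give N-1 of capacity."""
    return (drive_count - 1) * per_drive_tib


coerced = sectors_to_tib(0x15d400000)     # Coerced Size from the output above
usable = raid5_usable_tib(11, coerced)    # 11 drives in disk group 1

print(f"per drive: {coerced:.4f}  RAID5 usable: {usable:.4f}")
```

Under these assumptions each drive comes out at about 2.7285 and the array at about 27.285, which lines up with the 2.728 TB and 27.285 TB that MegaCli reports.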
View the information and state of each physical disk. The fields are the same as above, just without the adapter information.
./MegaCli64 -PDList -a0

Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Device Id: 0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abee
Connected Port Number: 0(path0)
Inquiry Data: (redacted by hand)        # the disk's serial number, same as on the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: 3.0Gb/s
Link Speed: 3.0Gb/s
Media Type: Hard Disk Device

Enclosure Device ID: 32
Slot Number: 1
Device Id: 1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Firmware state: Online
SAS Address(0): 0x500056b37789abec
Connected Port Number: 0(path0)
Inquiry Data: (redacted by hand)        # the disk's serial number, same as on the disk label
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Foreign State: None
Device Speed: 3.0Gb/s
Link Speed: 3.0Gb/s
Media Type: Hard Disk Device
...
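For scripting, the -PDList text can be turned into one record per disk. The following is a minimal sketch, not an official parser: it assumes the tool prints one "Key: value" field per line and that every disk entry starts with an "Enclosure Device ID" line, as in the listing above. The function name `parse_pdlist` and the shortened SAMPLE are illustrative.

```python
def parse_pdlist(text):
    """Split MegaCli -PDList text into a list of dicts, one per physical disk."""
    disks = []
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips blank lines and the "Adapter #0" header
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Enclosure Device ID":
            current = {}  # assumed to mark the start of a new disk entry
            disks.append(current)
        if current is not None:
            current[key] = value
    return disks


# Shortened sample built from fields shown in the output above.
SAMPLE = """\
Adapter #0

Enclosure Device ID: 32
Slot Number: 0
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Firmware state: Online

Enclosure Device ID: 32
Slot Number: 1
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Firmware state: Online
"""

for disk in parse_pdlist(SAMPLE):
    print(disk["Slot Number"], disk["Firmware state"])
```

All values stay as strings, so counters need an int() conversion before any comparison.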
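This output contains a lot of useful information:
1. Slot Number: the slot number, which should match the labeling on the chassis. If the machine has several disks, just tell the on-site engineer that the disk in slot X is faulty and they will swap that disk directly.
2. Inquiry Data: the disk's serial number, which matches the disk's label. The label is only visible once the disk is pulled; after pulling the disk from a given slot, the serial number on it should match Inquiry Data.
3. Firmware state: the disk's state. Online is the best state and the one we hope to see. Others include Unconfigured, Offline, Failed and so on, and most of them convey one sad fact: you will be working overtime to report or repair the disk.
4. Pay special attention to these fields: Media Error Count, Other Error Count, Predictive Failure Count, and Last Predictive Failure Event Seq Number. Any of them may be nonzero. That means the disk still works but is no longer reliable and very likely has bad clusters, bad sectors, or similar problems, so it must be replaced as soon as possible. If you keep using it, the disk is not far from failing completely. A claim that circulates online is that the larger the first three counts are, the worse the disk's condition. That is not actually the case, as the following 2 screenshots show.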
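The checks in the list above can be sketched as a small function that takes one disk's fields as a dict of strings and returns anything worth a closer look. The field names come from the MegaCli output; the function name `disk_findings` is hypothetical, and matching the state with startswith("Online") is an assumption made to tolerate suffixes that some controllers may append.

```python
# Fields the list above says to watch; any nonzero value is reported.
COUNTERS = (
    "Media Error Count",
    "Other Error Count",
    "Predictive Failure Count",
    "Last Predictive Failure Event Seq Number",
)


def disk_findings(disk):
    """Return human-readable findings for one disk (empty list if none)."""
    findings = []
    state = disk.get("Firmware state", "")
    if not state.startswith("Online"):  # assumption: healthy states begin with "Online"
        findings.append(f"Firmware state is {state or 'missing'}")
    for name in COUNTERS:
        if int(disk.get(name, "0")) != 0:
            findings.append(f"{name} = {disk[name]}")
    return findings


healthy = {"Slot Number": "0", "Firmware state": "Online",
           "Media Error Count": "0", "Predictive Failure Count": "0"}
suspect = {"Slot Number": "3", "Firmware state": "Failed",
           "Media Error Count": "0", "Predictive Failure Count": "2"}

for disk in (healthy, suspect):
    print("slot", disk["Slot Number"], disk_findings(disk) or "ok")
```

This only reports what it sees; which findings justify pulling a disk is a policy decision for whoever runs the check.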
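A colleague raised this question specifically with the server, RAID card, and disk vendors. Their feedback was as follows.
According to earlier material, the absolute values of media error and other error do not directly reflect the state of the disk. Based on discussions with the RAID card and disk vendors, the recommended practice is to monitor the Predictive Failure value: a nonzero value means the disk has a problem. In addition, if a disk is in the failed state, it can be reported for repair directly.
The command for checking Predictive Failure Count is: storcli /c0/eall/sall show all
Monitor the keyword Predictive Failure Count. The standard is that it must not be greater than 0; if it shows any count, replace the corresponding disk.
Predictive Failure already covers media error, and its scope is broader and more complete than media error alone:
1. The disk's SMART subsystem already has a complete set of algorithms for assessing the disk's health.
2. The SMART algorithms take into account parameters from every aspect of the disk's operation, and media error is one of them.
3. SMART evaluates media error based on how much it grows per unit of time.
4. When any item evaluated by the SMART subsystem reaches its threshold, the disk reports Sense Code: 01 5D 00 (FAILURE PREDICTION THRESHOLD EXCEEDED).
5. A host that follows the SCSI protocol standard (the OS SCSI subsystem, a SAS controller, a RAID card, and so on) can parse this Sense Code correctly.
In summary, since media error is already covered by the disk's SMART subsystem and predictive failure is reported according to the SCSI protocol standard, for disks it is enough to monitor Predictive Failure under the RAID card, with the standard being that it must not be greater than 0.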
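The vendor's rule (replace any disk whose Predictive Failure Count is above 0) could be automated roughly as follows. This is a sketch under a loud assumption: the layout of the storcli output is not shown in this article, so the code assumes that each drive's section starts with a line beginning "Drive /cX/eY/sZ" and that the counter appears as "Predictive Failure Count = N". Check both against real output before relying on it. The function name and the sample text are made up for illustration.

```python
import re

# Assumed line shapes; verify against real `storcli /c0/eall/sall show all` output.
DRIVE_RE = re.compile(r"^Drive (/c\d+/e\d+/s\d+)")
COUNT_RE = re.compile(r"Predictive Failure Count\s*[=:]\s*(\d+)")


def drives_to_replace(text):
    """Map drive id -> Predictive Failure Count for every drive with a count above 0."""
    flagged = {}
    drive = "unknown"
    for line in text.splitlines():
        line = line.strip()
        header = DRIVE_RE.match(line)
        if header:
            drive = header.group(1)  # remember which drive the next counters belong to
            continue
        count = COUNT_RE.search(line)
        if count and int(count.group(1)) > 0:
            flagged[drive] = int(count.group(1))
    return flagged


# Made-up sample in the assumed format, not captured from a real controller.
SAMPLE = """\
Drive /c0/e32/s0 State :
Media Error Count = 12
Predictive Failure Count = 0
Drive /c0/e32/s1 State :
Media Error Count = 0
Predictive Failure Count = 2
"""

print(drives_to_replace(SAMPLE))
```

In the sample, the drive with a high media error count but no predictive failure is left alone, while the one with a nonzero Predictive Failure Count is flagged, which is the behavior the vendor feedback describes.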