MegaCli

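MegaCli is a command-line tool for managing and maintaining hardware RAID. It reports everything about the current RAID controller, including the controller model, the RAID level of each array, and the state of every disk in an array. Disk health is usually hard to determine and is typically left to data-center staff on inspection rounds, so it is natural to ask whether it can be checked in software instead. MegaCli can do this. Two values it reports for each disk, Media Error Count and Other Error Count (both 0 on a healthy disk), are normally used to decide whether a disk in the array has a problem. Media Error Count points to possible media errors such as bad sectors; any non-zero value deserves attention, and the larger the value, the higher the risk. Other Error Count suggests the disk may be loose in its slot and may need to be reseated. MegaCli can check every disk in an array, so a script can read these values and notify the administrators.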
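The check described above can be scripted. The following is a minimal sketch, assuming that each disk record in the output of megacli -PDList -aALL starts with an "Enclosure Device ID" line and that the counters are plain integers. The sample text, its values, and the function name disks_with_errors are made up for illustration; only the field names come from this article.

```python
import re

# Hand-written sample shaped like `megacli -PDList -aALL` output.
# Only the field names come from the article; the values are invented.
SAMPLE_PDLIST = """\
Enclosure Device ID: 32
Slot Number: 0
Media Error Count: 0
Other Error Count: 0
Enclosure Device ID: 32
Slot Number: 1
Media Error Count: 12
Other Error Count: 3
"""

FIELD = re.compile(
    r"^\s*(Enclosure Device ID|Slot Number|Media Error Count|Other Error Count)"
    r"\s*:\s*(\d+)\s*$"
)


def disks_with_errors(pdlist_text):
    """Return (enclosure, slot, media_errors, other_errors) for every disk
    whose Media Error Count or Other Error Count is non-zero."""
    disks = []
    current = None
    for line in pdlist_text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        key, value = match.group(1), int(match.group(2))
        if key == "Enclosure Device ID":
            # Assumption: this line opens a new disk record.
            current = {key: value}
            disks.append(current)
        elif current is not None:
            current[key] = value

    flagged = []
    for disk in disks:
        media = disk.get("Media Error Count", 0)
        other = disk.get("Other Error Count", 0)
        if media or other:
            flagged.append(
                (disk["Enclosure Device ID"], disk.get("Slot Number"), media, other)
            )
    return flagged


if __name__ == "__main__":
    for enclosure, slot, media, other in disks_with_errors(SAMPLE_PDLIST):
        print(f"disk [{enclosure}:{slot}] media={media} other={other}")
```

In a real script the text would come from running megacli and capturing its output; the notification step (mail, chat, monitoring agent) is left out on purpose.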

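0. Installation

  • Method 1

All systems used in this article run Ubuntu 20.04.1.

Add the package source and update
vi /etc/apt/sources.list
Append the following line at the end:
deb http://hwraid.le-vert.net/ubuntu precise main
Save and exit (:wq), then add the GPG key:
wget -O - https://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key | sudo apt-key add -
Then refresh the package index:
apt update
Install megacli and confirm that the installation succeeded: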
root@user:~# apt install megacli
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  megacli
0 upgraded, 1 newly installed, 0 to remove and 15 not upgraded.
Need to get 4,427 kB of archives.
After this operation, 6,880 kB of additional disk space will be used.
Get:1 http://hwraid.le-vert.net/ubuntu precise/main amd64 megacli amd64 8.07.14-2+Ubuntu.precise.12.04 [4,427 kB]
Fetched 4,427 kB in 3s (1,554 kB/s)
Selecting previously unselected package megacli.
(Reading database ... 109717 files and directories currently installed.)
Preparing to unpack .../megacli_8.07.14-2+Ubuntu.precise.12.04_amd64.deb ...
Unpacking megacli (8.07.14-2+Ubuntu.precise.12.04) ...
Setting up megacli (8.07.14-2+Ubuntu.precise.12.04) ...
root@user:~# megacli -v


      MegaCLI SAS RAID Management Tool  Ver 8.07.14 Dec 16, 2013

    (c)Copyright 2013, LSI Corporation, All Rights Reserved.

Exit Code: 0x00
  • Method 2

1.0 Basic usage of the megacli tool

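# Show the RAID level
$ megacli -LDInfo -Lall -aALL

# Show RAID controller information
$ megacli -AdpAllInfo -aALL

# Show physical disk information
$ megacli -PDList -aALL

# Show battery information
$ megacli -AdpBbuCmd -aAll

# Show the RAID controller log
$ megacli -FwTermLog -Dsply -aALL

# Show the number of adapters
$ megacli -adpCount

# Show the adapter time
$ megacli -AdpGetTime -aALL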

# Show information for all adapters
$ megacli -AdpAllInfo -aAll

# Show information for all logical disk groups
$ megacli -LDInfo -LALL -aAll

# Show information for all physical disks
$ megacli -PDList -aAll

# Show the charger status
$ megacli -AdpBbuCmd -GetBbuStatus -aALL |grep 'Charger Status'

# Show BBU status information
$ megacli -AdpBbuCmd -GetBbuStatus -aALL

# Show BBU capacity information
$ megacli -AdpBbuCmd -GetBbuCapacityInfo -aALL

# Show BBU design parameters
$ megacli -AdpBbuCmd -GetBbuDesignInfo -aALL

# Show the current BBU properties
$ megacli -AdpBbuCmd -GetBbuProperties -aALL

# Show the RAID controller model, RAID settings, and disk information
$ megacli -cfgdsply -aALL

## How the states change between pulling a disk and inserting its replacement
Device         |Normal |Damage             |Rebuild |Normal
Virtual Drive  |Optimal|Degraded           |Degraded|Optimal
Physical Drive |Online |Failed Unconfigured|Rebuild |Online

# Show the state of a physical disk:
$ megacli -PDRbld -ShowProg -PhysDrv  [Enclosure Device ID:Slot Number]  -a0
## A physical disk that is being rebuilt reports: "Firmware state: Rebuild"

# Query the rebuild progress:
$ megacli -pdrbld -showprog -physdrv[E:S] -aALL
## The output looks similar to this:
Rebuild Progress on Device at Enclosure 32, Slot 5 Completed 77% in 101 Minutes.

# Show the rebuild progress as a text progress bar:
$ megacli -pdrbld -progdsply -physdrv[E:S] -aALL
## The screen shows something like this:
Rebuild progress of physical drives...
Enclosure:Slot Percent Complete Time Elps
032 :05   #######################87 %################*******  01:59:07
Press key to quit...

# Show the RAID controller's rebuild settings:
$ megacli -AdpAllinfo -aALL | grep -i rebuild
## The result looks similar to this:
Rebuild Rate : 30%
Auto Rebuild : Enabled
Rebuild Rate : Yes
Force Rebuild : Yes

# Set the RAID controller's rebuild rate to 60%:
$ megacli -AdpSetProp { RebuildRate -60} -aALL
## On success it returns:
Adapter 0: Set rebuild rate to 60% success.

# Set a global hot spare
$ megacli -PDHSP -Set [-EnclAffinity] [-nonRevertible] -PhysDrv[252:0] -a0

# Remove a global hot spare
$ megacli -PDHSP -Rmv -PhysDrv[32:5] -a0

1.1 MegaCli usage by category

Common commands

$ megacli -LDInfo -Lall -aALL [show the RAID level]
$ megacli -AdpAllInfo -aALL [show RAID controller information]
$ megacli -PDList -aALL [show physical disk information]
$ megacli -AdpBbuCmd -aAll [show battery information]
$ megacli -FwTermLog -Dsply -aALL [show the RAID controller log]
$ megacli -adpCount [show the number of adapters]
$ megacli -AdpGetTime -aALL [show the adapter time]
$ megacli -AdpAllInfo -aAll [show information for all adapters]
$ megacli -LDInfo -LALL -aAll [show information for all logical disk groups]
$ megacli -PDList -aAll [show information for all physical disks]
$ megacli -PdLocate -start -physdrv[252:2] -a0  [blink the LED of the given disk (locate)]
$ megacli -CfgForeign -Clear -a0  [clear the Foreign state]
$ megacli -AdpBbuCmd -GetBbuStatus -aALL |grep 'Charger Status' [show the charger status]
$ megacli -AdpBbuCmd -GetBbuStatus -aALL [show BBU status information]
$ megacli -AdpBbuCmd -GetBbuCapacityInfo -aALL [show BBU capacity information]
$ megacli -AdpBbuCmd -GetBbuDesignInfo -aALL [show BBU design parameters]
$ megacli -AdpBbuCmd -GetBbuProperties -aALL [show the current BBU properties]
$ megacli -cfgdsply -aALL [show the RAID controller model, RAID settings, and disk information]
$ megacli -PDList -aAll -NoLog  [show the state of all disks]
$ megacli -LdPdInfo -aAll -NoLog [show the state of all virtual disks]

Viewing the disk cache policy

$ megacli -LDGetProp -Cache -L0 -a0
$ megacli -LDGetProp -Cache -L1 -a0
$ megacli -LDGetProp -Cache -LALL -a0
$ megacli -LDGetProp -Cache -LALL -aALL
$ megacli -LDGetProp -DskCache -LALL -aALL

Setting the disk cache policy

Cache policy options
WT (Write through)
WB (Write back)
NORA (No read ahead)
RA (Read ahead)
ADRA (Adaptive read ahead)
Cached
Direct

# In the commands below, | separates alternatives; pass exactly one of them.
$ megacli -LDSetProp WT|WB|NORA|RA|ADRA -L0 -a0
$ megacli -LDSetProp -Cached|-Direct -L0 -a0
# enable / disable disk cache
$ megacli -LDSetProp -EnDskCache|-DisDskCache -L0 -a0

Cache control examples

# Set the cache mode and access policy of a virtual disk (Change Virtual Disk Cache and Access Parameters)
Description: Allows you to change the following virtual disk parameters:
-WT (Write through), WB (Write back): Selects write policy.
-NORA (No read ahead), RA (Read ahead), ADRA (Adaptive read ahead): Selects read policy.
-Cached, -Direct: Selects cache policy.
-RW, -RO, Blocked: Selects access policy.
-EnDskCache: Enables disk cache.
-DisDskCache: Disables disk cache.
MegaCli -LDSetProp { WT | WB|NORA |RA | ADRA|-Cached|Direct} |
{-RW|RO|Blocked} |
{-Name[string]} |
{-EnDskCache|DisDskCache} -Lx |
-L0,1,2|-Lall -aN|-a0,1,2|-aALL
MegaCli -LDSetProp WT -L0 -a0

# Show the cache mode and access policy of a virtual disk (Display Virtual Disk Cache and Access Parameters)
MegaCli -LDGetProp -Cache | -Access | -Name | -DskCache -Lx|-L0,1,2|
-Lall -aN|-a0,1,2|-aALL
Displays the cache and access policies of the virtual disk(s):
-WT (Write through), WB (Write back): Selects write policy.
-NORA (No read ahead), RA (Read ahead), ADRA (Adaptive read ahead): Selects read policy.
-Cache, -Cached, Direct: Displays cache policy.
-Access, -RW, -RO, Blocked: Displays access policy.
-DskCache: Displays physical disk cache policy.

RAID management

RAID level mapping
RAID information | RAID level
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0 | RAID 1
RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0 | RAID 0
RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3 | RAID 5
RAID Level : Primary-1, Secondary-3, RAID Level Qualifier-0 | RAID 10
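The mapping above is easy to apply in a script. Here is a small sketch that turns a "RAID Level" line into a readable level name. It only knows the four combinations listed in this article, and it treats Primary-5, Secondary-0, Qualifier-3 as RAID 5; any other combination is reported as unknown rather than guessed. The names RAID_LEVELS and raid_level are illustrative.

```python
import re

# (Primary, Secondary, Qualifier) -> RAID level, limited to the
# combinations listed in the article. Other controllers or levels may
# report combinations that are not covered here.
RAID_LEVELS = {
    (0, 0, 0): "RAID 0",
    (1, 0, 0): "RAID 1",
    (5, 0, 3): "RAID 5",
    (1, 3, 0): "RAID 10",
}

RAID_LINE = re.compile(
    r"RAID Level\s*:\s*Primary-(\d+),\s*Secondary-(\d+),\s*RAID Level Qualifier-(\d+)"
)


def raid_level(line):
    """Return the RAID level name for a 'RAID Level : ...' line,
    'unknown' for an unlisted combination, or None if the line is not
    a RAID Level line at all."""
    match = RAID_LINE.search(line)
    if match is None:
        return None
    key = tuple(int(group) for group in match.groups())
    return RAID_LEVELS.get(key, "unknown")


if __name__ == "__main__":
    print(raid_level("RAID Level : Primary-1, Secondary-3, RAID Level Qualifier-0"))
```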
# Create a RAID 5 array from physical disks 2, 3, and 4, with physical disk 5 as its hot spare
$ megacli -CfgLdAdd -r5 [1:2,1:3,1:4] WB Direct -Hsp[1:5] -a0
# Create an array without a hot spare
$ megacli -CfgLdAdd -r5 [1:2,1:3,1:4] WB Direct -a0
# Show disks that have dropped out of a RAID array
$ megacli -pdgetmissing -a0
# Delete an array
$ megacli -CfgLdDel -L1 -a0
# Replace a failed disk
$ megacli -pdreplacemissing -physdrv[12:10] -Array5 -row0 -a0
# Add a disk online
$ megacli -LDRecon -Start -r5 -Add -PhysDrv[1:4] -L1 -a0
# After an array is created it goes through an initialization (block sync) phase; check its progress with:
$ megacli -LDInit -ShowProg -LALL -aALL
# Or show it as a live text progress display
$ megacli -LDInit -ProgDsply -LALL -aALL
# Show the array's background initialization progress
$ megacli -LDBI -ShowProg -LALL -aALL
# Or show it as a live text progress display
$ megacli -LDBI -ProgDsply -LALL -aALL
# Make disk 5 a global hot spare
$ megacli -PDHSP -Set [-EnclAffinity] [-nonRevertible] -PhysDrv[1:5] -a0
# Make it a dedicated hot spare for a specific array
$ megacli -PDHSP -Set [-Dedicated [-Array1]] [-EnclAffinity] [-nonRevertible] -PhysDrv[1:5] -a0
# Remove a global hot spare
$ megacli -PDHSP -Rmv -PhysDrv[1:5] -a0
# Take a physical disk offline / bring it online
$ megacli -PDOffline -PhysDrv [1:4] -a0
$ megacli -PDOnline -PhysDrv [1:4] -a0
# Start a rebuild manually
$ megacli -pdrbld -start -physdrv[12:10] -a0
# Disable automatic rebuild
$ megacli -AdpAutoRbld -Dsbl -a0
# Set the rebuild rate
$ megacli -AdpSetProp RebuildRate -30 -a0
# Show the rebuild progress of a physical disk
$ megacli -PDRbld -ShowProg -PhysDrv [1:5] -a0
# Or show it as a live text progress display
$ megacli -PDRbld -ProgDsply -PhysDrv [1:5] -a0
# Show the E:S (Enclosure Device ID and Slot Number) of each disk
$ megacli -PDList -aAll -NoLog | grep -Ei "(enclosure|slot)"
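Most of the commands above need the [E:S] pair of a disk. As a rough sketch, the pairs can be collected from PDList output like this, assuming every "Slot Number" line follows the "Enclosure Device ID" line of the same disk. The sample text and the function name physdrv_ids are illustrative.

```python
import re

# Hand-written sample in the shape of `megacli -PDList -aAll -NoLog`
# output, reduced to the two fields that matter here.
SAMPLE = """\
Enclosure Device ID: 32
Slot Number: 0
Enclosure Device ID: 32
Slot Number: 1
"""

ENCLOSURE = re.compile(r"^\s*Enclosure Device ID\s*:\s*(\S+)")
SLOT = re.compile(r"^\s*Slot Number\s*:\s*(\d+)")


def physdrv_ids(pdlist_text):
    """Return the '[E:S]' string for every disk found in the text."""
    ids = []
    enclosure = None
    for line in pdlist_text.splitlines():
        match = ENCLOSURE.match(line)
        if match:
            enclosure = match.group(1)
            continue
        match = SLOT.match(line)
        if match and enclosure is not None:
            ids.append(f"[{enclosure}:{match.group(1)}]")
    return ids


if __name__ == "__main__":
    print(physdrv_ids(SAMPLE))
```

Each returned string can be appended to -PhysDrv when building one of the commands above.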

RAID battery (BBU) settings

# Show battery status information (Display BBU Status Information)
$ megacli -AdpBbuCmd -GetBbuStatus -aN|-a0,1,2|-aALL
$ megacli -AdpBbuCmd -GetBbuStatus -aALL

# Show battery capacity (Display BBU Capacity Information)
$ megacli -AdpBbuCmd -GetBbuCapacityInfo -aN|-a0,1,2|-aALL
$ megacli -AdpBbuCmd -GetBbuCapacityInfo -aALL

# Show battery design parameters (Display BBU Design Parameters)
$ megacli -AdpBbuCmd -GetBbuDesignInfo -aN|-a0,1,2|-aALL
$ megacli -AdpBbuCmd -GetBbuDesignInfo -aALL

# Show battery properties (Display Current BBU Properties)
$ megacli -AdpBbuCmd -GetBbuProperties -aN|-a0,1,2|-aALL
$ megacli -AdpBbuCmd -GetBbuProperties -aALL

# Start a battery learning cycle (Start BBU Learning Cycle)
Description: Starts the learning cycle on the BBU.
No parameter is needed for this option.
$ megacli -AdpBbuCmd -BbuLearn -aN|-a0,1,2|-aALL

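MegaCli essentials

# LSI's MegaRAID tooling makes it possible to monitor RAID effectively. Other vendors such as HP and IBM have their own RAID APIs.
$ MegaCli -ldinfo -lall -aall [show the RAID level, number of disks, capacity, and stripe size]
$ MegaCli -cfgdsply -aALL |grep Policy [show the controller cache policy]
$ MegaCli -LDSetProp WB -L0 -a0 [enable write back]
$ MegaCli -LDSetProp CachedBadBBU -L0 -a0 [keep write back enabled even if the battery has failed]
$ MegaCli -AdpBbuCmd -BbuLearn -a0 [manually start a BBU learn cycle, which discharges and recharges the battery]
$ MegaCli -FwTermLog -Dsply -aALL [show the log]
$ MegaCli -adpCount [show the number of adapters]

# Show information for all adapters
$ MegaCli -AdpAllInfo -aAll
Critical Disks : 0
Failed Disks : 0

# Show information for all logical disk groups
$ MegaCli -LDInfo -LALL -aAll

# Show information for all physical disks
$ MegaCli -PDList -aAll
Media Error Count: 0
Other Error Count: 0

# Show the charge status
$ MegaCli -AdpBbuCmd -GetBbuStatus -aALL
Learn Cycle Requested : No
Fully Charged : Yes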
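The two BBU lines shown above can be compared against expected values in a script. This is only a sketch: it assumes the "Key : Value" layout shown here and treats "Learn Cycle Requested : No" and "Fully Charged : Yes" as the expected values, which is an assumption made for illustration (a battery that is still recharging after a learn cycle will legitimately report something else). The function names are hypothetical.

```python
# Values from the article's sample output, used as the expected state.
EXPECTED = {"Learn Cycle Requested": "No", "Fully Charged": "Yes"}


def parse_key_values(text):
    """Parse 'Key : Value' lines into a dict, skipping lines without ':'."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def bbu_deviations(status_text, expected=EXPECTED):
    """Return {field: actual_value} for every expected field whose value
    differs from the expected one (None if the field is missing)."""
    fields = parse_key_values(status_text)
    return {
        key: fields.get(key)
        for key, wanted in expected.items()
        if fields.get(key) != wanted
    }


if __name__ == "__main__":
    sample = "Learn Cycle Requested : No\nFully Charged : Yes\n"
    print(bbu_deviations(sample))
```

An empty result means both fields match the values shown above; anything else is worth a closer look rather than an automatic alarm.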

Show BBU (backup battery) status information: MegaCli -AdpBbuCmd -GetBbuStatus -aALL
Show BBU capacity information: MegaCli -AdpBbuCmd -GetBbuCapacityInfo -aALL
Show BBU design parameters: MegaCli -AdpBbuCmd -GetBbuDesignInfo -aALL
Show the current BBU properties: MegaCli -AdpBbuCmd -GetBbuProperties -aALL
Show the RAID controller model, RAID settings, and disk information: MegaCli -cfgdsply -aALL
Show the cache policy settings: MegaCli -cfgdsply -aALL |grep -i Policy
Current Cache Policy: WriteBack, ReadAheadNone, Direct, Write Cache OK if Bad BBU
Show the charge progress percentage: MegaCli -AdpBbuCmd -GetBbuStatus -aALL
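The "Current Cache Policy" line packs several settings into one comma-separated value. A minimal sketch for splitting it, assuming the four-part layout of the sample line in this article (write policy, read policy, I/O policy, bad-BBU behaviour); the key names and both function names are invented for illustration.

```python
PREFIX = "Current Cache Policy:"
KEYS = ("write", "read", "io", "bad_bbu")


def parse_cache_policy(line):
    """Split a 'Current Cache Policy:' line into its four parts.
    Returns None if the line has a different shape."""
    _, sep, rest = line.partition(PREFIX)
    if not sep:
        return None
    parts = [part.strip() for part in rest.split(",")]
    if len(parts) != len(KEYS):
        return None
    return dict(zip(KEYS, parts))


def write_back_without_bbu(line):
    """True if the policy is WriteBack and stays enabled with a bad BBU,
    i.e. cached writes are not protected by a healthy battery."""
    policy = parse_cache_policy(line)
    return (
        policy is not None
        and policy["write"] == "WriteBack"
        and policy["bad_bbu"] == "Write Cache OK if Bad BBU"
    )


if __name__ == "__main__":
    sample = ("Current Cache Policy: WriteBack, ReadAheadNone, Direct, "
              "Write Cache OK if Bad BBU")
    print(parse_cache_policy(sample))
    print(write_back_without_bbu(sample))
```

Flagging this combination is a design choice for the sketch: it is the state produced by the CachedBadBBU setting, where a power loss with a failed battery can lose cached writes.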

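For detailed parameters and usage, see: megacli note

2. Important fields

Field | Meaning
Firmware state | Disk state
Firmware state: Online, Spun Up | Disk is healthy
Firmware state: Unconfigured(good), Spun Up | Disk is installed but not in use
Firmware state: Unconfigured(bad) | Faulty; corresponds to Non-Critical in hwcheck
Firmware state: Failed | Faulty; corresponds to Critical in hwcheck
Firmware state: Rebuild | Rebuilding; usually shown while a disk is being replaced
Enclosure Device ID: 32 | Enclosure (device) ID
Slot Number: 1 | Slot of the disk in the server
Adapter #0 | Adapter number; corresponds to the -a option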
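The Firmware state values above map naturally onto a severity for monitoring. A small sketch, using only the states listed in this article; the severity labels and the function name classify_firmware_state are made up, and any state not listed here is reported as unknown.

```python
PREFIX = "Firmware state:"

# State prefix -> label. The labels are illustrative; the severities for
# Unconfigured(bad) and Failed follow the hwcheck mapping in the article.
STATES = (
    ("Online", "ok"),
    ("Unconfigured(good)", "unused"),
    ("Unconfigured(bad)", "non-critical"),
    ("Failed", "critical"),
    ("Rebuild", "rebuilding"),
)


def classify_firmware_state(line):
    """Return a label for a 'Firmware state:' line, 'unknown' for a state
    not listed in the article, or None for any other line."""
    _, sep, rest = line.partition(PREFIX)
    if not sep:
        return None
    state = rest.strip()
    for prefix, label in STATES:
        if state.startswith(prefix):
            return label
    return "unknown"


if __name__ == "__main__":
    print(classify_firmware_state("Firmware state: Online, Spun Up"))
```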

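3. In practice

megacli -LdPdInfo -aAll

Pay particular attention to the following fields:

  • Media Error Count
  • Other Error Count
  • Predictive Failure Count
  • Last Predictive Failure
  • Drive has flagged a S.M.A.R.T alert
If any of these counters is non-zero (or the S.M.A.R.T alert flag is Yes), the disk may be failing and may need to be replaced.
If you are not sure which physical disk that is, make its LED blink to locate it.

Blink the LED of a specific disk
megacli -PdLocate -start -physdrv [E:S] -aALL
Here E is the Enclosure Device ID and S is the Slot Number. For example, if the failed disk is at:
Enclosure Device ID: 1
Slot Number: 0

megacli -PdLocate -start -physdrv[1:0] -a0

Adapter: 0: Device at EnclId-1 SlotId-0 -- PD Locate Start Command was successfully sent to Firmware
Exit Code: 0x00

Stop the LED from blinking
megacli -PdLocate -stop -physdrv [E:S] -aALL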
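When locating disks from a script, it helps to build the command as an argument list instead of pasting strings together. A minimal sketch that only assembles the -PdLocate command shown above; the function name pd_locate_argv and its parameters are hypothetical, and nothing is executed here.

```python
def pd_locate_argv(enclosure, slot, adapter=0, start=True):
    """Build the argument list for `megacli -PdLocate -start|-stop`.

    enclosure and slot are the E and S of [E:S]; adapter is the number
    used with -a. Values are converted with int() so that anything that
    is not a number raises ValueError instead of reaching the command.
    """
    enclosure, slot, adapter = int(enclosure), int(slot), int(adapter)
    if min(enclosure, slot, adapter) < 0:
        raise ValueError("enclosure, slot and adapter must be non-negative")
    action = "-start" if start else "-stop"
    return [
        "megacli",
        "-PdLocate",
        action,
        f"-physdrv[{enclosure}:{slot}]",
        f"-a{adapter}",
    ]


if __name__ == "__main__":
    # Same command as the example in the article: disk [1:0] on adapter 0.
    print(" ".join(pd_locate_argv(1, 0)))
```

The resulting list can be handed to subprocess.run on a host where megacli is installed; passing a list avoids any shell quoting issues around the square brackets.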

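If a disk in the array fails, usually no further action is needed after the disk is swapped: the RAID controller starts the rebuild automatically. From pulling the failed disk to inserting the new one, the states typically change as follows:
Device
Normal -> Damage -> Rebuild -> Normal
Virtual Drive
Optimal -> Degraded -> Degraded -> Optimal
Physical Drive
Online -> Failed Unconfigured -> Rebuild -> Online

  • Check rebuild progress
megacli -PDRbld -showprog -physDrv [E:S] -a0
Typical output: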
root@ubuntu-server:~# megacli -PDRbld -showprog -physDrv[0:23] -a0

Rebuild Progress on Device at Enclosure 0, Slot 23 Completed 50% in 519 Minutes.

Exit Code: 0x00
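The progress line above contains enough information for a rough time estimate. The sketch below parses it and extrapolates linearly, which is an assumption: real rebuilds speed up and slow down with load and with the configured rebuild rate. The function name rebuild_eta is illustrative.

```python
import re

PROGRESS = re.compile(
    r"Rebuild Progress on Device at Enclosure (\d+), Slot (\d+) "
    r"Completed (\d+)% in (\d+) Minutes"
)


def rebuild_eta(line):
    """Parse a rebuild progress line and estimate the remaining minutes.

    Returns None if the line does not match. 'remaining_minutes' is None
    while progress is still 0%, because nothing can be extrapolated yet.
    """
    match = PROGRESS.search(line)
    if match is None:
        return None
    enclosure, slot, percent, minutes = (int(group) for group in match.groups())
    if percent == 0:
        remaining = None
    else:
        # Linear extrapolation: time per percent so far, times percent left.
        remaining = round(minutes * (100 - percent) / percent)
    return {
        "enclosure": enclosure,
        "slot": slot,
        "percent": percent,
        "elapsed_minutes": minutes,
        "remaining_minutes": remaining,
    }


if __name__ == "__main__":
    sample = ("Rebuild Progress on Device at Enclosure 0, Slot 23 "
              "Completed 50% in 519 Minutes.")
    print(rebuild_eta(sample))
```

For the sample output above (50% in 519 minutes) the linear estimate is another 519 minutes.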


Last modification:August 9th, 2021 at 09:42 pm