# MegaCli

MegaCli is a tool for managing and maintaining hardware RAID. It reports everything about the RAID controller: the controller model, the RAID level of each array, the state of every disk in an array, and so on. The current state of a disk is usually hard to determine and is typically left to data-center staff on inspection rounds; MegaCli makes it possible to check in software instead. The usual approach is to look at two counters in its output, `Media Error Count: 0` and `Other Error Count: 0`, to decide whether a disk in the array has a problem. `Media Error Count` indicates errors on the disk itself, possibly bad sectors; a non-zero value deserves attention, and the larger it is, the higher the risk. `Other Error Count` suggests the disk may be loose and may need to be reseated. MegaCli can check every disk in an array, so a script can watch these values and notify the administrators.

## 0. Installation

- Method 1

> All systems used in this article run Ubuntu 20.04.1

```
# Add the package source
vi /etc/apt/sources.list
# Append this line at the end of the file
deb http://hwraid.le-vert.net/ubuntu precise main
# Save and quit (:wq), then add the GPG key
wget -O - https://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key | sudo apt-key add -
# Refresh the package index
apt update
# Install megacli and confirm that the installation succeeded
root@user:~# apt install megacli
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  megacli
0 upgraded, 1 newly installed, 0 to remove and 15 not upgraded.
Need to get 4,427 kB of archives.
After this operation, 6,880 kB of additional disk space will be used.
Get:1 http://hwraid.le-vert.net/ubuntu precise/main amd64 megacli amd64 8.07.14-2+Ubuntu.precise.12.04 [4,427 kB]
Fetched 4,427 kB in 3s (1,554 kB/s)
Selecting previously unselected package megacli.
(Reading database ... 109717 files and directories currently installed.)
Preparing to unpack .../megacli_8.07.14-2+Ubuntu.precise.12.04_amd64.deb ...
Unpacking megacli (8.07.14-2+Ubuntu.precise.12.04) ...
Setting up megacli (8.07.14-2+Ubuntu.precise.12.04) ...
root@user:~# megacli -v
MegaCLI SAS RAID Management Tool Ver 8.07.14 Dec 16, 2013
(c)Copyright 2013, LSI Corporation, All Rights Reserved.
Exit Code: 0x00
```

- Method 2

## 1.0 Basic usage of the megacli tool

```
# Show RAID levels
$ megacli -LDInfo -Lall -aALL
# Show RAID controller information
$ megacli -AdpAllInfo -aALL
# Show disk information
$ megacli -PDList -aALL
# Show battery information
$ megacli -AdpBbuCmd -aAll
# Show the RAID controller log
$ megacli -FwTermLog -Dsply -aALL
# Show the number of adapters
$ megacli -adpCount
# Show the adapter time
$ megacli -AdpGetTime -aALL
# Show information for all adapters
$ megacli -AdpAllInfo -aAll
# Show information for all logical drives
$ megacli -LDInfo -LALL -aAll
# Show information for all physical drives
$ megacli -PDList -aAll
# Show the charging status
$ megacli -AdpBbuCmd -GetBbuStatus -aALL | grep 'Charger Status'
# Show BBU status information
$ megacli -AdpBbuCmd -GetBbuStatus -aALL
# Show BBU capacity information
$ megacli -AdpBbuCmd -GetBbuCapacityInfo -aALL
# Show BBU design parameters
$ megacli -AdpBbuCmd -GetBbuDesignInfo -aALL
# Show current BBU properties
$ megacli -AdpBbuCmd -GetBbuProperties -aALL
# Show the RAID controller model, RAID configuration and disk information
$ megacli -cfgdsply -aALL

## How disk states change from pulling a drive to inserting its replacement
Device         |Normal |Damage             |Rebuild |Normal
Virtual Drive  |Optimal|Degraded           |Degraded|Optimal
Physical Drive |Online |Failed Unconfigured|Rebuild |Online

# Show the rebuild state of a physical disk:
$ megacli -PDRbld -ShowProg -PhysDrv [Enclosure Device ID:Slot Number] -a0
## A physical disk that is rebuilding reports: "Firmware state: Rebuild"

# Query the rebuild progress:
$ megacli -pdrbld -showprog -physdrv[E:S] -aALL
## The output looks like this:
Rebuild Progress on Device at Enclosure 32, Slot 5 Completed 77% in 101 Minutes.

# Show the rebuild progress as a text progress bar:
$ megacli -pdrbld -progdsply -physdrv[E:S] -aALL
## The screen shows something like this:
Rebuild progress of physical drives...
Enclosure:Slot Percent Complete Time Elps
032 :05 #######################87 %################******* 01:59:07
Press key to quit...

# Show the rebuild settings of the RAID controller:
$ megacli -AdpAllinfo -aALL | grep -i rebuild
## The output looks like this:
Rebuild Rate : 30%
Auto Rebuild : Enabled
Rebuild Rate : Yes
Force Rebuild : Yes

# Set the controller rebuild rate to 60%:
$ megacli -AdpSetProp RebuildRate -60 -aALL
## Output on success:
Adapter 0: Set rebuild rate to 60% success.
# Set a global hot spare (optional flags after -Set: -EnclAffinity, -nonRevertible)
$ megacli -PDHSP -Set -PhysDrv[252:0] -a0
# Remove a global hot spare
$ megacli -PDHSP -Rmv -PhysDrv[32:5] -a0
```

## 1.1 MegaCli usage by category

### Common commands

```
$ megacli -LDInfo -Lall -aALL                  # show RAID levels
$ megacli -AdpAllInfo -aALL                    # show RAID controller information
$ megacli -PDList -aALL                        # show disk information
$ megacli -AdpBbuCmd -aAll                     # show battery information
$ megacli -FwTermLog -Dsply -aALL              # show the RAID controller log
$ megacli -adpCount                            # show the number of adapters
$ megacli -AdpGetTime -aALL                    # show the adapter time
$ megacli -AdpAllInfo -aAll                    # show information for all adapters
$ megacli -LDInfo -LALL -aAll                  # show information for all logical drives
$ megacli -PDList -aAll                        # show information for all physical drives
$ megacli -PdLocate -start -physdrv[252:2] -a0 # blink the LED of the given disk (locate it)
$ megacli -CfgForeign -Clear -a0               # clear the Foreign state
$ megacli -AdpBbuCmd -GetBbuStatus -aALL | grep 'Charger Status'  # show the charging status
$ megacli -AdpBbuCmd -GetBbuStatus -aALL       # show BBU status information
$ megacli -AdpBbuCmd -GetBbuCapacityInfo -aALL # show BBU capacity information
$ megacli -AdpBbuCmd -GetBbuDesignInfo -aALL   # show BBU design parameters
$ megacli -AdpBbuCmd -GetBbuProperties -aALL   # show current BBU properties
$ megacli -cfgdsply -aALL                      # show controller model, RAID configuration and disk information
$ megacli -PDList -aAll -NoLog                 # show the state of all disks
$ megacli -LdPdInfo -aAll -NoLog               # show the state of all virtual disks
```

### View the disk cache policy

```
$ megacli -LDGetProp -Cache -L0 -a0
$ megacli -LDGetProp -Cache -L1 -a0
$ megacli -LDGetProp -Cache -LALL -a0
$ megacli -LDGetProp -Cache -LALL -aALL
$ megacli -LDGetProp -DskCache -LALL -aALL
```

### Set the disk cache policy

Cache policy values:

- WT (Write through)
- WB (Write back)
- NORA (No read ahead)
- RA (Read ahead)
- ADRA (Adaptive read ahead)
- Cached
- Direct

```
# Syntax notation: pick exactly one of the values separated by |
$ megacli -LDSetProp WT|WB|NORA|RA|ADRA -L0 -a0
$ megacli -LDSetProp -Cached|-Direct -L0 -a0
# Enable / disable the disk cache
$ megacli -LDSetProp -EnDskCache|-DisDskCache -L0 -a0
```

### Cache control examples

```
# Change the cache mode and access mode of a virtual disk
# (Change Virtual Disk Cache and Access Parameters)

Description
Allows you to change the following virtual disk parameters:
-WT (Write through), WB (Write back): Selects write policy.
-NORA (No read ahead), RA (Read ahead), ADRA (Adaptive read ahead): Selects read policy.
-Cached, -Direct: Selects cache policy.
-RW, -RO, Blocked: Selects access policy.
-EnDskCache: Enables disk cache.
-DisDskCache: Disables disk cache.

MegaCli -LDSetProp {WT | WB | NORA | RA | ADRA | -Cached | Direct} | {-RW | RO | Blocked} | {-Name[string]} | {-EnDskCache | DisDskCache} -Lx | -L0,1,2 | -Lall -aN | -a0,1,2 | -aALL
MegaCli -LDSetProp WT -L0 -a0

# Display the cache mode and access mode of a virtual disk
# (Display Virtual Disk Cache and Access Parameters)
MegaCli -LDGetProp -Cache | -Access | -Name | -DskCache -Lx | -L0,1,2 | -Lall -aN | -a0,1,2 | -aALL

Displays the cache and access policies of the virtual disk(s):
-WT (Write through), WB (Write back): Selects write policy.
-NORA (No read ahead), RA (Read ahead), ADRA (Adaptive read ahead): Selects read policy.
-Cache, -Cached, Direct: Displays cache policy.
-Access, -RW, -RO, Blocked: Displays access policy.
-DskCache: Displays physical disk cache policy.
```

### RAID management

**RAID level mapping**

|RAID information|RAID level|
|:-----:|:-----:|
|RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0|RAID 1|
|RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0|RAID 0|
|RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3|RAID 5|
|RAID Level : Primary-1, Secondary-3, RAID Level Qualifier-0|RAID 10|

```
# Create a RAID 5 array from physical disks 2, 3 and 4, with physical disk 5 as its hot spare
$ megacli -CfgLdAdd -r5 [1:2,1:3,1:4] WB Direct -Hsp[1:5] -a0
# Create the array without a hot spare
$ megacli -CfgLdAdd -r5 [1:2,1:3,1:4] WB Direct -a0
# List the disks that have dropped out of a RAID array
$ megacli -pdgetmissing -a0
# Delete an array
$ megacli -CfgLdDel -L1 -a0
# Replace a failed disk
$ megacli -pdreplacemissing -physdrv[12:10] -Array5 -row0 -a0
# Add a disk online
$ megacli -LDRecon -Start -r5 -Add -PhysDrv[1:4] -L1 -a0
# After an array is created its blocks are initialized and synchronized; show the progress
$ megacli -LDInit -ShowProg -LALL -aALL
# Or show it as a live text display
$ megacli -LDInit -ProgDsply -LALL -aALL
# Show the background initialization progress of the array
$ megacli -LDBI -ShowProg -LALL -aALL
# Or show it as a live text display
$ megacli -LDBI -ProgDsply -LALL -aALL
# Make disk 5 a global hot spare (optional flags after -Set: -EnclAffinity, -nonRevertible)
$ megacli -PDHSP -Set -PhysDrv[1:5] -a0
# Make it a dedicated hot spare for one array (same optional flags apply)
$ megacli -PDHSP -Set -Dedicated -Array1 \
    -PhysDrv[1:5] -a0
# Remove the global hot spare
$ megacli -PDHSP -Rmv -PhysDrv[1:5] -a0
# Take a physical disk offline / bring it online
$ megacli -PDOffline -PhysDrv [1:4] -a0
$ megacli -PDOnline -PhysDrv [1:4] -a0
# Start a rebuild manually
$ megacli -pdrbld -start -physdrv[12:10] -a0
# Disable automatic rebuild
$ megacli -AdpAutoRbld -Dsbl -a0
# Set the rebuild rate
$ megacli -AdpSetProp RebuildRate -30 -a0
# Show the rebuild progress of a physical disk
$ megacli -PDRbld -ShowProg -PhysDrv [1:5] -a0
# Or show it as a live text display
$ megacli -PDRbld -ProgDsply -PhysDrv [1:5] -a0
# Look up the Enclosure and Slot (E:S) values
$ megacli -PDList -aAll -NoLog | grep -Ei "(enclosure|slot)"
```

### RAID battery (BBU) settings

```
# Show BBU status information (Display BBU Status Information)
$ megacli -AdpBbuCmd -GetBbuStatus -aN|-a0,1,2|-aALL
$ megacli -AdpBbuCmd -GetBbuStatus -aALL
# Show BBU capacity (Display BBU Capacity Information)
$ megacli -AdpBbuCmd -GetBbuCapacityInfo -aN|-a0,1,2|-aALL
$ megacli -AdpBbuCmd -GetBbuCapacityInfo -aALL
# Show BBU design parameters (Display BBU Design Parameters)
$ megacli -AdpBbuCmd -GetBbuDesignInfo -aN|-a0,1,2|-aALL
$ megacli -AdpBbuCmd -GetBbuDesignInfo -aALL
# Show BBU properties (Display Current BBU Properties)
$ megacli -AdpBbuCmd -GetBbuProperties -aN|-a0,1,2|-aALL
$ megacli -AdpBbuCmd -GetBbuProperties -aALL
# Start a BBU learn cycle (Start BBU Learning Cycle)
# Description: Starts the learning cycle on the BBU. No parameter is needed for this option.
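$ megacli -AdpBbuCmd -BbuLearn -aN|-a0,1,2|-aALL
```

### MegaCli essentials

```
# LSI's MegaRAID tooling makes it possible to monitor RAID effectively.
# Other vendors such as HP and IBM have their own RAID APIs.
$ MegaCli -ldinfo -lall -aall              # RAID level, number of disks, capacity, stripe size
$ MegaCli -cfgdsply -aALL | grep Policy    # controller cache policy
$ MegaCli -LDSetProp WB -L0 -a0            # enable write back
$ MegaCli -LDSetProp CachedBadBBU -L0 -a0  # keep write back even when the battery is bad
$ MegaCli -AdpBbuCmd -BbuLearn -a0         # trigger a learn cycle (recharge) manually
$ MegaCli -FwTermLog -Dsply -aALL          # show the log
$ MegaCli -adpCount                        # show the number of adapters

# Show information for all adapters; fields to check:
$ MegaCli -AdpAllInfo -aAll
Critical Disks : 0
Failed Disks : 0

# Show information for all logical drives
$ MegaCli -LDInfo -LALL -aAll

# Show information for all physical drives; fields to check:
$ MegaCli -PDList -aAll
Media Error Count: 0
Other Error Count: 0

# Show the charging status; fields to check:
$ MegaCli -AdpBbuCmd -GetBbuStatus -aALL
Learn Cycle Requested : No
Fully Charged : Yes

# Show BBU (backup battery) status information
$ MegaCli -AdpBbuCmd -GetBbuStatus -aALL
# Show BBU capacity information
$ MegaCli -AdpBbuCmd -GetBbuCapacityInfo -aALL
# Show BBU design parameters
$ MegaCli -AdpBbuCmd -GetBbuDesignInfo -aALL
# Show current BBU properties
$ MegaCli -AdpBbuCmd -GetBbuProperties -aALL
# Show the RAID controller model, RAID configuration and disk information
$ MegaCli -cfgdsply -aALL
# Show the cache policy settings
$ MegaCli -cfgdsply -aALL | grep -i Policy
Current Cache Policy: WriteBack, ReadAheadNone, Direct, Write Cache OK if Bad BBU
# Show the charge percentage
$ MegaCli -AdpBbuCmd -GetBbuStatus -aALL
```

### Detailed parameters and usage: [megacli note](https://gist.github.com/edolnx/1070906)

## 2. Important parameters

|Parameter|Meaning|
|-----|-----|
|Firmware state|Disk state|
|Firmware state: Online, Spun Up|Disk is healthy|
|Firmware state: Unconfigured(good), Spun Up|Disk is installed but not in use|
|Firmware state: Unconfigured(bad)|Faulty; corresponds to Non-Critical in hwcheck|
|Firmware state: Failed|Faulty; corresponds to Critical in hwcheck|
|Firmware state: Rebuild|Rebuilding; usually shown after a disk has been replaced|
|Enclosure Device ID: 32|Enclosure (device) ID|
|Slot Number: 1|Slot of the disk in the server|
|Adapter #0|Adapter number; corresponds to the -a parameter|

## 3. In practice

Run `megacli -LdPdInfo -aALL` and pay particular attention to these fields:

`Media Error Count`, `Other Error Count`, `Predictive Failure Count`, `Last Predictive Failure`, `Drive has flagged a S.M.A.R.T alert`

If any of these is non-zero (or, for the S.M.A.R.T alert flag, reads Yes), the disk may be failing and may need to be replaced.

If you are not sure which physical disk a number refers to, make the disk blink to locate it.

```
# Blink the LED of a given disk
megacli -PdLocate -start -physdrv [E:S] -aALL
# E is the Enclosure Device ID and S is the Slot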
# Number. For example, if the failed disk is at:
Enclosure Device ID: 1
Slot Number: 0
megacli -PdLocate -start -physdrv[1:0] -a0
Adapter: 0: Device at EnclId-1 SlotId-0 -- PD Locate Start Command was successfully sent to Firmware
Exit Code: 0x00
# Stop blinking
megacli -PdLocate -stop -physdrv [E:S] -aALL
```

When a disk in an array fails, replacing it usually requires no further action: the RAID controller rebuilds automatically. From pulling the old disk to inserting the new one, the states typically go through these stages:

|Device|Normal|Damage|Rebuild|Normal|
|-----|-----|-----|-----|-----|
|Virtual Drive|Optimal|Degraded|Degraded|Optimal|
|Physical Drive|Online|Failed Unconfigured|Rebuild|Online|

- Check the rebuild progress

```
megacli -PDRbld -showprog -physDrv [E:S] -a0
# Typical output:
root@ubuntu-server:~# megacli -PDRbld -showprog -physDrv[0:23] -a0
Rebuild Progress on Device at Enclosure 0, Slot 23 Completed 50% in 519 Minutes.
Exit Code: 0x00
```

## 4. References

- [Official page](http://hwraid.le-vert.net/wiki/DebianPackages)
- [Summary of MegaCli command usage](https://www.jianshu.com/p/dcfd4bfba207)
- [Using MegaCLI to check disk status and replace disks](https://www.huaweicloud.com/articles/958c415b78102baaff815514c50b0aa1.html)
- [Locating a physical disk by drive letter with megacli; checking disk status with MegaCLI](https://blog.csdn.net/weixin_39613637/article/details/111857209)
- [MegaCli wiki](https://wikitech.wikimedia.org/wiki/MegaCli)

Last modification: August 9, 2021
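
## 5. Appendix: a scripted counter check (sketch)

The introduction and section 3 both suggest watching the error counters with a script. The following is a minimal sketch of that idea, not a tested monitoring tool. It assumes the `Enclosure Device ID`, `Slot Number`, `Media Error Count`, `Other Error Count` and `Predictive Failure Count` lines appear in the `Name: value` form quoted in this article; the exact output may differ between MegaCli versions. The function name `check_pd_errors` is made up for this example.

```shell
#!/bin/sh
# Sketch only: flag drives whose error counters are non-zero.
# Input: the text printed by `megacli -PDList -aALL -NoLog`, read from stdin.
# check_pd_errors is a hypothetical helper name, not part of MegaCli.
check_pd_errors() {
    awk -F': *' '
        # Remember which drive the following counters belong to.
        /^Enclosure Device ID/ { encl = $2 }
        /^Slot Number/         { slot = $2 }
        # Report any counter whose numeric value is above zero.
        /^(Media Error Count|Other Error Count|Predictive Failure Count)/ {
            if ($2 + 0 > 0) {
                printf "[%s:%s] %s = %s\n", encl, slot, $1, $2
                bad = 1
            }
        }
        # Exit status 1 tells the caller that at least one drive was flagged.
        END { exit (bad ? 1 : 0) }
    '
}
```

One way to use it is `megacli -PDList -aALL -NoLog | check_pd_errors || echo "RAID disk counters need attention"`, run from cron, with the `echo` replaced by whatever notification channel the administrators already use. Reading from stdin keeps the parsing separate from MegaCli itself, so the function can be tried against saved output first.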