As mentioned on a previous occasion, Zabbix is a highly configurable monitoring system that lets us write our own check scripts. This time we will configure alarms that check the health of the well-known LSI disk controllers.
The repository of the scripts used can be found at the following address:
https://github.com/lesovsky/zabbix-extensions/tree/master/files/hwraid-megacli
We define the UserParameters in the Zabbix agent configuration:
UserParameter=megacli.adp.discovery,/var/lib/zabbix/externalscripts/megacli-adp-discovery.sh
UserParameter=megacli.ld.discovery,/var/lib/zabbix/externalscripts/megacli-ld-discovery.sh
UserParameter=megacli.pd.discovery,/var/lib/zabbix/externalscripts/megacli-pd-discovery.sh
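For orientation, a discovery key is expected to return a JSON document listing the entities found. The following is only a minimal sketch of that shape: the macro name {#ADPNUM} and the hard-coded adapter numbers are assumptions made for illustration, and the real scripts in the repository obtain the list from MegaCli and may differ in detail.

```shell
#!/bin/sh
# Hypothetical sketch: print Zabbix low-level discovery JSON for a list of
# adapter numbers. The macro name {#ADPNUM} is an assumption.
adp_discovery_json() {
    first=1
    printf '{"data":['
    for adp in "$@"; do
        # Separate entries with a comma, but not before the first one.
        [ "$first" -eq 1 ] || printf ','
        printf '{"{#ADPNUM}":"%s"}' "$adp"
        first=0
    done
    printf ']}\n'
}

# Example call with two made-up adapter numbers.
adp_discovery_json 0 1
```

Each entry in the list becomes one set of items on the Zabbix side, which is why the discovery output only needs identifiers and not the values themselves.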
We install a cron job, because the items created by auto-discovery are trapper items and their values have to be pushed to Zabbix:
0 */1 * * * /var/lib/zabbix/externalscripts/megacli-raid-data-processor.sh
We download the scripts, remove the sudo calls and adjust the script path:
cd /var/lib/zabbix/externalscripts/
wget https://raw.githubusercontent.com/lesovsky/zabbix-extensions/master/files/hwraid-megacli/scripts/megacli-adp-discovery.sh
sed -i 's/sudo//g' megacli-adp-discovery.sh
wget https://raw.githubusercontent.com/lesovsky/zabbix-extensions/master/files/hwraid-megacli/scripts/megacli-ld-discovery.sh
sed -i 's/sudo//g' megacli-ld-discovery.sh
wget https://raw.githubusercontent.com/lesovsky/zabbix-extensions/master/files/hwraid-megacli/scripts/megacli-pd-discovery.sh
sed -i 's/sudo//g' megacli-pd-discovery.sh
wget https://raw.githubusercontent.com/lesovsky/zabbix-extensions/master/files/hwraid-megacli/scripts/megacli-raid-data-processor.sh
sed -i 's/sudo//g' megacli-raid-data-processor.sh
sed -i 's|/usr/libexec/zabbix-extensions/scripts/|/var/lib/zabbix/externalscripts/|g' megacli-raid-data-processor.sh
chmod 700 *.sh
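Because the paths in the last substitution contain slashes, sed needs either escaped slashes or a different delimiter. A small sketch of the second option, applied to a sample line that is made up for illustration and not copied from the script:

```shell
#!/bin/sh
# Rewrite the script directory using '|' as the sed delimiter, so that the
# slashes inside the paths do not need escaping.
rewrite_script_path() {
    sed 's|/usr/libexec/zabbix-extensions/scripts/|/var/lib/zabbix/externalscripts/|g'
}

# Hypothetical sample line, only for illustration.
echo 'adp_list=$(/usr/libexec/zabbix-extensions/scripts/megacli-adp-discovery.sh raw)' | rewrite_script_path
```

Running the same filter over the downloaded file without -i first is a cheap way to preview the change before editing in place.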
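NOTE: If we use proxies, the original script picks the wrong address from the Server= line of the agent configuration. We change the first line below into the second, which takes the second entry of the list instead of the first: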
zbx_server=$(grep ^Server= /etc/zabbix/zabbix_agentd.conf |cut -d= -f2|cut -d, -f1)
zbx_server=$(grep ^Server= /etc/zabbix/zabbix_agentd.conf |cut -d= -f2|cut -d, -f2)
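To see what the two cut variants select, here is a short sketch run against a temporary file instead of the real agent configuration. The addresses are documentation placeholders, and which position holds the proxy depends on how Server= is written on each host.

```shell
#!/bin/sh
# Show which entry of a comma-separated Server= line each cut field selects.
# The addresses below are placeholders, not values from a real setup.
conf=$(mktemp)
printf 'Server=192.0.2.10,192.0.2.20\n' > "$conf"

first_entry=$(grep '^Server=' "$conf" | cut -d= -f2 | cut -d, -f1)
second_entry=$(grep '^Server=' "$conf" | cut -d= -f2 | cut -d, -f2)

echo "first:  $first_entry"
echo "second: $second_entry"

# With a single entry there is no comma, and cut (without -s) prints the
# whole line, so -f2 still yields that single address.
printf 'Server=192.0.2.10\n' > "$conf"
single_entry=$(grep '^Server=' "$conf" | cut -d= -f2 | cut -d, -f2)
echo "single: $single_entry"

rm -f "$conf"
```

The last case matters for hosts that list only one address: the modified line keeps working there, because cut passes through lines that contain no delimiter.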
The script is not very robust. For example, it fails when the temperature is N/A, so we replace the first line below with the block that follows it:
value=$(sed -n -e "/pd begin $adp $enc $pd/,/ld end $adp $enc $pd/p" $data_out |grep -m1 -w "^Drive Temperature" |awk '{print $3}' |grep -oE '[0-9]+')
value=$(sed -n -e "/pd begin $adp $enc $pd/,/ld end $adp $enc $pd/p" $data_out |grep -m1 -w "^Drive Temperature" |awk '{print $3}' |grep -oE '[0-9]+')
if [ -z "$value" ]; then
value='0'
fi
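The same fallback can be tried in isolation. The sketch below is not the script's code: drive_temperature is a hypothetical helper, and the two sample lines are assumptions about what the MegaCli output looks like. It uses the shell's default-value expansion instead of an explicit if.

```shell
#!/bin/sh
# Extract the numeric drive temperature from MegaCli-style text, falling back
# to 0 when no number is present (for example "N/A").
# The sample lines passed in below are assumptions about the output format.
drive_temperature() {
    value=$(printf '%s\n' "$1" | grep -m1 -w '^Drive Temperature' | awk '{print $3}' | grep -oE '[0-9]+' | head -n1)
    # ${value:-0} substitutes 0 when value is empty or unset.
    echo "${value:-0}"
}

drive_temperature 'Drive Temperature :30C (86.00 F)'
drive_temperature 'Drive Temperature : N/A'
```

Reporting 0 for an unknown temperature keeps the line well formed, at the cost of making "unknown" indistinguishable from a real reading of 0 in the graphs.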
We restart the Zabbix agent service so that it loads the new UserParameters.
We import the template into Zabbix.
We apply the template to the server in question from the Zabbix web interface.
TROUBLESHOOTING
The item values come from the cron job, so we run the script manually and check the collected data:
"node00" megacli.adp.name[0] Supermicro SMC2208
"node00" megacli.ld.degraded[0] 0
"node00" megacli.ld.offline[0] 0
"node00" megacli.pd.total[0] 2
"node00" megacli.pd.critical[0] 0
"node00" megacli.pd.failed[0] 0
"node00" megacli.mem.err[0] 0
"node00" megacli.mem.unerr[0] 0
"node00" megacli.ld.state[0:0] Optimal
"node00" megacli.pd.media_error[0:252:0] 0
"node00" megacli.pd.other_error[0:252:0] 0
"node00" megacli.pd.pred_failure[0:252:0] 0
"node00" megacli.pd.state[0:252:0] Online
"node00" megacli.pd.temperature[0:252:0]
"node00" megacli.pd.media_error[0:252:1] 0
"node00" megacli.pd.other_error[0:252:1] 0
"node00" megacli.pd.pred_failure[0:252:1] 0
"node00" megacli.pd.state[0:252:1] Online
"node00" megacli.pd.temperature[0:252:1] 0
NOTE: Every line to be sent must have the following format:
<hostname> <key> <value>
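A quick way to spot malformed lines, such as a temperature entry with no value, is to count the fields on each line. This is a rough sketch: check_sender_lines is a hypothetical helper, and the field count is naive, so a quoted hostname containing spaces would confuse it.

```shell
#!/bin/sh
# Report lines that do not have at least three fields (<hostname> <key> <value>).
# check_sender_lines is a hypothetical helper, not part of the repository.
check_sender_lines() {
    awk 'NF < 3 { printf "line %d is incomplete: %s\n", NR, $0; bad = 1 } END { exit bad }' "$1"
}

# Sample data in the same shape as the collected output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"node00" megacli.pd.state[0:252:0] Online
"node00" megacli.pd.temperature[0:252:0]
"node00" megacli.pd.temperature[0:252:1] 0
EOF

check_sender_lines "$sample" || echo "found incomplete lines"
rm -f "$sample"
```

The helper exits with a non-zero status when it finds a short line, so it can also be used as a guard before sending the file.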
We can simulate an alarm to check that Zabbix notifies us when something goes wrong:
"node00" megacli.pd.critical[0] 1
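One possible way to inject such a value, sketched under assumptions: write the line to a temporary file and hand that file to zabbix_sender with -i. The server address is a placeholder and the temporary file is not the one used by the processor script. The command is only printed here, so nothing is sent by accident.

```shell
#!/bin/sh
# Build a one-line zabbix_sender input file that simulates a critical disk.
# zbx_server is a placeholder; use the server or proxy the agent reports to.
zbx_server="192.0.2.10"
test_data=$(mktemp)
printf '"%s" %s %s\n' "node00" "megacli.pd.critical[0]" "1" > "$test_data"

cat "$test_data"
# Print the command instead of running it, so nothing is sent by accident.
echo "zabbix_sender -z $zbx_server -i $test_data"
```

Once the notification has been verified, sending the same key with the value 0 should return the item to its normal state.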
We can also check the alarm but skip the intermediate scripts: