Monitoring LSI controllers with Zabbix

As we mentioned on a previous occasion, Zabbix is a highly configurable monitoring system that lets us write our own check scripts. This time we will configure alarms that check the health of the well-known LSI disk controllers.

The repository of the scripts used can be found at the following address:
https://github.com/lesovsky/zabbix-extensions/tree/master/files/hwraid-megacli

We define the UserParameters:

vi /etc/zabbix/conf.d/megacli.conf

UserParameter=megacli.adp.discovery,/var/lib/zabbix/externalscripts/megacli-adp-discovery.sh
UserParameter=megacli.ld.discovery,/var/lib/zabbix/externalscripts/megacli-ld-discovery.sh
UserParameter=megacli.pd.discovery,/var/lib/zabbix/externalscripts/megacli-pd-discovery.sh
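
Before restarting the agent it can be useful to confirm that every command referenced by a UserParameter line exists and is executable. The following is only a minimal sketch: the function name check_userparams is hypothetical and not part of the repository, and it assumes each command is a plain path without arguments that contain commas.

```shell
#!/bin/sh
# check_userparams FILE: for each "UserParameter=key,command" line,
# report whether the first word of the command is an executable file.
check_userparams() {
    conf=$1
    # Keep only UserParameter lines, then drop everything up to the first comma.
    grep '^UserParameter=' "$conf" | cut -d, -f2- | while read -r cmd _rest; do
        if [ -x "$cmd" ]; then
            echo "ok $cmd"
        else
            echo "missing $cmd"
        fi
    done
}

# Usage (path taken from the step above):
# check_userparams /etc/zabbix/conf.d/megacli.conf
```

At this point in the procedure every line is expected to report "missing", because the scripts are only downloaded in a later step; running it again afterwards should report "ok" for all three.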

We install a cron job, since the items of the auto-discovered units are reported through the Zabbix trapper:

crontab -e

0 */1 * * * /var/lib/zabbix/externalscripts/megacli-raid-data-processor.sh

We download the scripts and remove the sudo calls where needed:

cd /var/lib/zabbix/externalscripts/
wget https://raw.githubusercontent.com/lesovsky/zabbix-extensions/master/files/hwraid-megacli/scripts/megacli-adp-discovery.sh
sed -i 's/sudo//g' megacli-adp-discovery.sh

wget https://raw.githubusercontent.com/lesovsky/zabbix-extensions/master/files/hwraid-megacli/scripts/megacli-ld-discovery.sh
sed -i 's/sudo//g' megacli-ld-discovery.sh

wget https://raw.githubusercontent.com/lesovsky/zabbix-extensions/master/files/hwraid-megacli/scripts/megacli-pd-discovery.sh
sed -i 's/sudo//g' megacli-pd-discovery.sh

wget https://raw.githubusercontent.com/lesovsky/zabbix-extensions/master/files/hwraid-megacli/scripts/megacli-raid-data-processor.sh
sed -i 's/sudo//g' megacli-raid-data-processor.sh
sed -i 's|/usr/libexec/zabbix-extensions/scripts/|/var/lib/zabbix/externalscripts/|g' megacli-raid-data-processor.sh

chmod 700 *.sh
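
The path substitution applied to megacli-raid-data-processor.sh contains many slashes, which collide with the usual "/" delimiter of sed. A small sketch of the same substitution using "|" as delimiter, so nothing needs escaping; the function name rewrite_script_path and the sample input line are made up for illustration.

```shell
#!/bin/sh
# Rewrite the upstream script directory to the local one.
# Using "|" as the sed delimiter avoids escaping every "/" in the paths.
rewrite_script_path() {
    sed 's|/usr/libexec/zabbix-extensions/scripts/|/var/lib/zabbix/externalscripts/|g'
}

# Sample input line (hypothetical, for illustration only):
echo 'run=/usr/libexec/zabbix-extensions/scripts/example.sh' | rewrite_script_path
# prints: run=/var/lib/zabbix/externalscripts/example.sh
```

The function reads standard input and writes to standard output, so it can be tried on a copy of the script before using sed -i on the real file.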

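NOTE: If we use proxies, the original script parses the Server= entry of the agent configuration incorrectly: it takes the first comma-separated value, whereas here we need the second. We edit the script:

vi /var/lib/zabbix/externalscripts/megacli-raid-data-processor.sh

Original line:
zbx_server=$(grep ^Server= /etc/zabbix/zabbix_agentd.conf |cut -d= -f2|cut -d, -f1)

Replacement:
zbx_server=$(grep ^Server= /etc/zabbix/zabbix_agentd.conf |cut -d= -f2|cut -d, -f2)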
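
To see what each cut field selects, the same pipeline can be run against a sample Server= line. This is only an illustrative sketch: the function name server_field is hypothetical and the addresses are documentation placeholders, not values from a real setup.

```shell
#!/bin/sh
# server_field FILE N: print the Nth comma-separated entry of the Server= line.
server_field() {
    grep '^Server=' "$1" | cut -d= -f2 | cut -d, -f"$2"
}

conf=$(mktemp)
echo 'Server=192.0.2.10,192.0.2.20' > "$conf"   # hypothetical addresses
server_field "$conf" 1    # prints 192.0.2.10
server_field "$conf" 2    # prints 192.0.2.20
rm -f "$conf"
```

Note that cut prints a line unchanged when it contains no delimiter, so with a single entry in Server= both -f1 and -f2 return that entry.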

The script is not very robust. For example, if the temperature is reported as N/A, the value is left empty and the item fails, so we change:

value=$(sed -n -e "/pd begin $adp $enc $pd/,/ld end $adp $enc $pd/p" $data_out |grep -m1 -w "^Drive Temperature" |awk '{print $3}' |grep -oE '[0-9]+')

to:

value=$(sed -n -e "/pd begin $adp $enc $pd/,/ld end $adp $enc $pd/p" $data_out |grep -m1 -w "^Drive Temperature" |awk '{print $3}' |grep -oE '[0-9]+')
if [ -z "$value" ]; then
 value='0'
fi
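
The same fallback can be written more compactly with shell parameter expansion. A minimal sketch, not taken from the repository: the function name first_number_or_zero and the sample inputs are hypothetical.

```shell
#!/bin/sh
# first_number_or_zero TEXT: print the first run of digits in TEXT, or 0 if none.
first_number_or_zero() {
    value=$(printf '%s\n' "$1" | grep -oE '[0-9]+' | head -n1)
    # ${value:-0} expands to 0 when value is empty or unset.
    echo "${value:-0}"
}

first_number_or_zero '38C'   # prints 38
first_number_or_zero 'N/A'   # prints 0
```

Reporting 0 for an unknown temperature keeps the line well formed, at the cost of making "unknown" indistinguishable from a real reading of 0 in the graphs.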

We restart the service:

/etc/init.d/zabbix-agentd restart

We download the template and import it into Zabbix:

wget https://raw.githubusercontent.com/lesovsky/zabbix-extensions/master/files/hwraid-megacli/hwraid-megacli-template.xml

We apply the template to the server in question from the Zabbix web interface.

TROUBLESHOOTING

The items come from the cron job, so we run the script by hand and check the collected data:

/var/lib/zabbix/externalscripts/megacli-raid-data-processor.sh

cat /run/zabbix-sender-megacli-raid-data.in

"node00" megacli.adp.name[0] Supermicro SMC2208
"node00" megacli.ld.degraded[0] 0
"node00" megacli.ld.offline[0] 0
"node00" megacli.pd.total[0] 2
"node00" megacli.pd.critical[0] 0
"node00" megacli.pd.failed[0] 0
"node00" megacli.mem.err[0] 0
"node00" megacli.mem.unerr[0] 0
"node00" megacli.ld.state[0:0] Optimal
"node00" megacli.pd.media_error[0:252:0] 0
"node00" megacli.pd.other_error[0:252:0] 0
"node00" megacli.pd.pred_failure[0:252:0] 0
"node00" megacli.pd.state[0:252:0] Online
"node00" megacli.pd.temperature[0:252:0]
"node00" megacli.pd.media_error[0:252:1] 0
"node00" megacli.pd.other_error[0:252:1] 0
"node00" megacli.pd.pred_failure[0:252:1] 0
"node00" megacli.pd.state[0:252:1] Online
"node00" megacli.pd.temperature[0:252:1] 0

NOTE: Every line to be sent must have the form:

<hostname> <key> <value>
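
Lines that lack a value can be spotted automatically by counting fields. A minimal sketch, with the hypothetical helper name incomplete_lines; it only assumes that fields are separated by whitespace.

```shell
#!/bin/sh
# incomplete_lines FILE: print the line number and content of every
# non-empty line with fewer than three whitespace-separated fields
# (hostname, key, value).
incomplete_lines() {
    awk 'NF > 0 && NF < 3 { print FNR ": " $0 }' "$1"
}

# Usage:
# incomplete_lines /run/zabbix-sender-megacli-raid-data.in
```

Values that contain spaces, such as an adapter name, yield more than three fields and are therefore not flagged.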

We can simulate alarms to check that Zabbix notifies us in case of problems:

vi /run/zabbix-sender-megacli-raid-data.in

"node00" megacli.pd.critical[0] 1

We can also trigger the alarm directly, bypassing the intermediate scripts:

zabbix_sender -z ZBX_SERVER_IP -s node00 -k 'megacli.pd.critical[0]' -o 1
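
zabbix_sender can also read a whole file of <hostname> <key> <value> lines through its -i option, which is one way to send a hand-edited data file. The sketch below is an assumption about how one might wrap that call, not the method used by the repository scripts: send_file and DRY_RUN are hypothetical names, and ZBX_SERVER_IP is the same placeholder as above.

```shell
#!/bin/sh
# send_file SERVER FILE: send every line of FILE with zabbix_sender.
# With DRY_RUN=1 the command is only printed, not executed.
send_file() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "zabbix_sender -z $1 -i $2"
    else
        zabbix_sender -z "$1" -i "$2"
    fi
}

# Usage:
# DRY_RUN=1 send_file ZBX_SERVER_IP /run/zabbix-sender-megacli-raid-data.in
```

Printing the command first makes it easy to confirm the server and file before any value reaches the Zabbix server.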
