
Zabbix docker + escalation

 ·  🎃 kr0m

One of the most interesting features Zabbix offers is alert escalation, which lets us execute notification actions sequentially. In my case, there are two schedules: working hours and non-working hours.

During working hours, Zabbix first sends a Telegram message to the sysadmins and then calls them by phone. Outside working hours, it sends a Telegram message and calls the on-call sysadmin; if the problem is not resolved or acknowledged (ACK) after the third call, it calls all sysadmins.

Let’s start by compiling the Zabbix agent on our test client:

vi /etc/portage/package.use/zabbix

net-analyzer/zabbix agent curl -frontend ipv6 -java ldap libxml2 mysql odbc openipmi -oracle postgres -proxy -server snmp sqlite ssh ssl -static xmpp
emerge -av net-analyzer/zabbix

Next, we perform the basic configuration:

vi /etc/zabbix/zabbix_agentd.conf

PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=128
Server=ZBXSERVER
ServerActive=ZBXSERVER
Hostname=Zabbix server

Start the service and add it to the default runlevel:

/etc/init.d/zabbix-agentd start
rc-update add zabbix-agentd default

We can check if it starts successfully by reviewing the logs:

tail -f /var/log/zabbix/zabbix_agentd.log


Now we move on to the server. We will use Docker, as it is a quick way to get it up and running without complications.

We create the Docker container, changing the port to 8080 since in my case port 80 is being used by another container:

docker run --name zabbix-appliance -t -p 10051:10051 -p 8080:80 -d zabbix/zabbix-appliance:latest

Access the web interface and change the default password for the Admin user:
http://ZBXSERVER:8080

User: Admin
Password: zabbix

Navigate through the interface menus to:

Administration -> Users
Admin -> Password -> Change password

To send notifications via Telegram, we will use the ableev scripts:
https://github.com/ableev/Zabbix-in-Telegram/

Create the bot that will send the Telegram messages:
https://core.telegram.org/bots#creating-a-new-bot

Search for BotFather on Telegram and create the bot:

/newbot

BotFather will reply:

Done! Congratulations on your new bot. You will find it at t.me/zbx4bot.

Use this token to access the HTTP API:

XXXXXXXXX:YYYYYYYYYYYYYYYYY
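
Before configuring Zabbix, it can be worth checking that the token is valid. The following is a minimal sketch, assuming Python 3 on our workstation; it calls the Bot API getMe method, and the function names telegram_api_url and check_token are made up for this example:

```python
import json
import urllib.request

API_ROOT = "https://api.telegram.org"


def telegram_api_url(token, method):
    """Build the Bot API URL for a method such as getMe."""
    return f"{API_ROOT}/bot{token}/{method}"


def check_token(token, timeout=10):
    """Call getMe and return the bot username.

    Raises urllib.error.HTTPError if Telegram rejects the token.
    """
    url = telegram_api_url(token, "getMe")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["result"]["username"]
```

Calling check_token with the token shown by BotFather should return the username of the bot we have just created.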

Start a conversation with the bot:

/start

If we want to use a group, we create it, add the bot, and send the following inside the group:

/start@BOTNAME

Add a media type.

Administration -> Media types

Create Media type
Name: Telegram
Type: Script
Script name: zbxtg.py
Script parameters
{ALERT.SENDTO}
{ALERT.SUBJECT}
{ALERT.MESSAGE}

And another one for sending notifications to groups:

Create Media type
Name: TelegramGroup
Type: Script
Script name: zbxtg.py
Script parameters
{ALERT.SENDTO}
{ALERT.SUBJECT}
{ALERT.MESSAGE}
--group

Add the media type for phone calls.

Create Media type
Name: Phonecall
Type: Script
Script name: phonecall.sh
Script parameters
{ALERT.SENDTO}

Find out the path where Zabbix will look for alarm notification scripts:

docker exec -it zabbix-appliance /bin/bash

grep AlertScriptsPath /etc/zabbix/zabbix_server.conf | grep -v '#'
AlertScriptsPath=/usr/lib/zabbix/alertscripts

Create the call script:

vi /usr/lib/zabbix/alertscripts/phonecall.sh

#!/bin/bash
# $1: phone number to call
curl -X POST http://PBXSERVER:PORT/alertSysAdminsFromZabbix --data "$1"

We change the owner and adjust the permissions:

chown -R zabbix:zabbix /usr/lib/zabbix/alertscripts/*
chmod 700 /usr/lib/zabbix/alertscripts/*

We create a user for each sysadmin with the media types we have defined:

Administration -> Users

Create user
Alias: kr0m
Name: kr0m
Groups: Zabbix administrators

In Media, we define the phone number of that sysadmin:

Send to: PHONE

We also define the Telegram username of that sysadmin:

Send to: @USERNAME

The media for this sysadmin will be as follows:

The users must be of type Zabbix Super Admin:

NOTE: We must define a media type for the Admin user even if we are not going to use it for anything; otherwise, we will get the following error when trying to notify alerts:

No media defined for user.

For example, we define an email media with a made-up address and leave it disabled. The error will still appear, but the notifications will work.

By default, there is already a group of administrators (Zabbix administrators). We create another one for the on-call sysadmin.

Administration -> User groups

Create user group
Group name: sysadminsGuardia


We install the necessary scripts for Telegram notifications:

docker exec -it zabbix-appliance /bin/bash

We install git and clone the alert scripts repository:

apk add git
git clone https://github.com/ableev/Zabbix-in-Telegram.git
cd Zabbix-in-Telegram

We install pip and the script requirements:

apk add py2-pip
pip install -r requirements.txt

We copy the scripts to the AlertScriptsPath:

cp zbxtg.py /usr/lib/zabbix/alertscripts/
cp zbxtg_group.py /usr/lib/zabbix/alertscripts/
cp zbxtg_settings.example.py /usr/lib/zabbix/alertscripts/zbxtg_settings.py

We change the owner and adjust the permissions:

chown -R zabbix:zabbix /usr/lib/zabbix/alertscripts/*
chmod 700 /usr/lib/zabbix/alertscripts/*

We configure the script with the bot token and a Zabbix user (created below), so that the script can fetch the alert graphs via the API:

vi /usr/lib/zabbix/alertscripts/zbxtg_settings.py

tg_key = "XXXXXXX:YYYYYYYYYYYYYYY" # telegram bot api key
zbx_server = "http://127.0.0.1" # zabbix server full url
zbx_api_user = "zbxtg"
zbx_api_pass = "zbxtgPASSWORD"

NOTE: If we want to use graphs, we would need to use a Zabbix RO user and configure it in zbxtg_settings.py

zbx_api_user
zbx_api_pass

First, we create the RO group:

Administration -> User groups

Create user group
Group name: zbxtgRO
Permissions: Linux servers Read

Then we create the user:

Administration -> Users

Create user
Alias: zbxtg
Groups: zbxtgRO

After saving, if we go back to the Permissions tab, we will see:

The notifications will be executed in chronological order:

  • 0m: Zabbix alarm
  • 6m: Telegram message to the sysadmins or to the on-call sysadmin, depending on the schedule
  • 12m, then every 6m: call the sysadmins or the on-call sysadmin, depending on the schedule
  • 30m: call all sysadmins (outside working hours only)
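
These offsets follow from the fixed step duration of 6 minutes that we configure further down: step 1 starts when the problem is raised, and each later step starts one duration after the previous one. A minimal sketch of that arithmetic (step_offset is a name made up for this example):

```python
from datetime import timedelta

# "Default operation step duration: 6m" in the action configuration
STEP_DURATION = timedelta(minutes=6)


def step_offset(step, duration=STEP_DURATION):
    """Time after the problem event at which an escalation step starts."""
    if step < 1:
        raise ValueError("escalation steps start at 1")
    return (step - 1) * duration


if __name__ == "__main__":
    # Steps used in this article: 2 (Telegram), 3 (first call), 6 (call everyone)
    for step in (1, 2, 3, 6):
        print(f"step {step}: +{step_offset(step)}")
```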

To make the notifications work, we will have to edit the action: Report problems to Zabbix administrators.

Configuration -> Actions
Event source: Triggers
Report problems to Zabbix administrators

We define the working hours:

Conditions:
Type of calculation: And/Or
Time period in 1-4,08:00-14:00
Time period in 1-4,15:00-18:00
Time period in 5,08:00-15:00
Enabled: True
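
To double-check which schedule a given moment belongs to, the period syntax can be reproduced outside Zabbix. The following is a rough sketch, not Zabbix code: it assumes day 1 is Monday and day 7 is Sunday, that the end of a period is not included, and that several "Time period in" conditions are combined with OR. The function names are made up for this example.

```python
from datetime import datetime

WORKING_HOURS = ["1-4,08:00-14:00", "1-4,15:00-18:00", "5,08:00-15:00"]


def to_minutes(hhmm):
    """Convert 'hh:mm' to minutes since midnight ('24:00' is a valid end)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def in_time_period(period, moment):
    """Return True if moment falls inside a period like '1-4,08:00-14:00'."""
    days, hours = period.split(",")
    first_day, _, last_day = days.partition("-")
    last_day = last_day or first_day  # '5' means a single day
    start, end = hours.split("-")
    day_matches = int(first_day) <= moment.isoweekday() <= int(last_day)
    now = moment.hour * 60 + moment.minute
    return day_matches and to_minutes(start) <= now < to_minutes(end)


def is_working_hours(moment):
    """True if any of the working-hours periods matches."""
    return any(in_time_period(period, moment) for period in WORKING_HOURS)


if __name__ == "__main__":
    print(is_working_hours(datetime.now()))
```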

We edit the operations associated with the action. Since the call API has a limit of one call every 5m, we set the step duration to 6m so as not to exceed it.

Default operation step duration: 6m
Default subject: {TRIGGER.STATUS}: {TRIGGER.NAME} on {HOSTNAME}

If we want to use graphs:

Default message:
Last value: {ITEM.LASTVALUE1} ({TIME})
zbxtg;graphs
zbxtg;graphs_period=10800
zbxtg;itemid:{ITEM.ID1}
zbxtg;title:{HOST.HOST} - {TRIGGER.NAME}
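
For manual tests, it can help to build the same message body outside Zabbix, with the macros already replaced by real values. A small sketch that only copies the option lines from the template above (the function and its parameters are hypothetical):

```python
def build_zbxtg_message(last_value, when, item_id, title, graphs_period=10800):
    """Return the message body with the zbxtg graph options, one per line."""
    lines = [
        f"Last value: {last_value} ({when})",
        "zbxtg;graphs",
        f"zbxtg;graphs_period={graphs_period}",
        f"zbxtg;itemid:{item_id}",
        f"zbxtg;title:{title}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values only; the item id must exist on our Zabbix server
    print(build_zbxtg_message("0", "12:34:56", 30471, "testhost - Zabbix agent is not available"))
```

The resulting text can be passed as the third argument of zbxtg.py.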

In the Operations subsection, we edit the default one:

Steps: 2 - 2
Step duration: 0
Operation type: Send message
Send to User groups: Zabbix administrators
Send only to: Telegram

We define a second operation. A final step of 0 means that it keeps repeating until the problem is resolved:

Steps: 3 - 0
Step duration: 0
Operation type: Send message
Send to User groups: Zabbix administrators
Send only to: Phonecall
Operation condition: Event acknowledged equals Not Ack

Finally, it will look like this:

We create a new action for the on-call shifts:

Configuration -> Actions
Event source: Triggers

We define the non-working hours:

Create action
Name: Guardias
Type of calculation: And/Or
Time period in 1-4,00:00-08:00
Time period in 1-4,18:00-24:00
Time period in 5,15:00-24:00
Time period in 6-7,00:00-24:00
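
Since the two actions are meant to split the week between them, it is worth checking that no time slot is left without an action. The following sketch expands each period into minutes of the week and reports what is left over; it is only an illustration, and the helper names are made up for this example:

```python
MINUTES_PER_DAY = 24 * 60

WORKING = ["1-4,08:00-14:00", "1-4,15:00-18:00", "5,08:00-15:00"]
ON_CALL = ["1-4,00:00-08:00", "1-4,18:00-24:00", "5,15:00-24:00", "6-7,00:00-24:00"]


def period_minutes(period):
    """Expand 'd-d,hh:mm-hh:mm' into a set of (weekday, minute_of_day) pairs."""
    days, hours = period.split(",")
    first_day, _, last_day = days.partition("-")
    last_day = last_day or first_day
    start, end = (
        int(h) * 60 + int(m)
        for h, m in (bound.split(":") for bound in hours.split("-"))
    )
    return {
        (day, minute)
        for day in range(int(first_day), int(last_day) + 1)
        for minute in range(start, end)
    }


def uncovered_minutes(*period_lists):
    """Return the (weekday, minute_of_day) pairs not matched by any period."""
    covered = set()
    for periods in period_lists:
        for period in periods:
            covered |= period_minutes(period)
    whole_week = {(d, m) for d in range(1, 8) for m in range(MINUTES_PER_DAY)}
    return whole_week - covered


if __name__ == "__main__":
    gaps = uncovered_minutes(WORKING, ON_CALL)
    print(f"{len(gaps)} minutes per week are not covered by any action")
```

With the periods used in this article, the check reports Monday to Thursday from 14:00 to 15:00 and Friday from 00:00 to 08:00 as covered by neither action. If that is not intentional, the periods should be extended.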

In operations, we change the step duration:

Default operation step duration: 6m
Default subject: {TRIGGER.STATUS}: {TRIGGER.NAME} on {HOSTNAME}

We add an operation:

Steps: 2 - 2
Step duration: 0
Operation type: Send message
Send to User groups: sysadminsGuardia
Send only to: Telegram

After the Telegram message, we call the on-call sysadmin up to three times (steps 3 to 5):

Steps: 3 - 5
Step duration: 0
Operation type: Send message
Send to User groups: sysadminsGuardia
Send only to: Phonecall
Operation condition: Event acknowledged equals Not Ack

Finally, we will call all sysadmins:

Steps: 6 - 0
Step duration: 0
Operation type: Send message
Send to User groups: Zabbix administrators
Send only to: Phonecall
Operation condition: Event acknowledged equals Not Ack
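
To see how the three operations combine, we can simulate which ones would fire at each escalation step. This is a simplified model, not Zabbix code: it assumes a last step of 0 means no upper limit, it treats the Not Ack condition as a simple flag, and it ignores that the escalation stops when the problem is resolved. The class and function names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    first_step: int
    last_step: int  # 0 means no upper limit
    media: str
    group: str
    only_if_unacked: bool = False

    def runs_at(self, step, acknowledged):
        """True if this operation would send a message at the given step."""
        in_range = step >= self.first_step and (
            self.last_step == 0 or step <= self.last_step
        )
        return in_range and not (self.only_if_unacked and acknowledged)


# The three operations of the Guardias action defined above
ON_CALL_ACTION = [
    Operation(2, 2, "Telegram", "sysadminsGuardia"),
    Operation(3, 5, "Phonecall", "sysadminsGuardia", only_if_unacked=True),
    Operation(6, 0, "Phonecall", "Zabbix administrators", only_if_unacked=True),
]


def notifications(operations, step, acknowledged=False):
    """List the (media, group) pairs notified at a step."""
    return [
        (op.media, op.group)
        for op in operations
        if op.runs_at(step, acknowledged)
    ]


if __name__ == "__main__":
    for step in range(1, 8):
        offset = (step - 1) * 6  # 6m step duration
        print(f"step {step} (+{offset}m):", notifications(ON_CALL_ACTION, step))
```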

Finally, it will look like this:

We assign a recovery operation in both actions.

Recovery operations:
Default subject: {TRIGGER.STATUS}: {TRIGGER.NAME} on {HOSTNAME}


We can manually check that Telegram messages are sent:

docker exec -it zabbix-appliance /bin/bash
cd /usr/lib/zabbix/alertscripts/
./zbxtg.py "@USERID" "first part of a message" "second part of a message" --debug

We also check if we can make calls through our API:

./phonecall.sh PHONE_NUMBER

Or from the Zabbix web interface:

Administration -> Media types -> Test



We register our test server.

Configuration -> Hosts
Create Host

We assign the OS Linux template:

Template OS Linux by Zabbix agent

We configure the loss of communication with the Zabbix agent to be a disaster alarm.

Configuration -> Hosts
Triggers -> Zabbix agent is not available
Severity: Disaster

We verify that it works by stopping the client:

/etc/init.d/zabbix-agentd stop

The notifications via Telegram will appear as follows:


DEBUG:

Testing Telegram scripts:
https://github.com/ableev/Zabbix-in-Telegram/wiki/How-to-test-script-in-command-line

We can see a log of the actions taken by Zabbix at:

Administration -> Action log

If we want to view the container logs:

docker logs zabbix-appliance
docker logs zabbix-appliance --follow

If we want to debug further, we need to modify the Zabbix server configuration. To do this, we temporarily modify the Docker entrypoint so that it does not overwrite our changes:

docker exec -it zabbix-appliance /bin/bash

We comment out all the places where update_zbx_config is called.

vi /usr/bin/docker-entrypoint.sh

Modify the Zabbix server configuration:

vi /etc/zabbix/zabbix_server.conf

DebugLevel=5

If we want to debug the sending of Telegram messages, we add --debug as an additional script parameter in the Telegram media type.

Restart the container and verify that we can see the logs:

docker stop zabbix-appliance
docker start zabbix-appliance
docker logs zabbix-appliance --follow

The obtained graphs will be stored in /var/tmp/zbxtg/ as long as the --debug parameter is enabled:

docker exec -it zabbix-appliance /bin/bash
ls -al /var/tmp/zbxtg/

We can manually send Telegram messages with graphs using:

su zabbix -l -s /bin/bash
cd /usr/lib/zabbix/alertscripts/
./zbxtg.py @TELEGRAMUSERID test "$(echo -e 'zbxtg;graphs: \nzbxtg;graphs_period=3600\nzbxtg;itemid:30471\nzbxtg;title:ololo')" --debug

NOTE: We can obtain the itemid by checking the previous logs using docker logs zabbix-appliance --follow.

Once we finish debugging, we will need to uncomment the calls to update_zbx_config in the entrypoint script.
