A maintenance window is a period during which a server is taken down in a controlled manner to carry out an administrative task, such as replacing hardware, updating software that requires a restart, or even working on the network equipment the server is connected to.
During these periods, we must silence the alerts for the affected server so that they do not generate noise while the work is being carried out. In this article, we will write a small script to create such maintenance windows in AlertManager quickly and easily.
The script in question is as follows:
```bash
#!/usr/bin/env bash
# Alertmanager API v2 spec: https://github.com/prometheus/alertmanager/blob/master/api/v2/openapi.yaml
# Usage: <script> <instance> <minutes>

if [ "$#" -ne 2 ]; then
    echo "ERROR: Instance and maintenance window time in minutes must be indicated!" >&2
    exit 1
fi

URL='http://pmm.alfaexploit.com:9093/api/v2/silences'
USERNAME='admin'
PASSWORD='XXXXX'
SERVER=$1

# Accept only a positive whole number of minutes (no leading zeros,
# which bash arithmetic would otherwise read as octal)
if ! [[ "$2" =~ ^[1-9][0-9]*$ ]]; then
    echo "ERROR: Maintenance window time must be a positive whole number of minutes!" >&2
    exit 1
fi

# Convert to seconds
MANTEINANCE_WINDOW=$(($2 * 60))
#echo MANTEINANCE_WINDOW: $MANTEINANCE_WINDOW's'

# FreeBSD's date formats an epoch with -r, GNU date uses -d @<epoch>
if uname -s | grep -q FreeBSD; then
    OS=FREEBSD
else
    OS=LINUX
fi

currentEpochDate=$(date +%s)
if [ "$OS" = "FREEBSD" ]; then
    startsAt=$(date -u -r "$currentEpochDate" +%Y-%m-%dT%H:%M:%S)
else
    startsAt=$(date -u -d "@$currentEpochDate" +%Y-%m-%dT%H:%M:%S)
fi
echo "startsAt-UTC: $startsAt"

finalEpochDate=$((currentEpochDate + MANTEINANCE_WINDOW))
if [ "$OS" = "FREEBSD" ]; then
    endsAt=$(date -u -r "$finalEpochDate" +%Y-%m-%dT%H:%M:%S)
else
    endsAt=$(date -u -d "@$finalEpochDate" +%Y-%m-%dT%H:%M:%S)
fi
echo "endsAt-UTC: $endsAt"

# Create a silence for every alert whose "instance" label equals $SERVER
curl -sS -H "Content-Type: application/json" -u "$USERNAME:$PASSWORD" "$URL" -X POST -d '{"comment": "Kr0m-manteinanceWindow","createdBy": "Kr0m-manteinanceWindow","startsAt": "'"$startsAt"'", "endsAt": "'"$endsAt"'","matchers": [{"isRegex": false,"name": "instance","value": "'"$SERVER"'"}]}'
```
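The script repeats the same FreeBSD/GNU `date` branch twice to compute `startsAt` and `endsAt`. A minimal sketch of factoring that logic into functions, assuming only the two `date` flavours the script already handles (BSD `date -r` and GNU `date -d @`); the names `epoch_to_utc` and `silence_window` are hypothetical and not part of the original script. Passing a fixed start epoch makes the result reproducible: 1605954091 is 2020-11-21T10:21:31 UTC, the `startsAt` of the example run in this article.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print an epoch timestamp as a UTC ISO 8601 string,
# using BSD date (-r) on FreeBSD and GNU date (-d @) everywhere else.
epoch_to_utc() {
    local epoch=$1
    if [ "$(uname -s)" = "FreeBSD" ]; then
        date -u -r "$epoch" +%Y-%m-%dT%H:%M:%S
    else
        date -u -d "@$epoch" +%Y-%m-%dT%H:%M:%S
    fi
}

# Hypothetical helper: print "<startsAt> <endsAt>" for a window of N minutes
# beginning at the given epoch (defaults to the current time).
silence_window() {
    local minutes=$1
    local start=${2:-$(date +%s)}
    local end=$((start + minutes * 60))
    printf '%s %s\n' "$(epoch_to_utc "$start")" "$(epoch_to_utc "$end")"
}

# A 120-minute window starting at 2020-11-21T10:21:31 UTC
silence_window 120 1605954091
# prints: 2020-11-21T10:21:31 2020-11-21T12:21:31
```

Like the original, this does not cover macOS, whose `date` is BSD-style but reports `Darwin` from `uname -s`.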
We assign the necessary permissions:
We try to generate a 2-hour maintenance silence:
```
startsAt-UTC: 2020-11-21T10:21:31
endsAt-UTC: 2020-11-21T12:21:31
{"silenceID":"1288e299-096c-419a-8285-5f5bbec2ceae"}
```
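The API answers with the ID of the new silence, which is worth keeping if the silence is to be managed from the command line later. A minimal sketch that extracts it with `sed` so that no extra tools are required; `extract_silence_id` is a hypothetical name, and if `jq` is installed, `jq -r .silenceID` does the same job more robustly.

```bash
#!/usr/bin/env bash

# Hypothetical helper: read the JSON returned by POST /api/v2/silences on
# stdin and print only the value of its "silenceID" field.
extract_silence_id() {
    sed -n 's/.*"silenceID"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

response='{"silenceID":"1288e299-096c-419a-8285-5f5bbec2ceae"}'
printf '%s\n' "$response" | extract_silence_id
# prints: 1288e299-096c-419a-8285-5f5bbec2ceae
```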
If we access the AlertManager interface, we can see the silence:
http://pmm.alfaexploit.com:9093/#/alerts
We can see the 2-hour maintenance window in the silence details:
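The same information is also available from the API the script already uses. A minimal sketch, assuming the `GET /api/v2/silences` endpoint and its `filter` query parameter from the v2 OpenAPI spec linked in the script; `list_silences` and the example instance value are hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: list the silences Alertmanager holds, optionally
# filtered by a label matcher such as instance="server01".
# Arguments: base API URL, user, password, optional matcher.
list_silences() {
    local url=$1 user=$2 pass=$3 matcher=$4
    if [ -n "$matcher" ]; then
        # -G turns --data-urlencode into a URL-encoded query string
        curl -sS -G -u "$user:$pass" "$url/silences" --data-urlencode "filter=$matcher"
    else
        curl -sS -u "$user:$pass" "$url/silences"
    fi
}

# Example (hypothetical instance value):
# list_silences 'http://pmm.alfaexploit.com:9093/api/v2' admin 'XXXXX' 'instance="server01"'
```

The response is a JSON array of silences, each with its `status`, `startsAt`, `endsAt` and `matchers`.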
When we finish the maintenance tasks, we simply expire the silence by pressing the Expire button.
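If the maintenance is scripted end to end, the silence can also be expired without the web interface. A minimal sketch, assuming the `DELETE /api/v2/silence/{silenceID}` endpoint described in the v2 OpenAPI spec (note the singular `silence` in the path); `expire_silence` is a hypothetical name, and the ID is the `silenceID` returned when the silence was created.

```bash
#!/usr/bin/env bash

# Hypothetical helper: expire a silence through the Alertmanager API v2.
# Arguments: base API URL, user, password, silence ID.
expire_silence() {
    local url=$1 user=$2 pass=$3 id=$4
    if [ -z "$id" ]; then
        echo "ERROR: Silence ID must be indicated!" >&2
        return 1
    fi
    curl -sS -u "$user:$pass" -X DELETE "$url/silence/$id"
}

# Example, using the ID returned by the run above:
# expire_silence 'http://pmm.alfaexploit.com:9093/api/v2' admin 'XXXXX' 1288e299-096c-419a-8285-5f5bbec2ceae
```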