Percona Monitoring and Management (PMM) is a set of tools that allows us to quickly and easily monitor our servers and, especially, our databases. If we enable all the options, it will show us a lot of useful information about slow queries, locks, and other problems related to MySQL and MongoDB.
Metrics are collected with Prometheus and visualized with Grafana.
The entire installation will be done on Docker as it simplifies both the installation and future server updates.
We create a user-defined network so that we can use the CT names directly. When we restart CTs, their IP addresses may change, but we will not have to touch the configuration thanks to the user-defined network. In a basic installation this aspect is not important, but it will come in handy when we configure Alertmanager:
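A minimal sketch, using the pmm-net name referenced later in the run command:

docker network create pmm-net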
Updating to PMM2 does not allow migrating old data; everything must be reinstalled from scratch.
PMM needs approximately 1GB of storage for each database to be monitored with a retention of one week. As for RAM, 2GB per database server are needed, but this requirement is not linear. For example, for 20 nodes, 16GB are needed, not 40.
Currently, PMM versioning works as follows:
- pmm-server:latest -> Indicates the latest release of the PMM 1.X branch
- pmm-server:2 -> Indicates the latest release of the PMM 2 branch
We download the PMM2 Docker image:
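Assuming the percona/pmm-server:2 image used in the run command below:

docker pull percona/pmm-server:2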
We create the CT where PMM data will be stored. This CT will not be running; it will only serve as a storage CT. In future updates we will delete the PMM CT, but the data will persist between updates since it resides in this other CT.
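A sketch of the data CT creation; the pmm-data name matches the --volumes-from option used below, and /srv is assumed to be where PMM 2 keeps its data inside the image:

docker create -v /srv --name pmm-data percona/pmm-server:2 /bin/true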
We create the PMM CT and map the prometheus.base.yml file, which will allow us to perform a custom configuration of Prometheus. For now, we leave the file empty; we just create it and then start the CT:
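A minimal way to create the empty file, using the path mapped in the run command below:

mkdir -p prometheusConf
touch prometheusConf/prometheus.base.yml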
docker run -d -p 80:80 -p 443:443 -v $PWD/prometheusConf/prometheus.base.yml:/srv/prometheus/prometheus.base.yml --volumes-from pmm-data --dns 8.8.8.8 --dns-search=alfaexploit.com --network=pmm-net --name pmm-server --restart always percona/pmm-server:2
We access the PMM web interface:
http://PMM_IP
(admin/admin)
We change the password.
We perform the basic PMM configuration:
PMM -> PMM Settings
We adjust the parameters according to our needs:
We configure the Alertmanager URL. In future articles, I will explain how to set it up. For now, we just configure it and paste the alert rules:
Alertmanager URL:
http://pmm-alertmanager:9093
Alertmanager rules:
groups:
  - name: genericRules
    rules:
      - alert: BrokenNodeExporter
        expr: up{agent_type="node_exporter"} == 0
        for: 5m
        labels:
          severity: critical
We restart PMM:
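Assuming the pmm-server container name from the run command above:

docker restart pmm-server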
NOTE: Restarting Prometheus is a heavy process; it takes quite a while (around 4 minutes) for the web interface to be operational again.
We check from the Prometheus interface that the rules are loaded:
Status -> Rules
In Alerts, we can see the list of alerts:
We can also check within the CT that the file contains the defined alarms:
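One way to do it from the host, assuming the pmm-server container name; the rules path is the one declared in the Prometheus configuration shown below:

docker exec pmm-server sh -c 'cat /srv/prometheus/rules/*.rules.yml'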
rule_files:
  - /srv/prometheus/rules/*.rules.yml

groups:
  - name: genericRules
    rules:
      - alert: BrokenNodeExporter
        expr: up{agent_type="node_exporter"} == 0
        for: 5m
        labels:
          severity: critical
The client is installed as follows:
Ubuntu:
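The percona-release package is downloaded first; the URL below is Percona's generic repository package (adjust it if it has changed):

wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb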
dpkg -i percona-release_latest.generic_all.deb
apt-get update
apt-get install pmm2-client
pmm-admin config --server-insecure-tls --server-url=https://admin:PASSWORD@PMM_SERVER:443
Gentoo:
tar xvzf pmm2-client-2.6.1.tar.gz
cd pmm2-client-2.6.1
./install_tarball
Add the installed binaries to the path:
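For the current session (this is the same path made permanent below):

export PATH=$PATH:/usr/local/percona/pmm2/bin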
Make the change permanent:
echo "PATH=$PATH:/usr/local/percona/pmm2/bin" » .bashrc
echo "export BASH_ENV=~/.bashrc" » .bash_profile
echo "if [ -f ~/.bashrc ]; then source ~/.bashrc; fi" » .bash_profile
To make the exporters work, we must first configure/start the agent:
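A sketch of the registration step, assuming the tarball's default config path; PMM_SERVER and PASSWORD are placeholders for our server address and admin password. Without positional arguments, the node is registered under the machine's hostname:

pmm-agent setup --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml --server-address=PMM_SERVER:443 --server-insecure-tls --server-username=admin --server-password=PASSWORD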
If we got something wrong in the setup and run it a second time, it will tell us:
Failed to register pmm-agent on PMM Server: Node with name "kr0mtest" already exists..
We will have to add the --force option to the command.
Manually start the pmm-agent to check that it does not give any problems:
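Assuming the same config path as in the setup step:

pmm-agent --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml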
In another console, we can see the exporters:
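This listing is typically obtained with the client tool:

pmm-admin list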
Service type Service name Address and port Service ID
Agent type Status Agent ID Service ID
pmm_agent Connected /agent_id/aba88792-1544-420a-a091-cbffe93f9232
node_exporter Running /agent_id/d35f58b3-a4e8-4367-8659-d726adf3641a
Daemonize the agent:
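A minimal sketch of what /etc/local.d/pmm.start could contain (Gentoo's local.d hook); the binary and config paths are the tarball defaults assumed earlier, and the log path is just an example:

#!/bin/sh
# Launch pmm-agent in the background at boot
/usr/local/percona/pmm2/bin/pmm-agent --config-file=/usr/local/percona/pmm2/config/pmm-agent.yaml > /var/log/pmm-agent.log 2>&1 &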
chmod 700 /etc/local.d/pmm.start
In the Grafana interface, we can now see data:
In PMM2, several things have changed compared to the previous version. One of them is that it performs several data collections (scrapes) at different frequencies: data that changes often is collected more frequently, while more static data is collected less often. This presents a problem when an alarm fires: it appears tripled:
job="node_exporter_agent_id_d35f58b3-a4e8-4367-8659-d726adf3641a_hr-5s"
job="node_exporter_agent_id_d35f58b3-a4e8-4367-8659-d726adf3641a_mr-10s"
job="node_exporter_agent_id_d35f58b3-a4e8-4367-8659-d726adf3641a_lr-1m0s"
We need to unify alarms when agent_id, agent_type, alertname, machine_id, node_id, node_name, node_type, and severity match.
In the screenshot, we can see the alarm is tripled. If we click on the expression link, it will take us to the query executor.
Here we can adapt the queries until they match the alerts we need. In this case, if we unify by node_name, it only shows one result:
sum(up{agent_type="node_exporter"} == 0) by (node_name)
We modify the alert configuration:
Alertmanager rules:
groups:
  - name: genericRules
    rules:
      - alert: BrokenNodeExporter
        expr: sum(up{agent_type="node_exporter"} == 0) by (node_name)
        for: 5m
        labels:
          severity: critical
We restart PMM:
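As before:

docker restart pmm-server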
Now the alerts are unified:
TROUBLESHOOTING
View CT logs:
tail /srv/logs/pmm-managed.log
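From the host, the same log can be read without entering the CT:

docker exec pmm-server tail /srv/logs/pmm-managed.log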
Useful CT information:
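For example, standard Docker commands (not PMM-specific) give a quick overview of the CT:

docker inspect pmm-server
docker stats --no-stream pmm-server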
To check if we are correctly collecting a metric, we can consult it in the Prometheus interface, Status -> Targets, but it will ask for authentication. The user is always pmm, and the password is the exporter ID we are consulting.
We check the exporters on the server:
Service type Service name Address and port Service ID
Agent type Status Agent ID Service ID
pmm_agent Connected /agent_id/d94f4797-68df-4832-835f-e2b742f3a189
node_exporter Running /agent_id/7e985a11-eb50-4c7a-9569-834666c3a934
In this case, there is only one, and its password is: /agent_id/7e985a11-eb50-4c7a-9569-834666c3a934
We can also consult it in the Prometheus web interface via the agent_id label:
UPDATE
We check the current PMM version:
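One way is to query the same API endpoint used later to verify the update:

curl -k -u admin:PASSWORD -X POST "https://PMM_SERVER/v1/Updates/Check" | jq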
We stop the current PMM and rename the CT:
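Assuming the pmm-server container name:

docker stop pmm-server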
docker rename pmm-server pmm-server-backup
We download the latest version of PMM:
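Same image tag as in the initial installation:

docker pull percona/pmm-server:2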
We start the new PMM reusing the data storage CT, so there is no loss of metrics between updates:
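The same run command used in the initial installation; it reattaches the volumes of the pmm-data CT:

docker run -d -p 80:80 -p 443:443 -v $PWD/prometheusConf/prometheus.base.yml:/srv/prometheus/prometheus.base.yml --volumes-from pmm-data --dns 8.8.8.8 --dns-search=alfaexploit.com --network=pmm-net --name pmm-server --restart always percona/pmm-server:2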
We check that the correct version is running:
curl -k -u admin:PASSWORD -X POST "https://PMM_SERVER/v1/Updates/Check" | jq
We remove the backup:
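Assuming the renamed CT from the previous step:

docker rm pmm-server-backup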
It can also be updated from the web interface (runs an Ansible playbook), but if the version is buggy, there will be no turning back:
http://PMMSERVER/graph/d/pmm-home/home-dashboard
We can read the release notes for each version here.
NOTE: In case of a total reinstallation of the parent host (the Docker server), it will be necessary to back up both the data CT and any files that we map into the PMM CT.