Monitoring

Introduction

Why bother doing all this in the traditional, native sense? There are use cases, perhaps for smaller environments. In the conclusion to this guide, I evaluate the value of spending time managing such a stack manually versus deploying it on a container platform. However, for getting to grips with all the moving parts of this monitoring and alerting stack, doing it all by hand is the best way to learn and understand the pros and cons.

Native Overview

At a high level, this is the final objective. It might look complex at first glance, but the target is simply a pair of each core component for high availability, along with the connectivity relationships between them.

Overview

The rest of this guide uses the following Virtual Machines:

192.168.0.71 mon1
192.168.0.72 mon2

Environment

This guide assumes the ability to configure and access NFS shares and HAProxy; everything is done on Red Hat Enterprise Linux 8 or equivalent.

NFS File shares

This NFS share example uses host 192.168.0.70, which serves as a utilities Linux host attached to the same network as the nodes. This server will also run HAProxy to load balance traffic to either mon1 or mon2.

[root@host ~]# dnf install nfs-utils -y
[root@host ~]# systemctl enable --now rpcbind
[root@host ~]# systemctl enable --now nfs-server

Create all the directories to share, needed for all the tasks in this guide:

[root@host ~]# mkdir -p /nfs/rules /nfs/http_targets /nfs/https_targets /nfs/alertmanagers /nfs/thanos /nfs/loki

Add the share to configuration:

[root@host ~]# vi /etc/exports
/nfs/rules               192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
/nfs/http_targets        192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
/nfs/https_targets       192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
/nfs/alertmanagers       192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
/nfs/thanos              192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
/nfs/loki                192.168.0.1/24(rw,sync,no_wdelay,no_root_squash,insecure)
NFS is typically NOT recommended for real environments, See https://thanos.io/tip/thanos/storage.md/ for configuring access to object storage and the supported clients.

Export the new share with:

[root@host ~]# exportfs -arv

Confirm the share is visible locally:

[root@host ~]# exportfs  -s
[root@host ~]# showmount -e 127.0.0.1

If showmount is not available, install it:

[root@host ~]# dnf install nfs-utils -y

And from another host:

[root@host ~]# showmount -e 192.168.0.70

If required, open up the firewall ports needed:

[root@host ~]# firewall-cmd --permanent --add-service=nfs
[root@host ~]# firewall-cmd --permanent --add-service=rpc-bind
[root@host ~]# firewall-cmd --permanent --add-service=mountd
[root@host ~]# firewall-cmd --reload

HAProxy

One quick solution is to use HAProxy for load balancing, which I tend to use for local, non-SSL lab environments. As already mentioned, this will be on a separate utilities server, 192.168.0.70. Install HAProxy:

[root@host ~]# dnf install haproxy -y

Back up the original configuration file:

[root@host ~]# mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak

And add the following configuration (changing IPs for your environment)

[root@host ~]# vi /etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    30s
    timeout queue           1m
    timeout connect         30s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 30s
    timeout check           30s
    maxconn                 4000

listen stats
    bind 0.0.0.0:9000
    mode http
    balance
    timeout client 5000
    timeout connect 4000
    timeout server 30000
    stats uri /stats
    stats refresh 5s
    stats realm HAProxy\ Statistics
    stats auth admin:changeme
    stats admin if TRUE

# Load balancers:

This haproxy.cfg example is the minimal configuration, ready for load balancers to be added later.
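
The configuration syntax can be checked before starting the service:

[root@host ~]# haproxy -c -f /etc/haproxy/haproxy.cfg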

Set the SELinux boolean to allow haproxy to connect to any port:

[root@host ~]# setsebool -P haproxy_connect_any=1

Open firewall to enable access to HAProxy:

[root@host ~]# firewall-cmd --permanent --add-port=9000/tcp --zone=public
[root@host ~]# firewall-cmd --reload

If SELinux is enforcing, label the new port:

[root@host ~]# semanage port -a -t http_port_t -p tcp 9000

Enable and start HAProxy:

[root@host ~]# systemctl enable haproxy.service --now

View the graphical statistics report at http://192.168.0.70:9000/stats. In this example the username is admin and password is changeme as defined in the haproxy.cfg previously.

This should be all set for adding load balancers.

Prometheus

Prometheus is a free software application used for event monitoring and alerting. It records real-time metrics in a time series database (allowing for high dimensionality) built using a HTTP pull model, with flexible queries and real-time alerting. The project is written in Go and licensed under the Apache 2 License.

The good news is that Prometheus is at the heart of this whole micro-service architecture. At its most basic, it could be all that is needed: every target to be scraped for metrics and every alerting rule is defined here in Prometheus. Every other component is peripheral, either extending it, handing off responsibility, or consuming its data for visualisation and storage.

This section deals with deploying Prometheus on two Virtual Machines, mounting NFS shares for the target and rules configuration, and load balancing the two Prometheus instances. Think of each instance of Prometheus as a replica.

Prometheus

Deploy

Add a service user account:

[root@host ~]# useradd -m -s /bin/false prometheus

Create two directories:

[root@host ~]# mkdir -p /etc/prometheus /var/lib/prometheus

Change ownership of directories:

[root@host ~]# chown prometheus:prometheus /etc/prometheus /var/lib/prometheus/

Get the latest download link from https://prometheus.io/download/:

[root@host ~]# dnf install wget -y
[root@host ~]# wget https://github.com/prometheus/prometheus/releases/download/v2.26.0/prometheus-2.26.0.linux-amd64.tar.gz

Extract the archive and copy binaries into place:

[root@host ~]# dnf install tar -y
[root@host ~]# tar -xvf prometheus-2.26.0.linux-amd64.tar.gz
[root@host ~]# cd prometheus-2.26.0.linux-amd64
[root@host ~]# cp prometheus promtool /usr/local/bin/

Check the path is correct and versions:

[root@host ~]# prometheus --version
prometheus, version 2.26.0 (branch: HEAD, revision: 3cafc58827d1ebd1a67749f88be4218f0bab3d8d)
  build user:       root@a67cafebe6d0
  build date:       20210331-11:56:23
  go version:       go1.16.2
  platform:         linux/amd64
[root@host ~]# promtool --version
promtool, version 2.26.0 (branch: HEAD, revision: 3cafc58827d1ebd1a67749f88be4218f0bab3d8d)
  build user:       root@a67cafebe6d0
  build date:       20210331-11:56:23
  go version:       go1.16.2
  platform:         linux/amd64

Always use the IP address or, preferably, a DNS name rather than localhost for scrape targets.

Global external_labels are added either to identify each Prometheus instance in an HA configuration, or to identify the Prometheus cluster as a whole if the labels are identical on each instance.
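
As an aside, if you wanted Prometheus itself to distinguish the two replicas (for example for de-duplication further downstream), a hypothetical per-instance label could be added with a different value on each node, something like:

global:
  external_labels:
    cluster: prometheus-cluster
    replica: mon1

This guide keeps the labels identical on both nodes, as shown next.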

Same config on both nodes

[root@host ~]# vi /etc/prometheus/prometheus.yml
# Global config
global:
  scrape_interval:     15s
  evaluation_interval: 15s
  scrape_timeout: 15s
  external_labels:
    cluster: prometheus-cluster
    region: europe
    environment: dev

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['0.0.0.0:9090']

Note that scrape_configs includes only this Prometheus target at this stage; in other words, a running instance of Prometheus exposes metrics about itself. When further instances are added, they need to be included, for example:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['0.0.0.0:9090','192.168.0.72:9090']

Create a service using systemd, adding --web.listen-address=:9090.

To reduce the time before data blocks are written out (useful later when Thanos archives them), the block durations can be shortened to minutes, for example:

    --storage.tsdb.max-block-duration=30m \
    --storage.tsdb.min-block-duration=30m \

[root@host ~]# vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Service
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
LimitNOFILE=65536
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --storage.tsdb.max-block-duration=2h \
    --storage.tsdb.min-block-duration=2h \
    --web.listen-address=:9090

[Install]
WantedBy=multi-user.target

Start and enable Prometheus:

[root@host ~]# systemctl daemon-reload
[root@host ~]# systemctl enable prometheus --now
[root@host ~]# systemctl status prometheus

Prometheus stores its data under /var/lib/prometheus by default.

To avoid a gotcha: on RHEL/CentOS 8 systems you may find that cockpit is preconfigured, and it turns out cockpit also uses port 9090.
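
If cockpit is enabled on the node and you want to keep Prometheus on 9090, one option (assuming cockpit is not needed) is to stop and disable its socket:

[root@host ~]# systemctl disable --now cockpit.socket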

Open firewall port:

[root@host ~]# firewall-cmd --add-port=9090/tcp --permanent
[root@host ~]# firewall-cmd --reload

If SELinux is in enforcing mode, label the port:

[root@host ~]# dnf install -y policycoreutils-python-utils
[root@host ~]# semanage port -a -t http_port_t -p tcp 9090

A single Prometheus instance can then be accessed using a browser, for example: http://192.168.0.71:9090/. Assuming all these steps have been repeated on a second node (192.168.0.72), add a load balancer for the two Prometheus instances using HAProxy.

On the host serving HAProxy:

[root@host ~]# vi /etc/haproxy/haproxy.cfg
# Prometheus LB
frontend prometheus-lb-frontend
    bind 192.168.0.70:9090
    default_backend prometheus-lb-backend

backend prometheus-lb-backend
    balance roundrobin
    server prometheus1 192.168.0.71:9090 check
    server prometheus2 192.168.0.72:9090 check

Restart HAProxy and check its status:

[root@host ~]# systemctl restart haproxy
[root@host ~]# systemctl status haproxy

Open firewall on HAProxy host too:

[root@host ~]# firewall-cmd --add-port=9090/tcp --permanent
[root@host ~]# firewall-cmd --reload

View the state of the load balancer using a browser at http://192.168.0.70:9000/stats.

View Prometheus via the load balancer using http://192.168.0.70:9090/.

Basics

A Prometheus instance exposes metrics about itself, for example at http://192.168.0.71:9090/metrics, and the only target configuration included (at this stage) is itself.
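
A quick way to see those raw metrics from the command line:

[root@host ~]# curl -s http://192.168.0.71:9090/metrics | head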

Look at Targets in a browser:

Prometheus

Execute a query:

promhttp_metric_handler_requests_total{code="200"}
Prometheus

And observe there are no alerts configured yet:

Prometheus

Decouple config

Remember to think of each instance of Prometheus as a replica behind the load balancer; this means every instance of Prometheus needs the same configuration. When deploying this stack natively on VMs or cloud instances (as opposed to using containers), the config directories might as well be mounted file systems.

I’ve split HTTP and HTTPS into two separate directories to make it clear which targets are using SSL and because HTTPS targets require scheme: https to use the protocol.
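
For illustration only, a hypothetical HTTPS endpoint would live in its own file under /etc/prometheus/https_targets/ (the directory is created below), using the same format as the HTTP targets; the host name and labels here are made up:

---
- labels:
    service: example-https
    env: staging
  targets:
  - secure.example.com:443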

Make three directories for the target config and rules:

[root@host ~]# mkdir -p /etc/prometheus/http_targets /etc/prometheus/https_targets /etc/prometheus/rules

Add the following to fstab:

[root@host ~]# vi /etc/fstab
192.168.0.70:/nfs/http_targets /etc/prometheus/http_targets nfs rw,sync,hard,intr 0 0
192.168.0.70:/nfs/https_targets /etc/prometheus/https_targets nfs rw,sync,hard,intr 0 0
192.168.0.70:/nfs/rules /etc/prometheus/rules nfs rw,sync,hard,intr 0 0

Ensure nfs-utils is installed:

[root@host ~]# dnf install nfs-utils -y

And mount the NFS shares (created at the start of this page):

[root@host ~]# mount -a

Now update the Prometheus configuration to read files from those directories for http_targets, https_targets and rules:

[root@host ~]# vi /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'http_targets'
    file_sd_configs:
    - files:
      - /etc/prometheus/http_targets/*.yml

  - job_name: 'https_targets'
    scheme: https
    file_sd_configs:
    - files:
      - /etc/prometheus/https_targets/*.yml

rule_files:
  - /etc/prometheus/rules/*.yml

And add the Prometheus target(s):

[root@host ~]# vi /etc/prometheus/http_targets/prometheus_targets.yml
---
- labels:
    service: prometheus
    env: staging
  targets:
  - 192.168.0.71:9090
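
Before restarting, the updated configuration can be validated with promtool, which was copied into /usr/local/bin alongside the prometheus binary earlier:

[root@host ~]# promtool check config /etc/prometheus/prometheus.yml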

Restart Prometheus:

[root@host ~]# systemctl restart prometheus

Everything should be the same except now the configuration is decoupled from any instance of Prometheus. When the second instance is added in this example prometheus_targets.yml should include both instances:

---
- labels:
    service: prometheus
    env: staging
  targets:
  - 192.168.0.71:9090
  - 192.168.0.72:9090

To repeat, this configuration is shared by both mon1 and mon2; any change to the config means the service needs to be restarted on both nodes!

Restart Prometheus on both nodes:

[root@host ~]# systemctl restart prometheus

It’s obvious that managing this manually will become unmanageable; there will be quite a few services by the end. It is more realistic to use Ansible for all of this, but that is not the point of this documentation, which is about truly getting a grip on each moving part. Once the steps are ironed out, automating and orchestrating them will be a breeze.

Chronyd

It’s a good idea to make sure all servers and clients have their clocks in sync; for reference:

[root@host ~]# dnf install chrony
[root@host ~]# systemctl start chronyd
[root@host ~]# systemctl enable chronyd
[root@host ~]# chronyc tracking

Recap

Everything from this point on involves adding target scrape configurations and alert rules for Prometheus. All the other components are peripheral to Prometheus, either extending or handing off services, or consuming data for other purposes, such as Grafana that uses Prometheus as a data source for displaying information in a graphical way.

node_exporter

node_exporter is a Prometheus exporter for hardware and OS metrics exposed by UNIX and Linux kernels. Think of it as a machine agent that exposes metrics at the host level, for things such as CPU, disk usage and memory.
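
Once node_exporter targets are scraped (added below), host-level metrics can be queried in Prometheus; for example, this commonly used expression approximates CPU utilisation per instance:

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)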

In this guide there are initially three VMs: utilities, mon1 and mon2. These steps for installing node_exporter are repeated on any node that needs to be monitored.

This high-level diagram summarises the architecture. node_exporter is deployed on nodes and then included in the Prometheus scrape targets config. Any number of these node_exporter endpoints can be added to monitor infrastructure hosts. In this case, the nodes hosting Prometheus are included. The second Prometheus instance and other nodes are greyed out in the diagram; remember, the second instance is a replica of the first.

node_exporter

Deploy node_exporter

Add a service user account:

[root@host ~]# useradd -m -s /bin/false node_exporter

Get the latest download link from https://prometheus.io/download/.

[root@host ~]# wget https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz

Extract the archive:

[root@host ~]# tar -xvf node_exporter-1.1.2.linux-amd64.tar.gz

Move into the extracted directory:

[root@host ~]# cd node_exporter-1.1.2.linux-amd64

Copy the node_exporter binary to a suitable path:

[root@host ~]# cp node_exporter /usr/local/bin/

Check version:

[root@host ~]# node_exporter --version
node_exporter, version 1.1.2 (branch: HEAD, revision: b597c1244d7bef49e6f3359c87a56dd7707f6719)
  build user:       root@f07de8ca602a
  build date:       20210305-09:29:10
  go version:       go1.15.8
  platform:         linux/amd64

Create a service for node_exporter using systemd; this example explicitly sets the default port 9100:

[root@host ~]# vi /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --web.listen-address=:9100

[Install]
WantedBy=multi-user.target

Open firewall port:

[root@host ~]# firewall-cmd --add-port=9100/tcp --permanent
[root@host ~]# firewall-cmd --reload

Start and enable the Node Exporter:

[root@host ~]# systemctl daemon-reload
[root@host ~]# systemctl enable node_exporter --now
[root@host ~]# systemctl status node_exporter
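
A quick sanity check that metrics are being exposed on the node itself:

[root@host ~]# curl -s http://localhost:9100/metrics | head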

Update Targets

Add node_exporter to the Prometheus scrape targets, including labels to identify the service. In this example the nodes mon1 (192.168.0.71), mon2 (192.168.0.72) and utilities (192.168.0.70) are added. Because this config is mounted by every Prometheus instance, the config is the same everywhere. This example includes three nodes:

[root@host ~]# vi /etc/prometheus/http_targets/node_exporter.yml
---
- labels:
    service: node_exporter
    env: staging
  targets:
  - 192.168.0.70:9100
  - 192.168.0.71:9100
  - 192.168.0.72:9100

Prometheus needs to be restarted on both Prometheus instances:

[root@host ~]# systemctl restart prometheus

At this stage, there are two Prometheus targets and three node_exporter targets:

Targets

Alertmanager

Before deploying Alert Manager, alert rules can be added to Prometheus and fully tested; alerts will work and fire in Prometheus on their own. All Alert Manager does is hook into Prometheus and handle actually sending alert messages to whatever receivers are configured, such as email, and it takes care of de-duplication. For example, where two Alert Managers are in the equation, you don’t want both sending out an email for the same alert.

Rules

Consider the query node_filesystem_size_bytes{mountpoint="/boot"}; executing this in Prometheus should return the /boot file system for each of the nodes where metrics are scraped using node_exporter.

All an alert is, is such a query with an added condition.

node_filesystem_size_bytes{mountpoint="/boot"} > 1000000000
Alert Rules

In this case, increasing the number so that the condition no longer matches returns no results.

node_filesystem_size_bytes{mountpoint="/boot"} > 2000000000
Alert Rules

Working with Prometheus directly and tuning expressions and conditions is the best way of deriving alert expressions.
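
For example, a more realistic condition might alert when less than 10% of the root file system is free; the metrics used here are standard node_exporter ones:

100 * node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 10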

To add an alert, create rules files under /etc/prometheus/rules/. These files can contain multiple alerts.

Example:

[root@host ~]# vi /etc/prometheus/rules/boot_fs.yml
groups:
- name: node_exporter
  rules:
    - alert: boot_partition
      expr: node_filesystem_size_bytes{mountpoint="/boot"} > 1000000000
      for: 1m
      labels:
        severity: warning
      annotations:
        title: Disk space filling up
        description: /boot is filling up
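
The rule file can be validated with promtool before restarting:

[root@host ~]# promtool check rules /etc/prometheus/rules/boot_fs.yml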

Restart each Prometheus instance with systemctl restart prometheus.

That is fundamentally all there is to alerts; in this example, all three nodes will start firing:

Alert Rules

Edit the alert, tweak the threshold to expr: node_filesystem_size_bytes{mountpoint="/boot"} > 2000000000 and restart Prometheus again; the alert will turn green:

Alert Rules

However, while this is functionally working, it’s not all that useful if you have to manually check the alerts in Prometheus. This is where Alert Manager comes into the equation.

Deploy Alert Manager

Two instances of Alert Manager are deployed, one on each node. Each instance needs to know about the other so they can "gossip", know who has sent what, and avoid duplicate alerts being sent out.

The Prometheus configuration is also updated to include the Alert Manager instances, so Prometheus can offload the responsibility of deciding what to do with fired alerts.

Alert Rules

Add a service user account:

[root@host ~]# useradd -m -s /bin/false alertmanager

Get the latest download link from https://prometheus.io/download/.

[root@host ~]# wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz

Extract the archive:

[root@host ~]# tar -xvf alertmanager-0.21.0.linux-amd64.tar.gz

Move into the extracted directory:

[root@host ~]# cd alertmanager-0.21.0.linux-amd64

Copy the alertmanager binary to a suitable path:

[root@host ~]# cp alertmanager /usr/local/bin/

Check version:

[root@host ~]# alertmanager --version
alertmanager, version 0.21.0 (branch: HEAD, revision: 4c6c03ebfe21009c546e4d1e9b92c371d67c021d)
  build user:       root@dee35927357f
  build date:       20200617-08:54:02
  go version:       go1.14.4

Add the following configuration. You can use a regular Gmail account for SMTP, although it might be necessary to create app credentials, and add whatever receiver email address is desired.

[root@host ~]# mkdir /etc/alertmanager
[root@host ~]# vi /etc/alertmanager/alertmanager.yml
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'AlertManager <mailer@lab.com>'
  smtp_require_tls: true
  smtp_hello: 'alertmanager'
  smtp_auth_username: 'username'
  smtp_auth_password: 'changeme'

route:
  group_by: ['instance', 'alert']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: staging

receivers:
  - name: 'staging'
    email_configs:
      - to: 'user@example.com'
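
Assuming the amtool binary shipped in the same tarball was also copied to /usr/local/bin, it can be used to validate this file:

[root@host ~]# amtool check-config /etc/alertmanager/alertmanager.yml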

Change permissions:

[root@host ~]# chown -R alertmanager:alertmanager /etc/alertmanager

Create a service for alertmanager using systemd; this example includes the cluster peer options:

[root@host ~]# vi /etc/systemd/system/alertmanager.service
[Unit]
Description=Prometheus Alert Manager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
LimitNOFILE=65536
ExecStart=/usr/local/bin/alertmanager \
    --cluster.listen-address=192.168.0.71:9004 \
    --cluster.peer=192.168.0.72:9004 \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --web.external-url=http://192.168.0.71:9093

WorkingDirectory=/etc/alertmanager

[Install]
WantedBy=multi-user.target

The second Alert Manager instance needs to point to the other peer, --cluster.peer=192.168.0.71:9004, and use its own IP for --cluster.listen-address=192.168.0.72:9004 and --web.external-url=http://192.168.0.72:9093.

Make a directory to mount the alertmanagers.yml config file:

[root@host ~]# mkdir /etc/prometheus/alertmanagers

Add the NFS mount point:

[root@host ~]# vi /etc/fstab
192.168.0.70:/nfs/alertmanagers /etc/prometheus/alertmanagers nfs rw,sync,hard,intr 0 0
[root@host ~]# mount -a

Add alertmanagers.yml:

[root@host ~]# vi /etc/prometheus/alertmanagers/alertmanagers.yml
---
- targets:
  - 192.168.0.71:9093
  - 192.168.0.72:9093

Open firewall:

[root@host ~]# firewall-cmd --add-port=9093/tcp --permanent
[root@host ~]# firewall-cmd --add-port=9004/tcp --permanent
[root@host ~]# firewall-cmd --reload

Start and enable the Alert Manager:

[root@host ~]# systemctl daemon-reload
[root@host ~]# systemctl enable alertmanager.service --now

Add the following to the Prometheus configuration:

[root@host ~]# vi /etc/prometheus/prometheus.yml
alerting:
  alertmanagers:
  - file_sd_configs:
    - files:
      - 'alertmanagers/alertmanagers.yml'

Restart Prometheus:

[root@host ~]# systemctl restart prometheus.service

With this configured, go back to Prometheus to configure some alerts. Alerts will only appear in Alert Manager if they fire.

Check the status of each Alert Manager, for example at http://192.168.0.71:9093 and http://192.168.0.72:9093.
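
To confirm the two instances have gossiped and formed a cluster, the status API should list both peers, for example:

[root@host ~]# curl -s http://192.168.0.71:9093/api/v2/status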

Alert Rules

These two Alert Manager instances can be added as a load balancer, on the host serving HAProxy:

[root@host ~]# vi /etc/haproxy/haproxy.cfg
# Alert Manager LB
frontend alertmanager-lb-frontend
    bind 192.168.0.70:9093
    default_backend alertmanager-lb-backend

backend alertmanager-lb-backend
    balance roundrobin
    server alertmanager1 192.168.0.71:9093 check
    server alertmanager2 192.168.0.72:9093 check

Restart HAProxy and check its status:

[root@host ~]# systemctl restart haproxy
[root@host ~]# systemctl status haproxy

Open firewall on HAProxy host too:

[root@host ~]# firewall-cmd --add-port=9093/tcp --permanent
[root@host ~]# firewall-cmd --reload

Experiment by changing the condition in vi /etc/prometheus/rules/boot_fs.yml and causing an alert to fire (remember to restart Prometheus on both nodes):

Alert Rules

Thanos

Thanos includes quite a few components; this section covers three core ones, used for achieving high availability and, mainly, long-term retention of historic Prometheus metrics.

For lab work and testing NFS storage is used.

NFS is typically NOT recommended for real environments, See https://thanos.io/tip/thanos/storage.md/ for configuring access to object storage and the supported clients.
Thanos

You could skip this Thanos section and jump ahead to Grafana; Thanos provides a service to retain Prometheus metrics for the long term, but setting up Grafana with a database will also retain historic metrics.

So why bother with Thanos? It’s a question of requirements: in more complex deployments, Prometheus instances may be spun up or down, potentially scraping an array of different targets. Thanos leverages the Prometheus 2.0 storage format to cost-efficiently store historical metric data in any object storage while retaining fast query latencies. Moreover, Thanos provides a global query view across all Prometheus installations and can merge data from Prometheus HA pairs on the fly. The key word here is efficiency.

Thanos Binary

The same Thanos binary is used for launching the Sidecar, Store and Query components.

Get the latest release link from https://github.com/thanos-io/thanos/releases/ and download it:

[root@host ~]# wget https://github.com/thanos-io/thanos/releases/download/v0.20.1/thanos-0.20.1.linux-amd64.tar.gz

Extract the archive:

[root@host ~]# tar -xvf thanos-0.20.1.linux-amd64.tar.gz

Move into the extracted directory and copy the thanos binary to a suitable path:

[root@host ~]# cd thanos-0.20.1.linux-amd64
[root@host ~]# cp thanos /usr/local/bin/

Confirm version:

[root@host ~]# thanos --version
thanos, version 0.20.1 (branch: HEAD, revision: 10023e4882414baa56a8955057f2678f1e724e8c)
  build user:       root@3d72611b2972
  build date:       20210430-12:34:50
  go version:       go1.16.3
  platform:         linux/amd64

Thanos Sidecar

Starting with the Thanos Sidecar, create a configuration directory for Thanos and a directory to use as the mount point for the NFS share that will store Prometheus data.

Note: both the Sidecar and Store components use the objstore.config-file which references the mount point.

[root@host ~]# mkdir -p /thanos /etc/thanos/
[root@host ~]# chown prometheus:prometheus /thanos /etc/thanos/

Check NFS share is visible from the host:

[root@host ~]# dnf install nfs-utils -y
[root@host ~]# showmount -e 192.168.0.70

Mount the NFS share:

[root@host ~]# vi /etc/fstab
192.168.0.70:/nfs/thanos /thanos               nfs     defaults        0 0
[root@host ~]# mount -a
[root@host ~]# df -h

Add the file system config:

[root@host ~]# vi /etc/thanos/file_system.yaml
type: FILESYSTEM
config:
  directory: "/thanos"
[root@host ~]# chown prometheus:prometheus /etc/thanos/file_system.yaml

Create a service for Thanos Sidecar using systemd, including options for the existing Prometheus data directory, Prometheus endpoint and the object store configuration file:

[root@host ~]# vi /etc/systemd/system/thanos_sidecar.service
[Unit]
Description=Thanos Sidecar
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/thanos sidecar \
    --tsdb.path=/var/lib/prometheus \
    --prometheus.url=http://0.0.0.0:9090 \
    --objstore.config-file=/etc/thanos/file_system.yaml \
    --http-address=0.0.0.0:10902 \
    --grpc-address=0.0.0.0:10901

[Install]
WantedBy=multi-user.target

Start and enable the Thanos Sidecar:

[root@host ~]# systemctl daemon-reload
[root@host ~]# systemctl enable thanos_sidecar --now
[root@host ~]# systemctl status thanos_sidecar

Open firewall:

[root@host ~]# firewall-cmd --add-port=10901/tcp --permanent
[root@host ~]# firewall-cmd --add-port=10902/tcp --permanent
[root@host ~]# firewall-cmd --reload
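
Thanos components expose simple health endpoints on their HTTP port, which gives a quick check that the Sidecar is up:

[root@host ~]# curl http://192.168.0.71:10902/-/healthy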

Thanos Store

Create a data directory:

[root@host ~]# mkdir /var/thanos-data-dir
[root@host ~]# chown prometheus:prometheus /var/thanos-data-dir
[root@host ~]# vi /etc/systemd/system/thanos_store.service
[Unit]
Description=Thanos Store
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/thanos store \
    --objstore.config-file=/etc/thanos/file_system.yaml \
    --http-address=0.0.0.0:10906 \
    --grpc-address=0.0.0.0:10905 \
    --data-dir=/var/thanos-data-dir \
    --log.level=debug

[Install]
WantedBy=multi-user.target

Start and enable the Thanos Store:

[root@host ~]# systemctl daemon-reload
[root@host ~]# systemctl enable thanos_store --now
[root@host ~]# systemctl status thanos_store

Open firewall:

[root@host ~]# firewall-cmd --add-port=10905/tcp --permanent
[root@host ~]# firewall-cmd --add-port=10906/tcp --permanent
[root@host ~]# firewall-cmd --reload

Thanos Query

Create a service for Thanos Query using systemd. Note the store arguments: port 10905 is the Thanos Store and 10901 is the Thanos Sidecar, included for both instances.

[root@host ~]# vi /etc/systemd/system/thanos_query.service
[Unit]
Description=Thanos Query
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
LimitNOFILE=65536
ExecStart=/usr/local/bin/thanos query \
    --store=192.168.0.71:10905 \
    --store=192.168.0.72:10905 \
    --store=192.168.0.71:10901 \
    --store=192.168.0.72:10901 \
    --http-address=0.0.0.0:10904 \
    --grpc-address=0.0.0.0:10903

[Install]
WantedBy=multi-user.target

Start and enable the Thanos Query:

[root@host ~]# systemctl daemon-reload
[root@host ~]# systemctl enable thanos_query --now
[root@host ~]# systemctl status thanos_query

Open the firewall:

[root@host ~]# firewall-cmd --add-port=10904/tcp --permanent
[root@host ~]# firewall-cmd --reload

You should now be able to hit a single instance directly, for example http://192.168.0.71:10904, and look at the stores. The following shows one node configured with the three Thanos components:

Thanos

And now with the second node and second instances of the Thanos components:

Thanos

You can use Thanos Query to execute queries just like in Prometheus; the metrics are fed in directly from Prometheus via the Thanos Sidecar and from the Thanos Store.

Thanos

Do a directory listing, ls -al /thanos, to see Prometheus data being written.

Data might not be written immediately; try reducing --storage.tsdb.max-block-duration=5m and --storage.tsdb.min-block-duration=5m in the prometheus systemd service. It’s probably a good time to reboot each server too, and check all the services come back up healthy.

These two Thanos Query instances can be added as a load balancer, on the host serving HAProxy:

[root@host ~]# vi /etc/haproxy/haproxy.cfg
# Thanos Query LB
frontend thanos-query-lb-frontend
    bind 192.168.0.70:10904
    default_backend thanos-query-lb-backend

backend thanos-query-lb-backend
    balance roundrobin
    server thanos-query1 192.168.0.71:10904 check
    server thanos-query2 192.168.0.72:10904 check

And restart HAProxy plus checking the status:

[root@host ~]# systemctl restart haproxy
[root@host ~]# systemctl status haproxy

Open firewall on HAProxy host too:

[root@host ~]# firewall-cmd --add-port=10904/tcp --permanent
[root@host ~]# firewall-cmd --reload

Grafana

Grafana is a popular technology used to compose observability dashboards, in this case using Prometheus metrics, and later also logs using Loki and Promtail.

It’s very simple to deploy, and by default it uses a local SQLite database for a single instance. The only step necessary to achieve high availability is to point the database settings of any number of Grafana instances at a shared database such as PostgreSQL.

Grafana

Add the Grafana repository:

[root@host ~]# vi /etc/yum.repos.d/grafana.repo

OSS release:

[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

And install Grafana:

[root@host ~]# dnf install grafana -y

Set up a PostgreSQL database as per the instructions here: https://www.richardwalker.blog/article/postgres/

For a single instance, just start the service and go. In this case, for an HA pair, update grafana.ini with the settings for the PostgreSQL database set up above. Search for "database" in the ini file and add the following settings under the [database] section:

[root@host ~]# vi /etc/grafana/grafana.ini
type = postgres
host = 192.168.0.70:5432
name = grafana_db
user = grafana
password = changeme

Optionally, you can change the http port the service listens on:

# The http port  to use
http_port = 3000

Open firewall:

[root@host ~]# firewall-cmd --add-port=3000/tcp --permanent
[root@host ~]# firewall-cmd --reload

Start and enable Grafana:

[root@host ~]# systemctl daemon-reload
[root@host ~]# systemctl enable  grafana-server --now
[root@host ~]# systemctl status grafana-server

Grafana will be up and running, for example at http://192.168.0.71:3000/. Log in with username admin and password admin; a password change is mandated on first login.

Repeating the steps on the second node will behave like the first, since they’re attached to the same database; log in with admin and the new password you set during the first install at http://192.168.0.72:3000/.

Add the HAProxy load balancer:

[root@host ~]# vi /etc/haproxy/haproxy.cfg
# Grafana LB
frontend grafana-lb-frontend
    bind 192.168.0.70:3000
    default_backend grafana-lb-backend

backend grafana-lb-backend
    balance roundrobin
    server grafana1 192.168.0.71:3000 check
    server grafana2 192.168.0.72:3000 check

Restart HAProxy and check its status:

[root@host ~]# systemctl restart haproxy
[root@host ~]# systemctl status haproxy

Open firewall on HAProxy host too:

[root@host ~]# firewall-cmd --add-port=3000/tcp --permanent
[root@host ~]# firewall-cmd --reload

Load balancers

At this stage there should be four load balancers, all configured using round-robin.

HAProxy

The Grafana LB is used for accessing Grafana itself. The Prometheus and Thanos Query LBs can now be used as data sources in Grafana.

For reference, in this example there are four: Prometheus, Alert Manager, Thanos Query and Grafana.

Data Sources

In Grafana, go to Configuration → Data Sources, select "Add data source", select "Prometheus" and, under HTTP → URL, add the Thanos Query LB URL http://192.168.0.70:10904:

Grafana

Example Dashboard

We have node_exporter running on some hosts and have included them as scrape targets in Prometheus. This is all out-of-the-box configuration, and most common metrics and dashboards have already been solved.

To provide a flavour of the power here, take a look at this dashboard: https://grafana.com/grafana/dashboards/1860. (There are lots of others to search for.)

In Grafana, use the "+" sign in the left-hand menu and select "Import", enter the code, in this case "1860", select "Load", then "Import":

Grafana

On the next screen, select the Prometheus source in the drop-down menu and select "Import":

Grafana

You should now see some magic happen. The data source provides all the metrics from the node_exporters, picking out the hosts, which are available to select in a drop-down menu, and visualises the data using familiar Prometheus queries. Quite impressive!

Grafana

Examining pre-made dashboards such as this provides the inspiration and know-how for building custom dashboards.

Loki & Promtail

Originally, I was planning on (and had written) a ton of documentation on Elasticsearch, Fluentd and Kibana (EFK). I’ve never been a fan of the stack, but a version ships with OpenShift, and it’s the only valid open-source alternative to Splunk (TTBOMK). It is, however, a resource-hungry, cumbersome beast and something I’ve never really seen implemented and used in the real world (I’m sure it is widely used and capable).

Only very recently have I discovered Loki, which ties very nicely into this whole Prometheus stack. It enables logging to be accessible in Grafana along with all the metrics. It’s lightweight and probably the most exciting thing I’ve discovered in recent years.

I’m such a fan of Loki and Promtail that I’ve dropped EFK like a bad habit and will never look back. Hence this last, simple section, following the same pattern as all those Prometheus components. Not too much fuss, and all the monitoring and logging brought together in one excellent portal, Grafana.

Loki

Find the latest release at https://github.com/grafana/loki/releases/.

[root@host ~]# dnf install wget unzip -y

Download loki:

[root@host ~]# wget https://github.com/grafana/loki/releases/download/v2.2.1/loki-linux-amd64.zip

Extract the contents:

[root@host ~]# unzip loki-linux-amd64.zip

Move the binary to /usr/local/bin/ renaming it in the process:

[root@host ~]# mv loki-linux-amd64 /usr/local/bin/loki

Restore the SELinux contexts to avoid the service failing to start:

[root@host ~]# restorecon -R /usr/local/bin/

Check the version:

[root@host ~]# loki --version
loki, version 2.2.1 (branch: HEAD, revision: babea82e)
  build user:       root@e2d295b84e26
  build date:       2021-04-06T00:52:41Z
  go version:       go1.15.3
  platform:         linux/amd64

Add a service user account:

[root@host ~]# useradd -m -s /bin/false loki

Create a directory:

[root@host ~]# mkdir -p /var/loki/ /etc/loki/
[root@host ~]# chown loki:loki /var/loki/ /etc/loki/

Permanently mount the NFS share:

[root@host ~]# vi /etc/fstab
192.168.0.70:/nfs/loki /var/loki/               nfs     defaults        0 0
[root@host ~]# mount -a
[root@host ~]# df -h

Add the following configuration, taking note of the directories /var/loki/mon1/index and /var/loki/mon1/chunks, which can be named per node to keep the data segmented.

[root@host ~]# vi /etc/loki/config.yml
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 0

schema_config:
  configs:
    - from: 2018-04-15
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 168h

storage_config:
  boltdb:
    directory: /var/loki/mon1/index

  filesystem:
    directory: /var/loki/mon1/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

Create a systemd service for Loki:

[root@host ~]# vi /etc/systemd/system/loki.service
[Unit]
Description=Loki
Wants=network-online.target
After=network-online.target

[Service]
User=loki
Group=loki
Type=simple
ExecStart=/usr/local/bin/loki \
    -config.file /etc/loki/config.yml

[Install]
WantedBy=multi-user.target

Open firewall:

[root@host ~]# firewall-cmd --add-port=3100/tcp --permanent
[root@host ~]# firewall-cmd --reload

Start and enable Loki:

[root@host ~]# systemctl daemon-reload
[root@host ~]# systemctl enable loki.service --now
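
Loki exposes a readiness endpoint that can be used to confirm it is up:

[root@host ~]# curl http://localhost:3100/ready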

Promtail

This example adds /var/log/messages; the promtail service will run as the root user because it needs permission to read that log file.

Download promtail:

[root@host ~]# wget https://github.com/grafana/loki/releases/download/v2.2.1/promtail-linux-amd64.zip

Extract it:

[root@host ~]# unzip promtail-linux-amd64.zip

Move the binary to /usr/local/bin/ renaming it in the process:

[root@host ~]# mv promtail-linux-amd64 /usr/local/bin/promtail

Restore the SELinux file context:

[root@host ~]# restorecon -R /usr/local/bin/

Check the version:

[root@host ~]# promtail --version
promtail, version 2.2.1 (branch: HEAD, revision: babea82e)
  build user:       root@e2d295b84e26
  build date:       2021-04-06T00:52:41Z
  go version:       go1.15.3
  platform:         linux/amd64

Create a directory to keep the promtail configuration:

[root@host ~]# mkdir -p /etc/promtail/

Add the following configuration, noting the positions file is named mon1-positions.yml (named per node):

[root@host ~]# vi /etc/promtail/config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/loki/mon1-positions.yml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs
      __path__: /var/log/messages

Add a systemd service for promtail:

[root@host ~]# vi /etc/systemd/system/promtail.service
[Unit]
Description=Promtail
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/promtail \
    -config.file /etc/promtail/config.yml

[Install]
WantedBy=multi-user.target

Open firewall:

[root@host ~]# firewall-cmd --add-port=9080/tcp --permanent
[root@host ~]# firewall-cmd --reload

Start and enable Promtail:

[root@host ~]# systemctl daemon-reload
[root@host ~]# systemctl enable promtail.service --now

Promtail can be accessed, for example at http://192.168.0.71:9080, to view what targets have been configured per node.

Promtail

Grafana

Almost there. Finally, go to "Configuration" → "Data Sources" in Grafana, select "Add data source" and scroll down to select "Loki":

Loki Data Source

Enter the details of a Loki end-point:

Loki Data Source

Select "Save & Test", then goto "Explore" in Grafana. From the dropdown select the data source previously added, and have an explore. In this example only /var/log/messages was configured. Enter {filename="/var/log/messages"} in the query field and see logs streaming:

Loki Data Source

As a test use the following command on the mon1 host to write something custom to the logs and see it appear in Grafana:

logger "Hello from mon1!"

Conclusion

As you can see, there are a lot of moving parts and components. The whole Prometheus suite is essentially a microservice approach. Googling for information is dominated by Kubernetes or OpenShift contexts.

With this in mind, the next natural step after manually deploying all these components is to look at Ansible and automate the process. Managing the platform with Ansible works perfectly for small deployments, with a handful of hosts and applications to monitor.

When looking to manage all of this at scale, the problems and the time demanded to maintain it grow exponentially.

The clues are there as to why the Prometheus stack is generally deployed on the container platforms, as mentioned earlier. Most importantly, it would help if you considered where you spend your time. Is it worth managing a whole monitoring stack when all these problems are already solved? This Prometheus stack can be automatically deployed using operators or Helm charts on either Kubernetes or OpenShift. Replicas of each component can be increased or decreased with little effort. Spending time focusing on a container platform is probably a better investment. Using such a platform solves more problems than just monitoring, of course.

That said, there is still value in gaining an intimate understanding of the low-level workings of Prometheus before blindly deploying it in an obfuscated fashion. And there are still valid use cases; for example, I’ll be using single instances of all these components for my one Django web application.

It is all about knowing where to draw the line between practicality and scalability: know the requirements, and know where to spend time and effort to get the most value.