Docker Compose is a great way to set up small test environments, locally or remotely. It allows you to define your infrastructure as code and does not require any prerequisite or post-deployment tasks.
The Docker installation is well documented at https://docs.docker.com/get-docker/ and is well supported amongst the most popular operating systems. The installation itself will not be covered in this article. If you want to get familiar with the details of Docker, start with the documentation at https://docs.docker.com/get-started/.
All code used in this article is available at: https://github.com/insani4c/docker-monitoring-stack.
In this article, we will see how to set up a monitoring solution based on:
- Prometheus
- Prometheus Node Exporter
- Prometheus Blackbox Exporter
- Prometheus SNMP Exporter
- Loki
- Promtail
- Grafana
To monitor the deployed containers, we will also deploy Google's cAdvisor container, to get some interesting statistics and details into our Prometheus/Grafana setup.
The Docker Compose file, called docker-compose.yml, contains all the information about the infrastructure, such as:
- network information
- volumes
- services (the containers)
- …
Let’s start from the top of the file.
```yaml
version: '3.8'

name: docmon

volumes:
  grafana-data: {}
  alertmanager-data: {}
  prometheus-data: {}
  loki-data: {}
```
First, at line 1, the Docker Compose version is specified, which defines which specification features are allowed. At line 3, a name for the container group or stack is set. And finally, starting at line 5, data volumes (think disks) are defined, which will be used by the containers. These are persistent data volumes, which will be reused unless the container has been completely removed.
Next, we will define the services in the docker-compose.yml:
```yaml
services:
  cadvisor:
    image: 'gcr.io/cadvisor/cadvisor:latest'
    container_name: cadvisor
    restart: always
    mem_limit: 512m
    mem_reservation: 32m
    # ports:
    #   - '8880:8080'
    volumes:
      - '/:/rootfs:ro'
      - '/var/run:/var/run:ro'
      - '/sys:/sys:ro'
      - '/var/lib/docker/:/var/lib/docker:ro'
      - '/dev/disk/:/dev/disk:ro'
    privileged: true
    devices:
      - '/dev/kmsg:/dev/kmsg'

  prometheus:
    image: 'prom/prometheus:latest'
    container_name: prometheus
    restart: always
    mem_limit: 2048m
    mem_reservation: 256m
    cpus: 2
    # ports:
    #   - '9090:9090'
    volumes:
      - '$PROMETHEUS_HOME/config:/etc/prometheus'
      - 'prometheus-data:/prometheus'
    extra_hosts:
      myrouter: 192.168.1.1
      myswitch: 192.168.1.10
    depends_on:
      - cadvisor
```
Containers are defined as `services`. Each service will require at least:
- a service name (for example, line 2 and line 20)
- an `image` definition

All other options are optional or required by specific images.
The first image or container defined in the above example is `cadvisor`. This service provides statistics about Docker and the deployed containers to Prometheus. To be able to provide this information, the container must have read access to certain file paths or sockets on the hypervisor (read: the server where the Docker containers will be running). These are provided in the `volumes` section of the container. Here, directory paths on the hypervisor are provided as mount points in the container, and they are mounted with the read-only (`:ro`) option so that the container can't make any changes to them.
Furthermore, it provides access to a device (`/dev/kmsg`, to read kernel messages), sets `memory` and `cpu` limits, and runs the container in `privileged` mode. The `ports` section has been commented out, as it isn't really required to expose ports or make them available outside the Docker ecosystem. In our example, only Prometheus must be able to connect to it, and since Prometheus will be deployed as a container as well, we don't need external access to the web service running on the container to read out the metrics or see the statistics.
The next container defined is called `prometheus`. For this container, `volumes` will be mounted to provide the Prometheus configuration files and to store the data in the volume called `prometheus-data`. It also defines `extra_hosts`. These are entries that are typically defined in an `/etc/hosts` file, which Docker does not read from the hypervisor. Instead of deploying or mounting the hypervisor's `hosts` file, extra host mappings can be handed to the container, which is done in the `extra_hosts` section as above.
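The Prometheus configuration itself lives in `$PROMETHEUS_HOME/config`. As an illustration only (the actual scrape configuration in the repository may differ), a minimal `prometheus.yml` could scrape cAdvisor and the node-exporter by their service names, since container names resolve via Docker's internal DNS on the Compose network:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  # cAdvisor exposes container metrics on port 8080
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  # node-exporter metrics for the hypervisor on port 9100
  - job_name: 'hypervisor'
    static_configs:
      - targets: ['hypervisor:9100']
```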
At the end of the `prometheus` container definition, a `depends_on` section is configured, which means that the `prometheus` container won't be deployed until the containers defined in that section are up and running.
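Note that the volume definitions reference variables such as `$PROMETHEUS_HOME`; Docker Compose substitutes these from the shell environment or from a `.env` file placed next to `docker-compose.yml`. A hedged sketch of such a file (the paths are assumptions, not taken from the repository):

```shell
# .env — read automatically by Docker Compose
PROMETHEUS_HOME=/opt/docmon/prometheus
PROMSNMP_HOME=/opt/docmon/snmp_exporter
ALERTMANAGER_HOME=/opt/docmon/alertmanager
LOKI_HOME=/opt/docmon/loki
BLACKBOXEXPORTER_HOME=/opt/docmon/blackbox_exporter
PROMTAIL_HOME=/opt/docmon/promtail
GRAFANA_HOME=/opt/docmon/grafana
```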
Next, we will define all the other containers:
```yaml
  hypervisor:
    image: 'prom/node-exporter:latest'
    container_name: hypervisor
    mem_limit: 128m
    mem_reservation: 32m
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'
      - '/proc:/host/proc:ro'
      - '/sys:/host/sys:ro'
    command:
      - '--path.rootfs=/host'
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
      - '--collector.systemd'
      - '--collector.cgroups'
    depends_on:
      - cadvisor

  prom_snmp:
    image: 'prom/snmp-exporter:latest'
    container_name: prom_snmp
    restart: always
    mem_limit: 128m
    mem_reservation: 32m
    # ports:
    #   - '9116:9116'
    volumes:
      - '$PROMSNMP_HOME/config:/etc/snmp_exporter'
    extra_hosts:
      myrouter: 192.168.1.1
      myswitch: 192.168.1.10
    depends_on:
      - cadvisor
      - prometheus

  alertmanager:
    image: 'prom/alertmanager:latest'
    container_name: alertmanager
    restart: always
    mem_limit: 256m
    mem_reservation: 32m
    # ports:
    #   - 9093:9093
    volumes:
      - '$ALERTMANAGER_HOME/config/alertmanager.yml:/etc/alertmanager/config.yml'
      - 'alertmanager-data:/alertmanager'
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'
    depends_on:
      - cadvisor
      - prometheus

  loki:
    image: 'grafana/loki:latest'
    container_name: loki
    restart: always
    mem_limit: 32768m
    mem_reservation: 8192m
    cpus: 6
    ports:
      - '3100:3100'
    volumes:
      - '$LOKI_HOME/config:/etc/loki'
      - 'loki-data:/loki'
    depends_on:
      - cadvisor
      - prometheus
      - alertmanager

  blackbox_exporter:
    image: 'prom/blackbox-exporter:latest'
    container_name: blackbox_exporter
    restart: always
    mem_limit: 128m
    mem_reservation: 32m
    dns:
      - 8.8.8.8
      - 8.8.4.4
    # ports:
    #   - 9115:9115
    volumes:
      - '$BLACKBOXEXPORTER_HOME/config:/etc/blackboxexporter/'
    command:
      - '--config.file=/etc/blackboxexporter/config.yml'
    depends_on:
      - cadvisor
      - prometheus

  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    restart: always
    mem_limit: 256m
    mem_reservation: 64m
    volumes:
      - $PROMTAIL_HOME/config:/etc/promtail/
      # to read container labels and logs
      - '/var/run/docker.sock:/var/run/docker.sock:ro'
      - '/var/lib/docker/containers:/var/lib/docker/containers:ro'
      - '/var/log/ulog:/var/log/ulog/:ro'
    depends_on:
      - cadvisor
      - loki

  grafana:
    image: 'grafana/grafana:latest'
    container_name: grafana
    restart: always
    mem_limit: 2048m
    mem_reservation: 256m
    ports:
      - '3000:3000'
    volumes:
      - '$GRAFANA_HOME/config:/etc/grafana'
      - 'grafana-data:/var/lib/grafana'
      - '$GRAFANA_HOME/dashboards:/var/lib/grafana/dashboards'
    depends_on:
      - cadvisor
      - prometheus
      - loki
      - alertmanager
```
The rest of the code will deploy:
- a container called `hypervisor`, which is actually the Prometheus node-exporter for the hypervisor
- a container called `prom_snmp`, which will retrieve SNMP statistics
- a container called `blackbox_exporter`, which mainly checks web servers and their SSL certificates
- a container called `promtail`, which collects logs and log statistics from the hypervisor
- a container called `loki`, which stores and indexes the logs sent to it by `promtail` (either the container or an agent running on some external server)
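The `promtail` container mounts the Docker socket, so it can discover running containers and read their logs. A minimal, hedged `promtail` configuration sketch pushing to the `loki` container (file layout and labels are assumptions; the repository's config may differ):

```yaml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  # push logs to the loki container over the Compose network
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 15s
    relabel_configs:
      # label each log stream with its container name
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container
```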
Finally, the last container deployed is the `grafana` container. Besides its normal configuration file `grafana.ini`, the Docker container will also automatically provision datasources and dashboards (via the `provisioning` sub-directory in the `config` directory), so that no manual post-deployment tasks are required once the containers are running.
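Assuming the layout used here (an assumption based on the mounts above, not an exact listing of the repository), `$GRAFANA_HOME` would look roughly like this:

```
$GRAFANA_HOME/
├── config/
│   ├── grafana.ini
│   └── provisioning/
│       ├── datasources/
│       │   └── default.yaml
│       └── dashboards/
│           └── default.yaml
└── dashboards/
    └── (dashboard JSON files)
```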
The datasources can be preconfigured in a YAML file called `default.yaml`, stored in the `provisioning/datasources/` sub-directory.
```yaml
apiVersion: 1

datasources:
  - name: Alertmanager
    type: alertmanager
    access: proxy
    orgId: 1
    url: http://alertmanager:9093
    version: 1
    editable: false
    isDefault: false
    uid: DS_ALERTMANAGER
    jsonData:
      implementation: prometheus

  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    version: 1
    editable: false
    isDefault: true
    uid: DS_PROMETHEUS
    jsonData:
      alertmanagerUid: DS_ALERTMANAGER
      manageAlerts: true
      prometheusType: Prometheus
      prometheusVersion: 2.39.1

  - name: Loki
    type: loki
    access: proxy
    orgId: 1
    url: http://loki:3100
    version: 1
    editable: false
    isDefault: false
    uid: DS_LOKI
    jsonData:
      alertmanagerUid: DS_ALERTMANAGER
      manageAlerts: true
```
The same goes for the dashboards we want to have deployed automatically:
```yaml
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: 'Custom'
    folderUid: ''
    type: file
    options:
      path: /var/lib/grafana/dashboards
```
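Any dashboard JSON file dropped into `$GRAFANA_HOME/dashboards` on the hypervisor ends up in `/var/lib/grafana/dashboards` inside the container and is picked up by this provider. A minimal, hedged skeleton (title and uid are placeholders, not files from the repository):

```json
{
  "uid": "docmon-overview",
  "title": "Docmon Overview",
  "schemaVersion": 39,
  "version": 1,
  "panels": []
}
```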
Finally, if the Docker host has multiple network interfaces (for instance because it is a hosted server, or because it has both internal and external IP addresses), you might want to limit access to the containers to specific networks only.
Below is a `netfilter` example, which allows only traffic coming from 192.168.1.0/24 on the network interface enp35s0:
```shell
iptables -I DOCKER-USER -i enp35s0 ! -s 192.168.1.0/24 -m conntrack --ctdir ORIGINAL -j DROP
```
The chain `DOCKER-USER` is not flushed by Docker and can thus be created in a general firewall script or `netfilter` configuration, even at boot time:
```shell
-N DOCKER-USER
-I DOCKER-USER -i enp35s0 ! -s 192.168.1.0/24 -m conntrack --ctdir ORIGINAL -j DROP
```