5.1 Application Monitoring

Application Monitoring with Prometheus and Grafana

Collecting Application Metrics

When running applications in production, a fast feedback loop is a key factor. The following reasons show why it is essential to gather and combine all sorts of metrics:

  • to make sure that an application runs smoothly
  • to be able to see production issues and send alerts
  • to debug an application
  • to make business and architectural decisions
  • to decide how to scale applications

Application metrics provide insights into what is happening inside our Quarkus applications; they are exposed using the MicroProfile Metrics specification.

Those metrics (e.g. the request count on a specific URL) are collected within the application and can then be processed with tools like Prometheus for further analysis and visualization.

Prometheus is a monitoring system and time series database which integrates well with all sorts of applications and platforms.

The basic principle behind Prometheus is to collect metrics using a polling mechanism. There are many different so-called exporters from which metrics can be collected.

In our case, the metrics will be collected from a specific path provided by the application (/metrics).
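If you want to look at the raw metrics output yourself, you can port-forward to one of the application workloads and curl the endpoint. This is just a quick sanity check; the Deployment name data-producer and port 8080 are assumptions based on this techlab setup, adjust them if your Deployment differs.

# forward the application port of the data-producer to your local machine
oc port-forward deployment/data-producer 8080:8080

# in a second terminal: fetch the metrics in Prometheus text format
curl http://localhost:8080/metrics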

Architecture

On our lab cluster, a Prometheus / Grafana stack is already deployed. Using the service discovery capability of the Prometheus-Kubernetes integration, the running Prometheus server will be able to locate our application almost out of the box.

  • Prometheus is running in the namespace pitc-infra-monitoring
  • Prometheus must be able to collect metrics from the running application by sending GET requests (Network Policy; see the sketch after this list)
  • Prometheus must know where to collect the metrics from
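The following NetworkPolicy is a minimal sketch of what such an ingress rule could look like. Treat it as illustration only: on our lab cluster this is typically already taken care of, and the namespaceSelector label as well as the port are assumptions that depend on how the cluster and the application are set up.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-monitoring
spec:
  # only select the pods that actually expose metrics
  podSelector:
    matchLabels:
      application: amm-techlab
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          # assumption: the monitoring namespace carries this standard label
          kubernetes.io/metadata.name: pitc-infra-monitoring
    ports:
    - protocol: TCP
      port: 8080
  policyTypes:
  - Ingress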

Annotation vs. Service Monitor

In an early stage of the Prometheus-Kubernetes integration, the configuration was done with annotations. The integration worked by reading specifically configured annotations from Kubernetes resources. The information from those annotations helped the Prometheus server to find the endpoints to collect metrics from.

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/scheme: http
    prometheus.io/port: "8080"

The current OpenShift-Prometheus integration works differently and is much more flexible. It is based on the ServiceMonitor CustomResource.

oc explain ServiceMonitor

Task 5.1.1: Check project setup

We first check that the project is ready for the lab.

Ensure that the LAB_USER environment variable is set.

echo $LAB_USER

If the result is empty, set the LAB_USER environment variable.

command hint
export LAB_USER=<username>

Change to your main Project.

command hint
oc project $LAB_USER

Don’t forget to deploy/update your resources with git instead of the oc command for this lab.

Task 5.1.2: Create Service Monitor

Let’s now create our first ServiceMonitor.

Create the following ServiceMonitor resource as local file <workspace>/servicemonitor.yaml.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: amm-techlab
  name: amm-techlab-monitor
spec:
  endpoints:
  - interval: 30s
    port: http
    scheme: http
    path: /metrics
  selector:
    matchLabels:
      application: amm-techlab


Let ArgoCD create the ServiceMonitor by adding the file to git and push it.

command hint
git add servicemonitor.yaml && git commit -m "Add ServiceMonitor Manifest" && git push

In a hurry and do not want to wait for ArgoCD to sync? Apply the file manually.

command hint
oc apply -f servicemonitor.yaml

Expected result: servicemonitor.monitoring.coreos.com/amm-techlab-monitor created
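If you want to double-check that the resource exists in your namespace, you can list the ServiceMonitors with oc:

oc get servicemonitor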

Task 5.1.3: Verify whether the Prometheus targets get scraped

Prometheus is integrated into the OpenShift console under the menu item Monitoring. But as part of this lab, we want to use Grafana to interact with Prometheus. Open Grafana (https://grafana.techlab.openshift.ch/) and switch to the Explore tab, then execute the following query to check whether your target is configured:

prometheus_sd_discovered_targets{config="serviceMonitor/<username>/amm-techlab-monitor/0"}

Expected result at the bottom of the graph: two targets (consumer and producer) similar to:

prometheus_sd_discovered_targets{cluster="console.techlab.openshift.ch", config="serviceMonitor/<username>/amm-techlab-monitor/0", container="kube-rbac-proxy", endpoint="metrics", instance="10.128.2.18:9091", job="prometheus-user-workload", name="scrape", namespace="openshift-user-workload-monitoring", pod="prometheus-user-workload-1", prometheus="openshift-monitoring/k8s", service="prometheus-user-workload"}
prometheus_sd_discovered_targets{cluster="console.techlab.openshift.ch", config="serviceMonitor/<username>/amm-techlab-monitor/0", container="kube-rbac-proxy", endpoint="metrics", instance="10.131.0.33:9091", job="prometheus-user-workload", name="scrape", namespace="openshift-user-workload-monitoring", pod="prometheus-user-workload-0", prometheus="openshift-monitoring/k8s", service="prometheus-user-workload"}

Task 5.1.4: How does it work

The Prometheus Operator “scans” namespaces for ServiceMonitor CustomResources. It then updates the ServiceDiscovery configuration accordingly.

In our case, the selector part of the ServiceMonitor defines which Services will be discovered automatically.

# servicemonitor.yaml
...
  selector:
    matchLabels:
      application: amm-techlab
...

And the corresponding Service

apiVersion: v1
kind: Service
metadata:
  name: data-producer
  labels:
    application: amm-techlab
...

This means Prometheus scrapes all Endpoints where the application: amm-techlab label is set.
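To see which Services currently carry this label, and will therefore be picked up by the ServiceMonitor, you can use a label selector with oc:

oc get svc -l application=amm-techlab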

The spec section in the ServiceMonitor resource now allows us to further configure the targets Prometheus will scrape. In our case, Prometheus will:

  • scrape every 30 seconds
  • look for a port with the name http (this must match the port name in the Service resource)
  • scrape the path /metrics using http

In other words: since all three Services data-producer, data-consumer and data-transformer carry the matching label application: amm-techlab, have a port with the name http configured, and their pods provide metrics on http://[Pod]/metrics, Prometheus will scrape data from these pods.
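For the port matching to work, the Service needs to define a named port. A minimal sketch of how the relevant part of such a Service could look (the port numbers are illustrative and depend on the actual Deployment):

apiVersion: v1
kind: Service
metadata:
  name: data-producer
  labels:
    application: amm-techlab
spec:
  ports:
  # the port name must match the 'port: http' entry in the ServiceMonitor
  - name: http
    port: 8080
    targetPort: 8080
...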

Task 5.1.5: Query Application Metrics

Since the metrics are now collected from all three services, let’s execute a query and visualize the data: for example, the total number of produced, consumed and transformed messages. Start with the produced messages query:

sum(application_ch_puzzle_quarkustechlab_reactiveproducer_boundary_ReactiveDataProducer_producedMessages_total{namespace="<username>"})

Then click Add Query and enter the transformed messages query.

sum(application_ch_puzzle_quarkustechlab_reactivetransformer_boundary_ReactiveDataTransformer_messagesTransformed_total{namespace="<username>"})

Add another query with Add Query and enter the consumed messages query.

sum(application_ch_puzzle_quarkustechlab_reactiveconsumer_boundary_ReactiveDataConsumer_consumedMessages_total{namespace="<username>"})

Finally click Run Query to execute the queries.
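The totals above only ever grow. If you are interested in the current throughput instead, the rate function gives you the per-second increase; as an example (the 5 minute window is an arbitrary choice), the producer throughput could be queried like this:

sum(rate(application_ch_puzzle_quarkustechlab_reactiveproducer_boundary_ReactiveDataProducer_producedMessages_total{namespace="<username>"}[5m]))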

Solution

The needed resource files are available in the folder manifests/05.0/5.1/ of the techlab GitHub repository.

If you weren’t successful, you can update your project with the solution by cloning the techlab repository: git clone https://github.com/puzzle/amm-techlab.git. Make sure to add the new file to your Git repository, otherwise ArgoCD will delete the resources again.

  • go to your workspace: cd ~/amm-workspace
  • copy the solution: cp <path-to-the-amm-techlab-repo>/manifests/05.0/5.1/* .
  • let ArgoCD do its work: git add servicemonitor.yaml && git commit -m "Add ServiceMonitor Manifest" && git push