5.1 Application Monitoring
Collecting Application Metrics
When running applications in production, a fast feedback loop is a key factor. Gathering and combining all sorts of metrics is essential for the following reasons:
- to make sure that an application runs smoothly
- to be able to see production issues and send alerts
- to debug an application
- to make business and architectural decisions
- to decide how to scale applications
Application metrics provide insights into what is happening inside our Quarkus applications using the MicroProfile Metrics specification.
Those metrics (e.g. the request count on a specific URL) are collected within the application and can then be processed with tools like Prometheus for further analysis and visualization.
Prometheus is a monitoring system and time series database which integrates well with all sorts of applications and platforms.
The basic principle behind Prometheus is to collect metrics using a polling mechanism. There are many different so-called exporters from which metrics can be collected.
In our case, the metrics will be collected from a specific path provided by the application (/metrics).
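If you are curious what such metrics look like, you can fetch the endpoint yourself. This is an optional sanity check, not part of the lab tasks; the service name data-producer and port 8080 are assumptions based on the resources used later in this lab.
# Forward the application port to your local machine (assumption: Service data-producer listening on 8080)
oc port-forward svc/data-producer 8080:8080 &
# Fetch the first lines of the metrics in the Prometheus text format
curl -s http://localhost:8080/metrics | head -n 20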
Architecture
On our lab cluster, a Prometheus / Grafana stack is already deployed. Using the service discovery capability of the Prometheus - Kubernetes integration, the running Prometheus server is able to locate our application almost out of the box.
- Prometheus is running in the namespace pitc-infra-monitoring
- Prometheus must be able to collect metrics from the running application by sending GET requests (Network Policy)
- Prometheus must know where to collect the metrics from
Annotation vs. Service Monitor
In an early stage of the Prometheus - Kubernetes integration, the configuration was done with annotations. The integration worked by reading specific annotations configured on Kubernetes resources. The information from those annotations helped the Prometheus server find the endpoints to collect metrics from.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/scheme: http
    prometheus.io/port: "8080"
The current OpenShift - Prometheus integration works differently and is much more flexible. It is based on the ServiceMonitor CustomResource.
oc explain ServiceMonitor
Task 5.1.1: Check project setup
We first check that the project is ready for the lab.
Ensure that the LAB_USER environment variable is set.
echo $LAB_USER
If the result is empty, set the LAB_USER environment variable.
command hint
export LAB_USER=<username>
Change to your main Project.
command hint
oc project $LAB_USER
Don’t forget to deploy/update your resources via git instead of the oc command for this lab.
Task 5.1.2: Create Service Monitor
Let’s now create our first ServiceMonitor.
Create the following ServiceMonitor resource as local file <workspace>/servicemonitor.yaml.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: amm-techlab
  name: amm-techlab-monitor
spec:
  endpoints:
  - interval: 30s
    port: http
    scheme: http
    path: /metrics
  selector:
    matchLabels:
      application: amm-techlab
Let ArgoCD create the ServiceMonitor by adding the file to git, committing and pushing it.
command hint
git add servicemonitor.yaml && git commit -m "Add ServiceMonitor Manifest" && git push
In a hurry and do not want to wait for ArgoCD to sync? Do it manually by applying the file.
command hint
oc apply -f servicemonitor.yaml
Expected result: servicemonitor.monitoring.coreos.com/amm-techlab-monitor created
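Once ArgoCD has synced (or after applying the file manually), you can optionally verify that the resource exists in your namespace before moving on to the Grafana-based verification in the next task:
# Display the created ServiceMonitor resource
oc get servicemonitor amm-techlab-monitor -o yaml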
Warning
Your current user must have the following rights in the current namespace:
oc policy add-role-to-user monitoring-edit <user> -n <username>
Tell your trainer if you get a permission error while creating the ServiceMonitor.
Task 5.1.3: Verify whether the Prometheus targets get scraped or not
Prometheus is integrated into the OpenShift console under the menu item Monitoring. But as part of this lab, we want to use Grafana to interact with Prometheus. Open Grafana (https://grafana.techlab.openshift.ch/) and switch to the Explore tab, then execute the following query to check whether your target is configured:
Note
Make sure to replace <username> with your current namespace.
prometheus_sd_discovered_targets{config="serviceMonitor/<username>/amm-techlab-monitor/0"}
Expected result at the bottom of the graph: two targets (consumer and provider) similar to:
prometheus_sd_discovered_targets{cluster="console.techlab.openshift.ch", config="serviceMonitor/<username>/amm-techlab-monitor/0", container="kube-rbac-proxy", endpoint="metrics", instance="10.128.2.18:9091", job="prometheus-user-workload", name="scrape", namespace="openshift-user-workload-monitoring", pod="prometheus-user-workload-1", prometheus="openshift-monitoring/k8s", service="prometheus-user-workload"}
prometheus_sd_discovered_targets{cluster="console.techlab.openshift.ch", config="serviceMonitor/<username>/amm-techlab-monitor/0", container="kube-rbac-proxy", endpoint="metrics", instance="10.131.0.33:9091", job="prometheus-user-workload", name="scrape", namespace="openshift-user-workload-monitoring", pod="prometheus-user-workload-0", prometheus="openshift-monitoring/k8s", service="prometheus-user-workload"}
Task 5.1.4: How does it work
The Prometheus Operator “scans” namespaces for ServiceMonitor CustomResources. It then updates the ServiceDiscovery configuration accordingly.
In our case, the selector part in the ServiceMonitor defines which services will be auto-discovered.
# servicemonitor.yaml
...
  selector:
    matchLabels:
      application: amm-techlab
...
And the corresponding Service
apiVersion: v1
kind: Service
metadata:
  name: data-producer
  labels:
    application: amm-techlab
...
This means Prometheus scrapes all endpoints where the application: amm-techlab label is set.
The spec section in the ServiceMonitor resource now allows us to further configure the targets Prometheus will scrape.
In our case, Prometheus will:
- scrape every 30 seconds
- look for a port with the name http (this must match the port name in the Service resource)
- scrape the path /metrics using http
This means: since all three Services data-producer, data-consumer and data-transformer have the matching label application: amm-techlab, a port with the name http is configured, and the matching pods provide metrics on http://[Pod]/metrics, Prometheus will scrape data from these pods.
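If you want to double-check this wiring, you can list the Services the ServiceMonitor selects together with their port names. This is an optional check; the label and port name simply reflect the values used in this lab.
# List the Services carrying the label the ServiceMonitor selects on,
# together with their port names (only ports named "http" are scraped)
oc get svc -l application=amm-techlab \
  -o custom-columns=NAME:.metadata.name,PORTS:.spec.ports[*].name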
Task 5.1.5: Query Application Metrics
Since the metrics are now collected from all three services, let’s execute a query and visualize the data. For example, the total number of produced, consumed and transformed messages.
Note
Make sure to replace <username> with your current namespace.
sum(application_ch_puzzle_quarkustechlab_reactiveproducer_boundary_ReactiveDataProducer_producedMessages_total{namespace="<username>"})
Then click Add Query and enter the transformed messages query.
sum(application_ch_puzzle_quarkustechlab_reactivetransformer_boundary_ReactiveDataTransformer_messagesTransformed_total{namespace="<username>"})
Add another query with Add Query and enter the consumed messages query.
sum(application_ch_puzzle_quarkustechlab_reactiveconsumer_boundary_ReactiveDataConsumer_consumedMessages_total{namespace="<username>"})
Finally click Run Query to execute the queries.
Note
You can ignore the warning about the rate/sum function after you execute the query. For our small example it draws a nicer graph.
Solution
The needed resource files are available inside the folder manifests/05.0/5.1/ of the techlab GitHub repository.
If you weren’t successful, you can update your project with the solution by cloning the techlab repository: git clone https://github.com/puzzle/amm-techlab.git. Make sure to add the new file to your git repository; otherwise, ArgoCD will delete the resources again.
- go to your workspace:
cd ~/amm-workspace
- copy the solution:
cp <path-to-the-amm-techlab-repo>/manifests/05.0/5.1/* .
- let ArgoCD do its work:
git add servicemonitor.yaml && git commit -m "Add ServiceMonitor Manifest" && git push