Cheat sheet
Author @Saief1999
1. Prometheus, Grafana
By convention, Prometheus metric names use snake_case.
Example: `http_requests_total` (a counter).
Metrics :
- Counters: values that only go up; we usually don't care about the raw count, but about its rate of change over time (e.g. requests per second)
- Gauges: a value at a given point in time (e.g. how much free memory we have)
- Histograms: observations divided into buckets (intervals), e.g. how many requests we answered in under 1 second, between 1 and 2 seconds, in more than 3 seconds, etc.
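A minimal sketch of the three metric types using the Node.js prom-client library (metric names and values here are illustrative):

```typescript
import { Counter, Gauge, Histogram } from "prom-client";

// Counter: only goes up; we later query its rate, not its raw value.
const httpRequestsTotal = new Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests received",
});
httpRequestsTotal.inc(); // called on each request

// Gauge: a value at a point in time (can go up and down).
const freeMemoryBytes = new Gauge({
  name: "free_memory_bytes",
  help: "Free memory on the host in bytes",
});
freeMemoryBytes.set(123456789);

// Histogram: observations distributed into buckets (here: seconds).
const requestDurationSeconds = new Histogram({
  name: "http_request_duration_seconds",
  help: "HTTP request duration in seconds",
  buckets: [0.5, 1, 2, 3],
});
requestDurationSeconds.observe(0.42);
```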
How to show metrics externally :
Approach 1 (push): our application exports its metrics at fixed intervals to a database that is compatible with Prometheus.
- Our application now depends on Prometheus
- It has to handle the errors that occur when it can't reach it
- Might be more suitable for a Lambda function
Approach 2 (pull): expose the metrics on an endpoint (like `/metrics`) and let Prometheus scrape it.
- Decouples the metrics database from the app
- The more commonly used approach
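A sketch of approach 2 with Express and prom-client, assuming the service listens on port 5000 as in the scrape config below:

```typescript
import express from "express";
import { register, collectDefaultMetrics } from "prom-client";

collectDefaultMetrics(); // default Node.js process metrics

const app = express();

// Prometheus scrapes this endpoint on its own schedule.
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", register.contentType);
  res.end(await register.metrics());
});

app.listen(5000);
```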
Labels: key/value pairs that distinguish time series sharing the same metric name (like a property on a metric).
We first create the metric and declare all its possible label names (each label can take multiple values); we then provide the label values when we increment it, for example.
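For example, with prom-client (the label names and values are illustrative):

```typescript
import { Counter } from "prom-client";

// Declare every label name up front...
const httpRequestsTotal = new Counter({
  name: "http_requests_total",
  help: "Total number of HTTP requests received",
  labelNames: ["method", "status"],
});

// ...then provide the label values when incrementing.
httpRequestsTotal.labels("GET", "200").inc();
httpRequestsTotal.inc({ method: "POST", status: "500" });
```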
In order to register our application in Prometheus :
Method 1 : Static configuration
we add this under `scrape_configs` in prometheus.yml:

```yaml
scrape_configs:
  - job_name: "fact"
    static_configs:
      - targets: ["localhost:5000"]
```
Method 2 : Dynamic configuration (using Service Discovery)
We can use PromQL to query the metrics
Graph tab: shows how a metric evolves over time.
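For example, to graph the per-second request rate over the last 5 minutes (using the counter from the example above):

```promql
rate(http_requests_total[5m])
```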
Alternative to Prometheus: Datadog (a push model: we send data to it, and it displays the different metrics).
Grafana: a dashboarding tool that uses Prometheus (among others) as a data source.
We first configure the data sources ( such as Prometheus )
There are two types of metrics:
- application metrics
- host metrics (the status of the machine, for example)
We can use prometheus-node-exporter (from the package manager) to serve system monitoring info on port 9100. We then add it as another job in Prometheus, and later add a dashboard for it in Grafana (we can import one made by the community).
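A sketch of the extra scrape job in prometheus.yml, assuming node-exporter runs on the same host:

```yaml
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
```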
2. Helm
In Kubernetes, we define the desired state declaratively, using YAML configuration files; Kubernetes then works to bring the current state to the desired state.
Helm helps us package multiple Kubernetes components together and makes them configurable, so full setups can be run easily and efficiently.
Helm is also a template engine :
- Input: values.yaml + templates
- Output: Kubernetes manifest files
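A minimal illustration of the templating, with a values.yaml entry and a template excerpt that consumes it (names are illustrative):

```yaml
# values.yaml
image:
  repository: myregistry/fact-service
  tag: "1.0.0"
```

```yaml
# templates/deployment.yaml (excerpt of the pod spec)
containers:
  - name: {{ .Chart.Name }}
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```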
- We create an Azure AKS cluster (with the Dev/Test preset and a container registry; the rest is left at defaults)
- Run `az login`
- Run `az aks get-credentials --resource-group RESOURCE_GROUP_NAME --name CLUSTER_NAME`
- Running `kubectl get nodes` now lists the cluster's nodes.
Note: we can also use kind, which is similar to minikube (a local cluster).
- We add the chart repository via:
helm repo add <repo_name> <repo_url>
helm search repo prometheus-community
To install the Prometheus chart and name the release prometheus, we run the following (it pulls the chart definition, uses the default values, renders the manifests, and applies them):
helm install prometheus prometheus-community/prometheus
This is roughly equivalent to doing:
helm template ... | kubectl apply -f -
To see all applications currently installed through helm:
helm list
To only render the templates without applying them:
helm template prometheus prometheus-community/prometheus
To use a specific values file when templating, we add this to the previous command:
-f values.yaml
To uninstall a release:
helm uninstall prometheus
For a directory to be a Helm chart, it must contain a Chart.yaml file.
After creating the Helm chart (and the deployment template inside it), we can run:
helm install "fact-service" .
In Kubernetes we have an overlay network: even if two pods are far from each other (on two different nodes), they share the same flat network and can communicate directly.
NodePort: exposes a service on a port of each node.
LoadBalancer: an IP/port at the cloud-provider level, not at the node level.
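A sketch of a NodePort Service for the fact service (the ports and selector are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: fact-service
spec:
  type: NodePort          # use type: LoadBalancer for a cloud-provider LB instead
  selector:
    app: fact-service
  ports:
    - port: 80            # service port inside the cluster
      targetPort: 5000    # container port
      nodePort: 30080     # port opened on every node (30000-32767 range)
```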
To forward the Prometheus server's port 9090 to port 9090 on our machine, we do:
kubectl port-forward pod/prometheus-server-6cfc854b6f-44s4v 9090:9090
Before, we registered our targets statically (by giving Prometheus a fixed address); this doesn't scale well.
Now we'll configure our services to be discovered dynamically, so that Prometheus scrapes their metrics automatically.
Inversion of control: instead of Prometheus deciding which services to scrape, we give each application the control to enable/disable scraping for itself.
Kubernetes has an API server: the gateway to the cluster.
We will give Prometheus access to the API server to list pods that carry a certain label/annotation; if they enable scraping, Prometheus will scrape them.
-> In this case, the API server plays the role of a service registry from which Prometheus gets the list of pods.
For Prometheus to be allowed to list pods, we create a ServiceAccount (with the matching RBAC permissions).
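With the prometheus-community chart's default Kubernetes service discovery, a pod typically opts in through annotations like these (the port and path are assumptions for our service):

```yaml
# excerpt of the Deployment's pod template
template:
  metadata:
    annotations:
      prometheus.io/scrape: "true"   # opt in to scraping
      prometheus.io/port: "5000"     # port to scrape
      prometheus.io/path: "/metrics" # metrics endpoint
```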
3. Watchdog 1
Log levels :
- Debug
- Info
- Warn
- Error
- Critical
requestLogger = rootLogger.with("request_id", "1234")
-> requestLogger inherits from the rootLogger and adds the request_id to each and every log message it emits.
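A sketch of the same idea with pino in Node.js (one possible logging library; the field value is illustrative):

```typescript
import pino from "pino";

const rootLogger = pino();

// Child logger: every message it emits carries request_id automatically.
const requestLogger = rootLogger.child({ request_id: "1234" });

requestLogger.info("handling request"); // logged with request_id=1234
```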
Agent: reads the logs (the app's output) and sends them to Datadog/Loki for visualization.
In Kubernetes: nodes, and within each node multiple pods.
For each node, we need one global agent to collect the logs/traces (one Datadog agent, for example).
DaemonSet: exactly one copy of the pod on each node.
- A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
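A minimal DaemonSet sketch (the agent image is a placeholder; in practice the Datadog Helm chart creates this for us):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
        - name: agent
          image: example/log-agent:latest # placeholder image
```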
In Datadog, we generate an API key that the agents will use to authenticate and send logs to Datadog.
we add the Datadog helm chart by running :
helm repo add datadog https://helm.datadoghq.com
helm repo update
To install the chart, we run (see https://github.com/DataDog/helm-charts/tree/main/charts/datadog):
helm install <RELEASE_NAME> \
--set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog
4. Watchdog 2
When using Datadog, we need to enable each feature/service we want (for example logs, by adding the `logs` block to the values file).
This time, we will add more configuration before applying the chart
In datadog-values.yml:

```yaml
datadog:
  apiKey: MYKEY
  kubelet:
    tlsVerify: false # needed when working with Azure
  logs:
    enabled: true # enable log collection
    containerCollectAll: true # collect logs from all containers
```
helm install datadog -f datadog-values.yml datadog/datadog
After that, we add a simple `console.log`, push the new image to Docker Hub, then redeploy it in Kubernetes:
kubectl port-forward factservice-deployment-6c7bb95d7b-92qnl 6000:5000
We send a few requests through the forwarded port to produce logs, and then we check them in Datadog.
Datadog: a log entry has two parts, tags and attributes.
- Tags: come from the application itself (e.g. the log content: Message: blabla, requestId: 1234)
- Attributes: added to pinpoint where the log came from:
- podName
- serviceName
Datadog query language: key:value
Next task -> we add the requestId to all the logs.
Other open-source tools for logging: Logstash (from the ELK stack), Loki (from Grafana). -> Problem with open-source tools: we have to manage the storage ourselves.
Layer 0: Provisioning / infra (Terraform)
Layer 1: Configuration (Ansible)
Layer 2: Bootstrap environment (the different tools we might need: Datadog, etc.)
Layer 3: The application
5. Tracing
For tracing, the app should send the traces to the tracing tool, not the other way around.
Each request has its own trace, and traces can get quite large; that's why we don't store them on the machine but send them directly to the tracing service.
We add the following, enabling the OTLP receiver over gRPC:

```yaml
otlp:
  receiver:
    protocols:
      grpc:
        enabled: true
```
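On the application side, a sketch of pushing traces over OTLP/gRPC with the OpenTelemetry Node SDK (the collector address is an assumption):

```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-grpc";

// The app pushes its traces to the collector; nothing is stored locally.
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: "http://otel-collector:4317", // assumed collector address (OTLP gRPC port)
  }),
});

sdk.start();
```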