KCNA
Kubernetes and Cloud Native Associate

Cloud Native Observability

Observability answers: what is my system doing and why? It rests on three pillars.

The Three Pillars

1. Metrics

Numeric time-series data. Prometheus is the de facto standard for metrics in cloud native environments; it scrapes (pulls) metrics from targets over HTTP.

# ServiceMonitor (Prometheus Operator)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-monitor
spec:
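  selector:
    matchLabels:
      app: web-app        # selects Services (not Pods) carrying this label
  endpoints:
  - port: metrics         # name of a port defined on the Service
    interval: 15s         # scrape every 15 seconds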

Grafana visualises Prometheus metrics with dashboards and alerting.
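
The ServiceMonitor above only finds a target if a Service with a matching label and a port named metrics exists. A minimal sketch of such a Service follows; the Service name, the Pod label, and port 9090 are assumptions made for illustration, not values from the original example.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app            # hypothetical name
  labels:
    app: web-app           # matched by the ServiceMonitor's selector.matchLabels
spec:
  selector:
    app: web-app           # assumed label on the application Pods
  ports:
  - name: metrics          # referenced by the ServiceMonitor's endpoints[].port
    port: 9090             # assumed port where the app serves /metrics
    targetPort: 9090
```

By default a ServiceMonitor looks for Services in its own namespace, so this sketch assumes both objects live in the same one.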

2. Logs

Timestamped event records, ideally structured (for example, JSON). A typical cloud native log pipeline:

App → stdout/stderr → Container runtime → Node log file
                                         → Log aggregator (Fluentd/Loki)
                                                          → Storage + Query

Loki + Grafana: lightweight log aggregation with label-based indexing (no full-text index).
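
As an illustration of the collector stage in the pipeline above, here is a rough sketch of a Fluent Bit configuration (Fluent Bit is a lightweight sibling of Fluentd, swapped in here because it supports a YAML config format). It tails the node's container log files and forwards them to Loki. The Loki hostname, port, and label values are assumptions, and a real deployment would normally add a Kubernetes metadata filter.

```yaml
pipeline:
  inputs:
    - name: tail
      path: /var/log/containers/*.log   # node log files written by the container runtime
      tag: kube.*
      multiline.parser: docker, cri     # handle both common runtime log formats
  outputs:
    - name: loki
      match: 'kube.*'
      host: loki.logging.svc            # assumed in-cluster Loki address
      port: 3100                        # assumed Loki HTTP port
      labels: job=fluent-bit            # label used for Loki's label-based index
```

Because Loki indexes only labels, keeping the label set small (job, namespace, app) is usually what keeps it lightweight.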

3. Traces

Distributed tracing captures a request's journey across services.

OpenTelemetry (OTel) is the CNCF standard for instrumentation. It produces traces, metrics, and logs through a unified API and SDK.

Backends: Jaeger, Zipkin, Tempo.
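
To show how instrumented applications and a tracing backend fit together, here is a minimal sketch of an OpenTelemetry Collector configuration. It receives OTLP traces from applications and forwards them to a backend that accepts OTLP, such as Jaeger. The backend address and the plaintext (insecure) connection are assumptions for a lab setup.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317        # applications send OTLP over gRPC here

processors:
  batch: {}                           # batch spans before export

exporters:
  otlp:
    endpoint: jaeger-collector:4317   # assumed address of an OTLP-capable backend
    tls:
      insecure: true                  # assumption: no TLS inside a test cluster

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Swapping the backend (for example, to Tempo) would then only mean changing the exporter endpoint, not the application instrumentation.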

Kubernetes Built-in Observability

kubectl top nodes                  # CPU/memory usage (needs metrics-server)
kubectl top pods
kubectl describe pod <name>        # events + status
kubectl logs <pod> --previous      # logs from previous container instance
kubectl get events --sort-by=.lastTimestamp

Health Probes

Kubernetes uses probes to determine container health:

Probe            Failure action
livenessProbe    Restart the container
readinessProbe   Remove the Pod from Service endpoints
startupProbe     Restart the container; liveness and readiness probes are held off until it succeeds

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
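
The example above does not include a startupProbe. A minimal sketch for a slow-starting container might look like the following; the threshold values are illustrative assumptions, and the endpoint reuses /healthz from the liveness probe.

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10       # check every 10 seconds
  failureThreshold: 30    # allow up to 30 failures: 30 x 10s = 300s to start
```

With these values the application gets up to five minutes to start. Liveness and readiness probes only begin once the startup probe has succeeded, and the container is restarted if it never does.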

Alerting

Prometheus Alertmanager deduplicates, groups, and routes alerts to Slack, PagerDuty, email, and other receivers.

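# PrometheusRule: one entry under spec.groups[].rules
# Fires when a series' 5xx rate stays above 0.05 requests/second for 5 minutes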
- alert: HighErrorRate
  expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
  for: 5m
  labels:
    severity: critical
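
Once a rule like this fires, Alertmanager decides where the alert goes. Here is a minimal routing sketch, assuming two hypothetical receivers (team-slack and team-pager); the webhook URL and routing key are placeholders to replace with real values.

```yaml
route:
  receiver: team-slack              # default receiver for all alerts
  group_by: [alertname]             # bundle alerts with the same name into one notification
  routes:
  - matchers:
    - severity="critical"           # matches the label set by the rule above
    receiver: team-pager

receivers:
- name: team-slack
  slack_configs:
  - api_url: https://hooks.slack.com/services/EXAMPLE   # placeholder webhook URL
    channel: '#alerts'
- name: team-pager
  pagerduty_configs:
  - routing_key: REPLACE_WITH_INTEGRATION_KEY           # placeholder
```

Routing on the severity label keeps paging limited to critical alerts, while everything else goes to chat.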

Summary

  • Metrics (Prometheus), Logs (Loki/ELK), Traces (OTel + Jaeger) are the three pillars
  • OpenTelemetry is the CNCF standard for instrumentation
  • Health probes let Kubernetes react automatically to unhealthy containers
  • Grafana provides unified visualisation across all three pillars