KCNA: Cloud Native Observability
Observability answers the question: what is my system doing, and why? It rests on three pillars: metrics, logs, and traces.
The Three Pillars
1. Metrics
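Metrics are numeric time-series data, such as request counts, latencies, and CPU usage. Prometheus is the de facto standard in cloud native; it scrapes metrics over HTTP from the targets it monitors.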
# ServiceMonitor (Prometheus Operator)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-monitor
spec:
  selector:
    matchLabels:
      app: web-app        # selects Services carrying this label
  endpoints:
    - port: metrics       # name of the Service port to scrape
      interval: 15s       # scrape every 15 seconds
Grafana visualises Prometheus metrics with dashboards and alerting.
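What Prometheus scrapes from the `metrics` port is plain text in the Prometheus exposition format. The following is a minimal, stdlib-only sketch of a counter rendered in that format. The `Counter` class here is a hypothetical illustration, not the official client library, and it skips details such as label-value escaping.

```python
"""Sketch: render a counter in the Prometheus text exposition format."""
from collections import defaultdict


class Counter:
    """A monotonically increasing metric, keyed by its label values."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        # Maps a sorted tuple of (label, value) pairs to the running total.
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        if amount < 0:
            raise ValueError("counters can only go up")
        self._values[tuple(sorted(labels.items()))] += amount

    def render(self):
        """Return the HELP/TYPE header plus one line per label combination."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for labels, value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            series = f"{self.name}{{{label_str}}}" if label_str else self.name
            lines.append(f"{series} {value:g}")
        return "\n".join(lines) + "\n"


requests_total = Counter("http_requests_total", "Total HTTP requests.")
requests_total.inc(method="GET", status="200")
requests_total.inc(method="GET", status="500")
print(requests_total.render(), end="")
```

Each distinct label combination becomes its own time series, which is why selectors such as `status=~"5.."` can pick out only the error responses.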
2. Logs
Structured event records. Cloud native log pipeline:
App → stdout/stderr → Container runtime → Node log file
→ Log aggregator (Fluentd/Loki)
→ Storage + Query
Loki + Grafana: lightweight log aggregation with label-based indexing (no full-text index).
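The first hop of the pipeline above (App → stdout) can be sketched as follows: the application writes one JSON object per line to stdout and leaves collection to the platform. This is an illustrative sketch using only the standard library; `JsonFormatter`, `make_logger`, and the `fields` convention for extra data are assumptions made for this example.

```python
"""Sketch: structured (JSON) logging to stdout for a containerised app."""
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Hypothetical convention: extra={"fields": {...}} is merged into the entry.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def make_logger(name, stream=None):
    """Return a logger that writes JSON lines to `stream` (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output through the root logger
    return logger


log = make_logger("web-app")
log.info("request served", extra={"fields": {"status": 200, "path": "/"}})
```

Low-cardinality fields such as `level` are reasonable candidates for labels in a label-indexed store like Loki, while high-cardinality values are usually better left in the log line itself.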
3. Traces
Distributed tracing captures a request's journey across services.
OpenTelemetry (OTel) is the CNCF standard for instrumentation. It produces traces, metrics, and logs through a unified API/SDK.
Backends: Jaeger, Zipkin, Tempo.
Kubernetes Built-in Observability
kubectl top nodes # CPU/memory usage (needs metrics-server)
kubectl top pods
kubectl describe pod <name> # events + status
kubectl logs <pod> --previous # logs from previous container instance
kubectl get events --sort-by=.lastTimestamp
Health Probes
Kubernetes uses probes to determine container health:
| Probe | Failure action |
|---|---|
| livenessProbe | Restart the container |
| readinessProbe | Remove the Pod from Service endpoints |
| startupProbe | Restart the container; liveness and readiness probes stay disabled until it succeeds |
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
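On the application side, the probe configuration above needs matching HTTP endpoints. The following is a minimal sketch of what the container could serve, using only the standard library. The paths and port mirror the probe config; `READY`, `ProbeHandler`, and `make_server` are hypothetical names for this example. An `httpGet` probe counts any status from 200 to 399 as success.

```python
"""Sketch: HTTP endpoints backing a liveness and a readiness probe."""
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once the app can take traffic (for example after caches are warm).
READY = threading.Event()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: the process is up and able to answer requests.
            self._reply(200, "ok")
        elif self.path == "/ready":
            # Readiness: 503 keeps the Pod out of Service endpoints.
            if READY.is_set():
                self._reply(200, "ready")
            else:
                self._reply(503, "not ready")
        else:
            self._reply(404, "not found")

    def _reply(self, status, body):
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep periodic probe traffic out of the application logs


def make_server(host="0.0.0.0", port=8080):
    """Build the probe server; call .serve_forever() on it to start serving."""
    return ThreadingHTTPServer((host, port), ProbeHandler)
```

Keeping the two endpoints separate matters: a slow dependency should flip `/ready` to 503 so traffic drains away, without failing `/healthz` and triggering a restart.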
Alerting
Prometheus evaluates alerting rules and sends firing alerts to Alertmanager, which groups, deduplicates, and routes them to Slack, PagerDuty, email, etc.
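# PrometheusRule (one entry under spec.groups[].rules)
- alert: HighErrorRate
  # True for any series serving more than 0.05 5xx responses per second
  expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
  for: 5m              # condition must hold for 5 minutes before firing
  labels:
    severity: critical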
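The `for: 5m` clause means the alert does not fire the first time the expression is true: it stays pending until the condition has held continuously for that long. The sketch below models that behaviour in plain Python. It is a simplification for illustration only: `per_second_rate` ignores counter resets and the extrapolation that the real `rate()` function performs, and both function names are hypothetical.

```python
"""Sketch: how a threshold plus a `for` duration turns samples into alert states."""


def per_second_rate(first, last):
    """Average per-second increase between two (timestamp, counter_value) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def alert_states(samples, threshold, for_seconds):
    """Return the alert state at each rule evaluation.

    samples: list of (timestamp_seconds, expression_value), one per evaluation.
    """
    states = []
    active_since = None  # when the condition most recently became true
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            held_long_enough = ts - active_since >= for_seconds
            states.append("firing" if held_long_enough else "pending")
        else:
            active_since = None  # any dip below the threshold resets the timer
            states.append("inactive")
    return states


# Evaluations every 60 s; the error rate dips once, which resets the timer.
samples = [(0, 0.10), (60, 0.10), (120, 0.01), (180, 0.20), (240, 0.20),
           (300, 0.20), (360, 0.20), (420, 0.20), (480, 0.20)]
print(alert_states(samples, threshold=0.05, for_seconds=300))
```

In this run the alert only reaches `firing` at the last evaluation, 300 seconds after the condition became true again at t=180. This is the noise-damping effect that `for` is meant to provide.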
Summary
- Metrics (Prometheus), Logs (Loki/ELK), Traces (OTel + Jaeger) are the three pillars
- OpenTelemetry is the CNCF standard for instrumentation
- Health probes let Kubernetes react automatically to unhealthy containers
- Grafana provides unified visualisation across all three pillars