Monitoring & Logging in Production

Observability means being able to tell what your system is doing at any given time from the signals it emits. It rests on three pillars: logs, metrics, and traces.

Structured Logging

Prefer JSON logs over plain text — they're machine-readable and easy to query.

// Instead of:
console.log('User logged in: ' + userId)

// Use structured logs:
console.log(JSON.stringify({ event: 'user.login', userId, timestamp: new Date().toISOString() }))
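To avoid repeating the JSON.stringify call at every call site, the same idea can be wrapped in a small helper. The following is only a minimal sketch: createLogger, the level and service fields, and the sample values are hypothetical names chosen for illustration, not part of any particular logging library.

```javascript
// Minimal structured logger sketch. `createLogger` and its field names
// (level, event, service) are illustrative assumptions, not a library API.
function createLogger(baseFields = {}, write = (line) => console.log(line)) {
  function log(level, event, fields = {}) {
    const entry = {
      timestamp: new Date().toISOString(),
      level,
      event,
      ...baseFields, // fields shared by every entry, e.g. the service name
      ...fields,     // fields specific to this event
    };
    // One JSON object per line keeps logs easy to parse and query.
    write(JSON.stringify(entry));
    return entry;
  }
  return {
    info: (event, fields) => log('info', event, fields),
    error: (event, fields) => log('error', event, fields),
  };
}

// Usage: every entry carries the shared `service` field automatically.
const logger = createLogger({ service: 'auth-api' }); // hypothetical service name
logger.info('user.login', { userId: 'u_123' });       // hypothetical user id
```

Passing the write function as a parameter is a design choice made here so the output destination can be swapped (for example in tests) without changing call sites.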

Key Metrics to Track

Metric	Why it matters
Request rate	Traffic volume
Error rate	Service health
Latency (p50/p95/p99)	User experience
CPU / Memory	Resource saturation
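
As a rough illustration of how the error rate and latency percentiles in the table might be derived from raw request records, here is a small sketch. It assumes each record has a status and a durationMs field (both hypothetical names), counts 5xx responses as errors, and uses the nearest-rank percentile method; real metrics systems such as Prometheus typically estimate percentiles from histograms instead.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank arithmetic exact for integer p.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize a batch of request records.
// Assumed record shape (hypothetical): { status: number, durationMs: number }
function summarize(requests) {
  const latencies = requests.map((r) => r.durationMs);
  const errors = requests.filter((r) => r.status >= 500).length;
  return {
    requestCount: requests.length,
    errorRate: requests.length === 0 ? 0 : errors / requests.length,
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
  };
}

// Usage with made-up sample data:
const summary = summarize([
  { status: 200, durationMs: 10 },
  { status: 200, durationMs: 20 },
  { status: 500, durationMs: 30 },
  { status: 200, durationMs: 40 },
]);
console.log(JSON.stringify(summary));
```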

Common Tooling

Prometheus + Grafana — metrics collection and dashboards
ELK Stack — Elasticsearch, Logstash, Kibana for log aggregation
Datadog / New Relic — managed observability platforms
Sentry — error tracking and alerting

Alerting Best Practices

Alert on symptoms (high error rate), not causes (CPU spike)
Set actionable alerts — every alert should require a human response
Use runbooks linked from alert descriptions
Avoid alert fatigue by tuning thresholds carefully
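
The practices above can be sketched as a small rule evaluator. This is an illustration under stated assumptions rather than how any specific alerting tool works: the rule shape, the evaluateAlert function, the 5% threshold, and the runbook URL are all hypothetical. The rule alerts on a symptom (error rate), fires only when the threshold is breached for several consecutive windows so that a brief spike does not page anyone, and carries a runbook link for the responder.

```javascript
// Evaluate a symptom-based alert rule against recent per-minute windows.
// Assumed window shape (hypothetical): { requests: number, errors: number }
function evaluateAlert(rule, windows) {
  const recent = windows.slice(-rule.forWindows);
  const rates = recent.map((w) => (w.requests === 0 ? 0 : w.errors / w.requests));
  // Fire only when every one of the last `forWindows` windows breaches the
  // threshold; a single short spike is ignored to reduce alert fatigue.
  const firing =
    recent.length === rule.forWindows && rates.every((r) => r > rule.threshold);
  return firing
    ? { firing: true, name: rule.name, runbook: rule.runbook, latestRate: rates[rates.length - 1] }
    : { firing: false, name: rule.name };
}

const highErrorRate = {
  name: 'HighErrorRate',
  threshold: 0.05, // assumed threshold: more than 5% of requests failing
  forWindows: 3,   // must hold for 3 consecutive windows
  runbook: 'https://example.com/runbooks/high-error-rate', // placeholder URL
};

// Usage with made-up data: a sustained breach fires, a one-off spike does not.
const sustained = [
  { requests: 100, errors: 10 },
  { requests: 100, errors: 8 },
  { requests: 100, errors: 6 },
];
console.log(evaluateAlert(highErrorRate, sustained).firing);
```

Tuning threshold and forWindows against historical data is where most of the fatigue reduction tends to come from; the values here are placeholders.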
