Monitoring & Logging in Production
Observability is the ability to understand what your system is doing, and why, from the signals it emits. It is commonly described as having three pillars: logs, metrics, and traces.
Structured Logging
Prefer JSON logs over plain text — they're machine-readable and easy to query.
```ts
const userId = 'u_123' // example value so the snippet runs on its own

// Instead of:
console.log('User logged in: ' + userId)

// Use structured logs:
console.log(JSON.stringify({ event: 'user.login', userId, timestamp: new Date().toISOString() }))
```
Key Metrics to Track
| Metric | Why it matters |
|---|---|
| Request rate | Traffic volume |
| Error rate | Service health |
| Latency (p50/p95/p99) | User experience |
| CPU / Memory | Resource saturation |
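
As a rough illustration of how the first three rows of the table can be derived from raw request data, here is a minimal sketch. The `RequestSample` shape, the `summarize` helper, and the choice to count only 5xx responses as errors are assumptions made for this example. In practice a metrics system such as Prometheus usually computes these from counters and histograms rather than from an in-memory array.

```ts
// Hypothetical shape of one recorded request (not taken from any library).
interface RequestSample {
  durationMs: number
  statusCode: number
}

// Nearest-rank percentile over an ascending-sorted array:
// the smallest value such that at least p% of the samples are <= it.
function percentile(sortedValues: number[], p: number): number {
  if (sortedValues.length === 0) return NaN
  const rank = Math.ceil((p * sortedValues.length) / 100)
  return sortedValues[Math.max(rank, 1) - 1]
}

// Summarize a window of samples into request rate, error rate, and latency percentiles.
function summarize(samples: RequestSample[], windowSeconds: number) {
  const durations = samples.map((s) => s.durationMs).sort((a, b) => a - b)
  // Assumption: only 5xx responses count as errors.
  const errors = samples.filter((s) => s.statusCode >= 500).length
  return {
    requestRate: samples.length / windowSeconds, // requests per second
    errorRate: samples.length === 0 ? 0 : errors / samples.length, // fraction in [0, 1]
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
    p99: percentile(durations, 99),
  }
}
```

Percentiles are reported alongside (or instead of) the mean because a small number of very slow requests can hide behind a healthy-looking average.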
Common Tooling
- Prometheus + Grafana — metrics collection and dashboards
- ELK Stack — Elasticsearch, Logstash, Kibana for log aggregation
- Datadog / New Relic — managed observability platforms
- Sentry — error tracking and alerting
Alerting Best Practices
- Alert on symptoms (high error rate), not causes (CPU spike)
- Set actionable alerts — every alert should require a human response
- Use runbooks linked from alert descriptions
- Avoid alert fatigue by tuning thresholds carefully
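
The practices above can be sketched as a small rule evaluator. This is an illustration under stated assumptions, not the API of any alerting tool: `AlertRule`, `shouldFire`, and the runbook URL are hypothetical names. The rule alerts on a symptom (error rate), carries a runbook link, and fires only after the threshold has been breached for several consecutive evaluation windows, which is one common way to reduce noisy alerts.

```ts
// Hypothetical alert rule shape, for illustration only.
interface AlertRule {
  name: string
  threshold: number // error-rate fraction, e.g. 0.05 for 5%
  forWindows: number // consecutive breaching windows required before firing
  runbookUrl: string // linked from the alert description
}

// Returns true only if the most recent `forWindows` error rates all exceed the threshold.
function shouldFire(rule: AlertRule, recentErrorRates: number[]): boolean {
  if (recentErrorRates.length < rule.forWindows) return false
  return recentErrorRates
    .slice(-rule.forWindows)
    .every((rate) => rate > rule.threshold)
}

// Example rule; the URL is a placeholder.
const highErrorRate: AlertRule = {
  name: 'High error rate',
  threshold: 0.05,
  forWindows: 3,
  runbookUrl: 'https://example.com/runbooks/high-error-rate',
}
```

A single spike such as `[0.01, 0.2, 0.01]` does not fire this rule, while a sustained breach such as `[0.06, 0.07, 0.08]` does. Raising `forWindows` trades detection speed for fewer false alarms.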
