When failover isn’t safe: Building high-availability PostgreSQL on Kubernetes
Datadog made PostgreSQL failover safer by treating replica lag as the promotion gate. A zonal-failure gameday showed that detection and automation could not protect the database if the standby sat behind the primary. The team added lag-aware checks, clearer operator signals, and failure drills so en.. read more









