certificate expiration outage

Incident Report for CipherOwl - System

Postmortem

Customer impact: Public APIs and investigation tools experienced elevated error rates between 06:18 and 07:30 UTC. Enterprise customers running ECSD on-prem were unaffected by design.

What happened: From 06:18 to 07:30 UTC, an internal certificate that authenticates service-to-service traffic inside our cluster expired. New mesh-internal connections briefly failed authentication while existing connections kept working. We rotated the certificate, refreshed the affected workloads, and confirmed full recovery.

What we're doing: Migrating the affected certificate to automated rotation, adding cluster-wide expiry monitoring with multi-tier alerts, and publishing dashboards on our public ingress so we have a definitive customer-impact signal on day one of any future incident.

Posted Apr 29, 2026 - 18:42 UTC

Resolved

One of the internal certificate expired causing outages in APIs from 06:18UTC, the service recovered at 06:35UTC and fully resolved 07:30UTC.

Posted Apr 29, 2026 - 06:00 UTC