Creating high-availability infrastructure


I needed to create an ecosystem that would:

  1. Provide 99.9% uptime.
  2. Allow for quick deployments.
  3. Allow for quick rollbacks.
  4. Allow for high-speed self-recovery.
  5. Allow for horizontal and vertical scaling of services / microservices.
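To put the first goal in perspective, a 99.9% uptime target translates into a concrete downtime budget. The sketch below is only an illustration of that arithmetic; the function name and the 365-day year and 30-day month are assumptions made for the example, not part of the original setup.

```python
# Downtime budget implied by an availability target.
# Assumption: a 365-day year and a 30-day month, for round numbers.
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def downtime_budget_hours(availability_percent: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime (in hours) allowed over the period."""
    return period_hours * (1 - availability_percent / 100)


yearly_hours = downtime_budget_hours(99.9)
monthly_minutes = downtime_budget_hours(99.9, period_hours=30 * 24) * 60

print(f"99.9% over a year:  {yearly_hours:.2f} hours")      # 8.76 hours
print(f"99.9% over 30 days: {monthly_minutes:.1f} minutes")  # 43.2 minutes
```

In other words, the whole platform can be unavailable for under nine hours a year, which is why quick rollbacks and self-recovery matter as much as the redundancy itself.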

The approach

To create redundancy, a few crucial questions had to be answered first.

  1. Where are the customers? Understanding this would allow for the correct positioning of the server instances.
  2. What are the risks? Local collapse? National collapse? International collapse?
  3. How do we minimize the blast radius in the event of a failure?

The decision

We decided to host with Hetzner in Germany, as it had a footprint across Europe and the USA.

Building redundancy was simple. We created three data nodes and three Kubernetes nodes, and placed one Kubernetes node and one data node together at each of three locations: the USA, Germany and Frankfurt.

The data services (Elasticsearch, MongoDB and ActiveMQ) were configured to replicate automatically across the data centres, while the Kubernetes nodes replicated containers across borders.
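Why three nodes rather than two? A rough sketch of the reasoning, assuming the data services rely on majority-based replication (MongoDB replica sets, for example, need a majority of voting members to elect a primary): with three locations, any single one can fail and the remaining two still form a majority. The function names and the 99% per-site availability figure below are hypothetical, chosen only to illustrate the arithmetic, and the calculation assumes sites fail independently.

```python
from math import comb


def quorum_size(nodes: int) -> int:
    """Smallest majority of a cluster with the given number of voting nodes."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


def majority_availability(nodes: int, site_availability: float) -> float:
    """Probability that at least a majority of sites is up.

    Assumes each site fails independently with the same probability.
    """
    quorum = quorum_size(nodes)
    return sum(
        comb(nodes, up)
        * site_availability ** up
        * (1 - site_availability) ** (nodes - up)
        for up in range(quorum, nodes + 1)
    )


print(quorum_size(3), tolerated_failures(3))   # 2 1
print(quorum_size(2), tolerated_failures(2))   # 2 0
# Hypothetical figure: each site is up 99% of the time.
print(f"{majority_availability(3, 0.99):.6f}")  # 0.999702
```

Two nodes tolerate no failures under a majority rule, while three tolerate one. Under these assumptions, three sites that are each up 99% of the time give a majority that is up roughly 99.97% of the time.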

Everything works together

In December 2022, we experienced an outage at one of the data centres. The services recovered automatically and our customers experienced zero downtime.