High Availability Architecture Examples

When organizations require scaling and high availability, the following architectures can be used. As the introduction at the top of this page mentions, there is a tradeoff between cost/complexity and uptime. Be sure this complexity is absolutely required before moving to full high availability.

For all examples below, we recommend running Consul and Redis Sentinel on dedicated nodes. Consul provides service registration and health checks for PostgreSQL, enabling PostgreSQL HA. Sentinel is a similar service for Redis. If Consul runs on PostgreSQL nodes or Sentinel runs on Redis nodes, high resource usage by PostgreSQL or Redis could prevent communication with the other Consul and Sentinel nodes. The other nodes may then conclude that a failure has occurred and that automated failover is necessary. Isolating Consul and Sentinel from the services they monitor reduces the chance of split-brain.
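The odd node count used for Consul and Sentinel in the examples below follows from majority voting. The sketch below is illustrative only: it assumes a simple majority rule, and the function names are hypothetical rather than part of Consul, Sentinel, or GitLab. It shows why three nodes can lose one member and still agree on a failover, while two nodes cannot.

```python
def quorum(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return total_nodes - quorum(total_nodes)


def has_quorum(total_nodes: int, reachable_nodes: int) -> bool:
    """True if the reachable nodes can still make a majority decision."""
    return reachable_nodes >= quorum(total_nodes)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(n, "nodes: quorum", quorum(n), "tolerates", tolerated_failures(n))
```

Under this assumption, a node that is starved of resources and cut off from its peers is in the minority and cannot trigger a failover alone, which is the split-brain protection described above.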

The examples below do not address high availability of NFS. Some enterprises have access to NFS appliances that manage availability, which is the best-case scenario. In the future, GitLab may offer a more user-friendly solution for GitLab HA storage.

There are many options in between each of these examples. Work with GitLab Support to understand the best starting point for your workload and adapt from there.

Horizontal

This is the simplest form of high availability and scaling. It requires the fewest individual servers (virtual or physical) but has some trade-offs and limits.

This architecture will work well for many GitLab customers. Larger customers may begin to notice that certain events cause contention or high load, for example cloning many large repositories with binary files, high API usage, or a large number of enqueued Sidekiq jobs. If this happens, consider moving to a hybrid or fully distributed architecture, depending on what is causing the contention.

2 PostgreSQL nodes
2 Redis nodes
3 Consul/Sentinel nodes
2 or more GitLab application nodes (Unicorn, Workhorse, Sidekiq, PgBouncer)
1 NFS/Gitaly server

Horizontal architecture diagram

Hybrid

In this architecture, certain components are split onto dedicated nodes so that high resource usage by one component does not interfere with the others. In larger environments, this is a good architecture to consider if you foresee or already have contention due to certain workloads.

2 PostgreSQL nodes
2 Redis nodes
3 Consul/Sentinel nodes
2 or more Sidekiq nodes
2 or more Web nodes (Unicorn, Workhorse, PgBouncer)
1 or more NFS/Gitaly servers

Hybrid architecture diagram

Fully Distributed

This architecture scales to hundreds of thousands of users and projects and is the basis of the GitLab.com architecture. While it scales well, it also comes with the added complexity of many more nodes to configure, manage, and monitor.

2 PostgreSQL nodes
4 or more Redis nodes (2 separate clusters for persistent and cache data)
3 Consul nodes
3 Sentinel nodes
Multiple dedicated Sidekiq nodes (Split into real-time, best effort, ASAP, CI Pipeline and Pull Mirror sets)
2 or more Git nodes (Git over SSH/Git over HTTP)
2 or more API nodes (All requests to /api)
2 or more Web nodes (All other web requests)
2 or more NFS/Gitaly servers

Fully Distributed architecture diagram

Edited Nov 26, 2018 by Conley Rogers