# ZeroLagHub – Infrastructure Architecture

## Three-Layer Model

The platform is organized into three distinct layers.

```
Control Plane → Edge / Gateway → Runtime
```

This separation ensures that customer workloads cannot affect platform
availability, and that the platform can always manage the runtime even
if the runtime is degraded.

---
## Layer 1 — Control Plane (Core Host)

Mission critical. It must never go down: without it, nothing on the
platform can be managed.

Low CPU usage, but a high availability requirement.

Components:

- `zpack-api` — platform API
- `zpack-portal` — Next.js frontend
- MariaDB — primary database
- Redis — cache + agent state
- Technitium DNS (`zlh-dns`)
- Prometheus/Grafana monitoring (`zlh-monitor`)
- PBS backup (`zlh-back`)
- Headscale (`zlh-ctl`) — dev access VPN
- `zlh-proxy` — Traefik for core/portal SSL termination
- core OPNsense

Rule: **Control plane must never depend on runtime.**

---
## Layer 2 — Edge / Gateway Layer

Handles incoming traffic routing. Sits between the internet and the runtime.

Components:

- `zlh-zpack-proxy` — Traefik for game/dev runtime traffic
- `zlh-velocity` — Minecraft Velocity proxy
- game/dev OPNsense (`zlh-zpack-router`)

The two Traefik instances split traffic by domain: `zlh-proxy` (control
plane) carries core traffic, and `zlh-zpack-proxy` (edge) carries runtime
traffic.

| Instance | Routes |
|----------|--------|
| `zlh-proxy` (core) | portal, API, monitoring, Headscale |
| `zlh-zpack-proxy` (runtime) | game servers, dev IDE (future) |

Velocity runs on the runtime host: proxying game traffic locally avoids
a cross-host hop and the latency it adds.
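The table amounts to a lookup from route category to the Traefik instance that owns it. The sketch below is illustrative only: the category strings and the `traefik_instance_for` helper are hypothetical labels for the rows above, not Traefik configuration or an existing ZeroLagHub API.

```python
# Hypothetical route categories mapped to the Traefik instance that owns them,
# following the core/runtime split in the table above.
CORE_PROXY = "zlh-proxy"
RUNTIME_PROXY = "zlh-zpack-proxy"

ROUTES: dict[str, str] = {
    "portal": CORE_PROXY,
    "api": CORE_PROXY,
    "monitoring": CORE_PROXY,
    "headscale": CORE_PROXY,
    "game": RUNTIME_PROXY,
    "dev-ide": RUNTIME_PROXY,  # marked "future" in the table
}


def traefik_instance_for(category: str) -> str:
    """Return the Traefik instance that should terminate a route category."""
    if category not in ROUTES:
        # No silent default: every route must be assigned to a traffic domain.
        raise ValueError(f"no Traefik instance owns route category {category!r}")
    return ROUTES[category]


if __name__ == "__main__":
    print(traefik_instance_for("api"))   # zlh-proxy
    print(traefik_instance_for("game"))  # zlh-zpack-proxy
```

Rejecting unknown categories, rather than defaulting to one instance, is a deliberate choice in this sketch: a new route has to be assigned to a traffic domain explicitly.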
---

## Layer 3 — Runtime Layer (Runtime Host)

Customer workloads. Noisy and unpredictable. Isolated from the control plane.

Components:

- game containers
- dev containers
- `zlh-agent` (inside every container)
- build systems
- `zlh-artifacts` — runtime binaries + server jars

Rule: **Runtime must not be able to break control plane.**

---
## Host Placement

### Core Host (small, stable)

```
core OPNsense
zlh-proxy (Traefik — core traffic)
zpack-api
zpack-portal
MariaDB
Redis
zlh-dns (Technitium)
zlh-monitor (Prometheus/Grafana)
zlh-back (PBS backup)
zlh-ctl (Headscale)
```

### Runtime Host (large, beefy)

```
game/dev OPNsense (zlh-zpack-router)
zlh-zpack-proxy (Traefik — runtime traffic)
zlh-velocity (Minecraft proxy)
game containers
dev containers
zlh-agent (inside every container)
build systems
zlh-artifacts
```
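Because placement follows layer (control plane on the core host, edge and runtime on the runtime host), the two lists above can be kept as data and checked automatically. A sketch under assumptions: the `PLACEMENT` inventory keys (such as `core-opnsense` and `game-containers`) and the `misplaced` helper are hypothetical, with layer assignments taken from the component lists in this document.

```python
# Hypothetical inventory: component -> (layer, host), taken from the
# component lists and host placement in this document.
CORE, RUNTIME = "core", "runtime"

PLACEMENT: dict[str, tuple[int, str]] = {
    "core-opnsense": (1, CORE),
    "zlh-proxy": (1, CORE),
    "zpack-api": (1, CORE),
    "zpack-portal": (1, CORE),
    "mariadb": (1, CORE),
    "redis": (1, CORE),
    "zlh-dns": (1, CORE),
    "zlh-monitor": (1, CORE),
    "zlh-back": (1, CORE),
    "zlh-ctl": (1, CORE),
    "zlh-zpack-router": (2, RUNTIME),
    "zlh-zpack-proxy": (2, RUNTIME),
    "zlh-velocity": (2, RUNTIME),
    "game-containers": (3, RUNTIME),
    "dev-containers": (3, RUNTIME),
    "zlh-agent": (3, RUNTIME),
    "build-systems": (3, RUNTIME),
    "zlh-artifacts": (3, RUNTIME),
}

# Control plane (layer 1) on the core host; edge (2) and runtime (3) on the runtime host.
EXPECTED_HOST = {1: CORE, 2: RUNTIME, 3: RUNTIME}


def misplaced(placement: dict[str, tuple[int, str]]) -> list[str]:
    """Return, sorted, the components whose host does not match their layer."""
    return sorted(
        name
        for name, (layer, host) in placement.items()
        if host != EXPECTED_HOST[layer]
    )


if __name__ == "__main__":
    print(misplaced(PLACEMENT))  # []
    # Moving Velocity to the core host would break "keep game traffic local".
    print(misplaced({**PLACEMENT, "zlh-velocity": (2, CORE)}))  # ['zlh-velocity']
```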
---

## Agent Communication Path

API-to-agent traffic crosses the host boundary over the management network.

```
Core Host (API)
    ↓ management network
Runtime Host
    ↓
agent :18888 inside container
```

Firewall rule: only the API on the core host may reach the agent port
(:18888) on the runtime host. No other source should be able to reach
agents directly.
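As a predicate, the rule is an allow-list with exactly one entry. A minimal sketch using Python's standard `ipaddress` module; the management-network address `10.10.0.10` and the `may_reach_agent` helper are hypothetical, since the document does not specify addresses or where the rule is enforced.

```python
import ipaddress

AGENT_PORT = 18888  # zlh-agent port inside each container, per the diagram above

# Hypothetical: management-network address of zpack-api on the core host.
# The /32 limits the allow-list to exactly one source.
AGENT_ALLOWED_SOURCES = [ipaddress.ip_network("10.10.0.10/32")]


def may_reach_agent(src: str) -> bool:
    """Return True only if `src` may connect to AGENT_PORT on the runtime host."""
    addr = ipaddress.ip_address(src)  # raises ValueError on a malformed address
    return any(addr in net for net in AGENT_ALLOWED_SOURCES)


if __name__ == "__main__":
    print(may_reach_agent("10.10.0.10"))   # True  (the API)
    print(may_reach_agent("10.10.0.11"))   # False (any other source)
    print(may_reach_agent("203.0.113.7"))  # False (internet)
```

Keeping the allow-list as networks rather than a single address leaves room for more API instances later without changing the check itself.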
---

## Failure Isolation

If the runtime host degrades or goes down:

- the control plane stays up
- operators can still log in, manage servers, and redeploy containers
- the platform is degraded but not dead

If the core host degrades:

- runtime containers keep running
- game servers stay up
- management capability is lost until the core host recovers
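The two scenarios show that each capability depends on exactly one host, which a small truth-table sketch makes explicit. The `PlatformState` type and the healthy/degraded/down labels are hypothetical; the both-hosts-down case is not discussed above and simply follows from combining the two scenarios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformState:
    """What still works for a given combination of healthy hosts (illustrative labels)."""

    management: bool    # operators can log in, manage servers, redeploy
    game_servers: bool  # customer containers keep running

    @property
    def status(self) -> str:
        if self.management and self.game_servers:
            return "healthy"
        if self.management or self.game_servers:
            return "degraded"
        return "down"


def platform_state(core_up: bool, runtime_up: bool) -> PlatformState:
    # Management needs only the core host and workloads need only the runtime
    # host, so neither capability depends on both hosts being up.
    return PlatformState(management=core_up, game_servers=runtime_up)


if __name__ == "__main__":
    print(platform_state(core_up=True, runtime_up=False).status)  # degraded
    print(platform_state(core_up=False, runtime_up=True))
    # PlatformState(management=False, game_servers=True)
```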
---

## VMID Allocation Scheme

Block-based allocation makes the Proxmox UI readable and automation
predictable. Container role is derivable from VMID range.

| Range | Purpose |
|-------|---------|
| 100–199 | Core infrastructure |
| 200–299 | Base templates |
| 300–399 | Network / proxy services |
| 1000–1099 | Game containers |
| 1100–1199 | Dev containers |
| 2000+ | Monitoring / backup |
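Deriving role from VMID, and picking the next free VMID for a role, are both simple range checks over the table. A sketch assuming the ranges exactly as listed; `role_for_vmid` and `next_free_vmid` are hypothetical helpers, not Proxmox APIs.

```python
from typing import Optional

# (first, last, role) per the allocation table; last=None marks the open-ended 2000+ block.
VMID_BLOCKS: list[tuple[int, Optional[int], str]] = [
    (100, 199, "core infrastructure"),
    (200, 299, "base template"),
    (300, 399, "network / proxy service"),
    (1000, 1099, "game container"),
    (1100, 1199, "dev container"),
    (2000, None, "monitoring / backup"),
]


def role_for_vmid(vmid: int) -> Optional[str]:
    """Return the role implied by a VMID, or None if it is in an unassigned gap."""
    for first, last, role in VMID_BLOCKS:
        if vmid >= first and (last is None or vmid <= last):
            return role
    return None


def next_free_vmid(role: str, used: set[int]) -> int:
    """Return the lowest VMID in `role`'s block that is not already in `used`."""
    for first, last, block_role in VMID_BLOCKS:
        if block_role != role:
            continue
        vmid = first
        while vmid in used:
            vmid += 1
        if last is not None and vmid > last:
            raise RuntimeError(f"VMID block for {role!r} is full")
        return vmid
    raise KeyError(f"unknown role {role!r}")


if __name__ == "__main__":
    print(role_for_vmid(1042))  # game container
    print(role_for_vmid(500))   # None
    print(next_free_vmid("game container", {1000, 1001}))  # 1002
```

VMIDs 400–999 and 1200–1999 fall outside every block, so `role_for_vmid` returns `None` for them rather than guessing a role.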
---

## Design Principles

1. Control plane never depends on runtime
2. Runtime cannot break control plane
3. Management network is the only path from core to runtime agents
4. Two Traefik instances — one per traffic domain — is correct
5. Velocity stays with runtime to keep game traffic local
6. VMID ranges communicate role without lookup