diff --git a/INFRASTRUCTURE_ARCHITECTURE.md b/INFRASTRUCTURE_ARCHITECTURE.md
deleted file mode 100644
index 67cae7d..0000000
--- a/INFRASTRUCTURE_ARCHITECTURE.md
+++ /dev/null
@@ -1,166 +0,0 @@
-# ZeroLagHub – Infrastructure Architecture
-
-## Three-Layer Model
-
-The platform is organized into three distinct layers.
-
-```
-Control Plane → Edge / Gateway → Runtime
-```
-
-This separation ensures that customer workloads cannot affect platform
-availability, and that the platform can always manage the runtime even
-if the runtime is degraded.
-
----
-
-## Layer 1 — Control Plane (Core Host)
-
-Mission critical. Must never go down or the platform is dead.
-
-Low CPU usage but high availability requirement.
-
-Components:
-
-- `zpack-api` — platform API
-- `zpack-portal` — Next.js frontend
-- MariaDB — primary database
-- Redis — cache + agent state
-- Technitium DNS (`zlh-dns`)
-- Prometheus/Grafana monitoring (`zlh-monitor`)
-- PBS backup (`zlh-back`)
-- Headscale (`zlh-ctl`) — dev access VPN
-- `zlh-proxy` — Traefik for core/portal SSL termination
-- core OPNsense
-
-Rule: **Control plane must never depend on runtime.**
-
----
-
-## Layer 2 — Edge / Gateway Layer
-
-Handles incoming traffic routing. Sits between internet and runtime.
-
-Components:
-
-- `zlh-zpack-proxy` — Traefik for game/dev runtime traffic
-- `zlh-velocity` — Minecraft Velocity proxy
-- game/dev OPNsense (`zlh-zpack-router`)
-
-The two Traefik instances map cleanly to this split:
-
-| Instance | Routes |
-|----------|--------|
-| `zlh-proxy` (core) | portal, API, monitoring, Headscale |
-| `zlh-zpack-proxy` (runtime) | game servers, dev IDE (future) |
-
-Velocity belongs with runtime — proxying game traffic locally avoids
-cross-host latency.
-
----
-
-## Layer 3 — Runtime Layer (Runtime Host)
-
-Customer workloads. Noisy and unpredictable. Isolated from control plane.
-
-Components:
-
-- game containers
-- dev containers
-- zlh-agent (inside every container)
-- build systems
-- `zlh-artifacts` — runtime binaries + server jars
-
-Rule: **Runtime must not be able to break control plane.**
-
----
-
-## Host Placement
-
-### Core Host (small, stable)
-
-```
-core OPNsense
-zlh-proxy (Traefik — core traffic)
-zpack-api
-zpack-portal
-MariaDB
-Redis
-zlh-dns (Technitium)
-zlh-monitor (Prometheus/Grafana)
-zlh-back (PBS backup)
-zlh-ctl (Headscale)
-```
-
-### Runtime Host (large, beefy)
-
-```
-game-dev OPNsense
-zlh-zpack-proxy (Traefik — runtime traffic)
-zlh-velocity (Minecraft proxy)
-game containers
-dev containers
-agents
-build systems
-zlh-artifacts
-```
-
----
-
-## Agent Communication Path
-
-API → agent crosses the host boundary via management network.
-
-```
-Core Host (API)
- ↓ management network
-Runtime Host
- ↓
-agent :18888 inside container
-```
-
-Firewall rule: only API (core host) may reach agent ports (:18888) on
-runtime host. No other source should be able to reach agents directly.
-
----
-
-## Failure Isolation
-
-If runtime host degrades or goes down:
-
-- control plane stays up
-- operators can still login, manage servers, redeploy containers
-- platform is degraded but not dead
-
-If core host degrades:
-
-- runtime containers continue running
-- game servers stay up
-- management capability is lost until core recovers
-
----
-
-## VMID Allocation Scheme
-
-Block-based allocation makes the Proxmox UI readable and automation
-predictable. Container role is derivable from VMID range.
-
-| Range | Purpose |
-|-------|---------|
-| 100–199 | Core infrastructure |
-| 200–299 | Base templates |
-| 300–399 | Network / proxy services |
-| 1000–1099 | Game containers |
-| 1100–1199 | Dev containers |
-| 2000+ | Monitoring / backup |
-
----
-
-## Design Principles
-
-1. Control plane never depends on runtime
-2. Runtime cannot break control plane
-3. Management network is the only path from core to runtime agents
-4. Two Traefik instances — one per traffic domain — is correct
-5. Velocity stays with runtime to keep game traffic local
-6. VMID ranges communicate role without lookup
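
Review note: the removed document states that a container's role is derivable from its VMID range. If automation relied on that rule, it could be expressed roughly as below. This is a minimal sketch, not code from the repository: the function name `vmid_role` and the constants `VMID_RANGES` and `MONITORING_START` are hypothetical, and only the ranges and purpose labels come from the deleted VMID table.

```python
from typing import Optional

# Ranges and labels copied from the VMID allocation table in the removed file.
# The names VMID_RANGES, MONITORING_START and vmid_role are hypothetical.
VMID_RANGES = [
    (100, 199, "Core infrastructure"),
    (200, 299, "Base templates"),
    (300, 399, "Network / proxy services"),
    (1000, 1099, "Game containers"),
    (1100, 1199, "Dev containers"),
]
MONITORING_START = 2000  # the table lists "2000+" as an open-ended block


def vmid_role(vmid: int) -> Optional[str]:
    """Return the role for a VMID, or None if it falls in an unallocated gap."""
    if vmid >= MONITORING_START:
        return "Monitoring / backup"
    for low, high, role in VMID_RANGES:
        if low <= vmid <= high:
            return role
    return None
```

For example, `vmid_role(1042)` returns `"Game containers"`, while `vmid_role(500)` returns `None` because the table leaves 400–999 unassigned.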