# ZeroLagHub – Infrastructure Architecture

## Three-Layer Model

The platform is organized into three distinct layers.

```
Control Plane → Edge / Gateway → Runtime
```

This separation ensures that customer workloads cannot affect platform availability, and that the platform can always manage the runtime even if the runtime is degraded.

---

## Layer 1 — Control Plane (Core Host)

Mission critical: if the control plane goes down, the platform is dead. CPU usage is low, but the availability requirement is high.

Components:

- `zpack-api` — platform API
- `zpack-portal` — Next.js frontend
- MariaDB — primary database
- Redis — cache + agent state
- Technitium DNS (`zlh-dns`)
- Prometheus/Grafana monitoring (`zlh-monitor`)
- PBS backup (`zlh-back`)
- Headscale (`zlh-ctl`) — dev access VPN
- `zlh-proxy` — Traefik for core/portal SSL termination
- core OPNsense

Rule: **Control plane must never depend on runtime.**

---

## Layer 2 — Edge / Gateway Layer

Handles incoming traffic routing. Sits between the internet and the runtime.

Components:

- `zlh-zpack-proxy` — Traefik for game/dev runtime traffic
- `zlh-velocity` — Minecraft Velocity proxy
- game/dev OPNsense (`zlh-zpack-router`)

The two Traefik instances map cleanly to this split:

| Instance | Routes |
|----------|--------|
| `zlh-proxy` (core) | portal, API, monitoring, Headscale |
| `zlh-zpack-proxy` (runtime) | game servers, dev IDE (future) |

Velocity belongs with the runtime — proxying game traffic locally avoids cross-host latency.

---

## Layer 3 — Runtime Layer (Runtime Host)

Customer workloads. Noisy and unpredictable. Isolated from the control plane.
Components:

- game containers
- dev containers
- `zlh-agent` (inside every container)
- build systems
- `zlh-artifacts` — runtime binaries + server jars

Rule: **Runtime must not be able to break the control plane.**

---

## Host Placement

### Core Host (small, stable)

```
core OPNsense
zlh-proxy     (Traefik — core traffic)
zpack-api
zpack-portal
MariaDB
Redis
zlh-dns       (Technitium)
zlh-monitor   (Prometheus/Grafana)
zlh-back      (PBS backup)
zlh-ctl       (Headscale)
```

### Runtime Host (large, beefy)

```
game-dev OPNsense
zlh-zpack-proxy   (Traefik — runtime traffic)
zlh-velocity      (Minecraft proxy)
game containers
dev containers
agents
build systems
zlh-artifacts
```

---

## Agent Communication Path

API → agent crosses the host boundary via the management network.

```
Core Host (API)
  ↓ management network
Runtime Host
  ↓ agent :18888 inside container
```

Firewall rule: only the API (core host) may reach agent ports (:18888) on the runtime host. No other source should be able to reach agents directly.

---

## Failure Isolation

If the runtime host degrades or goes down:

- control plane stays up
- operators can still log in, manage servers, and redeploy containers
- platform is degraded but not dead

If the core host degrades:

- runtime containers continue running
- game servers stay up
- management capability is lost until the core recovers

---

## VMID Allocation Scheme

Block-based allocation keeps the Proxmox UI readable and automation predictable. A container's role is derivable from its VMID range.

| Range | Purpose |
|-------|---------|
| 100–199 | Core infrastructure |
| 200–299 | Base templates |
| 300–399 | Network / proxy services |
| 1000–1099 | Game containers |
| 1100–1199 | Dev containers |
| 2000+ | Monitoring / backup |

---

## Design Principles

1. Control plane never depends on runtime
2. Runtime cannot break control plane
3. Management network is the only path from core to runtime agents
4. Two Traefik instances — one per traffic domain — is correct
5. Velocity stays with runtime to keep game traffic local
6. VMID ranges communicate role without lookup
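The agent-communication firewall rule (only the API on the core host may reach agent ports on the runtime host) can be verified with a plain TCP reachability probe. This is a minimal sketch, not the agent's real protocol: the document does not specify what `zlh-agent` speaks on :18888, so the function below only checks that the port answers at all. The function name `agent_reachable` is illustrative.

```python
import socket

AGENT_PORT = 18888  # agent port from the communication-path diagram


def agent_reachable(host: str, port: int = AGENT_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the agent port succeeds.

    Run from the core host (the API), this should succeed; run from any
    other source, the firewall rule should make it fail.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running this probe from a non-core host and seeing it return `False` is a cheap smoke test that the "only API → agent" rule is actually enforced.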
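Principle 6 (role derivable from VMID without lookup) can be sketched as a small range table mirroring the allocation scheme above. The function name `role_for_vmid` and the role strings are illustrative; ranges not listed in the scheme (e.g. 400–999) are treated as errors here, which is one reasonable choice for automation.

```python
# Inclusive VMID blocks from the allocation table.
VMID_BLOCKS = [
    (100, 199, "core-infrastructure"),
    (200, 299, "base-template"),
    (300, 399, "network-proxy"),
    (1000, 1099, "game-container"),
    (1100, 1199, "dev-container"),
]


def role_for_vmid(vmid: int) -> str:
    """Map a Proxmox VMID to its platform role via the block scheme."""
    for low, high, role in VMID_BLOCKS:
        if low <= vmid <= high:
            return role
    if vmid >= 2000:
        return "monitoring-backup"
    raise ValueError(f"VMID {vmid} is outside every allocated block")
```

For example, `role_for_vmid(1042)` yields `"game-container"` without consulting any inventory, which is the point of the block-based scheme.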