Remove premature infrastructure doc — host split is still planning, not decided

This commit is contained in:
jester 2026-03-16 23:11:22 +00:00
parent 75ca1303e9
commit 56178ead38


@ -1,166 +0,0 @@
# ZeroLagHub Infrastructure Architecture
## Three-Layer Model
The platform is organized into three distinct layers.
```
Control Plane → Edge / Gateway → Runtime
```
This separation ensures that customer workloads cannot affect platform
availability, and that the platform can always manage the runtime even
if the runtime is degraded.
---
## Layer 1 — Control Plane (Core Host)
Mission critical: if this layer goes down, the platform is dead.
Low CPU usage, but a hard high-availability requirement.
Components:
- `zpack-api` — platform API
- `zpack-portal` — Next.js frontend
- MariaDB — primary database
- Redis — cache + agent state
- Technitium DNS (`zlh-dns`)
- Prometheus/Grafana monitoring (`zlh-monitor`)
- PBS backup (`zlh-back`)
- Headscale (`zlh-ctl`) — dev access VPN
- `zlh-proxy` — Traefik for core/portal SSL termination
- core OPNsense
Rule: **Control plane must never depend on runtime.**
---
## Layer 2 — Edge / Gateway Layer
Handles incoming traffic routing. Sits between internet and runtime.
Components:
- `zlh-zpack-proxy` — Traefik for game/dev runtime traffic
- `zlh-velocity` — Minecraft Velocity proxy
- game/dev OPNsense (`zlh-zpack-router`)
The two Traefik instances map cleanly to this split:
| Instance | Routes |
|----------|--------|
| `zlh-proxy` (core) | portal, API, monitoring, Headscale |
| `zlh-zpack-proxy` (runtime) | game servers, dev IDE (future) |
Velocity belongs with runtime — proxying game traffic locally avoids
cross-host latency.
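A minimal sketch of how the split looks in Traefik dynamic configuration (file provider). Hostnames, service names, and ports here are assumptions for illustration, not the actual deployment values:

```yaml
# Hypothetical zlh-proxy (core) dynamic config — routes only core/portal traffic.
# zlh-zpack-proxy would carry an analogous file with game/dev routers instead.
http:
  routers:
    portal:
      rule: "Host(`portal.example.com`)"   # assumed hostname
      service: zpack-portal
      tls: {}
  services:
    zpack-portal:
      loadBalancer:
        servers:
          - url: "http://zpack-portal:3000"  # assumed internal port
```

Keeping the two router sets in separate Traefik instances (rather than one instance with two entrypoints) means a misbehaving runtime route can never take down portal/API routing.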
---
## Layer 3 — Runtime Layer (Runtime Host)
Customer workloads. Noisy and unpredictable. Isolated from control plane.
Components:
- game containers
- dev containers
- zlh-agent (inside every container)
- build systems
- `zlh-artifacts` — runtime binaries + server jars
Rule: **Runtime must not be able to break control plane.**
---
## Host Placement
### Core Host (small, stable)
```
core OPNsense
zlh-proxy (Traefik — core traffic)
zpack-api
zpack-portal
MariaDB
Redis
zlh-dns (Technitium)
zlh-monitor (Prometheus/Grafana)
zlh-back (PBS backup)
zlh-ctl (Headscale)
```
### Runtime Host (large, beefy)
```
game-dev OPNsense
zlh-zpack-proxy (Traefik — runtime traffic)
zlh-velocity (Minecraft proxy)
game containers
dev containers
agents
build systems
zlh-artifacts
```
---
## Agent Communication Path
API → agent crosses the host boundary via management network.
```
Core Host (API)
↓ management network
Runtime Host
agent :18888 inside container
```
Firewall rule: only API (core host) may reach agent ports (:18888) on
runtime host. No other source should be able to reach agents directly.
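The rule above could be expressed in pf syntax roughly as follows. OPNsense manages its ruleset through its UI, so this is only an illustrative fragment; the interface name and addresses are assumptions:

```
# pf sketch on the runtime-side gateway (illustrative only).
# mgmt0      = assumed management-network interface
# 10.0.10.5  = assumed core-host API address
# Pass the API through first (quick = first match wins), then block everyone else.
pass  in quick on mgmt0 proto tcp from 10.0.10.5 to any port 18888
block in quick on mgmt0 proto tcp from any       to any port 18888
```

Default-deny on :18888 with a single allowed source keeps the agent surface unreachable even from other runtime containers.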
---
## Failure Isolation
If runtime host degrades or goes down:
- control plane stays up
- operators can still log in, manage servers, and redeploy containers
- platform is degraded but not dead
If core host degrades:
- runtime containers continue running
- game servers stay up
- management capability is lost until core recovers
---
## VMID Allocation Scheme
Block-based allocation makes the Proxmox UI readable and automation
predictable. Container role is derivable from VMID range.
| Range | Purpose |
|-------|---------|
| 100-199 | Core infrastructure |
| 200-299 | Base templates |
| 300-399 | Network / proxy services |
| 1000-1099 | Game containers |
| 1100-1199 | Dev containers |
| 2000+ | Monitoring / backup |
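"Role is derivable from VMID range" can be sketched as a small shell helper (the role labels are shorthand invented here, taken from the table above):

```shell
# Map a Proxmox VMID to its role per the allocation table (sketch).
vmid_role() {
  case "$1" in
    1[0-9][0-9])  echo "core" ;;        # 100-199: core infrastructure
    2[0-9][0-9])  echo "template" ;;    # 200-299: base templates
    3[0-9][0-9])  echo "network" ;;     # 300-399: network / proxy services
    10[0-9][0-9]) echo "game" ;;        # 1000-1099: game containers
    11[0-9][0-9]) echo "dev" ;;         # 1100-1199: dev containers
    *)
      # 2000 and above: monitoring / backup; anything else is unallocated
      if [ "$1" -ge 2000 ]; then echo "monitoring"; else echo "unknown"; fi ;;
  esac
}
```

Automation (backup policies, firewall groups, dashboards) can key off this mapping without maintaining a separate inventory.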
---
## Design Principles
1. Control plane never depends on runtime
2. Runtime cannot break control plane
3. Management network is the only path from core to runtime agents
4. Two Traefik instances — one per traffic domain — is correct
5. Velocity stays with runtime to keep game traffic local
6. VMID ranges communicate role without lookup