Remove premature infrastructure doc — host split is still planning, not decided
parent 75ca1303e9
commit 56178ead38
@@ -1,166 +0,0 @@
# ZeroLagHub – Infrastructure Architecture

## Three-Layer Model

The platform is organized into three distinct layers.

```
Control Plane → Edge / Gateway → Runtime
```

This separation ensures that customer workloads cannot affect platform
availability, and that the platform can always manage the runtime even
if the runtime is degraded.

---

## Layer 1 — Control Plane (Core Host)

Mission-critical. If it goes down, the platform is dead.

Low CPU usage, but a high availability requirement.

Components:

- `zpack-api` — platform API
- `zpack-portal` — Next.js frontend
- MariaDB — primary database
- Redis — cache + agent state
- Technitium DNS (`zlh-dns`)
- Prometheus/Grafana monitoring (`zlh-monitor`)
- PBS backup (`zlh-back`)
- Headscale (`zlh-ctl`) — dev access VPN
- `zlh-proxy` — Traefik for core/portal SSL termination
- core OPNsense

Rule: **Control plane must never depend on runtime.**

---

## Layer 2 — Edge / Gateway Layer

Handles incoming traffic routing. Sits between the internet and the runtime.

Components:

- `zlh-zpack-proxy` — Traefik for game/dev runtime traffic
- `zlh-velocity` — Minecraft Velocity proxy
- game/dev OPNsense (`zlh-zpack-router`)

The two Traefik instances map cleanly to this split:

| Instance | Routes |
|----------|--------|
| `zlh-proxy` (core) | portal, API, monitoring, Headscale |
| `zlh-zpack-proxy` (runtime) | game servers, dev IDE (future) |

Velocity belongs with runtime — proxying game traffic locally avoids
cross-host latency.
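
The routing split in the table above can be written down as plain data. The following is a minimal sketch in Python, assuming hypothetical route labels (`portal`, `game-servers`, and so on) derived from the table. It only illustrates the "one owner per route" idea and is not how Traefik itself is configured.

```python
# Hypothetical route labels taken from the table above. Real routing
# lives in the Traefik configuration, not in Python.
TRAEFIK_ROUTES = {
    "zlh-proxy": {"portal", "api", "monitoring", "headscale"},
    "zlh-zpack-proxy": {"game-servers", "dev-ide"},
}


def instance_for(route: str) -> str:
    """Return the single Traefik instance that owns a route label."""
    owners = [name for name, routes in TRAEFIK_ROUTES.items() if route in routes]
    if len(owners) != 1:
        # A route owned by zero or two instances would blur the split.
        raise ValueError(f"route {route!r} must have exactly one owner, got {owners}")
    return owners[0]
```

For example, `instance_for("portal")` resolves to `zlh-proxy`, while an unknown label raises `ValueError` instead of silently picking an instance.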

---

## Layer 3 — Runtime Layer (Runtime Host)

Customer workloads. Noisy and unpredictable. Isolated from control plane.

Components:

- game containers
- dev containers
- `zlh-agent` (inside every container)
- build systems
- `zlh-artifacts` — runtime binaries + server jars

Rule: **Runtime must not be able to break control plane.**

---

## Host Placement

### Core Host (small, stable)

```
core OPNsense
zlh-proxy (Traefik — core traffic)
zpack-api
zpack-portal
MariaDB
Redis
zlh-dns (Technitium)
zlh-monitor (Prometheus/Grafana)
zlh-back (PBS backup)
zlh-ctl (Headscale)
```

### Runtime Host (large, beefy)

```
game/dev OPNsense
zlh-zpack-proxy (Traefik — runtime traffic)
zlh-velocity (Minecraft proxy)
game containers
dev containers
agents
build systems
zlh-artifacts
```

---

## Agent Communication Path

API → agent traffic crosses the host boundary via the management network.

```
Core Host (API)
  ↓ management network
Runtime Host
  ↓
agent :18888 inside container
```

Firewall rule: only the API (core host) may reach agent ports (:18888) on
the runtime host. No other source should be able to reach agents directly.
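
The firewall rule above amounts to a single source allowlist for the agent port. The following is a minimal sketch in Python, assuming a hypothetical management-network address for `zpack-api` (`10.0.0.10` is a placeholder, not a value from this document). The real rule would live in OPNsense; this only makes the intended decision explicit.

```python
from ipaddress import ip_address

AGENT_PORT = 18888  # agent port from the diagram above

# Hypothetical management-network address of zpack-api (placeholder).
API_SOURCE = ip_address("10.0.0.10")


def may_reach_agent(source: str) -> bool:
    """Decide whether a source address may connect to an agent on AGENT_PORT.

    Only the API on the core host is allowed; every other source is denied.
    """
    return ip_address(source) == API_SOURCE
```

With these placeholder values, `may_reach_agent("10.0.0.10")` is allowed and any other address, including other hosts on the management network, is denied.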

---

## Failure Isolation

If the runtime host degrades or goes down:

- control plane stays up
- operators can still log in, manage servers, and redeploy containers
- platform is degraded but not dead

If the core host degrades:

- runtime containers continue running
- game servers stay up
- management capability is lost until core recovers
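
The two failure cases above can be summarized as a mapping from host health to platform capabilities. The following is a minimal sketch in Python, assuming a simplified model in which each host is either up or down. The names `PlatformState` and `platform_state` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformState:
    management_available: bool  # operators can log in and manage servers
    game_servers_running: bool  # customer workloads keep running


def platform_state(core_up: bool, runtime_up: bool) -> PlatformState:
    """Map host health to the capabilities listed above.

    Management depends only on the core host, and running workloads
    depend only on the runtime host, so neither failure takes out both.
    """
    return PlatformState(
        management_available=core_up,
        game_servers_running=runtime_up,
    )
```

The point of the sketch is that each capability reads exactly one input: if either field ever needed both flags, the isolation described above would no longer hold.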

---

## VMID Allocation Scheme

Block-based allocation makes the Proxmox UI readable and automation
predictable. A container's role is derivable from its VMID range.

| Range | Purpose |
|-------|---------|
| 100–199 | Core infrastructure |
| 200–299 | Base templates |
| 300–399 | Network / proxy services |
| 1000–1099 | Game containers |
| 1100–1199 | Dev containers |
| 2000+ | Monitoring / backup |
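
Deriving a role from a VMID is a range lookup over the table above. The following is a minimal sketch in Python, assuming that VMIDs outside the listed blocks (for example 400–999) are unallocated and should be rejected. The function name `role_for_vmid` is hypothetical.

```python
# Closed blocks from the table above: (first VMID, last VMID, purpose).
VMID_BLOCKS = [
    (100, 199, "core infrastructure"),
    (200, 299, "base templates"),
    (300, 399, "network / proxy services"),
    (1000, 1099, "game containers"),
    (1100, 1199, "dev containers"),
]
OPEN_ENDED_START = 2000  # "2000+" in the table: monitoring / backup


def role_for_vmid(vmid: int) -> str:
    """Return the role implied by a VMID, or raise if it is in no block."""
    if vmid >= OPEN_ENDED_START:
        return "monitoring / backup"
    for first, last, role in VMID_BLOCKS:
        if first <= vmid <= last:
            return role
    # Assumption: gaps between blocks are unallocated, not a default role.
    raise ValueError(f"VMID {vmid} is outside every allocated block")
```

For example, `role_for_vmid(1042)` returns `"game containers"`, and a VMID in a gap such as 500 raises `ValueError`, which lets automation catch misallocated IDs early.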

---

## Design Principles

1. Control plane never depends on runtime
2. Runtime cannot break control plane
3. Management network is the only path from core to runtime agents
4. Two Traefik instances, one per traffic domain, is the correct split
5. Velocity stays with runtime to keep game traffic local
6. VMID ranges communicate role without lookup