Remove premature infrastructure doc — host split is still planning, not decided

Commit 56178ead38 (parent 75ca1303e9)
# ZeroLagHub – Infrastructure Architecture

## Three-Layer Model

The platform is organized into three distinct layers.

```
Control Plane → Edge / Gateway → Runtime
```

This separation ensures that customer workloads cannot affect platform availability, and that the platform can always manage the runtime even if the runtime is degraded.

---
## Layer 1: Control Plane (Core Host)

Mission-critical: if the control plane goes down, the platform is down.

Low CPU usage, but a high availability requirement.

Components:

- `zpack-api`: platform API
- `zpack-portal`: Next.js frontend
- MariaDB: primary database
- Redis: cache + agent state
- Technitium DNS (`zlh-dns`)
- Prometheus/Grafana monitoring (`zlh-monitor`)
- Proxmox Backup Server (PBS) backups (`zlh-back`)
- Headscale (`zlh-ctl`): dev access VPN
- `zlh-proxy`: Traefik for core/portal SSL termination
- core OPNsense

Rule: **Control plane must never depend on runtime.**
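
One way to make this rule checkable is to keep a small dependency inventory and walk it. The sketch below assumes "depend on" means "needs this to stay up"; the API calling agents over the management network is deliberately not modelled as such an edge, since the control plane is meant to stay up when the runtime is down. The `REQUIRES` edges, the lowercase labels, and the `violations` helper are hypothetical, not taken from a real inventory.

```python
from collections import deque

# Layer of each component, from the component lists in this document.
# The lowercase labels are informal identifiers chosen for this sketch.
LAYER = {
    "zpack-api": "control",
    "zpack-portal": "control",
    "mariadb": "control",
    "redis": "control",
    "zlh-dns": "control",
    "zlh-proxy": "control",
    "zlh-zpack-proxy": "edge",
    "zlh-velocity": "edge",
    "zlh-agent": "runtime",
    "zlh-artifacts": "runtime",
}

# Hypothetical "needs this to stay up" edges, for illustration only.
REQUIRES = {
    "zpack-portal": ["zpack-api"],
    "zpack-api": ["mariadb", "redis"],
}


def violations(requires, layer):
    """Return (service, dependency) pairs where a control-plane service
    transitively needs something outside the control plane.
    Components missing from `layer` count as non-control and are flagged."""
    found = []
    for svc in requires:
        if layer.get(svc) != "control":
            continue
        seen = set()
        queue = deque(requires.get(svc, []))
        while queue:  # breadth-first walk over transitive dependencies
            dep = queue.popleft()
            if dep in seen:
                continue
            seen.add(dep)
            if layer.get(dep) != "control":
                found.append((svc, dep))
            queue.extend(requires.get(dep, []))
    return found


print(violations(REQUIRES, LAYER))  # []
```

Adding an edge such as `"zpack-api": ["mariadb", "redis", "zlh-artifacts"]` would make the check report `zpack-api` and, transitively, `zpack-portal`.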

---

## Layer 2: Edge / Gateway Layer

Routes incoming traffic. Sits between the internet and the runtime.

Components:

- `zlh-zpack-proxy`: Traefik for game/dev runtime traffic
- `zlh-velocity`: Minecraft Velocity proxy
- game/dev OPNsense (`zlh-zpack-router`)

The two Traefik instances map cleanly onto the core/runtime split:

| Instance | Routes |
|----------|--------|
| `zlh-proxy` (core) | portal, API, monitoring, Headscale |
| `zlh-zpack-proxy` (runtime) | game servers, dev IDE (future) |

Velocity belongs with the runtime: proxying game traffic locally avoids cross-host latency.
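
A minimal sketch of the table above as data, assuming every route class should be owned by exactly one Traefik instance. The route labels (`"api"`, `"game"`, ...) and the `proxy_for` helper are hypothetical; this is not Traefik configuration.

```python
# Route classes owned by each Traefik instance, copied from the table above.
# The short labels are hypothetical, not Traefik config keys.
ROUTES = {
    "zlh-proxy": {"portal", "api", "monitoring", "headscale"},
    "zlh-zpack-proxy": {"game", "dev-ide"},
}

# Host each instance runs on, per the Host Placement section.
PROXY_HOST = {"zlh-proxy": "core", "zlh-zpack-proxy": "runtime"}


def proxy_for(route_kind: str) -> str:
    """Return the single Traefik instance that owns a route class."""
    owners = [name for name, kinds in ROUTES.items() if route_kind in kinds]
    if len(owners) != 1:
        raise ValueError(f"{route_kind!r} must have exactly one owner, got {owners}")
    return owners[0]


# Game traffic should never be terminated on the core host.
print(PROXY_HOST[proxy_for("game")])  # runtime
print(proxy_for("api"))               # zlh-proxy
```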

---

## Layer 3: Runtime Layer (Runtime Host)

Customer workloads: noisy and unpredictable, and isolated from the control plane.

Components:

- game containers
- dev containers
- `zlh-agent` (inside every container)
- build systems
- `zlh-artifacts`: runtime binaries + server jars

Rule: **Runtime must not be able to break control plane.**

---

## Host Placement

### Core Host (small, stable)

```
core OPNsense
zlh-proxy (Traefik, core traffic)
zpack-api
zpack-portal
MariaDB
Redis
zlh-dns (Technitium)
zlh-monitor (Prometheus/Grafana)
zlh-back (PBS backup)
zlh-ctl (Headscale)
```

### Runtime Host (large, beefy)

```
game/dev OPNsense (zlh-zpack-router)
zlh-zpack-proxy (Traefik, runtime traffic)
zlh-velocity (Minecraft proxy)
game containers
dev containers
zlh-agent (one per container)
build systems
zlh-artifacts
```
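
Layers are logical and hosts are physical: the Edge layer lands on the runtime host alongside the Runtime layer. The sketch below cross-checks the two views, assuming the placement lists above are complete. `LAYERS` is an illustrative subset, and `host_of` and `misplaced` are hypothetical helper names.

```python
# Physical placement, copied from the Host Placement lists above.
PLACEMENT = {
    "core": {
        "core OPNsense", "zlh-proxy", "zpack-api", "zpack-portal",
        "MariaDB", "Redis", "zlh-dns", "zlh-monitor", "zlh-back", "zlh-ctl",
    },
    "runtime": {
        "zlh-zpack-router", "zlh-zpack-proxy", "zlh-velocity",
        "game containers", "dev containers", "zlh-agent",
        "build systems", "zlh-artifacts",
    },
}

# Logical layers (illustrative subset) and the host each layer lands on.
LAYERS = {
    "control": {"zpack-api", "zpack-portal", "MariaDB", "Redis", "zlh-proxy"},
    "edge": {"zlh-zpack-proxy", "zlh-velocity", "zlh-zpack-router"},
    "runtime": {"game containers", "dev containers", "zlh-agent", "zlh-artifacts"},
}
LAYER_HOST = {"control": "core", "edge": "runtime", "runtime": "runtime"}


def host_of(component: str) -> str:
    """Return the one host a component is placed on."""
    hosts = [h for h, comps in PLACEMENT.items() if component in comps]
    if len(hosts) != 1:
        raise ValueError(f"{component!r} placed on {hosts}, expected exactly one host")
    return hosts[0]


def misplaced():
    """Components whose physical host disagrees with their layer's host."""
    return sorted(
        c for layer, comps in LAYERS.items() for c in comps
        if host_of(c) != LAYER_HOST[layer]
    )


print(misplaced())  # []
```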

---

## Agent Communication Path

API → agent traffic crosses the host boundary over the management network.

```
Core Host (API)
↓ management network
Runtime Host
↓
agent :18888 inside container
```

Firewall rule: only the API (core host) may reach agent ports (:18888) on the runtime host. No other source should be able to reach agents directly.
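
The firewall rule can be modelled as a decision function. This is not OPNsense rule syntax. It is a sketch that assumes a hypothetical API address (`10.10.0.10`) and a hypothetical management interface name (`mgmt`), neither of which this document specifies.

```python
import ipaddress

AGENT_PORT = 18888

# Hypothetical values: the document does not specify addresses or interface names.
API_ADDR = ipaddress.ip_address("10.10.0.10")  # zpack-api on the core host
MGMT_IFACE = "mgmt"                            # runtime-host NIC on the management network


def agent_rule(src: str, dst_port: int, iface: str) -> str:
    """Model of the agent-port rule for one inbound packet on the runtime host.

    Returns "pass", "block", or "n/a" when the packet is not aimed at the
    agent port and other firewall rules decide.
    """
    if dst_port != AGENT_PORT:
        return "n/a"
    if iface == MGMT_IFACE and ipaddress.ip_address(src) == API_ADDR:
        return "pass"
    return "block"  # default-deny for the agent port


print(agent_rule("10.10.0.10", 18888, "mgmt"))  # pass
print(agent_rule("10.10.0.11", 18888, "mgmt"))  # block
print(agent_rule("10.10.0.10", 18888, "wan"))   # block
```

Checking the arrival interface as well as the source address reflects the principle that the management network is the only path from core to agents.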

---

## Failure Isolation

If the runtime host degrades or goes down:

- the control plane stays up
- operators can still log in, manage servers, and redeploy containers
- the platform is degraded but not dead

If the core host degrades:

- runtime containers keep running
- game servers stay up
- management capability is lost until the core host recovers
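
The two failure cases reduce to a small truth table over host availability. A sketch; the status labels and the `PlatformState` / `platform_state` names are hypothetical choices made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformState:
    status: str                 # "healthy", "degraded", or "down" (labels chosen for this sketch)
    operators_can_manage: bool  # portal + API live on the core host
    game_servers_up: bool       # game containers live on the runtime host


def platform_state(core_up: bool, runtime_up: bool) -> PlatformState:
    """Map host availability to what still works, per the lists above.

    Whether a management action can take effect on a runtime host that is
    fully down is not modelled here.
    """
    if core_up and runtime_up:
        status = "healthy"
    elif core_up or runtime_up:
        status = "degraded"
    else:
        status = "down"
    return PlatformState(status, operators_can_manage=core_up, game_servers_up=runtime_up)


for core_up, runtime_up in [(True, True), (True, False), (False, True), (False, False)]:
    print(core_up, runtime_up, platform_state(core_up, runtime_up))
```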

---

## VMID Allocation Scheme

Block-based allocation makes the Proxmox UI readable and automation predictable: a container's role can be derived from its VMID range alone.

| Range | Purpose |
|-------|---------|
| 100–199 | Core infrastructure |
| 200–299 | Base templates |
| 300–399 | Network / proxy services |
| 1000–1099 | Game containers |
| 1100–1199 | Dev containers |
| 2000+ | Monitoring / backup |
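
Because each role owns a contiguous block, both lookup and allocation can work from the VMID alone. A sketch, assuming inclusive bounds and treating `2000+` as open-ended; `role_for_vmid` and `next_free_vmid` are hypothetical helpers, not Proxmox APIs.

```python
from typing import Optional

# Blocks from the table above; both bounds inclusive.
VMID_BLOCKS = [
    (100, 199, "core infrastructure"),
    (200, 299, "base templates"),
    (300, 399, "network / proxy services"),
    (1000, 1099, "game containers"),
    (1100, 1199, "dev containers"),
]
OPEN_ENDED_START = 2000  # "2000+": monitoring / backup, no upper bound given


def role_for_vmid(vmid: int) -> Optional[str]:
    """Derive a container's role from its VMID alone (None for unassigned gaps)."""
    if vmid >= OPEN_ENDED_START:
        return "monitoring / backup"
    for lo, hi, role in VMID_BLOCKS:
        if lo <= vmid <= hi:
            return role
    return None  # e.g. 400-999 or 1200-1999 are not assigned in the table


def next_free_vmid(role: str, used: set) -> int:
    """Lowest unused VMID in a bounded block."""
    for lo, hi, block_role in VMID_BLOCKS:
        if block_role == role:
            for vmid in range(lo, hi + 1):
                if vmid not in used:
                    return vmid
            raise RuntimeError(f"{role} block {lo}-{hi} is full")
    raise KeyError(f"no bounded block for role {role!r}")


print(role_for_vmid(1042))                             # game containers
print(next_free_vmid("dev containers", {1100, 1101}))  # 1102
```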

---

## Design Principles

1. Control plane never depends on runtime
2. Runtime cannot break control plane
3. Management network is the only path from core to runtime agents
4. Two Traefik instances: one per traffic domain (core and runtime)
5. Velocity stays with runtime to keep game traffic local
6. VMID ranges communicate role without lookup