Update PROJECT_CONTEXT.md with current infrastructure and Velocity plugin model

jester 2026-04-10 19:15:53 +00:00
parent c5f4b9ddb5
commit 9beac4015a


@@ -3,7 +3,7 @@
## What It Is
Game server hosting platform targeting modded, indie, and emerging games.
Competitive advantages: LXC containers (20-30% perf over Docker), custom
agent architecture, open-source stack, developer-to-player pipeline that
agent architecture, open-source stack, and a developer-to-player pipeline that
turns mod developers into a distribution channel.
System posture: stable, controlled expansion phase.
@@ -12,72 +12,79 @@ System posture: stable, controlled expansion phase.
## Naming Convention
- `zlh-*` = core infrastructure (DNS, monitoring, backup, routing, artifacts)
- `zpack-*` = game and dev server stack (portal, API, containers)
- `zlh-*` = core infrastructure (routing, monitoring, backup, artifacts, shared services)
- `zpack-*` = game and dev stack (portal, API, containers, Velocity/game edge)
---
## Infrastructure (Proxmox)
## Infrastructure (Current Reality)
### Active VMs
### Active host / environment
- Active dedicated host: **GTHost Detroit**
- Denver host: **decommissioned Apr 2, 2026**
- Active infra VM/LXC IDs are in the **9000s range**
- Legacy IDs in the 100s / 300s / 1000s / 2000s are reference only and should **not** be treated as active production
| VM | Name | Role |
|----|------|------|
| 104 | zlh-monitor | Prometheus/Grafana monitoring |
| 105 | zlh-router | Core services router |
| 300 | zlh-velocity | Minecraft Velocity proxy |
| 1001 | zlh-dns | Technitium DNS |
| 1002 | zlh-proxy | Traefik — core/frontend SSL termination (portal traffic) |
| 1003 | zlh-artifacts | Runtime binaries + Minecraft server jars (agent install source) |
| 1004 | zlh-zpack-proxy | Traefik — game/dev edge routing + dev IDE wildcard TLS |
| 1005 | zpack-api | Node.js API |
| 1006 | zlh-zpack-router | Game/dev router |
| 1100 | zpack-portal | Next.js frontend |
| 2001 | zlh-back | PBS backup + Backblaze B2 |
### Infrastructure source of truth
- Authoritative VM/IP inventory: `INFRASTRUCTURE.md`
- `PROJECT_CONTEXT.md` is a platform snapshot, **not** the authoritative VM/IP registry
### Legacy / Reference Only (not active production)
### Active core infrastructure (high level)
- `9001 zlh-router` — OPNsense core router
- `9002 zpack-router` — OPNsense game/dev router
- `9010 zpack-dns` — Technitium DNS
- `9011 zlh-proxy` — core reverse proxy
- `9012 zpack-proxy` — game/dev edge proxy
- `9014 zlh-artifacts` — runtime, jar, and agent artifact server
- `9015 zpack-velocity` — Velocity proxy
- `9016 zlh-monitor` — Prometheus/Grafana
- `9017 zlh-back` — Proxmox Backup Server
- `9020 zpack-api` — API VM
- `9021 zpack-portal` — portal VM
| VM | Name | Notes |
|----|------|-------|
| 100 | zlh-panel | Old Pterodactyl panel — kept for reference |
| 101 | zlh-wings | Old Wings — kept for reference |
| 103 | zlh-api | Old API VM — kept for reference |
| 1000 | zlh-router | Not in use |
### Service discovery note
- `internal.zlh` is **not currently used** for hot-path runtime service discovery
- Prefer explicit env-configured IPs / addresses in runtime-critical paths
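A minimal sketch of that preference, assuming a TypeScript caller. The helper name `resolveServiceAddress` and the variable name `ZPACK_API_URL` are hypothetical and do not come from the codebase. It takes an address from an explicit environment map and refuses anything under `internal.zlh`, so a stale default cannot slip back into a hot path.

```typescript
// Hypothetical env map type; in Node this would typically be process.env.
type Env = Record<string, string | undefined>;

// Resolve a runtime-critical service address from explicit configuration.
// Throws instead of falling back to an internal DNS name.
function resolveServiceAddress(env: Env, key: string): string {
  const value = (env[key] ?? "").trim();
  if (value === "") {
    throw new Error(`${key} is not set; hot-path services need an explicit address`);
  }
  // Strip an optional scheme, then keep everything before the first ":" or "/".
  // This simple split assumes an IPv4 address or a hostname, not bracketed IPv6.
  const withoutScheme = value.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
  const host = (withoutScheme.split(/[/:]/)[0] ?? "").toLowerCase();
  if (/(^|\.)internal\.zlh$/.test(host)) {
    throw new Error(`${key} points at ${host}; use an explicit IP or address instead`);
  }
  return value;
}
```

A caller might use `resolveServiceAddress(process.env, "ZPACK_API_URL")` at startup so that a missing or stale value fails fast rather than at request time.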
---
## Stack
**API (zpack-api, VM 1005):** Node.js ESM, Express 5, Prisma 6, MariaDB,
Redis, BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware
**API (`zpack-api`):** Node.js ESM, Express 5, Prisma 6, MariaDB, Redis,
BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware
**Portal (zpack-portal, VM 1100):** Next.js 15, TypeScript, TailwindCSS,
**Portal (`zpack-portal`):** Next.js 15, TypeScript, TailwindCSS,
Axios, WebSocket console. Sci-fi HUD aesthetic (steel textures, neon
accents, beveled panels).
**Agent (zlh-agent):** Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket.
**Agent (`zlh-agent`):** Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket.
Runs inside every game/dev container. Only process with direct filesystem
access. Pulls runtimes + server jars from zlh-artifacts (VM 1003).
access. Pulls runtimes + server jars from `zlh-artifacts`.
**Velocity plugin (`ZpackVelocityBridge`):** custom Velocity-side bridge that
hydrates/registers backend servers for the proxy and exposes plugin-local
register/unregister/status HTTP endpoints.
---
## Agent (Operational)
- HTTP server on :18888, internal only — API is the only caller
- HTTP server on :18888, internal only — API is the only intended caller
- Container types: `game` and `dev`
- Lifecycle: POST /config triggers async provision + start pipeline
- Lifecycle: `POST /config` triggers async provision + start pipeline
- Filesystem: strict path allowlist for games, workspace-root sandbox for dev containers
- Upload transport: raw `http.request` piping (`req.pipe(proxyReq)`), never fetch()
- Upload transport: raw `http.request` piping (`req.pipe(proxyReq)`), never `fetch()`
- Console: PTY-backed WebSocket, one read loop per container
- Self-update: periodic check + apply
- Forge/NeoForge: automated 5-step post-install patch sequence
- Modrinth mod lifecycle: install/enable/disable/delete — fully operational
- Forge/NeoForge: automated post-install patch sequence
- Modrinth mod lifecycle: install/enable/disable/delete — operational
- Provenance: `.zlh_metadata.json` — source is `null` if not set
- Status transport model: poll-based (`/status`), not push-based
- State transitions: `idle`, `installing`, `starting`, `running`, `stopping`, `crashed`, `error`
- Crash recovery: backoff 30s/60s/120s, resets if uptime ≥ 30s, `error` state after repeated failures
- Crash observability: exit code, signal, uptime, log tail, classification (oom/mod_error/missing_dep/nonzero/unexpected)
- Structured logging across provisioning, installs, file ops, control plane
- Crash observability: exit code, signal, uptime, log tail, classification
- Real Minecraft readiness probing exists in `internal/minecraft/readiness.go`
- Current open Minecraft issue is sequencing/use of readiness, not absence of readiness logic
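The crash-recovery rule in the list above can be sketched as a pure function. This is TypeScript for illustration only (the agent itself is Go), and the names `onCrash` and `CrashDecision` are hypothetical. The point at which the agent gives up is an assumption: here, `error` is entered once the 30s/60s/120s schedule is exhausted.

```typescript
// Backoff schedule and uptime threshold as stated in the notes.
const BACKOFF_SECONDS = [30, 60, 120];
const STABLE_UPTIME_SECONDS = 30;

interface CrashDecision {
  nextState: "crashed" | "error";
  retryInSeconds: number | null; // null means no further automatic restart
  failures: number;              // consecutive failures, including this one
}

// Decide what to do after a crash, given the failure streak so far and
// how long the process stayed up before it exited.
function onCrash(previousFailures: number, uptimeSeconds: number): CrashDecision {
  // A run that stayed up long enough resets the streak.
  const prior = uptimeSeconds >= STABLE_UPTIME_SECONDS ? 0 : previousFailures;
  const failures = prior + 1;
  // Assumption: "repeated failures" means the schedule has run out.
  if (failures > BACKOFF_SECONDS.length) {
    return { nextState: "error", retryInSeconds: null, failures };
  }
  return {
    nextState: "crashed",
    retryInSeconds: BACKOFF_SECONDS[failures - 1] ?? null,
    failures,
  };
}
```

Keeping the decision separate from the timer makes the schedule easy to test without waiting on real delays.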
---
@@ -96,7 +103,6 @@ access. Pulls runtimes + server jars from zlh-artifacts (VM 1003).
- agent port: `18888`
Code-server launch model:
- binds to `0.0.0.0`
- `--auth none`
- API/hosted flow handles auth and proxying
@@ -107,75 +113,50 @@ Code-server launch model:
### Browser IDE (Current Working Model)
```
```text
Browser
Traefik (dev-<vmid>.zerolaghub.dev, 10.70.0.242)
Traefik / hosted dev edge
API (10.60.0.245:4000)
API
container:6000
```
Working hosted flow:
1. frontend calls `POST /api/dev/:id/ide-token`
2. API returns `https://dev-<vmid>.zerolaghub.dev/?token=...`
2. API returns hosted IDE URL with short-lived token
3. browser opens hosted URL
4. Traefik wildcard router forwards to API at `http://10.60.0.245:4000`
4. edge forwards to API
5. API validates token, sets HTTP-only IDE cookie, redirects to clean hosted URL
6. subsequent cookie-backed request proxied to container code-server
6. subsequent cookie-backed requests proxy to container code-server
7. code-server redirects to `/?folder=/home/dev/workspace`
8. IDE loads successfully
Curl-verified response chain:
### Traefik / edge role
- terminates TLS for hosted dev traffic
- forwards hosted dev traffic to the API
- preserves original `Host` header
- does **not** route directly to containers for hosted IDE access
- `GET /?token=...` → `302` + `Set-Cookie`
- `GET /` with cookie → `302` to `/?folder=/home/dev/workspace`
- `GET /?folder=/home/dev/workspace` → `200` code-server HTML
### Traefik Role
- terminates TLS via wildcard cert `*.zerolaghub.dev` (Let's Encrypt DNS-01 via Cloudflare)
- matches `dev-*.zerolaghub.dev` via `HostRegexp`
- forwards to API at `http://10.60.0.245:4000`
- preserves original `Host` header (`passHostHeader: true`)
- does NOT route directly to containers
### API Role
- extracts vmid from `Host` header via `handleHostedProxy`
### API role
- extracts vmid from hosted request context
- validates short-lived IDE token
- sets HTTP-only `zlh_dev_ide_token` cookie
- sets HTTP-only IDE cookie
- redirects token URL to clean hostname URL
- proxies all live code-server HTTP + WebSocket traffic to correct container
### Local Developer Access (Future)
- proxies live code-server HTTP + WebSocket traffic to the correct container
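The API role can be sketched as a small decision function. This is an illustration under stated assumptions, not the actual handler: the `dev-<vmid>.zerolaghub.dev` host pattern and the `zlh_dev_ide_token` cookie name are taken from lines this diff removes, while the numeric vmid, the cookie value and attributes, the status codes, and the names `extractVmid` and `planIdeRequest` are hypothetical.

```typescript
// Assumed hosted dev hostname shape: dev-<vmid>.zerolaghub.dev with a numeric vmid.
const DEV_HOST = /^dev-(\d+)\.zerolaghub\.dev$/;

// Pull the vmid out of a Host header, ignoring an optional port.
function extractVmid(hostHeader: string): number | null {
  const host = hostHeader.trim().toLowerCase().replace(/:\d+$/, "");
  const match = DEV_HOST.exec(host);
  return match ? Number(match[1]) : null;
}

type IdeAction =
  | { kind: "bootstrap"; setCookie: string; redirectTo: string }
  | { kind: "proxy"; vmid: number }
  | { kind: "reject"; status: number };

// Decide how to handle one hosted IDE request.
// `token` is the already-parsed ?token= query value, or null if absent.
// `tokenIsValid` stands in for whatever short-lived token check the API uses.
function planIdeRequest(
  hostHeader: string,
  token: string | null,
  cookieIsValid: boolean,
  tokenIsValid: (vmid: number, token: string) => boolean,
): IdeAction {
  const vmid = extractVmid(hostHeader);
  if (vmid === null) return { kind: "reject", status: 404 };

  if (token !== null) {
    if (!tokenIsValid(vmid, token)) return { kind: "reject", status: 401 };
    // Assumption: the cookie carries the token itself; the real value may differ.
    return {
      kind: "bootstrap",
      setCookie: `zlh_dev_ide_token=${token}; HttpOnly; Secure; Path=/`,
      redirectTo: "/", // clean hostname URL with the token removed
    };
  }
  return cookieIsValid ? { kind: "proxy", vmid } : { kind: "reject", status: 401 };
}
```

A `bootstrap` result corresponds to the token-to-cookie redirect, and a `proxy` result to the cookie-backed requests that are forwarded to the container's code-server.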
### Local developer access (future / separate track)
Headscale/Tailscale for SSH, VS Code Remote, local tools.
Headscale server: `zlh-ctl` (status to be confirmed).
Constraints: no exit nodes, `magic_dns: false`.
### Removed / No Longer Current
- path-based `/api/dev/:id/ide` as primary browser entry
- path-based `/api/dev/:id/ide` as the primary browser entry
- Caddy-hosted dev IDE edge
- per-container Traefik file creation from dev provisioning
- per-container Cloudflare/Technitium publish/unpublish from API for dev IDE access
- per-container Cloudflare/Technitium publish/unpublish for dev IDE browser access
`proxyClient.js` remains in repo — still used by game edge publish logic.
---
## API Routes — Dev IDE
```
POST /api/dev/:id/ide-token — generate short-lived IDE token + hosted URL
```
Hosted requests land on the API through Traefik using the dev hostname.
API handles host-based vmid extraction, token bootstrap, cookie handoff,
HTTP + WebSocket proxy to code-server.
`proxyClient.js` remains in repo and is still used by game edge publish logic.
---
@@ -183,8 +164,32 @@ HTTP + WebSocket proxy to code-server.
- API polls agent `/status`
- API exposes polled state back to frontend via `GET /api/servers/:id/status`
- Portal uses the API-mediated hosted IDE flow
- Portal uses the API websocket bridge for console access
- Portal still has some migration debt through `src/lib/api/legacy.ts`
- Portal no longer relies on stale DB-only state for console availability
- Game publish flow remains untouched
- Game publish flow remains untouched by dev routing work
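One way to picture the poll-based model is a small in-memory cache that the API fills from agent `/status` polls and reads when serving `GET /api/servers/:id/status`. This is a sketch only: the `StatusCache` class, the `unknown` fallback, and the staleness window are assumptions, not documented behavior.

```typescript
type AgentState =
  | "idle" | "installing" | "starting" | "running"
  | "stopping" | "crashed" | "error";

interface PolledStatus {
  state: AgentState;
  polledAtMs: number; // when the agent last answered a poll
}

type PollResult = { ok: true; state: AgentState } | { ok: false };

// Holds the last successfully polled state per server.
class StatusCache {
  private readonly entries = new Map<number, PolledStatus>();
  private readonly maxAgeMs: number;

  constructor(maxAgeMs: number) {
    this.maxAgeMs = maxAgeMs;
  }

  // Apply the outcome of one poll. A failed poll leaves the old entry
  // in place so that it simply ages out.
  apply(serverId: number, result: PollResult, nowMs: number): void {
    if (result.ok) {
      this.entries.set(serverId, { state: result.state, polledAtMs: nowMs });
    }
  }

  // Return the state to expose to the frontend. Entries older than
  // maxAgeMs are reported as "unknown" rather than as live state.
  read(serverId: number, nowMs: number): AgentState | "unknown" {
    const entry = this.entries.get(serverId);
    if (!entry || nowMs - entry.polledAtMs > this.maxAgeMs) return "unknown";
    return entry.state;
  }
}
```

Reporting old entries as `unknown` is one way to avoid presenting stale state as current, for example when deciding whether the console should be offered.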
---
## Velocity / Registration Model
### Current model
- API is the source of backend inventory/state for Minecraft routing
- Velocity plugin (`ZpackVelocityBridge`) is the component that actually registers/unregisters backends inside Velocity
- Plugin supports startup rehydrate from API plus plugin-local webhook endpoints:
- `POST /zpack/register`
- `POST /zpack/unregister`
- `GET /zpack/status`
### Important current finding
- The likely current bug is **not** a generic “Velocity is broken” failure
- The likely issue is that the API/plugin path can expose/register a backend while the game server is `running` but not actually `ready`
- The plugin is the critical part of that registration path
### Important implementation note
- Current plugin default endpoint behavior still references `zpack-api.internal.zlh` unless overridden
- That default is stale relative to current hot-path architecture and should not be relied on long-term
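The sequencing concern above can be expressed as a gate in front of the plugin's register and unregister endpoints. This is a hedged sketch rather than the API's actual logic: `BackendView`, `planPluginCall`, and the idea that a `ready` flag is available to the caller are assumptions, and no request payload is shown because its shape is not documented here.

```typescript
type AgentState =
  | "idle" | "installing" | "starting" | "running"
  | "stopping" | "crashed" | "error";

interface BackendView {
  state: AgentState;   // lifecycle state polled from the agent
  ready: boolean;      // outcome of a readiness probe (assumed to be available)
  registered: boolean; // whether Velocity currently has this backend
}

type PluginCall =
  | { method: "POST"; path: "/zpack/register" }
  | { method: "POST"; path: "/zpack/unregister" }
  | null;

// Decide which plugin endpoint, if any, should be called for a backend.
// A backend is exposed only when it is both running and ready.
function planPluginCall(backend: BackendView): PluginCall {
  const routable = backend.state === "running" && backend.ready;
  if (routable && !backend.registered) {
    return { method: "POST", path: "/zpack/register" };
  }
  if (!routable && backend.registered) {
    return { method: "POST", path: "/zpack/unregister" };
  }
  return null; // already in the desired state
}
```

With this gate, a server that reports `running` but has not passed its readiness probe produces no register call, which is the case the finding above describes.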
---
@@ -199,7 +204,7 @@ Terraria, Project Zomboid
## Developer-to-Player Pipeline (Revenue Model)
```
```text
LXC Dev Environment ($15-40/mo)
→ Game/mod creation + testing
→ Testing servers (50% dev discount)
@@ -212,12 +217,19 @@ Revenue multiplier: 1 developer → ~10 players → $147.50/mo total.
---
## Open Threads
## Open Threads (High Level)
1. Verify full browser behavior + WebSocket under hosted wildcard flow
2. Confirm "Open IDE" button in portal uses hosted URL in production path
3. Confirm Headscale `zlh-ctl` VM status
4. Curated provenance — tracking install origin
1. Billing / Stripe completion
2. Game server world backup / restore
3. User onboarding flow
4. Fabric readiness gating / Velocity exposure sequencing
5. Password reset verification
6. Usage limits / quota enforcement
7. Email notifications
8. Velocity resync / refresh behavior
9. Upload testing, stress testing, OPNsense audit, provisioning validation
See `OPEN_THREADS.md` for active detail and priority order.
---
@@ -225,11 +237,13 @@ Revenue multiplier: 1 developer → ~10 players → $147.50/mo total.
| Repo | Purpose |
|------|---------|
| zlh-grind | Execution workspace / continuity / active constraints |
| zlh-docs | API/agent/portal reference docs (read from source) |
| zpack-api | API source (mirror) |
| zpack-portal | Portal source (mirror) |
| zlh-agent | Agent source |
| `zlh-grind` | execution workspace / continuity / active constraints |
| `knowledge-base` | canonical architecture / strategy / bootstrap |
| `zlh-docs` | API/agent/portal reference docs |
| `zpack-api` | API source |
| `zpack-portal` | portal source |
| `zlh-agent` | agent source |
| `ZpackVelocityBridge` | Velocity plugin / backend registration layer |
All at `git.zerolaghub.com/jester/<repo>`
@@ -237,13 +251,13 @@ All at `git.zerolaghub.com/jester/<repo>`
## Session Guidance
- zlh-grind is the execution continuity layer, not the architecture authority
- zlh-docs has full agent documentation (routes, filesystem rules, provisioning pipeline)
- Agent is the authority on filesystem enforcement — API must NOT duplicate filesystem logic
- `knowledge-base` is the architecture authority
- `zlh-grind` is the execution continuity layer
- `INFRASTRUCTURE.md` is the authoritative VM/IP inventory
- Agent is the authority on filesystem enforcement — API must **not** duplicate filesystem logic
- Portal does not enforce real policy — agent enforces
- Portal never calls agents directly — all traffic through API
- Upload transport uses raw http.request piping, never fetch()
- VMs 100, 101, 103, 1000 are legacy/unused — not active production
- Portal never calls agents directly — all traffic goes through API
- Upload transport uses raw `http.request` piping, never `fetch()`
- Do not mark unimplemented work as complete
- Game publish flow must never be modified by dev routing changes
- `proxyClient.js` must not be deleted — used by game edge publish path