diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md
index f26a767..1eef8b2 100644
--- a/PROJECT_CONTEXT.md
+++ b/PROJECT_CONTEXT.md
@@ -3,7 +3,7 @@
 ## What It Is
 Game server hosting platform targeting modded, indie, and emerging games.
 Competitive advantages: LXC containers (20-30% perf over Docker), custom
-agent architecture, open-source stack, developer-to-player pipeline that
+agent architecture, open-source stack, and a developer-to-player pipeline that
 turns mod developers into a distribution channel.
 
 System posture: stable, controlled expansion phase.
@@ -12,72 +12,79 @@ System posture: stable, controlled expansion phase.
 
 ## Naming Convention
 
-- `zlh-*` = core infrastructure (DNS, monitoring, backup, routing, artifacts)
-- `zpack-*` = game and dev server stack (portal, API, containers)
+- `zlh-*` = core infrastructure (routing, monitoring, backup, artifacts, shared services)
+- `zpack-*` = game and dev stack (portal, API, containers, Velocity/game edge)
 
 ---
 
-## Infrastructure (Proxmox)
+## Infrastructure (Current Reality)
 
-### Active VMs
+### Active host / environment
+- Active dedicated host: **GTHost Detroit**
+- Denver host: **decommissioned Apr 2, 2026**
+- Active infra VM/LXC IDs are in the **9000s range**
+- Legacy IDs in the 100s / 300s / 1000s / 2000s are reference only and should **not** be treated as active production
 
-| VM | Name | Role |
-|----|------|------|
-| 104 | zlh-monitor | Prometheus/Grafana monitoring |
-| 105 | zlh-router | Core services router |
-| 300 | zlh-velocity | Minecraft Velocity proxy |
-| 1001 | zlh-dns | Technitium DNS |
-| 1002 | zlh-proxy | Traefik — core/frontend SSL termination (portal traffic) |
-| 1003 | zlh-artifacts | Runtime binaries + Minecraft server jars (agent install source) |
-| 1004 | zlh-zpack-proxy | Traefik — game/dev edge routing + dev IDE wildcard TLS |
-| 1005 | zpack-api | Node.js API |
-| 1006 | zlh-zpack-router | Game/dev router |
-| 1100 | zpack-portal | Next.js frontend |
-| 2001 | zlh-back | PBS backup + Backblaze B2 |
+### Infrastructure source of truth
+- Authoritative VM/IP inventory: `INFRASTRUCTURE.md`
+- `PROJECT_CONTEXT.md` is a platform snapshot, **not** the authoritative VM/IP registry
 
-### Legacy / Reference Only (not active production)
+### Active core infrastructure (high level)
+- `9001 zlh-router` — OPNsense core router
+- `9002 zpack-router` — OPNsense game/dev router
+- `9010 zpack-dns` — Technitium DNS
+- `9011 zlh-proxy` — core reverse proxy
+- `9012 zpack-proxy` — game/dev edge proxy
+- `9014 zlh-artifacts` — runtime, jar, and agent artifact server
+- `9015 zpack-velocity` — Velocity proxy
+- `9016 zlh-monitor` — Prometheus/Grafana
+- `9017 zlh-back` — Proxmox Backup Server
+- `9020 zpack-api` — API VM
+- `9021 zpack-portal` — portal VM
 
-| VM | Name | Notes |
-|----|------|-------|
-| 100 | zlh-panel | Old Pterodactyl panel — kept for reference |
-| 101 | zlh-wings | Old Wings — kept for reference |
-| 103 | zlh-api | Old API VM — kept for reference |
-| 1000 | zlh-router | Not in use |
+### Service discovery note
+- `internal.zlh` is **not currently used** for hot-path runtime service discovery
+- Prefer explicit env-configured IPs / addresses in runtime-critical paths
 
 ---
 
 ## Stack
 
-**API (zpack-api, VM 1005):** Node.js ESM, Express 5, Prisma 6, MariaDB,
-Redis, BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware
+**API (`zpack-api`):** Node.js ESM, Express 5, Prisma 6, MariaDB, Redis,
+BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware
 
-**Portal (zpack-portal, VM 1100):** Next.js 15, TypeScript, TailwindCSS,
+**Portal (`zpack-portal`):** Next.js 15, TypeScript, TailwindCSS,
 Axios, WebSocket console. Sci-fi HUD aesthetic (steel textures, neon accents,
 beveled panels).
 
-**Agent (zlh-agent):** Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket.
+**Agent (`zlh-agent`):** Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket.
 Runs inside every game/dev container. Only process with direct filesystem
-access. Pulls runtimes + server jars from zlh-artifacts (VM 1003).
+access. Pulls runtimes + server jars from `zlh-artifacts`.
+
+**Velocity plugin (`ZpackVelocityBridge`):** custom Velocity-side bridge that
+hydrates/registers backend servers for the proxy and exposes plugin-local
+register/unregister/status HTTP endpoints.
 
 ---
 
 ## Agent (Operational)
 
-- HTTP server on :18888, internal only — API is the only caller
+- HTTP server on :18888, internal only — API is the only intended caller
 - Container types: `game` and `dev`
-- Lifecycle: POST /config triggers async provision + start pipeline
+- Lifecycle: `POST /config` triggers async provision + start pipeline
 - Filesystem: strict path allowlist for games, workspace-root sandbox for dev containers
-- Upload transport: raw `http.request` piping (`req.pipe(proxyReq)`), never fetch()
+- Upload transport: raw `http.request` piping (`req.pipe(proxyReq)`), never `fetch()`
 - Console: PTY-backed WebSocket, one read loop per container
 - Self-update: periodic check + apply
-- Forge/Neoforge: automated 5-step post-install patch sequence
-- Modrinth mod lifecycle: install/enable/disable/delete — fully operational
+- Forge/Neoforge: automated post-install patch sequence
+- Modrinth mod lifecycle: install/enable/disable/delete — operational
 - Provenance: `.zlh_metadata.json` — source is `null` if not set
 - Status transport model: poll-based (`/status`), not push-based
 - State transitions: `idle`, `installing`, `starting`, `running`, `stopping`, `crashed`, `error`
 - Crash recovery: backoff 30s/60s/120s, resets if uptime ≥ 30s, `error` state after repeated failures
-- Crash observability: exit code, signal, uptime, log tail, classification (oom/mod_error/missing_dep/nonzero/unexpected)
-- Structured logging across provisioning, installs, file ops, control plane
+- Crash observability: exit code, signal, uptime, log tail, classification
+- Real Minecraft readiness probing exists in `internal/minecraft/readiness.go`
+- Current open Minecraft issue is sequencing/use of readiness, not absence of readiness logic
 
 ---
 
@@ -96,7 +103,6 @@ access. Pulls runtimes + server jars from zlh-artifacts (VM 1003).
 - agent port: `18888`
 
 Code-server launch model:
-
 - binds to `0.0.0.0`
 - `--auth none`
 - API/hosted flow handles auth and proxying
@@ -107,75 +113,50 @@ Code-server launch model:
 
 ### Browser IDE (Current Working Model)
 
-```
+```text
 Browser
 ↓
-Traefik (dev-.zerolaghub.dev, 10.70.0.242)
+Traefik / hosted dev edge
 ↓
-API (10.60.0.245:4000)
+API
 ↓
 container:6000
 ```
 
 Working hosted flow:
-
 1. frontend calls `POST /api/dev/:id/ide-token`
-2. API returns `https://dev-.zerolaghub.dev/?token=...`
+2. API returns hosted IDE URL with short-lived token
 3. browser opens hosted URL
-4. Traefik wildcard router forwards to API at `http://10.60.0.245:4000`
+4. edge forwards to API
 5. API validates token, sets HTTP-only IDE cookie, redirects to clean hosted URL
-6. subsequent cookie-backed request proxied to container code-server
+6. subsequent cookie-backed requests proxy to container code-server
 7. code-server redirects to `/?folder=/home/dev/workspace`
 8. IDE loads successfully
 
-Curl-verified response chain:
+### Traefik / edge role
+- terminates TLS for hosted dev traffic
+- forwards hosted dev traffic to the API
+- preserves original `Host` header
+- does **not** route directly to containers for hosted IDE access
 
-- `GET /?token=...` → `302` + `Set-Cookie`
-- `GET /` with cookie → `302` to `/?folder=/home/dev/workspace`
-- `GET /?folder=/home/dev/workspace` → `200` code-server HTML
-
-### Traefik Role
-
-- terminates TLS via wildcard cert `*.zerolaghub.dev` (Let's Encrypt DNS-01 via Cloudflare)
-- matches `dev-*.zerolaghub.dev` via `HostRegexp`
-- forwards to API at `http://10.60.0.245:4000`
-- preserves original `Host` header (`passHostHeader: true`)
-- does NOT route directly to containers
-
-### API Role
-
-- extracts vmid from `Host` header via `handleHostedProxy`
+### API role
+- extracts vmid from hosted request context
 - validates short-lived IDE token
-- sets HTTP-only `zlh_dev_ide_token` cookie
+- sets HTTP-only IDE cookie
 - redirects token URL to clean hostname URL
-- proxies all live code-server HTTP + WebSocket traffic to correct container
-
-### Local Developer Access (Future)
+- proxies live code-server HTTP + WebSocket traffic to the correct container
 
+### Local developer access (future / separate track)
 Headscale/Tailscale for SSH, VS Code Remote, local tools.
-Headscale server: `zlh-ctl` (status to be confirmed).
 Constraints: no exit nodes, `magic_dns: false`.
 
 ### Removed / No Longer Current
-
-- path-based `/api/dev/:id/ide` as primary browser entry
+- path-based `/api/dev/:id/ide` as the primary browser entry
 - Caddy-hosted dev IDE edge
 - per-container Traefik file creation from dev provisioning
-- per-container Cloudflare/Technitium publish/unpublish from API for dev IDE access
+- per-container Cloudflare/Technitium publish/unpublish for dev IDE browser access
 
-`proxyClient.js` remains in repo — still used by game edge publish logic.
-
----
-
-## API Routes — Dev IDE
-
-```
-POST /api/dev/:id/ide-token — generate short-lived IDE token + hosted URL
-```
-
-Hosted requests land on the API through Traefik using the dev hostname.
-API handles host-based vmid extraction, token bootstrap, cookie handoff,
-HTTP + WebSocket proxy to code-server.
+`proxyClient.js` remains in repo and is still used by game edge publish logic.
 
 ---
 
@@ -183,8 +164,32 @@ HTTP + WebSocket proxy to code-server.
 
 - API polls agent `/status`
 - API exposes polled state back to frontend via `GET /api/servers/:id/status`
+- Portal uses the API-mediated hosted IDE flow
+- Portal uses the API websocket bridge for console access
+- Portal still has some migration debt through `src/lib/api/legacy.ts`
 - Portal no longer relies on stale DB-only state for console availability
-- Game publish flow remains untouched
+- Game publish flow remains untouched by dev routing work
+
+---
+
+## Velocity / Registration Model
+
+### Current model
+- API is the source of backend inventory/state for Minecraft routing
+- Velocity plugin (`ZpackVelocityBridge`) is the component that actually registers/unregisters backends inside Velocity
+- Plugin supports startup rehydrate from API plus plugin-local webhook endpoints:
+  - `POST /zpack/register`
+  - `POST /zpack/unregister`
+  - `GET /zpack/status`
+
+### Important current finding
+- The likely current bug is **not** generic “Velocity broken”
+- The likely issue is that the API/plugin path can expose/register a backend while the game server is `running` but not actually `ready`
+- The plugin is the critical part of that registration path
+
+### Important implementation note
+- Current plugin default endpoint behavior still references `zpack-api.internal.zlh` unless overridden
+- That default is stale relative to current hot-path architecture and should not be relied on long-term
 
 ---
 
@@ -199,7 +204,7 @@ Terraria, Project Zomboid
 
 ## Developer-to-Player Pipeline (Revenue Model)
 
-```
+```text
 LXC Dev Environment ($15-40/mo)
 → Game/mod creation + testing
 → Testing servers (50% dev discount)
@@ -212,12 +217,19 @@ Revenue multiplier: 1 developer → ~10 players → $147.50/mo total.
 
 ---
 
-## Open Threads
+## Open Threads (High Level)
 
-1. Verify full browser behavior + WebSocket under hosted wildcard flow
-2. Confirm "Open IDE" button in portal uses hosted URL in production path
-3. Confirm Headscale `zlh-ctl` VM status
-4. Curated provenance — tracking install origin
+1. Billing / Stripe completion
+2. Game server world backup / restore
+3. User onboarding flow
+4. Fabric readiness gating / Velocity exposure sequencing
+5. Password reset verification
+6. Usage limits / quota enforcement
+7. Email notifications
+8. Velocity resync / refresh behavior
+9. Upload testing, stress testing, OPNsense audit, provisioning validation
+
+See `OPEN_THREADS.md` for active detail and priority order.
 
 ---
 
@@ -225,11 +237,13 @@ Revenue multiplier: 1 developer → ~10 players → $147.50/mo total.
 
 | Repo | Purpose |
 |------|---------|
-| zlh-grind | Execution workspace / continuity / active constraints |
-| zlh-docs | API/agent/portal reference docs (read from source) |
-| zpack-api | API source (mirror) |
-| zpack-portal | Portal source (mirror) |
-| zlh-agent | Agent source |
+| `zlh-grind` | execution workspace / continuity / active constraints |
+| `knowledge-base` | canonical architecture / strategy / bootstrap |
+| `zlh-docs` | API/agent/portal reference docs |
+| `zpack-api` | API source |
+| `zpack-portal` | portal source |
+| `zlh-agent` | agent source |
+| `ZpackVelocityBridge` | Velocity plugin / backend registration layer |
 
 All at `git.zerolaghub.com/jester/`
 
@@ -237,13 +251,13 @@ All at `git.zerolaghub.com/jester/`
 
 ## Session Guidance
 
-- zlh-grind is the execution continuity layer, not the architecture authority
-- zlh-docs has full agent documentation (routes, filesystem rules, provisioning pipeline)
-- Agent is the authority on filesystem enforcement — API must NOT duplicate filesystem logic
+- `knowledge-base` is the architecture authority
+- `zlh-grind` is the execution continuity layer
+- `INFRASTRUCTURE.md` is the authoritative VM/IP inventory
+- Agent is the authority on filesystem enforcement — API must **not** duplicate filesystem logic
 - Portal does not enforce real policy — agent enforces
-- Portal never calls agents directly — all traffic through API
-- Upload transport uses raw http.request piping, never fetch()
-- VMs 100, 101, 103, 1000 are legacy/unused — not active production
+- Portal never calls agents directly — all traffic goes through API
+- Upload transport uses raw `http.request` piping, never `fetch()`
 - Do not mark unimplemented work as complete
 - Game publish flow must never be modified by dev routing changes
 - `proxyClient.js` must not be deleted — used by game edge publish path
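
Reviewer sketch (not part of the patch): the crash-recovery rule stated in the Agent section (backoff 30s/60s/120s, reset once uptime reaches 30s, `error` after repeated failures) can be illustrated as a small pure function. This is only a sketch under assumptions. The agent itself is written in Go, the names `nextRecoveryStep` and `RecoveryDecision` are hypothetical, and reading "repeated failures" as "all three backoff steps have been used" is a guess that the diff does not confirm.

```typescript
// Hypothetical illustration of the documented crash-recovery policy.
// Names and the "error after the schedule is exhausted" rule are assumptions.

type RecoveryDecision =
  | { action: "restart"; delaySeconds: number; consecutiveCrashes: number }
  | { action: "error"; consecutiveCrashes: number };

// From the document: backoff 30s/60s/120s.
const BACKOFF_SECONDS: number[] = [30, 60, 120];

// From the document: the crash counter resets if uptime >= 30s.
const STABLE_UPTIME_SECONDS = 30;

// previousCrashes: consecutive crashes recorded before this one.
// uptimeSeconds: how long the process ran before this crash.
function nextRecoveryStep(previousCrashes: number, uptimeSeconds: number): RecoveryDecision {
  // A run that stayed up long enough counts as healthy, so start over.
  const prior = uptimeSeconds >= STABLE_UPTIME_SECONDS ? 0 : previousCrashes;
  const consecutiveCrashes = prior + 1;

  const delaySeconds = BACKOFF_SECONDS[consecutiveCrashes - 1];
  if (delaySeconds === undefined) {
    // Assumption: once every backoff step is used, move to the `error` state.
    return { action: "error", consecutiveCrashes };
  }
  return { action: "restart", delaySeconds, consecutiveCrashes };
}
```

Under these assumptions, `nextRecoveryStep(2, 5)` asks for a restart after 120 seconds, `nextRecoveryStep(3, 5)` returns the `error` decision, and `nextRecoveryStep(3, 45)` starts the schedule again at 30 seconds because the process stayed up past the reset threshold.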