# ZeroLagHub Project Context
## What It Is
Game server hosting platform targeting modded, indie, and emerging games.
Competitive advantages: LXC containers (20-30% better performance than Docker),
a custom agent architecture, an open-source stack, and a developer-to-player
pipeline that turns mod developers into a distribution channel.

System posture: stable, in a controlled expansion phase.

---
## Naming Convention
- `zlh-*` = core infrastructure (DNS, monitoring, backup, routing, artifacts)
- `zpack-*` = game and dev server stack (portal, API, containers)
---
## Infrastructure (Proxmox)
### Active VMs
| VM | Name | Role |
|----|------|------|
| 104 | zlh-monitor | Prometheus/Grafana monitoring |
| 105 | zlh-router | Core services router |
| 300 | zlh-velocity | Minecraft Velocity proxy |
| 1001 | zlh-dns | Technitium DNS |
| 1002 | zlh-proxy | Traefik — core/frontend SSL termination (portal traffic) |
| 1003 | zlh-artifacts | Runtime binaries + Minecraft server jars (agent install source) |
| 1004 | zlh-zpack-proxy | Traefik — game/dev edge routing + dev IDE wildcard TLS |
| 1005 | zpack-api | Node.js API |
| 1006 | zlh-zpack-router | Game/dev router |
| 1100 | zpack-portal | Next.js frontend |
| 2001 | zlh-back | PBS backup + Backblaze B2 |
### Legacy / Reference Only (not active production)
| VM | Name | Notes |
|----|------|-------|
| 100 | zlh-panel | Old Pterodactyl panel — kept for reference |
| 101 | zlh-wings | Old Wings — kept for reference |
| 103 | zlh-api | Old API VM — kept for reference |
| 1000 | zlh-router | Not in use |
---
## Stack
**API (zpack-api, VM 1005):** Node.js ESM, Express 5, Prisma 6, MariaDB,
Redis, BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware

**Portal (zpack-portal, VM 1100):** Next.js 15, TypeScript, TailwindCSS,
Axios, WebSocket console. Sci-fi HUD aesthetic (steel textures, neon
accents, beveled panels).

**Agent (zlh-agent):** Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket.
Runs inside every game/dev container and is the only process with direct
filesystem access. Pulls runtimes + server jars from zlh-artifacts (VM 1003).

---
## Agent (Operational)
- HTTP server on :18888, internal only — API is the only caller
- Container types: `game` and `dev`
- Lifecycle: `POST /config` triggers an async provision + start pipeline
- Filesystem: strict path allowlist for games, workspace-root sandbox for dev containers
- Upload transport: raw `http.request` piping (`req.pipe(proxyReq)`), never fetch()
- Console: PTY-backed WebSocket, one read loop per container
- Self-update: periodic check + apply
- Forge/Neoforge: automated 5-step post-install patch sequence
- Modrinth mod lifecycle: install/enable/disable/delete — fully operational
- Provenance: `.zlh_metadata.json` — source is `null` if not set
- Status transport model: poll-based (`/status`), not push-based
- Lifecycle states: `idle`, `installing`, `starting`, `running`, `stopping`, `crashed`, `error`
- Crash recovery: backoff 30s/60s/120s, resets if uptime ≥ 30s, `error` state after repeated failures
- Crash observability: exit code, signal, uptime, log tail, classification (oom/mod_error/missing_dep/nonzero/unexpected)
- Structured logging across provisioning, installs, file ops, control plane
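
The crash-recovery rules above can be sketched as a small policy function. This is TypeScript for illustration only (the agent itself is Go), and it rests on assumptions: the counter tracks consecutive short-lived crashes, a run of at least 30s resets it, and "repeated failures" is read as a crash after the 120s step has already been used. `onCrash` and `CrashDecision` are hypothetical names.

```typescript
// Agent lifecycle states as listed above.
type AgentState =
  | "idle" | "installing" | "starting" | "running"
  | "stopping" | "crashed" | "error";

// Backoff ladder and "stable run" threshold from the crash-recovery notes.
const BACKOFF_SECONDS = [30, 60, 120] as const;
const STABLE_UPTIME_SECONDS = 30;

interface CrashDecision {
  nextState: AgentState;          // "crashed" while retrying, "error" when giving up
  restartDelaySec: number | null; // null means no automatic restart
  consecutiveCrashes: number;     // counter to carry into the next crash
}

// Hypothetical helper: decide what happens after the server process exits unexpectedly.
// `priorCrashes` counts consecutive short-lived crashes before this one.
function onCrash(priorCrashes: number, uptimeSec: number): CrashDecision {
  // A run that stayed up for at least 30s counts as healthy, so the ladder resets.
  const crashes = uptimeSec >= STABLE_UPTIME_SECONDS ? 0 : priorCrashes;
  if (crashes >= BACKOFF_SECONDS.length) {
    // Assumption: once every backoff step has been used, stop retrying.
    return { nextState: "error", restartDelaySec: null, consecutiveCrashes: crashes };
  }
  return {
    nextState: "crashed",
    restartDelaySec: BACKOFF_SECONDS[crashes],
    consecutiveCrashes: crashes + 1,
  };
}

// Four quick crashes in a row: prints crashed 30, crashed 60, crashed 120, error null.
let count = 0;
for (let i = 0; i < 4; i++) {
  const d = onCrash(count, 5);
  console.log(d.nextState, d.restartDelaySec);
  count = d.consecutiveCrashes;
}
```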
---
## Dev Containers (Current State)
- supported runtimes: node, python, go, java, dotnet
- runtime installs are artifact-backed and idempotent
- runtime root: `/opt/zlh/runtimes/<runtime>/<version>`
- dev identity: `dev:dev`
- workspace root: `/home/dev/workspace`
- shell env: `HOME`, `USER`, `LOGNAME`, `TERM` set correctly
- code-server install path: `/opt/zlh/services/code-server`
- code-server port: `6000`
- code-server lifecycle: `POST /dev/codeserver/start|stop|restart`
- code-server detection: `/proc/*/cmdline` scan
- agent port: `18888`
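
The `/proc/*/cmdline` detection can be sketched as follows. This is TypeScript for illustration (the agent does this in Go). Each `/proc/<pid>/cmdline` holds the process's argv as NUL-separated strings. Matching any argument that contains the code-server install path is an assumption about how the agent identifies the process, and `cmdlineMatches` and `findCodeServerPids` are hypothetical names.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Install path from the dev container notes; used as the match needle.
const CODE_SERVER_ROOT = "/opt/zlh/services/code-server";

// /proc/<pid>/cmdline holds argv as NUL-separated strings.
function cmdlineMatches(raw: string, needle: string): boolean {
  return raw.split("\0").some((arg) => arg.includes(needle));
}

// Return PIDs of processes whose argv references the needle (by default the code-server path).
function findCodeServerPids(procRoot = "/proc", needle = CODE_SERVER_ROOT): number[] {
  const pids: number[] = [];
  for (const entry of readdirSync(procRoot)) {
    if (!/^\d+$/.test(entry)) continue; // only numeric entries are processes
    let raw: string;
    try {
      raw = readFileSync(join(procRoot, entry, "cmdline"), "utf8");
    } catch {
      continue; // process exited mid-scan or is not readable
    }
    if (cmdlineMatches(raw, needle)) pids.push(Number(entry));
  }
  return pids;
}
```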

Code-server launch model:
- binds to `0.0.0.0`
- `--auth none`
- API/hosted flow handles auth and proxying
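
The agent confines dev containers to a sandbox rooted at the workspace (`/home/dev/workspace`). A minimal containment check for that sandbox might look like this. It is TypeScript for illustration only (the agent enforces this in Go), and `resolveInWorkspace` is a hypothetical name. The check is purely lexical: a production version would also resolve symlinks (for example with `realpath`) before comparing.

```typescript
import { posix } from "node:path";

const WORKSPACE_ROOT = "/home/dev/workspace";

// Resolve a client-supplied path against the workspace root.
// Returns the absolute path if it stays inside the root, otherwise null.
// Lexical check only: ".." is normalized, symlinks are not followed.
function resolveInWorkspace(requested: string, root: string = WORKSPACE_ROOT): string | null {
  const base = posix.resolve(root);
  // Relative paths are joined onto the root; absolute paths are taken as-is
  // and must already point inside it.
  const target = posix.resolve(base, requested);
  // Compare with a trailing slash so "/home/dev/workspace-other" is rejected.
  if (target === base || target.startsWith(base + "/")) return target;
  return null;
}
```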
---
## Dev Container Access Model
### Browser IDE (Current Working Model)
```
Browser
  → Traefik (dev-<vmid>.zerolaghub.dev, 10.70.0.242)
  → API (10.60.0.245:4000)
  → container:6000
```
Working hosted flow:
1. frontend calls `POST /api/dev/:id/ide-token`
2. API returns `https://dev-<vmid>.zerolaghub.dev/?token=...`
3. browser opens hosted URL
4. Traefik wildcard router forwards to API at `http://10.60.0.245:4000`
5. API validates token, sets HTTP-only IDE cookie, redirects to clean hosted URL
6. subsequent cookie-backed requests are proxied to the container's code-server
7. code-server redirects to `/?folder=/home/dev/workspace`
8. IDE loads successfully

Curl-verified response chain:
- `GET /?token=...` → `302` + `Set-Cookie`
- `GET /` with cookie → `302` to `/?folder=/home/dev/workspace`
- `GET /?folder=/home/dev/workspace` → `200` code-server HTML
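
The token-to-cookie handoff in steps 4-6, along with the first two hops of the curl chain, can be modeled as a pure decision function. This is a TypeScript sketch under stated assumptions, not the API's actual `handleHostedProxy`. The `HostedAction` shape, the cookie attributes (`Secure`, `SameSite=Lax`, `Path=/`), and the assumption that the cookie carries the same token value are all illustrative. `isValid` stands in for whatever token check the API performs.

```typescript
const COOKIE_NAME = "zlh_dev_ide_token";

type HostedAction =
  | { kind: "redirect"; location: string; setCookie: string } // 302 + Set-Cookie
  | { kind: "proxy"; token: string }                          // forward to code-server
  | { kind: "unauthorized" };

// Minimal Cookie header parser: "a=1; b=2" -> value for `name`, or null.
function readCookie(header: string | undefined, name: string): string | null {
  if (!header) return null;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === name) {
      return decodeURIComponent(part.slice(eq + 1).trim());
    }
  }
  return null;
}

// Decide how to answer one request that reached the API on a dev-<vmid> hostname.
function routeHostedRequest(
  url: URL,
  cookieHeader: string | undefined,
  isValid: (token: string) => boolean,
): HostedAction {
  const queryToken = url.searchParams.get("token");
  if (queryToken !== null) {
    if (!isValid(queryToken)) return { kind: "unauthorized" };
    // Strip the token so the browser lands on the clean hosted URL.
    const clean = new URL(url.href);
    clean.searchParams.delete("token");
    return {
      kind: "redirect",
      location: clean.pathname + clean.search,
      setCookie: `${COOKIE_NAME}=${encodeURIComponent(queryToken)}; HttpOnly; Secure; SameSite=Lax; Path=/`,
    };
  }
  const cookieToken = readCookie(cookieHeader, COOKIE_NAME);
  if (cookieToken !== null && isValid(cookieToken)) return { kind: "proxy", token: cookieToken };
  return { kind: "unauthorized" };
}
```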
### Traefik Role
- terminates TLS via wildcard cert `*.zerolaghub.dev` (Let's Encrypt DNS-01 via Cloudflare)
- matches `dev-*.zerolaghub.dev` via `HostRegexp`
- forwards to API at `http://10.60.0.245:4000`
- preserves original `Host` header (`passHostHeader: true`)
- does NOT route directly to containers
### API Role
- extracts vmid from `Host` header via `handleHostedProxy`
- validates short-lived IDE token
- sets HTTP-only `zlh_dev_ide_token` cookie
- redirects token URL to clean hostname URL
- proxies all live code-server HTTP + WebSocket traffic to correct container
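
Host-based vmid extraction can be sketched as follows. The pattern mirrors the `dev-*.zerolaghub.dev` match that Traefik applies and assumes the vmid is a numeric Proxmox ID. `extractVmid` is a hypothetical helper, not the real `handleHostedProxy`. Requiring an exact match, rather than taking the first DNS label, prevents look-alike hosts such as `dev-1.zerolaghub.dev.example.com` from selecting a container.

```typescript
// Exact match on dev-<digits>.zerolaghub.dev, case-insensitive like DNS names.
const DEV_HOST = /^dev-(\d+)\.zerolaghub\.dev$/i;

// Return the vmid encoded in a Host header, or null if it is not a dev IDE host.
function extractVmid(hostHeader: string | undefined): number | null {
  if (!hostHeader) return null;
  // Drop an optional ":port" suffix (Host may carry one, e.g. ":443").
  const host = hostHeader.trim().replace(/:\d+$/, "");
  const match = DEV_HOST.exec(host);
  return match ? Number(match[1]) : null;
}
```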
### Local Developer Access (Future)
Headscale/Tailscale for SSH, VS Code Remote, local tools.
Headscale server: `zlh-ctl` (status to be confirmed).
Constraints: no exit nodes, `magic_dns: false`.
### Removed / No Longer Current
- path-based `/api/dev/:id/ide` as primary browser entry
- Caddy-hosted dev IDE edge
- per-container Traefik file creation from dev provisioning
- per-container Cloudflare/Technitium publish/unpublish from API for dev IDE access

`proxyClient.js` remains in the repo because the game edge publish logic still uses it.

---
## API Routes — Dev IDE
```
POST /api/dev/:id/ide-token — generate short-lived IDE token + hosted URL
```
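
One way to produce the short-lived token behind `POST /api/dev/:id/ide-token` is an HMAC-signed payload that carries the vmid and an expiry. The sketch below is an assumption, not the API's documented scheme. The token format, the 60-second default TTL, and the names `mintIdeToken`, `verifyIdeToken`, and `hostedIdeUrl` are hypothetical. It uses only `node:crypto` (`createHmac`, `timingSafeEqual`).

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface IdeTokenClaims {
  vmid: number;
  exp: number; // expiry, Unix seconds
}

const nowSeconds = (): number => Math.floor(Date.now() / 1000);

function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

// Token = base64url(JSON claims) + "." + base64url(HMAC-SHA256 of that payload).
function mintIdeToken(vmid: number, secret: string, ttlSec = 60, now = nowSeconds()): string {
  const claims: IdeTokenClaims = { vmid, exp: now + ttlSec };
  const payload = Buffer.from(JSON.stringify(claims)).toString("base64url");
  return `${payload}.${sign(payload, secret)}`;
}

// Accept only an untampered, unexpired token minted for this vmid.
function verifyIdeToken(token: string, secret: string, vmid: number, now = nowSeconds()): boolean {
  const parts = token.split(".");
  if (parts.length !== 2) return false;
  const [payload, sig] = parts;
  const expected = Buffer.from(sign(payload, secret));
  const given = Buffer.from(sig);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) return false;
  try {
    const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as IdeTokenClaims;
    return claims.vmid === vmid && claims.exp > now;
  } catch {
    return false;
  }
}

// Hosted URL in the shape returned by the ide-token route.
function hostedIdeUrl(vmid: number, token: string): string {
  return `https://dev-${vmid}.zerolaghub.dev/?token=${encodeURIComponent(token)}`;
}
```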
Hosted requests reach the API through Traefik on the dev hostname. The API
handles host-based vmid extraction, token bootstrap, cookie handoff, and
HTTP + WebSocket proxying to code-server.

---
## API / Frontend Status
- API polls agent `/status`
- API exposes polled state back to frontend via `GET /api/servers/:id/status`
- Portal no longer relies on stale DB-only state for console availability
- Game publish flow remains untouched
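
A minimal sketch of the poll-and-expose pattern follows. The API records each agent `/status` result with a timestamp and serves only snapshots younger than a cutoff, so the status route never reports stale state as live. `createStatusCache`, the 10-second `maxAgeMs`, the `state` field name, and the injected `fetchStatus` (which stands in for the HTTP call to the agent on port 18888) are assumptions for illustration.

```typescript
// Stand-in for the HTTP call to the agent's /status endpoint.
type FetchStatus = (serverId: string) => Promise<{ state: string }>;

interface StatusSnapshot {
  state: string;     // e.g. "running" or "crashed", as reported by the agent
  fetchedAt: number; // ms timestamp when the successful poll started
}

function createStatusCache(fetchStatus: FetchStatus, maxAgeMs = 10_000) {
  const snapshots = new Map<string, StatusSnapshot>();

  // Poll the agent once; if the call fails, the previous snapshot is kept and ages out.
  async function refresh(serverId: string, now = Date.now()): Promise<void> {
    const { state } = await fetchStatus(serverId);
    snapshots.set(serverId, { state, fetchedAt: now });
  }

  // What GET /api/servers/:id/status would return: latest snapshot, or null if too old.
  function get(serverId: string, now = Date.now()): StatusSnapshot | null {
    const snap = snapshots.get(serverId);
    return snap && now - snap.fetchedAt <= maxAgeMs ? snap : null;
  }

  // Poll every server on an interval; returns a function that stops polling.
  function start(serverIds: string[], intervalMs: number): () => void {
    const timer = setInterval(() => {
      for (const id of serverIds) {
        refresh(id).catch(() => {
          /* agent unreachable: keep the last snapshot until it goes stale */
        });
      }
    }, intervalMs);
    return () => clearInterval(timer);
  }

  return { refresh, get, start };
}
```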
---
## Game Support
**Production:** Minecraft (vanilla/Fabric/Paper/Forge/Neoforge), Rust,
Terraria, Project Zomboid

**In Pipeline:** Valheim, Palworld, Vintage Story, Core Keeper

---
## Developer-to-Player Pipeline (Revenue Model)
```
LXC Dev Environment ($15-40/mo)
→ Game/mod creation + testing
→ Testing servers (50% dev discount)
→ Player community referrals (25% player discount)
→ Developer revenue share (5-10% commission)
→ Viral growth
```
Revenue multiplier: 1 developer → ~10 players → $147.50/mo total.

---
## Open Threads
1. Verify full browser behavior + WebSocket under hosted wildcard flow
2. Confirm "Open IDE" button in portal uses hosted URL in production path
3. Confirm Headscale `zlh-ctl` VM status
4. Curated provenance — tracking install origin
---
## Repo Registry
| Repo | Purpose |
|------|---------|
| zlh-grind | Execution workspace / continuity / active constraints |
| zlh-docs | API/agent/portal reference docs (read from source) |
| zpack-api | API source (mirror) |
| zpack-portal | Portal source (mirror) |
| zlh-agent | Agent source |

All repos are hosted at `git.zerolaghub.com/jester/<repo>`.

---
## Session Guidance
- zlh-grind is the execution continuity layer, not the architecture authority
- zlh-docs has full agent documentation (routes, filesystem rules, provisioning pipeline)
- Agent is the authority on filesystem enforcement — API must NOT duplicate filesystem logic
- Portal does not enforce real policy — agent enforces
- Portal never calls agents directly — all traffic through API
- Upload transport uses raw `http.request` piping, never `fetch()`
- VMs 100, 101, 103, 1000 are legacy/unused — not active production
- Do not mark unimplemented work as complete
- Game publish flow must never be modified by dev routing changes
- `proxyClient.js` must not be deleted — used by game edge publish path
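
The upload rule (stream with `http.request` and `req.pipe(proxyReq)`, never `fetch()`) can be sketched on the API side. This is a minimal TypeScript illustration under stated assumptions: `pipeUploadToAgent` and the agent path are hypothetical, and auth, timeouts, and hop-by-hop header cleanup are omitted. Path checks stay with the agent, as the guidance above requires. Piping streams the request body to the agent with backpressure instead of buffering it in the API.

```typescript
import * as http from "node:http";

// Stream an incoming upload to the agent without buffering the body.
function pipeUploadToAgent(
  req: http.IncomingMessage,
  res: http.ServerResponse,
  agentHost: string,
  agentPort: number,
  agentPath: string,
): void {
  const proxyReq = http.request(
    {
      host: agentHost,
      port: agentPort,
      path: agentPath,
      method: req.method,
      // Forward client headers (content-type, content-length) but target the agent's host.
      headers: { ...req.headers, host: `${agentHost}:${agentPort}` },
    },
    (proxyRes) => {
      // Relay the agent's status, headers, and body back to the caller.
      res.writeHead(proxyRes.statusCode ?? 502, proxyRes.headers);
      proxyRes.pipe(res);
    },
  );
  proxyReq.on("error", () => {
    // Agent unreachable or connection reset: report a gateway error.
    if (!res.headersSent) res.writeHead(502);
    res.end();
  });
  req.pipe(proxyReq); // stream the request body straight through
}
```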