ZeroLagHub Project Context

What It Is

Game server hosting platform targeting modded, indie, and emerging games. Competitive advantages: LXC containers (20-30% performance gain over Docker), a custom agent architecture, an open-source stack, and a developer-to-player pipeline that turns mod developers into a distribution channel.

System posture: stable, controlled expansion phase.


Naming Convention

  • zlh-* = core infrastructure (DNS, monitoring, backup, routing, artifacts)
  • zpack-* = game and dev server stack (portal, API, containers)

Infrastructure (Proxmox)

Active VMs

| VM | Name | Role |
|---|---|---|
| 104 | zlh-monitor | Prometheus/Grafana monitoring |
| 105 | zlh-router | Core services router |
| 300 | zlh-velocity | Minecraft Velocity proxy |
| 1001 | zlh-dns | Technitium DNS |
| 1002 | zlh-proxy | Traefik: core/frontend SSL termination (portal traffic) |
| 1003 | zlh-artifacts | Runtime binaries + Minecraft server jars (agent install source) |
| 1004 | zlh-zpack-proxy | Traefik: game server traffic only |
| 1005 | zpack-api | Node.js API |
| 1006 | zlh-zpack-router | Game server router |
| 1100 | zpack-portal | Next.js frontend |
| 2001 | zlh-back | PBS backup + Backblaze B2 |

Legacy / Reference Only (not active production)

| VM | Name | Notes |
|---|---|---|
| 100 | zlh-panel | Old Pterodactyl panel, kept for reference |
| 101 | zlh-wings | Old Wings, kept for reference |
| 103 | zlh-api | Old API VM, kept for reference |
| 1000 | zlh-router | Not in use |

Stack

API (zpack-api, VM 1005): Node.js ESM, Express 5, Prisma 6, MariaDB, Redis, BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware

Portal (zpack-portal, VM 1100): Next.js 15, TypeScript, TailwindCSS, Axios, WebSocket console. Sci-fi HUD aesthetic (steel textures, neon accents, beveled panels).

Agent (zlh-agent): Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket. Runs inside every game/dev container. Only process with direct filesystem access. Pulls runtimes + server jars from zlh-artifacts (VM 1003).


Agent (Operational)

  • HTTP server on :18888, internal only — API is the only caller
  • Container types: game and dev
  • Lifecycle: POST /config triggers async provision + start pipeline
  • Filesystem: strict path allowlist for games, workspace-root sandbox for dev containers
  • Upload transport: raw http.request piping (req.pipe(proxyReq)), never fetch()
  • Console: PTY-backed WebSocket, one read loop per container
  • Self-update: periodic check + apply
  • Forge/NeoForge: automated 5-step post-install patch sequence
  • Modrinth mod lifecycle: install/enable/disable/delete — fully operational
  • Provenance: recorded in .zlh_metadata.json; source is null if not set
  • Status transport model: poll-based (/status), not push-based
  • States: idle, installing, starting, running, stopping, crashed, error
  • Crash recovery: restart backoff of 30s, 60s, then 120s; resets after a run with uptime ≥ 30s; moves to the error state after repeated failures
  • Crash observability: exit code, signal, uptime, log tail, classification (oom/mod_error/missing_dep/nonzero/unexpected)
  • Structured logging across provisioning, installs, file ops, control plane

Dev Containers (Current State)

  • supported runtimes: node, python, go, java, dotnet
  • runtime installs are artifact-backed and idempotent
  • runtime root: /opt/zlh/runtimes/<runtime>/<version>
  • dev identity: dev:dev
  • workspace root: /home/dev/workspace
  • shell env: HOME, USER, LOGNAME, TERM set correctly
  • code-server install path: /opt/zlh/services/code-server
  • code-server port: 6000
  • code-server lifecycle: POST /dev/codeserver/start|stop|restart
  • code-server detection: /proc/*/cmdline scan
  • agent port: 18888
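A minimal sketch of the dev-container workspace sandbox check. It assumes the agent resolves caller-supplied relative paths against /home/dev/workspace and rejects anything that lexically escapes that root. `resolveInWorkspace` is a hypothetical name. The check does not follow symlinks, so a real implementation would also need to handle links that point outside the root.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// workspaceRoot is the dev-container workspace root listed above.
const workspaceRoot = "/home/dev/workspace"

// resolveInWorkspace turns a caller-supplied relative path into an absolute
// path under workspaceRoot, rejecting absolute paths and ".." escapes.
// It is a lexical check only: symlinks are not resolved.
func resolveInWorkspace(rel string) (string, error) {
	if filepath.IsAbs(rel) {
		return "", fmt.Errorf("absolute path not allowed: %q", rel)
	}
	abs := filepath.Join(workspaceRoot, rel) // Join also cleans "." and ".." segments
	r, err := filepath.Rel(workspaceRoot, abs)
	if err != nil || r == ".." || strings.HasPrefix(r, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path escapes workspace: %q", rel)
	}
	return abs, nil
}

func main() {
	for _, p := range []string{"src/main.go", "a/../b", "../../etc/passwd", "/etc/passwd"} {
		abs, err := resolveInWorkspace(p)
		fmt.Printf("%-18q -> %q, err=%v\n", p, abs, err)
	}
}
```

Game containers use a strict path allowlist instead, which this sketch does not cover.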

Current blocking issue: the agent's code-server launch args are missing --base-path /api/dev/<vmid>/ide. Without it, the IDE fails with WebSocket close code 1006 (abnormal closure), a filesystem provider failure, and an extension host crash. The fix is a one-line change in the agent launch script.


Dev Container Access Model

Browser IDE (API implemented, agent fix pending)

Browser → Portal → API (/api/dev/:id/ide) → container:6000

The portal calls POST /api/dev/:id/ide-token and opens the returned URL in a new tab. Token TTL: 300s. The proxy accepts the token via Authorization: Bearer or ?token=, and WebSocket upgrades are validated with the same token. Containers are never publicly exposed.

Local Developer Access (Future)

Headscale/Tailscale for SSH, VS Code Remote, local tools. Headscale server: zlh-ctl (status to be confirmed). Constraints: no exit nodes, magic_dns: false.

Removed

The DNS-per-container + Traefik dynamic routing approach was abandoned. Removed from the API: devRouting.js, devDePublisher.js, and Traefik file writes. proxyClient.js is retained because the game edge publish path still uses it.


API Routes — Dev IDE

POST /api/dev/:id/ide-token   — generate short-lived IDE token
GET  /api/dev/:id/ide         — proxy to container:6000
GET  /api/dev/:id/ide/*       — proxy to container:6000
GET  /api/servers/:id/status  — expose polled agent state to frontend
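The IDE proxy routes can be sketched as a path matcher plus a reverse proxy to container:6000. The real API uses Express with http-proxy-middleware, so this Go version only shows the routing shape. `parseDevIDEPath` and `newIDEProxy` are hypothetical names. The sketch assumes the container IP lookup and the token check happen before the proxy is invoked, and that paths are forwarded unchanged, which is consistent with the code-server base-path fix described above.

```go
package main

import (
	"fmt"
	"net"
	"net/http/httputil"
	"net/url"
	"strings"
)

// codeServerPort is the dev-container code-server port listed above.
const codeServerPort = "6000"

// parseDevIDEPath extracts the container id from /api/dev/<id>/ide or
// /api/dev/<id>/ide/<rest>; anything else (including .../ide-token) fails.
func parseDevIDEPath(p string) (string, bool) {
	rest, found := strings.CutPrefix(p, "/api/dev/")
	if !found {
		return "", false
	}
	id, tail, _ := strings.Cut(rest, "/")
	if id == "" || (tail != "ide" && !strings.HasPrefix(tail, "ide/")) {
		return "", false
	}
	return id, true
}

// newIDEProxy forwards requests to http://<containerIP>:6000 and keeps the
// request path as-is, so code-server receives the full /api/dev/<id>/ide/...
// prefix. httputil.ReverseProxy also forwards WebSocket upgrade requests.
func newIDEProxy(containerIP string) *httputil.ReverseProxy {
	target := &url.URL{Scheme: "http", Host: net.JoinHostPort(containerIP, codeServerPort)}
	return httputil.NewSingleHostReverseProxy(target)
}

func main() {
	for _, p := range []string{"/api/dev/5001/ide", "/api/dev/5001/ide/static/app.js", "/api/dev/5001/ide-token"} {
		id, ok := parseDevIDEPath(p)
		fmt.Println(p, id, ok)
	}
}
```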

API / Frontend Status

  • API polls agent /status
  • API exposes polled state back to frontend via GET /api/servers/:id/status
  • Portal no longer relies on stale DB-only state for console availability
  • Game publish flow remains untouched

Game Support

Production: Minecraft (vanilla/Fabric/Paper/Forge/NeoForge), Rust, Terraria, Project Zomboid

In Pipeline: Valheim, Palworld, Vintage Story, Core Keeper


Developer-to-Player Pipeline (Revenue Model)

LXC Dev Environment ($15-40/mo)
  → Game/mod creation + testing
  → Testing servers (50% dev discount)
  → Player community referrals (25% player discount)
  → Developer revenue share (5-10% commission)
  → Viral growth

Revenue multiplier: 1 developer → ~10 players → $147.50/mo total.


Open Threads

  1. Agent: fix code-server --base-path /api/dev/<vmid>/ide — unblocks IDE
  2. Portal: "Open IDE" button calling /api/dev/:id/ide-token
  3. Confirm Headscale zlh-ctl VM status
  4. Curated provenance — tracking install origin

Repo Registry

| Repo | Purpose |
|---|---|
| zlh-grind | Execution workspace / continuity / active constraints |
| zlh-docs | API/agent/portal reference docs (read from source) |
| zpack-api | API source (mirror) |
| zpack-portal | Portal source (mirror) |
| zlh-agent | Agent source |

All at git.zerolaghub.com/jester/<repo>


Session Guidance

  • zlh-grind is the execution continuity layer, not the architecture authority
  • zlh-docs has full agent documentation (routes, filesystem rules, provisioning pipeline)
  • Agent is the authority on filesystem enforcement — API must NOT duplicate filesystem logic
  • Portal does not enforce real policy — agent enforces
  • Portal never calls agents directly — all traffic through API
  • Upload transport uses raw http.request piping, never fetch()
  • VMs 100, 101, 103, 1000 are legacy/unused — not active production
  • Do not mark unimplemented work as complete
  • Game publish flow must never be modified by dev routing changes
  • proxyClient.js must not be deleted — used by game edge publish path