
ZeroLagHub Project Context

What It Is

Game server hosting platform targeting modded, indie, and emerging games. Competitive advantages: LXC containers, custom agent architecture, open-source stack, and a developer-to-player pipeline that turns mod developers into a distribution channel.

System posture: stable, controlled expansion phase.


Naming Convention

  • zlh-* = core infrastructure (routing, monitoring, backup, artifacts, shared services)
  • zpack-* = game and dev stack (portal, API, containers, Velocity/game edge)

Infrastructure (Current Reality)

Active host / environment

  • Active dedicated host: GTHost Detroit
  • Denver host: decommissioned Apr 2, 2026
  • Active infra VM/LXC IDs are in the 9000s range
  • Legacy IDs in the 100s / 300s / 1000s / 2000s are reference only and should not be treated as active production

Infrastructure source of truth

  • Authoritative VM/IP inventory: INFRASTRUCTURE.md
  • PROJECT_CONTEXT.md is a platform snapshot, not the authoritative VM/IP registry

Active core infrastructure (high level)

  • 9001 zlh-router — OPNsense core router
  • 9002 zpack-router — OPNsense game/dev router
  • 9010 zpack-dns — Technitium DNS
  • 9011 zlh-proxy — core reverse proxy
  • 9012 zpack-proxy — game/dev edge proxy
  • 9014 zlh-artifacts — runtime, jar, and agent artifact server
  • 9015 zpack-velocity — Velocity proxy
  • 9016 zlh-monitor — Prometheus/Grafana
  • 9017 zlh-back — Proxmox Backup Server
  • 9020 zpack-api — API VM
  • 9021 zpack-portal — portal VM

Service discovery note

  • internal.zlh is not currently used for hot-path runtime service discovery
  • Prefer explicit env-configured IPs / addresses in runtime-critical paths
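A minimal sketch of this preference, assuming a hypothetical ZPACK_API_ADDR environment variable (real variable names live in each service's deployment config):

```ts
// Resolve a service address from explicit env configuration on
// runtime-critical paths; never fall back to internal.zlh DNS here.
// ZPACK_API_ADDR is an illustrative name, not a documented variable.
const apiAddr = process.env.ZPACK_API_ADDR; // e.g. "10.x.x.x:3000"
if (!apiAddr) {
  // Fail fast at boot rather than silently falling back to DNS discovery.
  throw new Error("ZPACK_API_ADDR must be set explicitly");
}
```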

Stack

API (zpack-api): Node.js ESM, Express 5, Prisma 6, MariaDB, Redis, BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware

Portal (zpack-portal): Next.js 15, TypeScript, TailwindCSS, Axios, WebSocket console. Sci-fi HUD aesthetic (steel textures, neon accents, beveled panels).

Agent (zlh-agent): Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket. Runs inside every game/dev container. Only process with direct filesystem access. Pulls runtimes + server jars from zlh-artifacts.

Velocity plugin (ZpackVelocityBridge): custom Velocity-side bridge that hydrates/registers backend servers for the proxy and exposes plugin-local register/unregister/status HTTP endpoints.


Agent (Operational)

  • HTTP server on :18888, internal only — API is the only intended caller
  • Container types: game and dev
  • Lifecycle: POST /config triggers async provision + start pipeline
  • Filesystem: strict path allowlist for games, workspace-root sandbox for dev containers
  • Upload transport: raw http.request piping (req.pipe(proxyReq)), never fetch(); a transport sketch follows this list
  • Console: PTY-backed WebSocket, one read loop per container
  • Self-update: periodic check + apply
  • Forge/Neoforge: automated post-install patch sequence
  • Modrinth mod lifecycle: install/enable/disable/delete — operational
  • Provenance: .zlh_metadata.json — source is null if not set
  • Status transport model: poll-based (/status), not push-based
  • State transitions: idle, installing, starting, running, stopping, crashed, error
  • Crash recovery: backoff 30s/60s/120s, resets if uptime ≥ 30s, error state after repeated failures (backoff sketched after this list)
  • Crash observability: exit code, signal, uptime, log tail, classification
  • Real Minecraft readiness probing exists in internal/minecraft/readiness.go
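The upload transport rule maps to a concrete Node pattern. A minimal sketch, assuming Express handlers and an illustrative agent address; only req.pipe(proxyReq), the fetch() prohibition, and port 18888 come from this document:

```ts
import http from "node:http";
import type { Request, Response } from "express";

// Stream the inbound upload body straight into a raw http.request to the
// agent: no buffering, and never fetch().
function forwardUpload(req: Request, res: Response, agentHost: string) {
  const proxyReq = http.request(
    {
      host: agentHost,
      port: 18888, // agent HTTP port (internal only)
      path: req.originalUrl,
      method: req.method,
      headers: req.headers,
    },
    (proxyRes) => {
      // Relay the agent's response back to the caller, also streamed.
      res.writeHead(proxyRes.statusCode ?? 502, proxyRes.headers);
      proxyRes.pipe(res);
    }
  );
  proxyReq.on("error", () => {
    if (!res.headersSent) res.status(502).end();
  });
  req.pipe(proxyReq); // the piping pattern named in this document
}
```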
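The crash-recovery policy itself lives in the Go agent; the following TypeScript sketch only illustrates the backoff ladder and reset rule described above:

```ts
// Fixed 30s/60s/120s backoff; a run with uptime >= 30s resets the ladder;
// exhausting the ladder lands the container in the terminal error state.
const BACKOFF_MS = [30_000, 60_000, 120_000];

type CrashDecision =
  | { action: "restart"; delayMs: number }
  | { action: "error" };

function onCrash(attempt: number, uptimeMs: number): CrashDecision {
  // A run that stayed up at least 30s counts as healthy: counter resets.
  const effectiveAttempt = uptimeMs >= 30_000 ? 0 : attempt;
  if (effectiveAttempt >= BACKOFF_MS.length) return { action: "error" };
  return { action: "restart", delayMs: BACKOFF_MS[effectiveAttempt] };
}
```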

Backup boundary

  • Agent-owned game backups are local, app-aware rollback backups
  • Implemented game backup scope today: local Minecraft backup create/list/restore/delete, plus pre-restore checkpoint hardening
  • PBS / platform backups are the durability and disaster-recovery layer
  • Do not treat offsite/PBS durability work as agent implementation work unless ownership changes

Dev Containers (Current State)

  • supported runtimes: node, python, go, java, dotnet
  • runtime installs are artifact-backed and idempotent
  • runtime root: /opt/zlh/runtimes/<runtime>/<version>
  • dev identity: dev:dev
  • workspace root: /home/dev/workspace
  • shell env: HOME, USER, LOGNAME, TERM set correctly
  • code-server install path: /opt/zlh/services/code-server
  • code-server port: 6000
  • code-server lifecycle: POST /dev/codeserver/start|stop|restart
  • code-server detection: /proc/*/cmdline scan (sketched below)
  • agent port: 18888
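The /proc/*/cmdline detection is implemented in the Go agent; this TypeScript sketch only illustrates the mechanism:

```ts
import { readdirSync, readFileSync } from "node:fs";

// Walk /proc/<pid>/cmdline entries looking for a running code-server
// process; cmdline is NUL-separated argv.
function findCodeServerPid(): number | null {
  for (const entry of readdirSync("/proc")) {
    if (!/^\d+$/.test(entry)) continue; // only numeric PID directories
    try {
      const argv = readFileSync(`/proc/${entry}/cmdline`, "utf8").split("\0");
      if (argv.some((arg) => arg.includes("code-server"))) return Number(entry);
    } catch {
      // Process exited between readdir and read; skip it.
    }
  }
  return null;
}
```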

Code-server launch model:

  • binds to 0.0.0.0
  • --auth none
  • API/hosted flow handles auth and proxying
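A hedged sketch of that launch model, assuming the code-server binary sits under the stated install path; the exact argv used by the agent may differ:

```ts
import { spawn } from "node:child_process";

// Launch code-server bound to 0.0.0.0:6000 with auth disabled; the hosted
// API flow handles authentication and proxying in front of it.
const codeServer = spawn(
  "/opt/zlh/services/code-server/bin/code-server", // path within the stated install dir is assumed
  ["--bind-addr", "0.0.0.0:6000", "--auth", "none"],
  {
    cwd: "/home/dev/workspace",
    env: { ...process.env, HOME: "/home/dev", USER: "dev", LOGNAME: "dev" },
  }
);
codeServer.on("exit", (code) => console.log(`code-server exited: ${code}`));
```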

Dev Container Access Model

Browser IDE (Current Working Model)

Browser
  ↓
Traefik / hosted dev edge
  ↓
API
  ↓
container:6000

Working hosted flow:

  1. frontend calls POST /api/dev/:id/ide-token
  2. API returns hosted IDE URL with short-lived token
  3. browser opens hosted URL
  4. edge forwards to API
  5. API validates token, sets HTTP-only IDE cookie, redirects to clean hosted URL
  6. subsequent cookie-backed requests proxy to container code-server
  7. code-server redirects to /?folder=/home/dev/workspace
  8. IDE loads successfully
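A hedged Express sketch of steps 1 through 6. The ide-token endpoint path comes from this document; the token TTL, cookie name, hosted hostname, and in-memory token store are illustrative assumptions:

```ts
import express from "express";
import crypto from "node:crypto";

const app = express();
const tokens = new Map<string, { vmid: string; expires: number }>();

// Step 1-2: mint a short-lived token and hand back the hosted IDE URL.
app.post("/api/dev/:id/ide-token", (req, res) => {
  const token = crypto.randomBytes(24).toString("base64url");
  tokens.set(token, { vmid: req.params.id, expires: Date.now() + 60_000 });
  res.json({ url: `https://ide.example.test/?ide_token=${token}` }); // hostname illustrative
});

// Step 5: validate the token, set the HTTP-only IDE cookie, redirect to the
// clean hosted URL so the token never lingers in the address bar.
app.get("/", (req, res, next) => {
  const token = req.query.ide_token as string | undefined;
  if (!token) return next(); // cookie-backed requests fall through to the proxy leg
  const entry = tokens.get(token);
  if (!entry || entry.expires < Date.now()) return res.sendStatus(403);
  tokens.delete(token); // single-use (illustrative policy)
  res.cookie("zlh_ide", entry.vmid, { httpOnly: true, secure: true });
  res.redirect("/");
});
```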

Traefik / edge role

  • terminates TLS for hosted dev traffic
  • forwards hosted dev traffic to the API
  • preserves original Host header
  • does not route directly to containers for hosted IDE access

API role

  • extracts vmid from hosted request context
  • validates short-lived IDE token
  • sets HTTP-only IDE cookie
  • redirects token URL to clean hostname URL
  • proxies live code-server HTTP + WebSocket traffic to the correct container
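The proxy leg can be sketched with http-proxy-middleware, which is already in the API stack; resolveContainerAddr is a hypothetical helper standing in for the API's real cookie-to-container lookup:

```ts
import express from "express";
import type { IncomingMessage } from "node:http";
import { createProxyMiddleware } from "http-proxy-middleware";

const app = express();

// Hypothetical: map the validated IDE cookie on the request to the owning
// container's internal address.
function resolveContainerAddr(req: IncomingMessage): string {
  return "10.0.0.50"; // placeholder
}

// Proxy both HTTP and WebSocket traffic to code-server on port 6000.
app.use(
  createProxyMiddleware({
    target: "http://127.0.0.1:6000", // fallback; router picks the real target
    ws: true, // code-server needs WebSocket upgrade proxying
    router: (req) => `http://${resolveContainerAddr(req)}:6000`,
  })
);
```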

Local developer access (future / separate track)

Headscale/Tailscale for SSH, VS Code Remote, local tools. Constraints: no exit nodes, magic_dns: false.

Removed / No Longer Current

  • path-based /api/dev/:id/ide as the primary browser entry
  • Caddy-hosted dev IDE edge
  • per-container Traefik file creation from dev provisioning
  • per-container Cloudflare/Technitium publish/unpublish for dev IDE browser access

proxyClient.js remains in the repo and is still used by the game edge publish logic.


API / Frontend Status

  • API polls each agent's /status endpoint
  • API exposes the polled state to the frontend via GET /api/servers/:id/status (sketched below)
  • Portal uses the API-mediated hosted IDE flow
  • Portal uses the API websocket bridge for console access
  • Portal no longer relies on stale DB-only state for console availability
  • Game publish flow remains untouched by dev routing work
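A sketch of the poll-and-cache shape; the interval, agent address, and cache layout are illustrative (and the fetch() prohibition in this document concerns upload streaming, not small JSON status polls):

```ts
import express from "express";

const app = express();
const statusCache = new Map<string, Record<string, unknown>>();

// Poll one agent's /status endpoint and cache the result (Node 18+ fetch).
async function pollAgentStatus(serverId: string, agentAddr: string): Promise<void> {
  const res = await fetch(`http://${agentAddr}:18888/status`);
  if (res.ok) statusCache.set(serverId, (await res.json()) as Record<string, unknown>);
}

// The frontend-facing endpoint reads the cache and never blocks on an agent.
app.get("/api/servers/:id/status", (req, res) => {
  res.json(statusCache.get(req.params.id) ?? { state: "unknown" });
});

// Example poll loop (interval illustrative).
setInterval(() => void pollAgentStatus("srv-1", "10.0.0.50"), 10_000);
```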

Velocity / Registration Model

Current model

  • API is the source of backend inventory/state for Minecraft routing
  • Velocity plugin (ZpackVelocityBridge) is the component that actually registers/unregisters backends inside Velocity
  • Plugin supports startup rehydrate from API plus plugin-local webhook endpoints:
    • POST /zpack/register
    • POST /zpack/unregister
    • GET /zpack/status

Important current finding

  • The likely current issue is sequencing: a backend must not be surfaced to players before its semantic readiness probe succeeds
  • Current work: verifying that no remaining registration path can expose a backend before the readiness probe passes (see the sketch below)
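A sketch of the readiness-gated registration invariant; probeReadiness is a placeholder for the agent's real Go probe (internal/minecraft/readiness.go), and only the /zpack/register path is taken from this document:

```ts
// Placeholder for the real semantic readiness probe, which checks far more
// than an open TCP port.
async function probeReadiness(addr: string): Promise<boolean> {
  void addr;
  return false; // illustrative stub
}

// Only after the probe passes does the backend get surfaced to Velocity via
// the plugin-local webhook.
async function registerWhenReady(
  backend: { name: string; addr: string },
  bridgeAddr: string
): Promise<void> {
  if (!(await probeReadiness(backend.addr))) return; // not ready: retry later
  await fetch(`http://${bridgeAddr}/zpack/register`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(backend),
  });
}
```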

Important implementation note

  • Current plugin default endpoint behavior still references zpack-api.internal.zlh unless overridden
  • That default is stale relative to current hot-path architecture and should not be relied on long-term

Game Support

Production: Minecraft (vanilla/Fabric/Paper/Forge/Neoforge), Rust, Terraria, Project Zomboid

In Pipeline: Valheim, Palworld, Vintage Story, Core Keeper


Developer-to-Player Pipeline (Revenue Model)

LXC Dev Environment ($15-40/mo)
  → Game/mod creation + testing
  → Testing servers (50% dev discount)
  → Player community referrals (25% player discount)
  → Developer revenue share (5-10% commission)
  → Viral growth

Revenue multiplier: 1 developer → ~10 players → $147.50/mo total.


Open Threads (High Level)

Cross-repo / platform work remains in OPEN_THREADS.md.

Repo-specific active work now lives under:

  • Codex/API/*
  • Codex/Portal/*
  • Codex/Agent/*

High-level active themes:

  1. Backup contract normalization and live validation
  2. Dev access / SSH / hosted IDE hardening
  3. Service discovery and provisioning validation
  4. Email notifications and launch polish
  5. Launch testing and infrastructure audit

Repo Registry

  • zlh-grind — execution workspace / continuity / active constraints
  • knowledge-base — canonical architecture / strategy / bootstrap
  • zlh-docs — API/agent/portal reference docs
  • zpack-api — API source
  • zpack-portal — portal source
  • zlh-agent — agent source
  • ZpackVelocityBridge — Velocity plugin / backend registration layer

All at git.zerolaghub.com/jester/<repo>


Session Guidance

  • knowledge-base is the architecture authority
  • zlh-grind is the execution continuity layer
  • INFRASTRUCTURE.md is the authoritative VM/IP inventory
  • repo-specific truth is maintained in Codex/*
  • root docs should stay focused on cross-repo/platform truth
  • Agent is the authority on filesystem enforcement — API must not duplicate filesystem logic
  • Portal does not enforce real policy — agent enforces
  • Portal never calls agents directly — all traffic goes through API
  • Upload transport uses raw http.request piping, never fetch()
  • Do not mark unimplemented work as complete
  • Remove completed items from OPEN_ITEMS.md instead of letting them linger
  • Game publish flow must never be modified by dev routing changes
  • proxyClient.js must not be deleted — used by game edge publish path