Open Threads: zlh-grind

This file tracks active but unfinished work.

Keep it short.


Agent (zlh-agent)

Dev Runtime System

Completed:

  • catalog validation implemented
  • runtime installs artifact-backed
  • install guard implemented
  • all installs now fetch from artifact server (no local artifact assumption)

Outstanding:

  • runtime install verification improvements
  • catalog hash validation
  • runtime removal / upgrade handling

Dev Environment

Completed:

  • dev user creation
  • workspace root /home/dev/workspace
  • console runs as dev user
  • HOME, USER, LOGNAME, TERM env vars set correctly

Outstanding:

  • PATH normalization
  • shell profile consistency
  • runtime PATH injection
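
One possible shape for the outstanding PATH work, as a sketch only: runtime bin directories are injected ahead of the existing entries, and empty or duplicate entries are dropped while order is preserved. The function name `normalizePath` and the runtime directory used in the usage note are hypothetical and do not reflect the agent's actual implementation.

```typescript
// Hypothetical helper: build a normalized PATH string.
// runtimeDirs are placed first; empty and duplicate entries are dropped,
// keeping the first occurrence of each directory.
function normalizePath(current: string, runtimeDirs: string[] = []): string {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const dir of [...runtimeDirs, ...current.split(":")]) {
    // Strip trailing slashes so "/usr/bin/" and "/usr/bin" compare equal,
    // but keep a bare "/" intact.
    const clean = dir.replace(/\/+$/, "") || (dir ? "/" : "");
    if (clean === "" || seen.has(clean)) continue;
    seen.add(clean);
    out.push(clean);
  }
  return out.join(":");
}
```

For example, `normalizePath("/usr/bin:/bin:/usr/bin/:", ["/opt/zlh/runtimes/node/bin"])` yields `/opt/zlh/runtimes/node/bin:/usr/bin:/bin` (the runtime path is an illustrative placeholder).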

Code Server Addon

Status: Installed, running, and proxied through API

Confirmed:

  • pulled from artifact server (tar.gz)
  • installed to /opt/zlh/services/code-server
  • binds to 0.0.0.0:8080
  • lifecycle endpoints: POST /dev/codeserver/start|stop|restart
  • detection via /proc/*/cmdline scan
  • browser IDE fully working end-to-end via API proxy
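
The /proc/*/cmdline detection above could look roughly like the sketch below. It relies on the fact that a cmdline file holds the process arguments separated by NUL bytes. The matching rule (any argument under the install directory) and the `ProcReader` interface are assumptions made for illustration, not the agent's actual logic.

```typescript
// /proc/<pid>/cmdline holds the argv of a process as NUL-separated strings.
function parseCmdline(raw: string): string[] {
  return raw.split("\0").filter((arg) => arg.length > 0);
}

// Hypothetical matcher: treat a process as code-server when any argument
// is the install directory or a path below it.
function isCodeServer(raw: string, installDir = "/opt/zlh/services/code-server"): boolean {
  return parseCmdline(raw).some(
    (arg) => arg === installDir || arg.startsWith(installDir + "/"),
  );
}

// Hypothetical abstraction over /proc so the scan stays free of I/O details.
interface ProcReader {
  listPids(): number[];
  readCmdline(pid: number): string | null; // null when the process has exited
}

// Return the pids whose command line looks like code-server.
function findCodeServerPids(proc: ProcReader): number[] {
  return proc.listPids().filter((pid) => {
    const raw = proc.readCmdline(pid);
    return raw !== null && isCodeServer(raw);
  });
}
```

In Node, a `ProcReader` could be backed by `fs.readdirSync("/proc")` filtered to numeric names and `fs.readFileSync` on each `/proc/<pid>/cmdline`, returning `null` when the read fails because the process has already exited.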

Game Server Supervision

Completed:

  • crash recovery with backoff: 30s → 60s → 120s
  • backoff resets if uptime ≥ 30s
  • transitions to error state after repeated failures
  • crash observability: time, exit code, signal, uptime, log tail, classification
  • classifications: oom, mod_or_plugin_error, missing_dependency, nonzero_exit, unexpected_exit
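
The backoff rules above can be sketched as a small pure function. The delays and the 30s reset threshold come from the list; the number of failures that counts as "repeated" is not stated here, so the sketch assumes the error state is entered once all three delays have been used. Type and function names are hypothetical.

```typescript
type SupervisorState = {
  failures: number; // consecutive crashes since the last stable run
  status: "running" | "backoff" | "error";
};

const BACKOFF_SECONDS = [30, 60, 120]; // restart delays, in order
const STABLE_UPTIME_SECONDS = 30; // uptime at or above this resets the backoff

// Decide what to do after a crash. Returns the next state and, when a
// restart is scheduled, the delay in seconds (null means no restart).
function onCrash(
  state: SupervisorState,
  uptimeSeconds: number,
): { state: SupervisorState; delaySeconds: number | null } {
  // A run that stayed up long enough counts as healthy, so start over.
  const prior = uptimeSeconds >= STABLE_UPTIME_SECONDS ? 0 : state.failures;
  const delay = BACKOFF_SECONDS[prior];
  // Assumption: once every backoff step has been used, give up.
  if (delay === undefined) {
    return { state: { failures: prior, status: "error" }, delaySeconds: null };
  }
  return { state: { failures: prior + 1, status: "backoff" }, delaySeconds: delay };
}
```

Under these assumptions, four quick crashes in a row produce delays of 30s, 60s, and 120s, then the error state. A crash after 45s of uptime restarts the sequence at 30s.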

Agent Future Work (priority order)

  1. Structured logging (slog) for Loki
  2. Expose a provisioningComplete state for dev containers in /status
  3. Graceful shutdown verification (SIGTERM + wait for Minecraft)
  4. Process reattachment on agent restart

Dev IDE Access

Status: browser IDE working

Browser → Portal → API (bootstrap) → /__ide/:id/* → container:8080

Working flow:

  1. frontend calls POST /api/dev/:id/ide-token
  2. API returns /api/dev/:id/ide?token=...
  3. frontend opens that URL in new tab
  4. bootstrap route validates token, sets HTTP-only IDE cookie, redirects to /__ide/:id/
  5. all live code-server HTTP + WS traffic proxied through /__ide/:id/*
  6. API proxies to http://<container-ip>:8080
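
Steps 4 to 6 can be sketched as three pure helpers: the bootstrap redirect, parsing of the tunnel path, and construction of the upstream URL. Several details are assumptions made for illustration and are labeled in the comments: the cookie name `zlh_ide`, its attributes other than HttpOnly, the 302 status, and stripping the `/__ide/:id` prefix before forwarding.

```typescript
// Step 4: after the token is validated, set the IDE cookie and redirect.
// Assumptions: cookie name, Path scoping, SameSite value, and the 302 status.
function bootstrapRedirect(id: string, sessionValue: string) {
  return {
    status: 302,
    headers: {
      "Set-Cookie": `zlh_ide=${sessionValue}; HttpOnly; Path=/__ide/${id}/; SameSite=Lax`,
      Location: `/__ide/${id}/`,
    },
  };
}

// Step 5: split a request URL of the form /__ide/:id/* into the dev id and
// the remainder (path plus query string). Returns null for other routes.
function parseIdePath(url: string): { id: string; rest: string } | null {
  const match = /^\/__ide\/([^/?#]+)(\/[^?#]*)?(\?[^#]*)?$/.exec(url);
  if (!match || !match[1]) return null;
  return { id: match[1], rest: (match[2] || "/") + (match[3] || "") };
}

// Step 6: build the upstream URL on the container.
// Assumption: the /__ide/:id prefix is stripped before forwarding.
function upstreamUrl(containerIp: string, rest: string): string {
  return `http://${containerIp}:8080${rest}`;
}
```

With these helpers, a request for `/__ide/42/static/app.js?v=1` resolves to id `42` and, for a container at the illustrative address `10.0.0.5`, to `http://10.0.0.5:8080/static/app.js?v=1`.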

Key fixes that made it work:

  • token bootstrap fixed new-tab auth loss
  • /__ide/:id tunnel separated from bootstrap to avoid API route interference
  • upstream port corrected to 8080 (Chrome blocks 6000 as unsafe)
  • Host header changed to pass browser host (req.headers.host) not container host
  • Origin override removed — browser origin passed through only when present
  • WS proxy separated from shared HTTP proxy — built target-bound WS proxy at upgrade time
  • target-bound WS eliminated ECONNREFUSED 127.0.0.1:8080 fallback bug
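
The Host and Origin fixes above amount to a simple header policy, sketched here as a pure function. Only those two headers are shown; how the real proxy treats the remaining headers is not covered here. The `HeaderMap` type and the function name are hypothetical.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Build the Host/Origin headers sent upstream to code-server.
function upstreamHostAndOrigin(browser: HeaderMap): HeaderMap {
  const out: HeaderMap = {};
  // Pass the Host the browser sent (req.headers.host), not the container host.
  if (browser["host"] !== undefined) out["host"] = browser["host"];
  // Forward Origin only when the browser supplied one; never synthesize it.
  if (browser["origin"] !== undefined) out["origin"] = browser["origin"];
  return out;
}
```

A request without an Origin header therefore reaches code-server without one, and a request with an Origin header carries the browser's value unchanged.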

Current state:

  • browser still sees API host/IP until portal is behind a proper domain/reverse proxy
  • host-based dev-<vmid>.zlh.dev support started but reverted — bootstrap path is canonical

Local Dev Access (Headscale/Tailscale — Future)

Outstanding:

  • confirm zlh-ctl Headscale server status
  • implement Tailscale addon install in agent
  • API auth key generation
  • portal setup instructions

Constraints: magic_dns: false, no exit nodes, no DNS takeover


API (zpack-api)

Completed:

  • dev provisioning payload
  • runtime/version fields
  • enable_code_server flag
  • GET /api/servers/:id/status — server status endpoint
  • POST /api/dev/:id/ide-token — IDE token generation
  • GET /api/dev/:id/ide — bootstrap route (validates token, sets cookie, redirects)
  • /__ide/:id/* — live tunnel proxy (HTTP + WS, target-bound)
  • dev routing experiment removed (devRouting.js, devDePublisher.js deleted)

Outstanding:

  • dev runtime catalog endpoint for portal
  • Headscale auth key generation

Portal (zpack-portal)

Completed:

  • dev runtime dropdown
  • dotnet runtime support
  • enable code-server checkbox
  • dev file browser support

Outstanding:

  • "Open IDE" button — calls POST /api/dev/:id/ide-token, opens returned URL in new tab
  • Headscale setup instructions

Platform

Future work:

  • Tailscale dev access
  • artifact version promotion
  • runtime rollback support

Closed Threads

  • PTY console (dev + game)
  • Mod lifecycle
  • Upload pipeline
  • Runtime artifact installs
  • Dev container filesystem model
  • Code-server artifact fix
  • API status endpoint for frontend agent-state consumption
  • Dev DNS/Traefik routing experiment — removed
  • Game server crash recovery with backoff
  • Crash observability (classification, log tail, exit metadata)
  • Code-server lifecycle endpoints (start/stop/restart)
  • Code-server process detection via /proc scan
  • Dev IDE proxy — browser IDE fully working end-to-end