# Open Threads – zlh-grind

This file tracks active but unfinished work. Keep it short.

---

## Agent (zlh-agent)

### Dev Runtime System

Completed:

- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented
- all installs now fetch from artifact server (no local artifact assumption)

Outstanding:

- runtime install verification improvements
- catalog hash validation
- runtime removal / upgrade handling

---

### Dev Environment

Completed:

- dev user creation
- workspace root `/home/dev/workspace`
- console runs as dev user
- `HOME`, `USER`, `LOGNAME`, `TERM` env vars set correctly

Outstanding:

- PATH normalization
- shell profile consistency
- runtime PATH injection

---

### Code Server Addon

Status: ✅ Installed and running

Confirmed:

- pulled from artifact server (tar.gz)
- installed to `/opt/zlh/services/code-server`
- binds to `0.0.0.0:6000`
- lifecycle endpoints: `POST /dev/codeserver/start|stop|restart`
- detection via `/proc/*/cmdline` scan (no longer relies solely on PID file)

**BLOCKING (next task):** code-server must launch with:

```
--bind-addr 0.0.0.0:6000
--auth none
--disable-telemetry
--base-path /api/dev//ide
/home/dev/workspace
```

Without `--base-path`, WebSocket paths and static assets mismatch through the proxy. Result: IDE loads partially, WS closes with 1006, workspace shows `!` (not mounted), extension host fails to start.

---

### Game Server Supervision

Completed:

- crash recovery with backoff: 30s → 60s → 120s
- backoff resets if uptime ≥ 30s
- transitions to `error` state after repeated failures
- crash observability: time, exit code, signal, uptime, log tail, classification
- classifications: `oom`, `mod_or_plugin_error`, `missing_dependency`, `nonzero_exit`, `unexpected_exit`

---

### Agent Future Work (priority order)

1. **Fix code-server `--base-path` launch arg**: unblocks IDE (IMMEDIATE)
2. Structured logging (slog) for Loki
3. Dev container `provisioningComplete` state in `/status`
4. Graceful shutdown verification (SIGTERM + wait for Minecraft)
5. Process reattachment on agent restart

---

## Dev IDE Access

### Browser IDE

```
Browser → Portal → API (/api/dev/:id/ide) → container:6000
```

- API layer: ✅ complete
- Agent layer: ⚠️ blocked on `--base-path`

What is confirmed working:

- API auth ✅
- Token flow ✅
- Proxy routing ✅
- WebSocket upgrade handler ✅
- Upstream targeting ✅
- code-server process running ✅

What is failing:

- Workbench WebSocket session ❌
- Filesystem provider initialization ❌
- Extension host startup ❌

Root cause: code-server launched without `--base-path /api/dev//ide`

### Local Dev Access (Headscale/Tailscale, Future)

Outstanding:

- confirm `zlh-ctl` Headscale server status
- implement Tailscale addon install in agent
- API auth key generation
- portal setup instructions

Constraints: `magic_dns: false`, no exit nodes, no DNS takeover

---

## API (zpack-api)

Completed:

- dev provisioning payload
- runtime/version fields
- enable_code_server flag
- `GET /api/servers/:id/status`: server status endpoint
- `POST /api/dev/:id/ide-token`: IDE token generation
- `GET /api/dev/:id/ide` + `GET /api/dev/:id/ide/*`: IDE proxy with WebSocket
- dev routing experiment removed (`devRouting.js`, `devDePublisher.js` deleted)

Outstanding:

- dev runtime catalog endpoint for portal
- Headscale auth key generation

---

## Portal (zpack-portal)

Completed:

- dev runtime dropdown
- dotnet runtime support
- enable code-server checkbox
- dev file browser support

Outstanding:

- "Open IDE" button: calls `POST /api/dev/:id/ide-token`, opens returned URL in new tab
- Headscale setup instructions

---

## Platform

Future work:

- Tailscale dev access
- artifact version promotion
- runtime rollback support

---

## Closed Threads

- ✅ PTY console (dev + game)
- ✅ Mod lifecycle
- ✅ Upload pipeline
- ✅ Runtime artifact installs
- ✅ Dev container filesystem model
- ✅ Code-server artifact fix
- ✅ API status endpoint for frontend agent-state consumption
- ✅ Dev IDE proxy implementation (API proxy + token system)
- ✅ Dev DNS/Traefik routing experiment (removed)
- ✅ Game server crash recovery with backoff
- ✅ Crash observability (classification, log tail, exit metadata)
- ✅ Code-server lifecycle endpoints (start/stop/restart)
- ✅ Code-server process detection via /proc scan
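
Reference sketch for the closed crash-recovery thread: the backoff rules recorded under Game Server Supervision (30s → 60s → 120s, reset once uptime reaches 30s, `error` after repeated failures) could look roughly like the following. This is not the agent's code. `RestartPolicy`, `on_exit`, and `MAX_FAILURES` are hypothetical names, and the failure limit of 3 is an assumption, since the notes only say "repeated failures".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BACKOFF_STEPS = (30, 60, 120)  # seconds, taken from the notes
STABLE_UPTIME = 30             # seconds of uptime that resets the backoff
MAX_FAILURES = 3               # assumption: the notes only say "repeated failures"


@dataclass
class RestartPolicy:
    """Hypothetical helper that decides what to do after a game server crash."""

    failures: int = 0  # consecutive crashes since the last stable run

    def on_exit(self, uptime_s: float) -> Tuple[str, Optional[int]]:
        """Return ("restart", delay_seconds) or ("error", None)."""
        # A run that stayed up long enough counts as stable and resets the backoff.
        if uptime_s >= STABLE_UPTIME:
            self.failures = 0
        # Too many consecutive short-lived runs: stop restarting.
        if self.failures >= MAX_FAILURES:
            return ("error", None)
        # Pick the delay for this failure, clamped to the last step.
        delay = BACKOFF_STEPS[min(self.failures, len(BACKOFF_STEPS) - 1)]
        self.failures += 1
        return ("restart", delay)
```

The supervisor would call `on_exit` with the measured uptime each time the process dies, sleep for the returned delay before restarting, and move the server to `error` when that state is returned.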