# Open Threads – zlh-grind

This file tracks active but unfinished work. Keep it short.

---

## Agent (zlh-agent)

### Dev Runtime System

Completed:

- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented
- all installs now fetch from artifact server (no local artifact assumption)

Outstanding:

- runtime install verification improvements
- catalog hash validation
- runtime removal / upgrade handling

---

### Dev Environment

Completed:

- dev user creation
- workspace root `/home/dev/workspace`
- console runs as dev user
- `HOME`, `USER`, `LOGNAME`, `TERM` env vars set correctly

Outstanding:

- PATH normalization
- shell profile consistency
- runtime PATH injection

---

### Code Server Addon

Status: ✅ Installed, running, and proxied through API

Confirmed:

- pulled from artifact server (tar.gz)
- installed to `/opt/zlh/services/code-server`
- binds to `0.0.0.0:8080`
- lifecycle endpoints: `POST /dev/codeserver/start|stop|restart`
- detection via `/proc/*/cmdline` scan
- browser IDE fully working end-to-end via API proxy

---

### Game Server Supervision

Completed:

- crash recovery with backoff: 30s → 60s → 120s
- backoff resets if uptime ≥ 30s
- transitions to `error` state after repeated failures
- crash observability: time, exit code, signal, uptime, log tail, classification
- classifications: `oom`, `mod_or_plugin_error`, `missing_dependency`, `nonzero_exit`, `unexpected_exit`

---

### Agent Future Work (priority order)

1. Structured logging (slog) for Loki
2. Dev container `provisioningComplete` state in `/status`
3. Graceful shutdown verification (SIGTERM + wait for Minecraft)
4. Process reattachment on agent restart

---

## Dev IDE Access

### Browser IDE ✅ Working

```
Browser → Portal → API (bootstrap) → /__ide/:id/* → container:8080
```

Working flow:

1. frontend calls `POST /api/dev/:id/ide-token`
2. API returns `/api/dev/:id/ide?token=...`
3. frontend opens that URL in new tab
4. bootstrap route validates token, sets HTTP-only IDE cookie, redirects to `/__ide/:id/`
5. all live code-server HTTP + WS traffic proxied through `/__ide/:id/*`
6. API proxies to `http://:8080`

Key fixes that made it work:

- token bootstrap fixed new-tab auth loss
- `/__ide/:id` tunnel separated from bootstrap to avoid API route interference
- upstream port corrected to `8080` (Chrome blocks `6000` as unsafe)
- `Host` header changed to pass browser host (`req.headers.host`) not container host
- `Origin` override removed; browser origin passed through only when present
- WS proxy separated from shared HTTP proxy; built target-bound WS proxy at upgrade time
- target-bound WS eliminated `ECONNREFUSED 127.0.0.1:8080` fallback bug

Current state:

- browser still sees API host/IP until portal is behind a proper domain/reverse proxy
- host-based `dev-.zlh.dev` support started but reverted; bootstrap path is canonical

### Local Dev Access (Headscale/Tailscale, Future)

Outstanding:

- confirm `zlh-ctl` Headscale server status
- implement Tailscale addon install in agent
- API auth key generation
- portal setup instructions

Constraints: `magic_dns: false`, no exit nodes, no DNS takeover

---

## API (zpack-api)

Completed:

- dev provisioning payload
- runtime/version fields
- enable_code_server flag
- `GET /api/servers/:id/status`: server status endpoint
- `POST /api/dev/:id/ide-token`: IDE token generation
- `GET /api/dev/:id/ide`: bootstrap route (validates token, sets cookie, redirects)
- `/__ide/:id/*`: live tunnel proxy (HTTP + WS, target-bound)
- dev routing experiment removed (`devRouting.js`, `devDePublisher.js` deleted)

Outstanding:

- dev runtime catalog endpoint for portal
- Headscale auth key generation

---

## Portal (zpack-portal)

Completed:

- dev runtime dropdown
- dotnet runtime support
- enable code-server checkbox
- dev file browser support

Outstanding:

- "Open IDE" button: calls `POST /api/dev/:id/ide-token`, opens returned URL in new tab
- Headscale setup instructions

---

## Platform

Future work:

- Tailscale dev access
- artifact version promotion
- runtime rollback support

---

## Closed Threads

- ✅ PTY console (dev + game)
- ✅ Mod lifecycle
- ✅ Upload pipeline
- ✅ Runtime artifact installs
- ✅ Dev container filesystem model
- ✅ Code-server artifact fix
- ✅ API status endpoint for frontend agent-state consumption
- ✅ Dev DNS/Traefik routing experiment (removed)
- ✅ Game server crash recovery with backoff
- ✅ Crash observability (classification, log tail, exit metadata)
- ✅ Code-server lifecycle endpoints (start/stop/restart)
- ✅ Code-server process detection via /proc scan
- ✅ Dev IDE proxy: browser IDE fully working end-to-end
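
---

For reference, the crash-recovery policy listed under Game Server Supervision (30s → 60s → 120s backoff, reset when uptime ≥ 30s, `error` after repeated failures) could be sketched roughly as below. This is an illustrative Python sketch, not the agent's actual code: `CrashBackoff`, `on_crash`, and `MAX_FAILURES` are hypothetical names, and the threshold of three consecutive failures is an assumption, since the tracker only says "repeated failures".

```python
from dataclasses import dataclass
from typing import Optional

BACKOFF_STEPS = (30, 60, 120)  # restart delays in seconds, as listed in the tracker
STABLE_UPTIME = 30             # uptime in seconds at or above which backoff resets
MAX_FAILURES = 3               # ASSUMPTION: the tracker only says "repeated failures"


@dataclass
class CrashBackoff:
    """Hypothetical tracker for consecutive crashes of one game server."""

    failures: int = 0
    state: str = "running"

    def on_crash(self, uptime_seconds: float) -> Optional[int]:
        """Return the delay before the next restart, or None once in `error`."""
        # A run that stayed up long enough counts as healthy: start over.
        if uptime_seconds >= STABLE_UPTIME:
            self.failures = 0
        # Too many short-lived runs in a row: stop restarting.
        if self.failures >= MAX_FAILURES:
            self.state = "error"
            return None
        delay = BACKOFF_STEPS[min(self.failures, len(BACKOFF_STEPS) - 1)]
        self.failures += 1
        return delay
```

Under these assumptions, three quick crashes in a row produce delays of 30, 60, and 120 seconds, a fourth moves the server to `error`, and any crash after an uptime of at least 30 seconds starts again at 30.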