From 8c8af5ff62d4565ba2c8ce51165da4aaae5d17bf Mon Sep 17 00:00:00 2001
From: jester
Date: Tue, 17 Mar 2026 23:05:57 +0000
Subject: [PATCH] =?UTF-8?q?Update=20OPEN=5FTHREADS=202026-03-17=20?=
 =?UTF-8?q?=E2=80=94=20base-path=20blocking=20issue,=20agent=20enhancement?=
 =?UTF-8?q?s=20completed?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 OPEN_THREADS.md | 112 +++++++++++++++++++++++++++---------------------
 1 file changed, 62 insertions(+), 50 deletions(-)

diff --git a/OPEN_THREADS.md b/OPEN_THREADS.md
index 2bde474..0f17759 100644
--- a/OPEN_THREADS.md
+++ b/OPEN_THREADS.md
@@ -15,6 +15,7 @@ Completed:
 - catalog validation implemented
 - runtime installs artifact-backed
 - install guard implemented
+- all installs now fetch from artifact server (no local artifact assumption)
 
 Outstanding:
 
@@ -31,6 +32,7 @@ Completed:
 - dev user creation
 - workspace root `/home/dev/workspace`
 - console runs as dev user
+- `HOME`, `USER`, `LOGNAME`, `TERM` env vars set correctly
 
 Outstanding:
 
@@ -40,64 +42,85 @@ Outstanding:
 
 ---
 
-## Code Server Addon
+### Code Server Addon
 
-Status: ✅ Installed and running inside dev containers
+Status: ✅ Installed and running
 
 Confirmed:
 
-- compiled release artifact fixed on `zlh-artifacts`
-- install confirmed working
-- process confirmed running inside container
+- pulled from artifact server (tar.gz)
+- installed to `/opt/zlh/services/code-server`
 - binds to `0.0.0.0:6000`
-- launched from `/opt/zlh/services/code-server`
+- lifecycle endpoints: `POST /dev/codeserver/start|stop|restart`
+- detection via `/proc/*/cmdline` scan (no longer relies solely on PID file)
 
-Port: `6000`
+**BLOCKING — next task:**
 
-**Next session — agent change required:**
-
-code-server must be relaunched with:
+code-server must launch with:
 
 ```
+--bind-addr 0.0.0.0:6000
 --auth none
+--disable-telemetry
 --base-path /api/dev//ide
+/home/dev/workspace
 ```
 
-Reason: API token is now the sole auth mechanism. Password prompt must be removed. Base path required for correct asset loading through proxy.
+Without `--base-path`, WebSocket paths and static assets mismatch through
+the proxy. Result: IDE loads partially, WS closes with 1006, workspace
+shows `!` (not mounted), extension host fails to start.
+
+---
+
+### Game Server Supervision
+
+Completed:
+
+- crash recovery with backoff: 30s → 60s → 120s
+- backoff resets if uptime ≥ 30s
+- transitions to `error` state after repeated failures
+- crash observability: time, exit code, signal, uptime, log tail, classification
+- classifications: `oom`, `mod_or_plugin_error`, `missing_dependency`, `nonzero_exit`, `unexpected_exit`
+
+---
+
+### Agent Future Work (priority order)
+
+1. **Fix code-server `--base-path` launch arg** — unblocks IDE (IMMEDIATE)
+2. Structured logging (slog) for Loki
+3. Dev container `provisioningComplete` state in `/status`
+4. Graceful shutdown verification (SIGTERM + wait for Minecraft)
+5. Process reattachment on agent restart
 
 ---
 
 ## Dev IDE Access
 
-### Browser IDE (Implemented ✅)
+### Browser IDE
 
 ```
-Browser
- ↓
-Portal
- ↓
-API (/api/dev/:id/ide)
- ↓
-container:6000
+Browser → Portal → API (/api/dev/:id/ide) → container:6000
 ```
 
-Implemented in API:
+API layer: ✅ complete
+Agent layer: ⚠️ blocked on `--base-path`
 
-- `src/routes/devProxy.js` — proxy route mounted in `src/app.js`
-- `GET /api/dev/:id/ide` and `GET /api/dev/:id/ide/*`
-- ownership verification before proxying
-- `ctype === "dev"` required
-- WebSocket support via `http-proxy-middleware` (`ws: true`)
-- `server.on('upgrade')` handler wired
+What is confirmed working:
 
-IDE token system implemented:
+- API auth ✅
+- Token flow ✅
+- Proxy routing ✅
+- WebSocket upgrade handler ✅
+- Upstream targeting ✅
+- code-server process running ✅
 
-- `POST /api/dev/:id/ide-token` — returns signed short-lived token
-- token payload: `sub`, `vmid`, `type: "dev-ide"`
-- default TTL: 300 seconds
-- env overrides: `API_AUTH_IDE_TTL_SECONDS`, `API_AUTH_IDE_SECRET`
-- proxy accepts `Authorization: Bearer` or `?token=`
-- WebSocket upgrades validate same token
+What is failing:
+
+- Workbench WebSocket session ❌
+- Filesystem provider initialization ❌
+- Extension host startup ❌
+
+Root cause: code-server launched without `--base-path /api/dev//ide`
 
 ### Local Dev Access (Headscale/Tailscale — Future)
 
@@ -108,22 +131,7 @@ Outstanding:
 - API auth key generation
 - portal setup instructions
 
-Constraints:
-
-- `magic_dns: false`
-- no exit nodes
-- no DNS takeover
-
----
-
-## Agent Future Work (priority order)
-
-1. Update code-server launch args (`--auth none`, `--base-path /api/dev//ide`)
-2. Structured logging (slog) for Loki
-3. Dev container provisioningComplete state
-4. Crash recovery backoff
-5. Graceful shutdown verification
-6. Process reattachment on agent restart
+Constraints: `magic_dns: false`, no exit nodes, no DNS takeover
 
 ---
 
@@ -136,7 +144,7 @@ Completed:
 - enable_code_server flag
 - `GET /api/servers/:id/status` — server status endpoint
 - `POST /api/dev/:id/ide-token` — IDE token generation
-- `GET /api/dev/:id/ide` — IDE proxy route with WebSocket support
+- `GET /api/dev/:id/ide` + `GET /api/dev/:id/ide/*` — IDE proxy with WebSocket
 - dev routing experiment removed (`devRouting.js`, `devDePublisher.js` deleted)
 
 Outstanding:
@@ -183,3 +191,7 @@ Future work:
 - ✅ API status endpoint for frontend agent-state consumption
 - ✅ Dev IDE proxy implementation (API proxy + token system)
 - ✅ Dev DNS/Traefik routing experiment — removed
+- ✅ Game server crash recovery with backoff
+- ✅ Crash observability (classification, log tail, exit metadata)
+- ✅ Code-server lifecycle endpoints (start/stop/restart)
+- ✅ Code-server process detection via /proc scan
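
Note on the Game Server Supervision entries: the crash-recovery rules listed in the patch (backoff 30s → 60s → 120s, reset when uptime ≥ 30s, `error` state after repeated failures) can be sketched as a small policy object. This is a minimal illustration in Python, not the agent's actual code. The names `RestartPolicy` and `on_crash` are hypothetical, and the threshold for "repeated failures" is an assumption: here the `error` state is entered once all three backoff steps have been used without a stable run, which the notes do not spell out.

```python
from dataclasses import dataclass
from typing import Optional

BACKOFF_STEPS_S = (30, 60, 120)  # from the notes: 30s -> 60s -> 120s
STABLE_UPTIME_S = 30             # from the notes: backoff resets if uptime >= 30s


@dataclass
class RestartPolicy:
    """Hypothetical helper: tracks consecutive crashes and picks the next delay."""

    failures: int = 0
    state: str = "running"

    def on_crash(self, uptime_s: float) -> Optional[int]:
        """Return seconds to wait before restarting, or None once in `error`."""
        # A run that stayed up long enough counts as healthy, so start over.
        if uptime_s >= STABLE_UPTIME_S:
            self.failures = 0
        # ASSUMPTION: give up after every backoff step has been exhausted.
        if self.failures >= len(BACKOFF_STEPS_S):
            self.state = "error"
            return None
        delay = BACKOFF_STEPS_S[self.failures]
        self.failures += 1
        self.state = "backoff"
        return delay
```

A supervisor loop would call `on_crash(uptime)` each time the game server process exits unexpectedly, sleep for the returned delay, and stop restarting when it receives `None`.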