Update OPEN_THREADS 2026-03-17 — base-path blocking issue, agent enhancements completed

jester 2026-03-17 23:05:57 +00:00
parent 56178ead38
commit 8c8af5ff62


@@ -15,6 +15,7 @@ Completed:
- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented
- all installs now fetch from artifact server (no local artifact assumption)
Outstanding:
@@ -31,6 +32,7 @@ Completed:
- dev user creation
- workspace root `/home/dev/workspace`
- console runs as dev user
- `HOME`, `USER`, `LOGNAME`, `TERM` env vars set correctly
Outstanding:
@@ -40,64 +42,85 @@ Outstanding:
---
## Code Server Addon
### Code Server Addon
Status: ✅ Installed and running inside dev containers
Status: ✅ Installed and running
Confirmed:
- compiled release artifact fixed on `zlh-artifacts`
- install confirmed working
- process confirmed running inside container
- pulled from artifact server (tar.gz)
- installed to `/opt/zlh/services/code-server`
- binds to `0.0.0.0:6000`
- launched from `/opt/zlh/services/code-server`
- lifecycle endpoints: `POST /dev/codeserver/start|stop|restart`
- detection via `/proc/*/cmdline` scan (no longer relies solely on PID file)
Port: `6000`
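The `/proc/*/cmdline` detection noted above can be sketched roughly as follows. This is an illustration in JavaScript (chosen to match the API files named in these notes), not the agent's actual code. `parseCmdline`, `looksLikeCodeServer`, and `findCodeServerPids` are hypothetical names, and matching on the install path is an assumed rule.

```javascript
const fs = require("node:fs");

// /proc/<pid>/cmdline holds the argv entries separated by NUL bytes.
function parseCmdline(raw) {
  return raw.split("\0").filter((arg) => arg.length > 0);
}

// Assumed matching rule: any argv entry under the code-server install path.
function looksLikeCodeServer(argv) {
  return argv.some((arg) => arg.includes("/opt/zlh/services/code-server"));
}

// Scan every numeric directory under procRoot and collect matching PIDs.
function findCodeServerPids(procRoot = "/proc") {
  const pids = [];
  for (const entry of fs.readdirSync(procRoot)) {
    if (!/^\d+$/.test(entry)) continue; // skip non-process entries
    try {
      const raw = fs.readFileSync(`${procRoot}/${entry}/cmdline`, "utf8");
      if (looksLikeCodeServer(parseCmdline(raw))) pids.push(Number(entry));
    } catch {
      // The process exited mid-scan or is unreadable; ignore it.
    }
  }
  return pids;
}
```

A scan like this still finds the process when the PID file is stale or missing, which is the case the PID-file-only check could not handle.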
**BLOCKING — next task:**
**Next session — agent change required:**
code-server must be relaunched with:
code-server must launch with:
```
--bind-addr 0.0.0.0:6000
--auth none
--disable-telemetry
--base-path /api/dev/<vmid>/ide
/home/dev/workspace
```
Reason: API token is now the sole auth mechanism. Password prompt must be removed. Base path required for correct asset loading through proxy.
Without `--base-path`, the WebSocket and static-asset URLs that code-server
generates do not match the proxied path. Result: the IDE loads only partially,
the WebSocket closes with code 1006 (abnormal closure), the workspace
shows `!` (not mounted), and the extension host fails to start.
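A minimal sketch of assembling that argument list per container, written in JavaScript to match the API files named in these notes (the agent's own code may be in another language). `buildCodeServerArgs` is a hypothetical helper, and treating `vmid` as purely numeric is an assumption. The flags are copied verbatim from the list above and should be checked against the installed code-server's `--help` output before relying on them.

```javascript
// Hypothetical helper: builds the code-server argument list for one dev container.
// Flags are copied from the notes above, not verified against code-server itself.
function buildCodeServerArgs(vmid) {
  // Assumption: vmid is a plain numeric id. Rejecting anything else keeps
  // the base path well-formed.
  if (!/^\d+$/.test(String(vmid))) {
    throw new Error(`invalid vmid: ${vmid}`);
  }
  return [
    "--bind-addr", "0.0.0.0:6000",
    "--auth", "none",
    "--disable-telemetry",
    "--base-path", `/api/dev/${vmid}/ide`,
    "/home/dev/workspace",
  ];
}
```

Deriving the base path from the same `vmid` the API uses in `/api/dev/:id/ide` keeps the two sides from drifting apart.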
---
### Game Server Supervision
Completed:
- crash recovery with backoff: 30s → 60s → 120s
- backoff resets if uptime ≥ 30s
- transitions to `error` state after repeated failures
- crash observability: time, exit code, signal, uptime, log tail, classification
- classifications: `oom`, `mod_or_plugin_error`, `missing_dependency`, `nonzero_exit`, `unexpected_exit`
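The backoff rules above can be sketched as a small pure function. This is an illustration in JavaScript, not the agent's implementation. `onCrash` is a hypothetical name, and moving to `error` once all three delays are used up is an assumption, since the notes only say "after repeated failures".

```javascript
const BACKOFF_SECONDS = [30, 60, 120]; // delays from the notes above
const STABLE_UPTIME_SECONDS = 30;      // uptime at or above this resets the streak

// consecutiveFailures: crashes counted before this one.
// uptimeSeconds: how long the process ran before this crash.
function onCrash(consecutiveFailures, uptimeSeconds) {
  // A run that stayed up long enough counts as healthy, so start over.
  const prior = uptimeSeconds >= STABLE_UPTIME_SECONDS ? 0 : consecutiveFailures;
  if (prior >= BACKOFF_SECONDS.length) {
    // Assumed threshold: every delay has been tried, so stop restarting.
    return { state: "error", delaySeconds: null, consecutiveFailures: prior + 1 };
  }
  return {
    state: "restarting",
    delaySeconds: BACKOFF_SECONDS[prior],
    consecutiveFailures: prior + 1,
  };
}
```

The caller stores the returned `consecutiveFailures` and passes it back in on the next crash.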
---
### Agent Future Work (priority order)
1. **Fix code-server `--base-path` launch arg** — unblocks IDE (IMMEDIATE)
2. Structured logging (slog) for Loki
3. Dev container `provisioningComplete` state in `/status`
4. Graceful shutdown verification (SIGTERM + wait for Minecraft)
5. Process reattachment on agent restart
---
## Dev IDE Access
### Browser IDE (Implemented ✅)
### Browser IDE
```
Browser
Portal
API (/api/dev/:id/ide)
container:6000
Browser → Portal → API (/api/dev/:id/ide) → container:6000
```
Implemented in API:
API layer: ✅ complete
Agent layer: ⚠️ blocked on `--base-path`
- `src/routes/devProxy.js` — proxy route mounted in `src/app.js`
- `GET /api/dev/:id/ide` and `GET /api/dev/:id/ide/*`
- ownership verification before proxying
- `ctype === "dev"` required
- WebSocket support via `http-proxy-middleware` (`ws: true`)
- `server.on('upgrade')` handler wired
What is confirmed working:
IDE token system implemented:
- API auth ✅
- Token flow ✅
- Proxy routing ✅
- WebSocket upgrade handler ✅
- Upstream targeting ✅
- code-server process running ✅
- `POST /api/dev/:id/ide-token` — returns signed short-lived token
- token payload: `sub`, `vmid`, `type: "dev-ide"`
- default TTL: 300 seconds
- env overrides: `API_AUTH_IDE_TTL_SECONDS`, `API_AUTH_IDE_SECRET`
- proxy accepts `Authorization: Bearer` or `?token=<ide-token>`
- WebSocket upgrades validate same token
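A rough sketch of the two checks the proxy needs once the signature has been verified: find the token in either accepted location, then compare its claims to the requested container. `extractIdeToken` and `claimsAllowAccess` are hypothetical names, and the `exp` claim (expiry in seconds since the epoch) is an assumption. The notes list only `sub`, `vmid`, and `type`.

```javascript
// Pull the IDE token from either accepted location (header first, then query).
// Expects Node-style lowercased header names.
function extractIdeToken(headers, requestUrl) {
  const auth = headers["authorization"] || "";
  const match = /^Bearer\s+(\S+)$/i.exec(auth);
  if (match) return match[1];
  // The base URL is a placeholder so relative request paths can be parsed.
  const url = new URL(requestUrl, "http://placeholder.invalid");
  return url.searchParams.get("token"); // null when absent
}

// Check already-verified claims against the container being requested.
// `exp` is an assumed claim name; adjust to whatever the token really carries.
function claimsAllowAccess(claims, vmid, nowSeconds) {
  return (
    claims.type === "dev-ide" &&
    String(claims.vmid) === String(vmid) &&
    typeof claims.exp === "number" &&
    nowSeconds < claims.exp
  );
}
```

The same pair of functions can serve both the HTTP route and the `upgrade` handler, since both receive the request headers and URL.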
What is failing:
- Workbench WebSocket session ❌
- Filesystem provider initialization ❌
- Extension host startup ❌
Root cause: code-server launched without `--base-path /api/dev/<vmid>/ide`
### Local Dev Access (Headscale/Tailscale — Future)
@@ -108,22 +131,7 @@ Outstanding:
- API auth key generation
- portal setup instructions
Constraints:
- `magic_dns: false`
- no exit nodes
- no DNS takeover
---
## Agent Future Work (priority order)
1. Update code-server launch args (`--auth none`, `--base-path /api/dev/<vmid>/ide`)
2. Structured logging (slog) for Loki
3. Dev container provisioningComplete state
4. Crash recovery backoff
5. Graceful shutdown verification
6. Process reattachment on agent restart
Constraints: `magic_dns: false`, no exit nodes, no DNS takeover
---
@@ -136,7 +144,7 @@ Completed:
- enable_code_server flag
- `GET /api/servers/:id/status` — server status endpoint
- `POST /api/dev/:id/ide-token` — IDE token generation
- `GET /api/dev/:id/ide` — IDE proxy route with WebSocket support
- `GET /api/dev/:id/ide` + `GET /api/dev/:id/ide/*` — IDE proxy with WebSocket
- dev routing experiment removed (`devRouting.js`, `devDePublisher.js` deleted)
Outstanding:
@@ -183,3 +191,7 @@ Future work:
- ✅ API status endpoint for frontend agent-state consumption
- ✅ Dev IDE proxy implementation (API proxy + token system)
- ✅ Dev DNS/Traefik routing experiment — removed
- ✅ Game server crash recovery with backoff
- ✅ Crash observability (classification, log tail, exit metadata)
- ✅ Code-server lifecycle endpoints (start/stop/restart)
- ✅ Code-server process detection via /proc scan