Update OPEN_THREADS 2026-03-17 — base-path blocking issue, agent enhancements completed
parent 56178ead38
commit 8c8af5ff62
112 OPEN_THREADS.md
@@ -15,6 +15,7 @@ Completed:
- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented
- all installs now fetch from artifact server (no local artifact assumption)

Outstanding:

@@ -31,6 +32,7 @@ Completed:
- dev user creation
- workspace root `/home/dev/workspace`
- console runs as dev user
- `HOME`, `USER`, `LOGNAME`, `TERM` env vars set correctly

Outstanding:

@@ -40,64 +42,85 @@ Outstanding:

---

## Code Server Addon
### Code Server Addon

Status: ✅ Installed and running inside dev containers
Status: ✅ Installed and running

Confirmed:

- compiled release artifact fixed on `zlh-artifacts`
- install confirmed working
- process confirmed running inside container
- pulled from artifact server (tar.gz)
- installed to `/opt/zlh/services/code-server`
- binds to `0.0.0.0:6000`
- launched from `/opt/zlh/services/code-server`
- lifecycle endpoints: `POST /dev/codeserver/start|stop|restart`
- detection via `/proc/*/cmdline` scan (no longer relies solely on PID file)
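
The `/proc/*/cmdline` detection in the last bullet could look roughly like the sketch below. It is written in Python for illustration only; these notes do not show the agent's actual code. The helper names and the matching rule (any argument containing `code-server`) are assumptions.

```python
import os


def parse_cmdline(raw: bytes) -> list:
    """Split a /proc/<pid>/cmdline blob (NUL-separated argv) into arguments."""
    return [part.decode("utf-8", "replace") for part in raw.split(b"\0") if part]


def looks_like_code_server(argv: list) -> bool:
    # Assumed matching rule: any argument mentions "code-server".
    return any("code-server" in arg for arg in argv)


def find_code_server_pids(proc_root: str = "/proc") -> list:
    """Scan every numeric directory under proc_root and return matching PIDs."""
    pids = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return pids  # procfs not available at this path
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                argv = parse_cmdline(fh.read())
        except OSError:
            continue  # process exited mid-scan or is not readable
        if looks_like_code_server(argv):
            pids.append(int(entry))
    return sorted(pids)
```

A scan like this still finds the process when the PID file is stale or missing, which matches the note about no longer relying solely on the PID file.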

Port: `6000`
**BLOCKING — next task:**

**Next session — agent change required:**

code-server must be relaunched with:
code-server must launch with:

```
--bind-addr 0.0.0.0:6000
--auth none
--disable-telemetry
--base-path /api/dev/<vmid>/ide
/home/dev/workspace
```

Reason: API token is now the sole auth mechanism. Password prompt must be removed. Base path required for correct asset loading through proxy.
Without `--base-path`, WebSocket paths and static assets mismatch through
the proxy. Result: IDE loads partially, WS closes with 1006, workspace
shows `!` (not mounted), extension host fails to start.
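
As a sketch of how the agent might assemble that launch command per container, the Python helper below builds the argument vector from a `vmid`. The flags are copied verbatim from the notes above and have not been checked against code-server's CLI. The function name, the bare `code-server` executable name, and the integer `vmid` check are assumptions made for illustration.

```python
def code_server_argv(vmid: int, workspace: str = "/home/dev/workspace") -> list:
    """Build the code-server argv listed in the notes for one dev container."""
    if not isinstance(vmid, int) or isinstance(vmid, bool) or vmid <= 0:
        # Assumed guard: keep the base path limited to a plain numeric ID.
        raise ValueError("vmid must be a positive integer")
    base_path = f"/api/dev/{vmid}/ide"  # must match the API proxy route prefix
    return [
        "code-server",  # hypothetical executable name; the real path is not given
        "--bind-addr", "0.0.0.0:6000",
        "--auth", "none",
        "--disable-telemetry",
        "--base-path", base_path,
        workspace,
    ]
```

Passing the result as a list (rather than a shell string) to a process launcher avoids quoting problems if the workspace path ever contains spaces.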

---

### Game Server Supervision

Completed:

- crash recovery with backoff: 30s → 60s → 120s
- backoff resets if uptime ≥ 30s
- transitions to `error` state after repeated failures
- crash observability: time, exit code, signal, uptime, log tail, classification
- classifications: `oom`, `mod_or_plugin_error`, `missing_dependency`, `nonzero_exit`, `unexpected_exit`
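
The backoff rules in this list can be sketched as a small state holder. This is an illustrative Python version, not the agent's code. The notes do not say how many failures count as "repeated", so the sketch assumes the `error` state is entered once all three backoff steps have been used; the class and field names are hypothetical.

```python
from dataclasses import dataclass

BACKOFF_STEPS = (30, 60, 120)  # seconds, from the notes
STABLE_UPTIME = 30             # seconds of uptime that resets the backoff


@dataclass
class RestartPolicy:
    failures: int = 0
    state: str = "running"

    def on_crash(self, uptime_seconds: float):
        """Return the delay before the next restart, or None on entering `error`."""
        if uptime_seconds >= STABLE_UPTIME:
            self.failures = 0  # the process was stable, so start over at 30s
        if self.failures >= len(BACKOFF_STEPS):
            # Assumption: give up once every backoff step has been tried.
            self.state = "error"
            return None
        delay = BACKOFF_STEPS[self.failures]
        self.failures += 1
        self.state = "backoff"
        return delay
```

For example, three quick crashes in a row yield delays of 30, 60, and 120 seconds, and a fourth moves the policy to `error`; a crash after 45 seconds of uptime starts again at 30.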

---

### Agent Future Work (priority order)

1. **Fix code-server `--base-path` launch arg** — unblocks IDE (IMMEDIATE)
2. Structured logging (slog) for Loki
3. Dev container `provisioningComplete` state in `/status`
4. Graceful shutdown verification (SIGTERM + wait for Minecraft)
5. Process reattachment on agent restart

---

## Dev IDE Access

### Browser IDE (Implemented ✅)
### Browser IDE

```
Browser
↓
Portal
↓
API (/api/dev/:id/ide)
↓
container:6000
Browser → Portal → API (/api/dev/:id/ide) → container:6000
```

Implemented in API:
API layer: ✅ complete
Agent layer: ⚠️ blocked on `--base-path`

- `src/routes/devProxy.js` — proxy route mounted in `src/app.js`
- `GET /api/dev/:id/ide` and `GET /api/dev/:id/ide/*`
- ownership verification before proxying
- `ctype === "dev"` required
- WebSocket support via `http-proxy-middleware` (`ws: true`)
- `server.on('upgrade')` handler wired
What is confirmed working:

IDE token system implemented:
- API auth ✅
- Token flow ✅
- Proxy routing ✅
- WebSocket upgrade handler ✅
- Upstream targeting ✅
- code-server process running ✅

- `POST /api/dev/:id/ide-token` — returns signed short-lived token
- token payload: `sub`, `vmid`, `type: "dev-ide"`
- default TTL: 300 seconds
- env overrides: `API_AUTH_IDE_TTL_SECONDS`, `API_AUTH_IDE_SECRET`
- proxy accepts `Authorization: Bearer` or `?token=<ide-token>`
- WebSocket upgrades validate same token
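
The token flow in this list can be illustrated with a short Python sketch. The real API is JavaScript and its signing scheme is not described in these notes, so everything here is an assumption chosen for illustration: the HMAC-SHA256 `body.signature` format, the `exp` field, and all function names. Only the `sub`, `vmid`, and `type: "dev-ide"` payload fields, the 300-second default TTL, and the two token locations come from the notes.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, body: str) -> str:
    return _b64(hmac.new(secret, body.encode("ascii"), hashlib.sha256).digest())


def issue_ide_token(secret: bytes, sub: str, vmid: int, ttl: int = 300, now=None) -> str:
    """Return a signed token carrying sub, vmid, type and an assumed exp field."""
    issued = int(time.time() if now is None else now)
    payload = {"sub": sub, "vmid": vmid, "type": "dev-ide", "exp": issued + ttl}
    body = _b64(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    return f"{body}.{_sign(secret, body)}"


def verify_ide_token(secret: bytes, token: str, vmid: int, now=None):
    """Return the payload if signature, type, vmid and expiry all check out."""
    try:
        body, sig = token.split(".")
        if not hmac.compare_digest(sig, _sign(secret, body)):
            return None
        payload = json.loads(_unb64(body))
    except (ValueError, TypeError):
        return None  # malformed token
    current = time.time() if now is None else now
    if payload.get("type") != "dev-ide" or payload.get("vmid") != vmid:
        return None
    if payload.get("exp", 0) <= current:
        return None  # expired
    return payload


def extract_token(headers: dict, query: dict):
    """Prefer the Authorization header, then fall back to ?token=..."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):].strip() or None
    return query.get("token")
```

Running the same `verify_ide_token` check on both plain HTTP requests and WebSocket upgrades keeps the two paths consistent, as the last bullet requires.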
What is failing:

- Workbench WebSocket session ❌
- Filesystem provider initialization ❌
- Extension host startup ❌

Root cause: code-server launched without `--base-path /api/dev/<vmid>/ide`

### Local Dev Access (Headscale/Tailscale — Future)

@@ -108,22 +131,7 @@ Outstanding:
- API auth key generation
- portal setup instructions

Constraints:

- `magic_dns: false`
- no exit nodes
- no DNS takeover

---

## Agent Future Work (priority order)

1. Update code-server launch args (`--auth none`, `--base-path /api/dev/<vmid>/ide`)
2. Structured logging (slog) for Loki
3. Dev container provisioningComplete state
4. Crash recovery backoff
5. Graceful shutdown verification
6. Process reattachment on agent restart
Constraints: `magic_dns: false`, no exit nodes, no DNS takeover

---

@@ -136,7 +144,7 @@ Completed:
- enable_code_server flag
- `GET /api/servers/:id/status` — server status endpoint
- `POST /api/dev/:id/ide-token` — IDE token generation
- `GET /api/dev/:id/ide` — IDE proxy route with WebSocket support
- `GET /api/dev/:id/ide` + `GET /api/dev/:id/ide/*` — IDE proxy with WebSocket
- dev routing experiment removed (`devRouting.js`, `devDePublisher.js` deleted)

Outstanding:
@@ -183,3 +191,7 @@ Future work:
- ✅ API status endpoint for frontend agent-state consumption
- ✅ Dev IDE proxy implementation (API proxy + token system)
- ✅ Dev DNS/Traefik routing experiment — removed
- ✅ Game server crash recovery with backoff
- ✅ Crash observability (classification, log tail, exit metadata)
- ✅ Code-server lifecycle endpoints (start/stop/restart)
- ✅ Code-server process detection via /proc scan
