From 393911c4439908abf711ad15a53c05515fa6ef62 Mon Sep 17 00:00:00 2001
From: jester
Date: Tue, 17 Mar 2026 23:07:24 +0000
Subject: [PATCH] =?UTF-8?q?Add=20session=20summary=202026-03-17=20?=
 =?UTF-8?q?=E2=80=94=20IDE=20proxy=20complete,=20base-path=20blocking=20is?=
 =?UTF-8?q?sue,=20agent=20enhancements?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../2026-03-17_IDE-Proxy-Blocking-Issue.md | 132 ++++++++++++++++++
 1 file changed, 132 insertions(+)
 create mode 100644 Session_Summaries/2026-03-17_IDE-Proxy-Blocking-Issue.md

diff --git a/Session_Summaries/2026-03-17_IDE-Proxy-Blocking-Issue.md b/Session_Summaries/2026-03-17_IDE-Proxy-Blocking-Issue.md
new file mode 100644
index 0000000..1f827b4
--- /dev/null
+++ b/Session_Summaries/2026-03-17_IDE-Proxy-Blocking-Issue.md
@@ -0,0 +1,132 @@
+# 2026-03-17 – IDE proxy blocking issue + agent enhancements
+
+## Summary
+
+API proxy layer for dev IDE is complete. Agent has significant new
+capabilities. One blocking issue remains before IDE is fully functional.
+
+---
+
+## What Was Completed
+
+### API
+
+- Dev routing removed: `devRouting.js`, `devDePublisher.js` deleted
+- All hooks removed from provisioning and container routes
+- No DNS records, no Traefik routes, no container exposure
+
+- `GET /api/dev/:id/ide` + `GET /api/dev/:id/ide/*` — proxy to container:6000
+- WebSocket: `server.on("upgrade")` wired, `ws: true` enabled
+- Path rewrite: `/api/dev/:id/ide/...` → `/...`
+
+- `POST /api/dev/:id/ide-token` — short-lived signed token (300s TTL)
+- Token payload: `sub`, `vmid`, `type: "dev-ide"`
+- Proxy accepts `Authorization: Bearer` or `?token=`
+- WebSocket upgrades validate same token
+
+- `GET /api/servers/:id/status` — reads Redis `agent:`, returns agent state
+
+- `proxyClient.js` intentionally untouched — game edge publish path depends on it
+
+### Agent
+
+- Full dev container filesystem support, workspace root `/home/dev/workspace`
+- Shell runs as `dev` user, `HOME`/`USER`/`LOGNAME`/`TERM` set
+- Dev provisioning creates `/home/dev` and `/home/dev/workspace` with correct ownership
+- Catalog-driven runtime validation against `devcontainer/_catalog.json`
+- All installs now fetch from artifact server — no local artifact assumption
+- Dotnet installer fetched from artifact server, installed into runtime path
+
+- code-server pulled from artifact server (tar.gz)
+- Installed to `/opt/zlh/services/code-server`
+- Lifecycle endpoints: `POST /dev/codeserver/start|stop|restart`
+- Process detection via `/proc/*/cmdline` scan — no longer relies solely on PID file
+
+- Game server crash recovery with backoff: 30s → 60s → 120s
+- Backoff resets if process uptime ≥ 30s
+- Transitions to `error` state after repeated failures
+- Crash observability: time, exit code, signal, uptime, log tail, classification
+- Classifications: `oom`, `mod_or_plugin_error`, `missing_dependency`, `nonzero_exit`, `unexpected_exit`
+
+- Structured logging across provisioning, installs, file ops, control plane
+- Installer failures now include output tail for debugging
+
+- `/status` now exposes: `runtimeInstalled`, `devProvisioned`, `devReadyAt`,
+  `workspaceRoot`, `serverRoot`, `codeServerInstalled`, `codeServerRunning`,
+  `lastCrashClassification`
+
+---
+
+## Current Blocking Issue
+
+### Symptom
+
+Browser opens IDE URL. Loads partially. Popup: "An unexpected error occurred."
+
+Console:
+- `WebSocket close 1006`
+- `No file system provider found`
+
+Workspace shows `!` (not mounted). Extension host fails to start.
+
+### What works
+
+- API auth ✅
+- Token validation ✅
+- Proxy routing to container ✅
+- WebSocket upgrade handler ✅
+- code-server process running on :6000 ✅
+
+### What fails
+
+- Workbench WebSocket session ❌
+- Filesystem provider initialization ❌
+- Extension host startup ❌
+
+### Root cause (high confidence)
+
+code-server is running without `--base-path /api/dev//ide`.
+
+It assumes it is served from `/`. All WebSocket connection paths, static
+asset paths, and VS Code remote authority resolution are wrong when served
+through a proxy subpath without this flag.
+
+---
+
+## Required Fix
+
+Update code-server launch in agent to:
+
+```bash
+code-server \
+  --bind-addr 0.0.0.0:6000 \
+  --auth none \
+  --disable-telemetry \
+  --base-path /api/dev/${vmid}/ide \
+  /home/dev/workspace
+```
+
+The `vmid` must be injected at launch time from the agent's config.
+This is a one-line change to the code-server launch script in the agent.
+
+---
+
+## Architectural State
+
+```
+User → Portal → API → Proxy → Container (code-server)
+```
+
+Containers never publicly exposed. No Traefik per dev container.
+No DNS per dev container. API token is sole auth mechanism.
+
+---
+
+## Next Tasks
+
+1. Fix `--base-path` in agent code-server launch — immediate priority
+2. Redeploy container, verify: no popup, no WS 1006, workspace loads, terminal works
+3. Validate proxy consistency (HTTP + WS + static assets)
+4. Portal "Open IDE" button implementation
+5. Container hardening (non-root dev user, restrict /opt)
+6. Headscale/Tailscale integration (future)