2026-03-17 – IDE proxy blocking issue + agent enhancements
Summary
The API proxy layer for the dev IDE is complete, and the agent has significant new capabilities. One blocking issue remains before the IDE is fully functional.
What Was Completed
API
- Dev routing removed:
  - `devRouting.js`, `devDePublisher.js` deleted
  - All hooks removed from provisioning and container routes
  - No DNS records, no Traefik routes, no container exposure
- `GET /api/dev/:id/ide` + `GET /api/dev/:id/ide/*`: proxy to container:6000
  - WebSocket: `server.on("upgrade")` wired, `ws: true` enabled
  - Path rewrite: `/api/dev/:id/ide/...` → `/...`
- `POST /api/dev/:id/ide-token`: issues a short-lived signed token (300s TTL)
  - Token payload: `sub`, `vmid`, `type: "dev-ide"`
  - Proxy accepts the token via `Authorization: Bearer <token>` or `?token=`
  - WebSocket upgrades validate the same token
- `GET /api/servers/:id/status`: reads Redis key `agent:<vmid>` and returns the agent state
- `proxyClient.js` intentionally untouched: the game edge publish path depends on it
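Here is a minimal sketch of the path rewrite listed above. It assumes the proxy handler sees the full original URL; some routers strip the mount path before the handler runs, in which case this step is unnecessary. `rewriteIdePath` is a hypothetical name, not a function from this codebase.

```javascript
// Hypothetical helper: maps /api/dev/:id/ide/... to /... for the container.
// The lookahead requires "/", "?" or end of string right after "ide".
const IDE_PREFIX = /^\/api\/dev\/([^/?#]+)\/ide(?=\/|\?|$)/;

function rewriteIdePath(url) {
  const match = IDE_PREFIX.exec(url);
  if (!match) return null; // not an IDE route; leave it to other handlers
  const vmid = match[1];
  const rest = url.slice(match[0].length); // "", "/...", or "?..."
  if (rest === "") return { vmid, path: "/" };
  if (rest.startsWith("?")) return { vmid, path: "/" + rest }; // keep the query string
  return { vmid, path: rest };
}
```

The lookahead after `ide` keeps sibling routes such as `POST /api/dev/:id/ide-token` from being captured by the proxy.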
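The log specifies a signed token with a 300s TTL and a `sub`/`vmid`/`type: "dev-ide"` payload, but not the token format. The sketch below assumes a hypothetical `body.signature` format signed with HMAC-SHA256 from `node:crypto`; a JWT library would be a common alternative. The `exp` claim and the names `signIdeToken`, `verifyIdeToken`, and `extractToken` are illustrative, not the API's actual code.

```javascript
const crypto = require("node:crypto");

const TOKEN_TTL_SECONDS = 300; // the 300s TTL from the log

// Hypothetical format: base64url(JSON payload) + "." + base64url(HMAC-SHA256(payload part)).
function hmac(body, secret) {
  return crypto.createHmac("sha256", secret).update(body).digest();
}

function signIdeToken({ sub, vmid }, secret, nowSec = Math.floor(Date.now() / 1000)) {
  const payload = { sub, vmid, type: "dev-ide", exp: nowSec + TOKEN_TTL_SECONDS };
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  return `${body}.${hmac(body, secret).toString("base64url")}`;
}

// Returns the payload if the token is authentic, unexpired, and scoped to this vmid; else null.
function verifyIdeToken(token, secret, expectedVmid, nowSec = Math.floor(Date.now() / 1000)) {
  if (typeof token !== "string") return null;
  const [body, sig, extra] = token.split(".");
  if (!body || !sig || extra !== undefined) return null;
  const expected = hmac(body, secret);
  const given = Buffer.from(sig, "base64url");
  // timingSafeEqual throws on length mismatch, so check the length first
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  let payload;
  try {
    payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  } catch {
    return null;
  }
  if (!payload || typeof payload !== "object") return null;
  if (payload.type !== "dev-ide") return null;
  if (String(payload.vmid) !== String(expectedVmid)) return null; // token is bound to one container
  if (typeof payload.exp !== "number" || payload.exp <= nowSec) return null;
  return payload;
}

// Accepts either "Authorization: Bearer <token>" or a ?token= query parameter.
function extractToken(req) {
  const auth = req.headers["authorization"];
  if (typeof auth === "string" && auth.startsWith("Bearer ")) return auth.slice(7).trim();
  return new URL(req.url, "http://placeholder.invalid").searchParams.get("token");
}
```

Because the log says WebSocket upgrades validate the same token, the `server.on("upgrade")` handler would plausibly run the same `extractToken` + `verifyIdeToken` pair before handing the socket to the proxy. On failure, writing an `HTTP/1.1 401` status line and destroying the socket is a common pattern.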
Agent
- Full dev container filesystem support, with workspace root `/home/dev/workspace`
- Shell runs as the `dev` user with `HOME`, `USER`, `LOGNAME`, and `TERM` set
- Dev provisioning creates `/home/dev` and `/home/dev/workspace` with correct ownership
- Catalog-driven runtime validation against `devcontainer/_catalog.json`
- All installs now fetch from the artifact server; no local artifact is assumed
  - Dotnet installer fetched from the artifact server and installed into the runtime path
- code-server pulled from the artifact server (tar.gz)
  - Installed to `/opt/zlh/services/code-server`
  - Lifecycle endpoints: `POST /dev/codeserver/start|stop|restart`
  - Process detection via a `/proc/*/cmdline` scan; no longer relies solely on a PID file
- Game server crash recovery with backoff: 30s → 60s → 120s
  - Backoff resets if process uptime ≥ 30s
  - Transitions to `error` state after repeated failures
- Crash observability: time, exit code, signal, uptime, log tail, classification
  - Classifications: `oom`, `mod_or_plugin_error`, `missing_dependency`, `nonzero_exit`, `unexpected_exit`
- Structured logging across provisioning, installs, file ops, and control plane
  - Installer failures now include an output tail for debugging
- `/status` now exposes: `runtimeInstalled`, `devProvisioned`, `devReadyAt`, `workspaceRoot`, `serverRoot`, `codeServerInstalled`, `codeServerRunning`, `lastCrashClassification`
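Here is a sketch of the `/proc/*/cmdline` scan, assuming a Linux `/proc` layout. `findProcesses` and `isCodeServer` are hypothetical names, and matching on the install path is an assumed heuristic. `procRoot` is a parameter only so the scan can run against a fake tree.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Scan <procRoot>/<pid>/cmdline and return processes whose argv satisfies `predicate`.
function findProcesses(predicate, procRoot = "/proc") {
  const found = [];
  for (const entry of fs.readdirSync(procRoot)) {
    if (!/^\d+$/.test(entry)) continue; // only numeric directories are processes
    let raw;
    try {
      raw = fs.readFileSync(path.join(procRoot, entry, "cmdline"));
    } catch {
      continue; // process exited between readdir and read, or access denied
    }
    // cmdline is NUL-separated argv with a trailing NUL; kernel threads have an empty cmdline.
    const argv = raw.toString("utf8").split("\0").filter(Boolean);
    if (argv.length > 0 && predicate(argv)) found.push({ pid: Number(entry), argv });
  }
  return found;
}

// Hypothetical matcher: any argv element under the code-server install directory from the log.
function isCodeServer(argv) {
  return argv.some((arg) => arg.includes("/opt/zlh/services/code-server"));
}
```

code-server may run helper processes from the same install directory, so more than one PID can match. The caller has to decide which one counts as "running".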
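The crash-recovery rules above fit a small state machine. The 30/60/120s schedule and the 30s reset threshold come from the log. Entering `error` once all three delays are used is an assumption, since the log only says "after repeated failures". `CrashSupervisor` is a hypothetical name.

```javascript
const BACKOFF_SECONDS = [30, 60, 120]; // schedule from the log
const HEALTHY_UPTIME_SECONDS = 30; // uptime at or above this resets the failure count

// Hypothetical supervisor: gives up (state "error") when a crash arrives after
// every backoff step has already been used. The exact limit is an assumption.
class CrashSupervisor {
  constructor() {
    this.failures = 0;
    this.state = "running";
  }

  // Call when the game server exits unexpectedly. Returns the restart delay in
  // seconds, or null once the supervisor has entered the error state.
  onCrash(uptimeSeconds) {
    if (this.state === "error") return null;
    if (uptimeSeconds >= HEALTHY_UPTIME_SECONDS) this.failures = 0; // ran long enough: start over
    if (this.failures >= BACKOFF_SECONDS.length) {
      this.state = "error";
      return null;
    }
    const delay = BACKOFF_SECONDS[this.failures];
    this.failures += 1;
    this.state = "backoff";
    return delay;
  }

  // Call after the delayed restart has been issued.
  onRestarted() {
    if (this.state === "backoff") this.state = "running";
  }
}
```

Resetting on uptime rather than on a timer means a server that crashes once a day always restarts after 30s. Only rapid crash loops escalate to `error`.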
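The log names five crash classifications but not the rules behind them. The sketch below is a hypothetical classifier whose log-tail patterns (Java-style `OutOfMemoryError`, `NoClassDefFoundError`, and so on) are assumptions about a JVM-based game server. The rule order is a design choice. `tailLines` shows one way to produce the log tail that both crash records and installer failures carry. Both function names are illustrative.

```javascript
// Keep the last n lines of captured output (e.g. installer or game server stdout/stderr).
function tailLines(text, n = 40) {
  if (n <= 0) return "";
  const lines = text.split(/\r?\n/);
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop(); // drop trailing newline
  return lines.slice(-n).join("\n");
}

// Hypothetical rules; first match wins.
function classifyCrash({ exitCode, signal, logTail = "" }) {
  const lines = logTail.split(/\r?\n/);
  const anyLine = (re) => lines.some((line) => re.test(line));

  if (anyLine(/OutOfMemoryError|out of memory/i)) return "oom";
  if (anyLine(/ClassNotFoundException|NoClassDefFoundError|missing dependenc/i)) {
    return "missing_dependency";
  }
  // A line that mentions a mod/plugin and an error together.
  if (lines.some((l) => /\b(?:mod|plugin)s?\b/i.test(l) && /\b(?:error|exception|failed)\b/i.test(l))) {
    return "mod_or_plugin_error";
  }
  if (typeof exitCode === "number" && exitCode !== 0) return "nonzero_exit";
  return "unexpected_exit"; // exit code 0 or a signal with no other evidence
}
```

A kernel OOM kill usually arrives as `SIGKILL` with no message in the process's own output. A log-only rule like this would file it under `unexpected_exit`, so catching that case would likely require the container's cgroup memory events.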
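The schema of `devcontainer/_catalog.json` is not shown in this log. The sketch below assumes a hypothetical shape (`runtimes` → runtime name → `versions` array) to show catalog-driven validation: a requested runtime and version are rejected unless the catalog lists them. `loadCatalog` and `validateRuntime` are illustrative names.

```javascript
const fs = require("node:fs");

// Hypothetical catalog shape (the real schema may differ):
// { "runtimes": { "node": { "versions": ["20", "22"] }, "dotnet": { "versions": ["8.0"] } } }
function loadCatalog(file) {
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Returns { ok: true } or { ok: false, reason } so callers can log the rejection.
function validateRuntime(catalog, runtime, version) {
  const entry = catalog?.runtimes?.[runtime];
  if (!entry) return { ok: false, reason: `unknown runtime "${runtime}"` };
  if (!Array.isArray(entry.versions) || !entry.versions.includes(version)) {
    return { ok: false, reason: `version "${version}" not in catalog for "${runtime}"` };
  }
  return { ok: true };
}
```

Returning a reason instead of throwing fits the structured-logging bullet. The rejection can be recorded as a field rather than as a stack trace.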
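Here is a sketch of spawn options for a shell running as the `dev` user, built on the `uid`, `gid`, `cwd`, and `env` options of Node's `child_process.spawn`. The numeric uid/gid are parameters because the log does not give them. `TERM=xterm-256color` is an assumed value, and `devShellOptions` is a hypothetical name. Switching uid this way requires the agent itself to run as root.

```javascript
// Hypothetical helper: options for e.g. spawn("/bin/bash", ["-l"], devShellOptions({ uid, gid })).
function devShellOptions({ uid, gid, base = process.env }) {
  return {
    uid,
    gid,
    cwd: "/home/dev/workspace", // workspace root from the log
    env: {
      // Start from a minimal environment instead of copying the agent's.
      PATH: base.PATH ?? "/usr/local/bin:/usr/bin:/bin",
      HOME: "/home/dev",
      USER: "dev",
      LOGNAME: "dev",
      TERM: "xterm-256color", // assumption: the log says TERM is set, not to what
    },
  };
}
```

Building the environment from scratch keeps agent-only variables from leaking into the user's shell. That is a design assumption, not something the log specifies.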
Current Blocking Issue
Symptom
Browser opens IDE URL. Loads partially. Popup: "An unexpected error occurred."
Console:
- `WebSocket close 1006`
- `No file system provider found`
Workspace shows `!` (not mounted). Extension host fails to start.
What works
- API auth ✅
- Token validation ✅
- Proxy routing to container ✅
- WebSocket upgrade handler ✅
- code-server process running on :6000 ✅
What fails
- Workbench WebSocket session ❌
- Filesystem provider initialization ❌
- Extension host startup ❌
Root cause (high confidence)
code-server is running without `--base-path /api/dev/<vmid>/ide`, so it assumes it is served from `/`. When it is served through a proxy subpath without this flag, all WebSocket connection paths, static asset paths, and VS Code remote authority resolution are wrong.
Required Fix
Update the code-server launch in the agent to:
```sh
code-server \
  --bind-addr 0.0.0.0:6000 \
  --auth none \
  --disable-telemetry \
  --base-path /api/dev/${vmid}/ide \
  /home/dev/workspace
```
The `vmid` must be injected at launch time from the agent's config.
This is a one-line change to the code-server launch script in the agent.
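Here is a sketch of the launch with the vmid injected, assuming the agent starts code-server from Node. Passing an argv array to `spawn` avoids shell quoting, and the numeric-vmid check is an assumption. The binary path under `/opt/zlh/services/code-server` and the names `codeServerArgs`/`startCodeServer` are hypothetical. The flags are copied from the command above. Base-path flag names differ between VS Code server builds, so it is worth confirming `--base-path` against `code-server --help` for the installed version.

```javascript
const { spawn } = require("node:child_process");

// Build argv for the launch command above, with the vmid from agent config.
function codeServerArgs(vmid) {
  // Assumption: vmids are numeric; rejecting anything else keeps the path well-formed.
  if (!/^\d+$/.test(String(vmid))) throw new Error(`invalid vmid: ${vmid}`);
  return [
    "--bind-addr", "0.0.0.0:6000",
    "--auth", "none",
    "--disable-telemetry",
    "--base-path", `/api/dev/${vmid}/ide`, // flag as written in this log
    "/home/dev/workspace",
  ];
}

// Hypothetical launcher; the binary location inside the install dir is an assumption.
function startCodeServer(vmid, bin = "/opt/zlh/services/code-server/bin/code-server") {
  return spawn(bin, codeServerArgs(vmid), { stdio: ["ignore", "pipe", "pipe"] });
}
```

Keeping argv construction separate from `spawn` makes the injected path easy to check before a redeploy.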
Architectural State
User → Portal → API → Proxy → Container (code-server)
Containers never publicly exposed. No Traefik per dev container. No DNS per dev container. API token is sole auth mechanism.
Next Tasks
- Fix `--base-path` in agent code-server launch (immediate priority)
- Redeploy container, verify: no popup, no WS 1006, workspace loads, terminal works
- Validate proxy consistency (HTTP + WS + static assets)
- Portal "Open IDE" button implementation
- Container hardening (non-root dev user, restrict /opt)
- Headscale/Tailscale integration (future)