zlh-grind/Session_Summaries/2026-03-17_IDE-Proxy-Blocking-Issue.md


2026-03-17 IDE proxy blocking issue + agent enhancements

Summary

API proxy layer for dev IDE is complete. Agent has significant new capabilities. One blocking issue remains before IDE is fully functional.


What Was Completed

API

  • Dev routing removed: devRouting.js, devDePublisher.js deleted

  • All hooks removed from provisioning and container routes

  • No DNS records, no Traefik routes, no container exposure

  • GET /api/dev/:id/ide + GET /api/dev/:id/ide/* — proxy to container:6000

  • WebSocket: server.on("upgrade") wired, ws: true enabled

  • Path rewrite: /api/dev/:id/ide/.../...

  • POST /api/dev/:id/ide-token — short-lived signed token (300s TTL)

  • Token payload: sub, vmid, type: "dev-ide"

  • Proxy accepts Authorization: Bearer or ?token=

  • WebSocket upgrades validate same token

  • GET /api/servers/:id/status — reads Redis agent:<vmid>, returns agent state

  • proxyClient.js intentionally untouched — game edge publish path depends on it
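The token flow above can be sketched as follows. This is a minimal illustration, not the API's actual code: the HMAC-based token format, the `SECRET` constant, the `exp` field, and the function names `issueIdeToken`, `verifyIdeToken`, and `extractToken` are all assumptions. Only the 300s TTL, the `sub` / `vmid` / `type: "dev-ide"` payload fields, and the two token locations come from the notes.

```javascript
const crypto = require("node:crypto");

// Hypothetical secret; a real deployment would load this from configuration.
const SECRET = "replace-me";
const TTL_SECONDS = 300; // TTL stated in the notes

// Sign a base64url body with HMAC-SHA256.
function sign(body) {
  return crypto.createHmac("sha256", SECRET).update(body).digest();
}

// Issue a compact token: base64url(payload) + "." + base64url(signature).
// The `exp` field is an assumption used here to enforce the TTL.
function issueIdeToken(sub, vmid, nowSec = Math.floor(Date.now() / 1000)) {
  const payload = { sub, vmid, type: "dev-ide", exp: nowSec + TTL_SECONDS };
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  return `${body}.${sign(body).toString("base64url")}`;
}

// Return the payload if the token is valid for this vmid, otherwise null.
function verifyIdeToken(token, vmid, nowSec = Math.floor(Date.now() / 1000)) {
  if (typeof token !== "string") return null;
  const [body, sig] = token.split(".");
  if (!body || !sig) return null;
  const expected = sign(body);
  const given = Buffer.from(sig, "base64url");
  // timingSafeEqual throws on length mismatch, so check length first.
  if (given.length !== expected.length) return null;
  if (!crypto.timingSafeEqual(given, expected)) return null;
  let payload;
  try {
    payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  } catch {
    return null;
  }
  if (payload.type !== "dev-ide") return null;
  if (payload.vmid !== vmid) return null;
  if (payload.exp <= nowSec) return null;
  return payload;
}

// Read the token from "Authorization: Bearer <t>" or from "?token=<t>".
function extractToken(headers, url) {
  const auth = headers.authorization || "";
  if (auth.startsWith("Bearer ")) return auth.slice("Bearer ".length);
  return new URL(url, "http://localhost").searchParams.get("token");
}
```

Because the WebSocket upgrade request carries the same headers and query string as a plain HTTP request, the same `extractToken` and `verifyIdeToken` pair could serve both the HTTP route and the `upgrade` handler.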

Agent

  • Full dev container filesystem support, workspace root /home/dev/workspace

  • Shell runs as dev user, HOME/USER/LOGNAME/TERM set

  • Dev provisioning creates /home/dev and /home/dev/workspace with correct ownership

  • Catalog-driven runtime validation against devcontainer/_catalog.json

  • All installs now fetch from artifact server — no local artifact assumption

  • Dotnet installer fetched from artifact server, installed into runtime path

  • code-server pulled from artifact server (tar.gz)

  • Installed to /opt/zlh/services/code-server

  • Lifecycle endpoints: POST /dev/codeserver/start|stop|restart

  • Process detection via /proc/*/cmdline scan — no longer relies solely on PID file

  • Game server crash recovery with backoff: 30s → 60s → 120s

  • Backoff resets if process uptime ≥ 30s

  • Transitions to error state after repeated failures

  • Crash observability: time, exit code, signal, uptime, log tail, classification

  • Classifications: oom, mod_or_plugin_error, missing_dependency, nonzero_exit, unexpected_exit

  • Structured logging across provisioning, installs, file ops, control plane

  • Installer failures now include output tail for debugging

  • /status now exposes: runtimeInstalled, devProvisioned, devReadyAt, workspaceRoot, serverRoot, codeServerInstalled, codeServerRunning, lastCrashClassification
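The crash-recovery backoff described in the list above can be sketched as a small state function. The delays (30s, 60s, 120s) and the 30s uptime reset come from the notes. The function name `nextRestart`, the state shape, and the rule that the error state is entered once all three delays have been used are assumptions, since the notes only say "repeated failures". The sketch is in JavaScript for consistency with the API examples; the agent's own language is not stated here.

```javascript
const BACKOFF_STEPS_MS = [30_000, 60_000, 120_000]; // delays from the notes
const STABLE_UPTIME_MS = 30_000; // uptime that resets the backoff

// Decide what to do after a crash.
// `state.failures` counts consecutive short-lived runs so far.
function nextRestart(state, uptimeMs) {
  // A run that lasted long enough counts as healthy and resets the ladder.
  const failures = uptimeMs >= STABLE_UPTIME_MS ? 0 : state.failures;

  // Assumption: give up once every backoff step has been tried.
  if (failures >= BACKOFF_STEPS_MS.length) {
    return { failures, status: "error", delayMs: null };
  }

  return {
    failures: failures + 1,
    status: "restarting",
    delayMs: BACKOFF_STEPS_MS[failures],
  };
}
```

Starting from `{ failures: 0 }`, three quick crashes in a row yield delays of 30s, 60s, and 120s; a fourth quick crash returns `status: "error"`, while a crash after 30s or more of uptime starts again at 30s.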


Current Blocking Issue

Symptom

Browser opens IDE URL. Loads partially. Popup: "An unexpected error occurred."

Console:

  • WebSocket close 1006
  • No file system provider found

Workspace shows ! (not mounted). Extension host fails to start.

What works

  • API auth
  • Token validation
  • Proxy routing to container
  • WebSocket upgrade handler
  • code-server process running on :6000

What fails

  • Workbench WebSocket session
  • Filesystem provider initialization
  • Extension host startup

Root cause (high confidence)

code-server is running without --base-path /api/dev/<vmid>/ide.

It assumes it is served from /. All WebSocket connection paths, static asset paths, and VS Code remote authority resolution are wrong when served through a proxy subpath without this flag.
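A short illustration of why root-relative paths break behind a subpath proxy, using standard URL resolution. The host `portal.example`, the vmid `101`, and the asset name are placeholders invented for this example.

```javascript
// Hypothetical page URL as the browser sees it through the API proxy.
const pageUrl = "https://portal.example/api/dev/101/ide/";

// A root-absolute reference ignores the proxy prefix entirely,
// so the request never reaches the /api/dev/:id/ide route.
const absolute = new URL("/static/workbench.js", pageUrl).pathname;

// A relative reference stays under the prefix and is proxied correctly.
const relative = new URL("static/workbench.js", pageUrl).pathname;

console.log(absolute); // /static/workbench.js
console.log(relative); // /api/dev/101/ide/static/workbench.js
```

The same resolution rule applies to WebSocket URLs built by the workbench, which is consistent with the 1006 close seen in the console.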


Required Fix

Update code-server launch in agent to:

code-server \
  --bind-addr 0.0.0.0:6000 \
  --auth none \
  --disable-telemetry \
  --base-path /api/dev/${vmid}/ide \
  /home/dev/workspace

The vmid must be injected at launch time from the agent's config. This is a one-line change to the code-server launch script in the agent.
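One way to inject the vmid is to build the argument list in code rather than interpolating it into a shell string. The sketch below is an assumption about how that might look: `buildCodeServerArgs` is a hypothetical helper, the numeric-vmid check is an assumption, and the flags are copied from the command above and should be confirmed against the installed code-server version.

```javascript
// Build the argv for code-server with the proxy subpath set per container.
function buildCodeServerArgs(vmid) {
  // Assumption: vmids are numeric. Rejecting anything else keeps the value
  // from smuggling extra path segments or flags into the command line.
  if (!/^\d+$/.test(String(vmid))) {
    throw new Error(`invalid vmid: ${vmid}`);
  }
  return [
    "--bind-addr", "0.0.0.0:6000",
    "--auth", "none",
    "--disable-telemetry",
    "--base-path", `/api/dev/${vmid}/ide`, // flag as given in these notes
    "/home/dev/workspace",
  ];
}
```

The result can be passed straight to a process spawner, for example `spawn("code-server", buildCodeServerArgs(vmid))` with Node's `child_process`, which avoids shell quoting altogether.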


Architectural State

User → Portal → API → Proxy → Container (code-server)

Containers are never publicly exposed. No Traefik route per dev container. No DNS record per dev container. The API token is the sole auth mechanism; code-server itself runs with --auth none.


Next Tasks

  1. Fix --base-path in agent code-server launch — immediate priority
  2. Redeploy container, verify: no popup, no WS 1006, workspace loads, terminal works
  3. Validate proxy consistency (HTTP + WS + static assets)
  4. Portal "Open IDE" button implementation
  5. Container hardening (non-root dev user, restrict /opt)
  6. Headscale/Tailscale integration (future)