Add session summary 2026-03-17 — IDE proxy complete, base-path blocking issue, agent enhancements
- Commit `393911c443` (parent `525366c5df`)
- New file `Session_Summaries/2026-03-17_IDE-Proxy-Blocking-Issue.md` (132 lines)
# 2026-03-17 – IDE proxy blocking issue + agent enhancements

## Summary

API proxy layer for dev IDE is complete. Agent has significant new
capabilities. One blocking issue remains before IDE is fully functional.

---
## What Was Completed

### API

- Dev routing removed: `devRouting.js`, `devDePublisher.js` deleted
- All hooks removed from provisioning and container routes
- No DNS records, no Traefik routes, no container exposure

- `GET /api/dev/:id/ide` + `GET /api/dev/:id/ide/*`: proxy to container:6000
- WebSocket: `server.on("upgrade")` wired, `ws: true` enabled
- Path rewrite: `/api/dev/:id/ide/...` → `/...`
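The path rewrite is simple enough to show as a shell function. This is only a sketch of the mapping rule and not the API's proxy code. It assumes the vmid is a plain path segment, and `rewrite_ide_path` is a hypothetical name. WebSocket upgrade requests arrive under the same prefix, so the proxy has to apply the same mapping to them as to HTTP and static-asset requests.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a proxied IDE request path to the path code-server sees.
# rewrite_ide_path VMID PATH prints the rewritten path, or fails if PATH is not
# under /api/dev/<VMID>/ide.
rewrite_ide_path() {
  local prefix="/api/dev/$1/ide" path=$2
  case $path in
    "$prefix")   echo "/" ;;                     # bare prefix -> root
    "$prefix/"*) echo "${path#"$prefix"}" ;;     # /api/dev/<id>/ide/x -> /x
    "$prefix?"*) echo "/${path#"$prefix"}" ;;    # keep a query string on root
    *)           return 1 ;;                     # not an IDE path for this vmid
  esac
}

rewrite_ide_path 101 /api/dev/101/ide/static/app.js   # prints /static/app.js
```

Anchoring the match at `<prefix>/` rather than at a bare string prefix ensures that vmid `101` never captures requests meant for `1011`.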
- `POST /api/dev/:id/ide-token`: short-lived signed token (300s TTL)
- Token payload: `sub`, `vmid`, `type: "dev-ide"`
- Proxy accepts `Authorization: Bearer` or `?token=`
- WebSocket upgrades validate the same token
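A client could use this token flow roughly as in the sketch below. `API_BASE` and `SESSION_TOKEN` are placeholders; `SESSION_TOKEN` stands for the caller's normal API credential. These notes do not describe the response format of the token endpoint, so `request_ide_token` just prints the raw body. Browser WebSocket connections cannot set an `Authorization` header, which is presumably why the proxy also accepts `?token=`.

```bash
#!/usr/bin/env bash
# Hypothetical client sketch for the dev IDE token flow.
API_BASE="${API_BASE:-https://api.example.invalid}"   # placeholder API origin

# request_ide_token VMID: ask the API for a short-lived IDE token.
# SESSION_TOKEN is an assumed name for the caller's regular API credential.
request_ide_token() {
  curl -fsS -X POST \
    -H "Authorization: Bearer ${SESSION_TOKEN:?SESSION_TOKEN not set}" \
    "$API_BASE/api/dev/$1/ide-token"
}

# ide_url VMID TOKEN: IDE URL carrying the token as ?token= (browser-friendly).
# Assumes the token contains only URL-safe characters; encode it otherwise.
ide_url() {
  printf '%s/api/dev/%s/ide/?token=%s\n' "$API_BASE" "$1" "$2"
}

# fetch_ide_root VMID TOKEN: same request using the Authorization header form.
fetch_ide_root() {
  curl -fsS -H "Authorization: Bearer $2" "$API_BASE/api/dev/$1/ide/"
}
```

With a 300 s TTL, the token should be requested immediately before opening the IDE, not cached.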
- `GET /api/servers/:id/status`: reads Redis `agent:<vmid>`, returns agent state

- `proxyClient.js` intentionally untouched; the game edge publish path depends on it

### Agent

- Full dev container filesystem support, workspace root `/home/dev/workspace`
- Shell runs as the `dev` user with `HOME`/`USER`/`LOGNAME`/`TERM` set
- Dev provisioning creates `/home/dev` and `/home/dev/workspace` with correct ownership
- Catalog-driven runtime validation against `devcontainer/_catalog.json`
- All installs now fetch from the artifact server; no local artifacts are assumed
- Dotnet installer fetched from the artifact server and installed into the runtime path
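The provisioning and shell behavior described above might look roughly like the shell sketch below. It is not the agent's code. `provision_dev_home`, `dev_shell_env` and `start_dev_shell` are hypothetical names, and the `TERM` value is a guess. The root-prefix argument exists only so that the directory step can be tried outside a container.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of dev-user provisioning and shell startup.
DEV_USER="${DEV_USER:-dev}"

# provision_dev_home ROOT: create ROOT/home/dev and ROOT/home/dev/workspace.
# ROOT is "" inside a real container; a temp dir works for a dry run.
provision_dev_home() {
  local home="$1/home/$DEV_USER"
  mkdir -p "$home/workspace"
  # Ownership can only be handed to the dev user when running as root.
  if [[ $(id -u) -eq 0 ]] && id "$DEV_USER" >/dev/null 2>&1; then
    chown "$DEV_USER:" "$home" "$home/workspace"   # "user:" = user's login group
  fi
}

# dev_shell_env: the environment the dev shell starts with (TERM value assumed).
dev_shell_env() {
  printf '%s\n' "HOME=/home/$DEV_USER" "USER=$DEV_USER" \
    "LOGNAME=$DEV_USER" "TERM=xterm-256color"
}

# start_dev_shell: replace this process with a login shell as the dev user,
# starting in the workspace. Requires root and util-linux's runuser.
start_dev_shell() {
  local -a vars
  mapfile -t vars < <(dev_shell_env)
  exec runuser -u "$DEV_USER" -- env -i "${vars[@]}" PATH=/usr/local/bin:/usr/bin:/bin \
    bash -c 'cd "$HOME/workspace" && exec bash -l'
}
```

Starting from `env -i` means the dev shell inherits nothing from the agent's own environment except the variables listed explicitly.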
- code-server pulled from the artifact server (tar.gz)
- Installed to `/opt/zlh/services/code-server`
- Lifecycle endpoints: `POST /dev/codeserver/start|stop|restart`
- Process detection via a `/proc/*/cmdline` scan; no longer relies solely on the PID file
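The `/proc/*/cmdline` scan can be sketched in plain bash. The sketch assumes that a running code-server's command line contains its install path, `/opt/zlh/services/code-server`; both function names are hypothetical. Matching on the command line, rather than trusting a PID file, still works when the file is stale or missing. This happens, for example, after code-server was started by hand or its old PID was reused.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: find processes by scanning /proc/<pid>/cmdline (Linux).

# find_pids_by_cmdline PATTERN: print each PID whose command line, with its
# NUL-separated arguments joined by spaces, contains PATTERN.
find_pids_by_cmdline() {
  local pattern=$1 f pid cmd
  for f in /proc/[0-9]*/cmdline; do
    pid=${f#/proc/}
    pid=${pid%/cmdline}
    # Skip this shell and any subshell running the function.
    [[ $pid == "$$" || $pid == "$BASHPID" ]] && continue
    # The process may exit mid-scan, so read errors are ignored.
    cmd=$( { tr '\0' ' ' < "$f"; } 2>/dev/null ) || continue
    [[ -n $cmd && $cmd == *"$pattern"* ]] && echo "$pid"
  done
  return 0
}

# codeserver_running: true if any process looks like the installed code-server.
codeserver_running() {
  [[ -n $(find_pids_by_cmdline /opt/zlh/services/code-server) ]]
}
```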
- Game server crash recovery with backoff: 30s → 60s → 120s
- Backoff resets if process uptime ≥ 30s
- Transitions to `error` state after repeated failures
- Crash observability: time, exit code, signal, uptime, log tail, classification
- Classifications: `oom`, `mod_or_plugin_error`, `missing_dependency`, `nonzero_exit`, `unexpected_exit`
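Below is a sketch of the restart policy and crash classification. It is written as two pure shell functions so the rules are easy to test. The 30 s / 60 s / 120 s steps and the 30 s uptime reset come from the notes above. Two parts are assumptions. First, the notes only say "repeated failures", so the sketch arbitrarily moves to `error` on the fourth consecutive failure, once all three steps are used up. Second, every log pattern in `classify_crash` is a hypothetical placeholder, not the agent's real rule set.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of crash-restart backoff and crash classification.
BACKOFF_STEPS=(30 60 120)   # seconds, per the notes
RESET_UPTIME=30             # a run at least this long (seconds) resets the backoff

# next_restart FAILURES UPTIME: FAILURES is the consecutive-failure count so
# far, UPTIME how long the crashed process ran. Prints "<failures> <delay>" or
# "<failures> error" once all steps are used up (threshold assumed).
next_restart() {
  local failures=$1 uptime=$2
  (( uptime >= RESET_UPTIME )) && failures=0
  failures=$(( failures + 1 ))
  if (( failures > ${#BACKOFF_STEPS[@]} )); then
    echo "$failures error"
  else
    echo "$failures ${BACKOFF_STEPS[failures-1]}"
  fi
}

# classify_crash EXIT_CODE SIGNAL LOG_TAIL: one of the five classifications.
# All patterns are illustrative guesses, checked from most to least specific.
classify_crash() {
  local code=$1 signal=$2 tail=$3
  local plugin_re='([Mm]od|[Pp]lugin).*([Ee]rror|[Ee]xception)'
  if [[ $signal == KILL || $signal == SIGKILL || $tail == *OutOfMemoryError* ]]; then
    echo oom                  # the kernel OOM killer sends SIGKILL
  elif [[ $tail == *NoClassDefFoundError* || $tail == *"error while loading shared libraries"* ]]; then
    echo missing_dependency
  elif [[ $tail =~ $plugin_re ]]; then
    echo mod_or_plugin_error
  elif (( code != 0 )); then
    echo nonzero_exit
  else
    echo unexpected_exit      # clean exit, or non-OOM signal, while expected to run
  fi
}
```

For example, `next_restart 2 5` prints `3 120`. `next_restart 2 45` prints `1 30` because the process stayed up long enough to reset the backoff.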
- Structured logging across provisioning, installs, file ops, and control plane
- Installer failures now include the output tail for debugging

- `/status` now exposes: `runtimeInstalled`, `devProvisioned`, `devReadyAt`,
  `workspaceRoot`, `serverRoot`, `codeServerInstalled`, `codeServerRunning`,
  `lastCrashClassification`

---
## Current Blocking Issue

### Symptom

Browser opens the IDE URL. The page loads partially and a popup shows: "An unexpected error occurred."

Console:
- `WebSocket close 1006`
- `No file system provider found`

Workspace shows `!` (not mounted). Extension host fails to start.

### What works

- API auth ✅
- Token validation ✅
- Proxy routing to container ✅
- WebSocket upgrade handler ✅
- code-server process running on :6000 ✅

### What fails

- Workbench WebSocket session ❌
- Filesystem provider initialization ❌
- Extension host startup ❌

### Root cause (high confidence)

code-server is running without `--base-path /api/dev/<vmid>/ide`.

It assumes it is served from `/`. When it is served through a proxy subpath
without this flag, all WebSocket connection paths, static asset paths, and
VS Code remote authority resolution are wrong.

---
## Required Fix

Update the code-server launch in the agent to:

```bash
# ${vmid} must be set from the agent's config before this runs.
# --auth none relies on the API proxy token; the container is never exposed directly.
code-server \
  --bind-addr 0.0.0.0:6000 \
  --auth none \
  --disable-telemetry \
  --base-path "/api/dev/${vmid}/ide" \
  /home/dev/workspace
```
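Here is a minimal sketch of how the agent's launch step could assemble this command with the vmid filled in at runtime. `ZLH_VMID` stands in for wherever the agent's config actually stores the vmid. The `bin/code-server` path inside the install directory is an assumption based on the release tarball layout. The flags are copied from the command above, so check them against `code-server --help` for the installed version. Rejecting a non-numeric vmid keeps a malformed config value from altering the base path.

```bash
#!/usr/bin/env bash
# Hypothetical launch helper; ZLH_VMID and CODE_SERVER_BIN are assumed names.
CODE_SERVER_BIN="${CODE_SERVER_BIN:-/opt/zlh/services/code-server/bin/code-server}"

# codeserver_args VMID: print the code-server arguments, one per line.
codeserver_args() {
  local vmid=$1 re='^[0-9]+$'
  # Only plain numeric ids may end up in the URL path.
  if ! [[ $vmid =~ $re ]]; then
    echo "invalid vmid: '$vmid'" >&2
    return 1
  fi
  printf '%s\n' \
    --bind-addr 0.0.0.0:6000 \
    --auth none \
    --disable-telemetry \
    --base-path "/api/dev/${vmid}/ide" \
    /home/dev/workspace
}

# launch_codeserver: read the vmid from the environment and exec code-server.
launch_codeserver() {
  local out
  out=$(codeserver_args "${ZLH_VMID:-}") || return 1
  local -a args
  mapfile -t args <<< "$out"
  exec "$CODE_SERVER_BIN" "${args[@]}"
}
```

With `ZLH_VMID=101`, `launch_codeserver` would start code-server with `--base-path /api/dev/101/ide`.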
The `vmid` must be injected at launch time from the agent's config.
This is a one-line change to the code-server launch script in the agent.

---
## Architectural State

```
User → Portal → API → Proxy → Container (code-server)
```

Containers are never publicly exposed: no Traefik route and no DNS record per dev container.
The API token is the sole auth mechanism.

---
## Next Tasks

1. Fix `--base-path` in the agent's code-server launch (immediate priority)
2. Redeploy the container and verify: no popup, no WS 1006, workspace loads, terminal works
3. Validate proxy consistency (HTTP + WS + static assets)
4. Implement the portal "Open IDE" button
5. Container hardening (non-root dev user, restrict `/opt`)
6. Headscale/Tailscale integration (future)