From 2a6bfe77dcfc44e935946a291516721e5c6ea914 Mon Sep 17 00:00:00 2001
From: jester
Date: Tue, 24 Mar 2026 22:56:12 +0000
Subject: [PATCH] Add dev IDE access architecture doc

---
 architecture/dev-ide-access.md | 226 +++++++++++++++++++++++++++++++++
 1 file changed, 226 insertions(+)
 create mode 100644 architecture/dev-ide-access.md

diff --git a/architecture/dev-ide-access.md b/architecture/dev-ide-access.md
new file mode 100644
index 0000000..e23f367
--- /dev/null
+++ b/architecture/dev-ide-access.md
@@ -0,0 +1,226 @@
+# Dev IDE Access Architecture
+
+## Overview
+
+Dev containers expose a browser-accessible VS Code IDE (code-server) via a
+host-based routing model. The API is the auth and proxy boundary. Traefik
+handles TLS and routing. Containers are never directly exposed.
+
+---
+
+## Architecture
+
+```
+Browser
+  ↓ https://dev-.zerolaghub.dev
+Traefik (zlh-zpack-proxy, 10.70.0.242)
+  ↓ wildcard TLS + host routing → http://10.60.0.245:4000
+API (zpack-api, 10.60.0.245:4000)
+  ↓ token validation + cookie handoff
+  ↓ HTTP + WebSocket proxy
+Container code-server (:6000)
+```
+
+---
+
+## Why API Proxy (Not Direct Traefik → Container)
+
+Direct Traefik → container routing was tested and rejected for the following reasons:
+
+1. **No auth boundary** — code-server runs with `--auth none`. Without the API
+   in the path, any request reaching the container would have full IDE access.
+2. **No ownership validation** — the API verifies the token against the
+   database to confirm the requesting user owns the container. Traefik has no
+   concept of this.
+3. **Per-container routing complexity** — direct routing requires a Traefik
+   dynamic config entry per container. The API proxy approach requires only
+   one wildcard rule regardless of how many containers exist.
+
+The API proxy adds one network hop but provides auth, ownership enforcement,
+and operational simplicity.
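The auth and ownership checks in points 1 and 2 can be sketched as a single gate in front of the proxy. This is a minimal illustration, not the code in `devProxy.js`: the `owners` map stands in for the database lookup, `authorizeIdeRequest` is a hypothetical name, the payload shape (`sub`, `vmid`, `type`) is assumed to be already signature-verified, and the 401/403 status codes are assumptions.

```javascript
// Hypothetical in-memory stand-in for the ownership table.
// The real API confirms ownership against its database.
const owners = new Map([
  ["6070", "user-a"],
  ["6071", "user-b"],
]);

// Decide whether a request for a given container may be proxied.
// `claims` is an already-verified token payload (assumed shape: sub, vmid, type).
// `hostVmid` is the container id taken from the request's Host header.
function authorizeIdeRequest(claims, hostVmid) {
  // No token, or a token of the wrong kind: unauthenticated.
  if (!claims || claims.type !== "dev-ide") {
    return { ok: false, status: 401 };
  }
  // The token is scoped to one container and must match the requested host.
  if (String(claims.vmid) !== String(hostVmid)) {
    return { ok: false, status: 403 };
  }
  // The user named in the token must own the container.
  if (owners.get(String(hostVmid)) !== claims.sub) {
    return { ok: false, status: 403 };
  }
  return { ok: true, status: 200 };
}
```

Because code-server itself runs with `--auth none`, a gate of this kind is the only place where a request can be refused, which is why it has to sit in front of every proxied request.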
+
+---
+
+## Token Flow
+
+### Step 1 — Token generation
+
+```
+POST /api/dev/:id/ide-token
+Authorization: Bearer
+```
+
+Response:
+```json
+{
+  "token": "",
+  "url": "https://dev-6070.zerolaghub.dev/?token=...",
+  "expiresIn": 300
+}
+```
+
+The IDE proxy token is short-lived (300s TTL), signed with a separate secret,
+and carries `{ sub, vmid, type: "dev-ide" }`. It is scoped to a specific
+container — a token for vmid 6070 cannot access vmid 6071.
+
+### Step 2 — Bootstrap
+
+Browser navigates to `https://dev-6070.zerolaghub.dev/?token=...`
+
+Traefik forwards to the API. `handleHostedProxy` extracts the vmid from the
+`Host` header (`dev-6070` → `6070`), validates the token, confirms ownership
+against the database, then:
+
+1. Sets `zlh_dev_ide_token` HTTP-only cookie (path: `/`, scoped to the hosted domain)
+2. Redirects to the clean URL (token stripped from query string)
+
+### Step 3 — Live traffic
+
+All subsequent requests carry the cookie. The API validates it on every
+request and proxies to `http://:6000`.
+
+WebSocket upgrades are handled in `attachDevProxyServer` on the raw Node.js
+HTTP server — the upgrade event is intercepted before Express routing, the
+cookie/token is validated, and a target-bound WS proxy is built at upgrade
+time using the resolved container IP.
+
+---
+
+## Request Flow Detail
+
+```
+1. GET https://dev-6070.zerolaghub.dev/?token=
+   → Traefik → API handleHostedProxy
+   → validate token (vmid match + ownership)
+   → Set-Cookie: zlh_dev_ide_token=; HttpOnly; Path=/
+   → 302 → https://dev-6070.zerolaghub.dev/
+
+2. GET https://dev-6070.zerolaghub.dev/
+   → Traefik → API handleHostedProxy
+   → validate cookie
+   → proxy to http://10.100.x.x:6000/
+   → code-server 302 → /?folder=/home/dev/workspace
+
+3. GET https://dev-6070.zerolaghub.dev/?folder=/home/dev/workspace
+   → Traefik → API handleHostedProxy
+   → validate cookie
+   → proxy to http://10.100.x.x:6000/?folder=/home/dev/workspace
+   → 200 code-server HTML
+
+4. WS wss://dev-6070.zerolaghub.dev/stable-/...
+   → Traefik → API server upgrade handler (attachDevProxyServer)
+   → validate cookie from WS request headers
+   → build target-bound WS proxy → ws://10.100.x.x:6000/stable-/...
+```
+
+---
+
+## Traefik Configuration
+
+```yaml
+http:
+  routers:
+    dev-ide:
+      rule: "HostRegexp(`dev-{vmid:[0-9]+}.zerolaghub.dev`)"
+      entryPoints:
+        - websecure
+      service: dev-ide-api
+      tls:
+        certResolver: zpackv2
+        domains:
+          - main: "zerolaghub.dev"
+            sans:
+              - "*.zerolaghub.dev"
+  services:
+    dev-ide-api:
+      loadBalancer:
+        passHostHeader: true
+        servers:
+          - url: "http://10.60.0.245:4000"
+```
+
+`passHostHeader: true` is critical — without it Express resolves relative
+redirects against the internal API IP, leaking it to the browser.
+
+TLS: wildcard cert `*.zerolaghub.dev` issued via Let's Encrypt DNS-01 challenge
+through Cloudflare. certResolver `zpackv2` handles renewal automatically.
+
+---
+
+## API Key Files
+
+- `src/routes/devProxy.js` — all IDE proxy logic
+- `src/app.js` — mounts devProxy router + calls `attachDevProxyServer`
+- `src/auth/tokens.js` — `signIdeProxyToken` / `verifyIdeProxyToken`
+
+Key functions in `devProxy.js`:
+
+- `handleHostedProxy` — entry point for all host-based requests
+- `parseHostedVmid` — extracts vmid from `Host: dev-6070.zerolaghub.dev`
+- `resolveDevTarget` — DB lookup confirming ownership + returning container IP
+- `attachDevProxyServer` — raw HTTP upgrade handler for WebSockets
+- `buildWsProxy` — builds a target-bound WS proxy instance at upgrade time
+
+---
+
+## WebSocket Critical Detail
+
+The shared HTTP proxy instance (`ideTunnelProxy`) has `ws: false`. WebSocket
+upgrades are handled exclusively in `attachDevProxyServer` which builds a
+new proxy instance per upgrade with the resolved container target hardcoded.
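The per-upgrade binding can be sketched as follows. This is an illustration under stated assumptions, not the contents of `devProxy.js`: the function names `parseHostedVmid` and `buildWsProxy` come from the list above, but their bodies here are hypothetical, the host pattern is inferred from the `dev-6070.zerolaghub.dev` examples, `routeUpgrade` and `resolveTarget` are made-up names, and cookie validation is left out. The proxy is reduced to a closure that only computes the upstream URL, so that the target binding is the one thing on display.

```javascript
// Assumed host pattern, inferred from the dev-6070.zerolaghub.dev examples.
const HOST_PATTERN = /^dev-(\d+)\.zerolaghub\.dev(?::\d+)?$/i;

// Extract the vmid from a Host header value, or null if it does not match.
function parseHostedVmid(host) {
  const match = HOST_PATTERN.exec(host || "");
  return match ? match[1] : null;
}

// Stand-in for a real proxy instance: the target is captured in the closure
// when the instance is built, so it cannot be lost or replaced later.
function buildWsProxy(target) {
  const wsBase = target.replace(/^http/, "ws");
  return { upstreamUrl: (req) => wsBase + req.url };
}

// Hypothetical upgrade routing: resolve the container for this request,
// then build a proxy bound to that one target.
// `resolveTarget` stands in for the DB lookup and returns a URL or null.
function routeUpgrade(req, resolveTarget) {
  const vmid = parseHostedVmid(req.headers.host);
  const target = vmid ? resolveTarget(vmid) : null;
  if (!target) {
    return null; // refuse the upgrade rather than fall back to a default target
  }
  return buildWsProxy(target).upstreamUrl(req);
}
```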
+
+This was the fix for the `ECONNREFUSED 127.0.0.1:6000` bug — the shared
+proxy was falling back to `DEFAULT_IDE_TARGET` (localhost) instead of the
+actual container IP because per-request context was lost during the upgrade.
+
+---
+
+## code-server Container Setup
+
+- Install path: `/opt/zlh/services/code-server`
+- Port: `6000` (bound to `0.0.0.0`)
+- Launch flags: `--auth none --disable-telemetry`
+- Auth: disabled at code-server level — API handles all auth
+- Workspace: `/home/dev/workspace`
+- User: `dev`
+
+Chrome blocks port 6000 as an unsafe port. This is internal only and never
+directly browser-accessible, so it is not an issue.
+
+---
+
+## Known Pitfalls
+
+| Issue | Cause | Fix |
+|-------|-------|-----|
+| `ERR_CONNECTION_CLOSED` in browser | Traefik/API not reachable or wrong code deployed | Check connectivity + confirm deployed code has `handleHostedProxy` |
+| 404 from API | `handleHostedProxy` not in running process | Restart API after deploy |
+| Redirect loops to raw IP | `passHostHeader: true` missing in Traefik | Add it to loadBalancer config |
+| WS 1006 disconnect | WS proxy falling back to `127.0.0.1` | Ensure `ws: false` on HTTP proxy, WS handled only in `attachDevProxyServer` |
+| Cookie not set | Token expired (300s TTL) | Generate a fresh token |
+| Wildcard cert fails to issue | Stale `_acme-challenge` TXT records in Cloudflare | Delete stale records manually, reload Traefik |
+
+---
+
+## Future: SSH Access via CF Tunnel
+
+Planned addition — same hostname, SSH protocol routed through Cloudflare Tunnel:
+
+```
+Developer laptop
+  ↓ ssh dev-6070.zerolaghub.dev
+Cloudflare edge (CF Tunnel)
+  ↓
+Bastion VM
+  ↓ SSH proxy jump
+Dev container
+```
+
+Developer one-time SSH config:
+```
+Host *.zerolaghub.dev
+  ProxyCommand cloudflared access ssh --hostname %h
+```
+
+CF Tunnel runs persistently on the bastion VM. Free tier covers up to 50 users.
+Browser IDE and SSH share the same hostname — different protocols routed separately.