Add dev IDE access architecture doc
parent 1b1b22e8f8
commit 2a6bfe77dc

architecture/dev-ide-access.md (new file, 226 lines)

# Dev IDE Access Architecture

## Overview

Dev containers expose a browser-accessible VS Code IDE (code-server) via a
host-based routing model. The API is the auth and proxy boundary. Traefik
handles TLS and routing. Containers are never directly exposed.

---

## Architecture

```
Browser
   ↓ https://dev-<vmid>.zerolaghub.dev
Traefik (zlh-zpack-proxy, 10.70.0.242)
   ↓ wildcard TLS + host routing → http://10.60.0.245:4000
API (zpack-api, 10.60.0.245:4000)
   ↓ token validation + cookie handoff
   ↓ HTTP + WebSocket proxy
Container code-server (:6000)
```

---

## Why API Proxy (Not Direct Traefik → Container)

Direct Traefik → container routing was tested and rejected for the following reasons:

1. **No auth boundary**: code-server runs with `--auth none`. Without the API
   in the path, any request reaching the container would have full IDE access.
2. **No ownership validation**: the API verifies the token against the
   database to confirm the requesting user owns the container. Traefik has no
   concept of container ownership.
3. **Per-container routing complexity**: direct routing requires a Traefik
   dynamic config entry per container. The API proxy approach requires only
   one wildcard rule regardless of how many containers exist.

The API proxy adds one network hop but provides auth, ownership enforcement,
and operational simplicity.

---

## Token Flow

### Step 1: Token generation

```
POST /api/dev/:id/ide-token
Authorization: Bearer <user-jwt>
```

Response:

```json
{
  "token": "<ide-proxy-token>",
  "url": "https://dev-6070.zerolaghub.dev/?token=...",
  "expiresIn": 300
}
```

The IDE proxy token is short-lived (300s TTL), signed with a separate secret,
and carries `{ sub, vmid, type: "dev-ide" }`. It is scoped to a specific
container: a token for vmid 6070 cannot access vmid 6071.
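
The scoping rule above can be sketched as follows. This is a minimal illustration and not the code in `src/auth/tokens.js`: the hand-rolled HMAC format, the function signatures, the `exp` field, and the secret value are all assumptions made for the example (the real tokens may well be JWTs from a library). It assumes Node.js 16 or later for `base64url` support.

```javascript
const crypto = require('node:crypto');

// Hypothetical secret for illustration; the real service uses its own separate secret.
const IDE_TOKEN_SECRET = 'example-ide-secret';
const IDE_TOKEN_TTL_SECONDS = 300;

function sign(data) {
  return crypto.createHmac('sha256', IDE_TOKEN_SECRET).update(data).digest('base64url');
}

// Issue a token carrying { sub, vmid, type: "dev-ide" } plus an assumed expiry field.
function signIdeProxyToken(userId, vmid, nowSeconds = Math.floor(Date.now() / 1000)) {
  const payload = { sub: userId, vmid, type: 'dev-ide', exp: nowSeconds + IDE_TOKEN_TTL_SECONDS };
  const body = Buffer.from(JSON.stringify(payload)).toString('base64url');
  return `${body}.${sign(body)}`;
}

// Return the payload when the token is authentic, unexpired, and bound to
// expectedVmid; return null in every other case.
function verifyIdeProxyToken(token, expectedVmid, nowSeconds = Math.floor(Date.now() / 1000)) {
  const [body, signature] = String(token).split('.');
  if (!body || !signature) return null;

  const expected = Buffer.from(sign(body));
  const actual = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== actual.length || !crypto.timingSafeEqual(expected, actual)) return null;

  let payload;
  try {
    payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
  if (payload.type !== 'dev-ide') return null;   // wrong token type
  if (payload.vmid !== expectedVmid) return null; // token for another container
  if (payload.exp <= nowSeconds) return null;     // past the 300s TTL
  return payload;
}
```

Because the vmid is part of the signed payload, a token minted for vmid 6070 fails the `payload.vmid !== expectedVmid` check when presented on the host for vmid 6071, regardless of who holds it.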

### Step 2: Bootstrap

The browser navigates to `https://dev-6070.zerolaghub.dev/?token=...`.

Traefik forwards the request to the API. `handleHostedProxy` extracts the vmid
from the `Host` header (`dev-6070` → `6070`), validates the token, confirms
ownership against the database, then:

1. Sets the `zlh_dev_ide_token` HTTP-only cookie (path: `/`, scoped to the hosted domain)
2. Redirects to the clean URL (token stripped from the query string)
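
A rough sketch of the host parsing and the cookie-plus-redirect handoff is shown below. `parseHostedVmid` is named in this document, but its body here is a guess; `buildBootstrap` is a hypothetical helper invented for the example, and token validation and the ownership check are deliberately left out.

```javascript
// Accept only dev-<digits>.zerolaghub.dev, with an optional port suffix.
const HOST_PATTERN = /^dev-(\d+)\.zerolaghub\.dev(?::\d+)?$/i;

// Extract the numeric vmid from a Host header, or null if the host does not match.
function parseHostedVmid(hostHeader) {
  const match = HOST_PATTERN.exec(String(hostHeader || '').trim());
  return match ? Number(match[1]) : null;
}

// Hypothetical helper: given a bootstrap request, build the Set-Cookie value
// and the clean redirect target with the token removed from the query string.
// Returns null when the request carries no token.
function buildBootstrap(hostHeader, requestUrl) {
  const url = new URL(requestUrl, `https://${hostHeader}`);
  const token = url.searchParams.get('token');
  if (!token) return null;

  url.searchParams.delete('token');
  return {
    setCookie: `zlh_dev_ide_token=${encodeURIComponent(token)}; HttpOnly; Path=/`,
    location: `${url.pathname}${url.search}`,
  };
}
```

Anchoring the pattern at both ends matters: a host such as `dev-6070.zerolaghub.dev.evil.com` must not resolve to vmid 6070. Omitting a `Domain` attribute keeps the cookie host-only, which is one way to get the per-host scoping described above.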

### Step 3: Live traffic

All subsequent requests carry the cookie. The API validates it on every
request and proxies to `http://<container-ip>:6000`.

WebSocket upgrades are handled in `attachDevProxyServer` on the raw Node.js
HTTP server: the `upgrade` event is intercepted before Express routing, the
cookie/token is validated, and a target-bound WS proxy is built at upgrade
time using the resolved container IP.
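
The upgrade path might look roughly like the sketch below. It only illustrates the shape of listening for `upgrade` on the raw `http.Server`; `attachUpgradeHandler`, `authorize`, and `proxyUpgrade` are hypothetical names standing in for the real `attachDevProxyServer` internals, which are not shown in this document.

```javascript
const http = require('node:http');

// Pull one cookie value out of a raw Cookie header; returns null when absent.
function readCookie(cookieHeader, name) {
  for (const part of String(cookieHeader || '').split(';')) {
    const index = part.indexOf('=');
    if (index === -1) continue;
    if (part.slice(0, index).trim() !== name) continue;
    const raw = part.slice(index + 1).trim();
    try {
      return decodeURIComponent(raw);
    } catch {
      return raw; // malformed percent-encoding: hand back the raw value
    }
  }
  return null;
}

// Attach an upgrade listener to the raw server. Express never sees these
// requests, so the cookie has to be read from the request headers directly.
// `authorize(host, token)` is assumed to resolve to a target or null, and
// `proxyUpgrade` is assumed to forward the socket to that target.
function attachUpgradeHandler(server, { authorize, proxyUpgrade }) {
  server.on('upgrade', async (req, socket, head) => {
    try {
      const token = readCookie(req.headers.cookie, 'zlh_dev_ide_token');
      const target = token ? await authorize(req.headers.host, token) : null;
      if (!target) {
        socket.end('HTTP/1.1 401 Unauthorized\r\nConnection: close\r\n\r\n');
        return;
      }
      proxyUpgrade(req, socket, head, target);
    } catch {
      socket.destroy();
    }
  });
}

// Usage: the server is created but not started here.
const server = http.createServer();
attachUpgradeHandler(server, {
  authorize: async () => null,
  proxyUpgrade: () => {},
});
```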

---

## Request Flow Detail

```
1. GET https://dev-6070.zerolaghub.dev/?token=<ide-token>
   → Traefik → API handleHostedProxy
   → validate token (vmid match + ownership)
   → Set-Cookie: zlh_dev_ide_token=<token>; HttpOnly; Path=/
   → 302 → https://dev-6070.zerolaghub.dev/

2. GET https://dev-6070.zerolaghub.dev/
   → Traefik → API handleHostedProxy
   → validate cookie
   → proxy to http://10.100.x.x:6000/
   → code-server 302 → /?folder=/home/dev/workspace

3. GET https://dev-6070.zerolaghub.dev/?folder=/home/dev/workspace
   → Traefik → API handleHostedProxy
   → validate cookie
   → proxy to http://10.100.x.x:6000/?folder=/home/dev/workspace
   → 200 code-server HTML

4. WS wss://dev-6070.zerolaghub.dev/stable-<hash>/...
   → Traefik → API server upgrade handler (attachDevProxyServer)
   → validate cookie from WS request headers
   → build target-bound WS proxy → ws://10.100.x.x:6000/stable-<hash>/...
```

---

## Traefik Configuration

```yaml
http:
  routers:
    dev-ide:
      rule: "HostRegexp(`dev-{vmid:[0-9]+}.zerolaghub.dev`)"
      entryPoints:
        - websecure
      service: dev-ide-api
      tls:
        certResolver: zpackv2
        domains:
          - main: "zerolaghub.dev"
            sans:
              - "*.zerolaghub.dev"
  services:
    dev-ide-api:
      loadBalancer:
        passHostHeader: true
        servers:
          - url: "http://10.60.0.245:4000"
```

`passHostHeader: true` is critical. Without it, Express resolves relative
redirects against the internal API IP, leaking it to the browser.

TLS uses a wildcard cert for `*.zerolaghub.dev`, issued via a Let's Encrypt
DNS-01 challenge through Cloudflare. The `zpackv2` certResolver handles
renewal automatically.

---

## API Key Files

- `src/routes/devProxy.js`: all IDE proxy logic
- `src/app.js`: mounts the devProxy router and calls `attachDevProxyServer`
- `src/auth/tokens.js`: `signIdeProxyToken` / `verifyIdeProxyToken`

Key functions in `devProxy.js`:

- `handleHostedProxy`: entry point for all host-based requests
- `parseHostedVmid`: extracts vmid from `Host: dev-6070.zerolaghub.dev`
- `resolveDevTarget`: DB lookup confirming ownership + returning container IP
- `attachDevProxyServer`: raw HTTP upgrade handler for WebSockets
- `buildWsProxy`: builds a target-bound WS proxy instance at upgrade time
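
To make the ownership check concrete, here is a minimal sketch of what `resolveDevTarget` could look like. The in-memory `Map` stands in for the database, and the row shape (`ownerId`, `ip`), the example IPs, the synchronous signature, and the return shape are assumptions for illustration only.

```javascript
// In-memory stand-in for the database; rows, field names, and IPs are hypothetical.
const containers = new Map([
  [6070, { ownerId: 'user-1', ip: '10.100.0.10' }],
  [6071, { ownerId: 'user-2', ip: '10.100.0.11' }],
]);

const IDE_PORT = 6000;

// Return the proxy targets only when the container exists and belongs to
// userId. Unknown containers and containers owned by someone else both yield
// null, so a caller cannot tell the two cases apart.
function resolveDevTarget(userId, vmid, store = containers) {
  const row = store.get(vmid);
  if (!row || row.ownerId !== userId) return null;
  return {
    vmid,
    httpTarget: `http://${row.ip}:${IDE_PORT}`,
    wsTarget: `ws://${row.ip}:${IDE_PORT}`,
  };
}
```

For example, `resolveDevTarget('user-1', 6070)` yields targets on `10.100.0.10:6000`, while `resolveDevTarget('user-1', 6071)` returns `null` because that container belongs to a different user.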

---

## WebSocket Critical Detail

The shared HTTP proxy instance (`ideTunnelProxy`) has `ws: false`. WebSocket
upgrades are handled exclusively in `attachDevProxyServer`, which builds a
new proxy instance per upgrade with the resolved container target hardcoded.

This was the fix for the `ECONNREFUSED 127.0.0.1:6000` bug. The shared
proxy was falling back to `DEFAULT_IDE_TARGET` (localhost) instead of the
actual container IP because per-request context was lost during the upgrade.
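
The difference between the two patterns can be sketched without any proxy library. Everything below is illustrative: `sharedTarget` mimics the failure mode described above, and this `buildWsProxy` takes a hypothetical `connect` callback in place of whatever proxy call the real implementation makes.

```javascript
const DEFAULT_IDE_TARGET = 'ws://127.0.0.1:6000';

// Failure mode: a shared resolver that silently falls back to localhost when
// the per-request context (req.ideTarget, a hypothetical field) is missing.
function sharedTarget(req) {
  return req.ideTarget || DEFAULT_IDE_TARGET;
}

// Fix: bind the target when the proxy is built, at upgrade time, and refuse
// to build one at all if no container IP was resolved.
function buildWsProxy(containerIp, connect) {
  if (!containerIp) {
    throw new Error('refusing to build WS proxy without a resolved container IP');
  }
  const target = `ws://${containerIp}:6000`;
  return {
    target,
    // The closure captures `target`, so nothing on `req` can redirect it later.
    forward: (req, socket, head) => connect(target, req, socket, head),
  };
}
```

With the first pattern, an upgrade request that lost its context resolves to `ws://127.0.0.1:6000` and fails with `ECONNREFUSED`. With the second, a missing IP fails loudly at build time and a resolved IP cannot be swapped out afterwards.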

---

## code-server Container Setup

- Install path: `/opt/zlh/services/code-server`
- Port: `6000` (bound to `0.0.0.0`)
- Launch flags: `--auth none --disable-telemetry`
- Auth: disabled at code-server level; the API handles all auth
- Workspace: `/home/dev/workspace`
- User: `dev`

Chrome blocks port 6000 as an unsafe port. Because the port is internal only
and never directly browser-accessible, this is not an issue.

---

## Known Pitfalls

| Issue | Cause | Fix |
|-------|-------|-----|
| `ERR_CONNECTION_CLOSED` in browser | Traefik/API not reachable or wrong code deployed | Check connectivity + confirm deployed code has `handleHostedProxy` |
| 404 from API | `handleHostedProxy` not in running process | Restart API after deploy |
| Redirect loops to raw IP | `passHostHeader: true` missing in Traefik | Add it to loadBalancer config |
| WS 1006 disconnect | WS proxy falling back to `127.0.0.1` | Ensure `ws: false` on HTTP proxy, WS handled only in `attachDevProxyServer` |
| Cookie not set | Token expired (300s TTL) | Generate a fresh token |
| Wildcard cert fails to issue | Stale `_acme-challenge` TXT records in Cloudflare | Delete stale records manually, reload Traefik |

---

## Future: SSH Access via CF Tunnel

Planned addition: same hostname, with the SSH protocol routed through Cloudflare Tunnel:

```
Developer laptop
   ↓ ssh dev-6070.zerolaghub.dev
Cloudflare edge (CF Tunnel)
   ↓
Bastion VM
   ↓ SSH proxy jump
Dev container
```

Developer one-time SSH config:

```
Host *.zerolaghub.dev
  ProxyCommand cloudflared access ssh --hostname %h
```

CF Tunnel runs persistently on the bastion VM. Free tier covers up to 50 users.
Browser IDE and SSH share the same hostname, with the two protocols routed separately.