Add dev IDE access architecture doc
parent 1b1b22e8f8
commit 2a6bfe77dc
226
architecture/dev-ide-access.md
Normal file
@ -0,0 +1,226 @@
# Dev IDE Access Architecture

## Overview

Dev containers expose a browser-accessible VS Code IDE (code-server) via a
host-based routing model. The API is the auth and proxy boundary. Traefik
handles TLS and routing. Containers are never directly exposed.

---

## Architecture

```
Browser
  ↓ https://dev-<vmid>.zerolaghub.dev
Traefik (zlh-zpack-proxy, 10.70.0.242)
  ↓ wildcard TLS + host routing → http://10.60.0.245:4000
API (zpack-api, 10.60.0.245:4000)
  ↓ token validation + cookie handoff
  ↓ HTTP + WebSocket proxy
Container code-server (:6000)
```

---

## Why API Proxy (Not Direct Traefik → Container)

Direct Traefik → container routing was tested and rejected for the following reasons:

1. **No auth boundary** — code-server runs with `--auth none`. Without the API
   in the path, any request reaching the container would have full IDE access.
2. **No ownership validation** — the API verifies the token against the
   database to confirm the requesting user owns the container. Traefik has no
   concept of this.
3. **Per-container routing complexity** — direct routing requires a Traefik
   dynamic config entry per container. The API proxy approach requires only
   one wildcard rule regardless of how many containers exist.

The API proxy adds one network hop but provides auth, ownership enforcement,
and operational simplicity.

---

## Token Flow

### Step 1 — Token generation

```
POST /api/dev/:id/ide-token
Authorization: Bearer <user-jwt>
```

Response:
```json
{
  "token": "<ide-proxy-token>",
  "url": "https://dev-6070.zerolaghub.dev/?token=...",
  "expiresIn": 300
}
```

The IDE proxy token is short-lived (300s TTL), signed with a separate secret,
and carries `{ sub, vmid, type: "dev-ide" }`. It is scoped to a specific
container — a token for vmid 6070 cannot access vmid 6071.
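
The scoping and expiry rules above can be sketched with Node's built-in `crypto` module. This is only an illustration: the real `signIdeProxyToken` / `verifyIdeProxyToken` are not shown in this document and may well use a JWT library. The function names, the `IDE_TOKEN_SECRET` value, and the `body.signature` wire format below are assumptions.

```javascript
const crypto = require('crypto');

// Assumption: a dedicated secret, distinct from the one used for user JWTs.
const IDE_TOKEN_SECRET = 'example-ide-secret';
const TTL_SECONDS = 300;

function hmac(data) {
  return crypto.createHmac('sha256', IDE_TOKEN_SECRET).update(data).digest('base64url');
}

// Hypothetical stand-in for signIdeProxyToken.
function signSketchToken(sub, vmid, nowSeconds = Math.floor(Date.now() / 1000)) {
  const payload = { sub, vmid, type: 'dev-ide', exp: nowSeconds + TTL_SECONDS };
  const body = Buffer.from(JSON.stringify(payload)).toString('base64url');
  return `${body}.${hmac(body)}`;
}

// Hypothetical stand-in for verifyIdeProxyToken. Returns the payload, or null
// when the signature, type, expiry, or vmid scope does not match.
function verifySketchToken(token, expectedVmid, nowSeconds = Math.floor(Date.now() / 1000)) {
  const [body, signature] = String(token).split('.');
  if (!body || !signature) return null;

  // Constant-time comparison of the expected and presented signatures.
  const expected = Buffer.from(hmac(body));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length || !crypto.timingSafeEqual(expected, actual)) return null;

  let payload;
  try {
    payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
  if (payload.type !== 'dev-ide') return null;
  if (payload.exp <= nowSeconds) return null; // expired after the 300s TTL
  if (String(payload.vmid) !== String(expectedVmid)) return null; // wrong container
  return payload;
}
```

With these helpers, a token minted for vmid 6070 verifies when presented for 6070 but is rejected for 6071, and is rejected for any container once 300 seconds have passed.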

### Step 2 — Bootstrap

The browser navigates to `https://dev-6070.zerolaghub.dev/?token=...`.

Traefik forwards the request to the API. `handleHostedProxy` extracts the vmid from the
`Host` header (`dev-6070` → `6070`), validates the token, confirms ownership
against the database, then:

1. Sets the `zlh_dev_ide_token` HTTP-only cookie (path: `/`, scoped to the hosted domain)
2. Redirects to the clean URL (token stripped from the query string)
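
The bootstrap steps above can be sketched as two small pure functions. This is a hedged illustration, not the code in `devProxy.js`: `parseVmidFromHost` and `buildBootstrapResponse` are hypothetical names, the regular expression is an assumption about the hostname format, and token validation and the ownership check are deliberately left out.

```javascript
// Hypothetical helper mirroring what parseHostedVmid is described as doing:
// "dev-6070.zerolaghub.dev" -> "6070", anything else -> null.
function parseVmidFromHost(hostHeader) {
  const host = String(hostHeader || '').trim();
  const match = /^dev-(\d+)\.zerolaghub\.dev(?::\d+)?$/i.exec(host);
  return match ? match[1] : null;
}

// Builds the bootstrap response: the cookie handoff plus a redirect to the
// same URL with the token removed. Returns null when no token is present.
function buildBootstrapResponse(hostHeader, requestUrl) {
  const url = new URL(requestUrl, `https://${hostHeader}`);
  const token = url.searchParams.get('token');
  if (!token) return null;

  url.searchParams.delete('token');
  return {
    status: 302,
    headers: {
      // Only the attributes the document names. Secure, SameSite, or Max-Age
      // may also be set in practice, but that is not documented here.
      'Set-Cookie': `zlh_dev_ide_token=${encodeURIComponent(token)}; HttpOnly; Path=/`,
      Location: url.toString(),
    },
  };
}
```

For example, `buildBootstrapResponse('dev-6070.zerolaghub.dev', '/?token=abc')` yields a 302 whose `Location` is `https://dev-6070.zerolaghub.dev/`, so the token never stays in the address bar or browser history after the first hop.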

### Step 3 — Live traffic

All subsequent requests carry the cookie. The API validates it on every
request and proxies to `http://<container-ip>:6000`.

WebSocket upgrades are handled in `attachDevProxyServer` on the raw Node.js
HTTP server — the upgrade event is intercepted before Express routing, the
cookie/token is validated, and a target-bound WS proxy is built at upgrade
time using the resolved container IP.

---

## Request Flow Detail

```
1. GET https://dev-6070.zerolaghub.dev/?token=<ide-token>
   → Traefik → API handleHostedProxy
   → validate token (vmid match + ownership)
   → Set-Cookie: zlh_dev_ide_token=<token>; HttpOnly; Path=/
   → 302 → https://dev-6070.zerolaghub.dev/

2. GET https://dev-6070.zerolaghub.dev/
   → Traefik → API handleHostedProxy
   → validate cookie
   → proxy to http://10.100.x.x:6000/
   → code-server 302 → /?folder=/home/dev/workspace

3. GET https://dev-6070.zerolaghub.dev/?folder=/home/dev/workspace
   → Traefik → API handleHostedProxy
   → validate cookie
   → proxy to http://10.100.x.x:6000/?folder=/home/dev/workspace
   → 200 code-server HTML

4. WS wss://dev-6070.zerolaghub.dev/stable-<hash>/...
   → Traefik → API server upgrade handler (attachDevProxyServer)
   → validate cookie from WS request headers
   → build target-bound WS proxy → ws://10.100.x.x:6000/stable-<hash>/...
```

---

## Traefik Configuration

```yaml
http:
  routers:
    dev-ide:
      rule: "HostRegexp(`dev-{vmid:[0-9]+}.zerolaghub.dev`)"
      entryPoints:
        - websecure
      service: dev-ide-api
      tls:
        certResolver: zpackv2
        domains:
          - main: "zerolaghub.dev"
            sans:
              - "*.zerolaghub.dev"
  services:
    dev-ide-api:
      loadBalancer:
        passHostHeader: true
        servers:
          - url: "http://10.60.0.245:4000"
```

`passHostHeader: true` is critical. It preserves the public `Host` header that
`handleHostedProxy` parses for the vmid; without it, Express resolves relative
redirects against the internal API IP, leaking it to the browser.

TLS: the wildcard cert `*.zerolaghub.dev` is issued via a Let's Encrypt DNS-01 challenge
through Cloudflare. The `zpackv2` certResolver handles renewal automatically.

---

## API Key Files

- `src/routes/devProxy.js` — all IDE proxy logic
- `src/app.js` — mounts devProxy router + calls `attachDevProxyServer`
- `src/auth/tokens.js` — `signIdeProxyToken` / `verifyIdeProxyToken`

Key functions in `devProxy.js`:

- `handleHostedProxy` — entry point for all host-based requests
- `parseHostedVmid` — extracts vmid from `Host: dev-6070.zerolaghub.dev`
- `resolveDevTarget` — DB lookup confirming ownership + returning container IP
- `attachDevProxyServer` — raw HTTP upgrade handler for WebSockets
- `buildWsProxy` — builds a target-bound WS proxy instance at upgrade time
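
As a rough illustration of the ownership check that `resolveDevTarget` is described as performing, the sketch below swaps the database for an in-memory `Map`. The function name `resolveTargetSketch`, the record shape, the user IDs, and the example container IPs are all assumptions; the real lookup is a DB query whose schema is not shown here.

```javascript
// Assumption: an in-memory stand-in for the containers table.
const containers = new Map([
  ['6070', { ownerId: 'user-1', ip: '10.100.0.70' }],
  ['6071', { ownerId: 'user-2', ip: '10.100.0.71' }],
]);

const IDE_PORT = 6000; // code-server port inside the container

// Hypothetical stand-in for resolveDevTarget: returns the proxy target only
// when the container exists and belongs to the user named in the token.
function resolveTargetSketch(userId, vmid) {
  const record = containers.get(String(vmid));
  if (!record || record.ownerId !== userId) return null;
  return `http://${record.ip}:${IDE_PORT}`;
}
```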

---

## WebSocket Critical Detail

The shared HTTP proxy instance (`ideTunnelProxy`) has `ws: false`. WebSocket
upgrades are handled exclusively in `attachDevProxyServer` which builds a
new proxy instance per upgrade with the resolved container target hardcoded.

This was the fix for the `ECONNREFUSED 127.0.0.1:6000` bug — the shared
proxy was falling back to `DEFAULT_IDE_TARGET` (localhost) instead of the
actual container IP because per-request context was lost during the upgrade.
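
The idea of binding the target at upgrade time can be sketched with Node's built-in `net` module. This is not the code in `devProxy.js`, which builds a proxy instance via `buildWsProxy`; here raw socket piping is swapped in to keep the example dependency-free. `attachUpgradeSketch`, `readCookie`, `buildUpgradeHead`, and the `resolveTarget(req, token)` callback are hypothetical names. The point it illustrates is that the upstream address is resolved inside the `upgrade` handler, so there is no shared default target to fall back to.

```javascript
const net = require('net');

// Pulls one cookie value out of a Cookie header; returns null if absent.
function readCookie(cookieHeader, name) {
  for (const part of String(cookieHeader || '').split(';')) {
    const idx = part.indexOf('=');
    if (idx === -1) continue;
    if (part.slice(0, idx).trim() !== name) continue;
    const raw = part.slice(idx + 1).trim();
    try { return decodeURIComponent(raw); } catch { return raw; }
  }
  return null;
}

// Re-serializes the client's upgrade request so it can be replayed upstream.
function buildUpgradeHead(req) {
  const lines = [`${req.method} ${req.url} HTTP/1.1`];
  for (let i = 0; i < req.rawHeaders.length; i += 2) {
    lines.push(`${req.rawHeaders[i]}: ${req.rawHeaders[i + 1]}`);
  }
  return lines.join('\r\n') + '\r\n\r\n';
}

// Assumption: resolveTarget validates the cookie token and returns
// { host, port } for the owning container, or null to reject.
function attachUpgradeSketch(server, resolveTarget) {
  server.on('upgrade', (req, clientSocket, head) => {
    const token = readCookie(req.headers.cookie, 'zlh_dev_ide_token');
    const target = token ? resolveTarget(req, token) : null;
    if (!target) {
      clientSocket.end('HTTP/1.1 401 Unauthorized\r\n\r\n');
      return;
    }
    // Target is fixed here, per upgrade; nothing is read from shared state.
    const upstream = net.connect(target.port, target.host, () => {
      upstream.write(buildUpgradeHead(req));
      if (head && head.length) upstream.write(head);
      upstream.pipe(clientSocket);
      clientSocket.pipe(upstream);
    });
    upstream.on('error', () => clientSocket.destroy());
    clientSocket.on('error', () => upstream.destroy());
  });
}
```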

---

## code-server Container Setup

- Install path: `/opt/zlh/services/code-server`
- Port: `6000` (bound to `0.0.0.0`)
- Launch flags: `--auth none --disable-telemetry`
- Auth: disabled at code-server level — API handles all auth
- Workspace: `/home/dev/workspace`
- User: `dev`

Chrome blocks port 6000 as an unsafe port. The port is internal only and never
directly browser-accessible, so the block does not affect this setup.

---

## Known Pitfalls

| Issue | Cause | Fix |
|-------|-------|-----|
| `ERR_CONNECTION_CLOSED` in browser | Traefik/API not reachable or wrong code deployed | Check connectivity + confirm deployed code has `handleHostedProxy` |
| 404 from API | `handleHostedProxy` not in running process | Restart API after deploy |
| Redirect loops to raw IP | `passHostHeader: true` missing in Traefik | Add it to loadBalancer config |
| WS 1006 disconnect | WS proxy falling back to `127.0.0.1` | Ensure `ws: false` on HTTP proxy, WS handled only in `attachDevProxyServer` |
| Cookie not set | Token expired (300s TTL) | Generate a fresh token |
| Wildcard cert fails to issue | Stale `_acme-challenge` TXT records in Cloudflare | Delete stale records manually, reload Traefik |

---

## Future: SSH Access via CF Tunnel

Planned addition — same hostname, SSH protocol routed through Cloudflare Tunnel:

```
Developer laptop
  ↓ ssh dev-6070.zerolaghub.dev
Cloudflare edge (CF Tunnel)
  ↓
Bastion VM
  ↓ SSH proxy jump
Dev container
```

Developer one-time SSH config:
```
Host *.zerolaghub.dev
  ProxyCommand cloudflared access ssh --hostname %h
```

CF Tunnel runs persistently on the bastion VM. Free tier covers up to 50 users.
Browser IDE and SSH share the same hostname — different protocols routed separately.