Dev IDE Access Architecture

Overview

Dev containers expose a browser-accessible VS Code IDE (code-server) via a host-based routing model. The API is the auth and proxy boundary. Traefik handles TLS and routing. Containers are never directly exposed.


Architecture

Browser
  ↓ https://dev-<vmid>.zerolaghub.dev
Traefik (zlh-zpack-proxy, 10.70.0.242)
  ↓ wildcard TLS + host routing → http://10.60.0.245:4000
API (zpack-api, 10.60.0.245:4000)
  ↓ token validation + cookie handoff
  ↓ HTTP + WebSocket proxy
Container code-server (:6000)

Why API Proxy (Not Direct Traefik → Container)

Direct Traefik → container routing was tested and rejected for the following reasons:

  1. No auth boundary — code-server runs with --auth none. Without the API in the path, any request reaching the container would have full IDE access.
  2. No ownership validation — the API verifies the token against the database to confirm the requesting user owns the container. Traefik has no concept of this.
  3. Per-container routing complexity — direct routing requires a Traefik dynamic config entry per container. The API proxy approach requires only one wildcard rule regardless of how many containers exist.

The API proxy adds one network hop but provides auth, ownership enforcement, and operational simplicity.


Token Flow

Step 1 — Token generation

POST /api/dev/:id/ide-token
Authorization: Bearer <user-jwt>

Response:

{
  "token": "<ide-proxy-token>",
  "url": "https://dev-6070.zerolaghub.dev/?token=...",
  "expiresIn": 300
}

The IDE proxy token is short-lived (300s TTL), signed with a separate secret, and carries { sub, vmid, type: "dev-ide" }. It is scoped to a specific container — a token for vmid 6070 cannot access vmid 6071.
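
A minimal sketch of how such a scoped, short-lived token could be signed and checked, using only Node's built-in crypto module. The function names, the IDE_TOKEN_SECRET environment variable, and the HMAC-over-JSON token format are assumptions for illustration; the real signIdeProxyToken / verifyIdeProxyToken in src/auth/tokens.js may use a JWT library or a different layout.

```javascript
const crypto = require("node:crypto");

// Hypothetical secret name; the source only states that a separate secret is used.
const IDE_TOKEN_SECRET = process.env.IDE_TOKEN_SECRET || "dev-only-secret";
const IDE_TOKEN_TTL_SECONDS = 300; // documented TTL

function hmac(data) {
  return crypto.createHmac("sha256", IDE_TOKEN_SECRET).update(data).digest("base64url");
}

// Sign { sub, vmid, type: "dev-ide" } plus an expiry timestamp (seconds).
function signIdeTokenSketch(userId, vmid, nowSeconds = Math.floor(Date.now() / 1000)) {
  const payload = { sub: userId, vmid, type: "dev-ide", exp: nowSeconds + IDE_TOKEN_TTL_SECONDS };
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  return `${body}.${hmac(body)}`;
}

// Return the payload when the token is valid for expectedVmid, otherwise null.
function verifyIdeTokenSketch(token, expectedVmid, nowSeconds = Math.floor(Date.now() / 1000)) {
  const [body, sig] = String(token).split(".");
  if (!body || !sig) return null;
  const expected = Buffer.from(hmac(body));
  const given = Buffer.from(sig);
  // Constant-time comparison; lengths must match before timingSafeEqual is called.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  let payload;
  try {
    payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  } catch {
    return null;
  }
  if (payload.type !== "dev-ide") return null;  // wrong token type
  if (payload.vmid !== expectedVmid) return null; // scoped to one container
  if (payload.exp <= nowSeconds) return null;     // expired
  return payload;
}
```

In this sketch the vmid check is what keeps a token issued for one container from being replayed against another hostname.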

Step 2 — Bootstrap

Browser navigates to https://dev-6070.zerolaghub.dev/?token=...

Traefik forwards to the API. handleHostedProxy extracts the vmid from the Host header (dev-6070 → 6070), validates the token, confirms ownership against the database, then:

  1. Sets zlh_dev_ide_token HTTP-only cookie (path: /, scoped to the hosted domain)
  2. Redirects to the clean URL (token stripped from query string)
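
The bootstrap step can be sketched roughly as follows. This is an illustration, not the code in devProxy.js: the helper names are hypothetical, the hostname pattern is inferred from the example URLs, and the Secure and Max-Age cookie attributes are assumptions (the source only specifies HttpOnly and Path=/).

```javascript
// Assumed hostname pattern, taken from the example URLs in this document.
const HOSTED_RE = /^dev-(\d+)\.zerolaghub\.dev$/i;

// Extract the numeric vmid from a Host header; returns null when it does not match.
function parseVmidFromHost(hostHeader) {
  if (typeof hostHeader !== "string") return null;
  const hostname = hostHeader.trim().replace(/:\d+$/, ""); // drop an optional port
  const match = HOSTED_RE.exec(hostname);
  return match ? Number(match[1]) : null;
}

// Build the 302 response headers: set the cookie, redirect to the token-free URL.
function buildBootstrapHeaders(hostHeader, requestUrl, token, maxAgeSeconds = 300) {
  const url = new URL(requestUrl, `https://${hostHeader}`);
  url.searchParams.delete("token"); // strip the token from the query string
  const cookie = [
    `zlh_dev_ide_token=${encodeURIComponent(token)}`,
    "HttpOnly",
    "Secure",                    // assumption: not stated in the source
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,  // assumption: mirrors the 300s token TTL
  ].join("; ");
  return { status: 302, headers: { "Set-Cookie": cookie, Location: url.toString() } };
}
```

Because no Domain attribute is set, the cookie is host-only, which matches the "scoped to the hosted domain" behaviour described above.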

Step 3 — Live traffic

All subsequent requests carry the cookie. The API validates it on every request and proxies to http://<container-ip>:6000.

WebSocket upgrades are handled in attachDevProxyServer on the raw Node.js HTTP server — the upgrade event is intercepted before Express routing, the cookie/token is validated, and a target-bound WS proxy is built at upgrade time using the resolved container IP.
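
A rough sketch of the upgrade path, using only Node's built-in net module in place of whatever proxy library the API actually uses. The names attachUpgradeSketch and resolveTarget are hypothetical; resolveTarget stands in for token validation plus the ownership lookup and is assumed to return { host, port } or null.

```javascript
const net = require("node:net");

// Read one cookie value from a raw Cookie header; returns null when absent or malformed.
function readCookie(cookieHeader, name) {
  if (!cookieHeader) return null;
  for (const part of cookieHeader.split(";")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    if (part.slice(0, idx).trim() === name) {
      try {
        return decodeURIComponent(part.slice(idx + 1).trim());
      } catch {
        return null;
      }
    }
  }
  return null;
}

// Listen on the raw HTTP server so upgrades are handled before Express routing.
function attachUpgradeSketch(server, resolveTarget) {
  server.on("upgrade", async (req, socket, head) => {
    const token = readCookie(req.headers.cookie, "zlh_dev_ide_token");
    let target = null;
    try {
      if (token) target = await resolveTarget(req, token);
    } catch {
      target = null;
    }
    if (!target) {
      socket.end("HTTP/1.1 401 Unauthorized\r\nConnection: close\r\n\r\n");
      return;
    }
    // Open a TCP connection bound to this container and replay the upgrade request.
    const upstream = net.connect(target.port, target.host, () => {
      const lines = [];
      for (let i = 0; i < req.rawHeaders.length; i += 2) {
        lines.push(`${req.rawHeaders[i]}: ${req.rawHeaders[i + 1]}`);
      }
      upstream.write(`${req.method} ${req.url} HTTP/1.1\r\n${lines.join("\r\n")}\r\n\r\n`);
      if (head && head.length) upstream.write(head);
      socket.pipe(upstream).pipe(socket); // relay frames in both directions
    });
    const close = () => {
      socket.destroy();
      upstream.destroy();
    };
    upstream.on("error", close);
    socket.on("error", close);
  });
}
```

Usage would look like attachUpgradeSketch(httpServer, myResolver), called once after the server is created. The target is resolved per upgrade, so there is no shared default to fall back to.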


Request Flow Detail

1. GET https://dev-6070.zerolaghub.dev/?token=<ide-token>
   → Traefik → API handleHostedProxy
   → validate token (vmid match + ownership)
   → Set-Cookie: zlh_dev_ide_token=<token>; HttpOnly; Path=/
   → 302 → https://dev-6070.zerolaghub.dev/

2. GET https://dev-6070.zerolaghub.dev/
   → Traefik → API handleHostedProxy
   → validate cookie
   → proxy to http://10.100.x.x:6000/
   → code-server 302 → /?folder=/home/dev/workspace

3. GET https://dev-6070.zerolaghub.dev/?folder=/home/dev/workspace
   → Traefik → API handleHostedProxy
   → validate cookie
   → proxy to http://10.100.x.x:6000/?folder=/home/dev/workspace
   → 200 code-server HTML

4. WS wss://dev-6070.zerolaghub.dev/stable-<hash>/...
   → Traefik → API server upgrade handler (attachDevProxyServer)
   → validate cookie from WS request headers
   → build target-bound WS proxy → ws://10.100.x.x:6000/stable-<hash>/...

Traefik Configuration

http:
  routers:
    dev-ide:
      rule: "HostRegexp(`dev-{vmid:[0-9]+}.zerolaghub.dev`)"
      entryPoints:
        - websecure
      service: dev-ide-api
      tls:
        certResolver: zpackv2
        domains:
          - main: "zerolaghub.dev"
            sans:
              - "*.zerolaghub.dev"
  services:
    dev-ide-api:
      loadBalancer:
        passHostHeader: true
        servers:
          - url: "http://10.60.0.245:4000"

passHostHeader: true is critical — without it Express resolves relative redirects against the internal API IP, leaking it to the browser.

TLS: wildcard cert *.zerolaghub.dev issued via Let's Encrypt DNS-01 challenge through Cloudflare. certResolver zpackv2 handles renewal automatically.


API Key Files

  • src/routes/devProxy.js — all IDE proxy logic
  • src/app.js — mounts devProxy router + calls attachDevProxyServer
  • src/auth/tokens.js — signIdeProxyToken / verifyIdeProxyToken

Key functions in devProxy.js:

  • handleHostedProxy — entry point for all host-based requests
  • parseHostedVmid — extracts vmid from Host: dev-6070.zerolaghub.dev
  • resolveDevTarget — DB lookup confirming ownership + returning container IP
  • attachDevProxyServer — raw HTTP upgrade handler for WebSockets
  • buildWsProxy — builds a target-bound WS proxy instance at upgrade time
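
To illustrate the ownership check that resolveDevTarget performs, here is a small sketch with an in-memory map standing in for the database. The map contents, the IP addresses, the field names, and the function name are all hypothetical; a real lookup would be an awaited database query.

```javascript
// Hypothetical stand-in for the dev container table (vmid -> owner and container IP).
const containersSketch = new Map([
  [6070, { ownerId: "user-1", ip: "10.100.0.10" }],
  [6071, { ownerId: "user-2", ip: "10.100.0.11" }],
]);

const IDE_PORT = 6000; // code-server port from this document

// Return the proxy target only when the user owns the container; otherwise null.
function resolveTargetSketch(userId, vmid, store = containersSketch) {
  const row = store.get(vmid);
  if (!row || row.ownerId !== userId) return null;
  return { host: row.ip, port: IDE_PORT, url: `http://${row.ip}:${IDE_PORT}` };
}
```

Returning null for both "no such container" and "not the owner" avoids revealing which vmids exist to a caller who does not own them.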

WebSocket Critical Detail

The shared HTTP proxy instance (ideTunnelProxy) has ws: false. WebSocket upgrades are handled exclusively in attachDevProxyServer, which builds a new proxy instance per upgrade with the resolved container target hardcoded.

This was the fix for the ECONNREFUSED 127.0.0.1:6000 bug — the shared proxy was falling back to DEFAULT_IDE_TARGET (localhost) instead of the actual container IP because per-request context was lost during the upgrade.
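
The difference between the two styles can be shown with a simplified sketch. The req.ideTarget property and both function names are hypothetical; the exact way the real shared proxy lost its per-request context is not described here, so this only models the fallback behaviour.

```javascript
const DEFAULT_IDE_TARGET = "http://127.0.0.1:6000"; // fallback named in the bug description

// Shared style: the target is read from per-request context at proxy time.
// If that context is missing (as during an upgrade), it silently falls back to localhost.
function sharedProxyTarget(req) {
  return (req && req.ideTarget) || DEFAULT_IDE_TARGET;
}

// Target-bound style: the target is fixed when the proxy is built, so no fallback path exists.
function buildBoundProxySketch(target) {
  if (!target) throw new Error("refusing to build a WS proxy without a resolved target");
  const wsTarget = target.replace(/^http/, "ws");
  return {
    target: wsTarget,
    // Full upstream URL for a given upgrade request.
    upstreamUrl: (req) => `${wsTarget}${req.url}`,
  };
}
```

With the bound style, a missing target fails loudly at build time instead of surfacing later as ECONNREFUSED against 127.0.0.1.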


code-server Container Setup

  • Install path: /opt/zlh/services/code-server
  • Port: 6000 (bound to 0.0.0.0)
  • Launch flags: --auth none --disable-telemetry
  • Auth: disabled at code-server level — API handles all auth
  • Workspace: /home/dev/workspace
  • User: dev

Chrome blocks port 6000 as an unsafe port. Because the port is internal only and browsers never connect to it directly, this does not affect users.


Known Pitfalls

| Issue | Cause | Fix |
| --- | --- | --- |
| ERR_CONNECTION_CLOSED in browser | Traefik/API not reachable or wrong code deployed | Check connectivity + confirm deployed code has handleHostedProxy |
| 404 from API | handleHostedProxy not in running process | Restart API after deploy |
| Redirect loops to raw IP | passHostHeader: true missing in Traefik | Add it to loadBalancer config |
| WS 1006 disconnect | WS proxy falling back to 127.0.0.1 | Ensure ws: false on HTTP proxy, WS handled only in attachDevProxyServer |
| Cookie not set | Token expired (300s TTL) | Generate a fresh token |
| Wildcard cert fails to issue | Stale _acme-challenge TXT records in Cloudflare | Delete stale records manually, reload Traefik |

Future: SSH Access via CF Tunnel

Planned addition — same hostname, SSH protocol routed through Cloudflare Tunnel:

Developer laptop
  ↓ ssh dev-6070.zerolaghub.dev
Cloudflare edge (CF Tunnel)
  ↓
Bastion VM
  ↓ SSH proxy jump
Dev container

Developer one-time SSH config:

Host *.zerolaghub.dev
    ProxyCommand cloudflared access ssh --hostname %h

CF Tunnel runs persistently on the bastion VM. Free tier covers up to 50 users. Browser IDE and SSH share the same hostname — different protocols routed separately.