# Open Threads – zlh-grind

This file tracks active but unfinished work. Keep it short.

---

## Agent (zlh-agent)

### Dev Runtime System

Completed:

- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented
- all installs now fetch from artifact server (no local artifact assumption)

Outstanding:

- runtime install verification improvements
- catalog hash validation
- runtime removal / upgrade handling

---

### Dev Environment

Completed:

- dev user creation
- workspace root `/home/dev/workspace`
- console runs as dev user
- `HOME`, `USER`, `LOGNAME`, `TERM` env vars set correctly

Outstanding:

- PATH normalization
- shell profile consistency
- runtime PATH injection

---

### Code Server Addon

Status: ✅ Installed, running, browser-verified end-to-end

Confirmed:

- pulled from artifact server (tar.gz)
- installed to `/opt/zlh/services/code-server`
- binds to `0.0.0.0:6000`
- lifecycle endpoints: `POST /dev/codeserver/start|stop|restart`
- detection via `/proc/*/cmdline` scan
- full browser IDE loading confirmed at `dev-6070.zerolaghub.dev`

---

### Game Server Supervision

Completed:

- crash recovery with backoff: 30s → 60s → 120s
- backoff resets if uptime ≥ 30s
- transitions to `error` state after repeated failures
- crash observability: time, exit code, signal, uptime, log tail, classification
- classifications: `oom`, `mod_or_plugin_error`, `missing_dependency`, `nonzero_exit`, `unexpected_exit`

---

### Agent Future Work (priority order)

1. Structured logging (slog) for Loki
2. Dev container `provisioningComplete` state in `/status`
3. Graceful shutdown verification (SIGTERM + wait for Minecraft)
4.
   Process reattachment on agent restart

---

## Dev IDE Access

### Browser IDE ✅ Fully Working (browser-verified)

```
Browser → dev-.zerolaghub.dev → Traefik → API → container:6000
```

Browser-verified: VS Code loads in browser at `dev-6070.zerolaghub.dev/?folder=/home/dev/workspace` with workspace mounted, extensions panel visible, AI chat panel active.

Verified flow:

1. frontend calls `POST /api/dev/:id/ide-token`
2. API returns `https://dev-.zerolaghub.dev/?token=...`
3. browser opens hosted URL
4. Traefik wildcard router forwards to API at `http://10.60.0.245:4000`
5. API validates token, sets `zlh_dev_ide_token`, redirects to clean host URL
6. subsequent cookie-backed request redirects to `/?folder=/home/dev/workspace`
7. IDE loads fully in browser

### Remaining Work

- confirm "Open IDE" button in portal uses hosted URL in production path
- reduce legacy `/__ide/:id` compatibility paths once portal button confirmed
- simplify and harden `devProxy`: remove stale path-based assumptions

### Wildcard Edge (Traefik)

- Traefik on `zlh-zpack-proxy` (10.70.0.242) handles wildcard TLS via DNS challenge
- wildcard cert `*.zerolaghub.dev` issued via Let's Encrypt + Cloudflare DNS-01
- Traefik routes `dev-*.zerolaghub.dev` → API at `http://10.60.0.245:4000`
- `passHostHeader: true` preserves original hostname through to API
- no Caddy, no `:8081`, no per-container DNS/Traefik side effects from API

### Local Dev Access: SSH via CF Tunnel (Next Step)

Decision: Cloudflare Tunnel on bastion VM for SSH access. Free tier covers up to 50 users.

Planned architecture:

```
Developer laptop
  ↓ ssh dev-6070.zerolaghub.dev
Cloudflare edge
  ↓ CF Tunnel (persistent, runs on bastion)
Bastion VM (internal)
  ↓ SSH proxy jump
Dev container (10.100.x.x)
```

Same hostname as the browser IDE, different protocol. Cloudflare routes HTTPS to Traefik and SSH to CF Tunnel separately.
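
Because the browser IDE and the planned SSH path share one hostname pattern, anything that routes on it needs to recover the container's vmid from the hostname. The sketch below illustrates that step in Python. It is not the API's actual `handleHostedProxy` code: `vmid_from_host` and `HOST_SUFFIX` are hypothetical names, and the `dev-<digits>` label shape is assumed from the `dev-6070.zerolaghub.dev` example.

```python
import re
from typing import Optional

# Assumption: hosted dev hostnames look like dev-6070.zerolaghub.dev.
HOST_SUFFIX = ".zerolaghub.dev"
_DEV_LABEL = re.compile(r"dev-([0-9]+)")


def vmid_from_host(host_header: str) -> Optional[int]:
    """Return the vmid encoded in a Host header, or None if it does not match."""
    host = host_header.strip().lower()
    # Drop an optional :port suffix (IPv6 literals are out of scope here).
    if ":" in host:
        host = host.rsplit(":", 1)[0]
    if not host.endswith(HOST_SUFFIX):
        return None
    label = host[: -len(HOST_SUFFIX)]
    # fullmatch rejects extra labels such as evil.dev-6070.zerolaghub.dev.
    match = _DEV_LABEL.fullmatch(label)
    return int(match.group(1)) if match else None
```

Rejecting anything that is not exactly one `dev-<digits>` label under the expected suffix keeps a spoofed or nested hostname from being mapped to a container.
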

Developer one-time SSH config:

```
Host *.zerolaghub.dev
  ProxyCommand cloudflared access ssh --hostname %h
```

After that `ssh dev-6070.zerolaghub.dev` just works. Portal can surface this config snippet as a copyable block.

Outstanding:

- Install `cloudflared` on bastion VM
- Create CF Tunnel pointed at bastion SSH port
- Map `*.zerolaghub.dev` SSH through tunnel
- Portal SSH config snippet UI
- Agent: surface SSH hostname in `/status` or via API

---

## API (zpack-api)

Completed:

- dev provisioning payload
- runtime/version fields
- enable_code_server flag
- `GET /api/servers/:id/status`: server status endpoint
- `POST /api/dev/:id/ide-token`: IDE token generation + hosted URL
- `GET /api/dev/:id/ide`: bootstrap route (validates token, sets cookie, redirects)
- `/__ide/:id/*`: live tunnel proxy (HTTP + WS, target-bound)
- dev routing experiment removed (`devRouting.js`, `devDePublisher.js` deleted)
- host-based URL generation (`DEV_IDE_HOST_SUFFIX`, `DEV_IDE_RETURN_HOSTED_URL`)
- `handleHostedProxy`: host-based routing via `Host` header vmid extraction
- token bootstrap → cookie handoff working under hosted flow
- hosted flow browser-verified end-to-end

Outstanding:

- simplify and harden host-native `devProxy`: remove stale path-based assumptions
- dev runtime catalog endpoint for portal
- Headscale auth key generation

---

## Portal (zpack-portal)

Completed:

- dev runtime dropdown
- dotnet runtime support
- enable code-server checkbox
- dev file browser support

Outstanding:

- confirm "Open IDE" button fully uses hosted URL flow
- SSH config snippet for local VS Code / terminal access
- Headscale setup instructions

---

## Pre-Launch Checklist

Outstanding before launch:

- **Upload testing**: test file upload flow end-to-end in dev containers
- **Portal copy/wording**: site needs rewriting for public audience
- **Dedicated host migration**: evaluate GTHost upgrade (Gold 6152, Detroit)
  - Trial period approach: $5/day up to 10 days
  - PBS restore for safe
    migration validation
  - Two-host split (core vs game/dev) is longer term option

---

## Platform

Future work:

- CF Tunnel SSH access (see Local Dev Access above)
- Tailscale dev access (alternative/complement to CF Tunnel)
- artifact version promotion
- runtime rollback support
- Cloudflare R2 for large artifact/mod file delivery at scale

---

## Closed Threads

- ✅ PTY console (dev + game)
- ✅ Mod lifecycle
- ✅ Upload pipeline
- ✅ Runtime artifact installs
- ✅ Dev container filesystem model
- ✅ Code-server artifact fix
- ✅ API status endpoint for frontend agent-state consumption
- ✅ Game server crash recovery with backoff
- ✅ Crash observability (classification, log tail, exit metadata)
- ✅ Code-server lifecycle endpoints (start/stop/restart)
- ✅ Code-server process detection via /proc scan
- ✅ Dev IDE proxy: path-based browser IDE working end-to-end
- ✅ Hosted wildcard Traefik → API → container dev IDE flow, browser-verified
- ✅ Per-container dev IDE edge publish/unpublish removed from API
- ✅ Wildcard TLS cert `*.zerolaghub.dev` via Let's Encrypt + Cloudflare DNS-01
- ✅ Browser IDE fully loading at dev-.zerolaghub.dev
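
For reference, the crash-recovery backoff closed out in this list (30s → 60s → 120s, reset when uptime ≥ 30s, `error` after repeated failures) can be sketched as below. This is an illustrative Python model, not the agent's implementation: `CrashBackoff` and its fields are hypothetical names, and treating "repeated failures" as "the delay schedule is exhausted" is an assumption, since the source does not state the threshold.

```python
from dataclasses import dataclass
from typing import Optional

BACKOFF_SECONDS = (30, 60, 120)   # restart delays from the notes above
STABLE_UPTIME_SECONDS = 30        # uptime at or above this resets the backoff


@dataclass
class CrashBackoff:
    consecutive_crashes: int = 0

    def on_crash(self, uptime_seconds: float) -> Optional[int]:
        """Return the restart delay in seconds, or None to enter the error state."""
        # A run that stayed up long enough counts as healthy, so start over.
        if uptime_seconds >= STABLE_UPTIME_SECONDS:
            self.consecutive_crashes = 0
        # Assumption: give up once every delay in the schedule has been used.
        if self.consecutive_crashes >= len(BACKOFF_SECONDS):
            return None
        delay = BACKOFF_SECONDS[self.consecutive_crashes]
        self.consecutive_crashes += 1
        return delay
```

A supervisor loop would call `on_crash` with the uptime of the process that just exited, sleep for the returned delay before restarting, and mark the server as `error` when it gets `None`.
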