From d128d92d15cb70c66f1e0836aea297dfecf3cc7d Mon Sep 17 00:00:00 2001
From: jester
Date: Sun, 15 Mar 2026 22:58:55 +0000
Subject: [PATCH] =?UTF-8?q?Pivot=20dev=20access=20=E2=80=94=20abandon=20Tr?=
 =?UTF-8?q?aefik/DNS,=20adopt=20API=20proxy=20+=20Headscale?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 OPEN_THREADS.md | 173 +++++++++++++++++++++++++++---------------------
 1 file changed, 97 insertions(+), 76 deletions(-)

diff --git a/OPEN_THREADS.md b/OPEN_THREADS.md
index db7a3f0..f3f0fe7 100644
--- a/OPEN_THREADS.md
+++ b/OPEN_THREADS.md
@@ -40,9 +40,9 @@ Outstanding:
 
 ---
 
-### Code Server Addon
+## Code Server Addon
 
-Status: ✅ Install + launch operational inside dev containers
+Status: ✅ Installed and running inside dev containers
 
 Confirmed:
 
@@ -51,62 +51,114 @@ Confirmed:
 - process confirmed running inside container
 - binds to `0.0.0.0:6000`
 - launched from `/opt/zlh/services/code-server`
-- API now writes dev Traefik dynamic config during provisioning
-- API now uses proxy SSH service account (`zlh`) instead of personal user
 
 Port: `6000`
 
-Routing model:
+---
 
-- DNS: Cloudflare + Technitium
-- Proxy: Traefik dynamic file written by API during dev provisioning
-- Host format currently in use: `dev-.zerolaghub.dev`
+### Access Model (Updated)
 
-Outstanding:
+The previous approach using:
 
-- finalize external browser reachability for code-server through Cloudflare → Traefik → container
-- remove manual proxy-file edits from debugging path and ensure generated config is the sole source
-- standardize hostname format everywhere (`dev-` only)
-- add code-server launch link in portal
-- remove dynamic Traefik file on dev container deletion
+- Cloudflare DNS
+- Technitium DNS
+- Traefik dynamic config per container
+
+has been **abandoned**.
+
+Reason:
+
+- too many moving pieces
+- TLS and proxy complexity
+- per-container DNS automation
+- unnecessary exposure of internal dev services
 
 ---
 
-### Agent Future Work (priority order)
+### New Access Strategy
 
-1. Unified structured logging (slog) — Promtail/Loki needs structured fields
-2. Dev container /status — provisioningComplete + provisioningError fields
-3. Crash recovery with backoff — 30s/60s/120s, max 3 attempts, then error state
-4. Graceful shutdown verification — SIGTERM + wait before SIGKILL for Minecraft
-5. Agent restart/process reattachment — detect existing process on restart
+Dev containers will support **two access paths**.
+
+#### Path 1 — Browser IDE (Primary)
+
+```
+Browser
+  ↓
+Portal
+  ↓
+API proxy
+  ↓
+container:6000
+```
+
+URL format: `/dev//ide`
+
+Implementation requirements:
+
+- API proxy using `http-proxy-middleware`
+- WebSocket support (`ws: true`)
+- `server.on('upgrade', proxy.upgrade)`
+- code-server launch args: `--base-path /dev//ide --auth none`
+
+Authentication handled by portal JWT.
 
 ---
 
-## API (zlh-api)
+#### Path 2 — Local Dev Access (Advanced Users)
+
+Direct developer access via **Headscale/Tailscale**.
+
+Use cases:
+
+- SSH
+- VS Code Remote
+- local development tools
+
+Outstanding tasks:
+
+- confirm `zlh-ctl` Headscale server status
+- implement Tailscale addon install
+- API auth key generation
+- portal instructions
+
+Headscale constraints:
+
+- `magic_dns: false`
+- no exit nodes
+- no DNS takeover
+
+---
+
+## Agent Future Work (priority order)
+
+1. Structured logging (slog) for Loki
+2. Dev container provisioningComplete state
+3. Crash recovery backoff
+4. Graceful shutdown verification
+5. Process reattachment on agent restart
+
+---
+
+## API (zpack-api)
 
 Completed:
 
 - dev provisioning payload
 - runtime/version fields
 - enable_code_server flag
-- dev-only routing hook added during provisioning
-- Technitium + Cloudflare dev DNS creation
-- remote Traefik dynamic file writing via proxy SSH
-- proxy SSH moved to service-user model (`zlh`)
-- server status endpoint added so frontend can consume agent state
-- frontend status/console availability now update correctly via API polling model
+- API status endpoint for frontend state
 
 Outstanding:
 
-- runtime validation endpoint
-- dev runtime catalog endpoint for portal
-- remove Traefik dynamic config on dev container deletion
-- domain / hostname normalization audit
-- proxy/TLS generation cleanup so manual edits are no longer needed
+- `/dev/:id/ide` proxy route
+- websocket upgrade handling
+- ownership validation before proxy
+- Headscale auth key generation
+- dev runtime catalog endpoint
 
 ---
 
-## Portal (zlh-portal)
+## Portal (zpack-portal)
 
 Completed:
 
@@ -114,30 +166,12 @@ Completed:
 - dotnet runtime support
 - enable code-server checkbox
 - dev file browser support
-- frontend now consumes API-backed status correctly for host/console state
 
 Outstanding:
 
-- runtime list driven from catalog API
-- dev port exposure UI
-- code-server launch link
-- clearer dev readiness states (`installing`, `starting`, `running`, `error`, etc.)
-
----
-
-## Artifact Server
-
-Completed:
-
-- runtime artifacts hosted
-- devcontainer catalog
-- runtime archive structure
-- code-server compiled release artifact ✅
-
-Outstanding:
-
-- checksum publishing
-- artifact metadata support
+- "Open IDE" button
+- `/dev//ide` page
+- Headscale setup instructions
 
 ---
 
@@ -145,12 +179,11 @@ Outstanding:
 
 Active thread:
 
-- complete external dev IDE access path end-to-end
+- implement browser IDE proxy
 
 Future work:
 
-- dev port routing
-- dev service detection
+- Tailscale dev access
 - artifact version promotion
 - runtime rollback support
 
@@ -158,22 +191,10 @@ Future work:
 
 ## Closed Threads
 
-- ✅ Interactive PTY-backed console (dev + game)
-- ✅ WebSocket stability and PTY ownership
-- ✅ Customer isolation (API + frontend)
-- ✅ Agent update system (versioned, hash-verified)
-- ✅ Minecraft player presence (agent-sourced)
-- ✅ Game telemetry router separation (`/api/game/*`)
-- ✅ Agent Phase 1 mod management endpoints
-- ✅ Agent process metrics endpoint
-- ✅ Minecraft readiness probe + restart race mitigation
-- ✅ Modrinth resolver + full mod lifecycle
-- ✅ Direct runtime upload model (no staging, no symlinks)
-- ✅ `.zlh_metadata.json` provenance tracking
-- ✅ Raw `http.request` streaming in API upload proxy
-- ✅ Filesystem architecture docs consolidated
-- ✅ Upload transport timeout tuning
-- ✅ Dev container filesystem support (container-aware, /workspace root)
-- ✅ Code-server artifact fix — compiled release on zlh-artifacts
-- ✅ Dev routing hook added to provisioning without changing game publish flow
-- ✅ API status endpoint added for frontend agent-state consumption
+- ✅ PTY console (dev + game)
+- ✅ Mod lifecycle
+- ✅ Upload pipeline
+- ✅ Runtime artifact installs
+- ✅ Dev container filesystem model
+- ✅ Code-server artifact fix
+- ✅ API status endpoint for frontend agent-state consumption
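
Reviewer sketch (not part of the patch): the outstanding API items `/dev/:id/ide` proxy route, "ownership validation before proxy", and "websocket upgrade handling" could share one small decision function that runs before any request is handed to the proxy. The sketch below is a dependency-free illustration under stated assumptions. The helper names (`parseIdePath`, `authorizeIdeRequest`, `codeServerArgs`), the `claims.sub` field, the `ownerOf` lookup, and the allowed id characters are all hypothetical. The launch args are copied from the notes and have not been checked against code-server itself.

```javascript
// Hypothetical helpers; none of these names come from the zpack-api codebase.

// Matches `/dev/:id/ide` and anything beneath it.
// The id character set is an assumption; adjust to the real container id format.
const IDE_ROUTE = /^\/dev\/([A-Za-z0-9_-]+)\/ide(\/.*)?$/;

// Split a request URL into the container id and the remaining path.
// Returns null when the URL is not an IDE route (including an empty id).
function parseIdePath(url) {
  const path = url.split("?")[0]; // ignore the query string when matching
  const match = IDE_ROUTE.exec(path);
  if (!match) return null;
  return { id: match[1], rest: match[2] || "/" };
}

// Decide whether a request may be proxied.
// `claims` stands in for the decoded portal JWT (assumed to carry `sub`);
// `ownerOf` is an assumed lookup from container id to owning user id.
function authorizeIdeRequest(url, claims, ownerOf) {
  const route = parseIdePath(url);
  if (!route) return { ok: false, status: 404 };
  if (!claims || !claims.sub) return { ok: false, status: 401 };
  if (ownerOf(route.id) !== claims.sub) return { ok: false, status: 403 };
  return { ok: true, id: route.id, rest: route.rest };
}

// Launch args as listed in the notes, with the container id filled in.
function codeServerArgs(id) {
  return ["--base-path", `/dev/${id}/ide`, "--auth", "none"];
}

// Example: a user who owns container "abc123" is allowed through.
const decision = authorizeIdeRequest(
  "/dev/abc123/ide/static/app.js?v=1",
  { sub: "user-1" },
  () => "user-1"
);
console.log(decision.ok, decision.id, decision.rest);
```

In Node, WebSocket handshakes arrive through the server's `upgrade` event and do not pass through ordinary HTTP middleware, so a check like `authorizeIdeRequest` would need to run in both places: in the route handler in front of the `http-proxy-middleware` instance, and in the `server.on('upgrade', ...)` listener before it calls `proxy.upgrade`.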