Pivot dev access — abandon Traefik/DNS, adopt API proxy + Headscale
parent 88db6134f3
commit d128d92d15
173 OPEN_THREADS.md
@@ -40,9 +40,9 @@ Outstanding:

---

### Code Server Addon
## Code Server Addon

Status: ✅ Install + launch operational inside dev containers
Status: ✅ Installed and running inside dev containers

Confirmed:

@@ -51,62 +51,114 @@ Confirmed:
- process confirmed running inside container
- binds to `0.0.0.0:6000`
- launched from `/opt/zlh/services/code-server`
- API now writes dev Traefik dynamic config during provisioning
- API now uses proxy SSH service account (`zlh`) instead of personal user

Port: `6000`

Routing model:
---

- DNS: Cloudflare + Technitium
- Proxy: Traefik dynamic file written by API during dev provisioning
- Host format currently in use: `dev-<vmid>.zerolaghub.dev`
### Access Model (Updated)

Outstanding:
The previous approach using:

- finalize external browser reachability for code-server through Cloudflare → Traefik → container
- remove manual proxy-file edits from debugging path and ensure generated config is the sole source
- standardize hostname format everywhere (`dev-<vmid>` only)
- add code-server launch link in portal
- remove dynamic Traefik file on dev container deletion
- Cloudflare DNS
- Technitium DNS
- Traefik dynamic config per container

has been **abandoned**.

Reason:

- too many moving pieces
- TLS and proxy complexity
- per-container DNS automation
- unnecessary exposure of internal dev services

---

### Agent Future Work (priority order)
### New Access Strategy

1. Unified structured logging (slog) — Promtail/Loki needs structured fields
2. Dev container /status — provisioningComplete + provisioningError fields
3. Crash recovery with backoff — 30s/60s/120s, max 3 attempts, then error state
4. Graceful shutdown verification — SIGTERM + wait before SIGKILL for Minecraft
5. Agent restart/process reattachment — detect existing process on restart
Dev containers will support **two access paths**.

#### Path 1 — Browser IDE (Primary)

```
Browser
↓
Portal
↓
API proxy
↓
container:6000
```

URL format: `/dev/<vmid>/ide`

Implementation requirements:

- API proxy using `http-proxy-middleware`
- WebSocket support (`ws: true`)
- `server.on('upgrade', proxy.upgrade)`
- code-server launch args: `--base-path /dev/<vmid>/ide --auth none`
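
The routing side of these requirements can be sketched as a few pure helpers. This is only an illustration under stated assumptions: the function names are hypothetical, vmids are assumed to be numeric, and the container address is assumed to come from some lookup that the notes do not describe. The port and launch arguments are copied from the notes above.

```typescript
// Port code-server binds to inside the container (from the notes above).
const IDE_PORT = 6000;

// Hypothetical helper: pulls <vmid> out of "/dev/<vmid>/ide", with or without
// a trailing sub-path or query string. Assumes vmids are numeric.
function parseIdePath(url: string): string | null {
  const match = /^\/dev\/(\d+)\/ide(?:[/?]|$)/.exec(url);
  return match?.[1] ?? null;
}

// Hypothetical helper: builds the upstream URL for a container address.
// How the address is resolved from a vmid is not specified in the notes.
function ideTarget(containerHost: string): string {
  return `http://${containerHost}:${IDE_PORT}`;
}

// Launch arguments exactly as listed in the notes, with <vmid> filled in.
function codeServerArgs(vmid: string): string[] {
  return ["--base-path", `/dev/${vmid}/ide`, "--auth", "none"];
}
```

In an `http-proxy-middleware` setup, helpers like these would typically feed the per-request target selection, with `ws: true` and the `upgrade` listener from the list above handling WebSocket traffic.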

Authentication handled by portal JWT.

---

## API (zlh-api)
#### Path 2 — Local Dev Access (Advanced Users)

Direct developer access via **Headscale/Tailscale**.

Use cases:

- SSH
- VS Code Remote
- local development tools

Outstanding tasks:

- confirm `zlh-ctl` Headscale server status
- implement Tailscale addon install
- API auth key generation
- portal instructions

Headscale constraints:

- `magic_dns: false`
- no exit nodes
- no DNS takeover

---

## Agent Future Work (priority order)

1. Structured logging (slog) for Loki
2. Dev container provisioningComplete state
3. Crash recovery backoff
4. Graceful shutdown verification
5. Process reattachment on agent restart
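
For item 3, the longer wording of this list gives the schedule as 30s/60s/120s with a maximum of 3 attempts before entering an error state. A minimal sketch of that decision logic follows. It is written in TypeScript purely for illustration; the agent's actual language and the names used here (`nextRecoveryStep`, `RecoveryDecision`) are assumptions.

```typescript
// Outcome of a crash: either restart after a delay, or give up.
type RecoveryDecision =
  | { action: "restart"; delayMs: number }
  | { action: "error" };

const BASE_DELAY_MS = 30000; // first retry waits 30s
const MAX_ATTEMPTS = 3;      // 30s, 60s, 120s, then error state

// `attempt` is the 1-based number of the restart about to be tried.
function nextRecoveryStep(attempt: number): RecoveryDecision {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  if (attempt > MAX_ATTEMPTS) {
    return { action: "error" };
  }
  // Delay doubles on each attempt: 30s, 60s, 120s.
  return { action: "restart", delayMs: BASE_DELAY_MS * 2 ** (attempt - 1) };
}
```

Keeping the schedule in a pure function makes it easy to unit-test separately from whatever timer or process supervisor ends up calling it.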

---

## API (zpack-api)

Completed:

- dev provisioning payload
- runtime/version fields
- enable_code_server flag
- dev-only routing hook added during provisioning
- Technitium + Cloudflare dev DNS creation
- remote Traefik dynamic file writing via proxy SSH
- proxy SSH moved to service-user model (`zlh`)
- server status endpoint added so frontend can consume agent state
- frontend status/console availability now update correctly via API polling model
- API status endpoint for frontend state

Outstanding:

- runtime validation endpoint
- dev runtime catalog endpoint for portal
- remove Traefik dynamic config on dev container deletion
- domain / hostname normalization audit
- proxy/TLS generation cleanup so manual edits are no longer needed
- `/dev/:id/ide` proxy route
- websocket upgrade handling
- ownership validation before proxy
- Headscale auth key generation
- dev runtime catalog endpoint
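
The "ownership validation before proxy" item could look roughly like the check below, run before a request (or WebSocket upgrade) is handed to the proxy. Everything here is a hypothetical sketch: the `PortalUser` and `DevContainer` shapes, the `ownerId` field, and the chosen status codes are assumptions, not the API's actual data model.

```typescript
// Hypothetical shapes; the real records are not described in the notes.
interface PortalUser {
  id: string; // subject taken from the verified portal JWT
}

interface DevContainer {
  vmid: string;
  ownerId: string;
}

type AccessResult =
  | { ok: true }
  | { ok: false; status: 401 | 403 | 404 };

// Decides whether a request for /dev/<vmid>/ide may be proxied.
function checkIdeAccess(
  user: PortalUser | null,
  container: DevContainer | undefined,
): AccessResult {
  if (user === null) return { ok: false, status: 401 };           // no valid JWT
  if (container === undefined) return { ok: false, status: 404 }; // unknown vmid
  if (container.ownerId !== user.id) return { ok: false, status: 403 };
  return { ok: true };
}
```

Since code-server is launched with `--auth none`, a check of this kind is the only gate in front of the IDE, so it would need to run on the WebSocket upgrade path as well as on plain HTTP requests.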

---

## Portal (zlh-portal)
## Portal (zpack-portal)

Completed:

@@ -114,30 +166,12 @@ Completed:
- dotnet runtime support
- enable code-server checkbox
- dev file browser support
- frontend now consumes API-backed status correctly for host/console state

Outstanding:

- runtime list driven from catalog API
- dev port exposure UI
- code-server launch link
- clearer dev readiness states (`installing`, `starting`, `running`, `error`, etc.)

---

## Artifact Server

Completed:

- runtime artifacts hosted
- devcontainer catalog
- runtime archive structure
- code-server compiled release artifact ✅

Outstanding:

- checksum publishing
- artifact metadata support
- "Open IDE" button
- `/dev/<vmid>/ide` page
- Headscale setup instructions

---

@@ -145,12 +179,11 @@ Outstanding:

Active thread:

- complete external dev IDE access path end-to-end
- implement browser IDE proxy

Future work:

- dev port routing
- dev service detection
- Tailscale dev access
- artifact version promotion
- runtime rollback support

@@ -158,22 +191,10 @@ Future work:

## Closed Threads

- ✅ Interactive PTY-backed console (dev + game)
- ✅ WebSocket stability and PTY ownership
- ✅ Customer isolation (API + frontend)
- ✅ Agent update system (versioned, hash-verified)
- ✅ Minecraft player presence (agent-sourced)
- ✅ Game telemetry router separation (`/api/game/*`)
- ✅ Agent Phase 1 mod management endpoints
- ✅ Agent process metrics endpoint
- ✅ Minecraft readiness probe + restart race mitigation
- ✅ Modrinth resolver + full mod lifecycle
- ✅ Direct runtime upload model (no staging, no symlinks)
- ✅ `.zlh_metadata.json` provenance tracking
- ✅ Raw `http.request` streaming in API upload proxy
- ✅ Filesystem architecture docs consolidated
- ✅ Upload transport timeout tuning
- ✅ Dev container filesystem support (container-aware, /workspace root)
- ✅ Code-server artifact fix — compiled release on zlh-artifacts
- ✅ Dev routing hook added to provisioning without changing game publish flow
- ✅ API status endpoint added for frontend agent-state consumption
- ✅ PTY console (dev + game)
- ✅ Mod lifecycle
- ✅ Upload pipeline
- ✅ Runtime artifact installs
- ✅ Dev container filesystem model
- ✅ Code-server artifact fix
- ✅ API status endpoint for frontend agent-state consumption
