Open Threads – zlh-grind
This file tracks active but unfinished work.
Keep it short.
Agent (zlh-agent)
Dev Runtime System
Completed:
- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented
- all installs now fetch from artifact server (no local artifact assumption)
Outstanding:
- runtime install verification improvements
- catalog hash validation
- runtime removal / upgrade handling
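One possible shape for the outstanding catalog hash validation, as a minimal sketch. It assumes the catalog carries a hex SHA-256 digest per artifact; the function names and the digest field are hypothetical and not taken from zlh-agent.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of the downloaded artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the artifact matches the digest from the catalog.

    expected_sha256 is assumed to be a hex string; whitespace and case are
    normalized before comparing.
    """
    actual = sha256_of(data)
    # compare_digest does a constant-time comparison of the two strings.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

An install guard could refuse to unpack any artifact for which verify_artifact returns False.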
Dev Environment
Completed:
- dev user creation
- workspace root /home/dev/workspace
- console runs as dev user
- HOME, USER, LOGNAME, TERM env vars set correctly
Outstanding:
- PATH normalization
- shell profile consistency
- runtime PATH injection
Code Server Addon
Status: ✅ Installed, running, and proxied through API
Confirmed:
- pulled from artifact server (tar.gz)
- installed to /opt/zlh/services/code-server
- binds to 0.0.0.0:8080
- lifecycle endpoints: POST /dev/codeserver/start|stop|restart
- detection via /proc/*/cmdline scan
- browser IDE fully working end-to-end via API proxy
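The /proc/*/cmdline detection can be sketched roughly as follows. This is an illustration, not the agent's code: the function name, the substring match, and the proc_root parameter (added so the scan can run against a fake directory) are all assumptions.

```python
from pathlib import Path


def find_pids_by_cmdline(needle: str, proc_root: str = "/proc") -> list[int]:
    """Return PIDs whose command line has an argument containing `needle`."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited mid-scan, or not readable
        # Arguments in /proc/<pid>/cmdline are NUL-separated.
        args = [a.decode("utf-8", "replace") for a in raw.split(b"\x00") if a]
        if any(needle in arg for arg in args):
            pids.append(int(entry.name))
    return sorted(pids)
```

Calling find_pids_by_cmdline("code-server") returns an empty list when nothing matches, which a status endpoint could report as "stopped".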
Game Server Supervision
Completed:
- crash recovery with backoff: 30s → 60s → 120s
- backoff resets if uptime ≥ 30s
- transitions to error state after repeated failures
- crash observability: time, exit code, signal, uptime, log tail, classification
- classifications: oom, mod_or_plugin_error, missing_dependency, nonzero_exit, unexpected_exit
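The backoff rules above can be sketched as a small state holder. The delays and the 30s reset threshold come from the notes; the assumption that the error state is entered once all three delays have been used is mine, since the notes only say "after repeated failures". Class and field names are hypothetical.

```python
from dataclasses import dataclass

BACKOFF_STEPS = (30, 60, 120)  # seconds: 30s -> 60s -> 120s
STABLE_UPTIME = 30             # uptime in seconds that resets the backoff


@dataclass
class CrashBackoff:
    failures: int = 0
    state: str = "running"

    def on_crash(self, uptime_seconds: float):
        """Return the delay before the next restart, or None when giving up."""
        if uptime_seconds >= STABLE_UPTIME:
            self.failures = 0  # the server ran long enough; start over
        if self.failures >= len(BACKOFF_STEPS):
            # Assumption: all delays exhausted means "repeated failures".
            self.state = "error"
            return None
        delay = BACKOFF_STEPS[self.failures]
        self.failures += 1
        self.state = "restarting"
        return delay
```

Three quick crashes in a row yield delays of 30, 60, and 120 seconds; a fourth moves the server to the error state.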
Agent Future Work (priority order)
- Structured logging (slog) for Loki
- Dev container provisioningComplete state in /status
- Graceful shutdown verification (SIGTERM + wait for Minecraft)
- Process reattachment on agent restart
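For the graceful shutdown item, a minimal sketch of "SIGTERM + wait" on a POSIX host. The 30 second grace period and the escalation to SIGKILL are assumptions for illustration; the notes do not specify either.

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 30.0) -> bool:
    """Send SIGTERM and wait. Return True if the process exited in time."""
    if proc.poll() is not None:
        return True  # already exited
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        # Assumption: escalate to SIGKILL once the grace period is over.
        proc.kill()
        proc.wait()
        return False
```

Verification would then amount to checking that the function returns True for a real Minecraft process, meaning the server finished its own shutdown before the deadline.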
Dev IDE Access
Browser IDE ✅ Working
Browser → Portal → API (bootstrap) → /__ide/:id/* → container:8080
Working flow:
- frontend calls POST /api/dev/:id/ide-token
- API returns /api/dev/:id/ide?token=...
- frontend opens that URL in new tab
- bootstrap route validates token, sets HTTP-only IDE cookie, redirects to /__ide/:id/
- all live code-server HTTP + WS traffic proxied through /__ide/:id/*
- API proxies to http://<container-ip>:8080
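The token half of this flow could look roughly like the sketch below. Only the URL shapes come from the notes. The in-memory store, the 60 second lifetime, and the single-use behavior are assumptions, and cookie handling is left out.

```python
import secrets
import time


class IdeTokenStore:
    """Hypothetical in-memory store for short-lived IDE bootstrap tokens."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._tokens = {}  # token -> (dev_id, expires_at)

    def issue(self, dev_id: str) -> str:
        """Create a token and return the bootstrap URL for the frontend."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (dev_id, self._clock() + self._ttl)
        return f"/api/dev/{dev_id}/ide?token={token}"

    def redeem(self, dev_id: str, token: str):
        """Return the redirect path on success, None otherwise.

        The token is removed on first use (assumed single-use).
        """
        entry = self._tokens.pop(token, None)
        if entry is None:
            return None
        owner, expires_at = entry
        if owner != dev_id or self._clock() > expires_at:
            return None
        return f"/__ide/{dev_id}/"
```

In this sketch the bootstrap route would set the HTTP-only cookie only when redeem returns a path, then redirect to it.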
Key fixes that made it work:
- token bootstrap fixed new-tab auth loss
- /__ide/:id tunnel separated from bootstrap to avoid API route interference
- upstream port corrected to 8080 (Chrome blocks 6000 as unsafe)
- Host header changed to pass browser host (req.headers.host) not container host
- Origin override removed; browser origin passed through only when present
- WS proxy separated from shared HTTP proxy; built target-bound WS proxy at upgrade time
- target-bound WS eliminated ECONNREFUSED 127.0.0.1:8080 fallback bug
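The Host and Origin fixes amount to a small header policy, sketched here outside any real proxy library. The function name and the fallback to the container host when the browser sent no Host header are assumptions.

```python
def upstream_headers(browser_headers: dict, container_host: str) -> dict:
    """Build the headers forwarded to code-server for one proxied request."""
    headers = {name.lower(): value for name, value in browser_headers.items()}
    # Keep the browser's Host header rather than rewriting it to the container.
    # Assumption: fall back to the container host only when Host is missing.
    headers.setdefault("host", container_host)
    # No Origin override: Origin is forwarded if the browser sent it and is
    # never synthesized when absent.
    return headers
```

For example, a request carrying Host: portal.example:3000 and no Origin is forwarded with that same Host and still without an Origin header.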
Current state:
- browser still sees API host/IP until portal is behind a proper domain/reverse proxy
- host-based dev-<vmid>.zlh.dev support started but reverted; bootstrap path is canonical
Local Dev Access (Headscale/Tailscale — Future)
Outstanding:
- confirm zlh-ctl Headscale server status
- implement Tailscale addon install in agent
- API auth key generation
- portal setup instructions
Constraints: magic_dns: false, no exit nodes, no DNS takeover
API (zpack-api)
Completed:
- dev provisioning payload
- runtime/version fields
- enable_code_server flag
- GET /api/servers/:id/status: server status endpoint
- POST /api/dev/:id/ide-token: IDE token generation
- GET /api/dev/:id/ide: bootstrap route (validates token, sets cookie, redirects)
- /__ide/:id/*: live tunnel proxy (HTTP + WS, target-bound)
- dev routing experiment removed (devRouting.js, devDePublisher.js deleted)
Outstanding:
- dev runtime catalog endpoint for portal
- Headscale auth key generation
Portal (zpack-portal)
Completed:
- dev runtime dropdown
- dotnet runtime support
- enable code-server checkbox
- dev file browser support
Outstanding:
- "Open IDE" button: calls POST /api/dev/:id/ide-token, opens returned URL in new tab
- Headscale setup instructions
Platform
Future work:
- Tailscale dev access
- artifact version promotion
- runtime rollback support
Closed Threads
- ✅ PTY console (dev + game)
- ✅ Mod lifecycle
- ✅ Upload pipeline
- ✅ Runtime artifact installs
- ✅ Dev container filesystem model
- ✅ Code-server artifact fix
- ✅ API status endpoint for frontend agent-state consumption
- ✅ Dev DNS/Traefik routing experiment — removed
- ✅ Game server crash recovery with backoff
- ✅ Crash observability (classification, log tail, exit metadata)
- ✅ Code-server lifecycle endpoints (start/stop/restart)
- ✅ Code-server process detection via /proc scan
- ✅ Dev IDE proxy — browser IDE fully working end-to-end