# Open Threads – zlh-grind

This file tracks active but unfinished work.

Keep it short.

---
## Agent (zlh-agent)

### Dev Runtime System

Completed:

- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented

Outstanding:

- runtime install verification improvements
- catalog hash validation
- runtime removal / upgrade handling
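
The outstanding catalog hash validation item could be approached roughly as follows. This is a minimal, language-neutral sketch in Python, not the agent's implementation. It assumes the catalog will eventually carry a SHA-256 digest per artifact; the names `sha256_of`, `verify_artifact`, and `expected_sha256` are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Compare a downloaded artifact against a digest taken from the catalog.

    `expected_sha256` is a hypothetical catalog field; the real catalog
    schema is not described in this file.
    """
    actual = sha256_of(path)
    # Normalise whitespace and case before a constant-time comparison.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```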

---

### Dev Environment

Completed:

- dev user creation
- workspace root `/home/dev/workspace`
- console runs as dev user

Outstanding:

- PATH normalization
- shell profile consistency
- runtime PATH injection

---

### Code Server Addon

Status: ✅ Install + launch operational inside dev containers

Confirmed:

- compiled release artifact fixed on `zlh-artifacts`
- install confirmed working
- process confirmed running inside container
- binds to `0.0.0.0:6000`
- launched from `/opt/zlh/services/code-server`
- API now writes dev Traefik dynamic config during provisioning
- API now uses proxy SSH service account (`zlh`) instead of personal user

Port: `6000`

Routing model:

- DNS: Cloudflare + Technitium
- Proxy: Traefik dynamic file written by API during dev provisioning
- Host format currently in use: `dev-<vmid>.zerolaghub.dev`
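
For reference, the generated per-container Traefik file could look roughly like what this sketch renders. It is illustrative Python, not the API's actual generator. The function names, the router/service naming, and the container IP are assumptions; entry points and TLS settings are deliberately left out because this file does not specify them.

```python
def dev_hostname(vmid: int, domain: str = "zerolaghub.dev") -> str:
    """Build the dev hostname in the `dev-<vmid>` format."""
    return f"dev-{vmid}.{domain}"


def traefik_dynamic_config(vmid: int, container_ip: str, port: int = 6000) -> str:
    """Render a Traefik file-provider config (YAML) for one dev container.

    `container_ip` is a hypothetical input; how the API resolves the
    container address is not described in this file.
    """
    name = f"dev-{vmid}"
    host = dev_hostname(vmid)
    return (
        "http:\n"
        "  routers:\n"
        f"    {name}:\n"
        f'      rule: "Host(`{host}`)"\n'
        f"      service: {name}\n"
        "  services:\n"
        f"    {name}:\n"
        "      loadBalancer:\n"
        "        servers:\n"
        f'          - url: "http://{container_ip}:{port}"\n'
    )


if __name__ == "__main__":
    # Example values only.
    print(traefik_dynamic_config(1234, "10.0.0.42"))
```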

Outstanding:

- finalize external browser reachability for code-server through Cloudflare → Traefik → container
- remove manual proxy-file edits from debugging path and ensure generated config is the sole source
- standardize hostname format everywhere (`dev-<vmid>` only)
- add code-server launch link in portal
- remove dynamic Traefik file on dev container deletion

---

### Agent Future Work (priority order)

1. Unified structured logging (slog): Promtail/Loki needs structured fields
2. Dev container `/status`: `provisioningComplete` + `provisioningError` fields
3. Crash recovery with backoff: 30s/60s/120s, max 3 attempts, then error state
4. Graceful shutdown verification: SIGTERM + wait before SIGKILL for Minecraft
5. Agent restart/process reattachment: detect existing process on restart
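
Items 3 and 4 can be sketched as below. This is illustrative Python rather than agent code, and the agent's real implementation may differ. The 30s/60s/120s schedule and the three-attempt cap come from the list above; the function names and the 30-second default grace period are assumptions.

```python
import subprocess
from typing import Optional

# Delays before restart attempts 1, 2 and 3.
BACKOFF_SECONDS = (30, 60, 120)


def next_restart_delay(failed_attempts: int) -> Optional[int]:
    """Return the wait before the next restart, or None to enter the error state.

    `failed_attempts` counts restarts already tried since the last healthy run.
    """
    if failed_attempts < 0:
        raise ValueError("failed_attempts must be non-negative")
    if failed_attempts >= len(BACKOFF_SECONDS):
        return None  # all three attempts used: stop retrying, report an error
    return BACKOFF_SECONDS[failed_attempts]


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 30.0) -> int:
    """Send SIGTERM, wait up to `grace_seconds`, then fall back to SIGKILL."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()
```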

---

## API (zlh-api)

Completed:

- dev provisioning payload
- runtime/version fields
- `enable_code_server` flag
- dev-only routing hook added during provisioning
- Technitium + Cloudflare dev DNS creation
- remote Traefik dynamic file writing via proxy SSH
- proxy SSH moved to service-user model (`zlh`)
- server status endpoint added so frontend can consume agent state
- frontend status/console availability now update correctly via API polling model

Outstanding:

- runtime validation endpoint
- dev runtime catalog endpoint for portal
- remove Traefik dynamic config on dev container deletion
- domain / hostname normalization audit
- proxy/TLS generation cleanup so manual edits are no longer needed

---

## Portal (zlh-portal)

Completed:

- dev runtime dropdown
- dotnet runtime support
- enable code-server checkbox
- dev file browser support
- frontend now consumes API-backed status correctly for host/console state

Outstanding:

- runtime list driven from catalog API
- dev port exposure UI
- code-server launch link
- clearer dev readiness states (`installing`, `starting`, `running`, `error`, etc.)
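
One possible way to derive the readiness states from agent status is sketched below in Python. The mapping is an assumption, not a decided design. It combines the planned `provisioningComplete` and `provisioningError` fields with a hypothetical `process_running` flag, and covers only the four states named above.

```python
from enum import Enum
from typing import Optional


class Readiness(str, Enum):
    INSTALLING = "installing"
    STARTING = "starting"
    RUNNING = "running"
    ERROR = "error"


def derive_readiness(
    provisioning_complete: bool,
    provisioning_error: Optional[str],
    process_running: bool,
) -> Readiness:
    """Map status fields to a single readiness state (assumed precedence)."""
    if provisioning_error:
        return Readiness.ERROR  # an error wins over every other signal
    if not provisioning_complete:
        return Readiness.INSTALLING
    return Readiness.RUNNING if process_running else Readiness.STARTING
```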

---

## Artifact Server

Completed:

- runtime artifacts hosted
- devcontainer catalog
- runtime archive structure
- code-server compiled release artifact ✅

Outstanding:

- checksum publishing
- artifact metadata support

---

## Platform

Active thread:

- complete external dev IDE access path end-to-end

Future work:

- dev port routing
- dev service detection
- artifact version promotion
- runtime rollback support

---

## Closed Threads

- ✅ Interactive PTY-backed console (dev + game)
- ✅ WebSocket stability and PTY ownership
- ✅ Customer isolation (API + frontend)
- ✅ Agent update system (versioned, hash-verified)
- ✅ Minecraft player presence (agent-sourced)
- ✅ Game telemetry router separation (`/api/game/*`)
- ✅ Agent Phase 1 mod management endpoints
- ✅ Agent process metrics endpoint
- ✅ Minecraft readiness probe + restart race mitigation
- ✅ Modrinth resolver + full mod lifecycle
- ✅ Direct runtime upload model (no staging, no symlinks)
- ✅ `.zlh_metadata.json` provenance tracking
- ✅ Raw `http.request` streaming in API upload proxy
- ✅ Filesystem architecture docs consolidated
- ✅ Upload transport timeout tuning
- ✅ Dev container filesystem support (container-aware, `/workspace` root)
- ✅ Code-server artifact fix: compiled release on `zlh-artifacts`
- ✅ Dev routing hook added to provisioning without changing game publish flow
- ✅ API status endpoint added for frontend agent-state consumption