# Open Threads: zlh-grind

This file tracks active but unfinished work.
Keep it short.

---
## Agent (zlh-agent)
### Dev Runtime System
Completed:
- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented

Outstanding:
- runtime install verification improvements
- catalog hash validation
- runtime removal / upgrade handling
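
The catalog hash validation item could look roughly like the following. This is a minimal sketch, not the agent's implementation: it assumes the catalog publishes a hex-encoded SHA-256 digest per artifact (the algorithm is an assumption, this file does not name one), and the helper names `sha256_file` and `artifact_matches` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large runtime archives are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_matches(path: Path, expected_hex: str) -> bool:
    """Compare a downloaded artifact against the digest listed in the catalog.

    Assumption: the catalog stores a hex SHA-256 string; surrounding
    whitespace and letter case are ignored.
    """
    return sha256_file(path) == expected_hex.strip().lower()
```
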
---
### Dev Environment
Completed:
- dev user creation
- workspace root `/home/dev/workspace`
- console runs as dev user

Outstanding:
- PATH normalization
- shell profile consistency
- runtime PATH injection
---
### Code Server Addon
Status: ✅ Install + launch operational inside dev containers
Confirmed:
- compiled release artifact fixed on `zlh-artifacts`
- install confirmed working
- process confirmed running inside container
- binds to `0.0.0.0:6000`
- launched from `/opt/zlh/services/code-server`
- API now writes dev Traefik dynamic config during provisioning
- API now uses proxy SSH service account (`zlh`) instead of personal user

Port: `6000`

Routing model:
- DNS: Cloudflare + Technitium
- Proxy: Traefik dynamic file written by API during dev provisioning
- Host format currently in use: `dev-<vmid>.zerolaghub.dev`

Outstanding:
- finalize external browser reachability for code-server through Cloudflare → Traefik → container
- remove manual proxy-file edits from debugging path and ensure generated config is the sole source
- standardize hostname format everywhere (`dev-<vmid>` only)
- add code-server launch link in portal
- remove dynamic Traefik file on dev container deletion
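
One way to approach the hostname standardization item is a single helper that every component uses to build and check dev hostnames. The sketch below is illustrative only: it assumes `<vmid>` is a positive integer, and the names `dev_hostname` and `parse_dev_hostname` are hypothetical rather than taken from the codebase.

```python
import re
from typing import Optional

DEV_DOMAIN = "zerolaghub.dev"  # from the host format listed above
_DEV_HOST_RE = re.compile(r"dev-([0-9]+)\." + re.escape(DEV_DOMAIN))


def dev_hostname(vmid: int) -> str:
    """Build the canonical dev-<vmid> hostname (hypothetical helper)."""
    if isinstance(vmid, bool) or not isinstance(vmid, int) or vmid <= 0:
        raise ValueError(f"vmid must be a positive integer, got {vmid!r}")
    return f"dev-{vmid}.{DEV_DOMAIN}"


def parse_dev_hostname(host: str) -> Optional[int]:
    """Return the vmid if host is exactly dev-<vmid>.zerolaghub.dev, else None."""
    match = _DEV_HOST_RE.fullmatch(host.strip().lower())
    return int(match.group(1)) if match else None
```
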
---
### Agent Future Work (priority order)
1. Unified structured logging (slog) — Promtail/Loki needs structured fields
2. Dev container /status — provisioningComplete + provisioningError fields
3. Crash recovery with backoff — 30s/60s/120s, max 3 attempts, then error state
4. Graceful shutdown verification — SIGTERM + wait before SIGKILL for Minecraft
5. Agent restart/process reattachment — detect existing process on restart
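
The crash recovery policy in item 3 can be sketched as a small decision function. This is an assumption-laden illustration in Python, not the agent's code: the `RestartDecision` type, the `next_restart` function, and the `"restarting"` / `"error"` state labels are hypothetical; only the 30s/60s/120s delays and the three-attempt limit come from the list above.

```python
from dataclasses import dataclass

BACKOFF_SECONDS = (30, 60, 120)      # delays from item 3 above
MAX_ATTEMPTS = len(BACKOFF_SECONDS)  # 3 restart attempts, then error state


@dataclass
class RestartDecision:
    restart: bool       # True if another restart should be attempted
    delay_seconds: int  # wait before restarting (0 when not restarting)
    state: str          # hypothetical label: "restarting" or "error"


def next_restart(crash_count: int) -> RestartDecision:
    """Decide what to do after the Nth consecutive crash (1-based)."""
    if crash_count < 1:
        raise ValueError("crash_count must be >= 1")
    if crash_count > MAX_ATTEMPTS:
        # All attempts used up: stop restarting and surface an error state.
        return RestartDecision(False, 0, "error")
    return RestartDecision(True, BACKOFF_SECONDS[crash_count - 1], "restarting")
```
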
---
## API (zlh-api)
Completed:
- dev provisioning payload
- runtime/version fields
- `enable_code_server` flag
- dev-only routing hook added during provisioning
- Technitium + Cloudflare dev DNS creation
- remote Traefik dynamic file writing via proxy SSH
- proxy SSH moved to service-user model (`zlh`)
- server status endpoint added so frontend can consume agent state
- frontend status/console availability now update correctly via API polling model

Outstanding:
- runtime validation endpoint
- dev runtime catalog endpoint for portal
- remove Traefik dynamic config on dev container deletion
- domain / hostname normalization audit
- proxy/TLS generation cleanup so manual edits are no longer needed
---
## Portal (zlh-portal)
Completed:
- dev runtime dropdown
- dotnet runtime support
- enable code-server checkbox
- dev file browser support
- frontend now consumes API-backed status correctly for host/console state

Outstanding:
- runtime list driven from catalog API
- dev port exposure UI
- code-server launch link
- clearer dev readiness states (`installing`, `starting`, `running`, `error`, etc.)
---
## Artifact Server
Completed:
- runtime artifacts hosted
- devcontainer catalog
- runtime archive structure
- code-server compiled release artifact ✅

Outstanding:
- checksum publishing
- artifact metadata support
---
## Platform
Active thread:
- complete external dev IDE access path end-to-end

Future work:
- dev port routing
- dev service detection
- artifact version promotion
- runtime rollback support
---
## Closed Threads
- ✅ Interactive PTY-backed console (dev + game)
- ✅ WebSocket stability and PTY ownership
- ✅ Customer isolation (API + frontend)
- ✅ Agent update system (versioned, hash-verified)
- ✅ Minecraft player presence (agent-sourced)
- ✅ Game telemetry router separation (`/api/game/*`)
- ✅ Agent Phase 1 mod management endpoints
- ✅ Agent process metrics endpoint
- ✅ Minecraft readiness probe + restart race mitigation
- ✅ Modrinth resolver + full mod lifecycle
- ✅ Direct runtime upload model (no staging, no symlinks)
- ✅ `.zlh_metadata.json` provenance tracking
- ✅ Raw `http.request` streaming in API upload proxy
- ✅ Filesystem architecture docs consolidated
- ✅ Upload transport timeout tuning
- ✅ Dev container filesystem support (container-aware, `/workspace` root)
- ✅ Code-server artifact fix — compiled release on zlh-artifacts
- ✅ Dev routing hook added to provisioning without changing game publish flow
- ✅ API status endpoint added for frontend agent-state consumption