# Open Threads: zlh-grind
This file tracks active but unfinished work.
Keep it short.
---
## Agent (zlh-agent)
### Dev Runtime System
Completed:
- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented
- all installs now fetch from artifact server (no local artifact assumption)
Outstanding:
- runtime install verification improvements
- catalog hash validation
- runtime removal / upgrade handling
---
### Dev Environment
Completed:
- dev user creation
- workspace root `/home/dev/workspace`
- console runs as dev user
- `HOME`, `USER`, `LOGNAME`, `TERM` env vars set correctly
Outstanding:
- PATH normalization
- shell profile consistency
- runtime PATH injection
---
### Code Server Addon
Status: ✅ Installed, running, and proxied through API
Confirmed:
- pulled from artifact server (tar.gz)
- installed to `/opt/zlh/services/code-server`
- binds to `0.0.0.0:8080`
- lifecycle endpoints: `POST /dev/codeserver/start|stop|restart`
- detection via `/proc/*/cmdline` scan
- browser IDE fully working end-to-end via API proxy
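
A `/proc/*/cmdline` scan like the one listed above can be sketched in a few lines. This is a minimal Python illustration, not the agent's actual code: the function name `find_pids_by_cmdline`, the substring match, and the `proc_root` parameter (there only so the scan can be pointed at a test directory) are all assumptions.

```python
from pathlib import Path
from typing import List


def find_pids_by_cmdline(needle: str, proc_root: str = "/proc") -> List[int]:
    """Return the PIDs whose command line contains `needle` (hypothetical helper)."""
    pids = []
    for entry in Path(proc_root).iterdir():
        # Only numeric directory names under /proc are processes.
        if not entry.name.isdigit():
            continue
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            # The process exited mid-scan or is not readable; skip it.
            continue
        # Arguments are NUL-separated; kernel threads have an empty cmdline.
        args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if any(needle in arg for arg in args):
            pids.append(int(entry.name))
    return sorted(pids)
```

Calling `find_pids_by_cmdline("code-server")` returns an empty list when nothing matches, which a status check could treat as "not running". A substring match can also hit unrelated processes whose arguments mention the same string, so a real check would probably compare against the full install path.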
---
### Game Server Supervision
Completed:
- crash recovery with backoff: 30s → 60s → 120s
- backoff resets if uptime ≥ 30s
- transitions to `error` state after repeated failures
- crash observability: time, exit code, signal, uptime, log tail, classification
- classifications: `oom`, `mod_or_plugin_error`, `missing_dependency`, `nonzero_exit`, `unexpected_exit`
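
The backoff rules above can be sketched as a small state holder. This is a Python illustration under stated assumptions, not the agent's implementation: the notes do not say how many failures count as "repeated", so the sketch gives up once the three-step schedule is exhausted. The class name and the `running` / `backoff` state names are hypothetical; only `error` comes from the notes.

```python
from dataclasses import dataclass
from typing import Optional

BACKOFF_SCHEDULE_S = (30, 60, 120)  # 30s -> 60s -> 120s, from the notes
STABLE_UPTIME_S = 30                # uptime at or above this resets the backoff


@dataclass
class CrashSupervisor:
    consecutive_failures: int = 0
    state: str = "running"  # hypothetical state name

    def on_crash(self, uptime_s: float) -> Optional[int]:
        """Record a crash; return the restart delay in seconds, or None on give-up."""
        if uptime_s >= STABLE_UPTIME_S:
            # The process ran long enough to count as healthy: start over.
            self.consecutive_failures = 0
        if self.consecutive_failures >= len(BACKOFF_SCHEDULE_S):
            # Assumption: "repeated failures" means the schedule is exhausted.
            self.state = "error"
            return None
        delay = BACKOFF_SCHEDULE_S[self.consecutive_failures]
        self.consecutive_failures += 1
        self.state = "backoff"  # hypothetical state name
        return delay
```

With these assumptions, three quick crashes in a row yield delays of 30, 60, and 120 seconds, and a fourth moves the supervisor to `error`. A crash after 30 or more seconds of uptime starts again at 30.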
---
### Agent Future Work (priority order)
1. Structured logging (slog) for Loki
2. Dev container `provisioningComplete` state in `/status`
3. Graceful shutdown verification (SIGTERM + wait for Minecraft)
4. Process reattachment on agent restart
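
Item 3 (SIGTERM, then wait) could look roughly like the sketch below. It is a Python illustration only: the function name, the 30-second default grace period, and the SIGKILL fallback are assumptions, since the notes specify none of them.

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_s: float = 30.0) -> bool:
    """Send SIGTERM and wait; fall back to SIGKILL if the grace period expires.

    Returns True if the process exited within the grace period.
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_s)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL as a last resort (assumed fallback)
        proc.wait()  # reap the process so it does not linger as a zombie
        return False
```

The boolean return value gives the verification step something to record: `False` means the server did not shut down cleanly within the grace period.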
---
## Dev IDE Access
### Browser IDE ✅ Working (path-based)
```
Browser → Portal → API (bootstrap) → /__ide/:id/* → container:8080
```
Working flow:
1. frontend calls `POST /api/dev/:id/ide-token`
2. API returns `/api/dev/:id/ide?token=...`
3. frontend opens that URL in new tab
4. bootstrap route validates token, sets HTTP-only IDE cookie, redirects to `/__ide/:id/`
5. all live code-server HTTP + WS traffic proxied through `/__ide/:id/*`
6. API proxies to `http://<container-ip>:8080`
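
Steps 1, 2, and 4 of the flow above can be sketched as follows. This is a Python illustration under heavy assumptions, not the API's actual code. The notes do not describe the token format, so the HMAC-signed `id.expiry.signature` token, the 60-second lifetime, the `ide_session` cookie name, and the 401 response are all hypothetical. Only the URL shapes and the "validate, set HTTP-only cookie, redirect" sequence come from the notes.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional, Tuple

SECRET = b"replace-me"  # hypothetical signing key
TOKEN_TTL_S = 60        # hypothetical token lifetime


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_ide_url(dev_id: str, now: Optional[float] = None) -> str:
    """Steps 1-2: mint a short-lived token and return the bootstrap URL."""
    issued = time.time() if now is None else now
    payload = f"{dev_id}.{int(issued + TOKEN_TTL_S)}"
    return f"/api/dev/{dev_id}/ide?token={payload}.{_sign(payload)}"


def bootstrap(dev_id: str, token: str,
              now: Optional[float] = None) -> Tuple[int, Dict[str, str]]:
    """Step 4: validate the token, set the cookie, redirect to the tunnel path."""
    current = time.time() if now is None else now
    try:
        token_dev_id, expires, signature = token.rsplit(".", 2)
        expired = int(expires) < current
    except ValueError:
        return 401, {}  # malformed token
    expected = _sign(f"{token_dev_id}.{expires}")
    valid = hmac.compare_digest(signature.encode(), expected.encode())
    if token_dev_id != dev_id or expired or not valid:
        return 401, {}
    session = _sign(f"session.{dev_id}")  # stand-in for a real session value
    return 302, {
        "Location": f"/__ide/{dev_id}/",
        "Set-Cookie": f"ide_session={session}; HttpOnly; Path=/__ide/{dev_id}/",
    }
```

Binding the token to a single dev ID means a token minted for one container cannot bootstrap a session for another.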
### Host-based IDE URL — Caddy edge (BLOCKED)
Goal: open IDE on `dev-<vmid>.zerolaghub.dev` instead of raw API IP.
```
Browser → dev-6070.zerolaghub.dev → Caddy → 127.0.0.1:4000 → API
```
State:
- API env vars set: `DEV_IDE_HOST_SUFFIX=zerolaghub.dev`, `DEV_IDE_RETURN_HOSTED_URL=true`
- API generating correct absolute URL: `http://dev-6070.zerolaghub.dev/?token=...`
- Caddyfile block correct:
```
http://dev-*.zerolaghub.dev {
	@dev host dev-*.zerolaghub.dev
	reverse_proxy @dev 127.0.0.1:4000
}
```
- `auto_https off` global option added
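
How the two env vars might drive URL generation, as a Python sketch. The function name `build_ide_url` and the fallback to the relative bootstrap path when hosted URLs are disabled are assumptions. It also assumes the VM ID is the same value used as `:id` in the API routes. The two URL shapes are the ones recorded in these notes.

```python
from typing import Mapping
from urllib.parse import quote


def build_ide_url(vmid: int, token: str, env: Mapping[str, str]) -> str:
    """Return a hosted URL when enabled, else the relative bootstrap path."""
    suffix = env.get("DEV_IDE_HOST_SUFFIX", "")
    hosted = env.get("DEV_IDE_RETURN_HOSTED_URL", "").lower() == "true"
    encoded = quote(token, safe="")  # keep the token safe inside a query string
    if hosted and suffix:
        return f"http://dev-{vmid}.{suffix}/?token={encoded}"
    return f"/api/dev/{vmid}/ide?token={encoded}"
```

Passing the environment in as a mapping (for example `os.environ`) keeps the function easy to exercise with different settings.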
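Blocking issue: the browser forces `zerolaghub.dev` subdomains to HTTPS
regardless of Caddy config. Planned step: clear the Chrome HSTS cache:
- `chrome://net-internals/#hsts`
- delete `zerolaghub.dev` and `dev-6070.zerolaghub.dev`

Caveat: the entire `.dev` TLD is on the HSTS preload list, and preloaded entries cannot be deleted from that page. Clearing the cache may therefore not help, and the `dev-*` hosts will likely need to be served over HTTPS.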
Resume here next session.
### Local Dev Access (Headscale/Tailscale — Future)
Outstanding:
- confirm `zlh-ctl` Headscale server status
- implement Tailscale addon install in agent
- API auth key generation
- portal setup instructions
Constraints: `magic_dns: false`, no exit nodes, no DNS takeover
---
## API (zpack-api)
Completed:
- dev provisioning payload
- runtime/version fields
- `enable_code_server` flag
- `GET /api/servers/:id/status` — server status endpoint
- `POST /api/dev/:id/ide-token` — IDE token generation
- `GET /api/dev/:id/ide` — bootstrap route (validates token, sets cookie, redirects)
- `/__ide/:id/*` — live tunnel proxy (HTTP + WS, target-bound)
- dev routing experiment removed (`devRouting.js`, `devDePublisher.js` deleted)
- host-based URL generation (`DEV_IDE_HOST_SUFFIX`, `DEV_IDE_RETURN_HOSTED_URL`)
Outstanding:
- dev runtime catalog endpoint for portal
- Headscale auth key generation
---
## Portal (zpack-portal)
Completed:
- dev runtime dropdown
- dotnet runtime support
- enable code-server checkbox
- dev file browser support
Outstanding:
- "Open IDE" button — calls `POST /api/dev/:id/ide-token`, opens returned URL in new tab
- Headscale setup instructions
---
## Platform
Future work:
- Tailscale dev access
- artifact version promotion
- runtime rollback support
---
## Closed Threads
- ✅ PTY console (dev + game)
- ✅ Mod lifecycle
- ✅ Upload pipeline
- ✅ Runtime artifact installs
- ✅ Dev container filesystem model
- ✅ Code-server artifact fix
- ✅ API status endpoint for frontend agent-state consumption
- ✅ Dev DNS/Traefik routing experiment — removed
- ✅ Game server crash recovery with backoff
- ✅ Crash observability (classification, log tail, exit metadata)
- ✅ Code-server lifecycle endpoints (start/stop/restart)
- ✅ Code-server process detection via /proc scan
- ✅ Dev IDE proxy — browser IDE fully working end-to-end (path-based)