# Open Threads – zlh-grind

This file tracks active but unfinished work.

Keep it short.

---

## Agent (zlh-agent)

### Dev Runtime System

Completed:

- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented
- all installs now fetch from artifact server (no local artifact assumption)

Outstanding:

- runtime install verification improvements
- catalog hash validation
- runtime removal / upgrade handling
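
Catalog hash validation is still open, so no design is recorded here. A minimal sketch of one possible shape, assuming the catalog carries a hex SHA-256 digest per artifact. The digest algorithm and both function names are assumptions, and Python is used only for illustration:

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_matches_catalog(path: Path, expected_hex: str) -> bool:
    """Compare a downloaded artifact against the digest listed in the catalog.

    Hypothetical helper: the real catalog schema is not described in this file.
    """
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

An install guard could refuse to unpack an artifact when `artifact_matches_catalog` returns `False`.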

---

### Dev Environment

Completed:

- dev user creation
- workspace root `/home/dev/workspace`
- console runs as dev user
- `HOME`, `USER`, `LOGNAME`, `TERM` env vars set correctly

Outstanding:

- PATH normalization
- shell profile consistency
- runtime PATH injection

---

### Code Server Addon

Status: ✅ Installed, running, and proxied through API

Confirmed:

- pulled from artifact server (tar.gz)
- installed to `/opt/zlh/services/code-server`
- binds to `0.0.0.0:8080`
- lifecycle endpoints: `POST /dev/codeserver/start|stop|restart`
- detection via `/proc/*/cmdline` scan
- browser IDE fully working end-to-end via API proxy
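
For reference, a `/proc/*/cmdline` scan can be sketched as below. This is an illustration in Python, not the agent's code: the function name and the substring match are assumptions, and `proc_root` is only a parameter so the scan can be pointed at a test directory.

```python
from pathlib import Path


def find_pids_by_cmdline(needle: str, proc_root: str = "/proc") -> list:
    """Return PIDs whose command line contains `needle`.

    /proc/<pid>/cmdline holds the argv entries separated by NUL bytes.
    """
    matches = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or is not readable by this user
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if any(needle in arg for arg in argv):
            matches.append(int(entry.name))
    return sorted(matches)
```

Calling `find_pids_by_cmdline("code-server")` returns an empty list when nothing matches, which maps naturally to a "stopped" state.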

---

### Game Server Supervision

Completed:

- crash recovery with backoff: 30s → 60s → 120s
- backoff resets if uptime ≥ 30s
- transitions to `error` state after repeated failures
- crash observability: time, exit code, signal, uptime, log tail, classification
- classifications: `oom`, `mod_or_plugin_error`, `missing_dependency`, `nonzero_exit`, `unexpected_exit`
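
The backoff rules above can be written down as a small state machine. A sketch in Python for illustration: the delays and the 30s reset threshold come from the list above, while the class name and the point at which the `error` state is entered (after all three delays have been used) are assumptions.

```python
from dataclasses import dataclass

BACKOFF_STEPS_S = (30, 60, 120)  # delays from the notes above
STABLE_UPTIME_S = 30             # uptime at or above this resets the backoff


@dataclass
class RestartPolicy:
    failures: int = 0  # consecutive short-lived crashes

    def on_crash(self, uptime_s: float):
        """Return the delay before the next restart, or None to enter `error`."""
        if uptime_s >= STABLE_UPTIME_S:
            self.failures = 0  # the process ran long enough: start over
        if self.failures >= len(BACKOFF_STEPS_S):
            return None  # assumption: give up once every step has been used
        delay = BACKOFF_STEPS_S[self.failures]
        self.failures += 1
        return delay
```

Four quick crashes in a row yield 30, 60, 120 and then `None`; a crash after 30s or more of uptime starts again at 30.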

---

### Agent Future Work (priority order)

1. Structured logging (slog) for Loki
2. Dev container `provisioningComplete` state in `/status`
3. Graceful shutdown verification (SIGTERM + wait for Minecraft)
4. Process reattachment on agent restart
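
For item 3, the behaviour to verify is roughly "send SIGTERM, then wait for the process to exit". A sketch in Python for illustration: the 30s grace period and the SIGKILL fallback are assumptions, not decisions recorded in this file.

```python
import signal
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_s: float = 30.0) -> int:
    """Send SIGTERM, wait up to `grace_s` seconds, then fall back to SIGKILL.

    Returns the exit code; on POSIX a negative value is the signal number
    that ended the process.
    """
    if proc.poll() is None:
        proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # assumption: escalate rather than wait forever
        return proc.wait()
```

A verification run would check that a server exits inside the grace period, so the fallback branch is never taken.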

---

## Dev IDE Access

### Browser IDE ✅ Working (path-based)

```
Browser → Portal → API (bootstrap) → /__ide/:id/* → container:8080
```

Working flow:

1. frontend calls `POST /api/dev/:id/ide-token`
2. API returns `/api/dev/:id/ide?token=...`
3. frontend opens that URL in new tab
4. bootstrap route validates token, sets HTTP-only IDE cookie, redirects to `/__ide/:id/`
5. all live code-server HTTP + WS traffic proxied through `/__ide/:id/*`
6. API proxies to `http://<container-ip>:8080`
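
Steps 4 to 6 can be sketched as two small helpers, in Python for illustration only. The cookie name `ide_session`, scoping the cookie to the tunnel path, and stripping the `/__ide/:id` prefix before forwarding are all assumptions; this file does not record how the API does any of them.

```python
import re
from http.cookies import SimpleCookie
from typing import Optional

IDE_PATH = re.compile(r"^/__ide/(?P<dev_id>[^/]+)(?P<rest>/.*)?$")
CODE_SERVER_PORT = 8080


def bootstrap_headers(dev_id: str, session_value: str) -> list:
    """Headers for step 4: set an HTTP-only cookie, redirect into the tunnel."""
    cookie = SimpleCookie()
    cookie["ide_session"] = session_value  # cookie name is hypothetical
    cookie["ide_session"]["httponly"] = True
    cookie["ide_session"]["path"] = f"/__ide/{dev_id}/"
    return [
        ("Set-Cookie", cookie["ide_session"].OutputString()),
        ("Location", f"/__ide/{dev_id}/"),
    ]


def upstream_url(path: str, container_ips: dict) -> Optional[str]:
    """Map /__ide/:id/* to http://<container-ip>:8080/* (steps 5 and 6)."""
    match = IDE_PATH.match(path)
    if match is None:
        return None
    ip = container_ips.get(match["dev_id"])
    if ip is None:
        return None
    # Assumption: the /__ide/:id prefix is stripped before forwarding.
    return f"http://{ip}:{CODE_SERVER_PORT}{match['rest'] or '/'}"
```

Returning `None` for unknown ids or non-tunnel paths leaves the caller to answer with a 404 instead of proxying anywhere.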

### Host-based IDE URL — Caddy edge (BLOCKED)

Goal: open IDE on `dev-<vmid>.zerolaghub.dev` instead of raw API IP.

```
Browser → dev-6070.zerolaghub.dev → Caddy → 127.0.0.1:4000 → API
```

State:
- API env vars set: `DEV_IDE_HOST_SUFFIX=zerolaghub.dev`, `DEV_IDE_RETURN_HOSTED_URL=true`
- API generating correct absolute URL: `http://dev-6070.zerolaghub.dev/?token=...`
- Caddyfile block correct:
```
http://dev-*.zerolaghub.dev {
    @dev host dev-*.zerolaghub.dev
    reverse_proxy @dev 127.0.0.1:4000
}
```
- `auto_https off` global option added
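
How the two env vars could drive URL generation, sketched in Python for illustration. The function name, the `"true"` string comparison, and treating the `:id` in the API path as the vmid are assumptions; only the env var names and the two URL shapes come from these notes.

```python
from urllib.parse import urlencode


def ide_url(vmid: int, token: str, env: dict) -> str:
    """Build the URL handed back by the ide-token endpoint.

    Uses the hosted form when both env vars are set, otherwise falls back
    to the path-based bootstrap route.
    """
    query = urlencode({"token": token})
    suffix = env.get("DEV_IDE_HOST_SUFFIX", "")
    hosted = env.get("DEV_IDE_RETURN_HOSTED_URL", "").lower() == "true"
    if hosted and suffix:
        return f"http://dev-{vmid}.{suffix}/?{query}"
    return f"/api/dev/{vmid}/ide?{query}"
```

With the values listed above, `ide_url(6070, token, env)` produces the `http://dev-6070.zerolaghub.dev/?token=...` form; with either variable unset it falls back to the relative path.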
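
Blocking issue: browser HSTS forces `zerolaghub.dev` subdomains to HTTPS
regardless of Caddy config. First thing to try is clearing the Chrome HSTS cache:
- open `chrome://net-internals/#hsts`
- delete `zerolaghub.dev` and `dev-6070.zerolaghub.dev`

Caveat: the whole `.dev` TLD is on the HSTS preload list, and preloaded entries
cannot be deleted from that page, so clearing the cache may not be enough. If the
HTTPS upgrade persists, the edge likely needs to serve HTTPS for `dev-*` hosts.

Resume here next session.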

### Local Dev Access (Headscale/Tailscale — Future)

Outstanding:

- confirm `zlh-ctl` Headscale server status
- implement Tailscale addon install in agent
- API auth key generation
- portal setup instructions

Constraints: `magic_dns: false`, no exit nodes, no DNS takeover

---

## API (zpack-api)

Completed:

- dev provisioning payload
- runtime/version fields
- enable_code_server flag
- `GET /api/servers/:id/status` — server status endpoint
- `POST /api/dev/:id/ide-token` — IDE token generation
- `GET /api/dev/:id/ide` — bootstrap route (validates token, sets cookie, redirects)
- `/__ide/:id/*` — live tunnel proxy (HTTP + WS, target-bound)
- dev routing experiment removed (`devRouting.js`, `devDePublisher.js` deleted)
- host-based URL generation (`DEV_IDE_HOST_SUFFIX`, `DEV_IDE_RETURN_HOSTED_URL`)

Outstanding:

- dev runtime catalog endpoint for portal
- Headscale auth key generation

---

## Portal (zpack-portal)

Completed:

- dev runtime dropdown
- dotnet runtime support
- enable code-server checkbox
- dev file browser support

Outstanding:

- "Open IDE" button — calls `POST /api/dev/:id/ide-token`, opens returned URL in new tab
- Headscale setup instructions

---

## Platform

Future work:

- Tailscale dev access
- artifact version promotion
- runtime rollback support
|
||
|
||
---
|
||
|
||
## Closed Threads
|
||
|
||
- ✅ PTY console (dev + game)
|
||
- ✅ Mod lifecycle
|
||
- ✅ Upload pipeline
|
||
- ✅ Runtime artifact installs
|
||
- ✅ Dev container filesystem model
|
||
- ✅ Code-server artifact fix
|
||
- ✅ API status endpoint for frontend agent-state consumption
|
||
- ✅ Dev DNS/Traefik routing experiment — removed
|
||
- ✅ Game server crash recovery with backoff
|
||
- ✅ Crash observability (classification, log tail, exit metadata)
|
||
- ✅ Code-server lifecycle endpoints (start/stop/restart)
|
||
- ✅ Code-server process detection via /proc scan
|
||
- ✅ Dev IDE proxy — browser IDE fully working end-to-end (path-based)
|