Update PROJECT_CONTEXT 2026-03-17 — agent enhancements, base-path blocking issue, crash recovery

jester 2026-03-17 23:06:52 +00:00
parent 8c8af5ff62
commit 525366c5df


@@ -75,6 +75,9 @@ access. Pulls runtimes + server jars from zlh-artifacts (VM 1003).
- Provenance: `.zlh_metadata.json` — source is `null` if not set
- Status transport model: poll-based (`/status`), not push-based
- State transitions: `idle`, `installing`, `starting`, `running`, `stopping`, `crashed`, `error`
- Crash recovery: backoff 30s/60s/120s, resets if uptime ≥ 30s, `error` state after repeated failures
- Crash observability: exit code, signal, uptime, log tail, classification (oom/mod_error/missing_dep/nonzero/unexpected)
- Structured logging across provisioning, installs, file ops, control plane
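The crash-recovery rule above (30s/60s/120s backoff, reset once uptime reaches 30s, `error` after repeated failures) can be sketched as a small state tracker. This is an illustration only, not the agent's code: the class name `CrashRecovery`, the method `on_crash`, and the choice to enter `error` once all three delays have been used are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

BACKOFF_STEPS_S = (30, 60, 120)  # restart delays listed in the notes
STABLE_UPTIME_S = 30             # uptime at or above this resets the backoff


@dataclass
class CrashRecovery:
    """Tracks consecutive crashes and picks the next restart delay."""
    failures: int = 0
    state: str = "running"

    def on_crash(self, uptime_s: float) -> Optional[int]:
        """Record a crash and return the restart delay in seconds,
        or None once the server should stay in the `error` state."""
        if uptime_s >= STABLE_UPTIME_S:
            # The process ran long enough to count as healthy: start over.
            self.failures = 0
        if self.failures >= len(BACKOFF_STEPS_S):
            # Assumption: "repeated failures" means all three delays are used up.
            self.state = "error"
            return None
        delay = BACKOFF_STEPS_S[self.failures]
        self.failures += 1
        self.state = "crashed"
        return delay
```

Three short-lived crashes in a row yield delays of 30, 60 and 120 seconds; a fourth moves the tracker to `error`, while any crash after at least 30 seconds of uptime starts again at 30.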
---
@@ -85,36 +88,31 @@ access. Pulls runtimes + server jars from zlh-artifacts (VM 1003).
- runtime root: `/opt/zlh/runtimes/<runtime>/<version>`
- dev identity: `dev:dev`
- workspace root: `/home/dev/workspace`
- shell env: `HOME`, `USER`, `LOGNAME`, `TERM` set correctly
- code-server install path: `/opt/zlh/services/code-server`
- code-server port: `6000`
- code-server lifecycle: `POST /dev/codeserver/start|stop|restart`
- code-server detection: `/proc/*/cmdline` scan
- agent port: `18888`
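A `/proc/*/cmdline` scan like the one listed for code-server detection could look roughly like the sketch below. The function names and the `proc_root` parameter (added so the scan can be pointed at a test directory) are hypothetical, and the agent's actual matching rule is not stated in these notes; a plain substring match on `code-server` is assumed.

```python
from pathlib import Path
from typing import List


def cmdline_matches(raw: bytes, needle: str = "code-server") -> bool:
    """True if any NUL-separated argv entry contains `needle`."""
    args = [a.decode("utf-8", "replace") for a in raw.split(b"\0") if a]
    return any(needle in arg for arg in args)


def find_pids(needle: str = "code-server", proc_root: str = "/proc") -> List[int]:
    """Scan <proc_root>/<pid>/cmdline and return the PIDs whose argv matches."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited mid-scan or is not readable
        if cmdline_matches(raw, needle):
            pids.append(int(entry.name))
    return sorted(pids)
```

An empty result from `find_pids()` would mean no matching process is running. Substring matching can give false positives (for example an editor that has a file named `code-server.log` open), so a real check may need to be stricter.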
Confirmed:
- code-server process launches and binds to `0.0.0.0:6000`
- frontend host/console state updates correctly via API status endpoint
**Pending agent change:** code-server must be relaunched with `--auth none --base-path /api/dev/<vmid>/ide`
**Current blocking issue:** code-server is launched without `--base-path /api/dev/<vmid>/ide`
in its launch args. This causes WebSocket close code 1006 (abnormal closure), filesystem provider failure, and an extension host crash.
The fix is one line in the agent launch script.
---
## Dev Container Access Model
### Browser IDE (Implemented)
### Browser IDE (API implemented, agent fix pending)
```
Browser
Portal
API proxy (/api/dev/:id/ide)
container:6000
Browser → Portal → API (/api/dev/:id/ide) → container:6000
```
Portal calls `POST /api/dev/:id/ide-token` first, then opens the returned URL in a new tab. Token is short-lived (300s), signed by API. Proxy accepts token via `Authorization: Bearer` or `?token=` query param. WebSocket upgrades validated with same token.
Containers are never publicly exposed.
Portal calls `POST /api/dev/:id/ide-token`, opens returned URL in new tab.
Token TTL: 300s. Proxy accepts `Authorization: Bearer` or `?token=`.
WebSocket upgrades validated with same token.
Containers never publicly exposed.
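The token flow described here (a 300s token signed by the API, accepted via `Authorization: Bearer` or `?token=`) can be sketched as follows. The notes do not give the token format or signing scheme, so the HMAC-SHA256 `<vmid>.<expiry>.<signature>` layout, the function names, and the `secret` argument are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import time
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit

TOKEN_TTL_S = 300  # token TTL from the notes


def _signature(payload: str, secret: bytes) -> str:
    digest = hmac.new(secret, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")


def sign_token(vmid: str, secret: bytes, now: Optional[float] = None) -> str:
    """Return '<vmid>.<expiry>.<signature>' (hypothetical format)."""
    issued = time.time() if now is None else now
    payload = f"{vmid}.{int(issued + TOKEN_TTL_S)}"
    return f"{payload}.{_signature(payload, secret)}"


def verify_token(token: str, vmid: str, secret: bytes,
                 now: Optional[float] = None) -> bool:
    """Check signature, container id, and expiry."""
    try:
        payload, sig = token.rsplit(".", 1)
        token_vmid, expiry = payload.rsplit(".", 1)
        expiry_ts = int(expiry)
    except ValueError:
        return False  # malformed token
    expected = _signature(payload, secret)
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return token_vmid == vmid and current < expiry_ts


def extract_token(url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Read the token from the Authorization header, else from ?token=."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):].strip() or None
    values = parse_qs(urlsplit(url).query).get("token")
    return values[0] if values else None
```

Under this sketch the proxy would run `extract_token` followed by `verify_token` on both ordinary requests and WebSocket upgrade requests, which matches the note that upgrades are validated with the same token.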
### Local Developer Access (Future)
@@ -124,8 +122,8 @@ Constraints: no exit nodes, `magic_dns: false`.
### Removed
DNS-per-container + Traefik dynamic routing approach was abandoned.
Code removed from API: `devRouting.js`, `devDePublisher.js`, Traefik file writes.
DNS-per-container + Traefik dynamic routing abandoned.
Removed from API: `devRouting.js`, `devDePublisher.js`, Traefik file writes.
`proxyClient.js` retained — still used by game edge publish path.
---
@@ -176,8 +174,8 @@ Revenue multiplier: 1 developer → ~10 players → $147.50/mo total.
## Open Threads
1. Agent: update code-server launch args (`--auth none`, `--base-path /api/dev/<vmid>/ide`)
2. Portal: "Open IDE" button calling `/api/dev/:id/ide-token`
1. **Agent:** fix code-server `--base-path /api/dev/<vmid>/ide` — unblocks IDE
2. **Portal:** "Open IDE" button calling `/api/dev/:id/ide-token`
3. Confirm Headscale `zlh-ctl` VM status
4. Curated provenance — tracking install origin