diff --git a/PROJECT_CONTEXT.md b/PROJECT_CONTEXT.md
index 48e95cf..b810f78 100644
--- a/PROJECT_CONTEXT.md
+++ b/PROJECT_CONTEXT.md
@@ -75,6 +75,9 @@ access. Pulls runtimes + server jars from zlh-artifacts (VM 1003).
 - Provenance: `.zlh_metadata.json` — source is `null` if not set
 - Status transport model: poll-based (`/status`), not push-based
 - State transitions: `idle`, `installing`, `starting`, `running`, `stopping`, `crashed`, `error`
+- Crash recovery: backoff 30s/60s/120s, resets if uptime ≥ 30s, `error` state after repeated failures
+- Crash observability: exit code, signal, uptime, log tail, classification (oom/mod_error/missing_dep/nonzero/unexpected)
+- Structured logging across provisioning, installs, file ops, control plane
 
 ---
 
@@ -85,36 +88,31 @@ access. Pulls runtimes + server jars from zlh-artifacts (VM 1003).
 - runtime root: `/opt/zlh/runtimes//`
 - dev identity: `dev:dev`
 - workspace root: `/home/dev/workspace`
+- shell env: `HOME`, `USER`, `LOGNAME`, `TERM` set correctly
 - code-server install path: `/opt/zlh/services/code-server`
 - code-server port: `6000`
+- code-server lifecycle: `POST /dev/codeserver/start|stop|restart`
+- code-server detection: `/proc/*/cmdline` scan
 - agent port: `18888`
 
-Confirmed:
-
-- code-server process launches and binds to `0.0.0.0:6000`
-- frontend host/console state updates correctly via API status endpoint
-
-**Pending agent change:** code-server must be relaunched with `--auth none --base-path /api/dev//ide`
+**Current blocking issue:** code-server missing `--base-path /api/dev//ide`
+in launch args. Causes WS 1006, filesystem provider failure, extension host crash.
+Fix is one line in the agent launch script.
 
 ---
 
 ## Dev Container Access Model
 
-### Browser IDE (Implemented)
+### Browser IDE (API implemented, agent fix pending)
 
 ```
-Browser
- ↓
-Portal
- ↓
-API proxy (/api/dev/:id/ide)
- ↓
-container:6000
+Browser → Portal → API (/api/dev/:id/ide) → container:6000
 ```
 
-Portal calls `POST /api/dev/:id/ide-token` first, then opens the returned URL in a new tab. Token is short-lived (300s), signed by API. Proxy accepts token via `Authorization: Bearer` or `?token=` query param. WebSocket upgrades validated with same token.
-
-Containers are never publicly exposed.
+Portal calls `POST /api/dev/:id/ide-token`, opens returned URL in new tab.
+Token TTL: 300s. Proxy accepts `Authorization: Bearer` or `?token=`.
+WebSocket upgrades validated with same token.
+Containers never publicly exposed.
 
 ### Local Developer Access (Future)
 
@@ -124,8 +122,8 @@ Constraints: no exit nodes, `magic_dns: false`.
 
 ### Removed
 
-DNS-per-container + Traefik dynamic routing approach was abandoned.
-Code removed from API: `devRouting.js`, `devDePublisher.js`, Traefik file writes.
+DNS-per-container + Traefik dynamic routing abandoned.
+Removed from API: `devRouting.js`, `devDePublisher.js`, Traefik file writes.
 `proxyClient.js` retained — still used by game edge publish path.
 
 ---
@@ -176,8 +174,8 @@ Revenue multiplier: 1 developer → ~10 players → $147.50/mo total.
 
 ## Open Threads
 
-1. Agent: update code-server launch args (`--auth none`, `--base-path /api/dev//ide`)
-2. Portal: "Open IDE" button calling `/api/dev/:id/ide-token`
+1. **Agent:** fix code-server `--base-path /api/dev//ide` — unblocks IDE
+2. **Portal:** "Open IDE" button calling `/api/dev/:id/ide-token`
 3. Confirm Headscale `zlh-ctl` VM status
 4. Curated provenance — tracking install origin
 
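
Note on the crash-recovery bullet added in the first hunk: the policy it describes (backoff 30s/60s/120s, reset when uptime is at least 30s, `error` after repeated failures) could be sketched roughly as below. This is only an illustration and not the agent's actual code. The names `CrashPolicy` and `on_crash` are hypothetical. The diff says only "repeated failures", so two details here are assumptions: that `error` is entered once the three backoff steps are exhausted, and that the reset is applied before the next delay is chosen.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Values taken from the diff.
BACKOFF_SECONDS = (30, 60, 120)
STABLE_UPTIME_SECONDS = 30


@dataclass
class CrashPolicy:
    """Hypothetical helper that tracks consecutive crashes for one server."""

    consecutive_crashes: int = 0

    def on_crash(self, uptime_seconds: float) -> Tuple[str, Optional[int]]:
        """Return (next_state, restart_delay_seconds) after a crash."""
        # A run that stayed up long enough counts as healthy, so the
        # backoff sequence starts over.
        if uptime_seconds >= STABLE_UPTIME_SECONDS:
            self.consecutive_crashes = 0

        # Assumption: once every backoff step has been used, stop
        # restarting and report the `error` state.
        if self.consecutive_crashes >= len(BACKOFF_SECONDS):
            return ("error", None)

        delay = BACKOFF_SECONDS[self.consecutive_crashes]
        self.consecutive_crashes += 1
        return ("crashed", delay)
```

Under these assumptions, three quick crashes in a row yield delays of 30, 60, and 120 seconds, a fourth moves to `error`, and any crash after at least 30 seconds of uptime starts the sequence again at 30.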