# API — Current State
This file records what is believed to be implemented now.
## Runtime / dependency baseline
- API is now tracked against Node 24 with repo-local pinning via `package.json` engines and `.nvmrc`.
- Direct `node-fetch` dependency has been removed and API code now uses built-in global `fetch`.
- Dependency / audit cleanup has been performed and the reported audit state is clean.
- Prisma config has been migrated out of deprecated `package.json#prisma` into `prisma.config.ts`.
- Prisma generate / validate checks reportedly pass on the current API baseline.
## Route and lifecycle split
- `/api/instances` and `/api/containers` are different operational surfaces and should not be treated as duplicates.
- `GET /api/instances` lists `ContainerInstance` rows and is still used for lookup/discovery by some Portal-side code paths.
- `POST /api/instances` is the active agent-driven provisioning entrypoint.
- `DELETE /api/containers/:vmid` is the cleanup/delete/orphan-remediation route used for failed creates, agent failures, manual deletions, and other orphaned container cases.
- Current delete cleanup has been updated to derive archive metadata from current instance fields (`payload`, `engineType`, `allocatedPorts`) instead of assuming the pre-v2 `game` / `variant` / `ports` fields are still present on `ContainerInstance`.
## Readiness / agent state model
- API is the heartbeat authority and obtains agent state by polling agents.
- Agent does not push state to API.
- API consumes:
  - `/health` for liveness
  - `/ready` for semantic readiness
  - `/status` for detailed state snapshot
- Portal should rely on API-normalized state, not direct agent state.
- Proxy lifecycle is now tracked separately from agent readiness under `ContainerInstance.payload.proxy`.
- `ContainerInstance.payload.edge` and `ContainerInstance.payload.proxy` are now merged through an atomic JSON merge helper so edge publish state and Velocity callback state do not clobber each other during near-simultaneous writes.
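
The clobbering risk described above can be illustrated with a small in-memory sketch. This is not the API's helper: `mergePayloadSection` is a hypothetical name, and real atomicity would have to come from the storage layer, which this sketch does not show. It only contrasts a whole-payload overwrite with a per-section merge.

```javascript
// Hypothetical illustration only: merges a patch into ONE section of payload
// (e.g. "edge" or "proxy") and leaves every other section untouched.
function mergePayloadSection(payload, section, patch) {
  const current = payload ?? {};
  return {
    ...current,
    [section]: { ...(current[section] ?? {}), ...patch },
  };
}

// Two writers that both started from the same stale snapshot.
const snapshot = { edge: { published: false }, proxy: {} };

// Whole-payload overwrite: each writer persists its own copy of everything.
const edgeWrite = { ...snapshot, edge: { published: true } };
const proxyWrite = { ...snapshot, proxy: { status: "registered_with_proxy" } };
// If proxyWrite lands last, the edge update from edgeWrite is lost.
const clobbered = proxyWrite;

// Section merge applied against the latest stored value keeps both updates.
let stored = snapshot;
stored = mergePayloadSection(stored, "edge", { published: true });
stored = mergePayloadSection(stored, "proxy", { status: "registered_with_proxy" });
```

In the overwrite case the final payload still reports `edge.published` as `false`; in the merge case both sections carry their latest values.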
## Readiness cleanup already done
- `agentClient.js` centralizes non-streaming agent transport.
- `getAgentReady()` remains low-level transport.
- `isAgentReadyResult()` is the shared semantic readiness helper.
- `assertAgentReady()` uses semantic readiness.
- Poller only caches `ready: true` when `/ready` returns semantic success.
- Provisioning requires semantic readiness before success/persist/publish.
- Timeout handling in `agentClient.js` has been modernized to `AbortSignal.timeout(...)`.
- A final alignment pass is still needed in merged live status so any Portal-facing `agentReady` field fully matches semantic `/ready` rather than looser cache presence.
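
The split between low-level transport and semantic readiness might look roughly like the sketch below. The function names come from the list above, but the bodies are guesses: the `{ ok, status, body }` result shape, the `ready` field in the agent response, and the default timeout are all assumptions, not the contents of `agentClient.js`.

```javascript
// Semantic readiness: transport success alone is not enough.
// ASSUMPTION: the agent's /ready body carries a boolean `ready` field.
function isAgentReadyResult(result) {
  return Boolean(result && result.ok && result.body && result.body.ready === true);
}

// Low-level transport: never throws, always returns a result object.
// ASSUMPTION: the result shape { ok, status, body } is hypothetical.
async function getAgentReady(baseUrl, timeoutMs = 2000) {
  try {
    const res = await fetch(new URL("/ready", baseUrl), {
      // Aborts the request if the agent does not answer within timeoutMs.
      signal: AbortSignal.timeout(timeoutMs),
    });
    const body = await res.json().catch(() => null);
    return { ok: res.ok, status: res.status, body };
  } catch (err) {
    return { ok: false, status: 0, body: null, error: err.name };
  }
}
```

Under these assumptions, a poller would cache `ready: true` only when `isAgentReadyResult(await getAgentReady(url))` is true, so a reachable agent that answers `ready: false` is never treated as ready.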
## Provisioning / post-provision flow
- `src/api/provisionAgent.js` is the active provisioning path.
- The older worker-based provisioning chain has been archived for reference and is no longer treated as a live API path.
- Post-provision edge publishing is still active.
- Post-provision behavior has been adjusted away from legacy port-allocation commits and should be understood as request-driven / payload-driven.
## Velocity / Minecraft edge lifecycle
- Minecraft edge publish uses Velocity instead of Traefik TCP config.
- API registers Minecraft backends with the Velocity bridge using `POST /zpack/register` and unregisters with `POST /zpack/unregister`.
- Velocity registration verification now uses the bridge's real `GET /zpack/status` route instead of the nonexistent `/zpack/list` route.
- `/zpack/status` is expected to expose a `servers` array containing `{ name, address, port }` entries.
- If `/zpack/status` exists but does not expose registered backends, API treats the register response as acknowledged but unverifiable instead of falsely failing the registration.
- API exposes `POST /internal/velocity/proxy-status` for bridge lifecycle callbacks using the same hashed `X-Zpack-Secret` shared-secret auth style.
- Accepted proxy lifecycle statuses are `registered_with_proxy`, `proxy_ping_ok`, and `proxy_ping_failed`.
- Proxy lifecycle callbacks are stored under `ContainerInstance.payload.proxy` with source, status, server name, address, port, duplicate flag, timestamps, latency, detail, and last event time.
- `GET /api/servers/:id/status` now includes the stored `proxy` object beside agent-derived live status.
- `GET /api/servers/:id/status` derives Minecraft connection state from agent readiness, persisted edge state, Velocity registration state, and backend ping state.
- When Velocity has registered a backend but has not posted a separate `proxy_ping_ok`, API can treat an agent-confirmed Minecraft ping (`readySource: "minecraft_ping"`) as a backend ping fallback unless Velocity explicitly reported `proxy_ping_failed`.
- The bridge should set `ZPACK_PROXY_STATUS_ENDPOINT` to the API internal callback URL, for example `http://<api-host>:4000/internal/velocity/proxy-status`.
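
Two of the rules above, registration verification against `/zpack/status` and the backend ping fallback, can be sketched as pure functions. Both helper names and the returned labels are hypothetical, and the sketch assumes that "does not expose registered backends" means the response has no `servers` array.

```javascript
// Classifies a /zpack/status body after a register call.
// ASSUMPTION: a missing `servers` array means the bridge cannot be verified.
function verifyRegistration(statusBody, expected) {
  if (!statusBody || !Array.isArray(statusBody.servers)) {
    return "acknowledged_unverifiable";
  }
  const found = statusBody.servers.some(
    (s) =>
      s.name === expected.name &&
      s.address === expected.address &&
      Number(s.port) === Number(expected.port),
  );
  return found ? "verified" : "missing";
}

// Backend ping fallback: an agent-confirmed Minecraft ping counts for a
// registered backend unless Velocity explicitly reported a failed ping.
function hasBackendPing(proxy, agent) {
  if (proxy?.status === "proxy_ping_ok") return true;
  if (proxy?.status === "proxy_ping_failed") return false;
  return (
    proxy?.status === "registered_with_proxy" &&
    agent?.readySource === "minecraft_ping"
  );
}
```

For example, `hasBackendPing({ status: "registered_with_proxy" }, { readySource: "minecraft_ping" })` is `true`, while the same agent input with `status: "proxy_ping_failed"` is `false`.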
## Host / LXC lifecycle state
- `/api/servers/:id/host/status` exposes the underlying Proxmox LXC state for both game and dev containers.
- Host status now distinguishes container power state from agent/game runtime state.
- Host status response includes `hostStatus`, `powerState`, `running`, active `operation`, and selected Proxmox stats.
- `hostStatus` can report `running`, `stopped`, `starting`, `stopping`, or `restarting`.
- Host lifecycle routes now return `202 Accepted` with an operation object and `statusUrl` instead of blocking until Proxmox finishes:
  - `POST /api/servers/:id/host/start`
  - `POST /api/servers/:id/host/stop`
  - `POST /api/servers/:id/host/restart`
- Overlapping host lifecycle operations return `409 host_operation_in_progress`.
- Host lifecycle operation state is cached under Redis `hostop:<vmid>` with a short TTL and is also surfaced in `GET /api/servers` as `hostOperation`.
- `GET /api/servers` includes Proxmox-backed `hostStatus` and `powerState` so Portal list views can show LXC-level start/stop/restart state for game and dev containers.
- Host lifecycle routes now perform ownership checks before touching Proxmox.
- Proxmox client can resolve the actual node for an LXC via `/cluster/resources` instead of assuming every VMID lives on the configured default `PROXMOX_NODE`.
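
The `202 Accepted` / `409` behavior described above can be sketched with an in-memory `Map` standing in for the Redis `hostop:<vmid>` key. Everything beyond the key name, the `409` error code, and the `statusUrl` field is an assumption: the helper name, the operation fields, and the TTL value are hypothetical.

```javascript
// In-memory stand-in for Redis `hostop:<vmid>` entries with a short TTL.
// ASSUMPTION: operation fields and the default TTL are illustrative only.
function beginHostOperation(store, { serverId, vmid, action }, now = Date.now(), ttlMs = 60_000) {
  const key = `hostop:${vmid}`;
  const existing = store.get(key);

  // An unexpired entry means another lifecycle operation is still in flight.
  if (existing && existing.expiresAt > now) {
    return {
      status: 409,
      body: { error: "host_operation_in_progress", operation: existing.operation },
    };
  }

  const operation = { vmid, action, state: "pending", startedAt: now };
  store.set(key, { operation, expiresAt: now + ttlMs });

  // Respond immediately; the caller polls statusUrl instead of waiting.
  return {
    status: 202,
    body: { operation, statusUrl: `/api/servers/${serverId}/host/status` },
  };
}
```

A second call for the same VMID before the TTL elapses yields `409`, and a call after expiry is accepted again. A real Redis-backed version would also need an atomic set-if-absent so two simultaneous requests cannot both pass the check.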
## Backup support
- API forwards game backup operations.
- Current API route shape:
  - `GET /api/game/servers/:id/backups`
  - `POST /api/game/servers/:id/backups`
  - `POST /api/game/servers/:id/backups/restore?id=<backup_id>`
  - `DELETE /api/game/servers/:id/backups/:backupId`
- Restore start is async at the API layer and Portal is expected to poll status rather than hold the restore POST open.
- API forwards agent HTTP status codes for backup responses.
- Successful backup responses currently pass through the agent body.
- Non-OK backup responses currently use the shared agent response envelope: `{ error: <fallback>, details: <agent_body> }`.
- Backup response shape normalization remains open.
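
The current pass-through behavior for backup responses can be summarized in a short sketch. `toBackupResponse` and the default fallback message are hypothetical; only the forwarded status code and the `{ error, details }` envelope come from the notes above.

```javascript
// Maps an agent backup response onto what the API returns to the caller.
// ASSUMPTION: "OK" means any 2xx status from the agent.
function toBackupResponse(agentStatus, agentBody, fallback = "Backup request failed") {
  const ok = agentStatus >= 200 && agentStatus < 300;
  return {
    status: agentStatus, // agent status code is forwarded unchanged
    body: ok ? agentBody : { error: fallback, details: agentBody },
  };
}
```

Because success bodies pass through untouched while failures are wrapped, callers currently see two different shapes, which is the normalization gap noted above.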
## File proxy / route compatibility
- Duplicated game file proxy logic has been extracted into `src/routes/helpers/gameFileProxy.js` in the API repo.
- Route compatibility is intentionally preserved between:
  - `/api/game/servers/:id/files...`
  - `/api/servers/:id/files...`
- Streamed upload / download / file edit forwarding still exists outside the generic non-streaming agent helper path.
- Compatibility between canonical game routes and legacy/compatibility server routes should be treated as part of the API contract.
## Agent contract alignment already done
- `/start`, `/stop`, and `/restart` are forwarded as POST.
- `/console/command` is forwarded as POST JSON.
- `/ready` is part of poller/readiness logic.
- There is no API-native `/api/agent/:serverId/:action` route in `zpack-api`.
- The reviewed Portal repo contained a Portal-side bridge with that shape which calls `/api/instances` for IP lookup and then contacts the agent directly; static review did not find an obvious current frontend caller for that bridge.
## Billing / auth lifecycle
- API issues access tokens and refresh tokens.
- Password reset tokens are stored hashed and exchanged through API routes.
- Password reset request now delivers email through the configured support mailbox SMTP path first, with optional Resend fallback and console-link fallback for local development.
- Password reset request routes are `POST /api/auth/password-reset/request` and alias `POST /api/auth/forgot-password`.
- Password reset confirm routes are `POST /api/auth/password-reset/confirm` and alias `POST /api/auth/reset-password`.
- Logged-in password change is available at `POST /api/auth/change-password` with bearer auth and body `{ currentPassword, newPassword }`.
- Logged-in password change verifies the current password, enforces the same 8-character minimum, updates the password hash, and marks outstanding password reset tokens used.
- Reset links use `RESET_PASSWORD_URL_BASE`, then `PORTAL_URL`, then `http://localhost:3000`, and point at `/reset-password?token=...`.
- Reset request responses remain generic to avoid account enumeration.
- Reset confirmation rejects passwords shorter than 8 characters and marks all outstanding reset tokens for that user used after a successful password change.
- Default reset sender is `ZeroLag Hub Support <support@zerolaghub.com>` and production SMTP is configured through `SMTP_HOST`, `SMTP_PORT`, `SMTP_SECURE`, `SMTP_USER`, and `SMTP_PASS`.
- Stripe billing routes cover checkout, upgrade, downgrade, portal, and current billing state.
- Stripe webhooks are mounted with raw body parsing before normal JSON middleware.
- Billing scheduler starts in-process and performs limited reminder/reconciliation work.
- Admin users are billing-exempt in billing flows.
- JWT verification has reportedly been tightened to fixed algorithm plus issuer/audience separation for access, refresh, and IDE proxy tokens.
- Pre-hardening tokens may no longer verify and a re-login may be required after this change.
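
The reset-link fallback order and the minimum password length described above can be sketched as follows. The helper names are hypothetical, and the sketch assumes the configured base is an origin without a path of its own.

```javascript
// Resolves the reset link base in the documented order:
// RESET_PASSWORD_URL_BASE, then PORTAL_URL, then http://localhost:3000.
// ASSUMPTION: the base has no path component that needs preserving.
function buildResetLink(env, token) {
  const base = env.RESET_PASSWORD_URL_BASE || env.PORTAL_URL || "http://localhost:3000";
  const url = new URL("/reset-password", base);
  url.searchParams.set("token", token); // token is URL-encoded automatically
  return url.toString();
}

// Same 8-character minimum for reset confirmation and logged-in change.
function isAcceptablePassword(password) {
  return typeof password === "string" && password.length >= 8;
}
```

In an API process this would be called as `buildResetLink(process.env, rawToken)`, where `rawToken` is the unhashed token sent to the user.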
## Hosted IDE proxy
- `POST /api/dev/:id/ide-token` issues short-lived IDE proxy tokens.
- IDE proxy supports both tunnel paths under `/__ide/:id` and hosted `dev-<vmid>.<suffix>` hosts.
- Hosted IDE tokens can be delivered by query parameter and then persisted as the IDE proxy cookie.
- Hosted URL return is controlled by `DEV_IDE_RETURN_HOSTED_URL`.
- API exposes code-server controls for owned dev containers:
  - `POST /api/dev/:id/codeserver/start` -> Agent `POST /dev/codeserver/start`
  - `POST /api/dev/:id/codeserver/stop` -> Agent `POST /dev/codeserver/stop`
  - `POST /api/dev/:id/codeserver/restart` -> Agent `POST /dev/codeserver/restart`
- IDE proxy cookie hardening is expected to include `httpOnly`, `sameSite: "lax"`, and secure-cookie behavior tied to public HTTPS or explicit secure-cookie config.
- Sensitive proxy logging has reportedly been reduced so cookies and forwarded header detail are not exposed in normal logs.
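
The cookie hardening expectation above could be expressed as a small options builder. This is a sketch only: `ideCookieOptions`, `publicUrl`, and `forceSecure` are hypothetical stand-ins for whatever configuration the API actually reads.

```javascript
// Builds cookie options for the IDE proxy cookie.
// ASSUMPTION: `publicUrl` and `forceSecure` are illustrative inputs, not
// real configuration names from the API.
function ideCookieOptions({ publicUrl = "", forceSecure = false } = {}) {
  return {
    httpOnly: true, // not readable from page scripts
    sameSite: "lax",
    // Secure when the public origin is HTTPS or when explicitly forced.
    secure: forceSecure || publicUrl.startsWith("https://"),
  };
}
```

With a plain HTTP development origin and no override, `secure` stays `false`, so the cookie still works locally.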
## Console / outbound socket stability
- Console WebSocket proxy attachment is guarded so the console upgrade handler is only attached once per HTTP server.
- Console proxy raw socket error logging is guarded to avoid stacking duplicate socket listeners.
- API raises listener limits on inbound HTTP sockets and console WebSocket sockets to avoid false-positive listener warnings under proxy/websocket fan-out.
- Axios-backed outbound clients now use shared HTTP/HTTPS agent helpers that raise outbound socket listener limits at socket creation time.
- The outbound socket agent helper is used by Proxmox, OPNsense, Cloudflare, and Prometheus metrics query paths.
- A temporary `MaxListenersExceededWarning` tracer exists in `src/app.js` to log emitter/event/count/stack if listener warnings recur.
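
One common way to guarantee that an upgrade handler is attached only once per HTTP server is a `WeakSet` guard, sketched below. The function name is hypothetical and this is not necessarily how the API implements its guard.

```javascript
// Tracks servers that already have the console upgrade handler attached.
// A WeakSet does not keep closed servers alive in memory.
const attachedServers = new WeakSet();

// Attaches onUpgrade to the server's "upgrade" event at most once.
// Returns true if the handler was attached, false if it was already present.
function attachConsoleUpgradeOnce(server, onUpgrade) {
  if (attachedServers.has(server)) return false;
  server.on("upgrade", onUpgrade);
  attachedServers.add(server);
  return true;
}
```

Calling it repeatedly with the same server (for example from more than one setup path) leaves exactly one `upgrade` listener, which avoids duplicate handling and stacked listener warnings.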
## Legacy / archived behavior
- Legacy port allocation / slot reservation is no longer part of the live route mounts and has been archived for reference.
- Legacy worker provisioning, detached reconcile helpers, and explicit `.old` files have been moved under archive for reference rather than kept in the live tree.
- WebSocket console proxy wiring remains outside `agentClient.js`.
- Raw streaming upload proxy behavior remains outside `agentClient.js`.
## Node 24 cleanup already reflected in API repo
- `RegExp.escape(...)` is used where host / suffix regex escaping was previously manual.
- Selected built-in imports have been normalized to `node:` style.
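
As an illustration of the `RegExp.escape(...)` cleanup, the sketch below builds a matcher for hosted IDE hosts of the form `dev-<vmid>.<suffix>`. The helper name is hypothetical, and the manual fallback is an assumption added only so the sketch also runs on runtimes that lack the built-in.

```javascript
// Uses the built-in RegExp.escape where available (as on Node 24).
// ASSUMPTION: the manual fallback exists only for older runtimes.
const escapeRegExp =
  typeof RegExp.escape === "function"
    ? RegExp.escape
    : (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Matches hosts like dev-101.<suffix> and captures the VMID.
function hostedIdeHostPattern(suffix) {
  return new RegExp(`^dev-(\\d+)\\.${escapeRegExp(suffix)}$`, "i");
}
```

Escaping the suffix matters because an unescaped `.` would match any character, so a lookalike host could pass the check.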