API — Decisions

Settled

  • API is the heartbeat authority: it establishes agent liveness by polling agents.
  • Agent does not push heartbeat/state into API.
  • Semantic readiness uses /ready, not plain HTTP 200.
  • Portal should consume API-normalized state, not call agents directly for normal state/actions.
  • /api/instances and /api/containers are distinct contracts: instances is the active list/create surface, while containers is the cleanup/delete/orphan-remediation surface.
  • There is no API-native /api/agent/:serverId/:action route in zpack-api; any route with that shape is Portal-owned compatibility behavior rather than an API feature.
  • Streaming upload proxy behavior should remain separate from the generic non-streaming agentClient.js transport.
  • WebSocket console proxy behavior should remain separate from the generic non-streaming agentClient.js transport.
  • API is now tracked on a Node 24 baseline with repo-local version pinning.
  • Built-in global fetch is the intended fetch implementation; a direct node-fetch dependency is no longer the preferred pattern.
  • Duplicated game file proxy behavior should be folded into shared helper paths while preserving compatibility for both canonical and compatibility routes.
  • Prisma configuration should live in a dedicated Prisma config file, not in the deprecated package.json#prisma key.
  • JWT verification hardening is allowed to be contract-sensitive; access, refresh, and IDE proxy tokens may use distinct audience expectations.
  • Hosted IDE proxy cookies should default to hardened behavior appropriate for public HTTPS deployments.
  • Proxy logging should avoid exposing cookies or detailed forwarded-header values in routine logs.
  • Legacy worker-based provisioning is no longer a live API path and should stay out of the active tree unless intentionally revived.
  • Legacy port allocation / slot reservation is no longer part of the active provisioning model and should stay retired unless intentionally revived end to end.
  • Minecraft edge routing uses Velocity; API should call the bridge's real HTTP routes: POST /zpack/register, POST /zpack/unregister, and GET /zpack/status.
  • API must not depend on a nonexistent Velocity /zpack/list route for registration verification.
  • registered_with_proxy means Velocity accepted the backend into its routing table; it does not mean the Minecraft backend is confirmed playable.
  • proxy_ping_ok / proxy_ping_failed are the stronger proxy-side readiness signals because they come from Velocity pinging the registered backend.
  • Proxy lifecycle state belongs under ContainerInstance.payload.proxy, separate from agent readiness and agentState.
  • The Velocity bridge should report proxy lifecycle events to POST /internal/velocity/proxy-status using the same hashed X-Zpack-Secret auth style.
  • Edge publish state and Velocity proxy callback state should be merged into ContainerInstance.payload atomically, not through a read/modify/write that replaces the full JSON payload.
  • Underlying Proxmox/LXC lifecycle state is a separate API contract from game runtime readiness and agent readiness.
  • Shared host lifecycle routes under /api/servers/:id/host/* should serve both game and dev containers.
  • Host lifecycle actions should return 202 Accepted plus an operation/status URL instead of holding the request open until Proxmox completes.
  • API must check ownership before any host/LXC lifecycle action.
  • API should resolve the actual Proxmox node for a VMID when possible instead of assuming the configured default node is always correct.
  • Listener-limit fixes should target the socket creation/attachment point. For outbound Axios/follow-redirects traffic, use configured HTTP/HTTPS agents rather than relying on inbound HTTP socket handling.
  • API control-plane routes must have explicit route-level authorization even when network access is already gated behind OPNsense/internal routing.
  • requireAdmin is the shared policy for admin-only API routes such as audit logs and global instance inventory.
  • requireInternalToken is the shared policy for internal-only control-plane routes such as raw edge publishing, Proxmox access, raw container teardown, and service discovery.
  • Internal-token-protected routes should fail closed when token configuration is missing, except for explicit NODE_ENV=development or NODE_ENV=test local flows.
  • User-owned delete should use DELETE /api/servers/:id with a normal user bearer token and ownership check.
  • Raw DELETE /api/containers/:vmid is primarily an internal/admin/orphan-remediation surface. It temporarily accepts owned-user bearer deletes for Portal compatibility, but Portal should migrate to /api/servers/:id.
  • Orphan cleanup belongs to internal/admin workflows, not normal Portal user paths. Portal deletes only active servers the DB says the user owns.
  • Container teardown should live in a reusable service so Portal-owned delete and internal raw teardown share archival/Proxmox/DNS/Velocity/dev-IDE cleanup behavior.
  • Completed teardown should archive DeletedInstance before removing the active ContainerInstance row so cleanup metadata remains available.
  • Non-runtime clutter such as checked-in keys/tokens, local artifacts, .old scripts, src/tmp, and retired legacy trees should stay out of the active repo.
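The heartbeat and readiness decisions (API-owned polling, no agent push, /ready instead of a plain 200) can be sketched as a classifier the API might run on each poll result. This is a minimal sketch, not the zpack-api implementation: it assumes the agent's /ready endpoint returns a JSON body with a boolean `ready` field, and the names `PollResult`, `AgentHealth`, and `classifyAgentPoll` are hypothetical.

```typescript
// Hypothetical shape of one API-side poll of an agent's /ready endpoint.
interface PollResult {
  httpStatus: number | null; // null when the request itself failed (timeout, refused)
  body: unknown;             // parsed JSON body, if any
}

type AgentHealth = "unreachable" | "alive_not_ready" | "ready";

// The API owns heartbeat: any HTTP answer proves the agent is reachable, but only
// an explicit ready=true in the /ready body (assumed shape) counts as ready.
function classifyAgentPoll(result: PollResult): AgentHealth {
  if (result.httpStatus === null) return "unreachable";
  const body = result.body as { ready?: unknown } | null;
  if (
    result.httpStatus === 200 &&
    body !== null &&
    typeof body === "object" &&
    body.ready === true
  ) {
    return "ready";
  }
  // A bare 200 without ready=true, or any non-200 answer, is alive but not ready.
  return "alive_not_ready";
}
```

The three-way result is the point: a plain HTTP 200 only moves an agent to alive_not_ready, and nothing short of an explicit ready signal reaches ready.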
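For the contract-sensitive JWT hardening, distinct audience expectations can be expressed as a per-token-kind lookup applied after signature verification. A minimal sketch, assuming the claims are already verified and decoded; the audience strings, the `TokenKind` names, and the choice to reject a missing `aud` are hypothetical. Per RFC 7519, `aud` may be a single string or an array of strings, so both forms are handled.

```typescript
type TokenKind = "access" | "refresh" | "ide_proxy";

// Hypothetical audience values; real values would come from deployment config.
const EXPECTED_AUDIENCE: Record<TokenKind, string> = {
  access: "zpack-api",
  refresh: "zpack-api-refresh",
  ide_proxy: "zpack-ide-proxy",
};

// Checks the decoded "aud" claim against the audience expected for this token kind.
function audienceMatches(kind: TokenKind, claims: { aud?: string | string[] }): boolean {
  const expected = EXPECTED_AUDIENCE[kind];
  // Assumption: a token without an aud claim is rejected outright.
  if (claims.aud === undefined) return false;
  return Array.isArray(claims.aud) ? claims.aud.includes(expected) : claims.aud === expected;
}
```

Common Node JWT libraries (for example jsonwebtoken's `verify` and jose's `jwtVerify`) accept an `audience` option, so in practice a table like this would more likely feed that option than replace it.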
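For the proxy logging rule, a small redaction pass over request headers before they reach routine logs could look like the following. It is a sketch under assumptions: the sensitive-name list (including `authorization` and `x-real-ip`, which the decision does not mention) and the `redactHeadersForLog` name are hypothetical.

```typescript
// Hypothetical list of headers whose values must not appear in routine proxy logs.
const SENSITIVE_HEADERS = new Set<string>([
  "cookie",
  "set-cookie",
  "authorization", // assumption: not named in the decision, but equally sensitive
  "forwarded",
  "x-real-ip",
]);

function redactHeadersForLog(
  headers: Record<string, string | string[] | undefined>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (value === undefined) continue; // drop absent headers entirely
    const lower = name.toLowerCase(); // header names are case-insensitive
    const sensitive = SENSITIVE_HEADERS.has(lower) || lower.startsWith("x-forwarded-");
    if (sensitive) {
      out[name] = "[redacted]";
    } else {
      // Multi-value headers are flattened for a single log line.
      out[name] = Array.isArray(value) ? value.join(", ") : value;
    }
  }
  return out;
}
```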
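The Velocity readiness semantics (registered_with_proxy is weaker than proxy_ping_ok / proxy_ping_failed, and all of it lives under ContainerInstance.payload.proxy) can be sketched as a payload type plus a derived readiness value. This assumes, hypothetically, that the API keeps only the latest bridge event in a `payload.proxy.lastEvent` field; the `ProxyReadiness` labels and the `proxyReadiness` name are illustrative.

```typescript
// Event names reported by the Velocity bridge, as listed in the decisions above.
type ProxyEvent = "registered_with_proxy" | "proxy_ping_ok" | "proxy_ping_failed";

// Hypothetical shape of ContainerInstance.payload: proxy state sits under
// payload.proxy, separate from agentState.
interface InstancePayload {
  agentState?: string;
  proxy?: { lastEvent: ProxyEvent; at: string };
  [key: string]: unknown;
}

type ProxyReadiness =
  | "not_registered"
  | "routed_unverified"
  | "reachable_via_proxy"
  | "unreachable_via_proxy";

function proxyReadiness(payload: InstancePayload): ProxyReadiness {
  // Only payload.proxy is consulted; agentState is deliberately ignored.
  const event = payload.proxy?.lastEvent;
  if (event === "proxy_ping_ok") return "reachable_via_proxy";
  if (event === "proxy_ping_failed") return "unreachable_via_proxy";
  // Velocity accepted the backend into its routing table, but nothing has
  // confirmed that the Minecraft backend is playable yet.
  if (event === "registered_with_proxy") return "routed_unverified";
  return "not_registered";
}
```

Because the function reads only payload.proxy, an agentState of running never upgrades proxy readiness, which keeps the two signals separate.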
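The host lifecycle decisions (ownership check first, then 202 Accepted with an operation/status URL rather than waiting on Proxmox) can be sketched in a framework-agnostic form. Everything named here is hypothetical: the `HostAction` set, the injected `findOwnerId` and `enqueue` dependencies, and the `/api/servers/:id/host/operations/:opId` status path are placeholders, not routes confirmed by this document.

```typescript
type HostAction = "start" | "stop" | "reboot"; // hypothetical action set

interface HostActionResponse {
  status: 202 | 403 | 404;
  headers: Record<string, string>;
  body: Record<string, string>;
}

// Hypothetical injected dependencies so the sketch stays framework-agnostic.
interface HostDeps {
  findOwnerId(serverId: string): string | null; // null: no such server
  enqueue(serverId: string, action: HostAction): string; // returns an operation id
}

function requestHostAction(
  deps: HostDeps,
  userId: string,
  serverId: string,
  action: HostAction,
): HostActionResponse {
  const ownerId = deps.findOwnerId(serverId);
  if (ownerId === null) return { status: 404, headers: {}, body: { error: "not_found" } };
  // Ownership is checked before anything is queued for Proxmox.
  if (ownerId !== userId) return { status: 403, headers: {}, body: { error: "forbidden" } };
  // Queue the work and answer immediately instead of holding the request open.
  const opId = deps.enqueue(serverId, action);
  const statusUrl =
    `/api/servers/${encodeURIComponent(serverId)}/host/operations/${encodeURIComponent(opId)}`;
  return { status: 202, headers: { Location: statusUrl }, body: { operationId: opId, statusUrl } };
}
```

Returning 404 instead of 403 for servers the caller does not own is a common alternative that avoids confirming a server exists; the sketch keeps them distinct only for clarity.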
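Resolving the actual Proxmox node for a VMID can lean on the Proxmox VE cluster resource listing (GET /cluster/resources, whose guest entries report `vmid`, `type`, and `node`). A minimal sketch, assuming that listing has already been fetched; the fallback to the configured default and the `resolveNodeForVmid` name are assumptions.

```typescript
// Subset of an entry from the Proxmox VE API GET /cluster/resources.
interface ClusterResource {
  type: string; // "lxc" or "qemu" for guests; other values for nodes, storage, etc.
  vmid?: number;
  node?: string;
}

// Prefer the node Proxmox reports for this guest; fall back to the configured
// default only when the listing does not contain the VMID.
function resolveNodeForVmid(
  resources: ClusterResource[],
  vmid: number,
  defaultNode: string,
): string {
  const match = resources.find(
    (r) => r.vmid === vmid && (r.type === "lxc" || r.type === "qemu"),
  );
  return match?.node ?? defaultNode;
}
```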
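The fail-closed rule for requireInternalToken can be sketched as a pure decision function. This is one reading of the decision, not its implementation: it assumes that in explicit development or test environments a missing token configuration lets the local flow through, and that a configured token is always enforced. The result names and `checkInternalToken` are hypothetical. In Node, `crypto.timingSafeEqual` over equal-length Buffers is the usual constant-time comparison; the loop below only keeps the sketch dependency-free.

```typescript
type InternalAuthResult = "allow" | "deny_unauthorized" | "deny_misconfigured";

// Compares every character instead of stopping at the first mismatch.
// A length mismatch still returns early, which reveals only the length.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

function checkInternalToken(
  nodeEnv: string | undefined,
  configuredToken: string | undefined,
  presentedToken: string | undefined,
): InternalAuthResult {
  if (!configuredToken) {
    // Missing configuration fails closed unless NODE_ENV explicitly opts into a local flow.
    return nodeEnv === "development" || nodeEnv === "test" ? "allow" : "deny_misconfigured";
  }
  // Once a token is configured it is enforced in every environment.
  if (!presentedToken || !constantTimeEqual(presentedToken, configuredToken)) {
    return "deny_unauthorized";
  }
  return "allow";
}
```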
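The shared teardown service and its archive-before-delete ordering can be sketched with injected steps, so Portal-owned delete and internal raw teardown call the same function. The `TeardownSteps` interface and step names are hypothetical, and grouping Proxmox/DNS/Velocity/dev-IDE cleanup into a single step is a simplification.

```typescript
// Hypothetical steps of a reusable teardown service.
interface TeardownSteps {
  cleanupExternal(vmid: number): Promise<void>; // Proxmox, DNS, Velocity, dev-IDE cleanup
  archiveDeletedInstance(vmid: number): Promise<void>;
  removeActiveInstance(vmid: number): Promise<void>;
}

async function teardownContainer(steps: TeardownSteps, vmid: number): Promise<void> {
  await steps.cleanupExternal(vmid);
  // Archive before removing the active row so cleanup metadata survives;
  // if archiving throws, the active row is never removed.
  await steps.archiveDeletedInstance(vmid);
  await steps.removeActiveInstance(vmid);
}
```

If archiving fails, the ContainerInstance row stays in place, so a retry still has the metadata it needs.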

Tracking rule

  • when API work completes, remove it from OPEN_ITEMS.md
  • if it changes the long-lived architecture, update CURRENT_STATE.md or this file