# API — Current State

This file records what is believed to be implemented now in `zpack-api`.

## Runtime / dependency baseline

- API is tracked against Node 24, pinned repo-locally via `package.json` `engines` and `.nvmrc`.
- The API repo is JavaScript ESM; do not introduce TypeScript into API launch work unless a separate migration is explicitly approved.
- The direct `node-fetch` dependency has been removed; API code uses the built-in global `fetch`.
- Prisma configuration lives in a dedicated Prisma config file, not the deprecated `package.json#prisma` block.
- Prisma generate / validate checks have passed on the current launch baseline.

## Service / process model

Launch service/process set:

```text
zpack-api.service               # HTTP/API
zpack-provision-worker.service  # BullMQ provisioning worker
zpack-repair-worker.service     # Level 1 repair worker
zlh-controller.service          # singleton reconciler/controller
zpack-billing-worker.service    # billing enforcement worker
```

Guardrail: do not add more worker/systemd services before launch unless there is a strong safety-boundary reason.

## Provisioning / async create

- `POST /api/instances` is the active create entrypoint.
- The create flow validates request/account state, creates or reuses a durable `ProvisioningOperation`, enqueues a BullMQ `provisioning` job, and returns `202 Accepted` with `operationId` / `statusUrl`.
- `src/queues/provisioning.js` is the live provisioning worker entrypoint.
- Worker concurrency is `1`; blind retries are disabled with `attempts: 1`.
- The BullMQ job ID uses the colon-free operation ID.
- Portal sends an `Idempotency-Key`; backend duplicate protection also includes a no-key launch guard.
- Worker callbacks update operation status, phase, heartbeat, server persistence, and cleanup metadata.
- Live validation passed for game and dev provisioning through the worker path.
- Portal async pending cards were visually validated and replace themselves with the real server card after completion.
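The colon-free job ID rule above exists because BullMQ uses `:` as a delimiter in its own Redis keys, so custom job IDs containing colons are unsafe. A minimal sketch of what that flattening might look like — the helper names, the operation-ID shape, and the status-URL path are illustrative assumptions, not the real implementation:

```javascript
// Hypothetical helper: flatten a durable operation ID into a BullMQ-safe
// job ID by removing colons. The "op:game:abc123" shape is an assumption.
function toJobId(operationId) {
  return operationId.replaceAll(":", "-");
}

// Illustrative 202 Accepted payload for POST /api/instances; the statusUrl
// route shown here is an assumed example, not a documented contract.
function acceptedPayload(operationId) {
  return {
    operationId,
    statusUrl: `/api/operations/${encodeURIComponent(operationId)}`,
  };
}

console.log(toJobId("op:game:abc123")); // colon-free form used as the job ID
```

Using the operation ID (rather than a random job ID) is what makes the enqueue idempotent: re-enqueuing the same operation targets the same BullMQ job.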
- Existing API teardown works for worker-created servers.

## Controller / repair model

- `src/controllers/reconciler.js` is the controller loop.
- The controller runs as `zlh-controller.service`, separate from the HTTP API process.
- The controller is singleton-protected by the Redis lock key `zlh:controller:lock`.
- The controller should remain conservative; Level 0/1 only for launch.
- The controller is expected to run in dry-run mode unless Level 1 auto-repair is deliberately enabled.
- `src/controllers/repairPolicy.js` owns the repair decision rules.
- `src/queues/repair.js` is the Level 1 repair worker queue.
- Validated Level 1 behavior:
  - an expired provisioning operation can be marked `stale`
  - live Cloudflare SRV drift can be detected and repaired by `edge_republish`
- `edge_republish` uses the existing full edge publish path and post-checks live edge state, so it does not fake success.
- Level 2 actions such as agent/workload restart remain disabled.
- Level 3 destructive actions such as restore/rebuild/delete are never automatic.

## Billing enforcement

- Durable billing enforcement exists through `BillingEnforcementState`, `BillingEnforcementEvent`, and `StripeEventLog`.
- Stripe webhook handling covers `invoice.payment_failed`, `invoice.paid`, `invoice.payment_succeeded`, `customer.subscription.updated`, and `customer.subscription.deleted`.
- Stripe events are idempotent through `StripeEventLog`.
- `src/services/billingEnforcement.js` owns billing states, guard/assert helpers, Stripe state transitions, and due-enforcement calculation.
- `src/queues/billingEnforcement.js` owns billing enforcement queue execution.
- `zpack-billing-worker.service` is installed and running under systemd.
- Billing actions are allowlisted: warning, final warning, backup block, suspension shutdown, retained marking, restore access.
- Destructive billing actions are rejected and audited.
- API gates are implemented for provisioning, start/restart, backup mutations, console command/stream, and file mutations.
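The repair-level gating described in the controller section can be sketched as a small decision function. This is an illustrative model, not the real `src/controllers/repairPolicy.js`; the action names other than `mark_stale` / `edge_republish` and the exact return shape are assumptions:

```javascript
// Hypothetical launch repair gate: only Level 0 (observe) and Level 1
// (safe, reversible) actions may run automatically, and Level 1 actions
// are additionally suppressed while the controller is in dry-run mode.
const ACTION_LEVELS = {
  observe: 0,          // Level 0: record state only
  mark_stale: 1,       // Level 1: mark an expired provisioning op stale
  edge_republish: 1,   // Level 1: re-run the full edge publish path
  agent_restart: 2,    // Level 2: disabled for launch (illustrative name)
  rebuild: 3,          // Level 3: destructive, never automatic (illustrative)
};
const MAX_AUTO_LEVEL = 1;

function decideRepair(action, { dryRun = true } = {}) {
  const level = ACTION_LEVELS[action];
  if (level === undefined) return { allowed: false, reason: "unknown_action" };
  if (level > MAX_AUTO_LEVEL) return { allowed: false, reason: "level_disabled" };
  if (dryRun && level > 0) return { allowed: false, reason: "dry_run" };
  return { allowed: true };
}
```

The point of the shape is that refusal always carries a reason, so dry-run runs can log exactly what *would* have happened.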
- File read/list/download remains allowed while suspended/retained, by policy.
- Suspended/retained/pending-deletion billing state suppresses edge/DNS/Velocity repair and live edge observation.
- Validated flows include payment failed, replay idempotency, backup block state, suspension/shutdown safety, API gates while suspended, controller no-repair while suspended, payment restored, and destructive rejection.

## Support tickets

- `POST /api/support/create` is implemented and mounted under `/api/support`.
- The API creates a `SupportTicket` DB row with a human-readable ticket number `ZLH-YYYYMMDD-XXXX`.
- A customer acknowledgement email is sent through SMTP.
- A Discord `#support` alert is sent via `DISCORD_SUPPORT_WEBHOOK`.
- An optional support mailbox copy uses `SUPPORT_EMAIL_TO` and email `Reply-To` support.
- Portal form submit, customer acknowledgement email, Discord alert, and DB-backed ticket creation were validated live.
- Support triage, admin ticket list/view, inbound reply parsing, attachments, and self-hosted helpdesk integration are post-launch enhancements.

## Security / trust boundaries

- The API has explicit route-level trust boundaries in addition to internal network placement behind OPNsense.
- `src/middleware/requireAdmin.js` protects admin-only API routes.
- `src/middleware/requireInternalToken.js` protects internal-only control-plane routes with `INTERNAL_API_TOKEN` / `ZLH_INTERNAL_API_TOKEN`.
- Internal-token routes accept `X-ZLH-Internal-Token`, `X-Internal-Token`, or bearer auth carrying the internal token.
- Internal-token routes fail closed when token config is missing, except for explicit development/test local flows.
- `/api/audit` is admin-only.
- `GET /api/instances` is the admin-only global inventory.
- `/api/edge/*`, `/api/proxmox/*`, raw control-plane routes, and monitoring/service-discovery surfaces require the appropriate admin/internal/discovery token boundary.
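The internal-token acceptance rules above (two custom headers or a bearer header, fail-closed on missing config) can be sketched as a pair of pure functions. This is an illustrative model, not the real `src/middleware/requireInternalToken.js`, and a production check should use a constant-time comparison rather than `===`:

```javascript
// Hypothetical extraction of the internal token from request headers.
// Node lowercases incoming header names, so lookups use lowercase keys.
function extractInternalToken(headers) {
  const direct = headers["x-zlh-internal-token"] ?? headers["x-internal-token"];
  if (direct) return direct;
  const auth = headers["authorization"] ?? "";
  if (auth.toLowerCase().startsWith("bearer ")) return auth.slice(7).trim();
  return null;
}

// Fail closed: if no expected token is configured, nothing is allowed
// (the dev/test local-flow exception from the notes is omitted here).
function isInternalRequestAllowed(headers, expectedToken) {
  if (!expectedToken) return false;
  return extractInternalToken(headers) === expectedToken;
}
```

Failing closed means a misdeployed service with an empty `INTERNAL_API_TOKEN` rejects all internal calls instead of silently accepting them.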
- Portal never calls agents directly for normal user flows and must not expose internal tokens in browser code.

## Route and lifecycle split

- `/api/instances`, `/api/servers`, and `/api/containers` are distinct operational surfaces.
- `GET /api/servers` is the user-owned server list surface for Portal.
- `DELETE /api/servers/:id` is the preferred Portal/user-owned delete contract and checks ownership before teardown.
- `DELETE /api/containers/:vmid` remains raw cleanup/orphan remediation and should stay internal/admin oriented.
- Orphan remediation stays in internal/admin workflows.

## Teardown / orphan cleanup behavior

- Container teardown logic lives in `src/services/containerTeardown.js`, so user-owned delete and raw internal delete share the same workflow.
- Teardown archives a `DeletedInstance` record before removing the active `ContainerInstance` row.
- Game teardown performs DNS / Cloudflare / Technitium / Velocity cleanup through the existing publisher cleanup paths.
- Dev teardown performs dev IDE cleanup through the dev IDE publisher path.
- Live teardown was validated after worker-created game and dev server tests.

## Readiness / agent state model

- The API is the heartbeat authority and polls agents; the Agent does not push state to the API.
- The API consumes `/health`, `/ready`, and `/status`.
- Portal should rely on API-normalized state, not direct agent state.
- Proxy lifecycle is tracked separately from agent readiness under `ContainerInstance.payload.proxy`.
- Edge publish state and Velocity callback state must be merged into `ContainerInstance.payload` atomically.
- Transient Proxmox `lxc/status/current` read errors are soft-handled by host-status polling and surface `powerState: unknown` rather than raw API 500s.

## Velocity / Minecraft edge lifecycle

- Minecraft edge publish uses Velocity instead of Traefik TCP config.
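The soft handling of transient Proxmox reads described in the readiness model can be sketched as a small normalizer. The poll-result shape and helper name are assumptions; only the `powerState: unknown` contract comes from the notes above:

```javascript
// Hypothetical normalizer for a Proxmox lxc/status/current poll result.
// A failed read becomes powerState "unknown" instead of a raw 500;
// Proxmox itself reports LXC status as "running" or "stopped".
function normalizePowerState(pollResult) {
  if (pollResult.error) {
    return { powerState: "unknown", lastError: String(pollResult.error) };
  }
  return {
    powerState: pollResult.status === "running" ? "running" : "stopped",
  };
}
```

Keeping `unknown` as a first-class state lets Portal render "state unavailable" instead of flashing an error, and lets the controller avoid "repairing" a server it merely failed to observe.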
- The API registers/unregisters Minecraft backends through the Velocity bridge routes `POST /zpack/register` and `POST /zpack/unregister`, and verifies with `GET /zpack/status`.
- The API exposes `POST /internal/velocity/proxy-status` for bridge lifecycle callbacks.
- Accepted proxy lifecycle statuses are `registered_with_proxy`, `proxy_ping_ok`, and `proxy_ping_failed`.
- `GET /api/servers/:id/status` derives Minecraft connection state from agent readiness, persisted edge state, Velocity registration state, and backend ping state.

## Host / LXC lifecycle state

- `/api/servers/:id/host/status` exposes Proxmox LXC power state for game and dev containers.
- Host lifecycle routes return `202 Accepted` with an operation/status URL rather than blocking until Proxmox finishes:
  - `POST /api/servers/:id/host/start`
  - `POST /api/servers/:id/host/stop`
  - `POST /api/servers/:id/host/restart`
- Overlapping host lifecycle operations return `409 host_operation_in_progress`.
- Host lifecycle routes perform ownership and billing checks before touching Proxmox.

## Backup support

- The API forwards game backup operations:
  - `GET /api/game/servers/:id/backups`
  - `POST /api/game/servers/:id/backups`
  - `POST /api/game/servers/:id/backups/restore?id=`
  - `DELETE /api/game/servers/:id/backups/:backupId`
- Restore start is async at the API layer; Portal is expected to poll status.
- Backup response shape normalization remains open.
- Billing backup-block gates are implemented; full live route validation still needs a game backup fixture with backups available.

## File proxy / route compatibility

- Duplicated game file proxy logic has been extracted into `src/routes/helpers/gameFileProxy.js`.
- Compatibility is intentionally preserved between `/api/game/servers/:id/files...` and `/api/servers/:id/files...`.
- Streamed upload/download/edit forwarding remains outside the generic non-streaming agent helper transport.
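The overlap guard on host lifecycle routes (`409 host_operation_in_progress` when an operation is already in flight, `202 Accepted` otherwise) can be sketched as below. The in-memory map is a stand-in for whatever persistence the real routes use, and the function names are illustrative:

```javascript
// Hypothetical guard: at most one host lifecycle operation per server.
const inFlight = new Map(); // serverId -> operation name ("start" | "stop" | "restart")

function startHostOperation(serverId, op) {
  if (inFlight.has(serverId)) {
    // A second start/stop/restart while one is running is refused.
    return { status: 409, body: { error: "host_operation_in_progress" } };
  }
  inFlight.set(serverId, op);
  // The documented host status route doubles as the poll target.
  return { status: 202, body: { statusUrl: `/api/servers/${serverId}/host/status` } };
}

function finishHostOperation(serverId) {
  inFlight.delete(serverId);
}
```

Clearing the entry in a `finally` block around the Proxmox call (not shown) is what keeps a failed operation from wedging the server in a permanent 409.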
- Billing gates block file mutations while suspended/retained but allow read/list/download by policy.

## Hosted IDE proxy

- `POST /api/dev/:id/ide-token` issues short-lived IDE proxy tokens.
- The IDE proxy supports hosted `dev-.` hosts and tunnel paths.
- Hosted IDE access was validated during the dev provisioning worker tests.
- The API exposes code-server controls for owned dev containers:
  - `POST /api/dev/:id/codeserver/start`
  - `POST /api/dev/:id/codeserver/stop`
  - `POST /api/dev/:id/codeserver/restart`

## Console / socket stability

- Console WebSocket proxy attachment is guarded so the console upgrade handler is attached only once per HTTP server.
- Console proxy raw socket error logging is guarded to avoid stacking duplicate socket listeners.
- The API raises listener limits on inbound HTTP sockets and console WebSocket sockets to avoid false-positive listener warnings under proxy/websocket fan-out.
- Portal terminal connection-state hardening remains a Portal launch item.

## Legacy / archived behavior

- Legacy port allocation / slot reservation is no longer part of the live route mounts.
- Legacy synchronous HTTP provisioning is no longer the launch model; the async BullMQ provisioning worker is the current model.
- Raw streaming upload proxy behavior remains outside `agentClient.js`.
- Non-runtime clutter such as checked-in keys/tokens, local artifacts, `.old` scripts, `src/tmp`, and retired legacy trees should stay out of the active repo.
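The attach-once guard for the console WebSocket upgrade handler can be sketched with a symbol flag on the HTTP server object. This is an illustrative pattern assuming the real code attaches via Node's `upgrade` event; the demo uses a plain `EventEmitter` standing in for `http.Server`:

```javascript
import { EventEmitter } from "node:events";

// A symbol avoids colliding with any real property on the server object.
const ATTACHED = Symbol("consoleUpgradeAttached");

// Hypothetical attach-once helper: the second and later calls are no-ops,
// so repeated route mounting cannot stack duplicate "upgrade" listeners.
function attachConsoleUpgrade(httpServer, handler) {
  if (httpServer[ATTACHED]) return false;
  httpServer[ATTACHED] = true;
  httpServer.on("upgrade", handler);
  return true;
}

// Demo: a second attach is refused, so one upgrade fires one handler call.
const server = new EventEmitter();
let calls = 0;
attachConsoleUpgrade(server, () => { calls += 1; });
attachConsoleUpgrade(server, () => { calls += 1; }); // ignored by the guard
server.emit("upgrade");
```

The same flag idea extends to the duplicate socket-error-listener problem noted above: guard the attach site, rather than trying to deduplicate listeners after the fact.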