From 4707e761983ccef90c5e65d3eb4bda914c3fafbf Mon Sep 17 00:00:00 2001
From: jester
Date: Sun, 3 May 2026 19:54:47 +0000
Subject: [PATCH] Refresh API open items after launch validation

---
 Codex/API/OPEN_ITEMS.md | 80 ++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/Codex/API/OPEN_ITEMS.md b/Codex/API/OPEN_ITEMS.md
index 992be1a..b1268ff 100644
--- a/Codex/API/OPEN_ITEMS.md
+++ b/Codex/API/OPEN_ITEMS.md
@@ -1,51 +1,52 @@
 # API — Open Items
 
-Only keep unfinished API work here.
+Only keep unfinished API-owned work here.
 
-## Active
-- normalize backup response shape: define canonical success bodies for list/create/restore/delete and a stable error envelope that preserves agent details
-- service discovery migration: audit edge publish/DNS/Cloudflare/Technitium, Prometheus SD, dev IDE wildcard, and post-provision hot paths for direct host assumptions
-- provisioning validation follow-up where API behavior is involved
-- verify Portal compatibility after API JWT issuer/audience tightening, especially refresh flow and hosted IDE token flow
-- migrate Portal delete calls to `DELETE /api/servers/:id`; `DELETE /api/containers/:vmid` currently remains as owned-user compatibility plus internal-token automation, but should return to internal-only once Portal is updated
-- verify canonical and compatibility file routes still behave identically across list/stat/read/download/delete/put/revert/upload paths after helper extraction
-- align merged live-status readiness fields so Portal-facing `agentReady` semantics fully match semantic `/ready`
-- decide whether the remaining Portal-side `/api/agent/:serverId/:action` bridge should be deleted outright or formally kept as compatibility-only Portal-owned behavior
-- live-verify Velocity bridge lifecycle callbacks after `ZPACK_PROXY_STATUS_ENDPOINT` is set: confirm `registered_with_proxy`, `proxy_ping_ok`, and `proxy_ping_failed` land in `ContainerInstance.payload.proxy` and surface through `GET /api/servers/:id/status`
-- Portal integration follow-up for host/LXC lifecycle state: consume `GET /api/servers/:id/host/status`, list-level `hostStatus` / `powerState` / `hostOperation`, and `202` host action responses for game and dev containers
-- remove or downgrade the temporary `MaxListenersExceededWarning` tracer in `src/app.js` after outbound Axios socket listener warnings are confirmed quiet in runtime logs
-- verify Proxmox node resolution against all active container ranges; recent local smoke checks showed some DB VMIDs not present in `/cluster/resources` or on the configured node
-- add minimal test/CI coverage for auth boundaries and teardown behavior:
- normal users cannot reach admin-only routes
- missing internal token fails closed for internal-only routes
- owned-user delete archives then removes the active instance
- non-owner delete fails
- Stripe webhook raw-body handling remains intact
+## Launch / validation active
+- Normalize backup response shape: define canonical success bodies for list/create/restore/delete and a stable error envelope that preserves agent details.
+- Live-validate billing backup mutation gates against a game backup fixture with backups available.
+- Live-validate file read/list behavior against a responsive Agent while billing state is suspended/retained; policy should allow read/list/download and block mutations.
+- Verify canonical and compatibility file routes still behave identically across list/stat/read/download/delete/put/revert/upload paths after helper extraction.
+- Align merged live-status readiness fields so Portal-facing `agentReady` semantics fully match semantic `/ready`.
+- Live-verify Velocity bridge lifecycle callbacks after `ZPACK_PROXY_STATUS_ENDPOINT` is set: confirm `registered_with_proxy`, `proxy_ping_ok`, and `proxy_ping_failed` land in `ContainerInstance.payload.proxy` and surface through `GET /api/servers/:id/status`.
+- Verify Proxmox node resolution against all active container ranges; recent local smoke checks showed some DB VMIDs not present in `/cluster/resources` or on the configured node.
+- Add queue staleness visibility/alerts for `provisioning`, `repair`, and `billing_enforcement` if not already covered by monitoring/controller logs.
+- Remove or downgrade the temporary `MaxListenersExceededWarning` tracer in `src/app.js` after outbound Axios socket listener warnings are confirmed quiet in runtime logs.
+
+## Launch architecture follow-ups
+- Controller should remain conservative; Level 2 repairs such as agent/workload restart stay disabled until separately validated.
+- Decide when to move controller from dry-run to default Level 1 auto-repair after observing noise and repair recommendations.
+- Keep billing worker scoped to billing enforcement only; do not add new worker/systemd services before launch without a strong safety-boundary reason.
+- Support ticket post-launch enhancements belong outside launch blocker scope: admin ticket list/view, support triage diagnostics, self-hosted helpdesk integration, inbound reply parsing, attachments.
 
 ## Cleanup / consolidation priorities
-- fold repeated ownership/auth/IP-guard patterns into small concrete helpers without hiding route intent
-- split oversized route/service files by responsibility without changing route contracts
-- keep backup/restore status shaping and async-dispatch logic explicit, but remove duplicated mapping/normalization paths where possible
-- keep stream-vs-JSON forwarding rules centralized in one place and avoid route-local reimplementation
-- keep legacy flows out of the live tree unless they are intentionally revived and revalidated against the current schema/contracts
+- Fold repeated ownership/auth/IP-guard patterns into small concrete helpers without hiding route intent.
+- Split oversized route/service files by responsibility without changing route contracts.
+- Keep backup/restore status shaping and async-dispatch logic explicit, but remove duplicated mapping/normalization paths where possible.
+- Keep stream-vs-JSON forwarding rules centralized in one place and avoid route-local reimplementation.
+- Keep legacy flows out of the live tree unless they are intentionally revived and revalidated against the current schema/contracts.
 
 ## Completed and moved out of active cleanup
-- Node/runtime pinning is no longer an open cleanup-only item; Node 24 pinning is now treated as current repo state
-- `node-fetch` removal and built-in `fetch` migration are no longer open items
-- initial file-proxy route deduplication has been completed; only compatibility verification and follow-on cleanup remain open
-- Prisma config migration is no longer an open item
-- baseline proxy cookie/log hardening is no longer an open item
-- worker-era provisioning and detached legacy port reservation have been removed from the live tree rather than treated as active API surfaces
-- initial control-plane hardening has landed: shared admin/internal-token middleware, admin-only audit/global instance list, internal-token guards on raw control-plane routes, and fail-closed monitoring/SD token behavior outside explicit development/test
-- teardown workflow has been extracted into a service and live-verified against one game VMID and one dev VMID; both archived with `reason=user_delete` and active rows were removed
-- repo hygiene pass removed checked-in key/token/artifact/legacy clutter and tightened ignore rules
+- Node/runtime pinning is current repo state.
+- `node-fetch` removal and built-in `fetch` migration are complete.
+- Initial file-proxy route deduplication is complete; only compatibility verification remains open.
+- Prisma config migration is complete.
+- Baseline proxy cookie/log hardening is complete.
+- Initial control-plane hardening is complete.
+- Teardown workflow has been extracted into a service and live-verified.
+- Repo hygiene pass removed checked-in key/token/artifact/legacy clutter and tightened ignore rules.
+- Async provisioning worker is implemented, systemd-backed, and live-validated for game and dev creates.
+- Provisioning idempotency/no-key guard and controlled failure handling have been validated.
+- Controller/reconciler foundation, repair queue, Discord notifications, stale-operation repair, and live edge drift repair have been implemented and validated.
+- Billing enforcement backend, Stripe idempotency, API gates, billing worker, and controller billing guards have been implemented and validated.
+- Support ticket route, DB ticket creation, customer acknowledgement email, and Discord support alert have been implemented and validated.
 
 ## Cleanup rule
-- prefer behavior-preserving folding over broad refactors
-- merge repeated flows, not concepts
-- keep helpers small and concrete
-- reduce route-local duplication before introducing new abstractions
-- treat security/runtime changes as contract-sensitive validation work once they affect auth, cookies, or route compatibility
+- Prefer behavior-preserving folding over broad refactors.
+- Merge repeated flows, not concepts.
+- Keep helpers small and concrete.
+- Reduce route-local duplication before introducing new abstractions.
+- Treat security/runtime changes as contract-sensitive validation work once they affect auth, cookies, or route compatibility.
 
 ## Verify before re-opening
 - hosted IDE token + hosted URL flow
@@ -55,7 +56,6 @@ Only keep unfinished API work here.
 - restore async-start contract + status polling semantics
 - streamed file edit/revert forwarding through both canonical and compatibility routes
 - older-session re-login behavior after JWT tightening
-- Portal-side `/api/agent` bridge usage before deleting any remaining compatibility code around instance lookup assumptions
 
 ## Not API-owned
 - agent-local backup implementation details