Reconcile root open threads with Codex ownership model
parent 948b207bc4
commit 7db130baf4
309
OPEN_THREADS.md
@@ -1,263 +1,60 @@
 # Open Threads — zlh-grind
 
-This file tracks active but unfinished work.
+This file tracks **active cross-repo and platform-level work only**.
 
-Keep it short.
+Repo-specific work belongs in:
+- `Codex/API/OPEN_ITEMS.md`
+- `Codex/Portal/OPEN_ITEMS.md`
+- `Codex/Agent/OPEN_ITEMS.md`
+
+Keep this file short.
 
 ---
 
-## Agent (zlh-agent)
+## Cross-Repo Active
 
-### Dev Runtime System
-Completed:
-- catalog validation implemented
-- runtime installs artifact-backed
-- install guard implemented
-- all installs now fetch from artifact server
+### Game backup integration
+- normalize backup response shape across API and Portal so list/create/restore/delete have stable success bodies and a consistent error envelope
+- validate local Minecraft backup/restore flow on a real live server end-to-end
+- confirm checkpoint metadata presentation once API exposes the final stable fields to Portal
 
-Outstanding:
-- runtime install verification improvements
-- catalog hash validation
-- runtime removal / upgrade handling
-- runtime update process for dev containers
+### Dev access / IDE / SSH
+- simplify and harden API `devProxy`
+- complete SSH / CF tunnel access path across platform, API, Agent, and Portal UX
+- add Portal SSH config snippet for power users
 
-### Dev Environment
-Completed:
-- dev user creation
-- workspace root `/home/dev/workspace`
-- console runs as dev user
-- `HOME`, `USER`, `LOGNAME`, `TERM` env vars set correctly
-
-Outstanding:
-- PATH normalization
-- shell profile consistency
-- runtime PATH injection
-
+### Service discovery / launch validation
+- service discovery migration for remaining hot-path references
+- provisioning validation across current API/Agent/Portal assumptions
+- Fabric / readiness / Velocity exposure final cross-component verification
-### Dev Container Backups
-Recommendation:
-- implement dev backups as workspace snapshots, not whole-container backups
-- primary scope should be `/home/dev/workspace`
-- restore should rebuild from config, then restore workspace snapshot
-- treat dotfiles / user settings as optional follow-up, not default backup scope
-- avoid backing up reproducible runtime payloads and caches by default
-- plan remote storage early for dev backups so node loss does not equal workspace loss
-
-### Code Server Addon
-Status: Installed, running, browser-verified end-to-end.
-
-Outstanding:
-- code-server memory baseline
-- decide whether default dev RAM should increase or become tier-based
-- k6 IDE session load test
-
-### Game Server Supervision
-Completed:
-- crash recovery with backoff
-- crash observability and classification
-- unified readiness-aware start / restart path across manual start, restart, autostart, and supervisor recovery
-- dead duplicate crash monitor removed
-- `/ready` endpoint added and operation/maintenance state surfaced through `/status`
-- guarded operation lock added for mutating / stateful flows
-- console command endpoint hardened
-- agent self-update rollback symlink handling corrected
-
-### Game Server Backups
-Completed:
-- first guarded Minecraft backup flow implemented in agent
-- local backup create/list/restore endpoints added
-- local backup delete endpoint added
-- live backup uses `save-all flush` -> `save-off` -> archive -> `save-on`
-- restore stops server, waits for exit, restores manifest-declared paths, then restarts through readiness-aware path
-- **pre-restore checkpoint hardening implemented (Apr 16 2026)**
-- backup metadata fields added: `type` and optional `reason`
-- manual backups record `type: "manual"`
-- restore creates a local `checkpoint` backup with `reason: "pre_restore"` before any destructive operation
-- checkpoint creation aborts the restore if it fails — live data never touched
-- checkpoint creation disables pruning so safety backup is not immediately removed
-- collision-safe backup IDs for same-second creation
-- `POST /game/backups/restore` response now includes `restored`, `backup`, `checkpoint`
-- full test coverage: checkpoint creation, abort-before-delete, manual metadata, listability, unsafe restore rejection
-
-Outstanding:
-- remote backup storage / transfer path
-- backup job history / progress beyond current operation state
-- retention policy refinement beyond initial local pruning
-- real-world validation on live Minecraft server
-
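Review note: the removed Game Server Backups items record two orderings, the live backup sequence (`save-all flush`, `save-off`, archive, `save-on`) and the rule that a `pre_restore` checkpoint is taken before anything destructive. A minimal TypeScript sketch of that ordering is below. Every name in it (`GameConsole`, `BackupStore`, `ServerControl`, `liveBackup`, `restoreWithCheckpoint`) is hypothetical and not taken from the agent, the result field types are guesses, and taking the checkpoint before stopping the server is an assumption: the notes only say it happens before any destructive operation.

```typescript
// All names below are hypothetical; they are not taken from the agent's code.
interface GameConsole {
  send(command: string): Promise<void>;
}

interface BackupStore {
  // Writes an archive and resolves to the new backup ID.
  archive(type: "manual" | "checkpoint", reason?: string): Promise<string>;
  // Restores the manifest-declared paths from an existing backup.
  restore(backupId: string): Promise<void>;
}

interface ServerControl {
  stopAndWait(): Promise<void>;
  startWhenReady(): Promise<void>;
}

// Field names follow the restore response listed in the notes; types are assumed.
interface RestoreResult {
  restored: boolean;
  backup: string;
  checkpoint: string;
}

// Live backup: flush, pause autosave, archive, then always resume autosave.
async function liveBackup(con: GameConsole, store: BackupStore): Promise<string> {
  await con.send("save-all flush");
  await con.send("save-off");
  try {
    return await store.archive("manual");
  } finally {
    await con.send("save-on"); // runs even when archiving throws
  }
}

// Restore: take the safety checkpoint first. If it throws, the function
// exits before the server is stopped or any live data is touched.
async function restoreWithCheckpoint(
  backupId: string,
  store: BackupStore,
  server: ServerControl
): Promise<RestoreResult> {
  const checkpoint = await store.archive("checkpoint", "pre_restore");
  await server.stopAndWait();
  await store.restore(backupId);
  await server.startWhenReady();
  return { restored: true, backup: backupId, checkpoint };
}
```

The `finally` block is what keeps autosave from staying disabled when archiving fails. Failure handling after the server has been stopped is left out of the sketch.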
-### Fabric Readiness Gating
-Status: startup rehydrate path fixed in plugin and both happy-path and negative-path startup validation passed.
-
-Still outstanding:
-- confirm whether any remaining agent-side registration path can surface a backend before readiness probe success
-- see `SCRATCH/session-stabilization-fabric-findings.md`
-
-### Agent Future Work (post-launch)
-1. Structured logging (slog) for Loki
-2. Dev container `provisioningComplete` state in `/status`
-3. Graceful shutdown verification
-4. Process reattachment on agent restart
-5. SSH server install in dev container provisioning pipeline
-6. Long-running job model (job IDs, progress phases, cancel/retry)
-7. Typed platform-action wrappers over raw console commands
-8. Persistent operation recovery after agent restart
-9. `RestartServer()` readiness probe bypass — fix or document
-
----
-
-## Dev IDE Access
-
-### Browser IDE
-Status: fully working, browser-verified, zero-install.
-
-Remaining:
-- confirm "Open IDE" button in portal uses hosted URL in production path
-- reduce legacy `/__ide/:id` compatibility paths once portal button confirmed
-- simplify and harden `devProxy`
-
-### Local Dev Access — SSH via CF Tunnel
-Current state:
-- tunnel created and connected to bastion VM
-- Zero Trust free plan active
-- SSH hostname mapping not yet configured
-- bastion SSH proxy jump config not yet done
-- dev container SSH server not yet verified
-- portal SSH config snippet not yet built
-
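Review note: the Portal SSH config snippet listed above is not built yet, so the following is only a sketch of what generating one might look like. It assumes a single hop through the Cloudflare tunnel using `cloudflared access ssh` as the `ProxyCommand`; the bastion proxy-jump leg mentioned in the notes is not modelled. `SshAccess`, `sshConfigSnippet`, and the alias, hostname, and user values are all placeholders.

```typescript
// Hypothetical input shape; none of these names come from the Portal code.
interface SshAccess {
  alias: string;      // what the user types after `ssh`
  tunnelHost: string; // public hostname mapped to the tunnel (placeholder)
  user: string;       // login user inside the dev container
}

// Builds an OpenSSH client config stanza that routes the connection
// through cloudflared instead of opening a direct TCP connection.
function sshConfigSnippet(a: SshAccess): string {
  return [
    `Host ${a.alias}`,
    `  HostName ${a.tunnelHost}`,
    `  User ${a.user}`,
    "  ProxyCommand cloudflared access ssh --hostname %h",
  ].join("\n");
}
```

For example, `sshConfigSnippet({ alias: "zlh-dev", tunnelHost: "ssh.example.com", user: "dev" })` yields a four-line stanza the user can paste into `~/.ssh/config`.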
----
-
-## API (zpack-api)
-
-Completed:
-- dev provisioning payload
-- runtime/version fields
-- enable_code_server flag
-- status endpoint
-- IDE token generation + hosted URL
-- bootstrap IDE route
-- live tunnel proxy
-- host-based routing for hosted IDE
-- hosted flow browser-verified end-to-end
-- backend lifecycle hardened
-- duplicate server creation fixed
-- console routing corrected
-- control plane switched to IP-based service communication
-- Velocity rehydration uses DB + Redis instead of Proxmox live state
-- Stripe webhook delivery/reachability fixed via public billing hostname
-- webhook-driven persistence of billing state (`subscriptionStatus`, `plan`)
-- billing page/API alignment for active state and Stripe portal flow
-- direct in-app plan upgrade endpoint (`/api/billing/upgrade`)
-- direct in-app plan downgrade scheduling endpoint (`/api/billing/downgrade`)
-- persisted billing fields for `currentPeriodEnd`, `lastInvoicePaidAt`, `billingSyncedAt`
-- persisted scheduled downgrade state (`scheduledPlan`, `scheduledPlanEffectiveAt`)
-- plan-based quota enforcement in `POST /api/instances`
-- password reset request + confirm flow implemented
-- agent contract updated for POST control actions, `/ready`, operation state, and backup routes
-- agent transport consolidated into shared `agentClient.js`
-- semantic readiness split implemented with shared `isAgentReadyResult()`
-- API backup forwarding added for list / create / restore / delete
-- agent `409` conflict and readiness-oriented error handling preserved instead of collapsing to generic `500`
-- Velocity routing made more conservative around missing readiness
-
-Outstanding:
-- simplify and harden host-native `devProxy`
-- dev runtime catalog endpoint for portal
-- Headscale auth key generation
-- service discovery migration for remaining hot-path `internal.zlh` references
-- normalize backup response shape now that portal is tolerating multiple field names
-
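Review note: two items above belong together, keeping agent `409` and not-ready responses intact rather than collapsing them to a generic `500`, and giving backup routes one stable response shape. A rough TypeScript sketch of such a mapping follows. The envelope fields (`ok`, `data`, `error.code`, `error.message`), the error code strings, and the names `ApiEnvelope`, `ApiResult`, and `toApiResult` are illustrative assumptions, not the API's actual contract; `503` for not-ready follows the Portal's `409` / `503` messaging item.

```typescript
// Hypothetical envelope; field names are illustrative, not the real contract.
type ApiEnvelope<T> =
  | { ok: true; data: T }
  | { ok: false; error: { code: string; message: string } };

interface ApiResult<T> {
  status: number;
  body: ApiEnvelope<T>;
}

// Agent statuses that should reach the Portal unchanged, with assumed codes.
const PASS_THROUGH: Record<number, string> = {
  409: "operation_conflict",
  503: "not_ready",
};

// Maps an agent response onto one success shape and one error shape.
function toApiResult<T>(agentStatus: number, data: T, message = ""): ApiResult<T> {
  if (agentStatus >= 200 && agentStatus < 300) {
    return { status: agentStatus, body: { ok: true, data } };
  }
  const code = PASS_THROUGH[agentStatus];
  if (code !== undefined) {
    // Preserve the agent's status so the caller can show targeted messaging.
    return { status: agentStatus, body: { ok: false, error: { code, message } } };
  }
  // Anything else is reported as a generic failure.
  return { status: 500, body: { ok: false, error: { code: "agent_error", message } } };
}
```

With one envelope for list, create, restore, and delete, the Portal would no longer need to tolerate several field names for the same result.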
----
-
-## Portal (zpack-portal)
-
-Completed:
-- dev runtime dropdown
-- dotnet runtime support
-- enable code-server checkbox
-- dev file browser support
-- site copy rewrite
-- pricing page updated
-- billing page aligned with API v2 billing state
-- honest Stripe portal section with single portal CTA
-- in-app Basic → Pro upgrade wiring
-- in-app Pro → Basic scheduled downgrade wiring
-- quota/limit messaging on create flow with billing upgrade guidance
-- forgot-password + reset-password pages and login linkage
-- first-login onboarding modal with quick/full tour and skip
-- dashboard IA refresh: spotlight server card replaces duplicate mini-listing
-- operation / maintenance state surfaced in game server UI
-- first backup UI added for list / create / restore
-- backup delete UI added with destructive confirmation
-- targeted `409` / `503` messaging added for operation conflict and not-ready states
-- console command submission updated to POST JSON
-- console page action gating fixed so stopped MC servers remain startable while console send stays gated to running + ready
-
-Outstanding:
-- confirm "Open IDE" button fully uses hosted URL flow
-- SSH config snippet for power users
-- email notifications
-- remove `testdaemon` binary from repo root
-
----
-
-## Game Servers
-
-Completed:
-- first local Minecraft backup / restore flow wired end-to-end through agent, API, and portal
-- manual local backup delete wired end-to-end through agent, API, and portal
-- pre-restore checkpoint hardening complete in agent
-
-Outstanding:
-- remote storage for game server backups
-- real-world backup/restore validation on live Minecraft server
 - game server subdomain / player connection method verification
 
----
-
-## Velocity / ZpackVelocityBridge
-
-Completed:
-- startup rehydrate now requires `ready == true`
-- stale default `zpack-api.internal.zlh` fallback removed
-- explicit `ZPACK_REHYDRATE_ENDPOINT` env wiring validated
-- happy-path player routing validated live after restart
-- negative-path startup validation passed: no backend registered when rehydrate returned zero eligible servers
-
-Outstanding:
-- confirm whether any remaining agent-side registration path can surface a backend before readiness probe success
-- see `SCRATCH/velocity-plugin.md`
-
-Optional / Nonessential:
-- `ZpackCommands` exist in code but are not currently needed operationally
-- prefer plugin HTTP status endpoint and host metrics over in-proxy admin commands
-- if metrics coverage is sufficient, command registration can remain omitted or the command class can be removed later
+### Notifications / launch polish
+- email notifications across backend contract + Portal UX
+- remove stray `testdameon` / `testdaemon` binary from Portal repo
 
 ---
 
-## Pre-Launch Checklist
+## Platform / Infrastructure Active
 
-Outstanding before launch:
-- remote storage for game server backups
-- real-world backup/restore validation on live Minecraft server
-- game server subdomain verification
-- email notifications
 - upload testing
-- billing endpoint/path cleanup verification
-- stress testing: k6 IDE + Minecraft bot + code-server memory baseline
+- stress testing: k6 IDE load, Minecraft bot load, code-server memory baseline
 - OPNsense audit
-- Fabric readiness gating full validation
-- service discovery migration
-- provisioning validation
-- remove `testdaemon` from `zpack-portal`
+- billing endpoint/path cleanup verification
+
+### Backup boundary
+- Agent-owned backups are local, app-aware rollback backups for Minecraft worlds/config
+- PBS / platform backup strategy is the durability / disaster-recovery layer
+- do not track PBS/offsite durability work as agent implementation work unless that ownership changes
 
 ---
 
-## Platform
+## Platform Future
 
-Future work:
-- CF Tunnel SSH completion
+- CF Tunnel SSH completion beyond first working path
 - artifact version promotion
 - runtime rollback support
-- Cloudflare R2 for large artifact/mod file delivery
+- Cloudflare R2 for large artifact/mod delivery
 - admin panel
 - referral / dev pipeline reward system
 - uptime history
@@ -265,35 +62,11 @@ Future work:
 
 ---
 
-## Closed Threads
+## Cleaning Rule
 
-- PTY console (dev + game)
-- mod lifecycle
-- upload pipeline
-- runtime artifact installs
-- dev container filesystem model
-- code-server artifact fix
-- API status endpoint for frontend agent-state consumption
-- game server crash recovery with backoff
-- crash observability
-- code-server lifecycle endpoints
-- code-server process detection
-- dev IDE proxy
-- hosted wildcard Traefik → API → container IDE flow
-- per-container dev IDE edge publish/unpublish removed from API
-- wildcard TLS cert `*.zerolaghub.dev`
-- browser IDE fully loading at `dev-<vmid>.zerolaghub.dev`
-- CF Tunnel created and connected to bastion VM
-- portal copy rewrite
-- DDoS investigation
-- hosting provider decision: GTHost Detroit
-- migration to Detroit complete
-- system stabilization after migration
-- IP-based control plane
-- Velocity startup rehydrate fixed and validated on happy path
-- billing / Stripe webhook delivery and persistence
-- password reset flow
-- usage limits / quota enforcement
-- user onboarding flow
-- dashboard spotlight server IA refresh
-- pre-restore checkpoint hardening (agent-side, Apr 16 2026)
+- Root keeps only cross-repo/platform work
+- Repo-specific items must be removed from root once they live only in one Codex tracker
+- Completed items should be removed, not left in place as historical clutter
+- Use `CURRENT_STATE.md` for durable implemented behavior
+- Use `DECISIONS.md` for settled choices
+- Re-open old items only when there is current evidence they are still unfinished or have regressed