Open Threads — zlh-grind

This file tracks active cross-repo and platform-level work only.

Repo-specific work belongs in:

  • Codex/API/OPEN_ITEMS.md
  • Codex/Portal/OPEN_ITEMS.md
  • Codex/Agent/OPEN_ITEMS.md

Keep this file short.


Cross-Repo Active

Backup / restore UX and contract polish

  • keep Portal aligned with async restore start + status polling
  • keep restore wording/status transitions clear through completion and restart
  • confirm checkpoint metadata presentation remains clean when exposed to Portal
  • later hardening to consider: automatic rollback to the pre-restore checkpoint if restore apply/start fails after the destructive replace
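
For reference when aligning Portal with the async flow, the start-then-poll pattern might look roughly like the sketch below. It is only an illustration: the status names (queued, restoring, restarting, completed, failed), the poll_restore helper, and the fake status source are all assumptions, not the actual API contract.

```python
import time
from typing import Callable

# Hypothetical status names; the real restore contract may use different ones.
TERMINAL_STATES = {"completed", "failed"}


def poll_restore(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll until a terminal state and return the distinct transitions seen."""
    seen: list[str] = []
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        # Record only changes so the UI can show one line per transition.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status in TERMINAL_STATES:
            return seen
        if time.monotonic() >= deadline:
            raise TimeoutError(f"restore still '{status}' after {timeout_s}s")
        sleep(interval_s)


def fake_status_source(sequence: list[str]) -> Callable[[], str]:
    """Stand-in for a status endpoint: replays a scripted sequence, then repeats the last value."""
    it = iter(sequence)
    last = sequence[-1]
    return lambda: next(it, last)
```

In a real client, get_status would wrap whatever status endpoint the API exposes. The transition list is what Portal would map to user-facing wording, including the restart phase after the restore itself completes.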

Dev access / IDE / SSH

  • simplify and harden API devProxy
  • complete SSH / CF tunnel access path across platform, API, Agent, and Portal UX
  • add Portal SSH config snippet for power users
  • resolve the dev console / shell workspace-boundary mismatch: current live validation shows the hosted IDE and dev console work, but an interactive shell can still cd .. upward out of /home/dev/workspace
  • make docs and implementation agree on whether workspace scoping is file-API-only or true interactive-shell confinement
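
To make the file-API-only versus shell-confinement distinction concrete, here is a minimal sketch of what file-API scoping usually means. It assumes a hypothetical resolve_in_workspace helper and is not the Agent's actual implementation. A check like this only guards paths passed through the file API; it does nothing to stop an interactive shell from leaving the directory, which would need OS-level confinement instead.

```python
import posixpath
from pathlib import PurePosixPath

WORKSPACE_ROOT = PurePosixPath("/home/dev/workspace")


def resolve_in_workspace(user_path: str) -> PurePosixPath:
    """Normalize a requested path and reject anything outside the workspace.

    Hypothetical helper. The check is purely lexical: a real implementation
    would also need to resolve symlinks before comparing.
    """
    # An absolute user_path replaces the root in join(), so it is caught below.
    joined = posixpath.normpath(posixpath.join(str(WORKSPACE_ROOT), user_path))
    candidate = PurePosixPath(joined)
    inside = candidate == WORKSPACE_ROOT or WORKSPACE_ROOT in candidate.parents
    if not inside:
        raise PermissionError(f"path escapes workspace: {user_path!r}")
    return candidate
```

With this sketch, resolve_in_workspace("src/../a.txt") stays inside the workspace, while "../secrets" and "/etc/passwd" raise PermissionError.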

Dev backup strategy

  • define dev-container backup ownership and user-facing restore contract
  • current likely direction: prefer LXC snapshot-based backup/restore for dev containers instead of agent-managed dev backups
  • keep game backup ownership separate from dev backup ownership unless that decision changes
  • confirm how snapshot creation, retention, restore UX, and API/Portal exposure should work for dev containers

Service discovery / launch validation

  • finish migrating remaining hot-path references to service discovery
  • validate provisioning against current API/Agent/Portal assumptions
  • run final cross-component verification of Fabric / readiness / Velocity exposure
  • verify game server subdomain / player connection method

Monitoring / observability

  • normalize game/dev Alloy monitoring contract across API discovery, agent-written Alloy labels, Prometheus targets, and Grafana dashboards
  • keep dynamic game/dev discovery on API -> sync script -> file_sd and verify automatic add/remove behavior for new containers
  • finish game/dev template cleanup so Alloy is standard and node-exporter is removed from those templates
  • keep OPNsense plugin and PBS monitoring as explicit platform exceptions while Linux-managed targets converge on Alloy
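
As a rough illustration of the API -> sync script -> file_sd step, the sketch below rebuilds a Prometheus file_sd target file from a list of discovered containers. The record fields (name, ip, kind), the label names, and port 12345 are assumptions for the example, not the current contract. Only the output shape (a JSON list of objects with "targets" and "labels") is the standard file_sd format.

```python
import json
import os
import tempfile


def build_file_sd(containers, port=12345):
    """Turn discovery records into Prometheus file_sd target groups.

    The record fields and label names here are hypothetical.
    """
    groups = []
    for c in sorted(containers, key=lambda c: c["name"]):
        groups.append({
            "targets": [f"{c['ip']}:{port}"],
            "labels": {"container": c["name"], "kind": c["kind"]},
        })
    return groups


def write_file_sd(path, groups):
    """Write the target file atomically so Prometheus never reads a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(groups, fh, indent=2)
        # mkstemp creates the file as 0600; relax it so another user can read it.
        os.chmod(tmp, 0o644)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Because the file is rebuilt wholesale on every sync, a container missing from the API response drops out of the targets on the next run, which is the add/remove behavior to verify.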

Notifications / launch polish

  • email notifications across backend contract + Portal UX
  • remove stray testdameon / testdaemon binary from Portal repo

Platform / Infrastructure Active

  • upload testing
  • stress testing: k6 IDE load, Minecraft bot load, code-server memory baseline
  • OPNsense audit
  • billing endpoint/path cleanup verification

Backup boundary

  • Agent-owned backups are local, app-aware rollback backups for Minecraft worlds/config
  • PBS / platform backup strategy is the durability / disaster-recovery layer
  • do not track PBS/offsite durability work as agent implementation work unless that ownership changes

Recently Verified / No Longer Considered Blocked

  • local Minecraft backup create/restore works end-to-end on live validation
  • restore creates intentional pre-restore checkpoint and API now starts restore asynchronously instead of holding the full request open
  • backup timestamps are normalized and pre-restore checkpoints are filtered from the default backup list
  • agent-backed file edits create shadow copies for revert and API route/stream forwarding issues were fixed
  • vanilla / fabric runtime split is restored:
    • vanilla = Fabric-based internal profile with proxy/API/config injection
    • fabric = plain Fabric jar delivery only
  • Forge / NeoForge first-start flow now avoids premature readiness gating, applies post-start property enforcement, and restarts through the readiness-aware path
  • current validation indicates Minecraft server creation succeeds across supported runtime variants
  • current validation indicates dev container creation succeeds and hosted IDE access still works after the latest API/Portal runtime and cleanup passes

Platform Future

  • CF Tunnel SSH completion beyond first working path
  • artifact version promotion
  • runtime rollback support
  • Cloudflare R2 for large artifact/mod delivery
  • admin panel
  • referral / dev pipeline reward system
  • uptime history
  • revisit DDoS mitigation later if needed

Cleaning Rule

  • Root keeps only cross-repo/platform work
  • Repo-specific items must be removed from root once they live only in one Codex tracker
  • Completed items should be removed, not left in place as historical clutter
  • Use CURRENT_STATE.md for durable implemented behavior
  • Use DECISIONS.md for settled choices
  • Re-open old items only when there is current evidence they are still unfinished or have regressed