Open Threads — zlh-grind

This file tracks active cross-repo and platform-level work only.

Repo-specific work belongs in:

  • Codex/API/OPEN_ITEMS.md
  • Codex/Portal/OPEN_ITEMS.md
  • Codex/Agent/OPEN_ITEMS.md
  • Codex/Monitoring/OPEN_ITEMS.md

Keep this file short.


Cross-Repo Active

Final launch smoke test

  • create Minecraft server
  • confirm it reaches Ready / connectable=true
  • verify public game hostname is shown only when connectable
  • upload datapack on vanilla or install mod on supported modded runtime
  • create backup
  • restore backup
  • stop/start/restart host lifecycle actions
  • delete server
  • confirm Velocity unregister, Cloudflare cleanup, and Technitium cleanup

Portal public-site QA

  • marketing site now has hybrid SaaS structure and SEO landing pages; verify visually before public push
  • check desktop and mobile layouts for Home, Features, Pricing, FAQ, About, Support, and SEO landing pages
  • confirm public CTAs route correctly to Register, Login, Pricing, Features, FAQ, and SEO pages
  • confirm root metadata, page metadata, and hero copy reflect browser dev environments + managed server hosting
  • mobile web is not yet considered optimized; perform a targeted usability pass before launch marketing traffic

Backup / restore polish

  • happy-path local Minecraft backup create/restore has been verified live
  • API restore starts asynchronously and Portal polls restore status
  • keep restore wording/status transitions clear through completion and restart
  • confirm checkpoint metadata presentation remains clean when exposed to Portal
  • later hardening: persist last restore failure/checkpoint state in Agent /status
  • later hardening: automatic rollback from pre-restore checkpoint if restore apply/start fails after destructive replace
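The async restore plus polling flow above can be sketched roughly as follows. This is an illustration only: the status strings (`queued`, `running`, `completed`, `failed`), the `poll_restore` helper, and the `fetch_status` callable are hypothetical and not taken from the actual API or Portal code.

```python
import time
from typing import Callable

# Hypothetical terminal status values; the real API contract may differ.
TERMINAL_STATES = {"completed", "failed"}


def poll_restore(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the restore reaches a terminal state or the timeout elapses.

    fetch_status stands in for whatever call returns the current restore
    status; it is injected so the loop can be exercised without a network.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            # Surface a distinct outcome so the UI wording can stay clear.
            return "timeout"
        sleep(interval_s)
        waited += interval_s
```

Returning a distinct `timeout` outcome, rather than treating it as a failure, is one way to keep the restore wording and status transitions unambiguous for the user.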

Service discovery / launch validation

  • service discovery migration audit for remaining non-launch hot-path references
  • provisioning validation across current API/Agent/Portal assumptions
  • keep public exposure model explicit:
    • Portal public
    • Minecraft game hostnames public as needed
    • API/control plane/internal bridge/agent/admin services private

Monitoring / observability

  • core lifecycle monitoring is launch-ready
  • /etc/zlh-monitor is now the operational monitoring source of truth
  • game/dev monitoring uses API discovery -> monitor sync -> file_sd for lifecycle inventory and add/remove validation
  • container Alloy remote-write to Prometheus 10.60.0.25:9090 is the canonical game/dev metrics path
  • game-dev-alloy scrape health also works because container Alloy now listens on 0.0.0.0:12345
  • remaining future work lives in Codex/Monitoring/OPEN_ITEMS.md: centralized logs/Loki and optional OPNsense router-only monitoring
  • keep OPNsense plugin/PBS monitoring as explicit platform exceptions while Linux-managed game/dev targets converge on Alloy
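As a rough illustration of the API discovery -> monitor sync -> file_sd step, the sketch below turns a discovery list into a Prometheus file_sd document (a JSON list of objects with `targets` and `labels`). The input field names (`id`, `ip`, `kind`, `active`), the label names, and the `build_file_sd` helper are assumptions for the example, not the actual monitor sync code; port 12345 is the container Alloy listen port noted above.

```python
import json
from typing import Dict, List


def build_file_sd(servers: List[Dict], port: int = 12345) -> str:
    """Render discovered servers as a Prometheus file_sd JSON document.

    Each entry in `servers` is a hypothetical discovery record; inactive
    records are dropped so removed servers disappear from the inventory.
    """
    groups = []
    for server in sorted(servers, key=lambda s: str(s["id"])):
        if not server.get("active", True):
            continue
        groups.append(
            {
                "targets": [f'{server["ip"]}:{port}'],
                "labels": {
                    "server_id": str(server["id"]),
                    "kind": server["kind"],  # e.g. "game" or "dev"
                },
            }
        )
    # Stable ordering keeps the written file diff-friendly between syncs.
    return json.dumps(groups, indent=2, sort_keys=True)
```

Comparing the rendered output before and after a create or delete gives a simple add/remove validation check.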

Notifications / launch polish

  • email notifications across backend contract + Portal UX
  • billing launch validation:
    • plan limit gating verified in Portal
    • still verify checkout/portal/webhook/upgrade-downgrade if Stripe is live

Platform / Infrastructure Active

  • stress testing: k6 IDE load, Minecraft bot load, code-server memory baseline
  • OPNsense / public exposure audit
  • billing endpoint/path cleanup verification

Backup boundary

  • Agent-owned backups are local, app-aware rollback backups for Minecraft worlds/config
  • PBS / platform backup strategy is the durability / disaster-recovery layer
  • do not track PBS/offsite durability work as agent implementation work unless that ownership changes

Recently Verified / No Longer Considered Blocked

  • password reset and logged-in change-password work end-to-end
  • password reset tokens expire after 5 minutes, are hashed at rest and single-use, and old unused tokens are invalidated on deploy
  • API-owned Minecraft connection state derives from agent readiness, edge/DNS state, Velocity registration, and backend ping
  • Velocity proxy lifecycle callbacks are live with registered_with_proxy and proxy_ping_ok landing in API state
  • Portal consumes API-owned connectable / connection state and no longer infers Minecraft readiness itself
  • Portal server creation redirects to /servers and tracks setup progress there
  • Portal status labels no longer treat all non-connectable states as Needs attention
  • Portal public marketing site now has hybrid conversion + SEO structure
  • Portal pricing tiers are now Starter / Pro / Performance workload tiers rather than Minecraft-only tier names
  • Portal root metadata, homepage hero copy, and fake CLI line were corrected to match actual product capabilities
  • SEO landing pages were added for Minecraft hosting, modded Minecraft hosting, and browser dev environments
  • local Minecraft backup create/restore works end-to-end on live validation
  • restore creates an intentional pre-restore checkpoint, and the API now starts restore asynchronously instead of holding the full request open
  • backup timestamps are normalized and pre-restore checkpoints are filtered from the default backup list
  • agent-backed file edits create shadow copies for revert; API route/stream forwarding issues were fixed
  • vanilla datapack upload works
  • vanilla Mods UI is hidden and direct vanilla mods/ upload is rejected by API
  • NeoForge mod search/install/list works
  • delete/teardown lifecycle removes Velocity, Cloudflare, and Technitium records
  • public exposure model is in place: Portal public, control plane private
  • vanilla / fabric runtime split is restored:
    • vanilla = Fabric-based internal profile with proxy/API/config injection
    • fabric = plain Fabric jar delivery only
  • Forge / NeoForge first-start flow avoids premature readiness gating, applies post-start property enforcement, and restarts through the readiness-aware path
  • current validation indicates Minecraft server creation succeeds across supported runtime variants
  • current validation indicates dev container creation succeeds and hosted IDE access still works after the latest API/Portal runtime and cleanup passes
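For reference, the API-owned connection state listed above can be pictured as a conjunction of the four signals. This is a minimal sketch, not the API implementation: `registered_with_proxy` and `proxy_ping_ok` are the names used in this file, while `agent_ready`, `edge_dns_ready`, `ConnectionSignals`, `first_blocker`, and `is_connectable` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConnectionSignals:
    agent_ready: bool            # hypothetical name for agent readiness
    edge_dns_ready: bool         # hypothetical name for edge/DNS state
    registered_with_proxy: bool  # Velocity registration callback
    proxy_ping_ok: bool          # backend ping through the proxy


def first_blocker(signals: ConnectionSignals) -> Optional[str]:
    """Return the name of the first unmet signal, or None if all are met."""
    checks = [
        ("agent_ready", signals.agent_ready),
        ("edge_dns_ready", signals.edge_dns_ready),
        ("registered_with_proxy", signals.registered_with_proxy),
        ("proxy_ping_ok", signals.proxy_ping_ok),
    ]
    for name, ok in checks:
        if not ok:
            return name
    return None


def is_connectable(signals: ConnectionSignals) -> bool:
    """A server is connectable only when every signal is satisfied."""
    return first_blocker(signals) is None
```

Exposing the first unmet signal alongside the boolean is one way a client could show a specific setup step instead of a generic Needs attention label.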

Platform Future / Phase 2

  • SSH / CF tunnel power-user access
  • Portal SSH config snippets
  • true interactive shell confinement / workspace-boundary decision
  • dev-container backup ownership and user-facing restore contract
  • likely direction for dev backups: LXC snapshot-based backup/restore instead of agent-managed dev backups
  • artifact version promotion
  • runtime rollback support
  • Cloudflare R2 for large artifact/mod delivery
  • admin panel
  • referral / dev pipeline reward system
  • uptime history
  • revisit DDoS mitigation later if needed

Cleaning Rule

  • Root keeps only cross-repo/platform work
  • Repo-specific items must be removed from root once they live only in one Codex tracker
  • Completed items should be removed, not left in place as historical clutter
  • Use CURRENT_STATE.md for durable implemented behavior
  • Use DECISIONS.md for settled choices
  • Re-open old items only when there is current evidence they are still unfinished or have regressed