Open Threads — zlh-grind
This file tracks active but unfinished work.
Keep it short.
Agent (zlh-agent)
Dev Runtime System
Completed:
- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented
- all installs now fetch from artifact server
Outstanding:
- runtime install verification improvements
- catalog hash validation
- runtime removal / upgrade handling
- runtime update process for dev containers
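Catalog hash validation is still open. A minimal sketch of one approach, assuming each catalog entry carries a hex SHA-256 digest under a hypothetical `sha256` field (the real catalog schema may differ): stream the downloaded artifact through hashlib and refuse the install on any mismatch before anything is unpacked.

```python
import hashlib
from pathlib import Path


class ChecksumMismatch(Exception):
    """Raised when a downloaded artifact does not match its catalog digest."""


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large runtime tarballs are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, catalog_entry: dict) -> None:
    """Fail closed: a missing or mismatched digest blocks the install.

    catalog_entry["sha256"] is a hypothetical field name used for illustration.
    """
    expected = catalog_entry.get("sha256")
    if not expected:
        # Treat a missing digest as a catalog error instead of silently skipping the check.
        raise ChecksumMismatch(f"no sha256 recorded for {path.name}")
    actual = sha256_file(path)
    if actual != expected.lower():
        raise ChecksumMismatch(f"{path.name}: expected {expected}, got {actual}")
```

Failing closed on a missing digest is a deliberate choice here: with artifact-backed installs, an entry without a hash is more likely a catalog bug than an intentional exemption.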
Dev Environment
Completed:
- dev user creation
- workspace root /home/dev/workspace
- console runs as dev user
- HOME, USER, LOGNAME, TERM env vars set correctly
Outstanding:
- PATH normalization
- shell profile consistency
- runtime PATH injection
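PATH normalization and runtime PATH injection are still open. One possible shape, sketched in Python purely for illustration: build the console environment from the dev user's identity, put runtime bin directories ahead of the system PATH, and drop blank and duplicate entries. The default PATH string, the TERM value, and the runtime directories passed in are all assumptions, not the agent's current behavior.

```python
# Common Debian/Ubuntu default PATH; an assumption about the container base image.
DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"


def normalize_path(entries) -> str:
    """Join PATH entries, dropping blanks and later duplicates (first occurrence wins)."""
    seen = set()
    cleaned = []
    for entry in entries:
        entry = entry.strip()
        if entry and entry not in seen:
            seen.add(entry)
            cleaned.append(entry)
    return ":".join(cleaned)


def dev_console_env(runtime_bins, user: str = "dev", term: str = "xterm-256color") -> dict:
    """Environment for a dev console session.

    runtime_bins go ahead of the system PATH so a catalog-installed runtime
    shadows any distro copy; the directories themselves are hypothetical here.
    """
    return {
        "HOME": f"/home/{user}",
        "USER": user,
        "LOGNAME": user,
        "TERM": term,
        "PATH": normalize_path([*runtime_bins, *DEFAULT_PATH.split(":")]),
    }
```

Generating the same dict for the PTY console, code-server, and any future SSH session is one way to get the shell profile consistency listed above without relying on each shell's rc files.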
Dev Container Backups
Recommendation:
- implement dev backups as workspace snapshots, not whole-container backups
- primary scope should be /home/dev/workspace
- restore should rebuild from config, then restore the workspace snapshot
- treat dotfiles / user settings as optional follow-up, not default backup scope
- avoid backing up reproducible runtime payloads and caches by default
- plan remote storage early for dev backups so node loss does not equal workspace loss
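A minimal sketch of the recommended scope using Python's tarfile: archive only /home/dev/workspace and filter out reproducible payloads. The exclusion set below is an assumption about what counts as reproducible, not a decided policy.

```python
import tarfile
from pathlib import Path

# Hypothetical exclusion set: path components that can be rebuilt from config or a lockfile.
REPRODUCIBLE_DIRS = {"node_modules", ".venv", "__pycache__", ".cache", "target"}


def _skip_reproducible(info: tarfile.TarInfo):
    """tarfile filter: drop any member whose path contains an excluded component.

    Returning None for a directory also stops tarfile from recursing into it.
    """
    if any(part in REPRODUCIBLE_DIRS for part in Path(info.name).parts):
        return None
    return info


def snapshot_workspace(workspace: Path, dest: Path) -> Path:
    """Write a gzip tarball of the workspace, rooted at 'workspace/' inside the archive."""
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(workspace, arcname="workspace", filter=_skip_reproducible)
    return dest
```

Rooting the archive at a fixed `workspace/` prefix keeps restore independent of where the source container mounted its home directory, which fits the "rebuild from config, then restore" order.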
Code Server Addon
Status: Installed, running, browser-verified end-to-end.
Outstanding:
- code-server memory baseline
- decide whether default dev RAM should increase or become tier-based
- k6 IDE session load test
Game Server Supervision
Completed:
- crash recovery with backoff
- crash observability and classification
- unified readiness-aware start / restart path across manual start, restart, autostart, and supervisor recovery
- dead duplicate crash monitor removed
- /ready endpoint added and operation/maintenance state surfaced through /status
- guarded operation lock added for mutating / stateful flows
- console command endpoint hardened
- agent self-update rollback symlink handling corrected
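For reference, crash recovery with backoff usually means capped exponential delays that reset after a stable run. The sketch below shows that pattern in Python; the base delay, cap, and stability window are illustrative values, not the agent's actual settings.

```python
from dataclasses import dataclass


@dataclass
class RestartBackoff:
    """Capped exponential backoff for supervisor restarts (illustrative defaults)."""
    base: float = 2.0            # seconds to wait before the first restart
    cap: float = 300.0           # upper bound on any single wait
    stable_after: float = 600.0  # an uptime this long resets the failure streak
    failures: int = 0            # consecutive short-lived runs so far

    def next_delay(self, ran_for: float) -> float:
        """Seconds to wait before restarting, given how long the crashed run stayed up."""
        if ran_for >= self.stable_after:
            self.failures = 0
        # Clamp the exponent so a long crash loop cannot overflow the float math.
        exponent = min(self.failures, 32)
        delay = min(self.cap, self.base * (2 ** exponent))
        self.failures += 1
        return delay
```

Resetting on uptime rather than on a wall-clock timer means a server that crashes once a day always restarts quickly, while a tight crash loop backs off to the cap.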
Game Server Backups
Completed:
- first guarded Minecraft backup flow implemented in agent
- local backup create/list/restore endpoints added
- local backup delete endpoint added
- live backup uses save-all flush -> save-off -> archive -> save-on
- restore stops server, waits for exit, restores manifest-declared paths, then restarts through readiness-aware path
- pre-restore checkpoint hardening implemented (Apr 16 2026)
- backup metadata fields added: type and optional reason
- manual backups record type: "manual"
- restore creates a local checkpoint backup with reason: "pre_restore" before any destructive operation
- checkpoint creation aborts the restore if it fails, so live data is never touched
- checkpoint creation disables pruning so safety backup is not immediately removed
- collision-safe backup IDs for same-second creation
- POST /game/backups/restore response now includes restored, backup, checkpoint
- full test coverage: checkpoint creation, abort-before-delete, manual metadata, listability, unsafe restore rejection
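The live-backup and restore ordering above can be summarized in a short sketch. Python is used only for illustration; `send`, `archive`, `create_checkpoint`, `stop_and_wait`, `restore_paths`, and `start_when_ready` are hypothetical callables standing in for the agent's real console, archive, and lifecycle code, and the timestamp ID format is an assumption.

```python
from datetime import datetime


def new_backup_id(existing: set, now: datetime) -> str:
    """Timestamp ID with a numeric suffix so two backups in the same second never collide."""
    base = now.strftime("%Y%m%dT%H%M%SZ")
    candidate, n = base, 1
    while candidate in existing:
        candidate = f"{base}-{n}"
        n += 1
    return candidate


def live_backup(send, archive):
    """Quiesce world saves around the archive step; save-on runs even if archiving fails."""
    send("save-all flush")
    # A real implementation would likely wait for the server to confirm the flush here.
    send("save-off")
    try:
        return archive()
    finally:
        send("save-on")


def restore(backup_id, create_checkpoint, stop_and_wait, restore_paths, start_when_ready):
    """Checkpoint first; if that raises, nothing destructive has happened yet."""
    # Assumed to record reason="pre_restore" and skip pruning for the new backup.
    checkpoint_id = create_checkpoint(reason="pre_restore")
    stop_and_wait()
    restore_paths(backup_id)  # only manifest-declared paths are replaced
    start_when_ready()        # same readiness-aware path as a manual start
    return {"restored": True, "backup": backup_id, "checkpoint": checkpoint_id}
```

The try/finally in `live_backup` is the important part: if archiving fails, leaving the server in save-off would silently stop world saves until the next restart.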
Outstanding:
- remote backup storage / transfer path
- backup job history / progress beyond current operation state
- retention policy refinement beyond initial local pruning
- real-world validation on live Minecraft server
Fabric Readiness Gating
Status: startup rehydrate path fixed in plugin; both happy-path and negative-path startup validation passed.
Still outstanding:
- confirm whether any remaining agent-side registration path can surface a backend before readiness probe success
- see SCRATCH/session-stabilization-fabric-findings.md
Agent Future Work (post-launch)
- Structured logging (slog) for Loki
- Dev container provisioningComplete state in /status
- Graceful shutdown verification
- Process reattachment on agent restart
- SSH server install in dev container provisioning pipeline
- Long-running job model (job IDs, progress phases, cancel/retry)
- Typed platform-action wrappers over raw console commands
- Persistent operation recovery after agent restart
- RestartServer() readiness probe bypass: fix or document
Dev IDE Access
Browser IDE
Status: fully working, browser-verified, zero-install.
Remaining:
- confirm "Open IDE" button in portal uses hosted URL in production path
- reduce legacy /__ide/:id compatibility paths once portal button confirmed
- simplify and harden devProxy
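Until the legacy path is retired, the proxy has to accept both forms. A minimal sketch of resolving the target container from either the hosted hostname (dev-<vmid>.zerolaghub.dev) or the legacy /__ide/:id path; the assumption that vmid is purely numeric is hypothetical and may not match the real ID scheme.

```python
import re
from typing import Optional

# Anchored at both ends so look-alike hosts such as dev-1.zerolaghub.dev.example.com never match.
HOSTED_IDE = re.compile(r"^dev-(\d+)\.zerolaghub\.dev$")
LEGACY_IDE = re.compile(r"^/__ide/(\d+)(?:/|$)")


def resolve_ide_vmid(host: str, path: str) -> Optional[str]:
    """Return the target vmid, preferring the hosted hostname over the legacy path."""
    hostname = host.split(":", 1)[0].lower()  # strip any :port before matching
    m = HOSTED_IDE.match(hostname)
    if m:
        return m.group(1)
    m = LEGACY_IDE.match(path)
    if m:
        return m.group(1)  # legacy compatibility route, slated for removal
    return None
```

Keeping the legacy branch isolated in one place makes its removal a one-line change once the portal button is confirmed on the hosted flow.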
Local Dev Access — SSH via CF Tunnel
Current state:
- tunnel created and connected to bastion VM
- Zero Trust free plan active
- SSH hostname mapping not yet configured
- bastion SSH proxy jump config not yet done
- dev container SSH server not yet verified
- portal SSH config snippet not yet built
API (zpack-api)
Completed:
- dev provisioning payload
- runtime/version fields
- enable_code_server flag
- status endpoint
- IDE token generation + hosted URL
- bootstrap IDE route
- live tunnel proxy
- host-based routing for hosted IDE
- hosted flow browser-verified end-to-end
- backend lifecycle hardened
- duplicate server creation fixed
- console routing corrected
- control plane switched to IP-based service communication
- Velocity rehydration uses DB + Redis instead of Proxmox live state
- Stripe webhook delivery/reachability fixed via public billing hostname
- webhook-driven persistence of billing state (subscriptionStatus, plan)
- billing page/API alignment for active state and Stripe portal flow
- direct in-app plan upgrade endpoint (/api/billing/upgrade)
- direct in-app plan downgrade scheduling endpoint (/api/billing/downgrade)
- persisted billing fields for currentPeriodEnd, lastInvoicePaidAt, billingSyncedAt
- persisted scheduled downgrade state (scheduledPlan, scheduledPlanEffectiveAt)
- plan-based quota enforcement in POST /api/instances
- password reset request + confirm flow implemented
- agent contract updated for POST control actions, /ready, operation state, and backup routes
- agent transport consolidated into shared agentClient.js
- semantic readiness split implemented with shared isAgentReadyResult()
- API backup forwarding added for list / create / restore / delete
- agent 409 conflict and readiness-oriented error handling preserved instead of collapsing to generic 500
- Velocity routing made more conservative around missing readiness
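Preserving agent 409s and readiness failures lets the portal show targeted messaging instead of a generic error. A minimal Python sketch of that mapping; the real logic lives in agentClient.js and isAgentReadyResult(), and the result shape used here (a status plus a body with a `ready` flag) is a hypothetical stand-in, not that module's contract.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentResult:
    """Hypothetical normalized agent response; field names are illustrative only."""
    status: Optional[int]  # agent HTTP status, or None if the agent was unreachable
    body: dict = field(default_factory=dict)


def is_agent_ready(result: AgentResult) -> bool:
    """Semantic readiness: reachable AND explicitly reporting ready, not merely 'process up'."""
    return result.status == 200 and result.body.get("ready") is True


def api_status_for(result: AgentResult) -> int:
    """Map an agent response to the status the API returns to the portal."""
    if result.status is None:
        return 503  # agent unreachable: surface as not ready
    if result.status == 409:
        return 409  # operation lock held: pass the conflict through unchanged
    if result.status == 503:
        return 503  # agent reports not ready yet
    if 200 <= result.status < 300:
        return result.status
    return 500      # anything else still surfaces as a generic failure
```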
Outstanding:
- simplify and harden host-native devProxy
- dev runtime catalog endpoint for portal
- Headscale auth key generation
- service discovery migration for remaining hot-path internal.zlh references
- normalize backup response shape now that the portal tolerates multiple field names
Portal (zpack-portal)
Completed:
- dev runtime dropdown
- dotnet runtime support
- enable code-server checkbox
- dev file browser support
- site copy rewrite
- pricing page updated
- billing page aligned with API v2 billing state
- honest Stripe portal section with single portal CTA
- in-app Basic → Pro upgrade wiring
- in-app Pro → Basic scheduled downgrade wiring
- quota/limit messaging on create flow with billing upgrade guidance
- forgot-password + reset-password pages and login linkage
- first-login onboarding modal with quick/full tour and skip
- dashboard IA refresh: spotlight server card replaces duplicate mini-listing
- operation / maintenance state surfaced in game server UI
- first backup UI added for list / create / restore
- backup delete UI added with destructive confirmation
- targeted 409/503 messaging added for operation conflict and not-ready states
- console command submission updated to POST JSON
- console page action gating fixed so stopped MC servers remain startable while console send stays gated to running + ready
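The gating rule in the last item is small enough to state as two predicates. A sketch with hypothetical state names (`stopped`, `crashed`, `running`); the portal's real state model may differ, and blocking start while an operation holds the lock is an added assumption.

```python
# Hypothetical lifecycle state names; the portal's real state model may differ.
STARTABLE_STATES = {"stopped", "crashed"}


def console_controls(state: str, ready: bool, operation_active: bool = False) -> dict:
    """Enable/disable console-page controls for a Minecraft server.

    Start depends only on lifecycle state (plus an assumed check that no
    operation holds the lock); readiness is not required to start a server.
    Sending a console command requires the server to be running AND ready.
    """
    return {
        "start": state in STARTABLE_STATES and not operation_active,
        "send": state == "running" and ready,
    }
```

Keeping readiness out of the start predicate is what fixes the original bug: a stopped server is never ready, so gating start on readiness made it unstartable.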
Outstanding:
- confirm "Open IDE" button fully uses hosted URL flow
- SSH config snippet for power users
- email notifications
- remove testdaemon binary from repo root
Game Servers
Completed:
- first local Minecraft backup / restore flow wired end-to-end through agent, API, and portal
- manual local backup delete wired end-to-end through agent, API, and portal
- pre-restore checkpoint hardening complete in agent
Outstanding:
- remote storage for game server backups
- real-world backup/restore validation on live Minecraft server
- game server subdomain / player connection method verification
Velocity / ZpackVelocityBridge
Completed:
- startup rehydrate now requires ready == true
- stale default zpack-api.internal.zlh fallback removed
- explicit ZPACK_REHYDRATE_ENDPOINT env wiring validated
- happy-path player routing validated live after restart
- negative-path startup validation passed: no backend registered when rehydrate returned zero eligible servers
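The rehydrate rule reduces to a strict filter: only servers whose record says ready == true become backends, and an empty result registers nothing. Sketched in Python for illustration only (the plugin itself runs on Velocity), with hypothetical record fields `name`, `host`, `port`, and `ready`.

```python
from typing import Iterable, List, Tuple


def eligible_backends(records: Iterable[dict]) -> List[Tuple[str, str, int]]:
    """Return (name, host, port) for servers explicitly marked ready.

    Only the boolean True counts: missing, null, or string "true" values are
    treated as not ready, so ambiguous state never produces a backend.
    """
    backends = []
    for rec in records:
        if rec.get("ready") is not True:
            continue
        backends.append((rec["name"], rec["host"], int(rec["port"])))
    return backends
```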
Outstanding:
- confirm whether any remaining agent-side registration path can surface a backend before readiness probe success
- see SCRATCH/velocity-plugin.md
Optional / Nonessential:
- ZpackCommands exist in code but are not currently needed operationally
- prefer plugin HTTP status endpoint and host metrics over in-proxy admin commands
- if metrics coverage is sufficient, command registration can remain omitted or the command class can be removed later
Pre-Launch Checklist
Outstanding before launch:
- remote storage for game server backups
- real-world backup/restore validation on live Minecraft server
- game server subdomain verification
- email notifications
- upload testing
- billing endpoint/path cleanup verification
- stress testing: k6 IDE + Minecraft bot + code-server memory baseline
- OPNsense audit
- Fabric readiness gating full validation
- service discovery migration
- provisioning validation
- remove testdaemon from zpack-portal
Platform
Future work:
- CF Tunnel SSH completion
- artifact version promotion
- runtime rollback support
- Cloudflare R2 for large artifact/mod file delivery
- admin panel
- referral / dev pipeline reward system
- uptime history
- revisit DDoS mitigation later if needed
Closed Threads
- PTY console (dev + game)
- mod lifecycle
- upload pipeline
- runtime artifact installs
- dev container filesystem model
- code-server artifact fix
- API status endpoint for frontend agent-state consumption
- game server crash recovery with backoff
- crash observability
- code-server lifecycle endpoints
- code-server process detection
- dev IDE proxy
- hosted wildcard Traefik → API → container IDE flow
- per-container dev IDE edge publish/unpublish removed from API
- wildcard TLS cert *.zerolaghub.dev
- browser IDE fully loading at dev-<vmid>.zerolaghub.dev
- CF Tunnel created and connected to bastion VM
- portal copy rewrite
- DDoS investigation
- hosting provider decision: GTHost Detroit
- migration to Detroit complete
- system stabilization after migration
- IP-based control plane
- Velocity startup rehydrate fixed and validated on happy path
- billing / Stripe webhook delivery and persistence
- password reset flow
- usage limits / quota enforcement
- user onboarding flow
- dashboard spotlight server IA refresh
- pre-restore checkpoint hardening (agent-side, Apr 16 2026)