# Open Threads — zlh-grind
This file tracks active but unfinished work.
Keep it short.

---
## Agent (zlh-agent)
### Dev Runtime System
Completed:
- catalog validation implemented
- runtime installs artifact-backed
- install guard implemented
- all installs now fetch from artifact server

Outstanding:
- runtime install verification improvements
- catalog hash validation
- runtime removal / upgrade handling
- runtime update process for dev containers
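
For the outstanding catalog hash validation item, a minimal sketch of one possible check, assuming each catalog entry carries a hex SHA-256 digest for its artifact. The function names and the digest field are hypothetical and not taken from the agent code.

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Hash a stream in chunks so large artifacts are never fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def artifact_matches_catalog(stream: BinaryIO, expected_sha256: str) -> bool:
    """Compare the computed digest with the catalog digest (assumed hex-encoded)."""
    actual = sha256_of_stream(stream)
    return hmac.compare_digest(actual, expected_sha256.strip().lower())


# Usage: an in-memory payload stands in for a downloaded artifact.
payload = b"example runtime artifact"
expected = hashlib.sha256(payload).hexdigest()
print(artifact_matches_catalog(io.BytesIO(payload), expected))
```

An install guard could refuse to unpack any artifact for which this check returns `False`.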
### Dev Environment
Completed:
- dev user creation
- workspace root `/home/dev/workspace`
- console runs as dev user
- `HOME`, `USER`, `LOGNAME`, `TERM` env vars set correctly

Outstanding:
- PATH normalization
- shell profile consistency
- runtime PATH injection
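
For PATH normalization and runtime PATH injection, a rough sketch of one approach: prepend runtime directories, drop empty entries, and remove duplicates while keeping first-seen order. The runtime directory shown is a hypothetical example, not the actual install layout.

```python
from typing import Iterable


def normalize_path(path_value: str, prepend: Iterable[str] = ()) -> str:
    """Return a PATH string with `prepend` entries first and no duplicates.

    Empty entries are dropped on purpose: in POSIX shells an empty PATH
    element means the current directory, which is rarely wanted here.
    """
    seen = set()
    ordered = []
    for entry in list(prepend) + path_value.split(":"):
        if not entry or entry in seen:
            continue
        seen.add(entry)
        ordered.append(entry)
    return ":".join(ordered)


# Usage: inject a (hypothetical) runtime bin directory ahead of system paths.
print(normalize_path("/usr/bin:/bin::/usr/bin", ["/opt/runtimes/node/bin"]))
```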
### Code Server Addon
Status: Installed, running, browser-verified end-to-end.

Outstanding:
- code-server memory baseline
- decide whether default dev RAM should increase or become tier-based
- k6 IDE session load test
### Game Server Supervision
Completed:
- crash recovery with backoff
- crash observability and classification
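
As an illustration of crash recovery with backoff, a minimal sketch of a capped exponential delay schedule. The base, factor, and cap are assumed values; the agent's real parameters are not recorded in this file.

```python
def restart_delay(crash_count: int, base: float = 1.0,
                  factor: float = 2.0, cap: float = 60.0) -> float:
    """Seconds to wait before restart attempt number `crash_count` (1-based)."""
    if crash_count < 1:
        raise ValueError("crash_count must be >= 1")
    # Exponential growth, clamped so a crash loop never waits longer than `cap`.
    return min(cap, base * factor ** (crash_count - 1))


# Usage: delays for the first six consecutive crashes.
print([restart_delay(n) for n in range(1, 7)])
```

A supervisor would typically reset the crash count after the process has stayed up for some stable period.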
### Fabric Readiness Gating
Status: startup rehydrate path fixed in the plugin; happy-path and negative-path startup validation both passed.

Still outstanding:
- confirm whether any remaining agent-side registration path can surface a backend before readiness probe success
- see `SCRATCH/session-stabilization-fabric-findings.md`
### Agent Future Work
1. Structured logging (slog) for Loki
2. Dev container `provisioningComplete` state in `/status`
3. Graceful shutdown verification
4. Process reattachment on agent restart
5. SSH server install in dev container provisioning pipeline
---
## Dev IDE Access
### Browser IDE
Status: fully working, browser-verified, zero-install.

Remaining:
- confirm "Open IDE" button in portal uses hosted URL in production path
- reduce legacy `/__ide/:id` compatibility paths once portal button confirmed
- simplify and harden `devProxy`
### Local Dev Access — SSH via CF Tunnel
Current state:
- tunnel created and connected to bastion VM
- Zero Trust free plan active
- SSH hostname mapping not yet configured
- bastion SSH proxy jump config not yet done
- dev container SSH server not yet verified
- portal SSH config snippet not yet built
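
For the portal SSH config snippet, a sketch of what a generator might emit once hostname mapping and the bastion jump are in place. It assumes the bastion is reached through `cloudflared access ssh` and the container through `ProxyJump`. The host alias `zlh-bastion`, the hostname, and the container address are hypothetical placeholders; only the `dev` user comes from this file.

```python
def render_ssh_config(container_alias: str, container_addr: str,
                      bastion_hostname: str, user: str = "dev") -> str:
    """Render an OpenSSH client config: tunnel to the bastion, then jump."""
    lines = [
        "Host zlh-bastion",
        f"    HostName {bastion_hostname}",
        # cloudflared proxies the SSH connection through the CF Tunnel.
        "    ProxyCommand cloudflared access ssh --hostname %h",
        "",
        f"Host {container_alias}",
        f"    HostName {container_addr}",
        f"    User {user}",
        "    ProxyJump zlh-bastion",
    ]
    return "\n".join(lines) + "\n"


# Usage with placeholder values.
print(render_ssh_config("dev-1234", "10.0.0.5", "ssh.example.com"))
```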
---
## API (zpack-api)
Completed:
- dev provisioning payload
- runtime/version fields
- enable_code_server flag
- status endpoint
- IDE token generation + hosted URL
- bootstrap IDE route
- live tunnel proxy
- host-based routing for hosted IDE
- hosted flow browser-verified end-to-end
- backend lifecycle hardened
- duplicate server creation fixed
- console routing corrected
- control plane switched to IP-based service communication
- Velocity rehydration uses DB + Redis instead of Proxmox live state
- billing foundation added to Prisma/User model (`stripeCustomerId`, `subscriptionStatus`, `plan`)
- migration history drift reconciled cleanly without DB reset
- Stripe sandbox checkout flow now reaches Stripe successfully for a normal test user
- Stripe customer ID persistence works

Outstanding:
- Stripe webhook delivery/reachability for non-public API path
- webhook-driven persistence of `subscriptionStatus` and `plan`
- password reset flow verification
- usage limits / quota enforcement
- simplify and harden host-native `devProxy`
- dev runtime catalog endpoint for portal
- Headscale auth key generation
- service discovery migration for remaining hot-path `internal.zlh` references
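
For webhook-driven persistence of `subscriptionStatus`, a sketch of the mapping step only, written as a pure function over an already-parsed Stripe event. It assumes the `Stripe-Signature` header was verified before this point. Deriving `plan` is left out because the price-to-plan mapping is not described here, and the function name is hypothetical.

```python
from typing import Any, Dict, Optional

# Subscription lifecycle events that change billing state.
HANDLED_EVENTS = {
    "customer.subscription.created",
    "customer.subscription.updated",
    "customer.subscription.deleted",
}


def billing_update_from_event(event: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return the User fields to persist, or None if the event is irrelevant."""
    if event.get("type") not in HANDLED_EVENTS:
        return None
    subscription = event["data"]["object"]
    return {
        "stripeCustomerId": subscription["customer"],
        "subscriptionStatus": subscription["status"],
    }


# Usage with a trimmed-down example event.
example = {
    "type": "customer.subscription.updated",
    "data": {"object": {"customer": "cus_123", "status": "active"}},
}
print(billing_update_from_event(example))
```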
---
## Portal (zpack-portal)
Completed:
- dev runtime dropdown
- dotnet runtime support
- enable code-server checkbox
- dev file browser support
- site copy rewrite
- pricing page updated
- billing page/auth compatibility work started for API v2 billing state

Outstanding:
- confirm "Open IDE" button fully uses hosted URL flow
- SSH config snippet for power users
- user onboarding flow
- email notifications
- finish portal billing page alignment with API v2 (`/api/billing`, checkout redirect, billing-exempt/admin handling, trialing/active states)
- dashboard information architecture refresh — remove or rethink duplicate resource overview/cards
- implement "spotlight server" carousel on dashboard (rotating single server card with status/players, click-through to console)
- remove `testdaemon` binary from repo root
---
## Game Servers
Outstanding:
- game server world backup / restore
- game server subdomain / player connection method verification
---
## Velocity / ZpackVelocityBridge
Completed:
- startup rehydrate now requires `ready == true`
- stale default `zpack-api.internal.zlh` fallback removed
- explicit `ZPACK_REHYDRATE_ENDPOINT` env wiring validated
- happy-path player routing validated live after restart
- negative-path startup validation passed: no backend registered when rehydrate returned zero eligible servers

Outstanding:
- confirm whether any remaining agent-side registration path can surface a backend before readiness probe success
- see `SCRATCH/velocity-plugin.md`

Optional / Nonessential:
- `ZpackCommands` exists in code but is not currently needed operationally
- prefer plugin HTTP status endpoint and host metrics over in-proxy admin commands
- if metrics coverage is sufficient, command registration can remain omitted or the command class can be removed later
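
The `ready == true` rule for startup rehydrate can be illustrated with a small filter. This is a sketch in Python, not the plugin's code. Only the `ready` field comes from this file; the other keys are hypothetical. The check is strict, so a missing flag or the string `"true"` does not count as ready, and an empty result means no backend gets registered.

```python
from typing import Any, Dict, List


def eligible_backends(servers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only servers whose `ready` flag is exactly boolean True."""
    return [server for server in servers if server.get("ready") is True]


# Usage: only the first entry qualifies for registration.
rehydrated = [
    {"name": "survival", "ready": True},
    {"name": "creative", "ready": False},
    {"name": "lobby"},
    {"name": "modded", "ready": "true"},
]
print([server["name"] for server in eligible_backends(rehydrated)])
```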
---
## Pre-Launch Checklist
Outstanding before launch:
- Billing / Stripe integration (webhook delivery + state persistence still pending)
- game server world backup / restore
- user onboarding flow
- password reset flow verification
- usage limits / quota enforcement
- game server subdomain verification
- email notifications
- upload testing
- billing endpoints
- stress testing: k6 IDE + Minecraft bot + code-server memory baseline
- OPNsense audit
- Fabric readiness gating full validation
- service discovery migration
- provisioning validation
- remove `testdaemon` from `zpack-portal`
---
## Platform
Future work:
- CF Tunnel SSH completion
- artifact version promotion
- runtime rollback support
- Cloudflare R2 for large artifact/mod file delivery
- admin panel
- referral / dev pipeline reward system
- uptime history
- revisit DDoS mitigation later if needed
---
## Closed Threads
- PTY console (dev + game)
- mod lifecycle
- upload pipeline
- runtime artifact installs
- dev container filesystem model
- code-server artifact fix
- API status endpoint for frontend agent-state consumption
- game server crash recovery with backoff
- crash observability
- code-server lifecycle endpoints
- code-server process detection
- dev IDE proxy
- hosted wildcard Traefik → API → container IDE flow
- per-container dev IDE edge publish/unpublish removed from API
- wildcard TLS cert `*.zerolaghub.dev`
- browser IDE fully loading at `dev-<vmid>.zerolaghub.dev`
- CF Tunnel created and connected to bastion VM
- portal copy rewrite
- DDoS investigation
- hosting provider decision: GTHost Detroit
- migration to Detroit complete
- system stabilization after migration
- IP-based control plane
- Velocity startup rehydrate fixed and validated on happy path