Update bootstrap to reflect Apr 2026 progress — billing, resolved blockers, portal, runtime, observability

jester 2026-04-30 21:35:13 +00:00
parent 5d0e3c8d27
commit 9f9b88b662


@@ -1,9 +1,9 @@
# ZeroLagHub — Master Bootstrap (April 2026) # ZeroLagHub — Master Bootstrap (April 2026)
**For**: Claude (strategic/architecture sessions) **For**: Claude (strategic/architecture sessions)
**Last Updated**: April 7, 2026 **Last Updated**: April 30, 2026
**Supersedes**: ZeroLagHub_Master_Bootstrap_Dec2025.md **Supersedes**: Previous April 7, 2026 version
**Source refs**: zlh-grind/SCRATCH/handover-apr-2026.md, zlh-grind/PROJECT_CONTEXT.md, zlh-grind/OPEN_THREADS.md **Source refs**: zlh-grind/SCRATCH/handover-apr-2026.md, zlh-grind/OPEN_THREADS.md, zlh-grind/SCRATCH/billing-stripe-handover-apr11-2026.md
--- ---
@@ -20,20 +20,21 @@
1. Read this file 1. Read this file
2. Read `ZeroLagHub_Cross_Project_Tracker.md` (locked decisions + boundaries) 2. Read `ZeroLagHub_Cross_Project_Tracker.md` (locked decisions + boundaries)
3. Reference `INFRASTRUCTURE.md` in `zlh-grind` for current IPs/VMs 3. Reference `INFRASTRUCTURE.md` in `zlh-grind` for current IPs/VMs
4. Read `zlh-grind/OPEN_THREADS.md` for current active work
--- ---
## What ZeroLagHub Is ## What ZeroLagHub Is
Game server hosting platform targeting modded and indie Minecraft. Build-and-run platform for developers, modders, and game communities. Browser-based dev environments + managed game servers + developer-to-player pipeline in one platform.
**Competitive advantages**: **Competitive advantages**:
- LXC containers (20-30% performance over Docker) - LXC containers (20-30% performance over Docker) — system-container infrastructure, not Docker overhead
- Custom Go agent architecture — purpose-built, not adapted generic tooling - Custom Go agent architecture — purpose-built, not adapted generic tooling
- Open-source stack (30-40% cost advantage) - Open-source stack (30-40% cost advantage)
- Developer-to-player pipeline: mod devs become a distribution channel - Developer-to-player pipeline: mod devs become a distribution channel
**System posture as of Apr 2026**: Stable. Controlled expansion phase. Detroit migration complete. **System posture as of Apr 30, 2026**: Stable. Pre-launch validation phase. Most core workflows verified end-to-end.
--- ---
@@ -79,7 +80,7 @@ Supermicro 2029TP-HTR, Xeon Gold 6152 22c/44t, 192GB DDR4, 2×1.92TB SSD, $99/mo
**API (zpack-api, VM 9020)**: Node.js ESM, Express 5, Prisma 6, MariaDB, Redis, BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware **API (zpack-api, VM 9020)**: Node.js ESM, Express 5, Prisma 6, MariaDB, Redis, BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware
**Portal (zpack-portal, VM 9021)**: Next.js 15, TypeScript, TailwindCSS, Axios, WebSocket console. Sci-fi HUD aesthetic (steel textures, neon accents, beveled panels). **Portal (zpack-portal, VM 9021)**: Next.js 15, TypeScript, TailwindCSS, Axios, WebSocket console. Hybrid marketing + SaaS structure with SEO landing pages. Pricing tiers: Starter / Pro / Performance.
**Agent (zlh-agent)**: Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket. Runs inside every game/dev container. Only process with direct filesystem access. Pulls runtimes + server jars from zlh-artifacts (VM 9014). **Agent (zlh-agent)**: Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket. Runs inside every game/dev container. Only process with direct filesystem access. Pulls runtimes + server jars from zlh-artifacts (VM 9014).
@@ -135,6 +136,11 @@ Flow: frontend calls `POST /api/dev/:id/ide-token` → API returns hosted URL
**Pipeline**: Valheim, Palworld, Vintage Story, Core Keeper **Pipeline**: Valheim, Palworld, Vintage Story, Core Keeper
**Target market**: Modded/indie games vs mainstream providers **Target market**: Modded/indie games vs mainstream providers
### Minecraft Runtime Split (Verified)
- `vanilla` = Fabric-based internal profile with proxy/API/config injection (FabricProxy-Lite pre-seeded)
- `fabric` = plain Fabric jar delivery only
- Forge/NeoForge first-start flow avoids premature readiness gating, applies post-start property enforcement, restarts through readiness-aware path
--- ---
## Developer-to-Player Revenue Pipeline ## Developer-to-Player Revenue Pipeline
@@ -152,19 +158,34 @@ Revenue multiplier: 1 developer → ~10 players → $147.50/mo total from one de
--- ---
## Current System State (April 7, 2026) ## Current System State (April 30, 2026)
**System is stable and deterministic** following stabilization work this week. **System is stable and pre-launch validated** across core workflows.
### Resolved ### Resolved since last bootstrap (Apr 7)
- ✅ Duplicate server creation (was frontend issue, not API)
- ✅ DB/Redis state drift from Denver→Detroit migration
- ✅ Console routing (browser → API → agent)
- ✅ IP-based control plane (internal.zlh removed from hot paths)
- ✅ Velocity rehydration (uses DB + Redis, not Proxmox live state)
- ✅ Denver fully decommissioned
### Active Issue: Fabric Readiness Gating (agent fix needed) - ✅ Password reset — 5-minute tokens, hashed at rest, single-use, old tokens invalidated on deploy
- ✅ Billing / Stripe — checkout flow works, customer creation works, Stripe sandbox confirmed, Portal billing UI working. **Remaining**: webhook delivery (Stripe cannot reach API — not yet publicly exposed). Once webhook is live, billing is functionally complete.
- ✅ Vanilla/Fabric runtime split restored and validated
- ✅ Forge/NeoForge first-start flow correct end-to-end
- ✅ Delete/teardown lifecycle removes Velocity, Cloudflare, and Technitium records
- ✅ Portal consumes API-owned `connectable`/`connection` state — no longer infers Minecraft readiness itself
- ✅ Velocity proxy lifecycle callbacks live with `registered_with_proxy` and `proxy_ping_ok` in API state
- ✅ Portal status labels fixed — non-connectable states no longer all show "Needs attention"
- ✅ Portal server creation redirects to `/servers` and tracks setup progress there
- ✅ Portal public marketing site: hybrid SEO structure, Starter/Pro/Performance pricing tiers, root metadata fixed, hero copy fixed, fake CLI line removed
- ✅ SEO landing pages added: `/minecraft-server-hosting`, `/modded-minecraft-hosting`, `/browser-dev-environment`
- ✅ Local Minecraft backup create/restore verified live end-to-end
- ✅ Restore creates intentional pre-restore checkpoint; API starts restore asynchronously
- ✅ Backup timestamps normalized; pre-restore checkpoints filtered from default backup list
- ✅ Vanilla datapack upload works; direct vanilla `mods/` upload rejected by API
- ✅ NeoForge mod search/install/list works
- ✅ Agent-backed file edits create shadow copies for revert; API route/stream forwarding issues fixed
- ✅ Public exposure model in place: Portal public, control plane private
- ✅ Dev container creation succeeds; hosted IDE access verified post-cleanup passes
- ✅ Minecraft server creation succeeds across supported runtime variants
### Still Active: Fabric Readiness Gating (agent fix needed)
**Root cause**: Velocity registration fires before Fabric server is ready to accept proxy traffic → "proxy starting" errors until Velocity restart. **Root cause**: Velocity registration fires before Fabric server is ready to accept proxy traffic → "proxy starting" errors until Velocity restart.
**Fix**: Gate Velocity registration behind TCP probe success (port 25565). Agent already runs probe — registration must fire after probe returns success, not at process start. **Fix**: Gate Velocity registration behind TCP probe success (port 25565). Agent already runs probe — registration must fire after probe returns success, not at process start.
**Scope**: Agent only. No Velocity or FabricProxy-Lite changes needed. **Scope**: Agent only. No Velocity or FabricProxy-Lite changes needed.
@@ -176,19 +197,20 @@ Revenue multiplier: 1 developer → ~10 players → $147.50/mo total from one de
In priority order: In priority order:
1. **Billing / Stripe** — cannot take payments, hard launch blocker 1. **Billing webhook delivery** — Stripe cannot POST to `/api/billing/webhook` (API not publicly exposed). Checkout works, customer/subscription created in Stripe, but `subscriptionStatus` and `plan` in DB not updating. Fix: expose webhook endpoint via public domain with HTTPS, or use Stripe CLI forwarding for dev testing. Ref: `zlh-grind/SCRATCH/billing-stripe-handover-apr11-2026.md`
2. **Game server world backup/restore** — trust-critical, player retention risk 2. **Fabric readiness gating** — agent TCP probe gate (see above)
3. **User onboarding flow** — guided first-server creation after registration 3. **User onboarding flow** — guided first-server creation after registration
4. **Fabric readiness gating** — agent TCP probe gate (see above) 4. **Usage limits / quota enforcement** — prevent unbounded server creation
5. **Password reset flow** — verify fully wired 5. **Email notifications** — crash, billing, provisioning complete
6. **Usage limits / quota enforcement** — prevent unbounded server creation 6. **Upload testing** — end-to-end verification in dev containers
7. **Email notifications** — crash, billing, provisioning complete 7. **Stress testing** — k6 IDE session load + Minecraft bot + code-server memory baseline
8. **Upload testing** — end-to-end in dev containers 8. **OPNsense audit** — both routers need systematic validation
9. **Billing endpoints** — add back to API 9. **Service discovery migration** — remaining non-hot-path internal.zlh refs
10. **Stress testing** — k6 IDE session load + Minecraft bot + code-server memory baseline 10. **Provisioning validation** — single controlled creation, confirm 1 record / 1 job / 1 execution
11. **OPNsense audit** — both routers need systematic validation 11. **Final smoke test** — full lifecycle: create → ready → connectable → backup → restore → stop/start/restart → delete. Confirm Velocity unregister, Cloudflare cleanup, Technitium cleanup.
12. **Service discovery migration** — replace remaining internal.zlh refs in hot paths 12. **Portal public-site QA** — desktop + mobile layouts, CTA routing, metadata verification. Mobile not yet optimized.
13. **Provisioning validation** — single controlled creation, confirm 1 record / 1 job / 1 execution 13. **Monitoring / observability** — normalize game/dev Alloy label contract across API discovery, agent-written labels, Prometheus targets, and Grafana dashboards. Finish template cleanup (remove node-exporter, keep Alloy).
14. **Billing Portal UI polish**: `trialing` state handling, upgrade/downgrade flow, plan limit gating in Portal
--- ---
@@ -199,8 +221,8 @@ In priority order:
| `jester/knowledge-base` | Claude's home — architecture, strategy, canonical decisions | | `jester/knowledge-base` | Claude's home — architecture, strategy, canonical decisions |
| `jester/zlh-grind` | GPT's workspace — execution continuity, session handovers, debug notes | | `jester/zlh-grind` | GPT's workspace — execution continuity, session handovers, debug notes |
| `jester/zlh-docs` | API/agent/portal operational reference docs | | `jester/zlh-docs` | API/agent/portal operational reference docs |
| `jester/zpac-api` | API source (mirror) | | `jester/zpac-api` | API source |
| `jester/zpac-portal` | Portal source (mirror) | | `jester/zpac-portal` | Portal source |
| `jester/zlh-agent` | Agent source (Go) | | `jester/zlh-agent` | Agent source (Go) |
--- ---
@@ -214,3 +236,4 @@ In priority order:
- `zlh-grind/SCRATCH/` has detailed debug docs for specific topics - `zlh-grind/SCRATCH/` has detailed debug docs for specific topics
- Do not mark unimplemented work as complete - Do not mark unimplemented work as complete
- Game publish flow must never be modified by dev routing changes - Game publish flow must never be modified by dev routing changes
- Synthesis of Codex review outputs belongs here, not in Codex