knowledge-base/ZeroLagHub_Master_Bootstrap_Apr2026.md


ZeroLagHub — Master Bootstrap (April 2026)

For: Claude (strategic/architecture sessions)
Last Updated: April 30, 2026
Supersedes: Previous April 7, 2026 version
Source refs: zlh-grind/SCRATCH/handover-apr-2026.md, zlh-grind/OPEN_THREADS.md, zlh-grind/SCRATCH/billing-stripe-handover-apr11-2026.md


AI Role Split

| AI | Role | Workspace |
|---|---|---|
| Claude | Architecture, strategy, design decisions, cross-cutting concerns | jester/knowledge-base |
| GPT (Ceàrd) | Implementation, code changes, verification, session continuity | jester/zlh-grind |

Hard rule: GPT must not make architecture decisions. If architectural interpretation is required, stop and defer to Claude or canonical docs.

Claude's session start checklist:

  1. Read this file
  2. Read ZeroLagHub_Cross_Project_Tracker.md (locked decisions + boundaries)
  3. Reference INFRASTRUCTURE.md in zlh-grind for current IPs/VMs
  4. Read zlh-grind/OPEN_THREADS.md for current active work

What ZeroLagHub Is

Build-and-run platform for developers, modders, and game communities. Browser-based dev environments + managed game servers + developer-to-player pipeline in one platform.

Competitive advantages:

  • LXC containers (20-30% performance gain over Docker) — system-container infrastructure, not Docker overhead
  • Custom Go agent architecture — purpose-built, not adapted generic tooling
  • Open-source stack (30-40% cost advantage)
  • Developer-to-player pipeline: mod devs become a distribution channel

System posture as of Apr 30, 2026: Stable. Pre-launch validation phase. Most core workflows verified end-to-end.


Infrastructure

Single host: GTHost Detroit
Supermicro 2029TP-HTR, Xeon Gold 6152 22c/44t, 192GB DDR4, 2×1.92TB SSD, $99/mo, Proxmox VE 9.x

Denver host: Fully decommissioned April 2, 2026. OS wiped, disks wiped.

Active VMs (all in the 9000 range — authoritative in zlh-grind/INFRASTRUCTURE.md)

| VM | Name | Role |
|---|---|---|
| 9001 | zlh-router | OPNsense, WAN 66.163.115.221 |
| 9002 | zpack-router | OPNsense, WAN 66.163.115.115 |
| 9010 | zpack-dns | Technitium DNS |
| 9011 | zlh-proxy | Caddy reverse proxy |
| 9012 | zpack-proxy | Traefik v3 — game/dev edge routing + wildcard TLS |
| 9014 | zlh-artifacts | Caddy file server — runtime binaries + server jars |
| 9015 | zpack-velocity | Velocity 3.5 — Minecraft proxy |
| 9017 | zlh-back | PBS backup + Backblaze B2 |
| 9020 | zpack-api | Node.js API, MariaDB, Redis |
| 9021 | zpack-portal | Next.js frontend |

LXC containers:

  • Game containers: IDs 5000+
  • Dev containers: IDs 6000+
  • Base template: ID 820 (zlh-base)

Legacy VMs (NOT active production): 100 (zlh-panel), 101 (zlh-wings), 103 (zlh-api), 1000 (zlh-router)


Naming Conventions

  • zlh-* = core infrastructure (DNS, monitoring, backup, routing, artifacts)
  • zpack-* = game and dev server stack (portal, API, containers)

Stack

API (zpack-api, VM 9020): Node.js ESM, Express 5, Prisma 6, MariaDB, Redis, BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware

Portal (zpack-portal, VM 9021): Next.js 15, TypeScript, TailwindCSS, Axios, WebSocket console. Hybrid marketing + SaaS structure with SEO landing pages. Pricing tiers: Starter / Pro / Performance.

Agent (zlh-agent): Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket. Runs inside every game/dev container. Only process with direct filesystem access. Pulls runtimes + server jars from zlh-artifacts (VM 9014).


Architecture Rules (Locked)

These cannot be changed without an explicit "Revisit decision X" conversation with Claude.

| # | Decision |
|---|---|
| DEC-001 | Templates + Agent hybrid (not templates-only or agent-only) |
| DEC-002 | API orchestrates, Agent executes (never reversed) |
| DEC-003 | API owns DNS (Agent never creates DNS records) |
| DEC-004 | Traefik + Velocity only (no HAProxy) |
| DEC-005 | MariaDB is source of truth (no flat files) |
| DEC-006 | Frontend → API only (no direct agent calls) |
| DEC-007 | Drift prevention mandatory |
| DEC-008 | Control plane uses IP env vars, not internal.zlh DNS in hot paths |
| DEC-009 | Redis never restored from backup — always start clean |
| DEC-010 | Vanilla Minecraft servers use Fabric loader + FabricProxy-Lite (not vanilla JAR) |
| DEC-011 | Velocity forwarding mode = modern (not none — breaks UUID stability) |

Control Plane Architecture

Browser
  → Traefik (zpack-proxy, 10.70.0.242) — TLS termination, domain routing
  → API (10.60.0.245:4000) — orchestration, auth, state
  → Agent (:18888) — container execution (game + dev)
  • Agent is HTTP server on :18888, internal only. API is the only caller.
  • Portal never calls agents directly.
  • Service-to-service communication uses IP env vars, not DNS FQDNs.
  • Upload transport: raw http.request piping (req.pipe(proxyReq)), never fetch().

Dev IDE Access (Browser — Current Working Model)

Browser → dev-<vmid>.zerolaghub.dev → Traefik → API → container:6000

Flow: frontend calls POST /api/dev/:id/ide-token → API returns hosted URL → browser opens → Traefik wildcard routes to API → API validates token, sets HTTP-only cookie, proxies to code-server. Browser-verified end-to-end.


Game Support

Production: Minecraft (Vanilla/Fabric/Paper/Forge/Neoforge), Rust, Terraria, Project Zomboid
Pipeline: Valheim, Palworld, Vintage Story, Core Keeper
Target market: Modded/indie games vs mainstream providers

Minecraft Runtime Split (Verified)

  • vanilla = Fabric-based internal profile with proxy/API/config injection (FabricProxy-Lite pre-seeded)
  • fabric = plain Fabric jar delivery only
  • Forge/NeoForge first-start flow avoids premature readiness gating, applies post-start property enforcement, restarts through readiness-aware path
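
The vanilla/fabric split above can be restated as a tiny lookup; the struct and field names are illustrative, not the API's schema:

```go
package main

import "fmt"

// RuntimeProfile captures the verified split: what jar family is actually
// delivered and whether FabricProxy-Lite is pre-seeded.
type RuntimeProfile struct {
	Loader      string
	ProxySeeded bool
}

func profileFor(runtime string) RuntimeProfile {
	switch runtime {
	case "vanilla":
		// "vanilla" is Fabric-based internally (DEC-010), with proxy/API/config injection.
		return RuntimeProfile{Loader: "fabric", ProxySeeded: true}
	case "fabric":
		// Plain Fabric jar delivery only.
		return RuntimeProfile{Loader: "fabric", ProxySeeded: false}
	default:
		return RuntimeProfile{Loader: runtime}
	}
}

func main() {
	fmt.Println(profileFor("vanilla").ProxySeeded, profileFor("fabric").ProxySeeded) // true false
}
```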

Developer-to-Player Revenue Pipeline

LXC Dev Environment ($15-40/mo)
  → Game/mod creation + testing
  → Testing servers (50% dev discount)
  → Player community referrals (25% player discount)
  → Developer revenue share (5-10% commission)
  → Viral growth

Revenue multiplier: 1 developer → ~10 players → $147.50/mo total from one developer acquisition.


Current System State (April 30, 2026)

System is stable and pre-launch validated across core workflows.

Resolved since last bootstrap (Apr 7)

  • Password reset — 5-minute tokens, hashed at rest, single-use, old tokens invalidated on deploy
  • Billing / Stripe — checkout flow works, customer creation works, Stripe sandbox confirmed, Portal billing UI working. Remaining: webhook delivery (Stripe cannot reach API — not yet publicly exposed). Once webhook is live, billing is functionally complete.
  • Vanilla/Fabric runtime split restored and validated
  • Forge/NeoForge first-start flow correct end-to-end
  • Delete/teardown lifecycle removes Velocity, Cloudflare, and Technitium records
  • Portal consumes API-owned connectable/connection state — no longer infers Minecraft readiness itself
  • Velocity proxy lifecycle callbacks live with registered_with_proxy and proxy_ping_ok in API state
  • Portal status labels fixed — non-connectable states no longer all show "Needs attention"
  • Portal server creation redirects to /servers and tracks setup progress there
  • Portal public marketing site: hybrid SEO structure, Starter/Pro/Performance pricing tiers, root metadata fixed, hero copy fixed, fake CLI line removed
  • SEO landing pages added: /minecraft-server-hosting, /modded-minecraft-hosting, /browser-dev-environment
  • Local Minecraft backup create/restore verified live end-to-end
  • Restore creates intentional pre-restore checkpoint; API starts restore asynchronously
  • Backup timestamps normalized; pre-restore checkpoints filtered from default backup list
  • Vanilla datapack upload works; direct vanilla mods/ upload rejected by API
  • NeoForge mod search/install/list works
  • Agent-backed file edits create shadow copies for revert; API route/stream forwarding issues fixed
  • Public exposure model in place: Portal public, control plane private
  • Dev container creation succeeds; hosted IDE access verified post-cleanup passes
  • Minecraft server creation succeeds across supported runtime variants

Still Active: Fabric Readiness Gating (agent fix needed)

Root cause: Velocity registration fires before Fabric server is ready to accept proxy traffic → "proxy starting" errors until Velocity restart.
Fix: Gate Velocity registration behind TCP probe success (port 25565). Agent already runs probe — registration must fire after probe returns success, not at process start.
Scope: Agent only. No Velocity or FabricProxy-Lite changes needed.
Ref: zlh-grind/SCRATCH/session-stabilization-fabric-findings.md


Pre-Launch Blockers

In priority order:

  1. Billing webhook delivery — Stripe cannot POST to /api/billing/webhook (API not publicly exposed). Checkout works, customer/subscription created in Stripe, but subscriptionStatus and plan in DB not updating. Fix: expose webhook endpoint via public domain with HTTPS, or use Stripe CLI forwarding for dev testing. Ref: zlh-grind/SCRATCH/billing-stripe-handover-apr11-2026.md
  2. Fabric readiness gating — agent TCP probe gate (see above)
  3. User onboarding flow — guided first-server creation after registration
  4. Usage limits / quota enforcement — prevent unbounded server creation
  5. Email notifications — crash, billing, provisioning complete
  6. Upload testing — end-to-end verification in dev containers
  7. Stress testing — k6 IDE session load + Minecraft bot + code-server memory baseline
  8. OPNsense audit — both routers need systematic validation
  9. Service discovery migration — remaining non-hot-path internal.zlh refs
  10. Provisioning validation — single controlled creation, confirm 1 record / 1 job / 1 execution
  11. Final smoke test — full lifecycle: create → ready → connectable → backup → restore → stop/start/restart → delete. Confirm Velocity unregister, Cloudflare cleanup, Technitium cleanup.
  12. Portal public-site QA — desktop + mobile layouts, CTA routing, metadata verification. Mobile not yet optimized.
  13. Monitoring / observability — normalize game/dev Alloy label contract across API discovery, agent-written labels, Prometheus targets, and Grafana dashboards. Finish template cleanup (remove node-exporter, keep Alloy).
  14. Billing Portal UI polish — trialing state handling, upgrade/downgrade flow, plan limit gating in Portal

Repo Registry

| Repo | Purpose |
|---|---|
| jester/knowledge-base | Claude's home — architecture, strategy, canonical decisions |
| jester/zlh-grind | GPT's workspace — execution continuity, session handovers, debug notes |
| jester/zlh-docs | API/agent/portal operational reference docs |
| jester/zpac-api | API source |
| jester/zpac-portal | Portal source |
| jester/zlh-agent | Agent source (Go) |

Session Guidance for Claude

  • Architecture decisions belong here — if GPT hits an architectural question, it should stop and bring it to Claude
  • zlh-grind is GPT's execution ledger, not an architecture source
  • zlh-grind/INFRASTRUCTURE.md is authoritative for current IPs and VM inventory
  • zlh-grind/OPEN_THREADS.md has full active + outstanding work
  • zlh-grind/SCRATCH/ has detailed debug docs for specific topics
  • Do not mark unimplemented work as complete
  • Game publish flow must never be modified by dev routing changes
  • Synthesis of Codex review outputs belongs here, not in Codex