
ZeroLagHub — Master Bootstrap (April 2026)

For: Claude (strategic/architecture sessions)
Last Updated: April 7, 2026
Supersedes: ZeroLagHub_Master_Bootstrap_Dec2025.md
Source refs: zlh-grind/SCRATCH/handover-apr-2026.md, zlh-grind/PROJECT_CONTEXT.md, zlh-grind/OPEN_THREADS.md


AI Role Split

| AI | Role | Workspace |
|---|---|---|
| Claude | Architecture, strategy, design decisions, cross-cutting concerns | jester/knowledge-base |
| GPT (Ceàrd) | Implementation, code changes, verification, session continuity | jester/zlh-grind |

Hard rule: GPT must not make architecture decisions. If architectural interpretation is required, stop and defer to Claude or canonical docs.

Claude's session start checklist:

  1. Read this file
  2. Read ZeroLagHub_Cross_Project_Tracker.md (locked decisions + boundaries)
  3. Reference INFRASTRUCTURE.md in zlh-grind for current IPs/VMs

What ZeroLagHub Is

Game server hosting platform targeting modded Minecraft and indie games.

Competitive advantages:

  • LXC containers (20-30% better performance than Docker)
  • Custom Go agent architecture — purpose-built, not adapted generic tooling
  • Open-source stack (30-40% cost advantage)
  • Developer-to-player pipeline: mod devs become a distribution channel

System posture as of Apr 2026: Stable. Controlled expansion phase. Detroit migration complete.


Infrastructure

Single host: GTHost Detroit
Supermicro 2029TP-HTR, Xeon Gold 6152 22c/44t, 192GB DDR4, 2×1.92TB SSD, $99/mo, Proxmox VE 9.x

Denver host: Fully decommissioned April 2, 2026. OS wiped, disks striped.

Active VMs (all 9000s range — authoritative in zlh-grind/INFRASTRUCTURE.md)

| VM | Name | Role |
|---|---|---|
| 9001 | zlh-router | OPNsense, WAN 66.163.115.221 |
| 9002 | zpack-router | OPNsense, WAN 66.163.115.115 |
| 9010 | zpack-dns | Technitium DNS |
| 9011 | zlh-proxy | Caddy reverse proxy |
| 9012 | zpack-proxy | Traefik v3: game/dev edge routing + wildcard TLS |
| 9014 | zlh-artifacts | Caddy file server: runtime binaries + server jars |
| 9015 | zpack-velocity | Velocity 3.5: Minecraft proxy |
| 9017 | zlh-back | PBS backup + Backblaze B2 |
| 9020 | zpack-api | Node.js API, MariaDB, Redis |
| 9021 | zpack-portal | Next.js frontend |

LXC containers:

  • Game containers: IDs 5000+
  • Dev containers: IDs 6000+
  • Base template: ID 820 (zlh-base)
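The ID ranges above can be expressed as a small lookup, for example in tooling that needs to tell game containers from dev containers. This is a minimal sketch in Go. `classifyID` and the `Kind` constants are hypothetical names. The upper bounds are assumptions inferred from the ranges listed in this document, not documented limits: game 5000-5999, dev 6000-8999, and VMs 9000-9999.

```go
package main

import "fmt"

// Kind labels what a Proxmox ID is used for. Hypothetical type, not from the codebase.
type Kind string

const (
	KindTemplate Kind = "template" // ID 820 (zlh-base)
	KindGame     Kind = "game"     // 5000+ (assumed to end at 5999)
	KindDev      Kind = "dev"      // 6000+ (assumed to end at 8999)
	KindVM       Kind = "vm"       // 9000s range (active VMs)
	KindUnknown  Kind = "unknown"  // anything else, e.g. legacy VMs 100, 101, 103, 1000
)

// classifyID maps a Proxmox ID to its role using the ranges in this document.
// Cases run from the highest range down, so each check only needs a lower bound.
func classifyID(id int) Kind {
	switch {
	case id == 820:
		return KindTemplate
	case id >= 10000:
		return KindUnknown
	case id >= 9000:
		return KindVM
	case id >= 6000:
		return KindDev
	case id >= 5000:
		return KindGame
	default:
		return KindUnknown
	}
}

func main() {
	for _, id := range []int{820, 5001, 6002, 9020, 101} {
		fmt.Printf("%d -> %s\n", id, classifyID(id))
	}
}
```

Legacy IDs fall through to `unknown` on purpose, so they are never mistaken for active production resources.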

Legacy VMs (NOT active production): 100 (zlh-panel), 101 (zlh-wings), 103 (zlh-api), 1000 (zlh-router)


Naming Conventions

  • zlh-* = core infrastructure (DNS, monitoring, backup, routing, artifacts)
  • zpack-* = game and dev server stack (portal, API, containers)

Stack

API (zpack-api, VM 9020): Node.js ESM, Express 5, Prisma 6, MariaDB, Redis, BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware

Portal (zpack-portal, VM 9021): Next.js 15, TypeScript, TailwindCSS, Axios, WebSocket console. Sci-fi HUD aesthetic (steel textures, neon accents, beveled panels).

Agent (zlh-agent): Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket. Runs inside every game/dev container. Only process with direct filesystem access. Pulls runtimes + server jars from zlh-artifacts (VM 9014).


Architecture Rules (Locked)

These cannot be changed without an explicit "Revisit decision X" conversation with Claude.

| ID | Decision |
|---|---|
| DEC-001 | Templates + Agent hybrid (not templates-only or agent-only) |
| DEC-002 | API orchestrates, Agent executes (never reversed) |
| DEC-003 | API owns DNS (Agent never creates DNS records) |
| DEC-004 | Traefik + Velocity only (no HAProxy) |
| DEC-005 | MariaDB is source of truth (no flat files) |
| DEC-006 | Frontend → API only (no direct agent calls) |
| DEC-007 | Drift prevention mandatory |
| DEC-008 | Control plane uses IP env vars, not internal.zlh DNS in hot paths |
| DEC-009 | Redis never restored from backup; always start clean |
| DEC-010 | Vanilla Minecraft servers use Fabric loader + FabricProxy-Lite (not vanilla JAR) |
| DEC-011 | Velocity forwarding mode = modern (not none, which breaks UUID stability) |

Control Plane Architecture

Browser
  → Traefik (zpack-proxy, 10.70.0.242) — TLS termination, domain routing
  → API (10.60.0.245:4000) — orchestration, auth, state
  → Agent (:18888) — container execution (game + dev)
  • The agent is an HTTP server on :18888, internal only. The API is its only caller.
  • The portal never calls agents directly.
  • Service-to-service communication uses IP env vars, not DNS FQDNs.
  • Upload transport: raw http.request piping (req.pipe(proxyReq)), never fetch().

Dev IDE Access (Browser — Current Working Model)

Browser → dev-<vmid>.zerolaghub.dev → Traefik → API → container:6000

Flow:

  1. The frontend calls POST /api/dev/:id/ide-token.
  2. The API returns a hosted URL.
  3. The browser opens that URL.
  4. Traefik's wildcard route sends the request to the API.
  5. The API validates the token, sets an HTTP-only cookie, and proxies to code-server.

Browser-verified end-to-end.
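In the hostname step of this flow, the API extracts the VMID from `dev-<vmid>.zerolaghub.dev`, looks up the container, and proxies to port 6000. The API itself is Node.js, so this is only a language-neutral sketch written in Go. `parseDevHost` and `upstreamFor` are hypothetical helpers, and the token check and cookie step are not shown.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
	"strings"
)

// devSuffix is the wildcard domain Traefik routes to the API.
const devSuffix = ".zerolaghub.dev"

// parseDevHost extracts the VMID from a Host header such as
// "dev-6012.zerolaghub.dev", with or without a ":port" suffix.
func parseDevHost(host string) (int, error) {
	host = strings.ToLower(host)
	if h, _, found := strings.Cut(host, ":"); found {
		host = h // drop the port
	}
	if !strings.HasSuffix(host, devSuffix) {
		return 0, errors.New("not a dev IDE host")
	}
	label := strings.TrimSuffix(host, devSuffix)
	if !strings.HasPrefix(label, "dev-") {
		return 0, errors.New("missing dev- prefix")
	}
	// Atoi rejects extra labels such as "6012.evil", so only one label is accepted.
	vmid, err := strconv.Atoi(strings.TrimPrefix(label, "dev-"))
	if err != nil || vmid <= 0 {
		return 0, fmt.Errorf("invalid vmid in %q", host)
	}
	return vmid, nil
}

// upstreamFor builds the code-server target for a container. The container IP
// would come from the API's own records (hypothetical lookup, not shown).
func upstreamFor(containerIP string) string {
	return "http://" + net.JoinHostPort(containerIP, "6000")
}

func main() {
	hosts := []string{"dev-6012.zerolaghub.dev", "dev-6012.zerolaghub.dev:443", "evil.example.com"}
	for _, h := range hosts {
		vmid, err := parseDevHost(h)
		fmt.Println(h, vmid, err)
	}
	fmt.Println(upstreamFor("10.0.0.12")) // placeholder IP
}
```

The parser rejects anything outside the single `dev-<number>` label. A crafted Host header therefore cannot make the API proxy to an arbitrary target.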


Game Support

Production: Minecraft (Vanilla/Fabric/Paper/Forge/Neoforge), Rust, Terraria, Project Zomboid
Pipeline: Valheim, Palworld, Vintage Story, Core Keeper
Target market: Modded/indie games vs mainstream providers


Developer-to-Player Revenue Pipeline

LXC Dev Environment ($15-40/mo)
  → Game/mod creation + testing
  → Testing servers (50% dev discount)
  → Player community referrals (25% player discount)
  → Developer revenue share (5-10% commission)
  → Viral growth

Revenue multiplier: 1 developer → ~10 players → $147.50/mo total from one developer acquisition.


Current System State (April 7, 2026)

System is stable and deterministic following stabilization work this week.

Resolved

  • Duplicate server creation (was frontend issue, not API)
  • DB/Redis state drift from Denver→Detroit migration
  • Console routing (browser → API → agent)
  • IP-based control plane (internal.zlh removed from hot paths)
  • Velocity rehydration (uses DB + Redis, not Proxmox live state)
  • Denver fully decommissioned

Active Issue: Fabric Readiness Gating (agent fix needed)

Root cause: Velocity registration fires before the Fabric server is ready to accept proxy traffic. This causes "proxy starting" errors until Velocity is restarted.
Fix: Gate Velocity registration behind a successful TCP probe on port 25565. The agent already runs this probe; registration must fire after the probe succeeds, not at process start.
Scope: Agent only. No Velocity or FabricProxy-Lite changes needed.
Ref: zlh-grind/SCRATCH/session-stabilization-fabric-findings.md


Pre-Launch Blockers

In priority order:

  1. Billing / Stripe — cannot take payments, hard launch blocker
  2. Game server world backup/restore — trust-critical, player retention risk
  3. User onboarding flow — guided first-server creation after registration
  4. Fabric readiness gating — agent TCP probe gate (see above)
  5. Password reset flow — verify fully wired
  6. Usage limits / quota enforcement — prevent unbounded server creation
  7. Email notifications — crash, billing, provisioning complete
  8. Upload testing — end-to-end in dev containers
  9. Billing endpoints — add back to API
  10. Stress testing — k6 IDE session load + Minecraft bot + code-server memory baseline
  11. OPNsense audit — both routers need systematic validation
  12. Service discovery migration — replace remaining internal.zlh refs in hot paths
  13. Provisioning validation — single controlled creation, confirm 1 record / 1 job / 1 execution

Repo Registry

| Repo | Purpose |
|---|---|
| jester/knowledge-base | Claude's home: architecture, strategy, canonical decisions |
| jester/zlh-grind | GPT's workspace: execution continuity, session handovers, debug notes |
| jester/zlh-docs | API/agent/portal operational reference docs |
| jester/zpac-api | API source (mirror) |
| jester/zpac-portal | Portal source (mirror) |
| jester/zlh-agent | Agent source (Go) |

Session Guidance for Claude

  • Architecture decisions belong here — if GPT hits an architectural question, it should stop and bring it to Claude
  • zlh-grind is GPT's execution ledger, not an architecture source
  • zlh-grind/INFRASTRUCTURE.md is authoritative for current IPs and VM inventory
  • zlh-grind/OPEN_THREADS.md has full active + outstanding work
  • zlh-grind/SCRATCH/ has detailed debug docs for specific topics
  • Do not mark unimplemented work as complete
  • Game publish flow must never be modified by dev routing changes