# ZeroLagHub — Master Bootstrap (April 2026)
**For**: Claude (strategic/architecture sessions)

**Last Updated**: April 7, 2026

**Supersedes**: `ZeroLagHub_Master_Bootstrap_Dec2025.md`

**Source refs**: `zlh-grind/SCRATCH/handover-apr-2026.md`, `zlh-grind/PROJECT_CONTEXT.md`, `zlh-grind/OPEN_THREADS.md`

---
## AI Role Split
| AI | Role | Workspace |
|----|------|-----------|
| **Claude** | Architecture, strategy, design decisions, cross-cutting concerns | `jester/knowledge-base` |
| **GPT (Ceàrd)** | Implementation, code changes, verification, session continuity | `jester/zlh-grind` |

**Hard rule**: GPT must not make architecture decisions. If architectural interpretation is required, stop and defer to Claude or canonical docs.

**Claude's session start checklist**:
1. Read this file
2. Read `ZeroLagHub_Cross_Project_Tracker.md` (locked decisions + boundaries)
3. Reference `INFRASTRUCTURE.md` in `zlh-grind` for current IPs/VMs
---
## What ZeroLagHub Is
Game server hosting platform targeting modded Minecraft and indie games.

**Competitive advantages**:
- LXC containers (20-30% performance advantage over Docker)
- Custom Go agent architecture: purpose-built, not adapted generic tooling
- Open-source stack (30-40% cost advantage)
- Developer-to-player pipeline: mod devs become a distribution channel

**System posture as of Apr 2026**: Stable. Controlled expansion phase. Detroit migration complete.

---
## Infrastructure
**Single host**: GTHost Detroit

Supermicro 2029TP-HTR, Xeon Gold 6152 22c/44t, 192GB DDR4, 2×1.92TB SSD, $99/mo, Proxmox VE 9.x

**Denver host**: Fully decommissioned April 2, 2026. OS wiped, disks striped.
### Active VMs (all 9000s range — authoritative in zlh-grind/INFRASTRUCTURE.md)
| VM | Name | Role |
|----|------|------|
| 9001 | zlh-router | OPNsense, WAN 66.163.115.221 |
| 9002 | zpack-router | OPNsense, WAN 66.163.115.115 |
| 9010 | zpack-dns | Technitium DNS |
| 9011 | zlh-proxy | Caddy reverse proxy |
| 9012 | zpack-proxy | Traefik v3 — game/dev edge routing + wildcard TLS |
| 9014 | zlh-artifacts | Caddy file server — runtime binaries + server jars |
| 9015 | zpack-velocity | Velocity 3.5 — Minecraft proxy |
| 9017 | zlh-back | PBS backup + Backblaze B2 |
| 9020 | zpack-api | Node.js API, MariaDB, Redis |
| 9021 | zpack-portal | Next.js frontend |

**LXC containers**:
- Game containers: IDs 5000+
- Dev containers: IDs 6000+
- Base template: ID 820 (zlh-base)

**Legacy VMs (NOT active production)**:
- 100 (zlh-panel)
- 101 (zlh-wings)
- 103 (zlh-api)
- 1000 (zlh-router)
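
The ID ranges above can be read as a simple classifier. The Go sketch below uses the agent's language and is illustrative only. `classifyVMID` and `ContainerKind` are hypothetical names. The upper bounds are assumptions, because the doc only gives lower bounds (game 5000+, dev 6000+). The sketch treats 5000-5999 as game and 6000-6999 as dev. Everything else, including the 9000s infrastructure VMs, is classed as other.

```go
package main

import "fmt"

// ContainerKind labels a Proxmox guest by its VMID.
type ContainerKind string

const (
	KindTemplate ContainerKind = "template" // ID 820 (zlh-base)
	KindGame     ContainerKind = "game"     // 5000+
	KindDev      ContainerKind = "dev"      // 6000+
	KindOther    ContainerKind = "other"    // e.g. the 9000s infrastructure VMs
)

// classifyVMID maps a VMID to its kind. ASSUMPTION: game IDs stay below
// 6000 and dev IDs below 7000; the doc states only the lower bounds.
func classifyVMID(vmid int) ContainerKind {
	switch {
	case vmid == 820:
		return KindTemplate
	case vmid >= 5000 && vmid < 6000:
		return KindGame
	case vmid >= 6000 && vmid < 7000:
		return KindDev
	default:
		return KindOther
	}
}

func main() {
	for _, id := range []int{820, 5001, 6003, 9020} {
		fmt.Println(id, classifyVMID(id))
	}
}
```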
---
## Naming Conventions
- `zlh-*` = core infrastructure (DNS, monitoring, backup, routing, artifacts)
- `zpack-*` = game and dev server stack (portal, API, containers)
---
## Stack
**API (zpack-api, VM 9020)**: Node.js ESM, Express 5, Prisma 6, MariaDB, Redis, BullMQ, JWT, Stripe, argon2, ssh2, WebSocket, http-proxy-middleware

**Portal (zpack-portal, VM 9021)**: Next.js 15, TypeScript, TailwindCSS, Axios, WebSocket console. Sci-fi HUD aesthetic (steel textures, neon accents, beveled panels).

**Agent (zlh-agent)**: Go 1.21, stdlib HTTP, creack/pty, gorilla/websocket. Runs inside every game/dev container and is the only process with direct filesystem access. Pulls runtimes and server jars from zlh-artifacts (VM 9014).

---
## Architecture Rules (Locked)
These cannot be changed without an explicit "Revisit decision X" conversation with Claude.
| # | Decision |
|---|----------|
| DEC-001 | Templates + Agent hybrid (not templates-only or agent-only) |
| DEC-002 | API orchestrates, Agent executes (never reversed) |
| DEC-003 | API owns DNS (Agent never creates DNS records) |
| DEC-004 | Traefik + Velocity only (no HAProxy) |
| DEC-005 | MariaDB is source of truth (no flat files) |
| DEC-006 | Frontend → API only (no direct agent calls) |
| DEC-007 | Drift prevention mandatory |
| DEC-008 | Control plane uses IP env vars, not internal.zlh DNS in hot paths |
| DEC-009 | Redis never restored from backup — always start clean |
| DEC-010 | Vanilla Minecraft servers use Fabric loader + FabricProxy-Lite (not vanilla JAR) |
| DEC-011 | Velocity forwarding mode = `modern` (not `none` — breaks UUID stability) |
---
## Control Plane Architecture
```
Browser
→ Traefik (zpack-proxy, 10.70.0.242) — TLS termination, domain routing
→ API (10.60.0.245:4000) — orchestration, auth, state
→ Agent (:18888) — container execution (game + dev)
```
- Agent is HTTP server on :18888, internal only. API is the only caller.
- Portal never calls agents directly.
- Service-to-service communication uses IP env vars, not DNS FQDNs.
- Upload transport: raw `http.request` piping (`req.pipe(proxyReq)`), never `fetch()`.
### Dev IDE Access (Browser — Current Working Model)
```
Browser → dev-<vmid>.zerolaghub.dev → Traefik → API → container:6000
```
Flow (browser-verified end to end):
1. Frontend calls `POST /api/dev/:id/ide-token`.
2. API returns the hosted IDE URL.
3. Browser opens that URL.
4. Traefik's wildcard route sends the request to the API.
5. API validates the token, sets an HTTP-only cookie, and proxies to code-server.
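
The token-to-cookie flow above can be sketched as a single handler. The sketch is in Go for consistency with the other examples, although the real API is Node.js/Express. Every name in it is hypothetical:

- the `zlh_ide` cookie
- the `?token=` query parameter (the doc does not say how the hosted URL carries the token)
- single-use tokens and in-memory storage
- the `containerIP` resolver

It also assumes that on the first visit the handler sets the cookie and proxies in the same response, rather than redirecting. Treat it as the shape of the flow (host-to-VMID parsing, token-for-cookie exchange, session check, reverse proxy to code-server on port 6000), not the production implementation.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strconv"
	"strings"
	"sync"
)

// parseDevHost extracts the VMID from a "dev-<vmid>.zerolaghub.dev" host.
func parseDevHost(host string) (int, error) {
	rest, ok1 := strings.CutPrefix(host, "dev-")
	idStr, ok2 := strings.CutSuffix(rest, ".zerolaghub.dev")
	vmid, err := strconv.Atoi(idStr)
	if !ok1 || !ok2 || err != nil || vmid <= 0 {
		return 0, fmt.Errorf("not a dev IDE host: %q", host)
	}
	return vmid, nil
}

// ideAuth holds single-use IDE tokens and the cookie sessions they become.
type ideAuth struct {
	mu       sync.Mutex
	tokens   map[string]int // one-time token -> vmid
	sessions map[string]int // cookie session ID -> vmid
}

// mint stores a random 128-bit hex ID -> vmid in m. Callers hold ideAuth.mu.
func mint(m map[string]int, vmid int) string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err) // the OS random source failed; nothing sensible to do
	}
	id := hex.EncodeToString(b)
	m[id] = vmid
	return id
}

// issue is what POST /api/dev/:id/ide-token would call before building the URL.
func (a *ideAuth) issue(vmid int) string {
	a.mu.Lock()
	defer a.mu.Unlock()
	return mint(a.tokens, vmid)
}

// exchange consumes a token bound to vmid and returns a new session ID.
func (a *ideAuth) exchange(tok string, vmid int) (string, bool) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if id, ok := a.tokens[tok]; !ok || id != vmid {
		return "", false
	}
	delete(a.tokens, tok)
	return mint(a.sessions, vmid), true
}

// handler authorizes via ?token= (swapped for an HTTP-only cookie) or an
// existing cookie, then proxies to code-server on <container>:6000.
func (a *ideAuth) handler(containerIP func(vmid int) string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		vmid, err := parseDevHost(r.Host)
		if err != nil {
			http.NotFound(w, r)
			return
		}
		owner := 0 // 0 is never a valid VMID, so it means "not authorized"
		if tok := r.URL.Query().Get("token"); tok != "" {
			if sid, ok := a.exchange(tok, vmid); ok {
				http.SetCookie(w, &http.Cookie{Name: "zlh_ide", Value: sid, Path: "/",
					HttpOnly: true, Secure: true, SameSite: http.SameSiteLaxMode})
				owner = vmid
			}
		} else if c, err := r.Cookie("zlh_ide"); err == nil {
			a.mu.Lock()
			owner = a.sessions[c.Value] // 0 if the session is unknown
			a.mu.Unlock()
		}
		if owner != vmid {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		target := &url.URL{Scheme: "http", Host: containerIP(vmid) + ":6000"}
		httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
	})
}

func main() {
	a := &ideAuth{tokens: map[string]int{}, sessions: map[string]int{}}
	tok := a.issue(6001)
	_, first := a.exchange(tok, 6001)
	_, replay := a.exchange(tok, 6001)
	fmt.Println(first, replay) // true false
}
```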
---
## Game Support
**Production**: Minecraft (Vanilla/Fabric/Paper/Forge/NeoForge), Rust, Terraria, Project Zomboid

**Pipeline**: Valheim, Palworld, Vintage Story, Core Keeper

**Target market**: Modded and indie games, positioned against mainstream providers

---
## Developer-to-Player Revenue Pipeline
```
LXC Dev Environment ($15-40/mo)
→ Game/mod creation + testing
→ Testing servers (50% dev discount)
→ Player community referrals (25% player discount)
→ Developer revenue share (5-10% commission)
→ Viral growth
```
Revenue multiplier: 1 developer → ~10 players → $147.50/mo total from one developer acquisition.

---
## Current System State (April 7, 2026)
**System is stable and deterministic** following stabilization work this week.
### Resolved
- ✅ Duplicate server creation (root cause was in the frontend, not the API)
- ✅ DB/Redis state drift from Denver→Detroit migration
- ✅ Console routing (browser → API → agent)
- ✅ IP-based control plane (internal.zlh removed from hot paths)
- ✅ Velocity rehydration (uses DB + Redis, not Proxmox live state)
- ✅ Denver fully decommissioned
### Active Issue: Fabric Readiness Gating (agent fix needed)
**Root cause**: Velocity registration fires before the Fabric server is ready to accept proxy traffic. This produces "proxy starting" errors until Velocity is restarted.

**Fix**: Gate Velocity registration behind a successful TCP probe on port 25565. The agent already runs this probe. Registration must fire after the probe succeeds, not at process start.

**Scope**: Agent only. No Velocity or FabricProxy-Lite changes needed.

**Ref**: `zlh-grind/SCRATCH/session-stabilization-fabric-findings.md`

---
## Pre-Launch Blockers
In priority order:
1. **Billing / Stripe** — cannot take payments, hard launch blocker
2. **Game server world backup/restore** — trust-critical, player retention risk
3. **User onboarding flow** — guided first-server creation after registration
4. **Fabric readiness gating** — agent TCP probe gate (see above)
5. **Password reset flow** — verify fully wired
6. **Usage limits / quota enforcement** — prevent unbounded server creation
7. **Email notifications** — crash, billing, provisioning complete
8. **Upload testing** — end-to-end in dev containers
9. **Billing endpoints** — add back to API
10. **Stress testing** — k6 IDE session load + Minecraft bot + code-server memory baseline
11. **OPNsense audit** — both routers need systematic validation
12. **Service discovery migration** — replace remaining internal.zlh refs in hot paths
13. **Provisioning validation** — single controlled creation, confirm 1 record / 1 job / 1 execution
---
## Repo Registry
| Repo | Purpose |
|------|---------|
| `jester/knowledge-base` | Claude's home — architecture, strategy, canonical decisions |
| `jester/zlh-grind` | GPT's workspace — execution continuity, session handovers, debug notes |
| `jester/zlh-docs` | API/agent/portal operational reference docs |
| `jester/zpac-api` | API source (mirror) |
| `jester/zpac-portal` | Portal source (mirror) |
| `jester/zlh-agent` | Agent source (Go) |
---
## Session Guidance for Claude
- **Architecture decisions belong here** — if GPT hits an architectural question, it should stop and bring it to Claude
- `zlh-grind` is GPT's execution ledger, not an architecture source
- `zlh-grind/INFRASTRUCTURE.md` is authoritative for current IPs and VM inventory
- `zlh-grind/OPEN_THREADS.md` tracks all active and outstanding work
- `zlh-grind/SCRATCH/` has detailed debug docs for specific topics
- Do not mark unimplemented work as complete
- Game publish flow must never be modified by dev routing changes