# ZeroLagHub — Session Handover (Apr 7, 2026)
## Platform Overview
ZeroLagHub is a game server hosting platform targeting modded/indie Minecraft.
Custom stack: Node.js API, Next.js portal, Go agent, Velocity proxy, LXC containers on Proxmox.
Source of truth: `git.zerolaghub.com/jester/zlh-grind` (mirrored to GitHub `jester1181/`)
Key docs in this repo:
- `OPEN_THREADS.md` — all active and future work
- `INFRASTRUCTURE.md` — server specs, VM inventory, IPs
- `PROJECT_CONTEXT.md` — stack, architecture (note: has old VM IDs, INFRASTRUCTURE.md is authoritative)
- `SCRATCH/` — session notes, debug docs, specs
---
## Infrastructure
**Single host:** GTHost Detroit — Supermicro 2029TP-HTR, Xeon Gold 6152 22c/44t, 192GB DDR4, 2x1.92TB SSD, $99/mo, Proxmox VE 9.x
**Denver host decommissioned** Apr 2, 2026: OS wiped, disks stripped.
Key VMs (all 9000s range):
- 9001 zlh-router — OPNsense, 10.60.0.254, WAN 66.163.115.221
- 9002 zpack-router — OPNsense, 10.70.0.1, WAN 66.163.115.115
- 9010 zpack-dns — Technitium DNS, 10.60.0.14
- 9011 zlh-proxy — Caddy, 10.60.0.16
- 9012 zpack-proxy — Traefik v3, 10.70.0.11
- 9014 zlh-artifacts — Caddy file server, 10.60.0.17
- 9015 zpack-velocity — Velocity 3.5, 10.70.0.10
- 9017 zlh-back — PBS, 10.60.0.24 / 172.60.0.30
- 9020 zpack-api — Node.js API, MariaDB, Redis, 10.60.0.18
- 9021 zpack-portal — Next.js, 10.60.0.19
Full IP table in `INFRASTRUCTURE.md`.
---
## Architecture
**Control plane:** API ↔ Agent ↔ services via direct IP env vars (not DNS)
- internal.zlh DNS exists but must NOT be used in service-to-service hot paths
- See `SCRATCH/service-discovery.md` for spec
**Data plane:** user/browser → Traefik (zpack-proxy) → domain-based routing
**Game containers:** LXC, IDs 5000+
**Dev containers:** LXC, IDs 6000+
**Base template:** ID 820 (zlh-base)
**Agent** runs inside every container on :18888. Only caller is the API.
**API** is central authority for routing, console, orchestration.
**Velocity** (ZpackVelocityBridge plugin) handles Minecraft player routing via HTTP bridge on :8081.
---
## Current System State
System is **stable and deterministic** following stabilization work.
Resolved this week:
- ✅ Duplicate server creation — was frontend issue, not API
- ✅ DB/Redis state drift from Denver→Detroit migration — Redis flushed, DB verified clean
- ✅ Console routing — browser → API → agent
- ✅ IP-based control plane — internal.zlh removed from hot paths
- ✅ Velocity rehydration — uses DB + Redis, not Proxmox live state
- ✅ Denver fully decommissioned
---
## Active Issues (Next Session Priority)
### 1. Fabric Readiness Gating (Agent fix needed)
**Root cause:** Velocity registration happens before the Fabric server is fully ready to accept proxy traffic. This results in "proxy starting" errors that persist until Velocity is restarted.
**Fix:** Gate Velocity registration behind TCP probe success (port 25565). The agent already runs a TCP probe; registration only needs to move from process start to after the probe returns success.
**Does NOT require:** Changes to Velocity, FabricProxy-Lite version, or Fabric API version.
See `SCRATCH/session-stabilization-fabric-findings.md`
### 2. Fabric/Vanilla Stack (what's working)
- Fabric server starts correctly with Fabric API + FabricProxy-Lite pre-seeded in mods/
- Correct versions for 1.21.7: `fabric-api-0.129.0+1.21.7.jar` + `fabricproxy-lite-2.9.0.jar`
- Agent normalizes "vanilla" game type → fabric profile, pulls Fabric loader as server.jar
- Forwarding secret must be written to `config/FabricProxy-Lite.toml` at provisioning time
- See `SCRATCH/minecraft-velocity-forwarding.md` for full per-game-type requirements
### 3. Velocity Config (current state)
```toml
player-info-forwarding-mode = "modern"
forwarding-secret-file = "forwarding.secret"
```
Lobby fallback removed. ZpackVelocityBridge handles all routing dynamically.
---
## Pre-Launch Checklist (from OPEN_THREADS)
Blockers:
- **Billing / Stripe** — cannot take money without this
- **Game server world backup/restore** — trust-critical
- **User onboarding flow** — guided first-server creation after register
- **Fabric readiness gating** — agent-level fix
- **Password reset flow** — verify wired up
- **Usage limits / quota enforcement**
- **Email notifications**
- **Upload testing**
- **Billing endpoints**
- **Stress testing** — k6 + Minecraft bot + code-server memory baseline
- **OPNsense audit**
- **Service discovery migration** — remaining internal.zlh refs in hot paths
- **Provisioning validation** — single controlled test post-cleanup
---
## Key Decisions Made
- **Forwarding mode:** `modern`; `none` breaks player identity/UUID stability
- **Vanilla servers:** Use Fabric loader + FabricProxy-Lite, not vanilla JAR
- **Service discovery:** IP env vars, not DNS, for control plane
- **Redis:** Never restore from backup — always start clean
- **Database:** Source of truth for all state
- **Hosting:** GTHost Detroit long-term (most cost-effective)
- **Velocity plugin:** ZpackVelocityBridge — in-memory only, re-registration needed after restart
---
## Repo Registry
All at `git.zerolaghub.com/jester/<repo>`:
- `zlh-grind` — execution workspace, continuity, this file
- `zlh-docs` — API/agent/portal reference docs
- `zpac-api` — API source
- `zpac-portal` — portal source
- `zlh-agent` — agent source (Go)
---
## Session Guidance for Claude
- Read `OPEN_THREADS.md` at session start for current state
- Read `INFRASTRUCTURE.md` for IPs and VM inventory
- `SCRATCH/` has detailed docs for specific topics
- Agent is authority on filesystem — API must not duplicate filesystem logic
- Portal never calls agents directly — all traffic through API
- Game publish flow must never be modified by dev routing changes
- Do not mark unimplemented work as complete