
ZeroLagHub — Session Handover (Apr 7, 2026)

Platform Overview

ZeroLagHub is a game server hosting platform targeting modded and indie Minecraft. Custom stack: Node.js API, Next.js portal, Go agent, Velocity proxy, LXC containers on Proxmox. Source of truth: git.zerolaghub.com/jester/zlh-grind (mirrored to GitHub jester1181/).

Key docs in this repo:

  • OPEN_THREADS.md — all active and future work
  • INFRASTRUCTURE.md — server specs, VM inventory, IPs
  • PROJECT_CONTEXT.md — stack, architecture (note: has old VM IDs, INFRASTRUCTURE.md is authoritative)
  • SCRATCH/ — session notes, debug docs, specs

Infrastructure

Single host: GTHost Detroit — Supermicro 2029TP-HTR, Xeon Gold 6152 22c/44t, 192GB DDR4, 2x1.92TB SSD, $99/mo, Proxmox VE 9.x

Denver host decommissioned Apr 2, 2026 — OS wiped, disks wiped.

Key VMs (all 9000s range):

  • 9001 zlh-router — OPNsense, 10.60.0.254, WAN 66.163.115.221
  • 9002 zpack-router — OPNsense, 10.70.0.1, WAN 66.163.115.115
  • 9010 zpack-dns — Technitium DNS, 10.60.0.14
  • 9011 zlh-proxy — Caddy, 10.60.0.16
  • 9012 zpack-proxy — Traefik v3, 10.70.0.11
  • 9014 zlh-artifacts — Caddy file server, 10.60.0.17
  • 9015 zpack-velocity — Velocity 3.5, 10.70.0.10
  • 9017 zlh-back — PBS, 10.60.0.24 / 172.60.0.30
  • 9020 zpack-api — Node.js API, MariaDB, Redis, 10.60.0.18
  • 9021 zpack-portal — Next.js, 10.60.0.19

Full IP table in INFRASTRUCTURE.md.


Architecture

Control plane: API ↔ Agent ↔ services via direct IP env vars (not DNS)

  • internal.zlh DNS exists but must NOT be used in service-to-service hot paths
  • See SCRATCH/service-discovery.md for spec

Data plane: user/browser → Traefik (zpack-proxy) → domain-based routing

  • Game containers: LXC, IDs 5000+
  • Dev containers: LXC, IDs 6000+
  • Base template: ID 820 (zlh-base)


Agent runs inside every container on :18888. Only caller is the API. API is central authority for routing, console, orchestration. Velocity (ZpackVelocityBridge plugin) handles Minecraft player routing via HTTP bridge on :8081.


Current System State

System is stable and deterministic following stabilization work.

Resolved this week:

  • Duplicate server creation — was frontend issue, not API
  • DB/Redis state drift from Denver→Detroit migration — Redis flushed, DB verified clean
  • Console routing — browser → API → agent
  • IP-based control plane — internal.zlh removed from hot paths
  • Velocity rehydration — uses DB + Redis, not Proxmox live state
  • Denver fully decommissioned

Active Issues (Next Session Priority)

1. Fabric Readiness Gating (Agent fix needed)

Root cause: Velocity registration happens before Fabric server is fully ready to accept proxy traffic. Results in "proxy starting" errors until Velocity is restarted.

Fix: Gate Velocity registration behind TCP probe success (port 25565). The agent already runs a TCP probe — Velocity registration just needs to happen after probe returns success, not at process start.

Does NOT require: Changes to Velocity, FabricProxy-Lite version, or Fabric API version.

See SCRATCH/session-stabilization-fabric-findings.md

2. Fabric/Vanilla Stack (what's working)

  • Fabric server starts correctly with Fabric API + FabricProxy-Lite pre-seeded in mods/
  • Correct versions for 1.21.7: fabric-api-0.129.0+1.21.7.jar + fabricproxy-lite-2.9.0.jar
  • Agent normalizes "vanilla" game type → fabric profile, pulls Fabric loader as server.jar
  • Forwarding secret must be written to config/FabricProxy-Lite.toml at provisioning time
  • See SCRATCH/minecraft-velocity-forwarding.md for full per-game-type requirements

3. Velocity Config (current state)

player-info-forwarding-mode = "modern"
forwarding-secret-file = "forwarding.secret"

Lobby fallback removed. ZpackVelocityBridge handles all routing dynamically.


Pre-Launch Checklist (from OPEN_THREADS)

Blockers:

  • Billing / Stripe — cannot take money without this
  • Game server world backup/restore — trust-critical
  • User onboarding flow — guided first-server creation after register
  • Fabric readiness gating — agent-level fix
  • Password reset flow — verify wired up
  • Usage limits / quota enforcement
  • Email notifications
  • Upload testing
  • Billing endpoints
  • Stress testing — k6 + Minecraft bot + code-server memory baseline
  • OPNsense audit
  • Service discovery migration — remaining internal.zlh refs in hot paths
  • Provisioning validation — single controlled test post-cleanup

Key Decisions Made

  • Forwarding mode: modern — "none" breaks player identity/UUID stability
  • Vanilla servers: Use Fabric loader + FabricProxy-Lite, not vanilla JAR
  • Service discovery: IP env vars, not DNS, for control plane
  • Redis: Never restore from backup — always start clean
  • Database: Source of truth for all state
  • Hosting: GTHost Detroit long-term (most cost-effective)
  • Velocity plugin: ZpackVelocityBridge — in-memory only, re-registration needed after restart

Repo Registry

All at git.zerolaghub.com/jester/<repo>:

  • zlh-grind — execution workspace, continuity, this file
  • zlh-docs — API/agent/portal reference docs
  • zpac-api — API source
  • zpac-portal — portal source
  • zlh-agent — agent source (Go)

Session Guidance for Claude

  • Read OPEN_THREADS.md at session start for current state
  • Read INFRASTRUCTURE.md for IPs and VM inventory
  • SCRATCH/ has detailed docs for specific topics
  • Agent is authority on filesystem — API must not duplicate filesystem logic
  • Portal never calls agents directly — all traffic through API
  • Game publish flow must never be modified by dev routing changes
  • Do not mark unimplemented work as complete