knowledge-base/ZeroLagHub_Master_Bootstrap_Dec2025.md
2025-12-13 16:40:48 +00:00

🚀 ZeroLagHub - Master Bootstrap Document (December 2025)

Last Updated: December 7, 2025
Version: 4.0 (Platform Launch Ready)
Status: 85% Complete - Launch Decision Point


📌 Quick Start for New AI Sessions

Resume Point: Platform 85% launch-ready, all core provisioning operational, critical UX features needed.

Current Phase: Launch readiness assessment - choose NOW vs +1 week vs +1 month

Critical Context: All 6 Minecraft variants provisioning successfully via Go agent. Need WebSocket console, crash protection, and disk monitoring for competitive parity.


🎯 Project Overview

ZeroLagHub is a developer-focused game server hosting platform built on:

  • Proxmox VE with LXC containers (20-30% performance advantage over Docker)
  • Hybrid Architecture: Pterodactyl panel + Custom Node.js API + Go provisioning agent
  • Velocity proxy for seamless Minecraft routing
  • Dual-router architecture for traffic separation
  • Developer-to-player revenue pipeline with 9.75x revenue multiplier

Core Value Proposition

Complete dev-to-production pipeline: Development environments ($20/mo) → Testing servers (50% discount) → Player hosting (25% discount) → Revenue sharing (7.5% commission) = viral growth through developer ecosystem.


🏗️ Current Architecture (December 2025)

Infrastructure Overview (11 VMs)

Critical Production:
├── VM 100 (zlh-panel)      - Pterodactyl panel + OAuth customization
├── VM 103 (zlh-api)        - Node.js backend + developer platform APIs
└── VM 101 (zlh-wings)      - Game servers + LXC integration target

Platform Services:
├── VM 102 (zlh-portal)     - Next.js frontend + developer dashboard
└── VM 104 (zlh-monitor)    - Prometheus/Grafana monitoring

Network & Infrastructure:
├── VM 1000 (zlh-router)    - Platform services routing + VLANs
├── VM 1006 (zpack-router)  - Game traffic routing + Velocity
├── VM 1001 (zlh-dns)       - Technitium DNS + development domains
├── VM 1002 (zlh-proxy)     - Caddy reverse proxy + SSL automation
├── VM 300 (zlh-panel-dev)  - Development environment + testing
├── VM 2000 (zlh-ci)        - CI/CD pipeline + automation
└── VM [zlh-back]           - PBS backup + Backblaze B2 replication

Network Topology

zlh-router (VM 1000):
├─ WAN1: Platform services (API, portal, monitoring)
├─ CORE_LAN: 10.60.0.0/24 (internal services)
├─ MGMT_LAN: 172.60.0.10/24 (inter-router communication)
└─ WireGuard: Admin access

zpack-router (VM 1006):
├─ WAN2: 139.64.165.248 (game services)
├─ ZPACK_LAN: 10.70.0.0/24 (Velocity @ 10.70.0.241)
├─ DEV_LAN: 10.100.0.0/24 (developer environments - future)
├─ GAME_LAN: 10.200.0.0/24 (game server LXCs)
└─ MGMT_LAN: 172.60.0.20/24 (control plane communication)

Traffic Flows

  • Platform Access: Client → WAN1 → zlh-router → Frontend/API
  • Game Play: Player → WAN2 (139.64.165.248) → zpack-router → Velocity (10.70.0.241) → Game Server (10.200.0.X)
  • Control Plane: API → MGMT_LAN (172.60.0.X) → Velocity/DNS/Monitoring

What's Working (December 7, 2025)

Provisioning Pipeline (100% Operational)

| Component | Status | Notes |
|---|---|---|
| LXC Container Creation | | Template VMID 800, auto-cloning working |
| VMID Allocation | | Sequential assignment from range |
| IP Detection | | Automatic network configuration |
| Go Agent Deployment | | Payload delivery + self-repair system |
| Java Runtime Selection | | Auto-detect MC version → Java 17/21 |
| All 6 MC Variants | | Vanilla, Paper, Purpur, Fabric, Forge, NeoForge |
| Server Startup | | All variants start successfully |
| DNS Publishing | | Cloudflare + Technitium A + SRV records |
| Velocity Registration | | Dynamic backend server registration |
| Client Connectivity | | Players can connect and play |

Control Functions

| Function | Status | Implementation |
|---|---|---|
| Start/Stop/Restart | | HTTP API → Go agent |
| Console Commands | | Command injection working |
| Log Tailing | ⚠️ | HTTP polling only (need WebSocket) |
| Status Reporting | | Agent emits RUNNING state |
| Crash Detection | | Agent tracks exit codes |

Game Support Matrix

Launch Ready (Minecraft Only):

  • Vanilla - Official Mojang server
  • Paper - Primary recommendation (vanilla + plugins)
  • Purpur - Paper fork with extra features
  • Fabric - Lightweight mod support
  • Forge - Heavy mod support (tech/magic mods)
  • NeoForge - Modern Forge fork (competitive advantage)

Supported Versions: 1.12.2, 1.16.5, 1.18.2, 1.19.2, 1.20.1, 1.21.x

Deferred to Post-Launch:

  • 📋 Terraria
  • 📋 Project Zomboid
  • 📋 Valheim
  • 📋 Rust

🚨 Known Issues & Gaps

Critical Bugs (Non-Blocking, System Works)

  1. Forge server.jar Glob Logic (artifacts.go lines 112-116, 147-151)

    • Globs for *server.jar, which Forge ≥1.17 no longer produces
    • Fix: Remove glob/rename logic (Forge uses run.sh + libraries/)
    • Impact: None functionally; the glob/rename code is dead weight for Forge ≥1.17
  2. ensureProvisioned() Fallthrough (agent.go lines 155-171)

    • After Forge check, falls through to check server.jar
    • Fix: Add else to prevent fallthrough
    • Impact: Minor efficiency issue
  3. Forge Stop Command Exclusion (process.go line 83)

    • Excludes Forge from receiving stop command
    • Fix: Remove exclusion (Forge accepts stop commands)
    • Impact: Manual workaround needed for Forge stops

Missing Competitive Features (CRITICAL)

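| Feature | Apex | Shockbyte | ZeroLagHub | Priority |
|---|---|---|---|---|
| All MC Variants | | | | - |
| NeoForge | | | | ADVANTAGE |
| Performance | 🟡 | 🟡 | | ADVANTAGE |
| Console Streaming | | | | 🔴 HIGH |
| File Management | | | | 🟡 MEDIUM |
| Backups | | | | 🟡 MEDIUM |
| Crash Protection | | | | 🔴 HIGH |
| Disk Monitoring | | | | 🔴 HIGH |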

🎯 Platform Readiness Assessment (85%)

Core Platform (100%)

  • Container orchestration
  • Multi-variant provisioning
  • Network routing (dual-router)
  • DNS automation (Cloudflare + Technitium)
  • Velocity proxy integration
  • Start/stop/restart control
  • Console command injection
  • Status monitoring

Operational Features (70%)

  • Log tailing (HTTP polling)
  • Crash detection
  • WebSocket console (need real-time streaming)
  • Crash loop protection (need exponential backoff)
  • Disk space monitoring (prevent corruption)

File Management (0%)

  • File upload/download
  • Backup/restore system
  • World file management

Advanced Features (Planned)

  • 📋 Resource monitoring dashboard
  • 📋 Plugin marketplace
  • 📋 Developer platform APIs
  • 📋 Performance optimization tools

🚀 Launch Decision Point

Option A: Launch NOW (Soft Beta)

Status: 85% ready
Timeline: Immediate
Pros: Fast to market, gather user feedback
Cons: Missing competitive UX features, higher support burden
Recommendation: ⚠️ Acceptable for 10-20 beta users only

Option B: +1 Week

Status: 95% ready after additions
Timeline: December 14, 2025
Add: WebSocket console + Crash protection + Disk monitoring
Effort: 7-9 hours total
Pros: Competitive feature parity, professional launch
Cons: Minimal delay
Recommendation: Best balance of quality and speed

Option C: +1 Month (Full Feature Parity)

Status: 100% ready
Timeline: January 7, 2026
Add: All UX features + file management + backups
Effort: ~30 hours
Pros: Complete competitive offering
Cons: Slower to market, feature creep risk
Recommendation: ⚠️ Over-engineering for launch


📋 Critical Outstanding Items

🔴 High Priority (Before Launch)

1. WebSocket Console Streaming [4-6 hours]

  • Current: HTTP polling via /logs/tail
  • Needed: Real-time WebSocket streaming
  • Why: Industry standard, users expect it
  • Technical: Socket.io integration to Go agent
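
The part of this change that replaces polling is pushing each console line to every connected viewer as it arrives. A minimal sketch of that fan-out follows, independent of the transport: the WebSocket layer itself needs a library outside the Go standard library and is not shown. The names `logHub`, `subscribe`, `unsubscribe`, and `publish` are hypothetical and not taken from the agent's code.

```go
package main

import (
	"fmt"
	"sync"
)

// logHub fans out console lines to any number of subscribers.
// Each subscriber channel would be drained by one per-connection writer.
type logHub struct {
	mu   sync.Mutex
	subs map[chan string]struct{}
}

func newLogHub() *logHub {
	return &logHub{subs: make(map[chan string]struct{})}
}

// subscribe registers a listener with a buffer of buf lines.
func (h *logHub) subscribe(buf int) chan string {
	ch := make(chan string, buf)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// unsubscribe removes the listener and closes its channel.
func (h *logHub) unsubscribe(ch chan string) {
	h.mu.Lock()
	if _, ok := h.subs[ch]; ok {
		delete(h.subs, ch)
		close(ch)
	}
	h.mu.Unlock()
}

// publish delivers a line to every subscriber without blocking.
// It returns how many subscribers had a full buffer and missed the line,
// so one slow viewer cannot stall the game server's output pipeline.
func (h *logHub) publish(line string) (dropped int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs {
		select {
		case ch <- line:
		default:
			dropped++
		}
	}
	return dropped
}

func main() {
	hub := newLogHub()
	viewer := hub.subscribe(8)
	hub.publish("example console line")
	fmt.Println(<-viewer)
	hub.unsubscribe(viewer)
}
```

Dropping lines for slow subscribers is one possible design choice; disconnecting the slow viewer instead would be equally defensible.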

2. Crash Loop Protection [2 hours]

  • Current: Immediate restart on crash
  • Needed: Exponential backoff (5s, 10s, 20s), stop after 3 crashes
  • Why: Prevents resource thrashing
  • Technical: Agent retry logic with backoff timer
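
The retry logic could be sketched as below, assuming a doubling schedule that starts at 5 s and reading "stop after 3 crashes" as allowing three delayed restarts before giving up. The type `restartPolicy` and its fields are hypothetical names, not the agent's actual API.

```go
package main

import (
	"fmt"
	"time"
)

// restartPolicy is a hypothetical policy type; field names are illustrative.
type restartPolicy struct {
	baseDelay   time.Duration // wait before the first restart
	maxRestarts int           // restarts allowed before giving up
}

// nextDelay returns the wait before restarting after the given consecutive
// crash (counted from 1). ok is false once the limit is exceeded, meaning
// the server should be left stopped.
func (p restartPolicy) nextDelay(crashCount int) (delay time.Duration, ok bool) {
	if crashCount < 1 || crashCount > p.maxRestarts {
		return 0, false
	}
	// Double the delay for each additional consecutive crash.
	return p.baseDelay << uint(crashCount-1), true
}

func main() {
	p := restartPolicy{baseDelay: 5 * time.Second, maxRestarts: 3}
	for crash := 1; crash <= 4; crash++ {
		if d, ok := p.nextDelay(crash); ok {
			fmt.Printf("crash %d: restart after %s\n", crash, d)
		} else {
			fmt.Printf("crash %d: giving up, leave server stopped\n", crash)
		}
	}
}
```

A real implementation would probably also reset the crash counter after the server has run stably for some time; that window is not specified here.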

3. Disk Space Monitoring [1 hour]

  • Current: No checks
  • Needed: Alert when <1GB free, prevent start if insufficient
  • Why: Prevents world corruption
  • Technical: Agent disk space check before start
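
A pre-start check along these lines might look like the following sketch. It assumes a Linux container (it uses `syscall.Statfs`) and treats 1 GiB as the threshold; `freeBytes` and `checkDisk` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"syscall"
)

// minFreeBytes is the assumed threshold: 1 GiB.
const minFreeBytes uint64 = 1 << 30

// freeBytes reports the space available to unprivileged processes on the
// filesystem that contains path.
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return uint64(st.Bavail) * uint64(st.Bsize), nil
}

// checkDisk returns an error when free space is below the threshold,
// which the caller can use both to raise an alert and to refuse a start.
func checkDisk(free uint64) error {
	if free < minFreeBytes {
		return fmt.Errorf("only %d MiB free, need at least %d MiB",
			free>>20, minFreeBytes>>20)
	}
	return nil
}

func main() {
	free, err := freeBytes("/")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	if err := checkDisk(free); err != nil {
		fmt.Println("refusing to start:", err)
		return
	}
	fmt.Printf("ok to start, %d MiB free\n", free>>20)
}
```

In practice the path checked would be the server's data directory rather than `/`.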

🟡 Medium Priority (Week 1)

4. File Upload/Download [6-8 hours]

  • Plugin management, world uploads
  • HTTP multipart + streaming

5. Backup System [8-10 hours]

  • World backup/restore
  • Integration with PBS backup infrastructure

6. Enhanced Health Checks [3-4 hours]

  • Query server status
  • Resource monitoring (CPU/RAM)

🟢 Low Priority (Month 1)

  1. Resource monitoring dashboard
  2. Plugin marketplace integration
  3. Developer platform APIs
  4. Performance optimization

🗄️ Technical Architecture Details

Directory Structure (Finalized)

/opt/zlh/<game>/<variant>/world/

Examples:
/opt/zlh/minecraft/vanilla/world/
/opt/zlh/minecraft/forge/world/
/opt/zlh/minecraft/fabric/world/

Benefits:

  • Clear game/variant separation
  • Scalable to all future games
  • Self-documenting paths
  • Easy backup automation
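
As an illustration of the layout, a small helper can build the world path from game and variant names. This is a sketch only; `worldDir` and `zlhRoot` are hypothetical names, and the segment validation is an added assumption meant to keep paths inside `/opt/zlh`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const zlhRoot = "/opt/zlh"

// worldDir returns /opt/zlh/<game>/<variant>/world/ and rejects segments
// that are empty or could escape the root (".", "..", or path separators).
func worldDir(game, variant string) (string, error) {
	for _, seg := range []string{game, variant} {
		if seg == "" || seg == "." || seg == ".." || strings.ContainsAny(seg, "/\\") {
			return "", fmt.Errorf("invalid path segment %q", seg)
		}
	}
	return path.Join(zlhRoot, game, variant, "world") + "/", nil
}

func main() {
	for _, variant := range []string{"vanilla", "forge", "fabric"} {
		dir, err := worldDir("minecraft", variant)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(dir)
	}
}
```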

Container Model

Architecture: One game per LXC container
Rationale: Industry standard, 3-5x simpler than multi-game
Benefits:

  • Better resource isolation
  • Simpler billing
  • Clearer security boundaries
  • Easier debugging

Java Runtime Selection

MC 1.21.x        → Java 21
MC ≥1.20.5       → Java 21
MC <1.20.5       → Java 17
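
The mapping above reduces to a single threshold at 1.20.5. A minimal sketch of that rule follows; `parseVersion` and `javaMajorFor` are hypothetical names, and the treatment of a trailing `.x` as patch 0 is an assumption made for the `1.21.x` notation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a version such as "1.20.5" into numeric parts.
// A trailing ".x" wildcard (as in "1.21.x") is treated as patch 0.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) < 2 || len(parts) > 3 {
		return out, fmt.Errorf("unexpected version format: %q", v)
	}
	for i, p := range parts {
		if i == 2 && strings.EqualFold(p, "x") {
			continue // wildcard patch level
		}
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad version component %q in %q", p, v)
		}
		out[i] = n
	}
	return out, nil
}

// javaMajorFor applies the rule listed above:
// versions >= 1.20.5 get Java 21, everything older gets Java 17.
func javaMajorFor(mcVersion string) (int, error) {
	v, err := parseVersion(mcVersion)
	if err != nil {
		return 0, err
	}
	threshold := [3]int{1, 20, 5}
	for i := 0; i < 3; i++ {
		if v[i] > threshold[i] {
			return 21, nil
		}
		if v[i] < threshold[i] {
			return 17, nil
		}
	}
	return 21, nil // exactly 1.20.5
}

func main() {
	for _, v := range []string{"1.12.2", "1.20.1", "1.20.5", "1.21.x"} {
		j, err := javaMajorFor(v)
		if err != nil {
			fmt.Println(v, "error:", err)
			continue
		}
		fmt.Printf("MC %s -> Java %d\n", v, j)
	}
}
```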

Artifact Download Paths

minecraft/vanilla/<version>/server.jar
minecraft/paper/<version>/server.jar
minecraft/purpur/<version>/server.jar
minecraft/fabric/<version>/fabric-server.jar
minecraft/forge/<version>/forge-installer.jar
minecraft/neoforge/<version>/neoforge-installer.jar

Critical Note: Fabric uses fabric-server.jar (pre-built), not installer pattern
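
The path scheme can be captured in a small lookup, sketched below. The file names come from the list above; `artifactFile`, `artifactPath`, and `usesInstaller` are hypothetical names, and `usesInstaller` simply reflects which artifacts are named `*-installer.jar`.

```go
package main

import (
	"fmt"
	"strings"
)

// artifactFile maps each Minecraft variant to its artifact file name.
var artifactFile = map[string]string{
	"vanilla":  "server.jar",
	"paper":    "server.jar",
	"purpur":   "server.jar",
	"fabric":   "fabric-server.jar", // pre-built server jar, no installer step
	"forge":    "forge-installer.jar",
	"neoforge": "neoforge-installer.jar",
}

// artifactPath builds minecraft/<variant>/<version>/<file>.
func artifactPath(variant, version string) (string, error) {
	file, ok := artifactFile[variant]
	if !ok {
		return "", fmt.Errorf("unknown variant %q", variant)
	}
	if version == "" {
		return "", fmt.Errorf("empty version for variant %q", variant)
	}
	return "minecraft/" + variant + "/" + version + "/" + file, nil
}

// usesInstaller reports whether the variant's artifact is an installer jar
// that must be run before the server can start.
func usesInstaller(variant string) bool {
	return strings.HasSuffix(artifactFile[variant], "-installer.jar")
}

func main() {
	for _, variant := range []string{"paper", "fabric", "neoforge"} {
		p, err := artifactPath(variant, "1.20.1")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(p, "installer:", usesInstaller(variant))
	}
}
```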


💰 Business Model & Revenue Strategy

Developer-to-Player Pipeline

Step 1: Developer Acquisition
├─ Development Environment: $20/month
└─ Testing Server: $25/month (50% discount)

Step 2: Player Acquisition (via developer)
├─ Player 1-10: $15/month each (25% discount)
└─ Total Player Revenue: $150/month

Step 3: Developer Commission
├─ Revenue Share: 7.5% of player revenue
├─ Developer Earns: $11.25/month
└─ Platform Keeps: $138.75/month

Total Monthly Revenue from One Developer:
$20 (dev env) + $25 (test server) + $150 (players) = $195/month
Revenue Multiplier: 9.75x the $20/month development-environment fee ($195 / $20)
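
The arithmetic above can be checked with a short calculation. The sketch below works in cents to avoid floating-point rounding; the `plan` and `summary` types are hypothetical, and the figures are the ones quoted in the pipeline (10 players at $15, 7.5% commission).

```go
package main

import "fmt"

// plan holds monthly prices in cents.
type plan struct {
	devEnv, testServer, perPlayer int
	players                       int
	commissionBps                 int // commission in basis points (750 = 7.5%)
}

// summary holds the derived monthly figures, also in cents.
type summary struct {
	playerRevenue, commission, platformKeeps, gross int
	multiplier                                      float64
}

func summarize(p plan) summary {
	playerRevenue := p.perPlayer * p.players
	commission := playerRevenue * p.commissionBps / 10000
	gross := p.devEnv + p.testServer + playerRevenue
	return summary{
		playerRevenue: playerRevenue,
		commission:    commission,
		platformKeeps: playerRevenue - commission,
		gross:         gross,
		// Ratio of total monthly revenue to the development-environment fee.
		multiplier: float64(gross) / float64(p.devEnv),
	}
}

func dollars(cents int) string {
	return fmt.Sprintf("$%d.%02d", cents/100, cents%100)
}

func main() {
	s := summarize(plan{devEnv: 2000, testServer: 2500, perPlayer: 1500,
		players: 10, commissionBps: 750})
	fmt.Println("player revenue: ", dollars(s.playerRevenue)) // $150.00
	fmt.Println("developer earns:", dollars(s.commission))    // $11.25
	fmt.Println("platform keeps: ", dollars(s.platformKeeps)) // $138.75
	fmt.Println("total revenue:  ", dollars(s.gross))         // $195.00
	fmt.Printf("multiplier:      %.2fx\n", s.multiplier)      // 9.75x
}
```

Note that the $195 figure is gross; after the $11.25 commission the platform retains $183.75 per developer per month.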

Financial Projections

Month 6: $8K-30K (LXC advantage + developer pipeline)
Month 12: $25K-100K (custom platform competitive advantages)
Month 24: $75K-300K (market leadership + technology licensing)

Competitive Advantages

  1. LXC Performance: 20-30% improvement over Docker competitors
  2. Developer Ecosystem: Complete dev-to-production pipeline vs pure hosting
  3. Open Source Foundation: 30-40% cost advantage over corporate providers
  4. Gaming-First Architecture: Purpose-built vs adapted generic hosting
  5. NeoForge Support: Ahead of Apex and Shockbyte

🔐 Security Vulnerabilities (CRITICAL - Active Fix Required)

API Department Issues

  1. Server Ownership Bypass

    • Any user can control any server via UUID
    • No ownership validation in API endpoints
    • Impact: Critical security flaw
  2. Admin Privilege Escalation

    • Frontend can claim admin via JWT manipulation
    • No server-side role validation
    • Impact: Complete access control bypass
  3. Token URL Exposure

    • JWTs visible in browser history/logs
    • Tokens passed as URL parameters
    • Impact: Token theft vulnerability
  4. API Key Validation Missing

    • Authentication bypass vulnerabilities
    • Inconsistent validation patterns
    • Impact: Unauthorized API access

Required Fixes

  • Implement ownership checks on all server operations
  • Server-side JWT validation and role enforcement
  • Move tokens from URL to headers/cookies
  • Comprehensive API key validation

Priority: Must fix before public launch (tolerated only for the current soft beta)
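
Two of the required fixes, ownership checks and moving tokens out of the URL, come down to small pieces of logic. The sketch below shows that logic in Go for consistency with the agent code; the API itself is Node.js/Express, so this is an illustration of the checks rather than the actual middleware. All type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// account and gameServer stand in for records loaded from the database.
// The role comes from the DB row, never from a claim the client can edit.
type account struct {
	ID   string
	Role string
}

type gameServer struct {
	UUID    string
	OwnerID string
}

// canControl allows an action only for the server's owner or a
// server-side verified admin. Knowing the UUID alone is not enough.
func canControl(a account, s gameServer) bool {
	return a.Role == "admin" || (s.OwnerID != "" && s.OwnerID == a.ID)
}

// bearerToken reads the token from the Authorization header only.
// Tokens supplied as URL query parameters are deliberately ignored.
func bearerToken(r *http.Request) (string, bool) {
	const prefix = "Bearer "
	h := r.Header.Get("Authorization")
	if !strings.HasPrefix(h, prefix) || len(h) == len(prefix) {
		return "", false
	}
	return h[len(prefix):], true
}

func main() {
	req, err := http.NewRequest(http.MethodPost,
		"http://api.example.invalid/servers/abc/stop?token=leaked", nil)
	if err != nil {
		panic(err)
	}
	_, ok := bearerToken(req)
	fmt.Println("token accepted from URL:", ok)

	req.Header.Set("Authorization", "Bearer header-token")
	tok, ok := bearerToken(req)
	fmt.Println("token from header:", tok, ok)

	owner := account{ID: "u1", Role: "user"}
	other := account{ID: "u2", Role: "user"}
	srv := gameServer{UUID: "abc", OwnerID: "u1"}
	fmt.Println("owner allowed:", canControl(owner, srv))
	fmt.Println("other allowed:", canControl(other, srv))
}
```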


🛠️ Ford Assembly Line Department Structure

Management Department (Coordination Hub)

  • Role: Strategic oversight, cross-department integration
  • AI Resource: Claude (architecture) + ChatGPT (implementation)
  • Current Focus: Launch readiness + critical feature completion

5 Specialized Departments

1. API Department ⚠️ CRITICAL SECURITY + DEVELOPER PLATFORM

  • Tech: Node.js/Express, MariaDB, JWT auth, Pterodactyl integration
  • Priority: Security fixes + developer environment APIs

2. Infrastructure Department LXC INTEGRATION PRIORITY

  • Tech: Proxmox VMs, Ansible automation, PBS backup, Monitoring
  • Achievement: Enterprise backup system operational
  • Capacity: 1.8TB available, supports 75-100 developers

3. Frontend Department 🔧 TOKEN SECURITY + DEVELOPER UI

  • Tech: Next.js 15, TailwindCSS, sci-fi HUD aesthetic, TypeScript
  • Priority: Token security + developer dashboard

4. Pterodactyl Department ⚠️ OAUTH + WINGS LXC

  • Role: Panel customization, OAuth integration
  • Future: Wings LXC integration for performance advantage

5. Planning & Brainstorming Department 🧠 STRATEGIC EXECUTION

  • Role: Long-term vision, competitive strategy
  • Focus: Developer acquisition, viral growth mechanics

📋 Immediate Next Steps (Priority Order)

Phase 1: Critical Features (Before Launch)

  1. Fix Go Agent Bugs - Remove Forge glob, fix fallthrough, enable stop commands
  2. 🔧 WebSocket Console - Implement real-time streaming (4-6 hours)
  3. 🔧 Crash Loop Protection - Add exponential backoff (2 hours)
  4. 🔧 Disk Space Monitoring - Prevent starts on low disk (1 hour)

Phase 2: Launch Readiness

  1. 📋 Security Audit - Review critical vulnerabilities
  2. 📋 Documentation - User guides, API docs
  3. 📋 Monitoring - Alert thresholds, dashboards
  4. 📋 Soft Beta - 10-20 users, gather feedback

Phase 3: Week 1 Post-Launch

  1. 📋 File Management - Upload/download interface
  2. 📋 Backup System - World backup/restore
  3. 📋 Enhanced Health Checks - Resource monitoring

🎯 Success Metrics

Technical Metrics

  • 100% provisioning success rate (all 6 variants)
  • ⚠️ Zero DNS orphan records (needs EdgeState migration)
  • ⚠️ Sub-second WebSocket latency (needs implementation)
  • LXC 20-30% performance advantage (validated)

Business Metrics (Future)

  • Developer referral system operational
  • Revenue sharing calculations accurate
  • Customer quota enforcement working
  • Usage metering for billing

User Experience Metrics

  • Professional HUD aesthetic maintained
  • Zero breaking changes during updates
  • Seamless dev-to-production pipeline
  • <3s average provisioning time

📁 Key Files & Locations

API Service (/home/zlh/zlh-api-v2/)

  • prisma/schema.prisma - Database schema
  • src/services/edgePublisher.js - DNS + Velocity publishing
  • src/services/dePublisher.js - Edge cleanup
  • src/services/portAllocator.js - Port management
  • src/clients/cloudflareClient.js - Cloudflare API wrapper
  • src/clients/technitiumClient.js - Technitium DNS API wrapper

Go Agent (/opt/zlh-agent/)

  • agent.go - Main provisioning logic
  • artifacts.go - Download + verification (has bugs)
  • process.go - Server lifecycle management (has bug)
  • api.go - HTTP server for control commands
  • payload.json - Configuration from API

Frontend (/home/zlh/zlh-portal/)

  • Next.js 15 application
  • Steel-texture HUD aesthetic
  • Developer dashboard (in progress)

⚠️ Critical Rules & Constraints

DO NOT

  • Infer hostnames from DNS records
  • Use DNS as source of truth
  • Delete Cloudflare records without record IDs
  • Launch without WebSocket console (competitive requirement)
  • Skip crash protection (operational stability)
  • Ignore disk space monitoring (data safety)

ALWAYS

  • Treat DB as authoritative source of truth
  • Store Cloudflare record IDs in EdgeState
  • Use exact hostname matching
  • Track all async operations in JobLog
  • Audit significant actions
  • Test all 6 MC variants before deploy
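
The DNS rules above (DB as source of truth, stored record IDs, exact hostname matching) can be combined into one guard that runs before any Cloudflare delete. The following is a sketch in Go for illustration; the real cleanup lives in the Node.js API, and the `edgeState` fields shown are assumed, not taken from the Prisma schema.

```go
package main

import (
	"errors"
	"fmt"
)

// edgeState is a hypothetical in-memory view of an EdgeState row.
type edgeState struct {
	Hostname           string
	CloudflareRecordID string
}

var errNoRecordID = errors.New("refusing to delete: no stored Cloudflare record ID")

// planDelete returns the record ID to delete for a hostname, using only
// what the database rows say. It never queries DNS to discover records.
func planDelete(rows []edgeState, hostname string) (string, error) {
	for _, r := range rows {
		if r.Hostname != hostname { // exact match only, no suffix or wildcard logic
			continue
		}
		if r.CloudflareRecordID == "" {
			return "", errNoRecordID
		}
		return r.CloudflareRecordID, nil
	}
	return "", fmt.Errorf("no EdgeState row for %q", hostname)
}

func main() {
	rows := []edgeState{
		{Hostname: "alpha.example.invalid", CloudflareRecordID: "rec-123"},
		{Hostname: "beta.example.invalid"}, // ID was never stored
	}
	for _, host := range []string{"alpha.example.invalid", "beta.example.invalid", "pha.example.invalid"} {
		id, err := planDelete(rows, host)
		if err != nil {
			fmt.Println(host, "->", err)
			continue
		}
		fmt.Println(host, "-> delete record", id)
	}
}
```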

💡 Key Architectural Decisions (ADRs)

ADR-001: Minecraft-Only Launch
Decision: Launch with Minecraft only, defer other games
Rationale: Market validation, focused quality, faster to market
Consequence: 6 variants + 6 versions = comprehensive MC offering

ADR-002: One Game Per Container
Decision: Single game per LXC container
Rationale: Industry standard, 3-5x simpler than multi-game
Consequence: Better isolation, clearer billing, easier debugging

ADR-003: Velocity Over Direct Port Forwarding
Decision: Use Velocity proxy for Minecraft routing
Rationale: Single entry point, dynamic registration, no NAT complexity
Consequence: No external port allocation needed for MC

ADR-004: Hybrid Pterodactyl + Custom API
Decision: Keep Pterodactyl panel, build custom API alongside
Rationale: Preserve working OAuth, gradual migration path
Consequence: Dual system complexity, eventual migration needed

ADR-005: Go Agent Architecture
Decision: Containerized Go agent handles provisioning
Rationale: Language-agnostic, self-healing, version-aware
Consequence: Robust provisioning, automatic repair, clean separation


🧠 Session Continuity Prompt

For AI assistants resuming work on this project:

Resume from ZeroLagHub Master Bootstrap (December 7, 2025).

Current State: Platform 85% launch-ready. All 6 Minecraft variants provisioning successfully via Go agent. Core functionality operational, need critical UX features for competitive parity.

Launch Decision: Recommend +1 week for WebSocket console, crash protection, and disk monitoring.

Known Bugs: 3 non-blocking Go agent issues (Forge glob, fallthrough, stop exclusion).

Critical Context:

  • Security vulnerabilities exist but acceptable for soft beta
  • Business model validated with 9.75x revenue multiplier
  • Developer-to-player pipeline is core differentiator
  • LXC performance advantage is primary competitive edge

Next Actions: Fix Go agent bugs, implement critical features, launch beta.


📞 Support & Escalation

  • Platform Owner: 44 years old, full-stack developer
  • AI Coordination: Claude (architecture) + ChatGPT (implementation)
  • Infrastructure: GTHost dedicated server ($109/month)
  • Domain: zerolaghub.com, zpack.zerolaghub.com
  • Public Game IP: 139.64.165.248

📊 Platform Status Summary

Technical Readiness: 85% complete
Competitive Position: Ready to compete on core provisioning, need UX polish
Strategic Clarity: Clear path to launch with validated business model
Infrastructure: Production-grade with enterprise backup system
Security: Known vulnerabilities, acceptable for soft beta, must fix before public launch


🎯 Strategic Recommendation

Recommended Path: Option B (+1 Week)

Rationale:

  1. WebSocket console is table stakes (competitors have it)
  2. Crash protection prevents operational nightmares
  3. Disk monitoring prevents data loss
  4. 1 week is negligible for long-term platform success
  5. Professional launch > rushed launch

Timeline:

  • Dec 7-10: Implement critical features (WebSocket, crash, disk)
  • Dec 11-13: Testing + bug fixes
  • Dec 14: Soft beta launch (10-20 users)
  • Dec 21: Public launch after beta feedback

This document serves as the single source of truth for project continuity. Update after each major milestone or architectural change.

🚀 Next action: Decide launch timeline, then implement critical features or launch beta.