# 🚀 ZeroLagHub - Master Bootstrap Document (December 2025)

**Last Updated**: December 7, 2025
**Version**: 4.0 (Platform Launch Ready)
**Status**: 85% Complete - Launch Decision Point

---

## 📌 Quick Start for New AI Sessions

> **Resume Point**: Platform 85% launch-ready, all core provisioning operational, critical UX features needed.
>
> **Current Phase**: Launch readiness assessment - choose NOW vs +1 week vs +1 month
>
> **Critical Context**: All 6 Minecraft variants provisioning successfully via Go agent. Need WebSocket console, crash protection, and disk monitoring for competitive parity.

---

## 🎯 Project Overview

**ZeroLagHub** is a developer-focused game server hosting platform built on:

- **Proxmox VE** with LXC containers (20-30% performance advantage over Docker)
- **Hybrid Architecture**: Pterodactyl panel + Custom Node.js API + Go provisioning agent
- **Velocity proxy** for seamless Minecraft routing
- **Dual-router architecture** for traffic separation
- **Developer-to-player revenue pipeline** with 9.75x revenue multiplier

### Core Value Proposition

Complete dev-to-production pipeline: Development environments ($20/mo) → Testing servers (50% discount) → Player hosting (25% discount) → Revenue sharing (7.5% commission) = viral growth through developer ecosystem.
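The pipeline arithmetic can be checked with a short sketch. The prices used here ($25 testing server, $15 per player, 10 players per developer) come from the Business Model section later in this document; the class and method names are illustrative only and do not correspond to any existing ZeroLagHub code. It is written in Python purely for brevity.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class DeveloperFunnel:
    """Monthly revenue generated by one developer (hypothetical model)."""

    dev_env: Decimal = Decimal("20")          # development environment, $/month
    test_server: Decimal = Decimal("25")      # testing server after the 50% discount
    player_price: Decimal = Decimal("15")     # player hosting after the 25% discount
    players: int = 10                         # players brought in by the developer
    commission_rate: Decimal = Decimal("0.075")  # 7.5% revenue share

    def player_revenue(self) -> Decimal:
        return self.player_price * self.players

    def commission(self) -> Decimal:
        # Paid to the developer out of player revenue only.
        return self.player_revenue() * self.commission_rate

    def platform_share(self) -> Decimal:
        # Player revenue left after the developer commission.
        return self.player_revenue() - self.commission()

    def gross(self) -> Decimal:
        return self.dev_env + self.test_server + self.player_revenue()

    def net(self) -> Decimal:
        # Gross revenue minus the commission paid back to the developer.
        return self.gross() - self.commission()

    def multiplier(self) -> Decimal:
        # Gross revenue relative to the developer's own $20 subscription.
        return self.gross() / self.dev_env


funnel = DeveloperFunnel()
print(funnel.gross(), funnel.commission(), funnel.multiplier())
```

Note that the 9.75x figure is gross revenue ($195) divided by the $20 development environment price. After the $11.25 commission is paid out, the platform nets $183.75 per developer under these assumptions.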
---

## 🏗️ Current Architecture (December 2025)

### Infrastructure Overview (11 VMs)

```
Critical Production:
├── VM 100 (zlh-panel) - Pterodactyl panel + OAuth customization
├── VM 103 (zlh-api) - Node.js backend + developer platform APIs
├── VM 101 (zlh-wings) - Game servers + LXC integration target

Platform Services:
├── VM 102 (zlh-portal) - Next.js frontend + developer dashboard
├── VM 104 (zlh-monitor) - Prometheus/Grafana monitoring

Network & Infrastructure:
├── VM 1000 (zlh-router) - Platform services routing + VLANs
├── VM 1006 (zpack-router) - Game traffic routing + Velocity
├── VM 1001 (zlh-dns) - Technitium DNS + development domains
├── VM 1002 (zlh-proxy) - Caddy reverse proxy + SSL automation
├── VM 300 (zlh-panel-dev) - Development environment + testing
├── VM 2000 (zlh-ci) - CI/CD pipeline + automation
└── VM [zlh-back] - PBS backup + Backblaze B2 replication
```

### Network Topology

```
zlh-router (VM 1000):
├─ WAN1: Platform services (API, portal, monitoring)
├─ CORE_LAN: 10.60.0.0/24 (internal services)
├─ MGMT_LAN: 172.60.0.10/24 (inter-router communication)
└─ WireGuard: Admin access

zpack-router (VM 1006):
├─ WAN2: 139.64.165.248 (game services)
├─ ZPACK_LAN: 10.70.0.0/24 (Velocity @ 10.70.0.241)
├─ DEV_LAN: 10.100.0.0/24 (developer environments - future)
├─ GAME_LAN: 10.200.0.0/24 (game server LXCs)
└─ MGMT_LAN: 172.60.0.20/24 (control plane communication)
```

### Traffic Flows

- **Platform Access**: Client → WAN1 → zlh-router → Frontend/API
- **Game Play**: Player → WAN2 (139.64.165.248) → zpack-router → Velocity (10.70.0.241) → Game Server (10.200.0.X)
- **Control Plane**: API → MGMT_LAN (172.60.0.X) → Velocity/DNS/Monitoring

---

## ✅ What's Working (December 7, 2025)

### Provisioning Pipeline (100% Operational)

| Component | Status | Notes |
|-----------|--------|-------|
| **LXC Container Creation** | ✅ | Template VMID 800, auto-cloning working |
| **VMID Allocation** | ✅ | Sequential assignment from range |
| **IP Detection** | ✅ | Automatic network configuration |
| **Go Agent Deployment** | ✅ | Payload delivery + self-repair system |
| **Java Runtime Selection** | ✅ | Auto-detect MC version → Java 17/21 |
| **All 6 MC Variants** | ✅ | Vanilla, Paper, Purpur, Fabric, Forge, NeoForge |
| **Server Startup** | ✅ | All variants start successfully |
| **DNS Publishing** | ✅ | Cloudflare + Technitium A + SRV records |
| **Velocity Registration** | ✅ | Dynamic backend server registration |
| **Client Connectivity** | ✅ | Players can connect and play |

### Control Functions

| Function | Status | Implementation |
|----------|--------|----------------|
| **Start/Stop/Restart** | ✅ | HTTP API → Go agent |
| **Console Commands** | ✅ | Command injection working |
| **Log Tailing** | ⚠️ | HTTP polling only (need WebSocket) |
| **Status Reporting** | ✅ | Agent emits RUNNING state |
| **Crash Detection** | ✅ | Agent tracks exit codes |

### Game Support Matrix

**Launch Ready (Minecraft Only)**:

- ✅ **Vanilla** - Official Mojang server
- ✅ **Paper** - Primary recommendation (vanilla + plugins)
- ✅ **Purpur** - Paper fork with extra features
- ✅ **Fabric** - Lightweight mod support
- ✅ **Forge** - Heavy mod support (tech/magic mods)
- ✅ **NeoForge** - Modern Forge fork (**competitive advantage**)

**Supported Versions**: 1.12.2, 1.16.5, 1.18.2, 1.19.2, 1.20.1, 1.21.x

**Deferred to Post-Launch**:

- 📋 Terraria
- 📋 Project Zomboid
- 📋 Valheim
- 📋 Rust

---

## 🚨 Known Issues & Gaps

### Critical Bugs (Non-Blocking, System Works)

1. **Forge server.jar Glob Logic** (`artifacts.go` lines 112-116, 147-151)
   - Tries to find `*server.jar`, but Forge ≥1.17 doesn't create this file
   - **Fix**: Remove glob/rename logic (Forge uses `run.sh` + `libraries/`)
   - **Impact**: System works, but the code is unnecessary

2. **ensureProvisioned() Fallthrough** (`agent.go` lines 155-171)
   - After the Forge check, falls through to check `server.jar`
   - **Fix**: Add `else` to prevent fallthrough
   - **Impact**: Minor efficiency issue

3. **Forge Stop Command Exclusion** (`process.go` line 83)
   - Excludes Forge from receiving the `stop` command
   - **Fix**: Remove exclusion (Forge accepts stop commands)
   - **Impact**: Manual workaround needed for Forge stops

### Missing Competitive Features (CRITICAL)

| Feature | Apex | Shockbyte | ZeroLagHub | Priority |
|---------|------|-----------|------------|----------|
| **All MC Variants** | ✅ | ✅ | ✅ | - |
| **NeoForge** | ❌ | ❌ | ✅ | ADVANTAGE |
| **Performance** | 🟡 | 🟡 | ✅ | ADVANTAGE |
| **Console Streaming** | ✅ | ✅ | ❌ | 🔴 HIGH |
| **File Management** | ✅ | ✅ | ❌ | 🟡 MEDIUM |
| **Backups** | ✅ | ✅ | ❌ | 🟡 MEDIUM |
| **Crash Protection** | ✅ | ✅ | ❌ | 🔴 HIGH |
| **Disk Monitoring** | ✅ | ✅ | ❌ | 🔴 HIGH |

---

## 🎯 Platform Readiness Assessment (85%)

### Core Platform (100%)

- ✅ Container orchestration
- ✅ Multi-variant provisioning
- ✅ Network routing (dual-router)
- ✅ DNS automation (Cloudflare + Technitium)
- ✅ Velocity proxy integration
- ✅ Start/stop/restart control
- ✅ Console command injection
- ✅ Status monitoring

### Operational Features (70%)

- ✅ Log tailing (HTTP polling)
- ✅ Crash detection
- ❌ **WebSocket console** (need real-time streaming)
- ❌ **Crash loop protection** (need restart backoff)
- ❌ **Disk space monitoring** (prevent corruption)

### File Management (0%)

- ❌ File upload/download
- ❌ Backup/restore system
- ❌ World file management

### Advanced Features (Planned)

- 📋 Resource monitoring dashboard
- 📋 Plugin marketplace
- 📋 Developer platform APIs
- 📋 Performance optimization tools

---

## 🚀 Launch Decision Point

### Option A: Launch NOW (Soft Beta)

**Status**: 85% ready
**Timeline**: Immediate
**Pros**: Fast to market, gather user feedback
**Cons**: Missing competitive UX features, higher support burden
**Recommendation**: ⚠️ Acceptable for 10-20 beta users only

### Option B: +1 Week (Critical Features) ⭐ RECOMMENDED

**Status**: 95% ready after additions
**Timeline**: December 14, 2025
**Add**: WebSocket console + Crash protection + Disk monitoring
**Effort**: 7-9 hours total
**Pros**: Competitive feature parity, professional launch
**Cons**: Minimal delay
**Recommendation**: ✅ Best balance of quality and speed

### Option C: +1 Month (Full Feature Parity)

**Status**: 100% ready
**Timeline**: January 7, 2026
**Add**: All UX features + file management + backups
**Effort**: ~30 hours
**Pros**: Complete competitive offering
**Cons**: Slower to market, feature creep risk
**Recommendation**: ⚠️ Over-engineering for launch

---

## 📋 Critical Outstanding Items

### 🔴 High Priority (Before Launch)

**1. WebSocket Console Streaming** [4-6 hours]

- **Current**: HTTP polling via `/logs/tail`
- **Needed**: Real-time WebSocket streaming
- **Why**: Industry standard, users expect it
- **Technical**: Socket.io integration to Go agent

**2. Crash Loop Protection** [2 hours]

- **Current**: Immediate restart on crash
- **Needed**: Increasing restart backoff (5s, 10s, 15s), stop after 3 crashes
- **Why**: Prevents resource thrashing
- **Technical**: Agent retry logic with backoff timer

**3. Disk Space Monitoring** [1 hour]

- **Current**: No checks
- **Needed**: Alert when <1GB free, prevent start if insufficient
- **Why**: Prevents world corruption
- **Technical**: Agent disk space check before start

### 🟡 Medium Priority (Week 1)

**4. File Upload/Download** [6-8 hours]

- Plugin management, world uploads
- HTTP multipart + streaming

**5. Backup System** [8-10 hours]

- World backup/restore
- Integration with PBS backup infrastructure

**6. Enhanced Health Checks** [3-4 hours]

- Query server status
- Resource monitoring (CPU/RAM)

### 🟢 Low Priority (Month 1)

7. Resource monitoring dashboard
8. Plugin marketplace integration
9. Developer platform APIs
10. Performance optimization

---

## 🗄️ Technical Architecture Details

### Directory Structure (Finalized)

```
/opt/zlh///world/

Examples:
/opt/zlh/minecraft/vanilla/world/
/opt/zlh/minecraft/forge/world/
/opt/zlh/minecraft/fabric/world/
```

**Benefits**:

- Clear game/variant separation
- Scalable to all future games
- Self-documenting paths
- Easy backup automation

### Container Model

**Architecture**: One game per LXC container
**Rationale**: Industry standard, 3-5x simpler than multi-game

**Benefits**:

- Better resource isolation
- Simpler billing
- Clearer security boundaries
- Easier debugging

### Java Runtime Selection

```
MC 1.21.x  → Java 21
MC ≥1.20.5 → Java 21
MC <1.20.5 → Java 17
```

### Artifact Download Paths

```
minecraft/vanilla//server.jar
minecraft/paper//server.jar
minecraft/purpur//server.jar
minecraft/fabric//fabric-server.jar
minecraft/forge//forge-installer.jar
minecraft/neoforge//neoforge-installer.jar
```

**Critical Note**: Fabric uses `fabric-server.jar` (pre-built), not the installer pattern.

---

## 💰 Business Model & Revenue Strategy

### Developer-to-Player Pipeline

```
Step 1: Developer Acquisition
├─ Development Environment: $20/month
└─ Testing Server: $25/month (50% discount)

Step 2: Player Acquisition (via developer)
├─ Player 1-10: $15/month each (25% discount)
└─ Total Player Revenue: $150/month

Step 3: Developer Commission
├─ Revenue Share: 7.5% of player revenue
├─ Developer Earns: $11.25/month
└─ Platform Keeps: $138.75/month

Total Monthly Revenue from One Developer:
$20 (dev env) + $25 (test server) + $150 (players) = $195/month

Revenue Multiplier: 9.75x the $20 developer subscription ($195 / $20)
```

### Financial Projections

**Month 6**: $8K-30K (LXC advantage + developer pipeline)
**Month 12**: $25K-100K (custom platform competitive advantages)
**Month 24**: $75K-300K (market leadership + technology licensing)

### Competitive Advantages

1. **LXC Performance**: 20-30% improvement over Docker competitors
2. **Developer Ecosystem**: Complete dev-to-production pipeline vs pure hosting
3. **Open Source Foundation**: 30-40% cost advantage over corporate providers
4. **Gaming-First Architecture**: Purpose-built vs adapted generic hosting
5. **NeoForge Support**: Ahead of Apex and Shockbyte

---

## 🔐 Security Vulnerabilities (CRITICAL - Active Fix Required)

### API Department Issues

1. **Server Ownership Bypass**
   - Any user can control any server via UUID
   - No ownership validation in API endpoints
   - **Impact**: Critical security flaw

2. **Admin Privilege Escalation**
   - Frontend can claim admin via JWT manipulation
   - No server-side role validation
   - **Impact**: Complete access control bypass

3. **Token URL Exposure**
   - JWTs visible in browser history/logs
   - Tokens passed as URL parameters
   - **Impact**: Token theft vulnerability

4. **API Key Validation Missing**
   - Authentication bypass vulnerabilities
   - Inconsistent validation patterns
   - **Impact**: Unauthorized API access

### Required Fixes

- Implement ownership checks on all server operations
- Server-side JWT validation and role enforcement
- Move tokens from URL to headers/cookies
- Comprehensive API key validation

**Priority**: Must fix before public launch (current soft beta acceptable)

---

## 🛠️ Ford Assembly Line Department Structure

### Management Department (Coordination Hub)

- **Role**: Strategic oversight, cross-department integration
- **AI Resource**: Claude (architecture) + ChatGPT (implementation)
- **Current Focus**: Launch readiness + critical feature completion

### 5 Specialized Departments

**1. API Department** ⚠️ CRITICAL SECURITY + DEVELOPER PLATFORM

- Tech: Node.js/Express, MariaDB, JWT auth, Pterodactyl integration
- Priority: Security fixes + developer environment APIs

**2. Infrastructure Department** ✅ LXC INTEGRATION PRIORITY

- Tech: Proxmox VMs, Ansible automation, PBS backup, Monitoring
- Achievement: Enterprise backup system operational
- Capacity: 1.8TB available, supports 75-100 developers

**3. Frontend Department** 🔧 TOKEN SECURITY + DEVELOPER UI

- Tech: Next.js 15, TailwindCSS, sci-fi HUD aesthetic, TypeScript
- Priority: Token security + developer dashboard

**4. Pterodactyl Department** ⚠️ OAUTH + WINGS LXC

- Role: Panel customization, OAuth integration
- Future: Wings LXC integration for performance advantage

**5. Planning & Brainstorming Department** 🧠 STRATEGIC EXECUTION

- Role: Long-term vision, competitive strategy
- Focus: Developer acquisition, viral growth mechanics

---

## 📋 Immediate Next Steps (Priority Order)

### Phase 1: Critical Features (Before Launch)

1. ✅ **Fix Go Agent Bugs** - Remove Forge glob, fix fallthrough, enable stop commands
2. 🔧 **WebSocket Console** - Implement real-time streaming (4-6 hours)
3. 🔧 **Crash Loop Protection** - Add restart backoff (2 hours)
4. 🔧 **Disk Space Monitoring** - Prevent starts on low disk (1 hour)

### Phase 2: Launch Readiness

5. 📋 **Security Audit** - Review critical vulnerabilities
6. 📋 **Documentation** - User guides, API docs
7. 📋 **Monitoring** - Alert thresholds, dashboards
8. 📋 **Soft Beta** - 10-20 users, gather feedback

### Phase 3: Week 1 Post-Launch

9. 📋 **File Management** - Upload/download interface
10. 📋 **Backup System** - World backup/restore
11. 📋 **Enhanced Health Checks** - Resource monitoring

---

## 🎯 Success Metrics

### Technical Metrics

- ✅ 100% provisioning success rate (all 6 variants)
- ⚠️ Zero DNS orphan records (needs EdgeState migration)
- ⚠️ Sub-second WebSocket latency (needs implementation)
- ✅ LXC 20-30% performance advantage (validated)

### Business Metrics (Future)

- Developer referral system operational
- Revenue sharing calculations accurate
- Customer quota enforcement working
- Usage metering for billing

### User Experience Metrics

- Professional HUD aesthetic maintained
- Zero breaking changes during updates
- Seamless dev-to-production pipeline
- <3s average provisioning time

---

## 📁 Key Files & Locations

### API Service (`/home/zlh/zlh-api-v2/`)

- `prisma/schema.prisma` - Database schema
- `src/services/edgePublisher.js` - DNS + Velocity publishing
- `src/services/dePublisher.js` - Edge cleanup
- `src/services/portAllocator.js` - Port management
- `src/clients/cloudflareClient.js` - Cloudflare API wrapper
- `src/clients/technitiumClient.js` - Technitium DNS API wrapper

### Go Agent (`/opt/zlh-agent/`)

- `agent.go` - Main provisioning logic
- `artifacts.go` - Download + verification (has bugs)
- `process.go` - Server lifecycle management (has bug)
- `api.go` - HTTP server for control commands
- `payload.json` - Configuration from API

### Frontend (`/home/zlh/zlh-portal/`)

- Next.js 15 application
- Steel-texture HUD aesthetic
- Developer dashboard (in progress)

---

## ⚠️ Critical Rules & Constraints

### DO NOT

- ❌ Infer hostnames from DNS records
- ❌ Use DNS as source of truth
- ❌ Delete Cloudflare records without record IDs
- ❌ Launch without WebSocket console (competitive requirement)
- ❌ Skip crash protection (operational stability)
- ❌ Ignore disk space monitoring (data safety)

### ALWAYS

- ✅ Treat DB as authoritative source of truth
- ✅ Store Cloudflare record IDs in EdgeState
- ✅ Use exact hostname matching
- ✅ Track all async operations in JobLog
- ✅ Audit significant actions
- ✅ Test all 6 MC variants before deploy

---

## 💡 Key Architectural Decisions (ADRs)

**ADR-001: Minecraft-Only Launch**
**Decision**: Launch with Minecraft only, defer other games
**Rationale**: Market validation, focused quality, faster to market
**Consequence**: 6 variants + 6 versions = comprehensive MC offering

**ADR-002: One Game Per Container**
**Decision**: Single game per LXC container
**Rationale**: Industry standard, 3-5x simpler than multi-game
**Consequence**: Better isolation, clearer billing, easier debugging

**ADR-003: Velocity Over Direct Port Forwarding**
**Decision**: Use Velocity proxy for Minecraft routing
**Rationale**: Single entry point, dynamic registration, no NAT complexity
**Consequence**: No external port allocation needed for MC

**ADR-004: Hybrid Pterodactyl + Custom API**
**Decision**: Keep Pterodactyl panel, build custom API alongside
**Rationale**: Preserve working OAuth, gradual migration path
**Consequence**: Dual system complexity, eventual migration needed

**ADR-005: Go Agent Architecture**
**Decision**: Containerized Go agent handles provisioning
**Rationale**: Language-agnostic, self-healing, version-aware
**Consequence**: Robust provisioning, automatic repair, clean separation

---

## 🧠 Session Continuity Prompt

For AI assistants resuming work on this project:

> Resume from ZeroLagHub Master Bootstrap (December 7, 2025).
>
> **Current State**: Platform 85% launch-ready. All 6 Minecraft variants provisioning successfully via Go agent. Core functionality operational, need critical UX features for competitive parity.
>
> **Launch Decision**: Recommend +1 week for WebSocket console, crash protection, and disk monitoring.
>
> **Known Bugs**: 3 non-blocking Go agent issues (Forge glob, fallthrough, stop exclusion).
>
> **Critical Context**:
> - Security vulnerabilities exist but acceptable for soft beta
> - Business model validated with 9.75x revenue multiplier
> - Developer-to-player pipeline is core differentiator
> - LXC performance advantage is primary competitive edge
>
> **Next Actions**: Fix Go agent bugs, implement critical features, launch beta.

---

## 📞 Support & Escalation

- **Platform Owner**: 44 years old, full-stack developer
- **AI Coordination**: Claude (architecture) + ChatGPT (implementation)
- **Infrastructure**: GTHost dedicated server ($109/month)
- **Domain**: zerolaghub.com, zpack.zerolaghub.com
- **Public Game IP**: 139.64.165.248

---

## 📊 Platform Status Summary

**Technical Readiness**: 85% complete
**Competitive Position**: Ready to compete on core provisioning, need UX polish
**Strategic Clarity**: Clear path to launch with validated business model
**Infrastructure**: Production-grade with enterprise backup system
**Security**: Known vulnerabilities, acceptable for soft beta, must fix before public launch

---

## 🎯 Strategic Recommendation

**Recommended Path**: Option B (+1 Week)

**Rationale**:

1. WebSocket console is table stakes (competitors have it)
2. Crash protection prevents operational nightmares
3. Disk monitoring prevents data loss
4. 1 week is negligible for long-term platform success
5. Professional launch > rushed launch

**Timeline**:

- **Dec 7-10**: Implement critical features (WebSocket, crash, disk)
- **Dec 11-13**: Testing + bug fixes
- **Dec 14**: Soft beta launch (10-20 users)
- **Dec 21**: Public launch after beta feedback

---

**This document serves as the single source of truth for project continuity. Update after each major milestone or architectural change.**

🚀 **Next action: Decide launch timeline, then implement critical features or launch beta.**
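
As a starting point for two of the critical features named above (crash loop protection and disk space monitoring), here is a minimal sketch of the control flow. It is written in Python for brevity, whereas the real agent is Go, and it assumes several things not stated in this document: the function names, the `LOW_DISK` / `CRASH_LOOP` / `STOPPED` result strings, and the rule that exit code 0 means an intentional stop are all hypothetical. The 5s/10s/15s delays, the 3-crash limit, and the 1GB threshold are taken from the Critical Outstanding Items section.

```python
import shutil
import time
from typing import Callable, Sequence

MIN_FREE_BYTES = 1024 ** 3  # 1 GB threshold named in the disk monitoring item


def enough_disk(path: str, min_free: int = MIN_FREE_BYTES) -> bool:
    """True when the filesystem holding `path` has at least `min_free` bytes free."""
    return shutil.disk_usage(path).free >= min_free


def supervise(
    start_server: Callable[[], int],
    disk_ok: Callable[[], bool],
    delays: Sequence[float] = (5, 10, 15),
    max_crashes: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run the server until it stops cleanly, disk runs low, or it crash-loops.

    `start_server` blocks until the process exits and returns its exit code.
    Exit code 0 is assumed (not confirmed) to mean an intentional stop.
    """
    crashes = 0
    while True:
        if not disk_ok():
            return "LOW_DISK"      # refuse to start rather than risk world corruption
        if start_server() == 0:
            return "STOPPED"
        crashes += 1
        if crashes >= max_crashes:
            return "CRASH_LOOP"    # give up; needs operator attention
        # Wait longer after each crash; reuse the last delay if the list runs out.
        sleep(delays[min(crashes - 1, len(delays) - 1)])
```

With the defaults, the third crash ends supervision, so only the 5s and 10s delays are ever used. If the intent is three delayed restarts before giving up, `max_crashes` would need to be 4. Injecting `sleep` and `disk_ok` keeps the policy testable without real waits or a real filesystem, for example `supervise(run, lambda: enough_disk("/opt/zlh"))` where `run` is whatever launches the server process.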