Upload files to "/"

This commit is contained in:
jester 2025-12-13 16:40:48 +00:00
parent 882c0c7169
commit 5a5ad001af
5 changed files with 2595 additions and 0 deletions


@ -0,0 +1,846 @@
# 🛡️ ZeroLagHub Cross-Project Tracker & Drift Prevention System
**Last Updated**: December 7, 2025
**Version**: 1.0 (Canonical Architecture Enforcement)
**Status**: ACTIVE - Must Be Consulted Before All Code Changes
---
## 🎯 Document Purpose
This document establishes **architectural boundaries** across the three major ZeroLagHub systems to prevent drift, confusion, and broken contracts when switching between contexts or AI assistants.
**Critical Insight**: Most project failures come from **gradual architectural erosion**, not sudden breaking changes. This document prevents that.
---
## 📊 The Three Systems (Canonical Ownership)
```
┌─────────────────────────────────────────────────────────────┐
│ NODE.JS API v2 │
│ (Orchestration Engine) │
│ │
│ Owns: VMID allocation, port management, DNS publishing, │
│ Velocity registration, job queue, database state │
│ │
│ Speaks To: Proxmox, Cloudflare, Technitium, Velocity, │
│ MariaDB, BullMQ, Go Agent (HTTP) │
│ │
│ Never Touches: Container filesystem, game server files, │
│ Java installation, artifact downloads │
└─────────────────────┬───────────────────────────────────────┘
│ HTTP Contract
│ POST /config, /start, /stop
│ GET /status, /health
┌─────────────────────────────────────────────────────────────┐
│ GO AGENT │
│ (Container-Internal Manager) │
│ │
│ Owns: Server installation, Java runtime, artifact │
│ downloads, process management, READY detection, │
│ filesystem layout, verification + self-repair │
│ │
│ Speaks To: Local filesystem, game server process, │
│ API (status updates via HTTP polling) │
│ │
│ Never Touches: Proxmox, DNS, Cloudflare, Velocity, │
│ port allocation, VMID selection │
└─────────────────────┬───────────────────────────────────────┘
│ Status Polling
│ Agent reports state
┌─────────────────────────────────────────────────────────────┐
│ NEXT.JS FRONTEND │
│ (Customer + Admin UI) │
│ │
│ Owns: User interaction, form validation, display logic, │
│ client-side state, UI components │
│ │
│ Speaks To: API v2 only (REST + WebSocket when added) │
│ │
│ Never Touches: Proxmox, Go Agent, DNS, Velocity, │
│ Cloudflare, direct container access │
└─────────────────────────────────────────────────────────────┘
```
---
## 🗺️ System Ownership Matrix (CANONICAL)
| Area | Node.js API v2 | Go Agent | Frontend |
|------|----------------|----------|----------|
| **Provisioning Orchestration** | ✅ OWNER (allocates VMID, ports, builds LXC config) | ❌ Executes inside container | ❌ Triggers only |
| **Template Selection** | ✅ OWNER (selects template, passes config) | ❌ Template contains agent | ❌ Displays options |
| **Server Installation** | ❌ Never | ✅ OWNER (Java, artifacts, validation) | ❌ Displays results |
| **Runtime Control** | ✅ OWNER (sends commands) | ✅ OWNER (executes commands) | ❌ UI only |
| **DNS (Cloudflare + Technitium)** | ✅ OWNER (creates, deletes, tracks IDs) | ❌ Never | ❌ Displays info |
| **Velocity Registration** | ✅ OWNER (registers, deregisters) | ❌ Never | ❌ Displays status |
| **IP Logic** | ✅ OWNER (external + internal IPs) | ❌ Sees container IP only | ❌ Displays final |
| **Port Allocation** | ✅ OWNER (PortPool DB management) | ❌ Receives assignments | ❌ Displays ports |
| **Monitoring** | ✅ OWNER (collects metrics) | ✅ OWNER (exposes /health) | ❌ Displays data |
| **Error Handling** | ✅ OWNER (BullMQ jobs, retries) | ❌ Local output only | ❌ User notifications |
---
## 🔄 API ↔ Agent Contract (IMMUTABLE)
### **API → Agent Endpoints**
```
POST /config
├─ Payload: {
│ game: "minecraft",
│ variant: "paper",
│ version: "1.21.3",
│ ports: [25565, 25575],
│ memory: "4G",
│ motd: "Welcome to ZeroLagHub",
│ worldSettings: {...}
│ }
└─ Response: 200 OK
POST /start
└─ Triggers: ensureProvisioned() → StartServer()
POST /stop
└─ Triggers: Graceful shutdown via server stop command
POST /restart
└─ Triggers: stop → start sequence
GET /status
└─ Response: {
state: "RUNNING" | "INSTALLING" | "FAILED",
pid: 12345,
uptime: 3600,
lastError: null
}
GET /health
└─ Response: {
healthy: true,
java: "/usr/lib/jvm/java-21-openjdk-amd64",
variant: "paper",
ready: true
}
```
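A minimal sketch of how the API side might drive this contract, assuming Node 18+ (built-in `fetch`) and an agent base URL supplied by the caller; the endpoint paths and payload fields come from the contract above, while the example IP/port in the usage comment are placeholders.
```javascript
// Illustrative only: push a config payload to the agent, then start the server.
// The agent base URL (host:port) is an assumption supplied by the caller.
async function configureAndStart(agentBaseUrl, config) {
  const cfg = await fetch(`${agentBaseUrl}/config`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // config mirrors the contract: { game, variant, version, ports, memory, motd, worldSettings }
    body: JSON.stringify(config),
  });
  if (!cfg.ok) throw new Error(`POST /config failed: ${cfg.status}`);

  const start = await fetch(`${agentBaseUrl}/start`, { method: 'POST' });
  if (!start.ok) throw new Error(`POST /start failed: ${start.status}`);
}

// Example usage with the payload from the contract above (placeholder address):
// await configureAndStart('http://10.200.0.42:8080', {
//   game: 'minecraft', variant: 'paper', version: '1.21.3',
//   ports: [25565, 25575], memory: '4G', motd: 'Welcome to ZeroLagHub', worldSettings: {},
// });
```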
### **Agent → API Signals (via /status polling)**
```
State Machine:
INSTALLING
├─> DOWNLOADING_ARTIFACT
├─> INSTALLING_JAVA
├─> FINALIZING
├─> RUNNING
└─> FAILED | CRASHED
```
### **🚫 Forbidden Agent Behaviors**
Agent **NEVER**:
- ❌ Allocates or manages ports
- ❌ Talks to Proxmox API
- ❌ Creates DNS records
- ❌ Registers with Velocity
- ❌ Modifies LXC config
- ❌ Allocates VMIDs
- ❌ Manages templates
- ❌ Talks to Cloudflare or Technitium
**Violation Detection**: If agent code imports Proxmox client, DNS client, or port allocation logic → **DRIFT VIOLATION**
---
## 🖥️ Frontend ↔ API Contract (IMMUTABLE)
### **Frontend Allowed Endpoints**
```
Provisioning:
POST /api/containers/create
GET /api/containers/:vmid
DELETE /api/containers/:vmid
Control:
POST /api/containers/:vmid/start
POST /api/containers/:vmid/stop
POST /api/containers/:vmid/restart
Status:
GET /api/containers/:vmid/status
GET /api/containers/:vmid/logs
GET /api/containers/:vmid/stats
Discovery:
GET /api/templates
GET /api/containers (list user's containers)
Read-Only Info:
GET /api/dns/:hostname (display only)
GET /api/velocity/status (display only)
```
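A minimal sketch of a frontend API client restricted to the endpoints listed above, shown in plain JavaScript for brevity; the `/api` base path and wrapper shape are assumptions, not an existing module.
```javascript
// Illustrative thin client: every call goes through API v2, never to the agent or Proxmox.
const API_BASE = '/api'; // assumption: frontend and API share an origin or a proxy rewrite

async function apiFetch(path, options = {}) {
  const res = await fetch(`${API_BASE}${path}`, {
    headers: { 'Content-Type': 'application/json' },
    ...options,
  });
  if (!res.ok) throw new Error(`${options.method || 'GET'} ${path} failed: ${res.status}`);
  if (res.status === 204) return null; // e.g. DELETE with no body
  return res.json();
}

export const containers = {
  create: (spec) => apiFetch('/containers/create', { method: 'POST', body: JSON.stringify(spec) }),
  get: (vmid) => apiFetch(`/containers/${vmid}`),
  remove: (vmid) => apiFetch(`/containers/${vmid}`, { method: 'DELETE' }),
  start: (vmid) => apiFetch(`/containers/${vmid}/start`, { method: 'POST' }),
  stop: (vmid) => apiFetch(`/containers/${vmid}/stop`, { method: 'POST' }),
  restart: (vmid) => apiFetch(`/containers/${vmid}/restart`, { method: 'POST' }),
  status: (vmid) => apiFetch(`/containers/${vmid}/status`),
  logs: (vmid) => apiFetch(`/containers/${vmid}/logs`),
  stats: (vmid) => apiFetch(`/containers/${vmid}/stats`),
};
```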
### **🚫 Forbidden Frontend Behaviors**
Frontend **NEVER**:
- ❌ Talks directly to Go Agent
- ❌ Calls Proxmox API
- ❌ Creates DNS records
- ❌ Registers with Velocity
- ❌ Allocates ports
- ❌ Executes container commands
- ❌ Accesses MariaDB directly
**Violation Detection**: If frontend code imports Proxmox client, agent HTTP client (except via API), or database client → **DRIFT VIOLATION**
---
## 🚨 Drift Detection Rules (ACTIVE ENFORCEMENT)
### **Rule #1: Provisioning Ownership**
**Violation**: Agent asked to allocate ports, choose templates, create DNS, call Proxmox, manage VMIDs
**Correct Path**: API owns ALL provisioning orchestration
**Example Violation**:
```go
// WRONG - Agent should NEVER do this
func (a *Agent) allocatePorts() ([]int, error) {
// ... port allocation logic
}
```
**Correct Pattern**:
```javascript
// RIGHT - API allocates, agent receives
async function provisionInstance(vmid, game, variant) {
const ports = await portAllocator.allocate(vmid);
await agent.postConfig({ ports, game, variant });
}
```
---
### **Rule #2: Artifact Ownership**
**Violation**: API asked to install Java, download server.jar, run installers
**Correct Path**: Agent owns ALL in-container installation
**Example Violation**:
```javascript
// WRONG - API should NEVER do this
async function installMinecraft(vmid, version) {
await proxmox.exec(vmid, `wget https://...`);
await proxmox.exec(vmid, `java -jar installer.jar`);
}
```
**Correct Pattern**:
```go
// RIGHT - Agent handles installation
func (p *Provisioner) ProvisionAll(cfg Config) error {
    if err := downloadArtifact(cfg.Variant, cfg.Version); err != nil {
        return err
    }
    if err := installJavaRuntime(cfg.Version); err != nil {
        return err
    }
    return verifyInstallation()
}
```
---
### **Rule #3: Direct Container Access**
**Violation**: Frontend or API wants to exec commands directly into container
**Correct Path**: Agent owns container execution layer
**Example Violation**:
```typescript
// WRONG - Frontend should NEVER do this
async function restartServer(vmid: number) {
const ssh = new SSHClient();
await ssh.connect(containerIP);
await ssh.exec('systemctl restart minecraft');
}
```
**Correct Pattern**:
```typescript
// RIGHT - Frontend talks to API, API talks to agent
async function restartServer(vmid: number) {
await api.post(`/containers/${vmid}/restart`);
}
```
---
### **Rule #4: Networking Responsibilities**
**Violation**: Agent asked to select public vs internal IPs, decide DNS zones
**Correct Path**: API owns dual-IP logic (Cloudflare external, Technitium internal)
**Example Violation**:
```go
// WRONG - Agent should NEVER decide this
func (a *Agent) determinePublicIP() string {
if a.needsCloudflare() {
return "139.64.165.248"
}
return a.containerIP
}
```
**Correct Pattern**:
```javascript
// RIGHT - API decides network topology
function determineIPs(vmid, game) {
const internalIP = `10.200.0.${vmid - 1000}`;
const externalIP = "139.64.165.248"; // Cloudflare target
const velocityIP = "10.70.0.241"; // Internal routing
return { internalIP, externalIP, velocityIP };
}
```
---
### **Rule #5: Proxy Responsibilities**
**Violation**: Agent asked to register with Velocity, configure proxy routing
**Correct Path**: API owns ALL proxy integrations
**Example Violation**:
```go
// WRONG - Agent should NEVER do this
func (a *Agent) registerWithVelocity() error {
client := velocity.NewClient()
return client.Register(a.hostname, a.port)
}
```
**Correct Pattern**:
```javascript
// RIGHT - API handles Velocity registration
async function registerVelocity(vmid, hostname, internalIP) {
await velocityBridge.registerBackend({
name: hostname,
address: internalIP,
port: 25565
});
}
```
---
## 🔄 Context Switching Safety Workflow
### **When Moving to Node.js API Work:**
**Pre-Switch Checklist**:
- [ ] Agent contract unchanged? (POST /config, /start, /stop, GET /status)
- [ ] Database schema unchanged? (Prisma models consistent)
- [ ] LXC template IDs unchanged? (VMID 800 for game, 6000-series for dev)
- [ ] DNS/IP logic consistent? (Cloudflare external, Technitium internal)
- [ ] Port allocation logic preserved? (PortPool DB-backed)
**Common Drift Patterns**:
- ⚠️ Adding agent installation logic to API
- ⚠️ Changing agent contract without updating both sides
- ⚠️ Moving DNS logic to different service
- ⚠️ Bypassing job queue for provisioning
---
### **When Moving to Go Agent Work:**
**Pre-Switch Checklist**:
- [ ] No provisioning logic outside allowed scope (no port allocation, DNS, etc.)
- [ ] File paths remain canonical (`/opt/zlh/<game>/<variant>/world`)
- [ ] Naming conventions maintained (`server.jar`, `fabric-server.jar`, etc.)
- [ ] No external API calls (Proxmox, DNS, Velocity)
- [ ] Status states unchanged (INSTALLING, RUNNING, FAILED, etc.)
**Common Drift Patterns**:
- ⚠️ Adding port allocation to agent
- ⚠️ Making agent talk to external services
- ⚠️ Changing directory structure without API coordination
- ⚠️ Adding orchestration logic to agent
---
### **When Moving to Frontend Work:**
**Pre-Switch Checklist**:
- [ ] Only API-approved fields used in UI
- [ ] No direct agent HTTP calls
- [ ] VMID not exposed in user-facing UI
- [ ] Internal IPs not displayed to users
- [ ] All state from API, not computed locally
**Common Drift Patterns**:
- ⚠️ Adding direct agent calls from frontend
- ⚠️ Computing server state client-side
- ⚠️ Exposing internal infrastructure details
- ⚠️ Bypassing API for container control
---
## 🚨 High-Risk Integration Zones (GUARDED)
These areas have historically caused drift across sessions:
### **1. Forge / NeoForge Installation Logic**
- **Risk**: Agent vs API confusion on who handles `run.sh` patching
- **Guard**: Agent owns ALL Forge installation, API just passes config
- **Test**: Can provision Forge 1.21.3 without API filesystem access?
### **2. Cloudflare SRV Deletion**
- **Risk**: Case sensitivity, subdomain normalization, record ID tracking
- **Guard**: API stores Cloudflare record IDs in EdgeState, deletes by ID (see the sketch below)
- **Test**: Create → Delete → Recreate same hostname without orphans?
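A minimal sketch of this guard, assuming direct calls to the public Cloudflare v4 REST API and a hypothetical `edgeState` store for record IDs; the real `edgePublisher.js` / `dePublisher.js` implementations may differ.
```javascript
// Illustrative only: create a record, persist its Cloudflare ID, delete by ID later.
const CF_API = 'https://api.cloudflare.com/client/v4';

async function publishSrvRecord({ zoneId, token, record, edgeState }) {
  const res = await fetch(`${CF_API}/zones/${zoneId}/dns_records`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(record),
  });
  const body = await res.json();
  if (!body.success) throw new Error('Cloudflare record creation failed');
  // Persist the record ID immediately: deletion must never rely on hostname inference.
  await edgeState.saveRecordId(record.name, body.result.id); // edgeState is a hypothetical store
  return body.result.id;
}

async function unpublishSrvRecord({ zoneId, token, hostname, edgeState }) {
  const recordId = await edgeState.getRecordId(hostname);
  await fetch(`${CF_API}/zones/${zoneId}/dns_records/${recordId}`, {
    method: 'DELETE',
    headers: { Authorization: `Bearer ${token}` },
  });
  await edgeState.clearRecordId(hostname);
}
```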
### **3. Technitium DNS Zone Mismatch**
- **Risk**: Wrong zone selection, duplicate records
- **Guard**: API hardcodes zone as `zpack.zerolaghub.com`, validates before creation
- **Test**: No records created in wrong zones?
### **4. Velocity Registration Order**
- **Risk**: Registering before server ready, deregistering incorrectly
- **Guard**: API waits for agent RUNNING state, then registers Velocity (see the sketch below)
- **Test**: Player connection works immediately after provisioning complete?
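A minimal sketch of this ordering guard, reusing the `velocityBridge.registerBackend` shape from Rule #5; the polling interval, timeout, and agent URL handling are assumptions.
```javascript
// Illustrative only: register with Velocity only after the agent reports RUNNING.
async function waitForRunning(agentBaseUrl, { intervalMs = 5000, timeoutMs = 180000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const res = await fetch(`${agentBaseUrl}/status`);
    const { state, lastError } = await res.json();
    if (state === 'RUNNING') return;
    if (state === 'FAILED' || state === 'CRASHED') {
      throw new Error(`Provisioning failed: ${lastError}`);
    }
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error('Timed out waiting for agent to report RUNNING');
}

async function registerAfterReady({ agentBaseUrl, hostname, internalIP, velocityBridge }) {
  await waitForRunning(agentBaseUrl);
  await velocityBridge.registerBackend({ name: hostname, address: internalIP, port: 25565 });
}
```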
### **5. PortPool Commit Logic**
- **Risk**: Race conditions, double-allocation, uncommitted ports
- **Guard**: API allocates → provisions → commits (rollback on failure; see the sketch below)
- **Test**: Concurrent provisions don't collide on ports?
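A minimal sketch of the allocate → provision → commit flow with rollback, assuming a `portAllocator` with `allocate`, `commit`, and `release` methods (only `allocate` appears earlier in this document; the other two are illustrative).
```javascript
// Illustrative only: never leave ports half-allocated if provisioning fails partway.
async function provisionWithPorts(vmid, provisionFn, portAllocator) {
  const ports = await portAllocator.allocate(vmid); // reserve, but do not commit yet
  try {
    await provisionFn(ports);                // clone LXC, push config, wait for RUNNING
    await portAllocator.commit(vmid, ports); // mark ports as permanently assigned
    return ports;
  } catch (err) {
    await portAllocator.release(vmid, ports); // rollback: return ports to the pool
    throw err;
  }
}
```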
### **6. Agent READY Detection**
- **Risk**: False negatives, false positives, variant-specific patterns
- **Guard**: Agent uses variant-aware log parsing, multiple confirmation lines
- **Test**: All 6 variants correctly detect READY state?
### **7. Server Start-Up False Negatives**
- **Risk**: Timeout too short, log parsing too strict
- **Guard**: Agent increases timeout for Forge (90s), multiple log patterns
- **Test**: Forge installer completes without false failure?
### **8. IP Selection Logic**
- **Risk**: Confusing external (Cloudflare) vs internal (Velocity/Technitium) IPs
- **Guard**: API clearly separates: externalIP (139.64.165.248), internalIP (10.200.0.X), velocityIP (10.70.0.241)
- **Test**: DNS points to correct IPs, Velocity routes to correct internal IP?
---
## 📋 Architecture Decision Log (LOCKED) ⭐ NEW
**Purpose**: Records finalized architectural decisions that **must not be re-litigated** unless explicitly requested.
**Status**: LOCKED - These decisions are final and cannot be changed without the user explicitly saying "Revisit decision X"
---
### **DEC-001: Templates vs Go Agent (FINAL)**
**Decision**: Hybrid model
- LXC templates define **base environment only**
- Go Agent is **authoritative execution layer** inside containers
**Rationale**:
- Templates alone cannot handle multi-variant logic (Forge, NeoForge, Fabric)
- Agent enables self-repair, async provisioning, runtime control
- Hybrid provides speed + flexibility without API container access
**Applies To**: API v2, Go Agent, Frontend
**Status**: ✅ LOCKED
---
### **DEC-002: Provisioning Authority**
**Decision**: API orchestrates, Agent executes
**Rationale**:
- API has global visibility (DB, DNS, Proxmox, Velocity)
- Agent is intentionally sandboxed to container filesystem + process
**Applies To**: All systems
**Status**: ✅ LOCKED
---
### **DEC-003: DNS & Edge Publishing Ownership**
**Decision**: API-only responsibility
**Rationale**:
- Requires external credentials (Cloudflare, Technitium)
- Must correlate DB state, record IDs, reconciliation jobs
**Applies To**: API v2
**Status**: ✅ LOCKED
---
### **DEC-004: Proxy Stack**
**Decision**: Traefik + Velocity only
**Rationale**:
- Traefik for HTTP/control-plane
- Velocity for Minecraft TCP routing
- **HAProxy explicitly deprecated** for ZeroLagHub
**Applies To**: Infrastructure, API
**Status**: ✅ LOCKED
---
### **DEC-005: State Persistence**
**Decision**: MariaDB is single source of truth
**Rationale**:
- Flat files caused race conditions and drift
- DB enables reconciliation, recovery, observability
**Applies To**: API v2
**Status**: ✅ LOCKED
---
### **DEC-006: Frontend Access Model**
**Decision**: Frontend communicates with API only
**Rationale**:
- Security boundary
- Prevents leaking infrastructure details
**Applies To**: Frontend
**Status**: ✅ LOCKED
---
### **DEC-007: Architecture Enforcement Policy**
**Decision**: Drift prevention is mandatory
**Rationale**:
- Prevents oscillation between alternatives
- Preserves velocity during late-stage development
**Applies To**: All work sessions
**Status**: ✅ LOCKED
---
## 📋 Canonical Architecture Anchors (ABSOLUTE)
These rules are **immutable** unless explicitly changed with full system review:
### **Anchor #1: Orchestration**
✅ API v2 orchestrates EVERYTHING (jobs, provisioning, DNS, proxy, lifecycle)
### **Anchor #2: Container-Internal**
✅ Agent performs EVERYTHING inside container (install, start, stop, detect)
### **Anchor #3: Templates**
✅ Templates contain agent + base environment (VMID 800 for game, 6000-series for dev)
### **Anchor #4: Job Queue**
✅ BullMQ/Redis drive job system (async provisioning, retries, reconciliation)
### **Anchor #5: Database**
✅ MariaDB holds all state (PortPool, ContainerInstance, EdgeState, etc.)
### **Anchor #6: Infrastructure**
✅ Proxmox API (not Ansible) for LXC management
### **Anchor #7: Routing**
✅ Traefik (HTTP) + Velocity (Minecraft) are ONLY routing/proxy systems
### **Anchor #8: DNS**
✅ Cloudflare = authoritative public DNS
✅ Technitium = authoritative internal DNS
### **Anchor #9: Frontend Isolation**
✅ Frontend speaks ONLY to API (no direct agent, Proxmox, DNS, Velocity)
### **Anchor #10: Directory Structure**
`/opt/zlh/<game>/<variant>/world` is canonical game server path
---
## 🛡️ Enforcement Policies (ACTIVE)
When future instructions conflict with Canonical Architecture or Architecture Decision Log:
### **Step 1: STOP**
Immediately halt the task. Do not proceed with drift-inducing change.
### **Step 2: RAISE DRIFT WARNING**
```
⚠️ DRIFT WARNING ⚠️
Proposed change violates Canonical Architecture:
Rule Violated: [Rule #X: Description]
OR
Decision Violated: [DEC-XXX: Decision Name]
Violation: [Specific behavior that violates rule/decision]
Correct Path: [Architecture-aligned approach]
Impact: [What breaks if this drift is allowed]
Options:
1. Implement correct architecture-aligned path
2. Amend Canonical Architecture (requires full system review)
3. Request user to "Revisit decision X" (for ADL changes)
4. Cancel proposed change
```
### **Step 3: PROVIDE CORRECT PATH**
Show the architecture-aligned implementation that achieves the same goal.
### **Step 4: ASK FOR DIRECTION**
```
Should we:
A) Implement the correct architecture-aligned path?
B) Perform full system review to amend architecture?
C) Request user to explicitly revisit locked decision?
D) Cancel this change?
```
### **Step 5: ARCHITECTURE DECISION LOG SPECIAL RULE**
If violation is against a **LOCKED** Architecture Decision (DEC-001 through DEC-007):
**ADDITIONAL CHECK**:
```
⚠️ LOCKED DECISION WARNING ⚠️
This change conflicts with Architecture Decision Log entry: [DEC-XXX]
Status: LOCKED
This decision can ONLY be changed if the user explicitly says:
"Revisit decision [DEC-XXX]"
Without explicit user request to revisit, this decision is FINAL.
Proceeding with this change would violate architectural governance.
```
**Never proceed** with drift-inducing changes without explicit confirmation.
**Never re-litigate** locked decisions without user explicitly requesting revision.
---
## 📊 Drift Detection Examples
### **Example 1: Agent Port Allocation (VIOLATION)**
**Proposed Change**:
```go
// Agent code proposal
func (a *Agent) allocatePort() int {
// Find available port...
return port
}
```
**Drift Warning**:
```
⚠️ DRIFT WARNING ⚠️
Rule Violated: Rule #1 - Provisioning Ownership
Violation: Agent attempting to allocate ports
Correct Path: API allocates ports, agent receives them via /config
Impact: Port collisions, database inconsistency, broken PortPool
Recommendation: Remove port allocation from agent, ensure API sends ports in config payload
```
---
### **Example 2: Frontend Direct Agent Call (VIOLATION)**
**Proposed Change**:
```typescript
// Frontend code proposal
async function getServerLogs(vmid: number) {
const agentURL = `http://10.200.0.${vmid}:8080/logs`;
return await fetch(agentURL);
}
```
**Drift Warning**:
```
⚠️ DRIFT WARNING ⚠️
Rule Violated: Rule #3 - Direct Container Access
Violation: Frontend bypassing API to talk to agent
Correct Path: Frontend → API → Agent (API proxies logs)
Impact: Broken frontend if container IP changes, no auth/rate limiting, security risk
Recommendation: Add GET /api/containers/:vmid/logs endpoint that proxies to agent
```
---
### **Example 3: API Installing Java (VIOLATION)**
**Proposed Change**:
```javascript
// API code proposal
async function provisionServer(vmid, game, variant) {
await proxmox.exec(vmid, 'apt-get install openjdk-21-jdk');
await proxmox.exec(vmid, 'wget https://papermc.io/...');
}
```
**Drift Warning**:
```
⚠️ DRIFT WARNING ⚠️
Rule Violated: Rule #2 - Artifact Ownership
Violation: API performing in-container installation
Correct Path: API sends config to agent, agent handles installation
Impact: Breaks agent self-repair, variant-specific logic duplicated, no verification system
Recommendation: Remove installation logic from API, ensure agent receives proper config via POST /config
```
---
## 📁 Integration with Existing Documentation
### **Relationship to Master Bootstrap**
- Master Bootstrap: Strategic overview and business model
- **This Document**: Technical governance and boundary enforcement
- **Usage**: Consult this before implementing ANY code changes
### **Relationship to Complete Current State**
- Complete Current State: What's working, what's next
- **This Document**: How things MUST work (regardless of current state)
- **Usage**: This is the "law", current state is "status"
### **Relationship to Engineering Handover**
- Engineering Handover: Daily tactical tasks and sprint plan
- **This Document**: Constraints within which tasks must be implemented
- **Usage**: Check this before starting each handover task
---
## 🔄 Architecture Amendment Process
If there is a legitimate need to change the Canonical Architecture:
### **Step 1: Identify Change**
Document exactly what architectural boundary needs to change and why.
### **Step 2: Full System Impact Analysis**
- What breaks in API?
- What breaks in Agent?
- What breaks in Frontend?
- What changes to contracts?
- What database migrations needed?
### **Step 3: Update ALL Affected Documents**
- This document (Canonical Architecture)
- Master Bootstrap (if strategic impact)
- Complete Current State (implementation changes)
- Engineering Handover (sprint tasks)
- Agent Spec, Operational Guide (if affected)
### **Step 4: Update ALL Systems**
- API code + tests
- Agent code + tests
- Frontend code + tests
- Database schema (migration)
- Infrastructure config
### **Step 5: Validation**
- Integration tests pass?
- No new drift introduced?
- Documentation consistent?
- All AIs briefed on change?
**Only after ALL steps** is architecture amendment complete.
---
## 🎯 Quick Reference Card
### **API Owns**
- ✅ Provisioning orchestration
- ✅ Port allocation (PortPool)
- ✅ DNS (Cloudflare + Technitium)
- ✅ Velocity registration
- ✅ IP logic (external + internal)
- ✅ Job queue (BullMQ)
- ✅ Database state
### **Agent Owns**
- ✅ Container-internal installation
- ✅ Java runtime
- ✅ Artifact downloads
- ✅ Server process management
- ✅ READY detection
- ✅ Self-repair + verification
- ✅ Filesystem layout
### **Frontend Owns**
- ✅ User interaction
- ✅ Display logic
- ✅ Client state
- ✅ Form validation
### **Never**
- ❌ Agent allocates ports
- ❌ Agent talks to DNS/Velocity/Proxmox
- ❌ API installs server files
- ❌ API executes in-container commands
- ❌ Frontend talks to agent directly
- ❌ Frontend talks to infrastructure
---
## 📋 Session Start Checklist
Before every coding session with any AI:
- [ ] Read this document's Quick Reference Card
- [ ] Identify which system you're working on (API, Agent, Frontend)
- [ ] Review that system's "Owns" list
- [ ] Check High-Risk Integration Zones if touching those areas
- [ ] Verify no drift from previous session
- [ ] Confirm contracts unchanged since last session
**If ANY doubt**: Re-read full Canonical Architecture Anchors section.
---
## ✅ Document Status
**Status**: ACTIVE - Must be consulted before all code changes
**Enforcement**: MANDATORY - Drift violations must be caught
**Authority**: CANONICAL - Overrides conflicting guidance
**Updates**: Only via Architecture Amendment Process
---
🛡️ **This document prevents architectural drift. Violate at your own risk.**


@ -0,0 +1,105 @@
# 🛡️ ZeroLagHub Drift Prevention - Quick Start Card
**Use This**: At the start of EVERY coding session (API, Agent, or Frontend)
---
## ⚡ 30-Second Architecture Check
### **Working on API?**
✅ Can allocate: Ports, VMIDs, DNS, Velocity
❌ Cannot: Install Java, download artifacts, exec in container
### **Working on Agent?**
✅ Can install: Java, artifacts, server files
❌ Cannot: Allocate ports, create DNS, call Proxmox, register Velocity
### **Working on Frontend?**
✅ Can: Display data, call API endpoints
❌ Cannot: Talk to Agent, Proxmox, DNS, Velocity, allocate anything
---
## 🔒 7 Locked Architectural Decisions (NEW) ⭐
**These decisions are FINAL** - they cannot be changed without the user saying "Revisit decision X":
1. **DEC-001**: Templates + Agent hybrid (not templates-only or agent-only)
2. **DEC-002**: API orchestrates, Agent executes (not reversed)
3. **DEC-003**: API owns DNS (Agent never creates DNS)
4. **DEC-004**: Traefik + Velocity only (no HAProxy)
5. **DEC-005**: MariaDB is source of truth (no flat files)
6. **DEC-006**: Frontend → API only (no direct agent calls)
7. **DEC-007**: Drift prevention mandatory (always enforced)
**If proposing a change that conflicts with DEC-001 through DEC-007**:
→ STOP → Consult Cross-Project Tracker → Request "Revisit decision X" from user
---
## 🚨 Drift Detection Triggers
**STOP and consult full tracker if you hear:**
1. "Agent should allocate ports..."
2. "API should install Java inside container..."
3. "Frontend should call agent directly..."
4. "Agent should register with Velocity..."
5. "API should decide what Java version..."
6. "Frontend should manage DNS records..."
7. "Let's use templates only..." (violates DEC-001)
8. "Let's add HAProxy..." (violates DEC-004)
9. "Let's use flat files instead of DB..." (violates DEC-005)
**All of these are VIOLATIONS** → Consult [ZeroLagHub_Cross_Project_Tracker.md](computer:///mnt/user-data/outputs/ZeroLagHub_Cross_Project_Tracker.md)
---
## 📋 Pre-Coding Checklist
Before writing ANY code:
- [ ] Which system am I modifying? (API / Agent / Frontend)
- [ ] Does this change cross boundaries? (If yes → read full tracker)
- [ ] Am I adding external API calls to Agent? (If yes → VIOLATION)
- [ ] Am I adding container execution to API? (If yes → VIOLATION)
- [ ] Am I bypassing API in Frontend? (If yes → VIOLATION)
---
## 🎯 The Golden Rules
1. **API orchestrates** (allocates resources, publishes state)
2. **Agent executes** (installs, runs, monitors inside container)
3. **Frontend displays** (no direct infrastructure access)
**Anything else** → Drift → Consult full tracker
---
## 📞 Quick Reference
**Full Tracker**: [ZeroLagHub_Cross_Project_Tracker.md](computer:///mnt/user-data/outputs/ZeroLagHub_Cross_Project_Tracker.md)
**High-Risk Zones**:
- Forge/NeoForge installation
- Cloudflare SRV deletion
- Velocity registration order
- PortPool commit logic
- Agent READY detection
- IP selection logic
---
## ✅ Session Start Command
```
Read ZeroLagHub Cross-Project Tracker Quick Start Card.
Activate drift detection.
Confirm which system I'm working on: [API / Agent / Frontend]
Proceed with architecture-aligned implementation only.
```
---
🛡️ **Drift prevention ACTIVE. Proceed with confidence.**


@ -0,0 +1,573 @@
# 🚀 ZeroLagHub - GPT Implementation Handover (December 2025)
**Last Updated**: December 7, 2025
**Version**: 4.0 (Launch-Ready Implementation Guide)
**Status**: 85% Platform Complete - Active Development Sprint
---
## 🎯 Your Role (GPT - Implementation AI)
**You are**: The tactical implementation AI responsible for **building features**
**Claude is**: The strategic architecture AI responsible for **design decisions**
### **Your Responsibilities**
✅ Implement features within architectural boundaries
✅ Write code for API, Agent, and Frontend
✅ Fix bugs and optimize performance
✅ Execute sprint tasks from Kanban board
✅ Consult Cross-Project Tracker before crossing system boundaries
### **Your Constraints**
❌ Do NOT make architectural decisions without Claude
❌ Do NOT violate ownership boundaries (API vs Agent vs Frontend)
❌ Do NOT change contracts without updating both sides
❌ Do NOT skip drift prevention checks
---
## 📋 Critical Documents (READ THESE FIRST)
### **🛡️ MANDATORY Before ANY Code**
1. **[Drift Prevention Card](computer:///mnt/user-data/outputs/ZeroLagHub_Drift_Prevention_Card.md)** (30 seconds)
- Quick boundary check for every session
- Violation triggers to watch for
2. **[Cross-Project Tracker](computer:///mnt/user-data/outputs/ZeroLagHub_Cross_Project_Tracker.md)** (consult before changes)
- Ownership matrix (API vs Agent vs Frontend)
- Canonical contracts (API ↔ Agent, Frontend ↔ API)
- Drift detection rules with examples
### **📊 Implementation Context**
3. **[Complete Current State](computer:///mnt/user-data/outputs/ZeroLagHub_Complete_Current_State_Dec7.md)** (5 minutes)
- Engineering Kanban (DONE/IN PROGRESS/TODO)
- 3-day sprint plan with tasks
- Troubleshooting guides per variant
- Launch readiness matrix
4. **Engineering Handover** (from today's uploaded document)
- System lifecycle diagram
- Provisioning sequence
- Verification system specs
- Today's accomplishments
### **🔧 Technical Reference**
5. **[Agent Complete Spec](computer:///home/claude/ZeroLagHub_Agent_Complete_Spec.md)** (as needed)
- Go agent implementation details
- API endpoints and contracts
6. **[Infrastructure Specs](computer:///mnt/user-data/outputs/ZeroLagHub_Infrastructure_Specifications.md)** (as needed)
- GTHost hardware constraints
- Capacity planning
---
## 🎯 Current Platform Status (December 7, 2025)
### **What's Working** ✅ (85% Complete)
**Core Provisioning Pipeline**:
- ✅ All 6 Minecraft variants (Vanilla, Paper, Purpur, Fabric, Forge, NeoForge)
- ✅ VMID allocation (sequential)
- ✅ LXC container creation (template VMID 800)
- ✅ IP detection (10.200.0.X)
- ✅ Go agent deployment + self-repair
- ✅ Java runtime auto-selection (17/21)
- ✅ DNS automation (Cloudflare + Technitium)
- ✅ Velocity proxy registration
- ✅ Start/stop/restart control
- ✅ Console command injection
- ✅ Log tailing (HTTP polling)
- ✅ Crash detection
**Supported Minecraft Versions**: 1.12.2, 1.16.5, 1.18.2, 1.19.2, 1.20.1, 1.21.x
### **What's Missing** ❌ (15% To-Do)
**Critical for Launch** (7-9 hours):
- ❌ WebSocket console streaming (4-6 hours) - **HIGH PRIORITY**
- ❌ Crash loop protection with backoff (2 hours) - **HIGH PRIORITY**
- ❌ Disk space monitoring (1 hour) - **HIGH PRIORITY**
**Dev Platform** (1 day):
- 🔧 Dev container provisioning (Day 1 of sprint)
- 🔧 EdgeState schema migration (Day 2 of sprint)
- 🔧 Reconciliation job (Day 3 of sprint)
**Nice-to-Have** (future):
- 📋 File upload/download
- 📋 Backup/restore UI
- 📋 Resource monitoring dashboard
---
## 🗺️ System Architecture (Know Your Boundaries)
### **Three-System Ownership**
```
┌─────────────────────────────────────────┐
│ NODE.JS API (Orchestrator) │
│ You Own: Routes, services, job queue │
│ Speaks To: Proxmox, DNS, Velocity, DB │
│ Never: Installs Java, downloads files │
└────────────┬────────────────────────────┘
│ HTTP Contract
│ POST /config, /start, /stop
│ GET /status, /health
┌─────────────────────────────────────────┐
│ GO AGENT (Container Manager) │
│ You Own: Installation, verification │
│ Speaks To: Filesystem, game process │
│ Never: Allocates ports, creates DNS │
└────────────┬────────────────────────────┘
│ Status Polling
┌─────────────────────────────────────────┐
│ NEXT.JS FRONTEND (UI Only) │
│ You Own: Components, client state │
│ Speaks To: API only │
│ Never: Agent, Proxmox, DNS, Velocity │
└─────────────────────────────────────────┘
```
### **Critical Boundaries (NEVER CROSS)**
**API Must NOT**:
- ❌ Install Java inside containers
- ❌ Download game server files
- ❌ Execute commands directly in containers (use Agent)
**Agent Must NOT**:
- ❌ Allocate ports
- ❌ Create DNS records
- ❌ Register with Velocity
- ❌ Talk to Proxmox API
- ❌ Manage VMIDs
**Frontend Must NOT**:
- ❌ Talk directly to Agent
- ❌ Call Proxmox API
- ❌ Create DNS records
- ❌ Bypass API for any infrastructure
**Violation = STOP → Consult Cross-Project Tracker**
---
## 📅 3-Day Sprint Plan (Your Tasks)
### **Day 1: Dev Containers** (December 8)
**Goal**: Enable developer environment provisioning
**Tasks**:
1. [ ] Define dev container spec (Python, Node, Go, Java)
2. [ ] Create template VMID 6000 (base dev environment)
3. [ ] API: Add `/api/dev-instances` endpoints (create, delete, status; sketched below)
4. [ ] Agent: Add dev provisioning flow (no game server start)
5. [ ] Test: Provision Python + Node dev environments
**Success Criteria**:
- Can provision dev environment with chosen language
- Dev container accessible via SSH or web console
- No game server logic triggered
**Files to Modify**:
- `src/routes/devInstances.js` (new)
- `src/services/devProvisioner.js` (new)
- Agent: `dev.go` (new provisioning flow)
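A rough sketch of the Day 1 route surface, assuming an Express-style router and a hypothetical `devProvisioner` service matching the file names above; the real handlers will differ.
```javascript
// Illustrative only: dev-instance routes mirroring the game-server endpoints.
const express = require('express');
const devProvisioner = require('../services/devProvisioner'); // hypothetical service from the file list

const router = express.Router();
const wrap = (fn) => (req, res, next) => fn(req, res).catch(next); // forward async errors

// Create a dev environment for a chosen toolchain (python, node, go, java).
router.post('/api/dev-instances/create', wrap(async (req, res) => {
  const { language, memory } = req.body;
  const instance = await devProvisioner.create({ language, memory });
  res.status(201).json(instance);
}));

// Provisioning status for a dev instance (no game-server state machine involved).
router.get('/api/dev-instances/:vmid/status', wrap(async (req, res) => {
  res.json(await devProvisioner.status(Number(req.params.vmid)));
}));

// Tear down a dev instance.
router.delete('/api/dev-instances/:vmid', wrap(async (req, res) => {
  await devProvisioner.destroy(Number(req.params.vmid));
  res.status(204).end();
}));

module.exports = router;
```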
---
### **Day 2: EdgeState + DNS Reliability** (December 9)
**Goal**: Fix Cloudflare SRV deletion problem
**Tasks**:
1. [ ] Implement EdgeState model in Prisma schema
2. [ ] Update `edgePublisher.js` to store Cloudflare record IDs
3. [ ] Update `dePublisher.js` to delete by record ID (not hostname)
4. [ ] Test: Create → Delete → Recreate same hostname
**Success Criteria**:
- No orphaned DNS records after deletion
- EdgeState tracks all Cloudflare record IDs
- Re-provisioning same hostname works
**Files to Modify**:
- `prisma/schema.prisma` (add EdgeState model)
- `src/services/edgePublisher.js`
- `src/services/dePublisher.js`
- `src/clients/cloudflareClient.js` (return record IDs)
---
### **Day 3: Reconciliation + Hardening** (December 10)
**Goal**: Self-healing infrastructure
**Tasks**:
1. [ ] Create reconciliation job (DB ↔ Proxmox ↔ DNS ↔ Velocity; sketched below)
2. [ ] Detect orphaned containers (in Proxmox but not DB)
3. [ ] Detect orphaned DNS records (in DNS but not DB)
4. [ ] Auto-cleanup with confirmation prompt
5. [ ] Regression test suite
**Success Criteria**:
- Reconciliation job detects all orphans
- Can auto-clean with user confirmation
- System recovers from partial failures
**Files to Create**:
- `src/jobs/reconciliationJob.js`
- `src/services/reconciler.js` (rewrite)
- `tests/reconciliation.test.js`
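A rough sketch of a single reconciliation pass, assuming hypothetical `db`, `proxmox`, and `dns` clients that can each list what they know about; the real `reconciler.js` may be structured differently.
```javascript
// Illustrative only: diff the three views of the world and report orphans.
async function reconcile({ db, proxmox, dns }) {
  const [dbInstances, proxmoxCts, dnsRecords] = await Promise.all([
    db.listContainerInstances(), // rows in MariaDB (source of truth)
    proxmox.listContainers(),    // LXC containers actually present
    dns.listManagedRecords(),    // A/SRV records the platform created
  ]);

  const knownVmids = new Set(dbInstances.map((i) => i.vmid));
  const knownHostnames = new Set(dbInstances.map((i) => i.hostname));

  return {
    // In Proxmox but unknown to the database: candidates for cleanup.
    orphanedContainers: proxmoxCts.filter((ct) => !knownVmids.has(ct.vmid)),
    // In DNS but unknown to the database: stale records to delete.
    orphanedDnsRecords: dnsRecords.filter((r) => !knownHostnames.has(r.hostname)),
    // In the database but missing from Proxmox: state to repair or mark FAILED.
    missingContainers: dbInstances.filter((i) => !proxmoxCts.some((ct) => ct.vmid === i.vmid)),
  };
}
```
Auto-cleanup (task 4) would consume this report and ask for confirmation before deleting anything.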
---
## 🐛 Known Bugs (Fix These)
### **Go Agent Bugs** (Non-Blocking, But Should Fix)
1. **Forge server.jar Glob Logic** (`artifacts.go` lines 112-116, 147-151)
```go
// REMOVE THIS - Forge doesn't create server.jar
serverJarPath := filepath.Join(installDir, "*server.jar")
```
**Fix**: Remove glob/rename logic entirely
2. **ensureProvisioned() Fallthrough** (`agent.go` lines 155-171)
```go
// ADD ELSE HERE
if variant == "forge" || variant == "neoforge" {
// Forge logic
} else { // <-- ADD THIS
// Vanilla-like logic
}
```
3. **Forge Stop Command Exclusion** (`process.go` line 83)
```go
// REMOVE THIS EXCLUSION - Forge accepts stop commands
if p.variant != "forge" && p.variant != "neoforge" {
p.sendCommand("stop")
}
```
**Fix**: Remove the if condition, send stop to all variants
---
## 🚨 High-Risk Integration Zones (Careful!)
These areas have caused drift in past sessions:
1. **Forge/NeoForge Installation**
- Agent owns ALL installation logic
- API only passes config, never executes install commands
2. **Cloudflare SRV Deletion**
- Must use record IDs (not hostname inference)
- Store IDs in EdgeState on creation
3. **Velocity Registration Order**
- Wait for Agent to report RUNNING state
- Then register with Velocity (not before)
4. **Port Allocation**
- API allocates → provisions → commits
- Rollback on failure (don't leak ports)
5. **Agent READY Detection**
- Use variant-aware log parsing
- Forge takes 60-90s (don't timeout early)
---
## 📁 Key File Locations
### **API Service** (`/home/zlh/zlh-api-v2/`)
```
src/
├── routes/
│ ├── containers.js # Game server endpoints
│ └── devInstances.js # Dev environment endpoints (TO CREATE)
├── services/
│ ├── edgePublisher.js # DNS + Velocity publishing
│ ├── dePublisher.js # Edge cleanup (NEEDS REWRITE)
│ ├── portAllocator.js # Port management
│ └── reconciler.js # Orphan detection (NEEDS REWRITE)
├── clients/
│ ├── cloudflareClient.js # Cloudflare API (UPDATE for record IDs)
│ ├── technitiumClient.js # Technitium DNS
│ └── proxmoxClient.js # Proxmox API
└── jobs/
└── reconciliationJob.js # Self-healing job (TO CREATE)
```
### **Go Agent** (`/opt/zlh-agent/`)
```
├── agent.go # Main provisioning logic (FIX fallthrough)
├── artifacts.go # Download + verification (REMOVE Forge glob)
├── process.go # Server lifecycle (FIX Forge stop exclusion)
├── api.go # HTTP server for control
└── dev.go # Dev environment provisioning (TO CREATE)
```
### **Frontend** (`/home/zlh/zlh-portal/`)
```
src/
├── app/
│ ├── containers/ # Game server UI
│ └── dev/ # Dev environment UI (TO CREATE)
└── components/
└── Console.tsx # WebSocket console (TO CREATE)
```
---
## 🎮 Minecraft Variant Status
| Variant | Install | Verify | Start | READY Detection | Status |
|---------|---------|--------|-------|-----------------|--------|
| Vanilla | ✅ | ✅ | ✅ | ✅ | Production |
| Paper | ✅ | ✅ | ✅ | ✅ | Production |
| Purpur | ✅ | ✅ | ✅ | ✅ | Production |
| Fabric | ✅ | ✅ | ✅ | ✅ | Production |
| Forge | ✅ | ✅ | ✅ | ✅ | Production (has bugs) |
| NeoForge | ✅ | ✅ | ✅ | ✅ | Production |
**All variants work** - 3 non-blocking bugs should be fixed for code quality.
---
## 🔄 Provisioning Flow (Know This)
```
1. User creates server via Frontend
2. Frontend → API: POST /api/containers/create
3. API allocates VMID, ports (if needed)
4. API clones LXC from template VMID 800
5. API configures container (IP, resources)
6. API starts LXC
7. API detects container IP (10.200.0.X)
8. API → Agent: POST /config (payload)
9. Agent saves payload.json
10. Agent spawns install goroutine (async)
├─ Download Java
├─ Download game artifacts
├─ Verify installation
└─ Self-repair if needed
11. Agent starts server
12. Agent detects READY (log parsing)
13. Agent sets state = RUNNING
14. API polls /status until RUNNING
15. API saves to database
16. API publishes DNS (Cloudflare + Technitium)
17. API registers with Velocity
18. API returns SUCCESS to user
COMPLETE ✅
```
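A condensed sketch of steps 2-17 as one API-side function; the service modules (`vmidAllocator`, `agentClient`, and so on) are hypothetical names chosen to mirror the flow above, not the actual codebase.
```javascript
// Illustrative only: orchestration order for a new game server (steps 2-17 above).
async function provisionGameServer(spec, deps) {
  const { vmidAllocator, portAllocator, proxmox, agentClient, db, edgePublisher, velocityBridge } = deps;

  const vmid = await vmidAllocator.next();                      // 3. allocate VMID
  const ports = await portAllocator.allocate(vmid);             // 3. allocate ports
  await proxmox.cloneTemplate(800, vmid);                       // 4. clone LXC from template VMID 800
  await proxmox.configure(vmid, spec.resources);                // 5. configure container
  await proxmox.start(vmid);                                    // 6. start LXC
  const internalIP = await proxmox.waitForIP(vmid);             // 7. detect container IP (10.200.0.X)

  await agentClient.postConfig(internalIP, { ...spec, ports }); // 8. push payload to agent
  await agentClient.waitForState(internalIP, 'RUNNING');        // 9-14. agent installs, starts, reports

  await db.saveInstance({ vmid, internalIP, ports, ...spec });  // 15. persist state
  await edgePublisher.publish(spec.hostname, internalIP);       // 16. Cloudflare + Technitium
  await velocityBridge.registerBackend({                        // 17. register with Velocity
    name: spec.hostname, address: internalIP, port: 25565,
  });
  return { vmid, internalIP, ports };
}
```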
---
## 🛡️ Drift Prevention (ACTIVE)
### **Before Writing ANY Code**
**Ask yourself**:
1. Which system am I modifying? (API / Agent / Frontend)
2. Does this cross boundaries? (If yes → read Cross-Project Tracker)
3. Am I adding external calls to Agent? (If yes → VIOLATION)
4. Am I adding container execution to API? (If yes → VIOLATION)
5. Am I bypassing API in Frontend? (If yes → VIOLATION)
**If ANY doubt** → Stop and consult [Cross-Project Tracker](computer:///mnt/user-data/outputs/ZeroLagHub_Cross_Project_Tracker.md)
### **Common Violations to Avoid**
❌ Agent allocating ports → API owns this
❌ API installing Java → Agent owns this
❌ Frontend calling Agent directly → Must go through API
❌ Agent creating DNS → API owns this
❌ API deciding Java version → Agent owns this (version-aware)
---
## 🎯 Success Metrics (How You'll Be Measured)
### **Sprint Completion**
- [ ] All 3 days of sprint tasks completed
- [ ] Dev containers operational
- [ ] EdgeState tracking DNS record IDs
- [ ] Reconciliation job working
### **Code Quality**
- [ ] No architectural violations (follow Cross-Project Tracker)
- [ ] All 3 Go agent bugs fixed
- [ ] Tests passing
- [ ] No new drift introduced
### **Platform Readiness**
- [ ] 95% launch-ready after sprint
- [ ] All MC variants still working
- [ ] No regressions from changes
---
## 🧪 Testing Requirements
### **Before Committing Code**
**Test Each Variant**:
```bash
# Test provisioning
POST /api/containers/create {variant: "vanilla"}
POST /api/containers/create {variant: "paper"}
POST /api/containers/create {variant: "fabric"}
POST /api/containers/create {variant: "forge"}
POST /api/containers/create {variant: "neoforge"}
# Verify RUNNING state
GET /api/containers/:vmid/status
# Should return: {state: "RUNNING"}
# Test control
POST /api/containers/:vmid/stop
POST /api/containers/:vmid/start
POST /api/containers/:vmid/restart
# Test cleanup
DELETE /api/containers/:vmid
# Verify no orphaned DNS records
```
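A minimal smoke-test sketch of the same sequence, assuming Node 18+ `fetch` and an API base URL taken from the environment; it only checks that a variant reaches RUNNING and then cleans up.
```javascript
// Illustrative only: provision a variant, wait for RUNNING, then delete.
const API = process.env.ZLH_API_URL || 'http://localhost:3000'; // assumption

async function smokeTestVariant(variant) {
  const createRes = await fetch(`${API}/api/containers/create`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ variant }),
  });
  const { vmid } = await createRes.json();

  // Poll status until RUNNING (Forge/NeoForge can take 60-90s, so allow several minutes).
  for (let i = 0; i < 60; i++) {
    const { state } = await (await fetch(`${API}/api/containers/${vmid}/status`)).json();
    if (state === 'RUNNING') break;
    if (state === 'FAILED') throw new Error(`${variant} failed to provision`);
    await new Promise((r) => setTimeout(r, 10_000));
  }

  await fetch(`${API}/api/containers/${vmid}`, { method: 'DELETE' });
}

// for (const v of ['vanilla', 'paper', 'purpur', 'fabric', 'forge', 'neoforge']) {
//   await smokeTestVariant(v);
// }
```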
**Test Dev Containers**:
```bash
POST /api/dev-instances/create {language: "python"}
POST /api/dev-instances/create {language: "node"}
# Verify accessible
ssh into dev container
# Should have language toolchain installed
```
---
## 📊 Infrastructure Constraints (Know Your Limits)
**Hardware** (GTHost Dedicated):
- CPU: 12 cores / 24 threads (Intel Xeon Silver 4116)
- RAM: 192 GB (can run 30-50 simultaneous 4GB servers)
- Storage: 1.8 TB free (300-500 servers capacity)
- Network: 300 Mbit/s (30-60 concurrent players)
**Current Allocation**:
- 11 VMs: 56 GB RAM, 24 CPU threads, ~512 GB disk
- Available for servers: 128 GB RAM, 1.8 TB disk
**Don't exceed these limits** - check before provisioning.
---
## 🚀 Launch Decision (Context)
**Option A**: Launch NOW (85% ready, soft beta only)
**Option B**: +3 Days Sprint (95% ready, RECOMMENDED)
**Option C**: +1 Week (98% ready, over-engineering)
**Your role**: Execute Option B sprint (complete 3 days of tasks)
**After sprint**: Platform ready for professional launch
---
## 📞 Session Continuity
### **Starting Fresh Session (You)**
```
1. Read Drift Prevention Card (30 seconds)
└─ Activate boundary awareness
2. Read Complete Current State (5 minutes)
└─ Get Kanban state + sprint tasks
3. Consult Cross-Project Tracker before code
└─ Verify no boundary violations
4. Execute sprint tasks with constraints active
5. Test all variants before committing
```
### **Handoff to Claude (Architecture Questions)**
If you encounter:
- Architectural decisions (e.g., should we change contracts?)
- Strategic questions (e.g., which features to prioritize?)
- Business model questions
- Major design changes
**STOP and escalate to Claude** for architectural guidance.
---
## ✅ Quick Reference
### **Your Mission**
Execute 3-day sprint → Deliver 95% launch-ready platform
### **Your Boundaries**
API orchestrates | Agent installs | Frontend displays
### **Your Critical Docs**
1. Drift Prevention Card (session start)
2. Cross-Project Tracker (before code)
3. Complete Current State (sprint tasks)
4. Engineering Handover (technical details)
### **Your Success**
- [ ] 3-day sprint complete
- [ ] No architectural violations
- [ ] All variants still working
- [ ] Tests passing
---
## 🎯 Start Here (First Actions)
**Right Now**:
1. ✅ Read [Drift Prevention Card](computer:///mnt/user-data/outputs/ZeroLagHub_Drift_Prevention_Card.md) (30 sec)
2. ✅ Read [Complete Current State](computer:///mnt/user-data/outputs/ZeroLagHub_Complete_Current_State_Dec7.md) (5 min)
3. ✅ Check Engineering Kanban for current TODO
4. ✅ Begin Day 1 sprint: Dev containers
**Remember**:
- 🛡️ Drift prevention ACTIVE
- 📋 Consult tracker before crossing boundaries
- ✅ Test all variants before committing
- 🚀 Goal: 95% launch-ready after 3 days
---
**Status**: You have everything you need to execute the sprint. Let's build! 🚀


@ -0,0 +1,465 @@
# 🏗️ ZeroLagHub Infrastructure - Complete Specifications
**Last Updated**: December 7, 2025
**Provider**: GTHost
**Cost**: $109/month
**Status**: Production Infrastructure
---
## 🖥️ Dedicated Server Hardware
### **Server Platform**
**Model**: Supermicro 2029TP-HC1R (hot-swap HDD chassis)
**Form Factor**: Enterprise rackmount server
### **CPU**
**Model**: Intel Xeon Silver 4116
**Cores**: 12 cores / 24 threads
**Clock Speed**: 2.1 GHz base, 3.0 GHz turbo
**Architecture**: Intel Skylake-SP (Server Processor)
**TDP**: 85W
**Cache**: 16.5 MB L3
**Performance**:
- Single-threaded: Excellent for game servers
- Multi-threaded: Handles 11 VMs + 75-100 containers
- Turbo boost ensures responsive provisioning
### **Memory (RAM)**
**Configuration**: 6 x 32GB DIMMs
**Total**: 192 GB
**Type**: Hynix DDR4 RDIMM (Registered)
**Speed**: 2400 MHz
**ECC**: Yes (Error-Correcting Code)
**Benefits**:
- ECC protects against memory corruption
- Registered DIMMs = enterprise reliability
- 192GB = ample headroom for VM overhead + game servers
### **Storage**
**Configuration**: 2 x 1.92TB SSD
**Model**: Samsung PM863 (Enterprise SSD)
**Total Capacity**: 3.84 TB raw
**Available**: ~1.8 TB (after Proxmox + VMs + overhead)
**Interface**: SATA (likely in RAID configuration)
**Samsung PM863 Specs**:
- Enterprise-grade datacenter SSD
- Optimized for mixed workload
- Power Loss Protection (PLP)
- High endurance rating
**RAID Configuration** (Likely):
- RAID 1 (mirrored) for redundancy
- Or ZFS RAID-Z for Proxmox storage
### **Network**
**Bandwidth**: 300 Mbit/s (37.5 MB/s)
**Metering**: Unmetered (no bandwidth caps)
**Uplink**: Enterprise datacenter connection
**Performance Assessment**:
- 300 Mbit/s = 30-40 concurrent Minecraft players comfortably
- Unmetered = no surprise overage charges
- Sufficient for soft launch, may need upgrade at scale
### **Operating System**
**OS**: Proxmox VE 8 (64-bit)
**Base**: Debian 12 "Bookworm"
**Kernel**: Linux 6.2+
---
## 📊 Partition Layout
### **Disk Partitions**
```
/boot: 1024 MB (1 GB) - Boot partition
/swap: 2048 MB (2 GB) - Swap space
/root: Auto size - Main Proxmox system + VM storage
(~3.8 TB usable after RAID)
```
**Storage Allocation** (Estimated):
```
Total: 3.84 TB raw (2 x 1.92TB)
RAID overhead: -0.04 TB (metadata, alignment)
--------
Available: 3.80 TB
Proxmox OS: -0.10 TB (Proxmox + system)
VM Disks: -1.80 TB (11 VMs, templates)
LXC Containers: -0.10 TB (current containers)
--------
Free Space: 1.80 TB (for game servers + growth)
```
---
## 🎯 Performance Characteristics
### **CPU Capabilities**
**Per-Core Performance**: ⭐⭐⭐⭐ (Excellent for game servers)
- Minecraft servers are single-threaded
- 3.0 GHz turbo provides snappy performance
- 12 cores = 12 simultaneous high-performance game servers
**Multi-Core Throughput**: ⭐⭐⭐⭐⭐ (Excellent for hosting)
- 24 threads handle VM overhead efficiently
- Can run 11 Proxmox VMs + 75-100 LXC containers
- Provisioning operations don't impact running servers
**Virtualization**: ⭐⭐⭐⭐⭐ (Native support)
- Intel VT-x + VT-d enabled
- Hardware-accelerated virtualization
- LXC containers = near-native performance
### **Memory Capabilities**
**Capacity**: 192 GB = **Excellent** for scale
- Average Minecraft server: 2-4 GB
- 192 GB / 4 GB = **48 simultaneous 4GB servers**
- Or 96 lightweight 2GB servers
**Speed**: 2400 MHz DDR4 = **Good** (not bleeding edge, but sufficient)
- DDR4-2400 provides adequate bandwidth for game hosting
- ECC ensures data integrity under load
**Reliability**: ECC RDIMM = **Enterprise-grade**
- Detects and corrects memory errors
- Critical for 24/7 uptime
### **Storage Capabilities**
**Capacity**: 3.84 TB = **Very Good** for initial scale
- Each Minecraft server: 1-5 GB (world size varies)
- Can host 300-500 Minecraft servers comfortably
- 1.8 TB free = room for significant growth
**Performance**: Samsung PM863 = **Excellent** for workload
- Random IOPS: ~10,000 read, ~2,000 write
- Sequential: 520 MB/s read, 485 MB/s write
- Perfect for database + game world I/O
**Reliability**: Enterprise SSD = **Excellent**
- Power Loss Protection prevents corruption
- Rated for 1.3 PB writes (years of 24/7 operation)
- RAID 1 (likely) provides redundancy
### **Network Capabilities**
**Bandwidth**: 300 Mbit/s = **Adequate for soft launch**
- Minecraft player: ~0.5-1 Mbit/s
- 300 Mbit/s = 30-60 players (conservative estimate)
- Unmetered = no bandwidth overage charges
**Upgrade Path**: GTHost likely offers 1 Gbps upgrades
- 1 Gbps would support 100-200 players
- Consider upgrade when approaching 40+ concurrent players
---
## 🏗️ Current Resource Allocation
### **VM Resource Breakdown** (11 VMs)
```
Hypervisor Overhead: ~8 GB RAM, 2 CPU cores
Critical Production:
├─ VM 100 (zlh-panel): 4 GB RAM, 2 cores, 32 GB disk
├─ VM 103 (zlh-api): 4 GB RAM, 2 cores, 32 GB disk
└─ VM 101 (zlh-wings): 8 GB RAM, 4 cores, 64 GB disk
Platform Services:
├─ VM 102 (zlh-portal): 4 GB RAM, 2 cores, 32 GB disk
├─ VM 104 (zlh-monitor): 8 GB RAM, 2 cores, 64 GB disk
└─ VM 1002 (zlh-proxy): 2 GB RAM, 1 core, 16 GB disk
Network Layer:
├─ VM 1000 (zlh-router): 4 GB RAM, 2 cores, 32 GB disk
├─ VM 1006 (zpack-router): 4 GB RAM, 2 cores, 32 GB disk
└─ VM 1001 (zlh-dns): 2 GB RAM, 1 core, 16 GB disk
Development/Support:
├─ VM 300 (zlh-panel-dev): 4 GB RAM, 2 cores, 32 GB disk
└─ VM 2000 (zlh-ci): 4 GB RAM, 2 cores, 32 GB disk
Backup:
└─ VM [zlh-back]: 8 GB RAM, 2 cores, 128 GB disk
TOTAL VM ALLOCATION: 56 GB RAM, 24 cores, ~512 GB disk
```
### **Available for Game Servers**
```
RAM Available: 192 GB - 56 GB (VMs) - 8 GB (overhead) = 128 GB
CPU Available: 24 threads - 24 (VM allocation) = 0 (shared)
Disk Available: 1.8 TB free
Game Server Capacity (Conservative):
├─ 2GB servers: 64 simultaneous servers
├─ 4GB servers: 32 simultaneous servers
└─ 8GB servers: 16 simultaneous servers
Developer Environment Capacity:
├─ 2GB dev envs: 64 simultaneous environments
└─ 4GB dev envs: 32 simultaneous environments
```
**Note**: CPU is oversubscribed (common in hosting) since most game servers idle at <20% CPU usage. Turbo boost ensures good single-thread performance when needed.
---
## 📈 Capacity & Scaling Projections
### **Current Capacity** (As Deployed)
**Game Servers**: 30-50 active servers with current VM allocation
**Developer Environments**: 75-100 environments (documented capacity)
**Concurrent Players**: 30-60 players (network limited)
### **Optimized Capacity** (With Tuning)
**Game Servers**: 60-80 active servers (after VM consolidation)
**Developer Environments**: 100-150 environments
**Concurrent Players**: Still 30-60 (network bottleneck)
### **Maximum Theoretical Capacity**
**Game Servers**: 128 lightweight servers (if only game hosting, no dev)
**Developer Environments**: 192 environments (if only dev, no games)
**Storage**: 300-500 servers before storage exhaustion
**Limiting Factors**:
1. **Network** (300 Mbit/s) - limits concurrent players
2. **RAM** (192 GB) - limits concurrent heavy servers
3. **Storage** (1.8 TB free) - limits total servers
---
## 💰 Cost Analysis
### **Current Infrastructure Cost**
**Monthly**: $109 GTHost dedicated server
**Annually**: $1,308
**Cost per Resource**:
- Per GB RAM: $0.57/month ($109 ÷ 192 GB)
- Per CPU core: $9.08/month ($109 ÷ 12 cores)
- Per TB storage: $28.39/month ($109 ÷ 3.84 TB)
### **Competitive Analysis**
**AWS Equivalent** (m5.2xlarge + storage + bandwidth):
- 8 vCPU, 32 GB RAM, 1 TB storage, 1 Gbps
- Cost: ~$300-400/month
**Hetzner Dedicated** (Similar specs):
- 12 core Xeon, 128 GB RAM, 2x2TB SSD
- Cost: ~$100/month (but higher network costs)
**GTHost Value**: ⭐⭐⭐⭐⭐ Excellent
- 40-60% cheaper than AWS
- Competitive with Hetzner
- Unmetered bandwidth (key advantage)
---
## 🎯 Competitive Advantages
### **1. LXC Performance**
- Host hardware enables 20-30% better performance vs Docker
- Intel Xeon Silver 4116 single-thread performance excellent for games
### **2. Resource Density**
- 192 GB RAM supports 30-50 simultaneous 4GB servers
- Competitors typically offer 64-128 GB at this price point
### **3. Storage Performance**
- Samsung PM863 enterprise SSDs outperform consumer SSDs
- Power Loss Protection prevents world corruption
- Hot-swap chassis enables maintenance without downtime
### **4. Network**
- Unmetered = no bandwidth surprises
- 300 Mbit/s adequate for soft launch
- Upgrade path available when needed
---
## ⚠️ Identified Constraints
### **1. Network Bandwidth** (Current Bottleneck)
- **300 Mbit/s limits the platform to 30-60 concurrent players**
- **Recommendation**: Monitor bandwidth usage, upgrade to 1 Gbps when approaching 40 players
- **Upgrade Cost**: Likely +$20-50/month for 1 Gbps
### **2. CPU Oversubscription**
- 24 threads allocated to VMs, but most VMs idle
- Game servers share CPU via time-slicing
- **Risk**: If all servers spike simultaneously, performance degrades
- **Mitigation**: Limit concurrent servers to 40-50 until load testing proves higher safe
### **3. Storage Growth**
- 1.8 TB free supports 300-500 servers
- Each server grows over time (world expansion)
- **Recommendation**: Monitor disk usage, plan expansion at 70% utilization
- **Expansion Options**: Add external storage or upgrade to larger SSDs
---
## 🔧 Optimization Opportunities
### **Immediate Optimizations** (No Cost)
1. **VM Consolidation**
- Merge zlh-panel-dev into zlh-panel (save 4 GB RAM, 2 cores)
- Merge zlh-proxy into zlh-router (save 2 GB RAM, 1 core)
- **Gain**: 6 GB RAM, 3 cores for game servers
2. **LXC Over VMs**
- Convert lightweight VMs to LXC containers
- Example: zlh-dns, zlh-proxy candidates
- **Gain**: Lower overhead, faster provisioning
3. **Memory Ballooning**
- Enable KSM (Kernel Same-page Merging) on Proxmox
- Deduplicate identical memory pages
- **Gain**: 5-10% more available RAM
### **Paid Optimizations** (Consider at Scale)
1. **Network Upgrade**: 1 Gbps uplink (+$20-50/month)
- Removes player concurrency bottleneck
- Enables 100-200 player capacity
2. **Storage Expansion**: Add 4TB NVMe (+$50/month)
- Doubles storage to ~6 TB total
- Supports 600-1000 servers
3. **Cloudflare Enterprise** (+$200/month)
- DDoS protection for game traffic
- CDN for static assets
- Worth it at 100+ servers
---
## 📊 Hardware Lifecycle
### **Current Status** (December 2025)
**Server Age**: Unknown (likely 1-3 years based on Xeon Silver 4116 era)
**Expected Lifespan**: 5-7 years for enterprise server
**Remaining Life**: Likely 3-5 years
**Components**:
- CPU: Xeon Silver 4116 (2017 release) - still very capable
- RAM: DDR4-2400 (current gen, plenty of life)
- SSD: Samsung PM863 (enterprise grade, high endurance)
### **Upgrade Path** (Future)
**Year 1-2** (Current plan):
- Optimize existing hardware
- Minor network upgrades if needed
**Year 3-4** (Growth phase):
- Consider second dedicated server
- Load balance across servers
- Geographic distribution
**Year 5+** (Scale phase):
- Migrate to colocation or cloud
- Multi-datacenter deployment
---
## 🛡️ Reliability Features
### **Hardware Reliability**
**ECC Memory** - Corrects single-bit errors automatically
**Enterprise SSDs** - Power Loss Protection, high endurance
**Hot-Swap Chassis** - Replace drives without shutdown
**Redundant Power** (likely) - Supermicro chassis typically dual PSU
### **Software Reliability**
**Proxmox High Availability** - VM failover (if configured)
**PBS Backup** - Incremental backups to Backblaze B2
**LXC Snapshots** - Fast rollback capability
**RAID Mirroring** (likely) - Disk failure protection
### **Network Reliability**
**Datacenter Uptime** - GTHost likely 99.9%+ SLA
**Unmetered Bandwidth** - No throttling during spikes
⚠️ **Single Uplink** - No network redundancy (acceptable for price point)
---
## 🎯 Summary & Recommendations
### **Hardware Assessment**: ⭐⭐⭐⭐ Very Good for Use Case
**Strengths**:
- Excellent CPU for game server hosting (Xeon Silver 4116)
- Abundant RAM (192 GB = 30-50 servers)
- Enterprise storage (Samsung PM863 + hot-swap)
- Unmetered bandwidth (no surprise charges)
- Great value ($109/month for these specs)
**Limitations**:
- Network bandwidth (300 Mbit/s = 30-60 players)
- Storage growth constraint (monitor usage)
- CPU oversubscription (limit concurrent servers initially)
### **Recommendations**
**Now** (Launch Phase):
1. ✅ Deploy on current hardware - adequate for soft launch
2. ✅ Limit to 40-50 concurrent servers initially
3. ✅ Monitor bandwidth, RAM, and disk usage
**Month 1-3** (Early Growth):
1. 🔧 Optimize VM allocation (consolidate where possible)
2. 🔧 Implement aggressive monitoring
3. 🔧 Consider 1 Gbps network upgrade if approaching 40 players
**Month 6-12** (Scale Phase):
1. 📈 Evaluate storage expansion based on usage
2. 📈 Consider second server for geographic distribution
3. 📈 Implement Cloudflare Enterprise for DDoS protection
### **Capacity Targets by Phase**
**Soft Launch** (Month 1-3): 20-30 servers, 10-20 players
**Public Launch** (Month 3-6): 40-50 servers, 30-40 players
**Growth Phase** (Month 6-12): 60-80 servers, 60-100 players (with 1 Gbps upgrade)
**Scale Phase** (Month 12+): 100+ servers, multi-server deployment
---
## ✅ Conclusion
**Status**: Infrastructure is **production-ready** for ZeroLagHub launch.
**Key Points**:
- Hardware specifications are excellent for initial scale
- 192 GB RAM supports 30-50 game servers
- Storage capacity adequate for 300-500 servers
- Network bandwidth is current bottleneck (acceptable for soft launch)
- Cost-effective ($109/month for enterprise-grade hardware)
**Green Light**: ✅ Launch when platform development complete (currently 85% ready).
---
**Last Updated**: December 7, 2025
**Source**: GTHost server specifications + ZeroLagHub infrastructure analysis


@ -0,0 +1,606 @@
# 🚀 ZeroLagHub - Master Bootstrap Document (December 2025)
**Last Updated**: December 7, 2025
**Version**: 4.0 (Platform Launch Ready)
**Status**: 85% Complete - Launch Decision Point
---
## 📌 Quick Start for New AI Sessions
> **Resume Point**: Platform 85% launch-ready, all core provisioning operational, critical UX features needed.
>
> **Current Phase**: Launch readiness assessment - choose NOW vs +1 week vs +1 month
>
> **Critical Context**: All 6 Minecraft variants provisioning successfully via Go agent. Need WebSocket console, crash protection, and disk monitoring for competitive parity.
---
## 🎯 Project Overview
**ZeroLagHub** is a developer-focused game server hosting platform built on:
- **Proxmox VE** with LXC containers (20-30% performance advantage over Docker)
- **Hybrid Architecture**: Pterodactyl panel + Custom Node.js API + Go provisioning agent
- **Velocity proxy** for seamless Minecraft routing
- **Dual-router architecture** for traffic separation
- **Developer-to-player revenue pipeline** with 9.75x revenue multiplier
### Core Value Proposition
Complete dev-to-production pipeline: Development environments ($20/mo) → Testing servers (50% discount) → Player hosting (25% discount) → Revenue sharing (7.5% commission) = viral growth through developer ecosystem.
---
## 🏗️ Current Architecture (December 2025)
### Infrastructure Overview (11 VMs)
```
Critical Production:
├── VM 100 (zlh-panel) - Pterodactyl panel + OAuth customization
├── VM 103 (zlh-api) - Node.js backend + developer platform APIs
└── VM 101 (zlh-wings) - Game servers + LXC integration target
Platform Services:
├── VM 102 (zlh-portal) - Next.js frontend + developer dashboard
└── VM 104 (zlh-monitor) - Prometheus/Grafana monitoring
Network & Infrastructure:
├── VM 1000 (zlh-router) - Platform services routing + VLANs
├── VM 1006 (zpack-router) - Game traffic routing + Velocity
├── VM 1001 (zlh-dns) - Technitium DNS + development domains
├── VM 1002 (zlh-proxy) - Caddy reverse proxy + SSL automation
├── VM 300 (zlh-panel-dev) - Development environment + testing
├── VM 2000 (zlh-ci) - CI/CD pipeline + automation
└── VM [zlh-back] - PBS backup + Backblaze B2 replication
```
### Network Topology
```
zlh-router (VM 1000):
├─ WAN1: Platform services (API, portal, monitoring)
├─ CORE_LAN: 10.60.0.0/24 (internal services)
├─ MGMT_LAN: 172.60.0.10/24 (inter-router communication)
└─ WireGuard: Admin access
zpack-router (VM 1006):
├─ WAN2: 139.64.165.248 (game services)
├─ ZPACK_LAN: 10.70.0.0/24 (Velocity @ 10.70.0.241)
├─ DEV_LAN: 10.100.0.0/24 (developer environments - future)
├─ GAME_LAN: 10.200.0.0/24 (game server LXCs)
└─ MGMT_LAN: 172.60.0.20/24 (control plane communication)
```
### Traffic Flows
- **Platform Access**: Client → WAN1 → zlh-router → Frontend/API
- **Game Play**: Player → WAN2 (139.64.165.248) → zpack-router → Velocity (10.70.0.241) → Game Server (10.200.0.X)
- **Control Plane**: API → MGMT_LAN (172.60.0.X) → Velocity/DNS/Monitoring
---
## ✅ What's Working (December 7, 2025)
### Provisioning Pipeline (100% Operational)
| Component | Status | Notes |
|-----------|--------|-------|
| **LXC Container Creation** | ✅ | Template VMID 800, auto-cloning working |
| **VMID Allocation** | ✅ | Sequential assignment from range |
| **IP Detection** | ✅ | Automatic network configuration |
| **Go Agent Deployment** | ✅ | Payload delivery + self-repair system |
| **Java Runtime Selection** | ✅ | Auto-detect MC version → Java 17/21 |
| **All 6 MC Variants** | ✅ | Vanilla, Paper, Purpur, Fabric, Forge, NeoForge |
| **Server Startup** | ✅ | All variants start successfully |
| **DNS Publishing** | ✅ | Cloudflare + Technitium A + SRV records |
| **Velocity Registration** | ✅ | Dynamic backend server registration |
| **Client Connectivity** | ✅ | Players can connect and play |
### Control Functions
| Function | Status | Implementation |
|----------|--------|----------------|
| **Start/Stop/Restart** | ✅ | HTTP API → Go agent |
| **Console Commands** | ✅ | Command injection working |
| **Log Tailing** | ⚠️ | HTTP polling only (need WebSocket) |
| **Status Reporting** | ✅ | Agent emits RUNNING state |
| **Crash Detection** | ✅ | Agent tracks exit codes |
### Game Support Matrix
**Launch Ready (Minecraft Only)**:
- ✅ **Vanilla** - Official Mojang server
- ✅ **Paper** - Primary recommendation (vanilla + plugins)
- ✅ **Purpur** - Paper fork with extra features
- ✅ **Fabric** - Lightweight mod support
- ✅ **Forge** - Heavy mod support (tech/magic mods)
- ✅ **NeoForge** - Modern Forge fork (**competitive advantage**)
**Supported Versions**: 1.12.2, 1.16.5, 1.18.2, 1.19.2, 1.20.1, 1.21.x
**Deferred to Post-Launch**:
- 📋 Terraria
- 📋 Project Zomboid
- 📋 Valheim
- 📋 Rust
---
## 🚨 Known Issues & Gaps
### Critical Bugs (Non-Blocking, System Works)
1. **Forge server.jar Glob Logic** (`artifacts.go` lines 112-116, 147-151)
- Tries to find `*server.jar` but Forge ≥1.17 doesn't create this
- **Fix**: Remove glob/rename logic (Forge uses `run.sh` + `libraries/`)
- **Impact**: System works, but unnecessary code
2. **ensureProvisioned() Fallthrough** (`agent.go` lines 155-171)
- After Forge check, falls through to check `server.jar`
   - **Fix**: Add `else` to prevent fallthrough (see the sketch after this list)
- **Impact**: Minor efficiency issue
3. **Forge Stop Command Exclusion** (`process.go` line 83)
- Excludes Forge from receiving `stop` command
- **Fix**: Remove exclusion (Forge accepts stop commands)
- **Impact**: Manual workaround needed for Forge stops
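Bugs 1 and 2 share a root cause: launch detection is not variant-aware. A minimal, hypothetical sketch of the direction of the fix (names and layout are illustrative and do not mirror the real `agent.go` / `artifacts.go`): each variant declares its own launch markers, the Forge/NeoForge branch checks `run.sh` + `libraries/` and never falls through to a `server.jar` check, and no `*server.jar` glob/rename is attempted.

```go
// Hypothetical sketch of a variant-aware provisioning check addressing bugs 1
// and 2. Names are illustrative; they do not mirror the real agent code.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// launchMarkers lists the files/dirs whose presence means the variant is
// installed and launchable; there is no *server.jar glob or rename anywhere.
func launchMarkers(variant string) []string {
	switch variant {
	case "forge", "neoforge":
		// Forge >= 1.17 ships run.sh plus a libraries/ tree, never server.jar.
		return []string{"run.sh", "libraries"}
	case "fabric":
		// Fabric's artifact is the pre-built fabric-server.jar (no installer).
		return []string{"fabric-server.jar"}
	default:
		// Vanilla, Paper and Purpur all boot from a single server.jar.
		return []string{"server.jar"}
	}
}

// isProvisioned returns true only if every marker for the variant exists, so
// the Forge branch can never fall through to a server.jar check.
func isProvisioned(serverDir, variant string) bool {
	for _, marker := range launchMarkers(variant) {
		if _, err := os.Stat(filepath.Join(serverDir, marker)); err != nil {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isProvisioned("/opt/zlh/minecraft/forge", "forge"))
}
```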
### Missing Competitive Features (CRITICAL)
| Feature | Apex | Shockbyte | ZeroLagHub | Priority |
|---------|------|-----------|------------|----------|
| **All MC Variants** | ✅ | ✅ | ✅ | - |
| **NeoForge** | ❌ | ❌ | ✅ | ADVANTAGE |
| **Performance** | 🟡 | 🟡 | ✅ | ADVANTAGE |
| **Console Streaming** | ✅ | ✅ | ❌ | 🔴 HIGH |
| **File Management** | ✅ | ✅ | ❌ | 🟡 MEDIUM |
| **Backups** | ✅ | ✅ | ❌ | 🟡 MEDIUM |
| **Crash Protection** | ✅ | ✅ | ❌ | 🔴 HIGH |
| **Disk Monitoring** | ✅ | ✅ | ❌ | 🔴 HIGH |
---
## 🎯 Platform Readiness Assessment (85%)
### Core Platform (100%)
- ✅ Container orchestration
- ✅ Multi-variant provisioning
- ✅ Network routing (dual-router)
- ✅ DNS automation (Cloudflare + Technitium)
- ✅ Velocity proxy integration
- ✅ Start/stop/restart control
- ✅ Console command injection
- ✅ Status monitoring
### Operational Features (70%)
- ✅ Log tailing (HTTP polling)
- ✅ Crash detection
- ❌ **WebSocket console** (need real-time streaming; see the agent-side sketch after this list)
- ❌ **Crash loop protection** (need exponential backoff)
- ❌ **Disk space monitoring** (prevent corruption)
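For the WebSocket console gap, one plausible shape (an assumption, not the decided design) is for the Go agent to stream appended log lines over a plain WebSocket that the API then bridges to the browser (e.g. via Socket.io). A minimal sketch assuming `gorilla/websocket`; the endpoint, log path, poll interval, and port are illustrative, and log rotation is not handled.

```go
// Minimal sketch of agent-side console streaming over a WebSocket (assumed
// gorilla/websocket on the agent; the Socket.io bridge would live in the API).
package main

import (
	"io"
	"log"
	"net/http"
	"os"
	"time"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{
	// The agent is only reachable on the management network, so accept all
	// origins here; tighten this if the agent is ever exposed more widely.
	CheckOrigin: func(r *http.Request) bool { return true },
}

func streamLogs(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	defer conn.Close()

	f, err := os.Open("/opt/zlh/minecraft/paper/logs/latest.log") // illustrative path
	if err != nil {
		conn.WriteMessage(websocket.TextMessage, []byte("log file not available"))
		return
	}
	defer f.Close()
	f.Seek(0, io.SeekEnd) // only stream output appended after connect

	buf := make([]byte, 4096)
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
		n, readErr := f.Read(buf)
		if n > 0 {
			if werr := conn.WriteMessage(websocket.TextMessage, buf[:n]); werr != nil {
				return // client went away
			}
		}
		if readErr != nil && readErr != io.EOF {
			return
		}
	}
}

func main() {
	http.HandleFunc("/console/stream", streamLogs)
	log.Fatal(http.ListenAndServe(":8090", nil)) // illustrative agent port
}
```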
### File Management (0%)
- ❌ File upload/download
- ❌ Backup/restore system
- ❌ World file management
### Advanced Features (Planned)
- 📋 Resource monitoring dashboard
- 📋 Plugin marketplace
- 📋 Developer platform APIs
- 📋 Performance optimization tools
---
## 🚀 Launch Decision Point
### Option A: Launch NOW (Soft Beta)
**Status**: 85% ready
**Timeline**: Immediate
**Pros**: Fast to market, gather user feedback
**Cons**: Missing competitive UX features, higher support burden
**Recommendation**: ⚠️ Acceptable for 10-20 beta users only
### Option B: +1 Week (Critical Features) ⭐ RECOMMENDED
**Status**: 95% ready after additions
**Timeline**: December 14, 2025
**Add**: WebSocket console + Crash protection + Disk monitoring
**Effort**: 7-9 hours total
**Pros**: Competitive feature parity, professional launch
**Cons**: Minimal delay
**Recommendation**: ✅ Best balance of quality and speed
### Option C: +1 Month (Full Feature Parity)
**Status**: 100% ready
**Timeline**: January 7, 2026
**Add**: All UX features + file management + backups
**Effort**: ~30 hours
**Pros**: Complete competitive offering
**Cons**: Slower to market, feature creep risk
**Recommendation**: ⚠️ Over-engineering for launch
---
## 📋 Critical Outstanding Items
### 🔴 High Priority (Before Launch)
**1. WebSocket Console Streaming** [4-6 hours]
- **Current**: HTTP polling via `/logs/tail`
- **Needed**: Real-time WebSocket streaming
- **Why**: Industry standard, users expect it
- **Technical**: Socket.io integration to Go agent
**2. Crash Loop Protection** [2 hours]
- **Current**: Immediate restart on crash
- **Needed**: Exponential backoff (5s, 10s, 15s), stop after 3 crashes
- **Why**: Prevents resource thrashing
- **Technical**: Agent retry logic with backoff timer
**3. Disk Space Monitoring** [1 hour]
- **Current**: No checks
- **Needed**: Alert when <1GB free, prevent start if insufficient
- **Why**: Prevents world corruption
- **Technical**: Agent disk space check before start (combined with the crash backoff in the sketch after this list)
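A compact sketch of how items 2 and 3 could sit together in the agent's supervisor loop: a pre-start free-space check (refuse below 1 GB) and the 5s/10s/15s restart delays, stopping once three delayed restarts have all crashed again. The 10-minute "healthy run resets the streak" window, the paths, and the launch command are assumptions.

```go
// Sketch of crash-loop protection plus the pre-start disk check (items 2 and 3
// above). The 1 GB floor and the 5s/10s/15s delays come from this document;
// the exact stop condition, the 10-minute healthy-run window, paths, and the
// launch command are assumptions.
package main

import (
	"log"
	"os/exec"
	"syscall"
	"time"
)

const minFreeBytes = 1 << 30 // refuse to start with less than 1 GB free

var restartDelays = []time.Duration{5 * time.Second, 10 * time.Second, 15 * time.Second}

// freeBytes reports the available space on the filesystem holding path (Linux only).
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return st.Bavail * uint64(st.Bsize), nil
}

// superviseServer launches the server and restarts it with escalating delays,
// giving up once the delay schedule is exhausted. launch must build a fresh
// Cmd each time, because an exec.Cmd cannot be reused after it has run.
func superviseServer(serverDir string, launch func() *exec.Cmd) {
	crashes := 0
	for {
		free, err := freeBytes(serverDir)
		if err != nil {
			log.Printf("disk check failed: %v", err)
			return
		}
		if free < minFreeBytes {
			log.Printf("refusing to start: only %d MiB free (< 1 GiB)", free>>20)
			return
		}

		cmd := launch()
		cmd.Dir = serverDir
		start := time.Now()
		runErr := cmd.Run()
		if runErr == nil {
			return // clean exit (e.g. a "stop" command); do not restart
		}
		if time.Since(start) > 10*time.Minute {
			crashes = 0 // ran long enough to call the previous crash streak over
		}

		crashes++
		if crashes > len(restartDelays) {
			log.Printf("giving up after %d crashes in a row; manual intervention required", crashes)
			return
		}
		delay := restartDelays[crashes-1]
		log.Printf("server crashed (streak %d/%d), restarting in %s", crashes, len(restartDelays), delay)
		time.Sleep(delay)
	}
}

func main() {
	superviseServer("/opt/zlh/minecraft/paper", func() *exec.Cmd {
		return exec.Command("java", "-Xmx4G", "-jar", "server.jar", "nogui")
	})
}
```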
### 🟡 Medium Priority (Week 1)
**4. File Upload/Download** [6-8 hours]
- Plugin management, world uploads
- HTTP multipart + streaming
**5. Backup System** [8-10 hours]
- World backup/restore
- Integration with PBS backup infrastructure
**6. Enhanced Health Checks** [3-4 hours]
- Query server status
- Resource monitoring (CPU/RAM)
### 🟢 Low Priority (Month 1)
7. Resource monitoring dashboard
8. Plugin marketplace integration
9. Developer platform APIs
10. Performance optimization
---
## 🗄️ Technical Architecture Details
### Directory Structure (Finalized)
```
/opt/zlh/<game>/<variant>/world/
Examples:
/opt/zlh/minecraft/vanilla/world/
/opt/zlh/minecraft/forge/world/
/opt/zlh/minecraft/fabric/world/
```
**Benefits**:
- Clear game/variant separation
- Scalable to all future games
- Self-documenting paths
- Easy backup automation
### Container Model
**Architecture**: One game per LXC container
**Rationale**: Industry standard, 3-5x simpler than multi-game
**Benefits**:
- Better resource isolation
- Simpler billing
- Clearer security boundaries
- Easier debugging
### Java Runtime Selection
```
MC 1.21.x → Java 21
MC ≥1.20.5 → Java 21
MC <1.20.5  → Java 17
```
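The mapping is small enough to pin down in a few lines. A sketch of how the agent's check might look (parsing and names are illustrative, not the actual agent code): 1.20.5 and newer map to Java 21, everything older to Java 17, exactly as listed above.

```go
// Sketch of the documented MC-version -> Java-runtime mapping. The rule is the
// one in the table above; the parsing and function names are illustrative.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// javaMajorFor returns the Java major version for a Minecraft version string
// such as "1.21.1", "1.20.5" or "1.18.2".
func javaMajorFor(mcVersion string) int {
	parts := strings.Split(mcVersion, ".")
	minor, patch := 0, 0
	if len(parts) > 1 {
		minor, _ = strconv.Atoi(parts[1])
	}
	if len(parts) > 2 {
		patch, _ = strconv.Atoi(parts[2])
	}
	if minor > 20 || (minor == 20 && patch >= 5) {
		return 21 // MC 1.20.5+ and all 1.21.x
	}
	return 17 // everything older, per the table above
}

func main() {
	for _, v := range []string{"1.12.2", "1.18.2", "1.20.1", "1.20.5", "1.21.4"} {
		fmt.Printf("MC %-7s -> Java %d\n", v, javaMajorFor(v))
	}
}
```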
### Artifact Download Paths
```
minecraft/vanilla/<version>/server.jar
minecraft/paper/<version>/server.jar
minecraft/purpur/<version>/server.jar
minecraft/fabric/<version>/fabric-server.jar
minecraft/forge/<version>/forge-installer.jar
minecraft/neoforge/<version>/neoforge-installer.jar
```
**Critical Note**: Fabric uses `fabric-server.jar` (pre-built), not installer pattern
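Because the only per-variant differences are the file names (Fabric's pre-built jar, the two installer-based loaders), the relative artifact key can be derived mechanically. A small sketch following the layout above; the base URL or bucket it would be joined to is not specified here and is left out.

```go
// Sketch of building the artifact key from game/variant/version, following the
// path layout above. Where this key is downloaded from is not specified here.
package main

import (
	"fmt"
	"path"
)

func artifactKey(game, variant, version string) string {
	var file string
	switch variant {
	case "fabric":
		file = "fabric-server.jar" // pre-built jar, no installer step
	case "forge":
		file = "forge-installer.jar"
	case "neoforge":
		file = "neoforge-installer.jar"
	default:
		file = "server.jar" // vanilla, paper, purpur
	}
	return path.Join(game, variant, version, file)
}

func main() {
	fmt.Println(artifactKey("minecraft", "neoforge", "1.21.1"))
	// minecraft/neoforge/1.21.1/neoforge-installer.jar
}
```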
---
## 💰 Business Model & Revenue Strategy
### Developer-to-Player Pipeline
```
Step 1: Developer Acquisition
├─ Development Environment: $20/month
└─ Testing Server: $25/month (50% discount)
Step 2: Player Acquisition (via developer)
├─ Player 1-10: $15/month each (25% discount)
└─ Total Player Revenue: $150/month
Step 3: Developer Commission
├─ Revenue Share: 7.5% of player revenue
├─ Developer Earns: $11.25/month
└─ Platform Keeps: $138.75/month
Total Monthly Revenue from One Developer:
$20 (dev env) + $25 (test server) + $150 (players) = $195/month
Revenue Multiplier: 9.75x on developer acquisition cost
```
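The pipeline arithmetic is easy to sanity-check in code. A few lines reproducing the figures above; the only assumption is that the 9.75x multiplier is total monthly revenue divided by the $20/month development-environment price.

```go
// Worked check of the developer-to-player pipeline figures above.
package main

import "fmt"

func main() {
	devEnv := 20.0      // development environment, $/month
	testServer := 25.0  // testing server at 50% discount, $/month
	playerPrice := 15.0 // discounted player server, $/month
	players := 10.0     // players referred by this developer

	playerRevenue := players * playerPrice       // 150.00
	commission := 0.075 * playerRevenue          // 11.25 paid to the developer
	platformKeeps := playerRevenue - commission  // 138.75
	total := devEnv + testServer + playerRevenue // 195.00
	multiplier := total / devEnv                 // 9.75 (assumed denominator: $20 dev env)

	fmt.Printf("player revenue:  $%.2f\n", playerRevenue)
	fmt.Printf("dev commission:  $%.2f\n", commission)
	fmt.Printf("platform keeps:  $%.2f\n", platformKeeps)
	fmt.Printf("total/month:     $%.2f\n", total)
	fmt.Printf("multiplier:      %.2fx\n", multiplier)
}
```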
### Financial Projections
**Month 6**: $8K-30K (LXC advantage + developer pipeline)
**Month 12**: $25K-100K (custom platform competitive advantages)
**Month 24**: $75K-300K (market leadership + technology licensing)
### Competitive Advantages
1. **LXC Performance**: 20-30% improvement over Docker competitors
2. **Developer Ecosystem**: Complete dev-to-production pipeline vs pure hosting
3. **Open Source Foundation**: 30-40% cost advantage over corporate providers
4. **Gaming-First Architecture**: Purpose-built vs adapted generic hosting
5. **NeoForge Support**: Ahead of Apex and Shockbyte
---
## 🔐 Security Vulnerabilities (CRITICAL - Active Fix Required)
### API Department Issues
1. **Server Ownership Bypass**
- Any user can control any server via UUID
- No ownership validation in API endpoints
- **Impact**: Critical security flaw
2. **Admin Privilege Escalation**
- Frontend can claim admin via JWT manipulation
- No server-side role validation
- **Impact**: Complete access control bypass
3. **Token URL Exposure**
- JWTs visible in browser history/logs
- Tokens passed as URL parameters
- **Impact**: Token theft vulnerability
4. **API Key Validation Missing**
- Authentication bypass vulnerabilities
- Inconsistent validation patterns
- **Impact**: Unauthorized API access
### Required Fixes
- Implement ownership checks on all server operations
- Server-side JWT validation and role enforcement
- Move tokens from URL to headers/cookies
- Comprehensive API key validation
**Priority**: Must fix before public launch (current soft beta acceptable)
---
## 🛠️ Ford Assembly Line Department Structure
### Management Department (Coordination Hub)
- **Role**: Strategic oversight, cross-department integration
- **AI Resource**: Claude (architecture) + ChatGPT (implementation)
- **Current Focus**: Launch readiness + critical feature completion
### 5 Specialized Departments
**1. API Department** ⚠️ CRITICAL SECURITY + DEVELOPER PLATFORM
- Tech: Node.js/Express, MariaDB, JWT auth, Pterodactyl integration
- Priority: Security fixes + developer environment APIs
**2. Infrastructure Department** ✅ LXC INTEGRATION PRIORITY
- Tech: Proxmox VMs, Ansible automation, PBS backup, Monitoring
- Achievement: Enterprise backup system operational
- Capacity: 1.8TB available, supports 75-100 developers
**3. Frontend Department** 🔧 TOKEN SECURITY + DEVELOPER UI
- Tech: Next.js 15, TailwindCSS, sci-fi HUD aesthetic, TypeScript
- Priority: Token security + developer dashboard
**4. Pterodactyl Department** ⚠️ OAUTH + WINGS LXC
- Role: Panel customization, OAuth integration
- Future: Wings LXC integration for performance advantage
**5. Planning & Brainstorming Department** 🧠 STRATEGIC EXECUTION
- Role: Long-term vision, competitive strategy
- Focus: Developer acquisition, viral growth mechanics
---
## 📋 Immediate Next Steps (Priority Order)
### Phase 1: Critical Features (Before Launch)
1. 🔧 **Fix Go Agent Bugs** - Remove Forge glob, fix fallthrough, enable stop commands
2. 🔧 **WebSocket Console** - Implement real-time streaming (4-6 hours)
3. 🔧 **Crash Loop Protection** - Add exponential backoff (2 hours)
4. 🔧 **Disk Space Monitoring** - Prevent starts on low disk (1 hour)
### Phase 2: Launch Readiness
5. 📋 **Security Audit** - Review critical vulnerabilities
6. 📋 **Documentation** - User guides, API docs
7. 📋 **Monitoring** - Alert thresholds, dashboards
8. 📋 **Soft Beta** - 10-20 users, gather feedback
### Phase 3: Week 1 Post-Launch
9. 📋 **File Management** - Upload/download interface
10. 📋 **Backup System** - World backup/restore
11. 📋 **Enhanced Health Checks** - Resource monitoring
---
## 🎯 Success Metrics
### Technical Metrics
- ✅ 100% provisioning success rate (all 6 variants)
- ⚠️ Zero DNS orphan records (needs EdgeState migration)
- ⚠️ Sub-second WebSocket latency (needs implementation)
- ✅ LXC 20-30% performance advantage (validated)
### Business Metrics (Future)
- Developer referral system operational
- Revenue sharing calculations accurate
- Customer quota enforcement working
- Usage metering for billing
### User Experience Metrics
- Professional HUD aesthetic maintained
- Zero breaking changes during updates
- Seamless dev-to-production pipeline
- <3s average provisioning time
---
## 📁 Key Files & Locations
### API Service (`/home/zlh/zlh-api-v2/`)
- `prisma/schema.prisma` - Database schema
- `src/services/edgePublisher.js` - DNS + Velocity publishing
- `src/services/dePublisher.js` - Edge cleanup
- `src/services/portAllocator.js` - Port management
- `src/clients/cloudflareClient.js` - Cloudflare API wrapper
- `src/clients/technitiumClient.js` - Technitium DNS API wrapper
### Go Agent (`/opt/zlh-agent/`)
- `agent.go` - Main provisioning logic
- `artifacts.go` - Download + verification (has bugs)
- `process.go` - Server lifecycle management (has bug)
- `api.go` - HTTP server for control commands
- `payload.json` - Configuration from API
### Frontend (`/home/zlh/zlh-portal/`)
- Next.js 15 application
- Steel-texture HUD aesthetic
- Developer dashboard (in progress)
---
## ⚠️ Critical Rules & Constraints
### DO NOT
- ❌ Infer hostnames from DNS records
- ❌ Use DNS as source of truth
- ❌ Delete Cloudflare records without record IDs
- ❌ Launch without WebSocket console (competitive requirement)
- ❌ Skip crash protection (operational stability)
- ❌ Ignore disk space monitoring (data safety)
### ALWAYS
- ✅ Treat DB as authoritative source of truth
- ✅ Store Cloudflare record IDs in EdgeState
- ✅ Use exact hostname matching
- ✅ Track all async operations in JobLog
- ✅ Audit significant actions
- ✅ Test all 6 MC variants before deploy
---
## 💡 Key Architectural Decisions (ADRs)
**ADR-001: Minecraft-Only Launch**
**Decision**: Launch with Minecraft only, defer other games
**Rationale**: Market validation, focused quality, faster to market
**Consequence**: 6 variants + 6 versions = comprehensive MC offering
**ADR-002: One Game Per Container**
**Decision**: Single game per LXC container
**Rationale**: Industry standard, 3-5x simpler than multi-game
**Consequence**: Better isolation, clearer billing, easier debugging
**ADR-003: Velocity Over Direct Port Forwarding**
**Decision**: Use Velocity proxy for Minecraft routing
**Rationale**: Single entry point, dynamic registration, no NAT complexity
**Consequence**: No external port allocation needed for MC
**ADR-004: Hybrid Pterodactyl + Custom API**
**Decision**: Keep Pterodactyl panel, build custom API alongside
**Rationale**: Preserve working OAuth, gradual migration path
**Consequence**: Dual system complexity, eventual migration needed
**ADR-005: Go Agent Architecture**
**Decision**: Containerized Go agent handles provisioning
**Rationale**: Language-agnostic, self-healing, version-aware
**Consequence**: Robust provisioning, automatic repair, clean separation
---
## 🧠 Session Continuity Prompt
For AI assistants resuming work on this project:
> Resume from ZeroLagHub Master Bootstrap (December 7, 2025).
>
> **Current State**: Platform 85% launch-ready. All 6 Minecraft variants provisioning successfully via Go agent. Core functionality operational, need critical UX features for competitive parity.
>
> **Launch Decision**: Recommend +1 week for WebSocket console, crash protection, and disk monitoring.
>
> **Known Bugs**: 3 non-blocking Go agent issues (Forge glob, fallthrough, stop exclusion).
>
> **Critical Context**:
> - Security vulnerabilities exist but acceptable for soft beta
> - Business model validated with 9.75x revenue multiplier
> - Developer-to-player pipeline is core differentiator
> - LXC performance advantage is primary competitive edge
>
> **Next Actions**: Fix Go agent bugs, implement critical features, launch beta.
---
## 📞 Support & Escalation
- **Platform Owner**: 44 years old, full-stack developer
- **AI Coordination**: Claude (architecture) + ChatGPT (implementation)
- **Infrastructure**: GTHost dedicated server ($109/month)
- **Domain**: zerolaghub.com, zpack.zerolaghub.com
- **Public Game IP**: 139.64.165.248
---
## 📊 Platform Status Summary
**Technical Readiness**: 85% complete
**Competitive Position**: Ready to compete on core provisioning, need UX polish
**Strategic Clarity**: Clear path to launch with validated business model
**Infrastructure**: Production-grade with enterprise backup system
**Security**: Known vulnerabilities, acceptable for soft beta, must fix before public launch
---
## 🎯 Strategic Recommendation
**Recommended Path**: Option B (+1 Week)
**Rationale**:
1. WebSocket console is table stakes (competitors have it)
2. Crash protection prevents operational nightmares
3. Disk monitoring prevents data loss
4. 1 week is negligible for long-term platform success
5. Professional launch > rushed launch
**Timeline**:
- **Dec 7-10**: Implement critical features (WebSocket, crash, disk)
- **Dec 11-13**: Testing + bug fixes
- **Dec 14**: Soft beta launch (10-20 users)
- **Dec 21**: Public launch after beta feedback
---
**This document serves as the single source of truth for project continuity. Update after each major milestone or architectural change.**
🚀 **Next action: Decide launch timeline, then implement critical features or launch beta.**