# 🛡️ ZeroLagHub Cross-Project Tracker & Drift Prevention System

**Last Updated**: December 7, 2025
**Version**: 1.0 (Canonical Architecture Enforcement)
**Status**: ACTIVE - Must Be Consulted Before All Code Changes

---

## 🎯 Document Purpose

This document establishes **architectural boundaries** across the three major ZeroLagHub systems to prevent drift, confusion, and broken contracts when switching between contexts or AI assistants.

**Critical Insight**: Architectural failures more often come from **gradual erosion** than from sudden breaking changes. This document exists to prevent that erosion.

---

## 📊 The Three Systems (Canonical Ownership)

```
┌─────────────────────────────────────────────────────────────┐
│                       NODE.JS API v2                        │
│                   (Orchestration Engine)                    │
│                                                             │
│  Owns: VMID allocation, port management, DNS publishing,    │
│        Velocity registration, job queue, database state     │
│                                                             │
│  Speaks To: Proxmox, Cloudflare, Technitium, Velocity,      │
│             MariaDB, BullMQ, Go Agent (HTTP)                │
│                                                             │
│  Never Touches: Container filesystem, game server files,    │
│                 Java installation, artifact downloads       │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      │ HTTP Contract
                      │ POST /config, /start, /stop
                      │ GET /status, /health
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│                          GO AGENT                           │
│                (Container-Internal Manager)                 │
│                                                             │
│  Owns: Server installation, Java runtime, artifact          │
│        downloads, process management, READY detection,      │
│        filesystem layout, verification + self-repair        │
│                                                             │
│  Speaks To: Local filesystem, game server process,          │
│             API (status updates via HTTP polling)           │
│                                                             │
│  Never Touches: Proxmox, DNS, Cloudflare, Velocity,         │
│                 port allocation, VMID selection             │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      │ Status Polling
                      │ Agent reports state
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│                      NEXT.JS FRONTEND                       │
│                    (Customer + Admin UI)                    │
│                                                             │
│  Owns: User interaction, form validation, display logic,    │
│        client-side state, UI components                     │
│                                                             │
│  Speaks To: API v2 only (REST + WebSocket when added)       │
│                                                             │
│  Never Touches: Proxmox, Go Agent, DNS, Velocity,           │
│                 Cloudflare, direct container access         │
└─────────────────────────────────────────────────────────────┘
```

---

## 🗺️ System Ownership Matrix (CANONICAL)

| Area | Node.js API v2 | Go Agent | Frontend |
|------|----------------|----------|----------|
| **Provisioning Orchestration** | ✅ OWNER (allocates VMID, ports, builds LXC config) | ❌ Executes inside container | ❌ Triggers only |
| **Template Selection** | ✅ OWNER (selects template, passes config) | ❌ Template contains agent | ❌ Displays options |
| **Server Installation** | ❌ Never | ✅ OWNER (Java, artifacts, validation) | ❌ Displays results |
| **Runtime Control** | ✅ OWNER (sends commands) | ✅ OWNER (executes commands) | ❌ UI only |
| **DNS (Cloudflare + Technitium)** | ✅ OWNER (creates, deletes, tracks IDs) | ❌ Never | ❌ Displays info |
| **Velocity Registration** | ✅ OWNER (registers, deregisters) | ❌ Never | ❌ Displays status |
| **IP Logic** | ✅ OWNER (external + internal IPs) | ❌ Sees container IP only | ❌ Displays final |
| **Port Allocation** | ✅ OWNER (PortPool DB management) | ❌ Receives assignments | ❌ Displays ports |
| **Monitoring** | ✅ OWNER (collects metrics) | ✅ OWNER (exposes /health) | ❌ Displays data |
| **Error Handling** | ✅ OWNER (BullMQ jobs, retries) | ❌ Local output only | ❌ User notifications |

---

## 🔄 API ↔ Agent Contract (IMMUTABLE)

### **API → Agent Endpoints**

```
POST /config
├─ Payload: {
│    game: "minecraft",
│    variant: "paper",
│    version: "1.21.3",
│    ports: [25565, 25575],
│    memory: "4G",
│    motd: "Welcome to ZeroLagHub",
│    worldSettings: {...}
│  }
└─ Response: 200 OK

POST /start
└─ Triggers: ensureProvisioned() → StartServer()

POST /stop
└─ Triggers: Graceful shutdown via server stop command

POST /restart
└─ Triggers: stop → start sequence

GET /status
└─ Response: {
     state: "RUNNING" | "INSTALLING" | "FAILED",
     pid: 12345,
     uptime: 3600,
     lastError: null
   }

GET /health
└─ Response: {
     healthy: true,
     java: "/usr/lib/jvm/java-21-openjdk-amd64",
     variant: "paper",
     ready: true
   }
```

### **Agent → API Signals (via /status polling)**

```
State Machine:
INSTALLING
  ├─> DOWNLOADING_ARTIFACT
  ├─> INSTALLING_JAVA
  ├─> FINALIZING
  ├─> RUNNING
  └─> FAILED | CRASHED
```

### **🚫 Forbidden Agent Behaviors**

Agent **NEVER**:
- ❌ Allocates or manages ports
- ❌ Talks to Proxmox API
- ❌ Creates DNS records
- ❌ Registers with Velocity
- ❌ Modifies LXC config
- ❌ Allocates VMIDs
- ❌ Manages templates
- ❌ Talks to Cloudflare or Technitium

**Violation Detection**: If agent code imports a Proxmox client, a DNS client, or port allocation logic → **DRIFT VIOLATION**

---

## 🖥️ Frontend ↔ API Contract (IMMUTABLE)

### **Frontend Allowed Endpoints**

```
Provisioning:
  POST   /api/containers/create
  GET    /api/containers/:vmid
  DELETE /api/containers/:vmid

Control:
  POST /api/containers/:vmid/start
  POST /api/containers/:vmid/stop
  POST /api/containers/:vmid/restart

Status:
  GET /api/containers/:vmid/status
  GET /api/containers/:vmid/logs
  GET /api/containers/:vmid/stats

Discovery:
  GET /api/templates
  GET /api/containers (list user's containers)

Read-Only Info:
  GET /api/dns/:hostname (display only)
  GET /api/velocity/status (display only)
```

### **🚫 Forbidden Frontend Behaviors**

Frontend **NEVER**:
- ❌ Talks directly to Go Agent
- ❌ Calls Proxmox API
- ❌ Creates DNS records
- ❌ Registers with Velocity
- ❌ Allocates ports
- ❌ Executes container commands
- ❌ Accesses MariaDB directly

**Violation Detection**: If frontend code imports a Proxmox client, an agent HTTP client (anything other than going through the API), or a database client → **DRIFT VIOLATION**

---

## 🚨 Drift Detection Rules (ACTIVE ENFORCEMENT)

### **Rule #1: Provisioning Ownership**

**Violation**: Agent asked to allocate ports, choose templates, create DNS, call Proxmox, manage VMIDs

**Correct Path**: API owns ALL provisioning orchestration

**Example Violation**:
```go
// WRONG - Agent should NEVER do this
func (a *Agent) allocatePorts() ([]int, error) {
	// ... port allocation logic
}
```

**Correct Pattern**:
```javascript
// RIGHT - API allocates, agent receives
// vmid is passed in because the API, not the agent, selects it.
async function provisionInstance(vmid, game, variant) {
  const ports = await portAllocator.allocate(vmid);
  await agent.postConfig({ ports, game, variant });
}
```

---

### **Rule #2: Artifact Ownership**

**Violation**: API asked to install Java, download server.jar, run installers

**Correct Path**: Agent owns ALL in-container installation

**Example Violation**:
```javascript
// WRONG - API should NEVER do this
async function installMinecraft(vmid, version) {
  await proxmox.exec(vmid, `wget https://...`);
  await proxmox.exec(vmid, `java -jar installer.jar`);
}
```

**Correct Pattern**:
```go
// RIGHT - Agent handles installation
// The helpers are illustrative and assumed to return an error.
func (p *Provisioner) ProvisionAll(cfg Config) error {
	if err := downloadArtifact(cfg.Variant, cfg.Version); err != nil {
		return err
	}
	if err := installJavaRuntime(cfg.Version); err != nil {
		return err
	}
	return verifyInstallation()
}
```

---

### **Rule #3: Direct Container Access**

**Violation**: Frontend or API wants to exec commands directly into container

**Correct Path**: Agent owns container execution layer

**Example Violation**:
```typescript
// WRONG - Frontend should NEVER do this
async function restartServer(vmid: number) {
  const ssh = new SSHClient();
  await ssh.connect(containerIP);
  await ssh.exec('systemctl restart minecraft');
}
```

**Correct Pattern**:
```typescript
// RIGHT - Frontend talks to API, API talks to agent
async function restartServer(vmid: number) {
  await api.post(`/containers/${vmid}/restart`);
}
```

---

### **Rule #4: Networking Responsibilities**

**Violation**: Agent asked to select public vs internal IPs, decide DNS zones

**Correct Path**: API owns dual-IP logic (Cloudflare external, Technitium internal)

**Example Violation**:
```go
// WRONG - Agent should NEVER decide this
func (a *Agent) determinePublicIP() string {
	if a.needsCloudflare() {
		return "139.64.165.248"
	}
	return a.containerIP
}
```

**Correct Pattern**:
```javascript
// RIGHT - API decides network topology
function determineIPs(vmid, game) {
  const internalIP = `10.200.0.${vmid - 1000}`;
  const externalIP = "139.64.165.248"; // Cloudflare target
  const velocityIP = "10.70.0.241"; // Internal routing
  return { internalIP, externalIP, velocityIP };
}
```

---

### **Rule #5: Proxy Responsibilities**

**Violation**: Agent asked to register with Velocity, configure proxy routing

**Correct Path**: API owns ALL proxy integrations

**Example Violation**:
```go
// WRONG - Agent should NEVER do this
func (a *Agent) registerWithVelocity() error {
	client := velocity.NewClient()
	return client.Register(a.hostname, a.port)
}
```

**Correct Pattern**:
```javascript
// RIGHT - API handles Velocity registration
async function registerVelocity(vmid, hostname, internalIP) {
  await velocityBridge.registerBackend({
    name: hostname,
    address: internalIP,
    port: 25565
  });
}
```

---

## 🔄 Context Switching Safety Workflow

### **When Moving to Node.js API Work:**

**Pre-Switch Checklist**:
- [ ] Agent contract unchanged? (POST /config, /start, /stop, GET /status)
- [ ] Database schema unchanged? (Prisma models consistent)
- [ ] LXC template IDs unchanged? (VMID 800 for game, 6000-series for dev)
- [ ] DNS/IP logic consistent? (Cloudflare external, Technitium internal)
- [ ] Port allocation logic preserved? (PortPool DB-backed)

**Common Drift Patterns**:
- ⚠️ Adding agent installation logic to API
- ⚠️ Changing agent contract without updating both sides
- ⚠️ Moving DNS logic to different service
- ⚠️ Bypassing job queue for provisioning

---

### **When Moving to Go Agent Work:**

**Pre-Switch Checklist**:
- [ ] No provisioning logic outside allowed scope (no port allocation, DNS, etc.)
- [ ] File paths remain canonical (`/opt/zlh///world`)
- [ ] Naming conventions maintained (`server.jar`, `fabric-server.jar`, etc.)
- [ ] No external API calls (Proxmox, DNS, Velocity)
- [ ] Status states unchanged (INSTALLING, RUNNING, FAILED, etc.)

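
The "status states unchanged" item can be backed by a small contract check on whichever side consumes `GET /status`. The sketch below is hypothetical: the state names are taken from the state machine in this document, but `KNOWN_AGENT_STATES`, `FAILURE_STATES`, and `classifyAgentStatus` are illustrative names, not existing ZeroLagHub code.

```javascript
// Sketch only: state names come from the Agent → API state machine in this
// document; every identifier below is hypothetical.
const KNOWN_AGENT_STATES = new Set([
  "INSTALLING",
  "DOWNLOADING_ARTIFACT",
  "INSTALLING_JAVA",
  "FINALIZING",
  "RUNNING",
  "FAILED",
  "CRASHED",
]);

const FAILURE_STATES = new Set(["FAILED", "CRASHED"]);

// Classifies a parsed GET /status body. An unrecognized state is reported as
// contract drift instead of being silently treated as "still installing".
function classifyAgentStatus(status) {
  const state = status ? status.state : undefined;
  if (typeof state !== "string" || !KNOWN_AGENT_STATES.has(state)) {
    return { ok: false, kind: "CONTRACT_DRIFT" };
  }
  if (FAILURE_STATES.has(state)) {
    return { ok: false, kind: "AGENT_FAILURE" };
  }
  return { ok: true, kind: state === "RUNNING" ? "READY" : "IN_PROGRESS" };
}
```

With this shape, a renamed or newly added agent state surfaces as `CONTRACT_DRIFT` the first time it is polled, which is the signal that both sides of the contract need updating together.
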

**Common Drift Patterns**:
- ⚠️ Adding port allocation to agent
- ⚠️ Making agent talk to external services
- ⚠️ Changing directory structure without API coordination
- ⚠️ Adding orchestration logic to agent

---

### **When Moving to Frontend Work:**

**Pre-Switch Checklist**:
- [ ] Only API-approved fields used in UI
- [ ] No direct agent HTTP calls
- [ ] VMID not exposed in user-facing UI
- [ ] Internal IPs not displayed to users
- [ ] All state from API, not computed locally

**Common Drift Patterns**:
- ⚠️ Adding direct agent calls from frontend
- ⚠️ Computing server state client-side
- ⚠️ Exposing internal infrastructure details
- ⚠️ Bypassing API for container control

---

## 🚨 High-Risk Integration Zones (GUARDED)

These areas have historically caused drift across sessions:

### **1. Forge / NeoForge Installation Logic**
- **Risk**: Agent vs API confusion on who handles `run.sh` patching
- **Guard**: Agent owns ALL Forge installation, API just passes config
- **Test**: Can provision Forge 1.21.3 without API filesystem access?

### **2. Cloudflare SRV Deletion**
- **Risk**: Case sensitivity, subdomain normalization, record ID tracking
- **Guard**: API stores Cloudflare record IDs in EdgeState, deletes by ID
- **Test**: Create → Delete → Recreate same hostname without orphans?

### **3. Technitium DNS Zone Mismatch**
- **Risk**: Wrong zone selection, duplicate records
- **Guard**: API hardcodes zone as `zpack.zerolaghub.com`, validates before creation
- **Test**: No records created in wrong zones?

### **4. Velocity Registration Order**
- **Risk**: Registering before server ready, deregistering incorrectly
- **Guard**: API waits for agent RUNNING state, then registers Velocity
- **Test**: Player connection works immediately after provisioning complete?

### **5. PortPool Commit Logic**
- **Risk**: Race conditions, double-allocation, uncommitted ports
- **Guard**: API allocates → provisions → commits (rollback on failure)
- **Test**: Concurrent provisions don't collide on ports?

### **6. Agent READY Detection**
- **Risk**: False negatives, false positives, variant-specific patterns
- **Guard**: Agent uses variant-aware log parsing, multiple confirmation lines
- **Test**: All 6 variants correctly detect READY state?

### **7. Server Start-Up False Negatives**
- **Risk**: Timeout too short, log parsing too strict
- **Guard**: Agent increases timeout for Forge (90s), multiple log patterns
- **Test**: Forge installer completes without false failure?

### **8. IP Selection Logic**
- **Risk**: Confusing external (Cloudflare) vs internal (Velocity/Technitium) IPs
- **Guard**: API clearly separates: externalIP (139.64.165.248), internalIP (10.200.0.X), velocityIP (10.70.0.241)
- **Test**: DNS points to correct IPs, Velocity routes to correct internal IP?

---

## 📋 Architecture Decision Log (LOCKED) ⭐ NEW

**Purpose**: Records finalized architectural decisions that **must not be re-litigated** unless explicitly requested.


**Status**: LOCKED - These decisions are final and cannot be changed unless the user explicitly says "Revisit decision X"

---

### **DEC-001: Templates vs Go Agent (FINAL)**

**Decision**: Hybrid model
- LXC templates define **base environment only**
- Go Agent is **authoritative execution layer** inside containers

**Rationale**:
- Templates alone cannot handle multi-variant logic (Forge, NeoForge, Fabric)
- Agent enables self-repair, async provisioning, runtime control
- Hybrid provides speed + flexibility without API container access

**Applies To**: API v2, Go Agent, Frontend

**Status**: ✅ LOCKED

---

### **DEC-002: Provisioning Authority**

**Decision**: API orchestrates, Agent executes

**Rationale**:
- API has global visibility (DB, DNS, Proxmox, Velocity)
- Agent is intentionally sandboxed to container filesystem + process

**Applies To**: All systems

**Status**: ✅ LOCKED

---

### **DEC-003: DNS & Edge Publishing Ownership**

**Decision**: API-only responsibility

**Rationale**:
- Requires external credentials (Cloudflare, Technitium)
- Must correlate DB state, record IDs, reconciliation jobs

**Applies To**: API v2

**Status**: ✅ LOCKED

---

### **DEC-004: Proxy Stack**

**Decision**: Traefik + Velocity only

**Rationale**:
- Traefik for HTTP/control-plane
- Velocity for Minecraft TCP routing
- **HAProxy explicitly deprecated** for ZeroLagHub

**Applies To**: Infrastructure, API

**Status**: ✅ LOCKED

---

### **DEC-005: State Persistence**

**Decision**: MariaDB is single source of truth

**Rationale**:
- Flat files caused race conditions and drift
- DB enables reconciliation, recovery, observability

**Applies To**: API v2

**Status**: ✅ LOCKED

---

### **DEC-006: Frontend Access Model**

**Decision**: Frontend communicates with API only

**Rationale**:
- Security boundary
- Prevents leaking infrastructure details

**Applies To**: Frontend

**Status**: ✅ LOCKED

---

### **DEC-007: Architecture Enforcement Policy**

**Decision**: Drift prevention is mandatory

**Rationale**:
- Prevents oscillation between alternatives
- Preserves velocity during late-stage development

**Applies To**: All work sessions

**Status**: ✅ LOCKED

---

## 📋 Canonical Architecture Anchors (ABSOLUTE)

These rules are **immutable** unless explicitly changed with full system review:

### **Anchor #1: Orchestration**
✅ API v2 orchestrates EVERYTHING (jobs, provisioning, DNS, proxy, lifecycle)

### **Anchor #2: Container-Internal**
✅ Agent performs EVERYTHING inside container (install, start, stop, detect)

### **Anchor #3: Templates**
✅ Templates contain agent + base environment (VMID 800 for game, 6000-series for dev)

### **Anchor #4: Job Queue**
✅ BullMQ/Redis drive job system (async provisioning, retries, reconciliation)

### **Anchor #5: Database**
✅ MariaDB holds all state (PortPool, ContainerInstance, EdgeState, etc.)

### **Anchor #6: Infrastructure**
✅ Proxmox API (not Ansible) for LXC management

### **Anchor #7: Routing**
✅ Traefik (HTTP) + Velocity (Minecraft) are ONLY routing/proxy systems

### **Anchor #8: DNS**
✅ Cloudflare = authoritative public DNS
✅ Technitium = authoritative internal DNS

### **Anchor #9: Frontend Isolation**
✅ Frontend speaks ONLY to API (no direct agent, Proxmox, DNS, Velocity)

### **Anchor #10: Directory Structure**
✅ `/opt/zlh///world` is canonical game server path

---

## 🛡️ Enforcement Policies (ACTIVE)

When future instructions conflict with the Canonical Architecture or the Architecture Decision Log:

### **Step 1: STOP**
Immediately halt the task. Do not proceed with the drift-inducing change.

### **Step 2: RAISE DRIFT WARNING**
```
⚠️ DRIFT WARNING ⚠️

Proposed change violates Canonical Architecture:

Rule Violated: [Rule #X: Description]
OR
Decision Violated: [DEC-XXX: Decision Name]

Violation: [Specific behavior that violates rule/decision]

Correct Path: [Architecture-aligned approach]

Impact: [What breaks if this drift is allowed]

Options:
1. Implement correct architecture-aligned path
2. Amend Canonical Architecture (requires full system review)
3. Request user to "Revisit decision X" (for ADL changes)
4. Cancel proposed change
```

### **Step 3: PROVIDE CORRECT PATH**
Show the architecture-aligned implementation that achieves the same goal.

### **Step 4: ASK FOR DIRECTION**
```
Should we:
A) Implement the correct architecture-aligned path?
B) Perform full system review to amend architecture?
C) Request user to explicitly revisit locked decision?
D) Cancel this change?
```

### **Step 5: ARCHITECTURE DECISION LOG SPECIAL RULE**

If the violation is against a **LOCKED** Architecture Decision (DEC-001 through DEC-007):

**ADDITIONAL CHECK**:
```
⚠️ LOCKED DECISION WARNING ⚠️

This change conflicts with Architecture Decision Log entry: [DEC-XXX]

Status: LOCKED

This decision can ONLY be changed if the user explicitly says:
"Revisit decision [DEC-XXX]"

Without explicit user request to revisit, this decision is FINAL.

Proceeding with this change would violate architectural governance.
```

**Never proceed** with drift-inducing changes without explicit confirmation.

**Never re-litigate** locked decisions unless the user explicitly requests a revision.

---

## 📊 Drift Detection Examples

### **Example 1: Agent Port Allocation (VIOLATION)**

**Proposed Change**:
```go
// Agent code proposal
func (a *Agent) allocatePort() int {
	// Find available port...
	return port
}
```

**Drift Warning**:
```
⚠️ DRIFT WARNING ⚠️

Rule Violated: Rule #1 - Provisioning Ownership

Violation: Agent attempting to allocate ports

Correct Path: API allocates ports, agent receives them via /config

Impact: Port collisions, database inconsistency, broken PortPool

Recommendation: Remove port allocation from agent, ensure API sends ports in config payload
```

---

### **Example 2: Frontend Direct Agent Call (VIOLATION)**

**Proposed Change**:
```typescript
// Frontend code proposal
async function getServerLogs(vmid: number) {
  const agentURL = `http://10.200.0.${vmid}:8080/logs`;
  return await fetch(agentURL);
}
```

**Drift Warning**:
```
⚠️ DRIFT WARNING ⚠️

Rule Violated: Rule #3 - Direct Container Access

Violation: Frontend bypassing API to talk to agent

Correct Path: Frontend → API → Agent (API proxies logs)

Impact: Broken frontend if container IP changes, no auth/rate limiting, security risk

Recommendation: Add GET /api/containers/:vmid/logs endpoint that proxies to agent
```

---

### **Example 3: API Installing Java (VIOLATION)**

**Proposed Change**:
```javascript
// API code proposal
async function provisionServer(vmid, game, variant) {
  await proxmox.exec(vmid, 'apt-get install openjdk-21-jdk');
  await proxmox.exec(vmid, 'wget https://papermc.io/...');
}
```

**Drift Warning**:
```
⚠️ DRIFT WARNING ⚠️

Rule Violated: Rule #2 - Artifact Ownership

Violation: API performing in-container installation

Correct Path: API sends config to agent, agent handles installation

Impact: Breaks agent self-repair, variant-specific logic duplicated, no verification system

Recommendation: Remove installation logic from API, ensure agent receives proper config via POST /config
```

---

## 📝 Integration with Existing Documentation

### **Relationship to Master Bootstrap**
- Master Bootstrap: Strategic overview and business model
- **This Document**: Technical governance and boundary enforcement
- **Usage**: Consult this before implementing ANY code changes

### **Relationship to Complete Current State**
- Complete Current State: What's working, what's next
- **This Document**: How things MUST work (regardless of current state)
- **Usage**: This is the "law", current state is "status"

### **Relationship to Engineering Handover**
- Engineering Handover: Daily tactical tasks and sprint plan
- **This Document**: Constraints within which tasks must be implemented
- **Usage**: Check this before starting each handover task

---

## 🔄 Architecture Amendment Process

If there is a legitimate need to change the Canonical Architecture:

### **Step 1: Identify Change**
Document exactly which architectural boundary needs to change and why.

### **Step 2: Full System Impact Analysis**
- What breaks in API?
- What breaks in Agent?
- What breaks in Frontend?
- What changes to contracts?
- What database migrations are needed?

### **Step 3: Update ALL Affected Documents**
- This document (Canonical Architecture)
- Master Bootstrap (if strategic impact)
- Complete Current State (implementation changes)
- Engineering Handover (sprint tasks)
- Agent Spec, Operational Guide (if affected)

### **Step 4: Update ALL Systems**
- API code + tests
- Agent code + tests
- Frontend code + tests
- Database schema (migration)
- Infrastructure config

### **Step 5: Validation**
- Integration tests pass?
- No new drift introduced?
- Documentation consistent?
- All AIs briefed on change?

The architecture amendment is complete **only after ALL steps** are done.

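
Part of the "No new drift introduced?" check in Step 5 could be automated by scanning source files for imports that the Violation Detection rules forbid. The following is a minimal sketch under stated assumptions: the pattern lists are hypothetical placeholders (this document does not name the real client modules), and `importedPaths` and `findDriftViolations` are illustrative names rather than existing tooling.

```javascript
// Sketch only: forbidden-import patterns per system. The regexes are
// hypothetical placeholders; substitute the module names actually in use.
const FORBIDDEN_IMPORTS = {
  agent: [/proxmox/i, /cloudflare/i, /technitium/i, /velocity/i, /portpool/i],
  frontend: [/proxmox/i, /mariadb|prisma/i, /agent-?client/i],
};

// Collects quoted module paths from lines that look like imports:
// ES imports, CommonJS requires, and entries in a Go import ( ... ) block.
function importedPaths(source) {
  const paths = [];
  for (const line of source.split("\n")) {
    const isImportLine =
      /^\s*import\b/.test(line) ||
      /\bfrom\s*["']/.test(line) ||
      /\brequire\s*\(/.test(line) ||
      /^\s*(?:\w+\s+)?"[^"]+"\s*$/.test(line);
    if (!isImportLine) continue;
    for (const match of line.matchAll(/["']([^"']+)["']/g)) {
      paths.push(match[1]);
    }
  }
  return paths;
}

// Returns the imported paths that match a forbidden pattern for `system`.
function findDriftViolations(system, source) {
  const patterns = FORBIDDEN_IMPORTS[system] || [];
  return importedPaths(source).filter((importPath) =>
    patterns.some((pattern) => pattern.test(importPath))
  );
}
```

A check like this is line-based and deliberately crude, so it can miss unusual import styles and flag harmless names. It is best treated as a prompt for human review, not as a replacement for the Session Start Checklist.
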

---

## 🎯 Quick Reference Card

### **API Owns**
- ✅ Provisioning orchestration
- ✅ Port allocation (PortPool)
- ✅ DNS (Cloudflare + Technitium)
- ✅ Velocity registration
- ✅ IP logic (external + internal)
- ✅ Job queue (BullMQ)
- ✅ Database state

### **Agent Owns**
- ✅ Container-internal installation
- ✅ Java runtime
- ✅ Artifact downloads
- ✅ Server process management
- ✅ READY detection
- ✅ Self-repair + verification
- ✅ Filesystem layout

### **Frontend Owns**
- ✅ User interaction
- ✅ Display logic
- ✅ Client state
- ✅ Form validation

### **Never**
- ❌ Agent allocates ports
- ❌ Agent talks to DNS/Velocity/Proxmox
- ❌ API installs server files
- ❌ API executes in-container commands
- ❌ Frontend talks to agent directly
- ❌ Frontend talks to infrastructure

---

## 📋 Session Start Checklist

Before every coding session with any AI:

- [ ] Read this document's Quick Reference Card
- [ ] Identify which system you're working on (API, Agent, Frontend)
- [ ] Review that system's "Owns" list
- [ ] Check High-Risk Integration Zones if touching those areas
- [ ] Verify no drift from previous session
- [ ] Confirm contracts unchanged since last session

**If ANY doubt**: Re-read full Canonical Architecture Anchors section.

---

## ✅ Document Status

**Status**: ACTIVE - Must be consulted before all code changes
**Enforcement**: MANDATORY - Drift violations must be caught
**Authority**: CANONICAL - Overrides conflicting guidance
**Updates**: Only via Architecture Amendment Process

---

🛡️ **This document prevents architectural drift. Violate at your own risk.**