knowledge-base/ZeroLagHub_Cross_Project_Tracker.md

# 🛡️ ZeroLagHub Cross-Project Tracker & Drift Prevention System

**Last Updated**: January 18, 2026
**Version**: 1.1 (Canonical Architecture Enforcement + Git Status Update)
**Status**: ACTIVE - Must Be Consulted Before All Code Changes

---

## 🎯 Document Purpose

This document establishes **architectural boundaries** across the three major ZeroLagHub systems to prevent drift, confusion, and broken contracts when switching between contexts or AI assistants.

**Critical Insight**: Most project failures come from **gradual architectural erosion**, not sudden breaking changes. This document prevents that.

---

## 🚨 CRITICAL STATUS UPDATE (January 18, 2026)

### **Git Repository Status**

**zlh-api Repository: 🔴 EMPTY**
- **Created**: December 28, 2025
- **Status**: No code pushed to git
- **Impact**:
  - Cannot verify December 20 DNS fix application
  - No version control for API codebase
  - No change tracking or collaboration possible
- **Action Required**: IMMEDIATE - Push API codebase to git

**zlh-grind Repository: ✅ CURRENT**
- **Last Updated**: January 18, 2026
- **Status**: Architectural guardrails established
- **Recent Changes**:
  - PORTAL_MIGRATION.md - Architectural boundaries added
  - CONSTRAINTS.md - Network architecture rules added
  - ANTI_DRIFT_GUARDRAIL.md - AI-specific guardrails added

**knowledge-base Repository: ⚠️ OUTDATED**
- **Last Updated**: December 7, 2025 (6+ weeks ago)
- **Missing**: 4+ weeks of session summaries (Dec 20 - Jan 18)
- **Status**: Being updated now (Jan 18, 2026)

### **Outstanding Critical Issues**

1. **DNS Fix Status Unknown** 🔴
   - Fix identified: December 20, 2025
   - Location: `provisionAgent.js` lines 46, 330-331, 402
   - Status: Cannot verify if applied (API not in git)
   - Impact: Servers may still be unreachable if not applied

2. **API Codebase Not Version Controlled** 🔴
   - Running in production but not in git
   - Cannot track changes, review code, or collaborate
   - Immediate risk to project continuity

3. **Documentation Debt** 🟡
   - 6 weeks without tracker updates
   - 4 weeks of undocumented sessions
   - Risk of losing institutional knowledge

---

## 📊 The Three Systems (Canonical Ownership)

```
┌────────────────────────────────────────────────────────────┐
│                   NODE.JS API v2                           │
│                (Orchestration Engine)                      │
│                                                            │
│  Owns: VMID allocation, port management, DNS publishing,   │
│        Velocity registration, job queue, database state    │
│                                                            │
│  Speaks To: Proxmox, Cloudflare, Technitium, Velocity,    │
│             MariaDB, BullMQ, Go Agent (HTTP)               │
│                                                            │
│  Never Touches: Container filesystem, game server files,   │
│                Java installation, artifact downloads       │
└────────────────────────────┬───────────────────────────────┘
                             │
                             │ HTTP Contract
                             │ POST /config, /start, /stop
                             │ GET /status, /health
                             │
                             ▼
┌────────────────────────────────────────────────────────────┐
│                      GO AGENT                              │
│              (Container-Internal Manager)                  │
│                                                            │
│  Owns: Server installation, Java runtime, artifact        │
│        downloads, process management, READY detection,     │
│        filesystem layout, verification + self-repair      │
│                                                            │
│  Speaks To: Local filesystem, game server process,        │
│             API (status updates via HTTP polling)          │
│                                                            │
│  Never Touches: Proxmox, DNS, Cloudflare, Velocity,       │
│                 port allocation, VMID selection            │
└────────────────────────────┬───────────────────────────────┘
                             │
                             │ Status flows via API v2 only
                             │ (no direct Agent ↔ Frontend link)
                             │
                             ▼
┌────────────────────────────────────────────────────────────┐
│                   NEXT.JS FRONTEND                         │
│                (Customer + Admin UI)                       │
│                                                            │
│  Owns: User interaction, form validation, display logic,   │
│        client-side state, UI components                    │
│                                                            │
│  Speaks To: API v2 only (REST + WebSocket when added)     │
│                                                            │
│  Never Touches: Proxmox, Go Agent, DNS, Velocity,         │
│                 Cloudflare, direct container access        │
└────────────────────────────────────────────────────────────┘
```

---

## 🗺️ System Ownership Matrix (CANONICAL)

| Area | Node.js API v2 | Go Agent | Frontend |
|------|----------------|----------|----------|
| **Provisioning Orchestration** | ✅ OWNER (allocates VMID, ports, builds LXC config) | ❌ Executes inside container | ❌ Triggers only |
| **Template Selection** | ✅ OWNER (selects template, passes config) | ❌ Template contains agent | ❌ Displays options |
| **Server Installation** | ❌ Never | ✅ OWNER (Java, artifacts, validation) | ❌ Displays results |
| **Runtime Control** | ✅ OWNER (sends commands) | ✅ OWNER (executes commands) | ❌ UI only |
| **DNS (Cloudflare + Technitium)** | ✅ OWNER (creates, deletes, tracks IDs) | ❌ Never | ❌ Displays info |
| **Velocity Registration** | ✅ OWNER (registers, deregisters) | ❌ Never | ❌ Displays status |
| **IP Logic** | ✅ OWNER (external + internal IPs) | ❌ Sees container IP only | ❌ Displays final |
| **Port Allocation** | ✅ OWNER (PortPool DB management) | ❌ Receives assignments | ❌ Displays ports |
| **Monitoring** | ✅ OWNER (collects metrics) | ✅ OWNER (exposes /health) | ❌ Displays data |
| **Error Handling** | ✅ OWNER (BullMQ jobs, retries) | ❌ Local output only | ❌ User notifications |

---

## 🔒 API ↔ Agent Contract (IMMUTABLE)

### **API → Agent Endpoints**

```
POST /config
├─ Payload: {
│    game: "minecraft",
│    variant: "paper",
│    version: "1.21.3",
│    ports: [25565, 25575],
│    memory: "4G",
│    motd: "Welcome to ZeroLagHub",
│    worldSettings: {...}
│  }
└─ Response: 200 OK

POST /start
└─ Triggers: ensureProvisioned() → StartServer()

POST /stop
└─ Triggers: Graceful shutdown via server stop command

POST /restart
└─ Triggers: stop → start sequence

GET /status
└─ Response: {
     state: "INSTALLING" | "DOWNLOADING_ARTIFACT" | "INSTALLING_JAVA" |
            "FINALIZING" | "RUNNING" | "FAILED" | "CRASHED",
     pid: 12345,
     uptime: 3600,
     lastError: null
   }

GET /health
└─ Response: {
     healthy: true,
     java: "/usr/lib/jvm/java-21-openjdk-amd64",
     variant: "paper",
     ready: true
   }
```
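
As a rough illustration of how the API side might guard the `POST /config` payload before sending it, the sketch below checks the fields listed in the contract above. The function name `validateAgentConfig` and the specific checks (port range, the `"4G"`-style memory format) are assumptions, not part of the documented contract.

```javascript
// Hypothetical validator for the POST /config payload.
// Field names come from the contract; the individual checks are assumptions.
function validateAgentConfig(payload) {
  if (payload === null || typeof payload !== "object") {
    return ["payload must be an object"];
  }
  const errors = [];
  for (const field of ["game", "variant", "version", "memory"]) {
    if (typeof payload[field] !== "string" || payload[field].length === 0) {
      errors.push(`${field} must be a non-empty string`);
    }
  }
  const ports = payload.ports;
  if (!Array.isArray(ports) || ports.length === 0) {
    errors.push("ports must be a non-empty array");
  } else if (!ports.every((p) => Number.isInteger(p) && p >= 1 && p <= 65535)) {
    errors.push("ports must be integers in 1-65535");
  }
  // Assumed format: a positive integer followed by M or G, as in "4G".
  if (typeof payload.memory === "string" && payload.memory.length > 0 &&
      !/^[1-9]\d*[MG]$/.test(payload.memory)) {
    errors.push('memory must look like "4G" or "512M"');
  }
  return errors; // empty array means the payload passed every check
}
```

An empty result means the payload can be sent. Any entries describe what to fix before the agent is contacted.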

### **Agent → API Signals (via /status polling)**

```
State Machine:
INSTALLING
  ├─> DOWNLOADING_ARTIFACT
  ├─> INSTALLING_JAVA
  ├─> FINALIZING
  ├─> RUNNING
  └─> FAILED | CRASHED
```
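
The API can use the state list above to sanity-check what it sees while polling `/status`. The sketch below is one possible reading of the state machine: the source names the states but not every edge, so the `TRANSITIONS` table (a linear install path, `FAILED` from any install step, `CRASHED` from `RUNNING`) is an assumption.

```javascript
// Assumed transition table; the document lists the states but not every edge.
const TRANSITIONS = {
  INSTALLING: ["DOWNLOADING_ARTIFACT", "FAILED"],
  DOWNLOADING_ARTIFACT: ["INSTALLING_JAVA", "FAILED"],
  INSTALLING_JAVA: ["FINALIZING", "FAILED"],
  FINALIZING: ["RUNNING", "FAILED"],
  RUNNING: ["CRASHED"],
  FAILED: [],
  CRASHED: [],
};

function canTransition(from, to) {
  const next = TRANSITIONS[from];
  return Array.isArray(next) && next.includes(to);
}

// Returns the first illegal step in a sequence of polled states, or null.
// Repeated identical states are expected while polling and are ignored.
function firstIllegalStep(states) {
  for (let i = 1; i < states.length; i++) {
    const from = states[i - 1];
    const to = states[i];
    if (from !== to && !canTransition(from, to)) return { from, to, index: i };
  }
  return null;
}
```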

### **🚫 Forbidden Agent Behaviors**

Agent **NEVER**:
- ❌ Allocates or manages ports
- ❌ Talks to Proxmox API
- ❌ Creates DNS records
- ❌ Registers with Velocity
- ❌ Modifies LXC config
- ❌ Allocates VMIDs
- ❌ Manages templates
- ❌ Talks to Cloudflare or Technitium

**Violation Detection**: If agent code imports a Proxmox client, a DNS client, or port allocation logic → **DRIFT VIOLATION**
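
The import check described above could be automated with a small script. The sketch below is a minimal version, assuming the agent is Go source and that forbidden packages can be recognized by name fragments. The patterns in `AGENT_FORBIDDEN` are illustrative guesses, not the project's real import paths.

```javascript
// Hypothetical forbidden-import rules for agent (Go) source files.
const AGENT_FORBIDDEN = [
  { label: "Proxmox client", pattern: /proxmox/i },
  { label: "DNS client", pattern: /cloudflare|technitium/i },
  { label: "Port allocation", pattern: /portpool|portalloc/i },
];

// Collects quoted import paths from Go source, covering both
// `import "x"` and grouped `import ( ... )` forms.
function goImports(source) {
  const imports = [];
  const single = /^\s*import\s+(?:[\w.]+\s+)?"([^"]+)"/gm;
  for (const m of source.matchAll(single)) imports.push(m[1]);
  const grouped = /^\s*import\s*\(([\s\S]*?)\)/gm;
  for (const block of source.matchAll(grouped)) {
    for (const m of block[1].matchAll(/"([^"]+)"/g)) imports.push(m[1]);
  }
  return imports;
}

// Returns one entry per forbidden import found in the source text.
function findDriftViolations(source, forbidden = AGENT_FORBIDDEN) {
  const violations = [];
  for (const path of goImports(source)) {
    for (const rule of forbidden) {
      if (rule.pattern.test(path)) violations.push({ path, rule: rule.label });
    }
  }
  return violations;
}
```

A non-empty result would be reported as a drift violation, for example from a pre-commit hook or CI step.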

---

## 🖥️ Frontend ↔ API Contract (IMMUTABLE)

### **Frontend Allowed Endpoints**

```
Provisioning:
POST /api/containers/create
GET  /api/containers/:vmid
DELETE /api/containers/:vmid

Control:
POST /api/containers/:vmid/start
POST /api/containers/:vmid/stop
POST /api/containers/:vmid/restart

Status:
GET /api/containers/:vmid/status
GET /api/containers/:vmid/logs
GET /api/containers/:vmid/stats

Discovery:
GET /api/templates
GET /api/containers (list user's containers)

Read-Only Info:
GET /api/dns/:hostname (display only)
GET /api/velocity/status (display only)
```
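
One way to keep frontend calls inside this list is to check every outgoing request against it. The sketch below copies the routes above into an allowlist. The helper names `toMatcher` and `isAllowedFrontendRequest` are hypothetical, and each `:param` is assumed to match exactly one path segment.

```javascript
// Route table copied from the allowed-endpoint list.
const FRONTEND_ALLOWED = [
  "POST /api/containers/create",
  "GET /api/containers/:vmid",
  "DELETE /api/containers/:vmid",
  "POST /api/containers/:vmid/start",
  "POST /api/containers/:vmid/stop",
  "POST /api/containers/:vmid/restart",
  "GET /api/containers/:vmid/status",
  "GET /api/containers/:vmid/logs",
  "GET /api/containers/:vmid/stats",
  "GET /api/templates",
  "GET /api/containers",
  "GET /api/dns/:hostname",
  "GET /api/velocity/status",
];

// Turns "GET /api/containers/:vmid/logs" into a method plus an anchored regex
// in which each :param matches a single path segment.
function toMatcher(route) {
  const [method, template] = route.split(" ");
  const pattern = template
    .split("/")
    .map((seg) => (seg.startsWith(":") ? "[^/]+" : seg.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")))
    .join("/");
  return { method, regex: new RegExp(`^${pattern}$`) };
}

const MATCHERS = FRONTEND_ALLOWED.map(toMatcher);

function isAllowedFrontendRequest(method, path) {
  const upper = method.toUpperCase();
  return MATCHERS.some((m) => m.method === upper && m.regex.test(path));
}
```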

### **🚫 Forbidden Frontend Behaviors**

Frontend **NEVER**:
- ❌ Talks directly to Go Agent
- ❌ Calls Proxmox API
- ❌ Creates DNS records
- ❌ Registers with Velocity
- ❌ Allocates ports
- ❌ Executes container commands
- ❌ Accesses MariaDB directly

**Violation Detection**: If frontend code imports a Proxmox client, a client that calls the agent directly (all agent traffic must go through the API), or a database client → **DRIFT VIOLATION**
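
Import checks can be complemented by scanning frontend source for hard-coded internal addresses. The sketch below is one such check, assuming that any `10.x.y.z` literal (or a template that builds one) in frontend code is suspicious. The function name and the regex are illustrative.

```javascript
// Matches 10.x.y.z literals and templated forms such as 10.200.0.${vmid}.
// The lookbehind avoids matching inside longer dotted numbers.
const INTERNAL_IP = /(?<![\d.])10\.\d{1,3}\.\d{1,3}\.(?:\d{1,3}|\$\{)/g;

// Returns the line number and matched text for every suspected leak.
function findInternalAddressLeaks(source) {
  const leaks = [];
  source.split("\n").forEach((text, i) => {
    for (const m of text.matchAll(INTERNAL_IP)) {
      leaks.push({ line: i + 1, address: m[0] });
    }
  });
  return leaks;
}
```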

---

## 🚨 NEW: Architectural Boundary Enforcement (January 18, 2026)

### **Critical Rule: Frontend Cannot Call Agents**

**The Hard Reality**:
- Container IPs are internal-only (10.x network)
- No network path exists from browser to container
- Agents send no CORS headers (they are not browser-facing services)
- Direct calls would fail at network layer

**Why This Matters**:
- AI tools may suggest "quick fixes" calling agents directly
- Developers may try to add CORS to agents
- Frontend shortcuts bypass security/auth/rate limiting
- Breaks architectural isolation

**Correct Flow**:
```
User Action → Frontend → API → Agent → Response
```

**Forbidden Flow**:
```
User Action → Frontend → Agent (FAILS - no network path)
```
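
The correct flow can be pictured as an API-side handler that authorizes the caller, resolves the container's internal agent address, and forwards the request. The sketch below does this for a status request using the agent's `GET /status` endpoint. The factory `makeStatusHandler` and the injected `authorize`, `lookupAgentUrl`, and `fetchImpl` dependencies are hypothetical names, not the API's real structure.

```javascript
// Sketch of an API-side handler behind GET /api/containers/:vmid/status.
// All three dependencies are hypothetical injection points.
function makeStatusHandler(deps) {
  return async function handleStatus(user, vmid) {
    if (!(await deps.authorize(user, vmid))) {
      return { status: 403, body: { error: "forbidden" } };
    }
    // Internal address: used server-side only, never returned to the browser.
    const agentUrl = await deps.lookupAgentUrl(vmid);
    if (!agentUrl) {
      return { status: 404, body: { error: "unknown container" } };
    }
    const res = await deps.fetchImpl(`${agentUrl}/status`);
    if (!res.ok) {
      return { status: 502, body: { error: "agent unavailable" } };
    }
    return { status: 200, body: await res.json() };
  };
}
```

Because the agent URL never leaves the API, the browser only ever sees the API's own response.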

### **Common Drift Patterns (Now Prevented)**

**Pattern 1: AI Tool Suggests Direct Agent Call**
```javascript
// WRONG - AI suggestion
async function getServerLogs(vmid) {
    const agentURL = `http://10.200.0.${vmid}:8080/logs`;
    return await fetch(agentURL); // FAILS - no route
}

// CORRECT - Architectural pattern
async function getServerLogs(vmid) {
    return await api.get(`/containers/${vmid}/logs`);
}
```

**Pattern 2: Adding CORS to Agents**
```go
// WRONG - Never add this to agents
func (a *Agent) enableCORS() {
    a.router.Use(cors.New(cors.Config{
        AllowOrigins: []string{"*"},
    }))
}
```

**Pattern 3: Exposing Agent Ports**
```nginx
# WRONG - Never proxy agent ports
location /agent/ {
    proxy_pass http://10.200.0.100:8080/;
}
```

### **Enforcement Documentation**

Three-layer defense established in `zlh-grind`:
1. **PORTAL_MIGRATION.md** - High-level boundaries
2. **CONSTRAINTS.md** - Hard technical rules
3. **ANTI_DRIFT_GUARDRAIL.md** - AI-specific warnings

**Rule**: When code conflicts with documentation, **documentation wins**.

---

## 🛡️ Drift Detection Rules (ACTIVE ENFORCEMENT)

### **Rule #1: Provisioning Ownership**

**Violation**: Agent asked to allocate ports, choose templates, create DNS, call Proxmox, manage VMIDs

**Correct Path**: API owns ALL provisioning orchestration

**Example Violation**:
```go
// WRONG - Agent should NEVER do this
func (a *Agent) allocatePorts() ([]int, error) {
    // ... port allocation logic
}
```

**Correct Pattern**:
```javascript
// RIGHT - API allocates, agent receives
async function provisionInstance(vmid, game, variant) {
    const ports = await portAllocator.allocate(vmid);
    await agent.postConfig({ ports, game, variant });
}
```

---

### **Rule #2: Artifact Ownership**

**Violation**: API asked to install Java, download server.jar, run installers

**Correct Path**: Agent owns ALL in-container installation

**Example Violation**:
```javascript
// WRONG - API should NEVER do this
async function installMinecraft(vmid, version) {
    await proxmox.exec(vmid, `wget https://...`);
    await proxmox.exec(vmid, `java -jar installer.jar`);
}
```

**Correct Pattern**:
```go
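// RIGHT - Agent handles installation
// Helper functions are assumed to return an error.
func (p *Provisioner) ProvisionAll(cfg Config) error {
    if err := downloadArtifact(cfg.Variant, cfg.Version); err != nil {
        return err
    }
    if err := installJavaRuntime(cfg.Version); err != nil {
        return err
    }
    return verifyInstallation()
}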
```

---

### **Rule #3: Direct Container Access**

**Violation**: Frontend or API wants to exec commands directly into container

**Correct Path**: Agent owns container execution layer

**Example Violation**:
```typescript
// WRONG - Frontend should NEVER do this
async function restartServer(vmid: number) {
    const ssh = new SSHClient();
    await ssh.connect(containerIP);
    await ssh.exec('systemctl restart minecraft');
}
```

**Correct Pattern**:
```typescript
// RIGHT - Frontend talks to API, API talks to agent
async function restartServer(vmid: number) {
    await api.post(`/containers/${vmid}/restart`);
}
```

---

### **Rule #4: Networking Responsibilities**

**Violation**: Agent asked to select public vs internal IPs, decide DNS zones

**Correct Path**: API owns dual-IP logic (Cloudflare external, Technitium internal)

**Example Violation**:
```go
// WRONG - Agent should NEVER decide this
func (a *Agent) determinePublicIP() string {
    if a.needsCloudflare() {
        return "139.64.165.248"
    }
    return a.containerIP
}
```

**Correct Pattern**:
```javascript
// RIGHT - API decides network topology
function determineIPs(vmid, game) {
    const internalIP = `10.200.0.${vmid - 1000}`;
    const externalIP = "139.64.165.248"; // Cloudflare target
    const velocityIP = "10.70.0.241"; // Internal routing

    return { internalIP, externalIP, velocityIP };
}
```

---

### **Rule #5: Proxy Responsibilities**

**Violation**: Agent asked to register with Velocity, configure proxy routing

**Correct Path**: API owns ALL proxy integrations

**Example Violation**:
```go
// WRONG - Agent should NEVER do this
func (a *Agent) registerWithVelocity() error {
    client := velocity.NewClient()
    return client.Register(a.hostname, a.port)
}
```

**Correct Pattern**:
```javascript
// RIGHT - API handles Velocity registration
async function registerVelocity(vmid, hostname, internalIP) {
    await velocityBridge.registerBackend({
        name: hostname,
        address: internalIP,
        port: 25565
    });
}
```

---

## 🔍 Context Switching Safety Workflow

### **When Moving to Node.js API Work:**

**Pre-Switch Checklist**:
- [ ] Agent contract unchanged? (POST /config, /start, /stop, GET /status)
- [ ] Database schema unchanged? (Prisma models consistent)
- [ ] LXC template IDs unchanged? (VMID 800 for game, 6000-series for dev)
- [ ] DNS/IP logic consistent? (Cloudflare external, Technitium internal)
- [ ] Port allocation logic preserved? (PortPool DB-backed)
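
The "agent contract unchanged?" item can be checked mechanically by comparing the routes found in code against the canonical list. The sketch below assumes the routes have already been collected as `"METHOD /path"` strings. How they are collected, and the name `diffContract`, are left as assumptions.

```javascript
// The agent contract as listed in this document.
const CANONICAL_AGENT_CONTRACT = [
  "POST /config",
  "POST /start",
  "POST /stop",
  "POST /restart",
  "GET /status",
  "GET /health",
];

// Reports routes that disappeared from the code and routes that were added.
function diffContract(actualRoutes, canonical = CANONICAL_AGENT_CONTRACT) {
  const actual = new Set(actualRoutes);
  const expected = new Set(canonical);
  return {
    missing: canonical.filter((r) => !actual.has(r)),
    unexpected: [...actual].filter((r) => !expected.has(r)),
  };
}
```

Two empty arrays mean the contract is unchanged. Anything else should be reviewed on both the API and agent sides before work continues.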

**Common Drift Patterns**:
- ⚠️ Adding agent installation logic to API
- ⚠️ Changing agent contract without updating both sides
- ⚠️ Moving DNS logic to different service
- ⚠️ Bypassing job queue for provisioning

---

### **When Moving to Go Agent Work:**

**Pre-Switch Checklist**:
- [ ] No provisioning logic outside allowed scope (no port allocation, DNS, etc.)
- [ ] File paths remain canonical (`/opt/zlh/<game>/<variant>/world`)
- [ ] Naming conventions maintained (`server.jar`, `fabric-server.jar`, etc.)
- [ ] No external API calls (Proxmox, DNS, Velocity)
- [ ] Status states unchanged (INSTALLING, RUNNING, FAILED, etc.)

**Common Drift Patterns**:
- ⚠️ Adding port allocation to agent
- ⚠️ Making agent talk to external services
- ⚠️ Changing directory structure without API coordination
- ⚠️ Adding orchestration logic to agent

---

### **When Moving to Frontend Work:**

**Pre-Switch Checklist**:
- [ ] Only API-approved fields used in UI
- [ ] No direct agent HTTP calls
- [ ] VMID not exposed in user-facing UI
- [ ] Internal IPs not displayed to users
- [ ] All state from API, not computed locally
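
To keep VMIDs and internal IPs out of rendered views, the frontend could strip them from API objects before passing data to display components. The sketch below is a minimal filter. Apart from `vmid`, `internalIP`, and `velocityIP`, which appear elsewhere in this document, the field names are hypothetical.

```javascript
// Fields assumed to be internal-only; "agentUrl" is a hypothetical example.
const INTERNAL_FIELDS = new Set(["vmid", "internalIP", "velocityIP", "agentUrl"]);

// Returns a shallow copy of the container object without internal fields.
function toPublicView(container) {
  return Object.fromEntries(
    Object.entries(container).filter(([key]) => !INTERNAL_FIELDS.has(key))
  );
}
```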

**Common Drift Patterns**:
- ⚠️ Adding direct agent calls from frontend
- ⚠️ Computing server state client-side
- ⚠️ Exposing internal infrastructure details
- ⚠️ Bypassing API for container control

---

## 🛡️ High-Risk Integration Zones (GUARDED)

These areas have historically caused drift across sessions:

### **1. Forge / NeoForge Installation Logic**
- **Risk**: Agent vs API confusion on who handles `run.sh` patching
- **Guard**: Agent owns ALL Forge installation, API just passes config
- **Test**: Can provision Forge 1.21.3 without API filesystem access?

### **2. Cloudflare SRV Deletion**
- **Risk**: Case sensitivity, subdomain normalization, record ID tracking
- **Guard**: API stores Cloudflare record IDs in EdgeState, deletes by ID
- **Test**: Create → Delete → Recreate same hostname without orphans?
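
A minimal sketch of the delete-by-ID guard follows. It uses an in-memory stand-in for EdgeState, whose real schema is not shown here, and a hypothetical `deleteRecordById` callback in place of the DNS provider's delete call. Hostnames are normalized for case and a trailing dot so lookups are not case-sensitive.

```javascript
// In-memory stand-in for the EdgeState table.
class EdgeStateStore {
  constructor() {
    this.records = new Map();
  }
  // "MC1.Example.com." and "mc1.example.com" share one key.
  static key(hostname) {
    return hostname.trim().toLowerCase().replace(/\.$/, "");
  }
  save(hostname, recordId) {
    const k = EdgeStateStore.key(hostname);
    this.records.set(k, [...(this.records.get(k) ?? []), recordId]);
  }
  idsFor(hostname) {
    return [...(this.records.get(EdgeStateStore.key(hostname)) ?? [])];
  }
  forget(hostname, recordId) {
    const k = EdgeStateStore.key(hostname);
    const rest = this.idsFor(hostname).filter((id) => id !== recordId);
    if (rest.length > 0) this.records.set(k, rest);
    else this.records.delete(k);
  }
}

// Deletes each tracked record by ID. An ID is forgotten only after its delete
// succeeds, so a failure leaves the remaining IDs tracked for a retry.
async function deleteEdgeRecords(store, hostname, deleteRecordById) {
  const deleted = [];
  for (const id of store.idsFor(hostname)) {
    await deleteRecordById(id);
    store.forget(hostname, id);
    deleted.push(id);
  }
  return deleted;
}
```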

### **3. Technitium DNS Zone Mismatch**
- **Risk**: Wrong zone selection, duplicate records
- **Guard**: API hardcodes zone as `zpack.zerolaghub.com`, validates before creation
- **Test**: No records created in wrong zones?
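
The zone guard might look like the sketch below: a hostname is normalized and rejected unless it falls inside the configured zone, and an existing record short-circuits creation. The helper names are hypothetical, and `existing` stands in for whatever lookup the API performs against Technitium.

```javascript
const INTERNAL_ZONE = "zpack.zerolaghub.com"; // zone named in this document

// Returns the normalized name if it sits inside the zone, otherwise throws.
function assertInZone(hostname, zone = INTERNAL_ZONE) {
  const name = hostname.trim().toLowerCase().replace(/\.$/, "");
  if (name !== zone && !name.endsWith(`.${zone}`)) {
    throw new Error(`${name} is outside zone ${zone}`);
  }
  return name;
}

// Decides whether to create a record; `existing` is a Set of names already in the zone.
function planRecord(hostname, existing, zone = INTERNAL_ZONE) {
  const name = assertInZone(hostname, zone);
  return existing.has(name) ? { action: "skip", name } : { action: "create", name };
}
```

The suffix check includes the leading dot, so a name such as `mc1.evilzpack.zerolaghub.com` is rejected rather than treated as part of the zone.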

### **4. Velocity Registration Order**
- **Risk**: Registering before server ready, deregistering incorrectly
- **Guard**: API waits for agent RUNNING state, then registers Velocity
- **Test**: Player connection works immediately after provisioning complete?
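
The ordering guard can be sketched as a poll loop that registers the backend only once the agent reports `RUNNING`. In the sketch below, `getStatus` and `registerBackend` are hypothetical callbacks wrapping the agent's `GET /status` and the Velocity registration call. The attempt count and delay are placeholder values.

```javascript
// Polls until RUNNING, then registers. Stops early on FAILED or CRASHED.
async function registerWhenRunning({ getStatus, registerBackend, attempts = 30, delayMs = 2000, sleep }) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let i = 0; i < attempts; i++) {
    const { state, lastError } = await getStatus();
    if (state === "RUNNING") {
      await registerBackend();
      return { registered: true, polls: i + 1 };
    }
    if (state === "FAILED" || state === "CRASHED") {
      return { registered: false, reason: lastError ?? state };
    }
    await wait(delayMs);
  }
  return { registered: false, reason: "timeout" };
}
```

The optional `sleep` parameter exists so tests can skip real delays.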

### **5. PortPool Commit Logic**
- **Risk**: Race conditions, double-allocation, uncommitted ports
- **Guard**: API allocates → provisions → commits (rollback on failure)
- **Test**: Concurrent provisions don't collide on ports?
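
The allocate → provision → commit sequence with rollback can be sketched as below. The `PortPool` class is an in-memory stand-in. The real pool is DB-backed and would need transactions or row locks to get the same guarantee across processes. `provisionWithPorts` is a hypothetical name.

```javascript
// In-memory stand-in for the DB-backed PortPool.
class PortPool {
  constructor(ports) {
    // Each port is "free", "reserved", or "committed".
    this.state = new Map(ports.map((p) => [p, "free"]));
  }
  reserve(count) {
    const free = [...this.state].filter(([, s]) => s === "free").map(([p]) => p);
    if (free.length < count) throw new Error("port pool exhausted");
    const picked = free.slice(0, count);
    picked.forEach((p) => this.state.set(p, "reserved"));
    return picked;
  }
  commit(ports) {
    ports.forEach((p) => this.state.set(p, "committed"));
  }
  release(ports) {
    ports.forEach((p) => this.state.set(p, "free"));
  }
}

async function provisionWithPorts(pool, count, provision) {
  // reserve() runs synchronously, so two in-process callers cannot pick the same port.
  const ports = pool.reserve(count);
  try {
    await provision(ports);
    pool.commit(ports);
    return ports;
  } catch (err) {
    pool.release(ports); // rollback: failed provisions return their ports
    throw err;
  }
}
```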

### **6. Agent READY Detection**
- **Risk**: False negatives, false positives, variant-specific patterns
- **Guard**: Agent uses variant-aware log parsing, multiple confirmation lines
- **Test**: All 6 variants correctly detect READY state?
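
Variant-aware detection with multiple confirmation lines could be structured as a rule per variant: a list of patterns plus how many distinct patterns must match. The sketch below is written in JavaScript for brevity (the agent itself is Go). The `Done (...)! For help` pattern reflects the familiar Minecraft server startup line. The second rule uses placeholder markers, since each variant's real patterns must come from its own logs.

```javascript
// Hypothetical rule table; only the "Done" pattern is a real log line.
const READY_RULES = {
  paper: {
    patterns: [/Done \([\d.]+s\)! For help, type "help"/],
    required: 1,
  },
  "example-variant": {
    patterns: [/PLACEHOLDER-MARKER-A/, /PLACEHOLDER-MARKER-B/],
    required: 2,
  },
};

// READY only when at least `required` distinct patterns have matched.
function detectReady(lines, rule) {
  const matched = new Set();
  for (const line of lines) {
    rule.patterns.forEach((pattern, i) => {
      if (pattern.test(line)) matched.add(i);
    });
  }
  return matched.size >= rule.required;
}
```

Counting distinct patterns rather than matching lines means a single repeated line cannot satisfy a rule that requires two confirmations.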

### **7. Server Start-Up False Negatives**
- **Risk**: Timeout too short, log parsing too strict
- **Guard**: Agent increases timeout for Forge (90s), multiple log patterns
- **Test**: Forge installer completes without false failure?

### **8. IP Selection Logic**
- **Risk**: Confusing external (Cloudflare) vs internal (Velocity/Technitium) IPs
- **Guard**: API clearly separates: externalIP (139.64.165.248), internalIP (10.200.0.X), velocityIP (10.70.0.241)
- **Test**: DNS points to correct IPs, Velocity routes to correct internal IP?

---

## 📊 Architecture Decision Log (LOCKED) ⭐ NEW

**Purpose**: Records finalized architectural decisions that **must not be re-litigated** unless explicitly requested.

**Status**: LOCKED - These decisions are final and cannot be changed unless the user explicitly says "Revisit decision X"

---

### **DEC-001: Templates vs Go Agent (FINAL)**

**Decision**: Hybrid model
- LXC templates define **base environment only**
- Go Agent is **authoritative execution layer** inside containers

**Rationale**:
- Templates alone cannot handle multi-variant logic (Forge, NeoForge, Fabric)
- Agent enables self-repair, async provisioning, runtime control
- Hybrid provides speed + flexibility without API container access

**Applies To**: API v2, Go Agent, Frontend
**Status**: ✅ LOCKED

---

### **DEC-002: Provisioning Authority**

**Decision**: API orchestrates, Agent executes

**Rationale**:
- API has global visibility (DB, DNS, Proxmox, Velocity)
- Agent is intentionally sandboxed to container filesystem + process

**Applies To**: All systems
**Status**: ✅ LOCKED

---

### **DEC-003: DNS & Edge Publishing Ownership**

**Decision**: API-only responsibility

**Rationale**:
- Requires external credentials (Cloudflare, Technitium)
- Must correlate DB state, record IDs, reconciliation jobs

**Applies To**: API v2
**Status**: ✅ LOCKED

---

### **DEC-004: Proxy Stack**

**Decision**: Traefik + Velocity only

**Rationale**:
- Traefik for HTTP/control-plane
- Velocity for Minecraft TCP routing
- **HAProxy explicitly deprecated** for ZeroLagHub

**Applies To**: Infrastructure, API
**Status**: ✅ LOCKED

---

### **DEC-005: State Persistence**

**Decision**: MariaDB is single source of truth

**Rationale**:
- Flat files caused race conditions and drift
- DB enables reconciliation, recovery, observability

**Applies To**: API v2
**Status**: ✅ LOCKED

---

### **DEC-006: Frontend Access Model**

**Decision**: Frontend communicates with API only

**Rationale**:
- Security boundary
- Prevents leaking infrastructure details

**Applies To**: Frontend
**Status**: ✅ LOCKED

---

### **DEC-007: Architecture Enforcement Policy**

**Decision**: Drift prevention is mandatory

**Rationale**:
- Prevents oscillation between alternatives
- Preserves velocity during late-stage development

**Applies To**: All work sessions
**Status**: ✅ LOCKED

---

### **DEC-008: Frontend-Agent Network Isolation (NEW - January 18, 2026)**

**Decision**: Frontend can NEVER call agents directly

**Rationale**:
- Container IPs (10.x) are internal-only with no public routing
- Agents send no CORS headers (they are not browser-facing services)
- Direct calls would fail at network layer
- API enforces auth, rate limits, access control

**Technical Reality**:
- No network path exists from browser to container
- Even if CORS were added, network routing would still block access
- This is an architectural fact, not a policy choice

**Applies To**: Frontend, All Documentation
**Status**: ✅ LOCKED

---

## 📊 Canonical Architecture Anchors (ABSOLUTE)

These rules are **immutable** unless explicitly changed with full system review:

### **Anchor #1: Orchestration**
✅ API v2 orchestrates EVERYTHING (jobs, provisioning, DNS, proxy, lifecycle)

### **Anchor #2: Container-Internal**
✅ Agent performs EVERYTHING inside container (install, start, stop, detect)

### **Anchor #3: Templates**
✅ Templates contain agent + base environment (VMID 800 for game, 6000-series for dev)

### **Anchor #4: Job Queue**
✅ BullMQ/Redis drive job system (async provisioning, retries, reconciliation)

### **Anchor #5: Database**
✅ MariaDB holds all state (PortPool, ContainerInstance, EdgeState, etc.)

### **Anchor #6: Infrastructure**
✅ Proxmox API (not Ansible) for LXC management

### **Anchor #7: Routing**
✅ Traefik (HTTP) + Velocity (Minecraft) are ONLY routing/proxy systems

### **Anchor #8: DNS**
✅ Cloudflare = authoritative public DNS
✅ Technitium = authoritative internal DNS

### **Anchor #9: Frontend Isolation**
✅ Frontend speaks ONLY to API (no direct agent, Proxmox, DNS, Velocity)

### **Anchor #10: Directory Structure**
✅ `/opt/zlh/<game>/<variant>/world` is canonical game server path

### **Anchor #11: Network Isolation (NEW)**
✅ Container IPs (10.x) are internal-only
✅ No network path from frontend to agents
✅ API is the only bridge between public and internal networks

---

## 🛡️ Enforcement Policies (ACTIVE)

When future instructions conflict with the Canonical Architecture or the Architecture Decision Log:

### **Step 1: STOP**
Immediately halt the task. Do not proceed with drift-inducing change.

### **Step 2: RAISE DRIFT WARNING**
```
⚠️ DRIFT WARNING ⚠️

Proposed change violates Canonical Architecture:

Rule Violated: [Rule #X: Description]
OR
Decision Violated: [DEC-XXX: Decision Name]

Violation: [Specific behavior that violates rule/decision]
Correct Path: [Architecture-aligned approach]

Impact: [What breaks if this drift is allowed]

Options:
1. Implement correct architecture-aligned path
2. Amend Canonical Architecture (requires full system review)
3. Request user to "Revisit decision X" (for ADL changes)
4. Cancel proposed change
```

### **Step 3: PROVIDE CORRECT PATH**
Show the architecture-aligned implementation that achieves the same goal.

### **Step 4: ASK FOR DIRECTION**
```
Should we:
A) Implement the correct architecture-aligned path?
B) Perform full system review to amend architecture?
C) Request user to explicitly revisit locked decision?
D) Cancel this change?
```

### **Step 5: ARCHITECTURE DECISION LOG SPECIAL RULE**
If the violation is against a **LOCKED** Architecture Decision (DEC-001 through DEC-008):

**ADDITIONAL CHECK**:
```
⚠️ LOCKED DECISION WARNING ⚠️

This change conflicts with Architecture Decision Log entry: [DEC-XXX]
Status: LOCKED

This decision can ONLY be changed if the user explicitly says:
"Revisit decision [DEC-XXX]"

Without explicit user request to revisit, this decision is FINAL.

Proceeding with this change would violate architectural governance.
```

**Never proceed** with drift-inducing changes without explicit confirmation.
**Never re-litigate** locked decisions without user explicitly requesting revision.

---

## 📚 Integration with Existing Documentation

### **Relationship to Master Bootstrap**
- Master Bootstrap: Strategic overview and business model
- **This Document**: Technical governance and boundary enforcement
- **Usage**: Consult this before implementing ANY code changes

### **Relationship to Complete Current State**
- Complete Current State: What's working, what's next
- **This Document**: How things MUST work (regardless of current state)
- **Usage**: This is the "law", current state is "status"

### **Relationship to Engineering Handover**
- Engineering Handover: Daily tactical tasks and sprint plan
- **This Document**: Constraints within which tasks must be implemented
- **Usage**: Check this before starting each handover task

---

## 🔧 Architecture Amendment Process

If there is a legitimate need to change the Canonical Architecture:

### **Step 1: Identify Change**
Document exactly what architectural boundary needs to change and why.

### **Step 2: Full System Impact Analysis**
- What breaks in API?
- What breaks in Agent?
- What breaks in Frontend?
- What changes to contracts?
- What database migrations needed?

### **Step 3: Update ALL Affected Documents**
- This document (Canonical Architecture)
- Master Bootstrap (if strategic impact)
- Complete Current State (implementation changes)
- Engineering Handover (sprint tasks)
- Agent Spec, Operational Guide (if affected)

### **Step 4: Update ALL Systems**
- API code + tests
- Agent code + tests
- Frontend code + tests
- Database schema (migration)
- Infrastructure config

### **Step 5: Validation**
- Integration tests pass?
- No new drift introduced?
- Documentation consistent?
- All AIs briefed on change?

**Only after ALL steps** is the architecture amendment complete.

---

## 🎯 Quick Reference Card

### **API Owns**
- ✅ Provisioning orchestration
- ✅ Port allocation (PortPool)
- ✅ DNS (Cloudflare + Technitium)
- ✅ Velocity registration
- ✅ IP logic (external + internal)
- ✅ Job queue (BullMQ)
- ✅ Database state

### **Agent Owns**
- ✅ Container-internal installation
- ✅ Java runtime
- ✅ Artifact downloads
- ✅ Server process management
- ✅ READY detection
- ✅ Self-repair + verification
- ✅ Filesystem layout

### **Frontend Owns**
- ✅ User interaction
- ✅ Display logic
- ✅ Client state
- ✅ Form validation

### **Never**
- ❌ Agent allocates ports
- ❌ Agent talks to DNS/Velocity/Proxmox
- ❌ API installs server files
- ❌ API executes in-container commands
- ❌ Frontend talks to agent directly
- ❌ Frontend talks to infrastructure

---

## 📋 Session Start Checklist

Before every coding session with any AI:

- [ ] Read this document's Quick Reference Card
- [ ] Identify which system you're working on (API, Agent, Frontend)
- [ ] Review that system's "Owns" list
- [ ] Check High-Risk Integration Zones if touching those areas
- [ ] Verify no drift from previous session
- [ ] Confirm contracts unchanged since last session

**If in ANY doubt**: Re-read the full Canonical Architecture Anchors section.

---

## ✅ Document Status

**Status**: ACTIVE - Must be consulted before all code changes
**Enforcement**: MANDATORY - Drift violations must be caught
**Authority**: CANONICAL - Overrides conflicting guidance
**Updates**: Only via Architecture Amendment Process

---

🛡️ **This document prevents architectural drift. Violate at your own risk.**