# 🛡️ ZeroLagHub Cross-Project Tracker & Drift Prevention System

**Last Updated**: December 7, 2025
**Version**: 1.0 (Canonical Architecture Enforcement)
**Status**: ACTIVE - Must Be Consulted Before All Code Changes

---

## 🎯 Document Purpose

This document establishes **architectural boundaries** across the three major ZeroLagHub systems to prevent drift, confusion, and broken contracts when switching between contexts or AI assistants.

**Critical Insight**: Most project failures come from **gradual architectural erosion**, not sudden breaking changes. This document prevents that.


---

## 📊 The Three Systems (Canonical Ownership)

```
┌─────────────────────────────────────────────────────────────┐
│                       NODE.JS API v2                        │
│                   (Orchestration Engine)                    │
│                                                             │
│  Owns: VMID allocation, port management, DNS publishing,    │
│        Velocity registration, job queue, database state     │
│                                                             │
│  Speaks To: Proxmox, Cloudflare, Technitium, Velocity,      │
│             MariaDB, BullMQ, Go Agent (HTTP)                │
│                                                             │
│  Never Touches: Container filesystem, game server files,    │
│                 Java installation, artifact downloads       │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      │ HTTP Contract
                      │ POST /config, /start, /stop
                      │ GET /status, /health
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│                          GO AGENT                           │
│                (Container-Internal Manager)                 │
│                                                             │
│  Owns: Server installation, Java runtime, artifact          │
│        downloads, process management, READY detection,      │
│        filesystem layout, verification + self-repair        │
│                                                             │
│  Speaks To: Local filesystem, game server process,          │
│             API (status updates via HTTP polling)           │
│                                                             │
│  Never Touches: Proxmox, DNS, Cloudflare, Velocity,         │
│                 port allocation, VMID selection             │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      │ Status Polling
                      │ Agent reports state
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│                      NEXT.JS FRONTEND                       │
│                    (Customer + Admin UI)                    │
│                                                             │
│  Owns: User interaction, form validation, display logic,    │
│        client-side state, UI components                     │
│                                                             │
│  Speaks To: API v2 only (REST + WebSocket when added)       │
│                                                             │
│  Never Touches: Proxmox, Go Agent, DNS, Velocity,           │
│                 Cloudflare, direct container access         │
└─────────────────────────────────────────────────────────────┘
```


---

## 🗺️ System Ownership Matrix (CANONICAL)

| Area | Node.js API v2 | Go Agent | Frontend |
|------|----------------|----------|----------|
| **Provisioning Orchestration** | ✅ OWNER (allocates VMID, ports, builds LXC config) | ❌ Executes inside container | ❌ Triggers only |
| **Template Selection** | ✅ OWNER (selects template, passes config) | ❌ Template contains agent | ❌ Displays options |
| **Server Installation** | ❌ Never | ✅ OWNER (Java, artifacts, validation) | ❌ Displays results |
| **Runtime Control** | ✅ OWNER (sends commands) | ✅ OWNER (executes commands) | ❌ UI only |
| **DNS (Cloudflare + Technitium)** | ✅ OWNER (creates, deletes, tracks IDs) | ❌ Never | ❌ Displays info |
| **Velocity Registration** | ✅ OWNER (registers, deregisters) | ❌ Never | ❌ Displays status |
| **IP Logic** | ✅ OWNER (external + internal IPs) | ❌ Sees container IP only | ❌ Displays final |
| **Port Allocation** | ✅ OWNER (PortPool DB management) | ❌ Receives assignments | ❌ Displays ports |
| **Monitoring** | ✅ OWNER (collects metrics) | ✅ OWNER (exposes /health) | ❌ Displays data |
| **Error Handling** | ✅ OWNER (BullMQ jobs, retries) | ❌ Local output only | ❌ User notifications |

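
The matrix can also be kept in a machine-readable form so that a review script or test can answer "may this system do that?" directly. The sketch below is only an illustration: the `OWNERS` map, its key names, and `checkOwnership` are hypothetical and not part of any ZeroLagHub codebase. The owner lists mirror the ✅ OWNER cells of the table above.

```javascript
// Hypothetical encoding of the ownership matrix. Keys and system names
// ("api", "agent", "frontend") are illustrative assumptions.
const OWNERS = {
  provisioningOrchestration: ["api"],
  templateSelection: ["api"],
  serverInstallation: ["agent"],
  runtimeControl: ["api", "agent"], // API sends commands, agent executes them
  dns: ["api"],
  velocityRegistration: ["api"],
  ipLogic: ["api"],
  portAllocation: ["api"],
  monitoring: ["api", "agent"], // API collects metrics, agent exposes /health
  errorHandling: ["api"],
};

// Returns null when `system` owns `area`, otherwise a human-readable message.
function checkOwnership(system, area) {
  if (!Object.prototype.hasOwnProperty.call(OWNERS, area)) {
    return `Unknown area: ${area}`;
  }
  const owners = OWNERS[area];
  if (owners.includes(system)) return null;
  return `DRIFT VIOLATION: ${system} does not own ${area} (owner: ${owners.join(", ")})`;
}
```

For example, `checkOwnership("agent", "dns")` yields a violation message, while `checkOwnership("api", "dns")` returns `null`.
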

---

## 🔄 API ↔ Agent Contract (IMMUTABLE)

### **API → Agent Endpoints**

```
POST /config
  ├─ Payload: {
  │    game: "minecraft",
  │    variant: "paper",
  │    version: "1.21.3",
  │    ports: [25565, 25575],
  │    memory: "4G",
  │    motd: "Welcome to ZeroLagHub",
  │    worldSettings: {...}
  │  }
  └─ Response: 200 OK

POST /start
  └─ Triggers: ensureProvisioned() → StartServer()

POST /stop
  └─ Triggers: Graceful shutdown via server stop command

POST /restart
  └─ Triggers: stop → start sequence

GET /status
  └─ Response: {
       state: "RUNNING" | "INSTALLING" | "FAILED",
       pid: 12345,
       uptime: 3600,
       lastError: null
     }

GET /health
  └─ Response: {
       healthy: true,
       java: "/usr/lib/jvm/java-21-openjdk-amd64",
       variant: "paper",
       ready: true
     }
```

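
Because the contract is immutable, the API side may want to check a `/config` payload before sending it. The following is a minimal sketch under stated assumptions: the field names come from the payload above, but which fields are mandatory, the port range check, and the accepted `memory` format are guesses made for illustration. `validateAgentConfig` is a hypothetical helper.

```javascript
// Pre-flight check for the POST /config payload.
// Assumptions: game, variant, version, memory are required strings;
// ports is a non-empty array of valid TCP ports; memory looks like "512M" or "4G".
function validateAgentConfig(payload) {
  const errors = [];
  for (const field of ["game", "variant", "version", "memory"]) {
    if (typeof payload[field] !== "string" || payload[field].length === 0) {
      errors.push(`${field} must be a non-empty string`);
    }
  }
  const ports = payload.ports;
  if (!Array.isArray(ports) || ports.length === 0) {
    errors.push("ports must be a non-empty array");
  } else if (!ports.every((p) => Number.isInteger(p) && p >= 1 && p <= 65535)) {
    errors.push("ports must be integers between 1 and 65535");
  }
  if (typeof payload.memory === "string" && payload.memory.length > 0 && !/^\d+[MG]$/.test(payload.memory)) {
    errors.push('memory must look like "512M" or "4G"');
  }
  return errors; // empty array means the payload passed every check
}
```

An empty result means the payload can be posted; anything else should stop the job before the agent is contacted.
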
### **Agent → API Signals (via /status polling)**

```
State Machine:
  INSTALLING
    ├─> DOWNLOADING_ARTIFACT
    ├─> INSTALLING_JAVA
    ├─> FINALIZING
    ├─> RUNNING
    └─> FAILED | CRASHED
```

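
On the API side, consuming this state machine amounts to polling `/status` until a terminal state appears. The sketch below assumes that `RUNNING` means success and that `FAILED` and `CRASHED` are terminal failures; the function name, the option names, and the default interval and timeout are hypothetical. `getStatus` stands in for whatever HTTP call the API uses to read the agent's `/status` body.

```javascript
const FAILURE_STATES = new Set(["FAILED", "CRASHED"]);

// Polls until the agent reports RUNNING, a failure state, or the timeout passes.
// getStatus: async function resolving to an object shaped like the /status response.
async function waitForRunning(getStatus, { intervalMs = 2000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await getStatus();
    if (status.state === "RUNNING") return status;
    if (FAILURE_STATES.has(status.state)) {
      throw new Error(`agent reported ${status.state}: ${status.lastError ?? "no error detail"}`);
    }
    if (Date.now() >= deadline) {
      throw new Error(`timed out while agent was in state ${status.state}`);
    }
    // Intermediate states (INSTALLING, DOWNLOADING_ARTIFACT, ...) just wait.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Treating every unknown state as "keep waiting" keeps the API tolerant of new intermediate states, at the cost of relying on the timeout to catch a stuck agent.
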
### **🚫 Forbidden Agent Behaviors**

Agent **NEVER**:
- ❌ Allocates or manages ports
- ❌ Talks to Proxmox API
- ❌ Creates DNS records
- ❌ Registers with Velocity
- ❌ Modifies LXC config
- ❌ Allocates VMIDs
- ❌ Manages templates
- ❌ Talks to Cloudflare or Technitium

**Violation Detection**: If agent code imports Proxmox client, DNS client, or port allocation logic → **DRIFT VIOLATION**

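
The import-based detection rule lends itself to a small automated check. The sketch below scans Go source text for import paths that match a deny-list. Everything here is an assumption made for illustration: the deny-list patterns, the function names, and the regex-based parsing, which is deliberately simple and does not replace a real Go parser.

```javascript
// Hypothetical deny-list; real package paths will differ per repository.
const FORBIDDEN_AGENT_IMPORTS = [/proxmox/i, /cloudflare/i, /technitium/i, /velocity/i, /portpool/i];

// Extracts quoted import paths from Go source: single-line and grouped imports.
function goImports(source) {
  const paths = [];
  const single = /^\s*import\s+(?:[\w.]+\s+)?"([^"]+)"/gm;
  const group = /^\s*import\s*\(([\s\S]*?)\)/gm;
  let m;
  while ((m = single.exec(source)) !== null) paths.push(m[1]);
  while ((m = group.exec(source)) !== null) {
    for (const quoted of m[1].matchAll(/"([^"]+)"/g)) paths.push(quoted[1]);
  }
  return paths;
}

// Returns every import path that matches at least one deny-list pattern.
function findForbiddenImports(source, denyList = FORBIDDEN_AGENT_IMPORTS) {
  return goImports(source).filter((path) => denyList.some((re) => re.test(path)));
}
```

A non-empty result for any agent source file would count as a drift violation under the rule above.
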

---

## 🖥️ Frontend ↔ API Contract (IMMUTABLE)

### **Frontend Allowed Endpoints**

```
Provisioning:
  POST /api/containers/create
  GET /api/containers/:vmid
  DELETE /api/containers/:vmid

Control:
  POST /api/containers/:vmid/start
  POST /api/containers/:vmid/stop
  POST /api/containers/:vmid/restart

Status:
  GET /api/containers/:vmid/status
  GET /api/containers/:vmid/logs
  GET /api/containers/:vmid/stats

Discovery:
  GET /api/templates
  GET /api/containers (list user's containers)

Read-Only Info:
  GET /api/dns/:hostname (display only)
  GET /api/velocity/status (display only)
```

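
An allow-list like this can be enforced in a thin wrapper around the frontend's HTTP client, so that any request outside the contract fails fast in development. The sketch below is one possible shape: the route strings are copied from the list above, while `toMatcher`, `isAllowedFrontendRequest`, and the choice to treat `:vmid` and `:hostname` as single path segments are assumptions.

```javascript
// Route table copied from the allowed-endpoint list.
const FRONTEND_ROUTES = [
  "POST /api/containers/create",
  "GET /api/containers/:vmid",
  "DELETE /api/containers/:vmid",
  "POST /api/containers/:vmid/start",
  "POST /api/containers/:vmid/stop",
  "POST /api/containers/:vmid/restart",
  "GET /api/containers/:vmid/status",
  "GET /api/containers/:vmid/logs",
  "GET /api/containers/:vmid/stats",
  "GET /api/templates",
  "GET /api/containers",
  "GET /api/dns/:hostname",
  "GET /api/velocity/status",
];

// Turns "GET /api/containers/:vmid" into { method, regex } with ":param"
// segments matching exactly one path segment.
function toMatcher(route) {
  const [method, pattern] = route.split(" ");
  const source = pattern
    .split("/")
    .map((seg) => (seg.startsWith(":") ? "[^/]+" : seg.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")))
    .join("/");
  return { method, regex: new RegExp(`^${source}$`) };
}

const MATCHERS = FRONTEND_ROUTES.map(toMatcher);

function isAllowedFrontendRequest(method, path) {
  return MATCHERS.some((m) => m.method === method.toUpperCase() && m.regex.test(path));
}
```
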
### **🚫 Forbidden Frontend Behaviors**

Frontend **NEVER**:
- ❌ Talks directly to Go Agent
- ❌ Calls Proxmox API
- ❌ Creates DNS records
- ❌ Registers with Velocity
- ❌ Allocates ports
- ❌ Executes container commands
- ❌ Accesses MariaDB directly

**Violation Detection**: If frontend code imports Proxmox client, agent HTTP client (except via API), or database client → **DRIFT VIOLATION**


---

## 🚨 Drift Detection Rules (ACTIVE ENFORCEMENT)

### **Rule #1: Provisioning Ownership**

**Violation**: Agent asked to allocate ports, choose templates, create DNS, call Proxmox, manage VMIDs

**Correct Path**: API owns ALL provisioning orchestration

**Example Violation**:
```go
// WRONG - Agent should NEVER do this
func (a *Agent) allocatePorts() ([]int, error) {
	// ... port allocation logic
}
```

**Correct Pattern**:
```javascript
// RIGHT - API allocates, agent receives
// vmid is passed in because the API allocates it before assigning ports
async function provisionInstance(vmid, game, variant) {
  const ports = await portAllocator.allocate(vmid);
  await agent.postConfig({ ports, game, variant });
}
```


---

### **Rule #2: Artifact Ownership**

**Violation**: API asked to install Java, download server.jar, run installers

**Correct Path**: Agent owns ALL in-container installation

**Example Violation**:
```javascript
// WRONG - API should NEVER do this
async function installMinecraft(vmid, version) {
  await proxmox.exec(vmid, `wget https://...`);
  await proxmox.exec(vmid, `java -jar installer.jar`);
}
```

**Correct Pattern**:
```go
// RIGHT - Agent handles installation
// Each step is assumed to return an error; the first failure aborts provisioning.
func (p *Provisioner) ProvisionAll(cfg Config) error {
	if err := downloadArtifact(cfg.Variant, cfg.Version); err != nil {
		return err
	}
	if err := installJavaRuntime(cfg.Version); err != nil {
		return err
	}
	return verifyInstallation()
}
```


---

### **Rule #3: Direct Container Access**

**Violation**: Frontend or API wants to exec commands directly into container

**Correct Path**: Agent owns container execution layer

**Example Violation**:
```typescript
// WRONG - Frontend should NEVER do this
async function restartServer(vmid: number) {
  const ssh = new SSHClient();
  await ssh.connect(containerIP);
  await ssh.exec('systemctl restart minecraft');
}
```

**Correct Pattern**:
```typescript
// RIGHT - Frontend talks to API, API talks to agent
async function restartServer(vmid: number) {
  await api.post(`/containers/${vmid}/restart`);
}
```


---

### **Rule #4: Networking Responsibilities**

**Violation**: Agent asked to select public vs internal IPs, decide DNS zones

**Correct Path**: API owns dual-IP logic (Cloudflare external, Technitium internal)

**Example Violation**:
```go
// WRONG - Agent should NEVER decide this
func (a *Agent) determinePublicIP() string {
	if a.needsCloudflare() {
		return "139.64.165.248"
	}
	return a.containerIP
}
```

**Correct Pattern**:
```javascript
// RIGHT - API decides network topology
function determineIPs(vmid, game) {
  const internalIP = `10.200.0.${vmid - 1000}`;
  const externalIP = "139.64.165.248"; // Cloudflare target
  const velocityIP = "10.70.0.241"; // Internal routing

  return { internalIP, externalIP, velocityIP };
}
```


---

### **Rule #5: Proxy Responsibilities**

**Violation**: Agent asked to register with Velocity, configure proxy routing

**Correct Path**: API owns ALL proxy integrations

**Example Violation**:
```go
// WRONG - Agent should NEVER do this
func (a *Agent) registerWithVelocity() error {
	client := velocity.NewClient()
	return client.Register(a.hostname, a.port)
}
```

**Correct Pattern**:
```javascript
// RIGHT - API handles Velocity registration
async function registerVelocity(vmid, hostname, internalIP) {
  await velocityBridge.registerBackend({
    name: hostname,
    address: internalIP,
    port: 25565
  });
}
```


---

## 🔄 Context Switching Safety Workflow

### **When Moving to Node.js API Work:**

**Pre-Switch Checklist**:
- [ ] Agent contract unchanged? (POST /config, /start, /stop, /restart; GET /status, /health)
- [ ] Database schema unchanged? (Prisma models consistent)
- [ ] LXC template IDs unchanged? (VMID 800 for game, 6000-series for dev)
- [ ] DNS/IP logic consistent? (Cloudflare external, Technitium internal)
- [ ] Port allocation logic preserved? (PortPool DB-backed)

**Common Drift Patterns**:
- ⚠️ Adding agent installation logic to API
- ⚠️ Changing agent contract without updating both sides
- ⚠️ Moving DNS logic to different service
- ⚠️ Bypassing job queue for provisioning


---

### **When Moving to Go Agent Work:**

**Pre-Switch Checklist**:
- [ ] No provisioning logic outside allowed scope (no port allocation, DNS, etc.)
- [ ] File paths remain canonical (`/opt/zlh/<game>/<variant>/world`)
- [ ] Naming conventions maintained (`server.jar`, `fabric-server.jar`, etc.)
- [ ] No external API calls (Proxmox, DNS, Velocity)
- [ ] Status states unchanged (INSTALLING, RUNNING, FAILED, etc.)

**Common Drift Patterns**:
- ⚠️ Adding port allocation to agent
- ⚠️ Making agent talk to external services
- ⚠️ Changing directory structure without API coordination
- ⚠️ Adding orchestration logic to agent


---

### **When Moving to Frontend Work:**

**Pre-Switch Checklist**:
- [ ] Only API-approved fields used in UI
- [ ] No direct agent HTTP calls
- [ ] VMID not exposed in user-facing UI
- [ ] Internal IPs not displayed to users
- [ ] All state from API, not computed locally

**Common Drift Patterns**:
- ⚠️ Adding direct agent calls from frontend
- ⚠️ Computing server state client-side
- ⚠️ Exposing internal infrastructure details
- ⚠️ Bypassing API for container control


---

## 🚨 High-Risk Integration Zones (GUARDED)

These areas have historically caused drift across sessions:

### **1. Forge / NeoForge Installation Logic**
- **Risk**: Agent vs API confusion on who handles `run.sh` patching
- **Guard**: Agent owns ALL Forge installation, API just passes config
- **Test**: Can provision Forge 1.21.3 without API filesystem access?

### **2. Cloudflare SRV Deletion**
- **Risk**: Case sensitivity, subdomain normalization, record ID tracking
- **Guard**: API stores Cloudflare record IDs in EdgeState, deletes by ID
- **Test**: Create → Delete → Recreate same hostname without orphans?

### **3. Technitium DNS Zone Mismatch**
- **Risk**: Wrong zone selection, duplicate records
- **Guard**: API hardcodes zone as `zpack.zerolaghub.com`, validates before creation
- **Test**: No records created in wrong zones?

### **4. Velocity Registration Order**
- **Risk**: Registering before server ready, deregistering incorrectly
- **Guard**: API waits for agent RUNNING state, then registers Velocity
- **Test**: Player connection works immediately after provisioning complete?

### **5. PortPool Commit Logic**
- **Risk**: Race conditions, double-allocation, uncommitted ports
- **Guard**: API allocates → provisions → commits (rollback on failure)
- **Test**: Concurrent provisions don't collide on ports?

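
The PortPool guard (allocate → provisions → commit, with rollback on failure) can be illustrated with an in-memory stand-in. This is a sketch only: the real PortPool is DB-backed and its schema and locking strategy are not described here, so the `PortPool` class, its method names, and `provisionWithPorts` are all hypothetical. In a database the reserve step would need a transaction or row lock to give the same guarantee that the synchronous `reserve` gives within one Node.js process.

```javascript
// In-memory stand-in for the DB-backed PortPool (illustrative only).
class PortPool {
  constructor(ports) {
    this.free = new Set(ports);
    this.reserved = new Map(); // port -> vmid, allocated but not yet committed
    this.committed = new Map(); // port -> vmid, provisioning succeeded
  }

  // Synchronous, so two callers can never be handed the same port.
  reserve(vmid, count) {
    if (this.free.size < count) throw new Error("port pool exhausted");
    const picked = [...this.free].slice(0, count);
    for (const port of picked) {
      this.free.delete(port);
      this.reserved.set(port, vmid);
    }
    return picked;
  }

  commit(ports) {
    for (const port of ports) {
      this.committed.set(port, this.reserved.get(port));
      this.reserved.delete(port);
    }
  }

  release(ports) {
    for (const port of ports) {
      this.reserved.delete(port);
      this.free.add(port);
    }
  }
}

// allocate -> provision -> commit; the reservation is released on failure.
async function provisionWithPorts(pool, vmid, count, provision) {
  const ports = pool.reserve(vmid, count);
  try {
    await provision(ports);
    pool.commit(ports);
    return ports;
  } catch (err) {
    pool.release(ports);
    throw err;
  }
}
```
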
### **6. Agent READY Detection**
- **Risk**: False negatives, false positives, variant-specific patterns
- **Guard**: Agent uses variant-aware log parsing, multiple confirmation lines
- **Test**: All 6 variants correctly detect READY state?

### **7. Server Start-Up False Negatives**
- **Risk**: Timeout too short, log parsing too strict
- **Guard**: Agent increases timeout for Forge (90s), multiple log patterns
- **Test**: Forge installer completes without false failure?

### **8. IP Selection Logic**
- **Risk**: Confusing external (Cloudflare) vs internal (Velocity/Technitium) IPs
- **Guard**: API clearly separates: externalIP (139.64.165.248), internalIP (10.200.0.X), velocityIP (10.70.0.241)
- **Test**: DNS points to correct IPs, Velocity routes to correct internal IP?


---

## 📋 Architecture Decision Log (LOCKED) ⭐ NEW

**Purpose**: Records finalized architectural decisions that **must not be re-litigated** unless explicitly requested.

**Status**: LOCKED - These decisions are final and cannot be changed unless the user explicitly says "Revisit decision X"

---

### **DEC-001: Templates vs Go Agent (FINAL)**

**Decision**: Hybrid model
- LXC templates define **base environment only**
- Go Agent is **authoritative execution layer** inside containers

**Rationale**:
- Templates alone cannot handle multi-variant logic (Forge, NeoForge, Fabric)
- Agent enables self-repair, async provisioning, runtime control
- Hybrid provides speed + flexibility without API container access

**Applies To**: API v2, Go Agent, Frontend
**Status**: ✅ LOCKED


---

### **DEC-002: Provisioning Authority**

**Decision**: API orchestrates, Agent executes

**Rationale**:
- API has global visibility (DB, DNS, Proxmox, Velocity)
- Agent is intentionally sandboxed to container filesystem + process

**Applies To**: All systems
**Status**: ✅ LOCKED

---

### **DEC-003: DNS & Edge Publishing Ownership**

**Decision**: API-only responsibility

**Rationale**:
- Requires external credentials (Cloudflare, Technitium)
- Must correlate DB state, record IDs, reconciliation jobs

**Applies To**: API v2
**Status**: ✅ LOCKED

---

### **DEC-004: Proxy Stack**

**Decision**: Traefik + Velocity only

**Rationale**:
- Traefik for HTTP/control-plane
- Velocity for Minecraft TCP routing
- **HAProxy explicitly deprecated** for ZeroLagHub

**Applies To**: Infrastructure, API
**Status**: ✅ LOCKED


---

### **DEC-005: State Persistence**

**Decision**: MariaDB is single source of truth

**Rationale**:
- Flat files caused race conditions and drift
- DB enables reconciliation, recovery, observability

**Applies To**: API v2
**Status**: ✅ LOCKED

---

### **DEC-006: Frontend Access Model**

**Decision**: Frontend communicates with API only

**Rationale**:
- Security boundary
- Prevents leaking infrastructure details

**Applies To**: Frontend
**Status**: ✅ LOCKED

---

### **DEC-007: Architecture Enforcement Policy**

**Decision**: Drift prevention is mandatory

**Rationale**:
- Prevents oscillation between alternatives
- Preserves velocity during late-stage development

**Applies To**: All work sessions
**Status**: ✅ LOCKED


---

## 📋 Canonical Architecture Anchors (ABSOLUTE)

These rules are **immutable** unless explicitly changed with full system review:

### **Anchor #1: Orchestration**
✅ API v2 orchestrates EVERYTHING (jobs, provisioning, DNS, proxy, lifecycle)

### **Anchor #2: Container-Internal**
✅ Agent performs EVERYTHING inside container (install, start, stop, detect)

### **Anchor #3: Templates**
✅ Templates contain agent + base environment (VMID 800 for game, 6000-series for dev)

### **Anchor #4: Job Queue**
✅ BullMQ/Redis drive job system (async provisioning, retries, reconciliation)

### **Anchor #5: Database**
✅ MariaDB holds all state (PortPool, ContainerInstance, EdgeState, etc.)

### **Anchor #6: Infrastructure**
✅ Proxmox API (not Ansible) for LXC management

### **Anchor #7: Routing**
✅ Traefik (HTTP) + Velocity (Minecraft) are ONLY routing/proxy systems

### **Anchor #8: DNS**
✅ Cloudflare = authoritative public DNS
✅ Technitium = authoritative internal DNS

### **Anchor #9: Frontend Isolation**
✅ Frontend speaks ONLY to API (no direct agent, Proxmox, DNS, Velocity)

### **Anchor #10: Directory Structure**
✅ `/opt/zlh/<game>/<variant>/world` is canonical game server path


---

## 🛡️ Enforcement Policies (ACTIVE)

When future instructions conflict with Canonical Architecture or Architecture Decision Log:

### **Step 1: STOP**
Immediately halt the task. Do not proceed with drift-inducing change.

### **Step 2: RAISE DRIFT WARNING**
```
⚠️ DRIFT WARNING ⚠️

Proposed change violates Canonical Architecture:

Rule Violated: [Rule #X: Description]
OR
Decision Violated: [DEC-XXX: Decision Name]

Violation: [Specific behavior that violates rule/decision]
Correct Path: [Architecture-aligned approach]

Impact: [What breaks if this drift is allowed]

Options:
1. Implement correct architecture-aligned path
2. Amend Canonical Architecture (requires full system review)
3. Request user to "Revisit decision X" (for ADL changes)
4. Cancel proposed change
```

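
If warnings are generated by tooling instead of being written by hand, a small formatter keeps them consistent with the template above. The sketch below is one way to do that; `formatDriftWarning` and its parameter names are hypothetical, and it assumes a warning cites either a rule or a decision, with the decision taking precedence when both are supplied.

```javascript
// Renders the drift warning template. Pass either `rule` or `decision`.
function formatDriftWarning({ rule, decision, violation, correctPath, impact }) {
  const violated = decision ? `Decision Violated: ${decision}` : `Rule Violated: ${rule}`;
  return [
    "⚠️ DRIFT WARNING ⚠️",
    "",
    "Proposed change violates Canonical Architecture:",
    "",
    violated,
    "",
    `Violation: ${violation}`,
    `Correct Path: ${correctPath}`,
    "",
    `Impact: ${impact}`,
    "",
    "Options:",
    "1. Implement correct architecture-aligned path",
    "2. Amend Canonical Architecture (requires full system review)",
    '3. Request user to "Revisit decision X" (for ADL changes)',
    "4. Cancel proposed change",
  ].join("\n");
}
```
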
### **Step 3: PROVIDE CORRECT PATH**
Show the architecture-aligned implementation that achieves the same goal.

### **Step 4: ASK FOR DIRECTION**
```
Should we:
A) Implement the correct architecture-aligned path?
B) Perform full system review to amend architecture?
C) Request user to explicitly revisit locked decision?
D) Cancel this change?
```

### **Step 5: ARCHITECTURE DECISION LOG SPECIAL RULE**
If violation is against a **LOCKED** Architecture Decision (DEC-001 through DEC-007):

**ADDITIONAL CHECK**:
```
⚠️ LOCKED DECISION WARNING ⚠️

This change conflicts with Architecture Decision Log entry: [DEC-XXX]
Status: LOCKED

This decision can ONLY be changed if the user explicitly says:
"Revisit decision [DEC-XXX]"

Without explicit user request to revisit, this decision is FINAL.

Proceeding with this change would violate architectural governance.
```

**Never proceed** with drift-inducing changes without explicit confirmation.
**Never re-litigate** locked decisions without user explicitly requesting revision.

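
The locked-decision rule reduces to a simple gate: a decision may only be reopened when the user's message contains the explicit phrase for that decision. The sketch below shows one possible check. `LOCKED_DECISIONS` lists the IDs from the Architecture Decision Log; `mayRevisit` is a hypothetical helper, and matching the phrase case-insensitively is an assumption this document does not specify.

```javascript
const LOCKED_DECISIONS = ["DEC-001", "DEC-002", "DEC-003", "DEC-004", "DEC-005", "DEC-006", "DEC-007"];

// True only when userMessage explicitly asks to revisit this exact decision.
// Unknown decision IDs always return false.
function mayRevisit(decisionId, userMessage) {
  if (!LOCKED_DECISIONS.includes(decisionId)) return false;
  const phrase = `revisit decision ${decisionId}`.toLowerCase();
  return userMessage.toLowerCase().includes(phrase);
}
```

A request such as "Let's just switch to HAProxy" fails this check for DEC-004, so the correct response is the locked decision warning, not an implementation.
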

---

## 📊 Drift Detection Examples

### **Example 1: Agent Port Allocation (VIOLATION)**

**Proposed Change**:
```go
// Agent code proposal
func (a *Agent) allocatePort() int {
	// Find available port...
	return port
}
```

**Drift Warning**:
```
⚠️ DRIFT WARNING ⚠️

Rule Violated: Rule #1 - Provisioning Ownership
Violation: Agent attempting to allocate ports
Correct Path: API allocates ports, agent receives them via /config

Impact: Port collisions, database inconsistency, broken PortPool

Recommendation: Remove port allocation from agent, ensure API sends ports in config payload
```


---

### **Example 2: Frontend Direct Agent Call (VIOLATION)**

**Proposed Change**:
```typescript
// Frontend code proposal
async function getServerLogs(vmid: number) {
  const agentURL = `http://10.200.0.${vmid}:8080/logs`;
  return await fetch(agentURL);
}
```

**Drift Warning**:
```
⚠️ DRIFT WARNING ⚠️

Rule Violated: Rule #3 - Direct Container Access
Violation: Frontend bypassing API to talk to agent
Correct Path: Frontend → API → Agent (API proxies logs)

Impact: Broken frontend if container IP changes, no auth/rate limiting, security risk

Recommendation: Use GET /api/containers/:vmid/logs (listed in the Frontend ↔ API contract) so the API proxies logs from the agent
```


---

### **Example 3: API Installing Java (VIOLATION)**

**Proposed Change**:
```javascript
// API code proposal
async function provisionServer(vmid, game, variant) {
  await proxmox.exec(vmid, 'apt-get install openjdk-21-jdk');
  await proxmox.exec(vmid, 'wget https://papermc.io/...');
}
```

**Drift Warning**:
```
⚠️ DRIFT WARNING ⚠️

Rule Violated: Rule #2 - Artifact Ownership
Violation: API performing in-container installation
Correct Path: API sends config to agent, agent handles installation

Impact: Breaks agent self-repair, variant-specific logic duplicated, no verification system

Recommendation: Remove installation logic from API, ensure agent receives proper config via POST /config
```


---

## 📁 Integration with Existing Documentation

### **Relationship to Master Bootstrap**
- Master Bootstrap: Strategic overview and business model
- **This Document**: Technical governance and boundary enforcement
- **Usage**: Consult this before implementing ANY code changes

### **Relationship to Complete Current State**
- Complete Current State: What's working, what's next
- **This Document**: How things MUST work (regardless of current state)
- **Usage**: This is the "law", current state is "status"

### **Relationship to Engineering Handover**
- Engineering Handover: Daily tactical tasks and sprint plan
- **This Document**: Constraints within which tasks must be implemented
- **Usage**: Check this before starting each handover task


---

## 🔄 Architecture Amendment Process

If there is a legitimate need to change the Canonical Architecture:

### **Step 1: Identify Change**
Document exactly what architectural boundary needs to change and why.

### **Step 2: Full System Impact Analysis**
- What breaks in API?
- What breaks in Agent?
- What breaks in Frontend?
- What changes to contracts?
- What database migrations are needed?

### **Step 3: Update ALL Affected Documents**
- This document (Canonical Architecture)
- Master Bootstrap (if strategic impact)
- Complete Current State (implementation changes)
- Engineering Handover (sprint tasks)
- Agent Spec, Operational Guide (if affected)

### **Step 4: Update ALL Systems**
- API code + tests
- Agent code + tests
- Frontend code + tests
- Database schema (migration)
- Infrastructure config

### **Step 5: Validation**
- Integration tests pass?
- No new drift introduced?
- Documentation consistent?
- All AIs briefed on change?

**Only after ALL steps** is the architecture amendment complete.


---

## 🎯 Quick Reference Card

### **API Owns**
- ✅ Provisioning orchestration
- ✅ Port allocation (PortPool)
- ✅ DNS (Cloudflare + Technitium)
- ✅ Velocity registration
- ✅ IP logic (external + internal)
- ✅ Job queue (BullMQ)
- ✅ Database state

### **Agent Owns**
- ✅ Container-internal installation
- ✅ Java runtime
- ✅ Artifact downloads
- ✅ Server process management
- ✅ READY detection
- ✅ Self-repair + verification
- ✅ Filesystem layout

### **Frontend Owns**
- ✅ User interaction
- ✅ Display logic
- ✅ Client state
- ✅ Form validation

### **Never**
- ❌ Agent allocates ports
- ❌ Agent talks to DNS/Velocity/Proxmox
- ❌ API installs server files
- ❌ API executes in-container commands
- ❌ Frontend talks to agent directly
- ❌ Frontend talks to infrastructure


---

## 📋 Session Start Checklist

Before every coding session with any AI:

- [ ] Read this document's Quick Reference Card
- [ ] Identify which system you're working on (API, Agent, Frontend)
- [ ] Review that system's "Owns" list
- [ ] Check High-Risk Integration Zones if touching those areas
- [ ] Verify no drift from previous session
- [ ] Confirm contracts unchanged since last session

**If ANY doubt**: Re-read full Canonical Architecture Anchors section.

---

## ✅ Document Status

**Status**: ACTIVE - Must be consulted before all code changes
**Enforcement**: MANDATORY - Drift violations must be caught
**Authority**: CANONICAL - Overrides conflicting guidance
**Updates**: Only via Architecture Amendment Process

---

🛡️ **This document prevents architectural drift. Violate at your own risk.**