
# ZeroLagHub Portal Migration (v2)

This document tracks the migration from the legacy portal model to the **ZLH-native Portal v2**.

---

## Migration Summary

The portal is being rebuilt to support **heterogeneous workloads**:
- Game servers (Minecraft initially)
- Development servers (LXC-based)
- Future non-game services

This required abandoning several legacy assumptions.

---

## Key Architectural Shifts

### 1. Pterodactyl is no longer the control plane
- No Docker-centric lifecycle assumptions
- No monolithic server controllers
- No HUD-style control surface

---

### 2. Agent-first runtime model
- Each server/LXC runs a ZLH Agent
- Agent is authoritative for:
  - runtime state
  - service health
  - console output
- API v2 brokers access, not execution

---

### 3. Dashboard redesign (Completed)

Dashboard is now:
- Read-only
- Awareness-focused
- Non-operational

Features:
- System Health indicator (frontend ↔ backend connectivity)
- Notices panel with expandable timeline
- Resource summaries (no controls)

---

### 4. Servers page redesign (In progress)

Servers page now:
- Groups servers by type (GAME / DEV)
- Uses expandable cards
- Collapsed cards show:
  - status
  - uptime
  - identity
- Expanded cards show:
  - runtime context
  - metadata
  - escalation action

Only action exposed:
- **System View** (observation-first)

No start/stop/restart bulk actions exist.

---

## Explicitly Removed Concepts

- "Start All / Stop All / Restart All"
- HUD-style control buttons
- Console buttons on dashboard
- AWS-style terminal metaphors

These are intentional removals.

---

## Migration Status

- Auth v2: ✅ Complete
- Dashboard UX: ✅ Locked
- Servers page UX: 🔄 Active
- System View page: ⏳ Next
- Billing integration: ⏸ Deferred

---

## Live Status Model (Finalized)

The portal consumes **aggregated live state** from the API.
It does not directly query agents, Proxmox, or exporters.

### Status Layers

#### Host / Agent Health
- Source: `GET /health` on agent
- Cached in Redis
- Determines host availability

#### Service Runtime
- GAME:
  - Source: `GET /status`
  - Reflects actual game server lifecycle
- DEV:
  - No service runtime
  - Status inferred from host availability
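
The host health layer above can be sketched as a probe-then-cache step. This is a minimal illustration, not the actual API code: `HealthCache` is an in-memory stand-in for Redis, `probe` is a hypothetical callable that wraps the agent's `GET /health` call, and the 30-second TTL is an assumed value that this document does not specify.

```python
import time
from typing import Callable, Dict, Tuple

# Assumed TTL; the real expiry used with Redis is not stated in this document.
HEALTH_TTL_SECONDS = 30.0


class HealthCache:
    """In-memory stand-in for the Redis health cache, keyed by vmid."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[int, Tuple[bool, float]] = {}

    def record(self, vmid: int, online: bool) -> None:
        # Store the latest probe result together with the time it was taken.
        self._entries[vmid] = (online, self._clock())

    def host_online(self, vmid: int) -> bool:
        # Assumption: a missing or expired entry is treated as offline.
        entry = self._entries.get(vmid)
        if entry is None:
            return False
        online, stamp = entry
        return online and (self._clock() - stamp) <= HEALTH_TTL_SECONDS


def refresh_host_health(
    vmid: int, probe: Callable[[int], bool], cache: HealthCache
) -> None:
    """Run the health probe for one host and cache the result."""
    try:
        online = bool(probe(vmid))
    except Exception:
        # Any probe failure counts as the host being unreachable.
        online = False
    cache.record(vmid, online)
```

Under this sketch the portal only ever reads `host_online`, which is why a DEV container's status can lag behind the agent until the next refresh.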

### UI Mapping

| Container Type | Host Online | Agent State | UI Status |
|----------------|-------------|-------------|-----------|
| DEV            | true        | n/a         | running   |
| DEV            | false       | n/a         | offline   |
| GAME           | true        | running     | running   |
| GAME           | true        | idle        | stopped   |
| GAME           | false       | n/a         | offline   |
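
The table above can be expressed as a single pure function. The sketch below is illustrative only: the function name `ui_status` and its parameters are hypothetical, and raising `ValueError` for combinations the table does not list is an assumption rather than documented behavior.

```python
from typing import Optional


def ui_status(
    container_type: str, host_online: bool, agent_state: Optional[str] = None
) -> str:
    """Map (container type, host online, agent state) to a UI status."""
    # An unreachable host is offline regardless of container type.
    if not host_online:
        return "offline"

    # DEV containers have no service runtime; host availability decides.
    if container_type == "DEV":
        return "running"

    if container_type == "GAME":
        if agent_state == "running":
            return "running"
        if agent_state == "idle":
            # Idle is a normal state and is shown as stopped, not as an error.
            return "stopped"
        # Assumption: states not listed in the table are rejected, not guessed.
        raise ValueError(f"unmapped agent state: {agent_state!r}")

    raise ValueError(f"unknown container type: {container_type!r}")
```

Keeping the mapping in one place means the frontend and any tests agree on what each row of the table produces.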

### Notes
- "Idle" does **not** mean broken; an idle GAME server is shown as stopped
- "Offline" means the host is unreachable
- A UI refresh reflects the state cached in Redis, so agent changes may not appear instantly

This model intentionally mirrors Pterodactyl's status semantics, even though Pterodactyl itself is no longer the control plane.

---

## Console Responsibility Split (Authoritative)

### Agent
- Owns PTY lifecycle
- Owns process execution
- Owns security boundary
- Owns WebSocket console endpoint

### API
- Authenticates console access
- Provides server metadata (vmid, type)
- Does NOT proxy PTY traffic

### Portal
- Renders terminal UI
- Sends raw input bytes
- Displays streamed output
- Must not assume process state

This split is intentional and enforced to prevent drift.

---

## Architectural Boundaries (CRITICAL)

### What the Frontend MUST NOT Do

**Never Call Agents Directly**
- Frontend cannot reach container IPs
- Frontend has no network path to agents
- All agent access flows through API
- This is non-negotiable architecture

**Why This Matters**
- Container IPs are internal-only (10.x network)
- No CORS headers on agents (they're not web services)
- Direct calls would fail and break the security model
- API enforces auth, rate limits, and access control

**Correct Flow**
```
User Action → Frontend → API → Agent → Response
```

**Incorrect Flow (FORBIDDEN)**
```
User Action → Frontend → Agent (FAILS - no network path)
```

### What Can Break This

**AI Coding Tools**
- May suggest "quick fixes" that call agents directly
- May treat agents as HTTP APIs with CORS
- May generate code that "just works" in the wrong way

**Convenience Changes**
- Adding CORS headers to agents (never do this)
- Exposing agent ports through a proxy (breaks security)
- Creating frontend → agent shortcuts (breaks architecture)

### Enforcement

If a change violates these boundaries:
- The change must be reverted
- The documentation takes precedence
- AI tools must be corrected

These constraints override convenience.
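
One way to catch boundary violations before review is a simple scan of frontend source for internal container addresses. The sketch below is a heuristic under stated assumptions, not an existing ZLH tool: the function name `find_internal_ips` is hypothetical, and "10.x network" is interpreted here as the `10.0.0.0/8` range.

```python
import ipaddress
import re
from typing import List

# Assumption: "10.x network" means the whole 10.0.0.0/8 private range.
INTERNAL_NETWORK = ipaddress.ip_network("10.0.0.0/8")

# Loose pattern for dotted-quad candidates; validation happens below.
_IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_internal_ips(source_text: str) -> List[str]:
    """Return every valid IPv4 literal in the text that is inside 10.0.0.0/8."""
    hits: List[str] = []
    for match in _IPV4_CANDIDATE.finditer(source_text):
        try:
            address = ipaddress.ip_address(match.group())
        except ValueError:
            # Not a valid address (for example an octet above 255); skip it.
            continue
        if address in INTERNAL_NETWORK:
            hits.append(match.group())
    return hits
```

A check like this could run over frontend files in CI and fail the build on any hit. It only detects hard-coded addresses and may flag unrelated dotted numbers such as version strings, so it supplements code review rather than replacing it.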