ZeroLagHub Portal Migration (v2)

This document tracks the migration from the legacy portal model to the ZLH-native Portal v2.


Migration Summary

The portal is being rebuilt to support heterogeneous workloads:

  • Game servers (Minecraft initially)
  • Development servers (LXC-based)
  • Future non-game services

This required abandoning several legacy assumptions.


Key Architectural Shifts

1. Pterodactyl is no longer the control plane

  • No Docker-centric lifecycle assumptions
  • No monolithic server controllers
  • No HUD-style control surface

2. Agent-first runtime model

  • Each server/LXC runs a ZLH Agent
  • Agent is authoritative for:
    • runtime state
    • service health
    • console output
  • API v2 brokers access, not execution

3. Dashboard redesign (Completed)

Dashboard is now:

  • Read-only
  • Awareness-focused
  • Non-operational

Features:

  • System Health indicator (frontend ↔ backend connectivity)
  • Notices panel with expandable timeline
  • Resource summaries (no controls)

4. Servers page redesign (In progress)

Servers page now:

  • Groups servers by type (GAME / DEV)
  • Uses expandable cards
  • Collapsed cards show:
    • status
    • uptime
    • identity
  • Expanded cards show:
    • runtime context
    • metadata
    • escalation action

Only action exposed:

  • System View (observation-first)

No start/stop/restart bulk actions exist.
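Grouping by type is a pure client-side transformation of data the API already returns. Below is a minimal TypeScript sketch. It assumes a hypothetical `ServerSummary` shape whose field names are illustrative, not the actual API v2 schema. It exposes no lifecycle actions, in line with the observation-only design:

```typescript
// Hypothetical server entry as the Servers page might receive it from the API.
// Field names are illustrative only, not the real API v2 schema.
type ServerType = "GAME" | "DEV";

interface ServerSummary {
  id: string;                                 // identity shown on the collapsed card
  type: ServerType;
  status: "running" | "stopped" | "offline";  // status shown on the collapsed card
  uptimeSeconds: number;                      // uptime shown on the collapsed card
}

// Split servers into GAME and DEV sections, preserving input order within each.
function groupByType(servers: ServerSummary[]): Record<ServerType, ServerSummary[]> {
  const groups: Record<ServerType, ServerSummary[]> = { GAME: [], DEV: [] };
  for (const server of servers) {
    groups[server.type].push(server);
  }
  return groups;
}
```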


Explicitly Removed Concepts

  • "Start All / Stop All / Restart All"
  • HUD-style control buttons
  • Console buttons on dashboard
  • AWS-style terminal metaphors

These are intentional removals.


Migration Status

  • Auth v2: Complete
  • Dashboard UX: Locked
  • Servers page UX: 🔄 Active
  • System View page: Next
  • Billing integration: ⏸ Deferred

Live Status Model (Finalized)

The portal consumes aggregated live state from the API. It does not directly query agents, Proxmox, or exporters.

Status Layers

Host / Agent Health

  • Source: GET /health on agent
  • Cached in Redis
  • Determines host availability

Service Runtime

  • GAME:
    • Source: GET /status
    • Reflects actual game server lifecycle
  • DEV:
    • No service runtime
    • Status inferred from host availability
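The two status layers combine on the API side into the single aggregated record the portal reads. Below is a minimal TypeScript sketch under several assumptions, all hypothetical:

- Plain `Map`s keyed by vmid stand in for the Redis cache.
- The cache entry shapes are invented for illustration.
- A 30-second staleness window is an arbitrary choice, not a documented value.

```typescript
// Stand-ins for what the API keeps in Redis. All names are hypothetical.
interface HealthEntry {
  ok: boolean;       // result of the last agent GET /health call
  checkedAt: number; // epoch milliseconds of that call
}

type ContainerType = "GAME" | "DEV";
type AgentState = "running" | "idle" | "n/a";

// Aggregated record the portal consumes; the portal never queries agents itself.
interface LiveState {
  vmid: number;
  type: ContainerType;
  hostOnline: boolean;
  agentState: AgentState;
}

// Assumed staleness window: an older health entry is treated as offline.
const HEALTH_TTL_MS = 30_000;

function aggregateLiveState(
  vmid: number,
  type: ContainerType,
  health: Map<number, HealthEntry>,
  gameStatus: Map<number, "running" | "idle">,
  now: number,
): LiveState {
  const entry = health.get(vmid);
  // Host/agent health layer: decides availability for every container type.
  const hostOnline =
    entry !== undefined && entry.ok && now - entry.checkedAt <= HEALTH_TTL_MS;

  // DEV has no service runtime; an offline host has no usable runtime state.
  if (type === "DEV" || !hostOnline) {
    return { vmid, type, hostOnline, agentState: "n/a" };
  }

  // Service runtime layer (GAME only), from the cached GET /status result.
  // "n/a" here means nothing is cached yet, a case the UI mapping does not cover.
  return { vmid, type, hostOnline, agentState: gameStatus.get(vmid) ?? "n/a" };
}
```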

UI Mapping

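| Container Type | Host Online | Agent State | UI Status |
|----------------|-------------|-------------|-----------|
| DEV            | true        | n/a         | running   |
| DEV            | false       | n/a         | offline   |
| GAME           | true        | running     | running   |
| GAME           | true        | idle        | stopped   |
| GAME           | false       | n/a         | offline   |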
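The UI Mapping table can be read as a pure function from aggregated live state to a display status. Below is a minimal TypeScript sketch with hypothetical parameter and type names. The table leaves one combination undefined: a GAME host that is online with no agent state. For that case the function throws rather than guessing:

```typescript
type ContainerType = "GAME" | "DEV";
type AgentState = "running" | "idle" | "n/a";
type UiStatus = "running" | "stopped" | "offline";

function toUiStatus(type: ContainerType, hostOnline: boolean, agentState: AgentState): UiStatus {
  // Host unreachable: offline for both types, regardless of agent state.
  if (!hostOnline) return "offline";
  // DEV has no service runtime, so status is inferred from host availability.
  if (type === "DEV") return "running";
  // GAME: reflect the game server lifecycle reported by GET /status.
  if (agentState === "running") return "running";
  if (agentState === "idle") return "stopped";
  // The table has no row for an online GAME host without agent state.
  throw new Error("unmapped state: GAME host online but agent state is n/a");
}
```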

Notes

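  • "Idle" does not mean broken: the host is reachable and the game server is simply not running, so the UI shows it as stopped
  • Offline means the host is unreachable
  • The UI reflects cached Redis state, so changes on the agent can lag in the UI until the cache updates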

This model intentionally mirrors Pterodactyl semantics.


Console Responsibility Split (Authoritative)

Agent

  • Owns PTY lifecycle
  • Owns process execution
  • Owns security boundary
  • Owns WebSocket console endpoint

API

  • Authenticates console access
  • Provides server metadata (vmid, type)
  • Does NOT proxy PTY traffic

Portal

  • Renders terminal UI
  • Sends raw input bytes
  • Displays streamed output
  • Must not assume process state

This split is intentional and enforced to prevent drift.
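On the portal side, this split makes the terminal a thin byte pipe. Below is a minimal TypeScript sketch that assumes a hypothetical `ConsoleTransport` with a `send` method. How the connection is established after the API authenticates access is outside this sketch. The adapter encodes input without interpreting it, decodes output with a streaming `TextDecoder`, and deliberately tracks no process state:

```typescript
// Minimal transport the portal needs (hypothetical). How it is obtained after
// the API authenticates console access is deliberately left out of this sketch.
interface ConsoleTransport {
  send(data: Uint8Array): void;
}

// Portal-side console adapter: forwards raw input bytes and renders output.
// It keeps no notion of process state; the agent owns the PTY and the process.
class ConsoleView {
  private readonly encoder = new TextEncoder();
  private readonly decoder = new TextDecoder("utf-8");
  private rendered = "";

  constructor(private readonly transport: ConsoleTransport) {}

  // Forward keystrokes as bytes without interpreting them (e.g. "\x03" for Ctrl-C).
  onInput(keys: string): void {
    this.transport.send(this.encoder.encode(keys));
  }

  // Append streamed output. { stream: true } buffers a multi-byte UTF-8
  // character that is split across two chunks instead of emitting garbage.
  onOutput(chunk: Uint8Array): void {
    this.rendered += this.decoder.decode(chunk, { stream: true });
  }

  get text(): string {
    return this.rendered;
  }
}
```

A real portal would write output into a terminal emulator component rather than a string buffer. The streaming decode matters either way, because a PTY chunk boundary can fall in the middle of a multi-byte character.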


Architectural Boundaries (CRITICAL)

What Frontend MUST NOT Do

Never Call Agents Directly

  • Frontend cannot reach container IPs
  • Frontend has no network path to agents
  • All agent access flows through API
  • This is non-negotiable architecture

Why This Matters

  • Container IPs are internal-only (10.x network)
  • No CORS headers on agents (they're not web services)
  • Direct calls would fail and break the security model
  • API enforces auth, rate limits, and access control

Correct Flow

User Action → Frontend → API → Agent → Response

Incorrect Flow (FORBIDDEN)

User Action → Frontend → Agent (FAILS - no network path)
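The network already makes the forbidden flow fail, so a client-side guard mainly turns a mistaken shortcut into an early, explicit error. Below is a minimal TypeScript sketch that assumes a hypothetical `API_BASE` and request path. It builds every request URL against the API and refuses any host in the 10.0.0.0/8 container range:

```typescript
// Hypothetical API base; a real portal would read this from configuration.
const API_BASE = "https://api.example.invalid";

// True if the hostname is an IPv4 literal in 10.0.0.0/8 (the internal container network).
function isInternalContainerAddress(hostname: string): boolean {
  const parts = hostname.split(".");
  if (parts.length !== 4) return false;
  const octets = parts.map(p => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some(o => Number.isNaN(o) || o > 255)) return false;
  return octets[0] === 10;
}

// Every frontend request is built against the API; agent addresses are refused.
// An absolute URL in `path` overrides `base`, which is exactly what this catches.
function apiUrl(path: string, base: string = API_BASE): URL {
  const url = new URL(path, base);
  if (isInternalContainerAddress(url.hostname)) {
    throw new Error(`refusing direct agent call to ${url.hostname}; route it through the API`);
  }
  return url;
}
```

Routing all requests through one URL builder also gives reviewers a single place to check when a tool proposes a new call.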

What Can Break This

AI Coding Tools

  • May suggest "quick fixes" that call agents directly
  • May treat agents as HTTP APIs with CORS
  • May generate code that "just works" but violates the architecture

Convenience Changes

  • Adding CORS headers to agents (never do this)
  • Exposing agent ports through proxy (breaks security)
  • Creating frontend → agent shortcuts (breaks architecture)

Enforcement

If a change violates these boundaries:

  • The change must be reverted
  • The documentation takes precedence
  • AI tools must be corrected

These constraints override convenience.
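Part of this enforcement can be automated in CI. Below is a heuristic TypeScript sketch with hypothetical rule names:

- One check flags 10.x address literals in frontend source.
- The other flags CORS headers on a captured agent response.

How files are read and how agent responses are captured is left to the CI setup. A hit means the change should be reviewed and reverted; the check does not replace review:

```typescript
interface Violation {
  rule: string;
  detail: string;
}

// Heuristic: IPv4 literals in the 10.0.0.0/8 container network.
const INTERNAL_IP = /\b10\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g;

// Frontend source must never reference container addresses directly.
function scanFrontendSource(file: string, source: string): Violation[] {
  const hits = source.match(INTERNAL_IP) ?? [];
  return hits.map(ip => ({ rule: "no-direct-agent-call", detail: `${file}: ${ip}` }));
}

// Agents are not web services, so their responses must carry no CORS headers.
function checkAgentHeaders(headers: Record<string, string>): Violation[] {
  return Object.keys(headers)
    .filter(name => name.toLowerCase().startsWith("access-control-"))
    .map(name => ({ rule: "no-cors-on-agent", detail: name }));
}
```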