jester/zlh-grind


jester b03836057f docs: add architectural boundaries to prevent frontend-agent drift

2026-01-18 23:57:55 +00:00

4.5 KiB


ZeroLagHub Portal Migration (v2)

This document tracks the migration from the legacy portal model to the ZLH-native Portal v2.

Migration Summary

The portal is being rebuilt to support heterogeneous workloads:

Game servers (Minecraft initially)
Development servers (LXC-based)
Future non-game services

This required abandoning several legacy assumptions.

Key Architectural Shifts

1. Pterodactyl is no longer the control plane

No Docker-centric lifecycle assumptions
No monolithic server controllers
No HUD-style control surface

2. Agent-first runtime model

Each server/LXC runs a ZLH Agent
Agent is authoritative for:
- runtime state
- service health
- console output
API v2 brokers access, not execution

3. Dashboard redesign (Completed)

Dashboard is now:

Read-only
Awareness-focused
Non-operational

Features:

System Health indicator (frontend ↔ backend connectivity)
Notices panel with expandable timeline
Resource summaries (no controls)

4. Servers page redesign (In progress)

Servers page now:

Groups servers by type (GAME / DEV)
Uses expandable cards
Collapsed cards show:
- status
- uptime
- identity
Expanded cards show:
- runtime context
- metadata
- escalation action

Only action exposed:

System View (observation-first)

No start/stop/restart bulk actions exist.
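The grouping described above can be sketched as follows. This is a minimal illustration, not the portal's actual code: the `ServerSummary` shape, its field names, and `groupByType` are all hypothetical.

```typescript
// Hypothetical shapes: field names are illustrative, not the portal's real types.
type ServerType = "GAME" | "DEV";

interface ServerSummary {
  id: string;            // identity shown on the collapsed card
  name: string;
  type: ServerType;
  status: string;        // status shown on the collapsed card
  uptimeSeconds: number; // uptime shown on the collapsed card
}

// Group servers by type so the page can render one section per type.
// Input order is preserved within each group.
function groupByType(
  servers: ServerSummary[],
): Record<ServerType, ServerSummary[]> {
  const groups: Record<ServerType, ServerSummary[]> = { GAME: [], DEV: [] };
  for (const server of servers) {
    groups[server.type].push(server);
  }
  return groups;
}
```

Note that nothing here carries a start, stop, or restart handler. The card data is descriptive only, which matches the observation-first intent.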

Explicitly Removed Concepts

"Start All / Stop All / Restart All"
HUD-style control buttons
Console buttons on dashboard
AWS-style terminal metaphors

These are intentional removals.

Migration Status

Auth v2: ✅ Complete
Dashboard UX: ✅ Locked
Servers page UX: 🔄 Active
System View page: ⏳ Next
Billing integration: ⏸ Deferred

Live Status Model (Finalized)

The portal consumes aggregated live state from the API. It does not directly query agents, Proxmox, or exporters.

Status Layers

Host / Agent Health

Source: GET /health on agent
Cached in Redis
Determines host availability

Service Runtime

GAME:
- Source: GET /status
- Reflects actual game server lifecycle
DEV:
- No service runtime
- Status inferred from host availability

UI Mapping

Container Type	Host Online	Agent State	UI Status
DEV	true	n/a	running
DEV	false	n/a	offline
GAME	true	running	running
GAME	true	idle	stopped
GAME	false	n/a	offline

Notes

"Idle" means the game server is stopped, not broken
"Offline" means the host is unreachable
A UI refresh reflects the state cached in Redis, so agent changes may not appear instantly

This model intentionally mirrors Pterodactyl semantics.
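The UI Mapping table can be expressed as a small pure function. This is a sketch only: the type names and `toUiStatus` are hypothetical, and the handling of a GAME container with no reported agent state is an assumption, since the table does not cover that case.

```typescript
// Hypothetical names: the table defines the values, not these identifiers.
type ContainerType = "GAME" | "DEV";
type AgentState = "running" | "idle";
type UiStatus = "running" | "stopped" | "offline";

// Map (container type, host online, agent state) to the status the UI shows.
// Pass null for agentState where the table says "n/a".
function toUiStatus(
  type: ContainerType,
  hostOnline: boolean,
  agentState: AgentState | null,
): UiStatus {
  // Host unreachable: offline for every container type.
  if (!hostOnline) return "offline";
  // DEV has no service runtime; status is inferred from host availability.
  if (type === "DEV") return "running";
  // GAME: reflect the game server lifecycle reported by the agent.
  if (agentState === "running") return "running";
  // "idle" maps to stopped. ASSUMPTION: a missing state is also shown as
  // stopped; the table does not define this case.
  return "stopped";
}
```

Keeping this as one function gives a single place to check that the UI agrees with the table.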

Console Responsibility Split (Authoritative)

Agent

Owns PTY lifecycle
Owns process execution
Owns security boundary
Owns WebSocket console endpoint

API

Authenticates console access
Provides server metadata (vmid, type)
Does NOT proxy PTY traffic

Portal

Renders terminal UI
Sends raw input bytes
Displays streamed output
Must not assume process state

This split is intentional and enforced to prevent drift.
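The Portal side of this split might look like the sketch below. It is illustrative only: `ConsoleTransport`, `PortalConsole`, and `createPortalConsole` are hypothetical names, and the transport is left abstract because this document does not specify how console bytes are routed.

```typescript
// Hypothetical transport: stands in for whatever carries console bytes.
interface ConsoleTransport {
  send(data: Uint8Array): void;
}

interface PortalConsole {
  sendInput(bytes: Uint8Array): void;
  handleOutput(chunk: Uint8Array): void;
  outputChunks(): Uint8Array[];
}

function createPortalConsole(transport: ConsoleTransport): PortalConsole {
  // Streamed output in arrival order, for the terminal UI to render.
  const received: Uint8Array[] = [];
  return {
    // Forward input bytes untouched: no line editing, no command parsing.
    sendInput(bytes: Uint8Array): void {
      transport.send(bytes);
    },
    // Record each output chunk as it arrives; interpretation is left to
    // the terminal renderer.
    handleOutput(chunk: Uint8Array): void {
      received.push(chunk);
    },
    // Return a copy so callers cannot mutate the internal buffer list.
    outputChunks(): Uint8Array[] {
      return received.slice();
    },
  };
}
```

The sketch deliberately keeps no "is running" flag. The Portal only relays bytes and displays output, so it never has to guess at process state.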

Architectural Boundaries (CRITICAL)

What Frontend MUST NOT Do

Never Call Agents Directly

Frontend cannot reach container IPs
Frontend has no network path to agents
All agent access flows through API
This is a non-negotiable part of the architecture

Why This Matters

Container IPs are internal-only (10.x network)
No CORS headers on agents (they're not web services)
Direct calls would fail and break the security model
API enforces auth, rate limits, and access control

Correct Flow

User Action → Frontend → API → Agent → Response

Incorrect Flow (FORBIDDEN)

User Action → Frontend → Agent (FAILS - no network path)
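One way to catch the forbidden flow early is a guard that rejects internal container addresses before a request is built. This is a sketch under assumptions: both function names are hypothetical, and the check covers only dotted-quad IPv4 hosts in the 10.x range mentioned above.

```typescript
// Returns true for dotted-quad IPv4 addresses in 10.0.0.0/8.
// Anything that is not a valid dotted quad (e.g. a DNS name) returns false.
function isInternalContainerAddress(host: string): boolean {
  const parts = host.split(".");
  if (parts.length !== 4) return false;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return false;
  return octets[0] === 10;
}

// Hypothetical guard for frontend request helpers: throw instead of
// attempting a call that has no network path and bypasses the API.
function assertNotAgentTarget(host: string): void {
  if (isInternalContainerAddress(host)) {
    throw new Error(`Frontend must not call agents directly: ${host}`);
  }
}
```

A guard like this does not replace the network boundary. It only surfaces the mistake at development time, for example when a generated "quick fix" points a request at a container IP.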

What Can Break This

AI Coding Tools

May suggest "quick fixes" that call agents directly
May treat agents as HTTP APIs with CORS
May generate code that "just works" in the wrong way

Convenience Changes

Adding CORS headers to agents (never do this)
Exposing agent ports through a proxy (breaks security)
Creating frontend → agent shortcuts (breaks architecture)

Enforcement

If a change violates these boundaries:

The change must be reverted
The documentation takes precedence
AI tools must be corrected

These constraints override convenience.
