201 lines
4.5 KiB
Markdown
201 lines
4.5 KiB
Markdown
# ZeroLagHub Portal Migration (v2)
|
|
|
|
This document tracks the migration from the legacy portal model to the **ZLH-native Portal v2**.
|
|
|
|
---
|
|
|
|
## Migration Summary
|
|
|
|
The portal is being rebuilt to support **heterogeneous workloads**:
|
|
- Game servers (Minecraft initially)
|
|
- Development servers (LXC-based)
|
|
- Future non-game services
|
|
|
|
This required abandoning several legacy assumptions.
|
|
|
|
---
|
|
|
|
## Key Architectural Shifts
|
|
|
|
### 1. Pterodactyl is no longer the control plane
|
|
- No Docker-centric lifecycle assumptions
|
|
- No monolithic server controllers
|
|
- No HUD-style control surface
|
|
|
|
---
|
|
|
|
### 2. Agent-first runtime model
|
|
- Each server/LXC runs a ZLH Agent
|
|
- Agent is authoritative for:
|
|
- runtime state
|
|
- service health
|
|
- console output
|
|
- API v2 brokers access, not execution
|
|
|
|
---
|
|
|
|
### 3. Dashboard redesign (Completed)
|
|
|
|
Dashboard is now:
|
|
- Read-only
|
|
- Awareness-focused
|
|
- Non-operational
|
|
|
|
Features:
|
|
- System Health indicator (frontend ↔ backend connectivity)
|
|
- Notices panel with expandable timeline
|
|
- Resource summaries (no controls)
|
|
|
|
---
|
|
|
|
### 4. Servers page redesign (In progress)
|
|
|
|
Servers page now:
|
|
- Groups servers by type (GAME / DEV)
|
|
- Uses expandable cards
|
|
- Collapsed cards show:
|
|
- status
|
|
- uptime
|
|
- identity
|
|
- Expanded cards show:
|
|
- runtime context
|
|
- metadata
|
|
- escalation action
|
|
|
|
Only action exposed:
|
|
- **System View** (observation-first)
|
|
|
|
No start/stop/restart bulk actions exist.
|
|
|
|
---
|
|
|
|
## Explicitly Removed Concepts
|
|
|
|
- "Start All / Stop All / Restart All"
|
|
- HUD-style control buttons
|
|
- Console buttons on dashboard
|
|
- AWS-style terminal metaphors
|
|
|
|
These are intentional removals.
|
|
|
|
---
|
|
|
|
## Migration Status
|
|
|
|
- Auth v2: ✅ Complete
|
|
- Dashboard UX: ✅ Locked
|
|
- Servers page UX: 🔄 Active
|
|
- System View page: ⏳ Next
|
|
- Billing integration: ⏸ Deferred
|
|
|
|
---
|
|
|
|
## Live Status Model (Finalized)
|
|
|
|
The portal consumes **aggregated live state** from the API.
|
|
It does not directly query agents, Proxmox, or exporters.
|
|
|
|
### Status Layers
|
|
|
|
#### Host / Agent Health
|
|
- Source: `GET /health` on agent
|
|
- Cached in Redis
|
|
- Determines host availability
|
|
|
|
#### Service Runtime
|
|
- GAME:
|
|
- Source: `GET /status`
|
|
- Reflects actual game server lifecycle
|
|
- DEV:
|
|
- No service runtime
|
|
- Status inferred from host availability
|
|
|
|
### UI Mapping
|
|
|
|
| Container Type | Host Online | Agent State | UI Status |
|
|
|----------------|-------------|-------------|-----------|
|
|
| DEV | true | n/a | running |
|
|
| DEV | false | n/a | offline |
|
|
| GAME | true | running | running |
|
|
| GAME | true | idle | stopped |
|
|
| GAME | false | n/a | offline |
|
|
|
|
### Notes
|
|
- "Idle" does **not** mean broken
|
|
- Offline means host unreachable
|
|
- UI refresh reflects Redis state, not instant agent changes
|
|
|
|
This model intentionally mirrors Pterodactyl semantics.
|
|
|
|
---
|
|
|
|
## Console Responsibility Split (Authoritative)
|
|
|
|
### Agent
|
|
- Owns PTY lifecycle
|
|
- Owns process execution
|
|
- Owns security boundary
|
|
- Owns WebSocket console endpoint
|
|
|
|
### API
|
|
- Authenticates console access
|
|
- Provides server metadata (vmid, type)
|
|
- Does NOT proxy PTY traffic
|
|
|
|
### Portal
|
|
- Renders terminal UI
|
|
- Sends raw input bytes
|
|
- Displays streamed output
|
|
- Must not assume process state
|
|
|
|
This split is intentional and enforced to prevent drift.
|
|
|
|
---
|
|
|
|
## Architectural Boundaries (CRITICAL)
|
|
|
|
### What Frontend MUST NOT Do
|
|
|
|
**Never Call Agents Directly**
|
|
- Frontend cannot reach container IPs
|
|
- Frontend has no network path to agents
|
|
- All agent access flows through API
|
|
- This is non-negotiable architecture
|
|
|
|
**Why This Matters**
|
|
- Container IPs are internal-only (10.x network)
|
|
- No CORS headers on agents (they're not web services)
|
|
- Direct calls would fail and break the security model
|
|
- API enforces auth, rate limits, and access control
|
|
|
|
**Correct Flow**
|
|
```
|
|
User Action → Frontend → API → Agent → Response
|
|
```
|
|
|
|
**Incorrect Flow (FORBIDDEN)**
|
|
```
|
|
User Action → Frontend → Agent (FAILS - no network path)
|
|
```
|
|
|
|
### What Can Break This
|
|
|
|
**AI Coding Tools**
|
|
- May suggest "quick fixes" that call agents directly
|
|
- May treat agents as HTTP APIs with CORS
|
|
- May generate code that "just works" in wrong way
|
|
|
|
**Convenience Changes**
|
|
- Adding CORS headers to agents (never do this)
|
|
- Exposing agent ports through proxy (breaks security)
|
|
- Creating frontend → agent shortcuts (breaks architecture)
|
|
|
|
### Enforcement
|
|
|
|
If a change violates these boundaries:
|
|
- The change must be reverted
|
|
- The documentation takes precedence
|
|
- AI tools must be corrected
|
|
|
|
These constraints override convenience.
|