Add stabilization session findings and Fabric readiness timing root cause

This commit is contained in:
jester 2026-04-07 12:14:21 +00:00
parent 6d16e3e00e
commit f55dfac860

View File

@ -0,0 +1,72 @@
# ZeroLagHub System Stabilization & Fabric Proxy Findings
## Current State
Following API and frontend revisions, the system is now stable and deterministic:
- Backend lifecycle handling hardened — idempotent create/delete/start/stop
- Redis and database verified healthy
- Duplicate server creation traced to frontend (not API)
- Console routing corrected (browser → API → agent)
- Internal domain routing removed in favor of explicit IP-based communication
---
## Key Architectural Improvements
- Control plane now uses direct IP-based service communication
- Data plane remains domain-based (Traefik)
- Velocity rehydration now uses DB + Redis instead of Proxmox live state
- API now acts as the central authority for routing, console, and orchestration
---
## Fabric / Velocity Issue — Root Cause
The remaining issue with Fabric servers is NOT related to:
- Fabric API version
- FabricProxy-Lite version
- Mod installation
The issue is caused by **timing of server readiness vs Velocity registration**:
- Server is registered with Velocity before it is fully ready to accept proxy traffic
- This results in "proxy starting" errors until Velocity is restarted
- After restart, the server is already fully initialized and works correctly
**Conclusion:** The system lacks a reliable readiness signal for Fabric-based servers.
---
## Required Fix
Agent (or API) must delay Velocity registration until true readiness is confirmed.
### Recommended approach
1. Pre-seed Fabric API and FabricProxy config before first run ✅ (already done)
2. Do not rely solely on "Done" log output
3. Introduce readiness gating — one of:
- Short delay buffer after startup
- TCP port readiness check (port 25565 accepting connections)
- Log-based readiness confirmation (watch for specific "done" log line)
TCP port check is the most reliable — if the port is accepting connections, the server is ready. This is already what the agent's probe does. The issue is likely that Velocity registration happens before the probe confirms readiness.
### Fix location
Agent-level orchestration change — do not register with Velocity until the readiness probe returns success.
Avoid modifying Velocity itself — the issue is orchestration timing, not proxy configuration.
---
## Summary
| Issue | Status |
|-------|--------|
| Duplicate server creation | ✅ Fixed — was frontend, not API |
| DB/Redis state drift | ✅ Resolved — verified healthy |
| Console routing | ✅ Fixed |
| Internal DNS timing | ✅ Resolved — replaced with IP env vars |
| Fabric readiness timing | 🔧 Pending — readiness gating needed in agent |