Add stabilization session findings and Fabric readiness timing root cause
Parent: 6d16e3e00e
Commit: f55dfac860
File: SCRATCH/session-stabilization-fabric-findings.md (new file, 72 lines)

# ZeroLagHub – System Stabilization & Fabric Proxy Findings

## Current State

Following API and frontend revisions, the system is now stable and deterministic:

- Backend lifecycle handling hardened: create/delete/start/stop are now idempotent
- Redis and database verified healthy
- Duplicate server creation traced to the frontend (not the API)
- Console routing corrected (browser → API → agent)
- Internal domain routing removed in favor of explicit IP-based communication
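
To make "idempotent" concrete, here is a minimal sketch of lifecycle operations that are safe to repeat. It is an illustration only: `ServerStore`, its in-memory dictionary, and the state names are hypothetical stand-ins, not the actual ZeroLagHub backend.

```python
from dataclasses import dataclass, field


@dataclass
class ServerStore:
    """Hypothetical in-memory stand-in for the real database."""

    servers: dict = field(default_factory=dict)  # server_id -> state

    def create(self, server_id: str) -> bool:
        """Create a record; a repeated request is a no-op and returns False."""
        if server_id in self.servers:
            return False
        self.servers[server_id] = "stopped"
        return True

    def start(self, server_id: str) -> bool:
        """Mark a server running; starting a running server changes nothing."""
        if self.servers[server_id] == "running":  # KeyError if unknown
            return False
        self.servers[server_id] = "running"
        return True

    def stop(self, server_id: str) -> bool:
        """Mark a server stopped; stopping a stopped server changes nothing."""
        if self.servers[server_id] == "stopped":  # KeyError if unknown
            return False
        self.servers[server_id] = "stopped"
        return True

    def delete(self, server_id: str) -> bool:
        """Delete a record; deleting a missing server is a no-op."""
        return self.servers.pop(server_id, None) is not None
```

Each method returns whether it changed anything, so a duplicate request (for example a double-submitted create from the frontend) leaves the state exactly as the first request left it.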

---

## Key Architectural Improvements

- Control plane now uses direct IP-based service communication
- Data plane remains domain-based (Traefik)
- Velocity rehydration now uses DB + Redis instead of Proxmox live state
- API now acts as the central authority for routing, console, and orchestration
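
As an illustration of the IP-based control plane, the sketch below loads a service endpoint from an environment variable and rejects anything that is not a literal IPv4 address. The variable name `ZLH_AGENT_ADDR`, the `IP[:port]` value format, and the default port are assumptions made for this example, not the project's actual configuration.

```python
import ipaddress
import os


def load_endpoint(var_name: str, default_port: int, env=None):
    """Read 'IP[:port]' from an environment variable and validate the IP.

    Assumes IPv4 (an IPv6 literal would need bracket handling).
    """
    env = os.environ if env is None else env
    raw = env[var_name]  # KeyError if unset: fail fast at startup
    host, _, port = raw.partition(":")
    ipaddress.IPv4Address(host)  # ValueError if a hostname sneaks in
    return (host, int(port) if port else default_port)


if __name__ == "__main__":
    # Hypothetical variable name and address, for demonstration only.
    demo_env = {"ZLH_AGENT_ADDR": "10.0.0.5:9000"}
    print(load_endpoint("ZLH_AGENT_ADDR", 8080, env=demo_env))
```

Validating at startup means a stray domain name fails immediately instead of reintroducing DNS lookups into the control plane.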

---

## Fabric / Velocity Issue: Root Cause

The remaining issue with Fabric servers is NOT related to:

- Fabric API version
- FabricProxy-Lite version
- Mod installation

The issue is caused by the **timing of server readiness vs Velocity registration**:

- The server is registered with Velocity before it is fully ready to accept proxy traffic
- This results in "proxy starting" errors until Velocity is restarted
- After a Velocity restart, the server is already fully initialized by the time Velocity connects, so it works correctly

**Conclusion:** The system lacks a reliable readiness signal for Fabric-based servers.

---

## Required Fix

The agent (or API) must delay Velocity registration until true readiness is confirmed.

### Recommended approach

1. Pre-seed Fabric API and FabricProxy config before first run ✅ (already done)
2. Do not rely solely on "Done" log output
3. Introduce readiness gating, using one of:
   - Short delay buffer after startup
   - TCP port readiness check (port 25565 accepting connections)
   - Log-based readiness confirmation (watch for specific "done" log line)

The TCP port check is the most reliable of the three: if port 25565 is accepting connections, the server is ready. This is already what the agent's probe does, so the issue is likely that Velocity registration happens before the probe confirms readiness.
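
The TCP port check could look roughly like the sketch below. It is not the agent's actual probe; `wait_for_port` and its timeout and interval defaults are hypothetical. It retries a plain TCP connect until the port accepts a connection or a deadline passes.

```python
import socket
import time


def wait_for_port(host: str, port: int = 25565,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: wait and try again.
            time.sleep(interval)
    return False
```

Using `time.monotonic()` for the deadline keeps the wait immune to wall-clock adjustments on the host.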

### Fix location

This is an agent-level orchestration change: do not register with Velocity until the readiness probe returns success.

Avoid modifying Velocity itself; the issue is orchestration timing, not proxy configuration.
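
A minimal sketch of that ordering follows, assuming the agent can be handed two callables: `probe` (its existing readiness check) and `register` (whatever currently registers the server with Velocity). Both names, and the `register_when_ready` helper itself, are placeholders for illustration; Velocity is not touched.

```python
import time
from typing import Callable


def register_when_ready(probe: Callable[[], bool],
                        register: Callable[[], None],
                        timeout: float = 120.0,
                        interval: float = 2.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> bool:
    """Call register() only after probe() succeeds.

    Returns True if the server was registered, False if the probe never
    succeeded before the timeout (in which case register() is never called).
    """
    deadline = clock() + timeout
    while True:
        if probe():
            register()  # readiness confirmed: safe to expose via the proxy
            return True
        if clock() >= deadline:
            return False  # leave the server unregistered and report failure
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting; a caller would normally leave them at their defaults.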

---

## Summary

| Issue | Status |
|-------|--------|
| Duplicate server creation | ✅ Fixed (was frontend, not API) |
| DB/Redis state drift | ✅ Resolved (verified healthy) |
| Console routing | ✅ Fixed |
| Internal DNS timing | ✅ Resolved (replaced with IP env vars) |
| Fabric readiness timing | 🔧 Pending (readiness gating needed in agent) |