Monitoring Codex
This folder tracks the ZeroLagHub monitoring and observability workstream.
Monitoring implementation lives across infrastructure, API, Agent, Prometheus, Grafana, Alloy, and future log collection. This folder is the coordination layer for the monitoring contract, launch-readiness status, and validation plan.
Files
- CURRENT_STATE.md - observed monitoring state and launch-readiness posture
- CONTRACT.md - intended monitoring/discovery/label contract
- OPEN_ITEMS.md - active monitoring blockers and follow-up work
- VALIDATION.md - smoke-test and acceptance checklist
Ownership boundary
zlh-grind tracks coordination and decisions only. Implementation belongs in the relevant source/config locations:
- zpack-api for monitoring discovery endpoints, health endpoints, app metrics, and auth boundaries
- zlh-agent for Alloy config/labels, structured logs, and container-local telemetry behavior
- monitoring host configuration for Prometheus, Grafana, Alloy, dashboards, firewall/bind policy, and file_sd sync
- infrastructure layer for OPNsense/PBS monitoring exceptions
Current launch posture
Core lifecycle monitoring is launch-ready for game/dev add-remove visibility and basic metrics debugging.
The operational source of truth is /etc/zlh-monitor. Prometheus, Grafana provisioning, dashboard JSON, file_sd discovery output, and monitoring-host Alloy config should be read and managed from that layout rather than from legacy default paths.
Validated launch behavior includes:
- API discovery -> monitor sync -> Prometheus file_sd for game/dev lifecycle inventory
- new game/dev containers appear as game-dev-alloy targets
- deleted game/dev containers disappear and file_sd can intentionally become []
- container Alloy remote-writes metrics to Prometheus at 10.60.0.25:9090
- container Alloy direct scrape health works on 0.0.0.0:12345
- Grafana dashboards are provisioned from /etc/zlh-monitor/grafana/provisioning
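As a rough illustration of the discovery -> sync -> file_sd step, the sketch below turns a container inventory into Prometheus file_sd target groups and writes them atomically. It is not the actual sync implementation: the inventory shape (`id`, `ip`, `kind`), the label names, and the function names are all assumptions made for the example. Only the scrape port 12345 and the empty-list behavior come from the notes above.

```python
import json
import os
import tempfile


def build_file_sd(containers, port=12345):
    """Convert a discovery inventory into Prometheus file_sd target groups.

    `containers` is a hypothetical list of dicts with "id", "ip", and "kind"
    keys; the real discovery payload may look different.
    """
    groups = []
    # Sort for stable output so unchanged inventories produce identical files.
    for c in sorted(containers, key=lambda c: c["id"]):
        groups.append({
            "targets": [f"{c['ip']}:{port}"],
            # Label names here are illustrative, not the agreed label contract.
            "labels": {"container_id": str(c["id"]), "kind": c["kind"]},
        })
    return groups  # an empty inventory intentionally yields []


def write_file_sd(path, groups):
    """Write the target file atomically so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(groups, fh, indent=2)
            fh.write("\n")
        os.replace(tmp, path)  # atomic rename within the same directory
    except BaseException:
        os.unlink(tmp)
        raise
```

Writing to a temporary file and renaming it keeps the "all containers deleted" case safe: the file becomes a valid `[]` rather than being truncated or removed mid-sync.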
Remaining future work is tracked in OPEN_ITEMS.md, especially centralized logs/Loki and optional OPNsense router-only monitoring. Those are known follow-ups, not blockers to the current core lifecycle monitoring path unless launch policy changes.