# Monitoring Codex

This folder tracks the ZeroLagHub monitoring and observability workstream.

Monitoring implementation lives across infrastructure, API, Agent, Prometheus, Grafana, Alloy, and future log collection. This folder is the coordination layer for the monitoring contract, launch-readiness status, and validation plan.

## Files

- `CURRENT_STATE.md` - observed monitoring state and launch-readiness posture
- `CONTRACT.md` - intended monitoring/discovery/label contract
- `OPEN_ITEMS.md` - active monitoring blockers and follow-up work
- `VALIDATION.md` - smoke-test and acceptance checklist

## Ownership boundary

`zlh-grind` tracks coordination and decisions only. Implementation belongs in the relevant source/config locations:

- `zpack-api` for monitoring discovery endpoints, health endpoints, app metrics, and auth boundaries
- `zlh-agent` for Alloy config/labels, structured logs, and container-local telemetry behavior
- monitoring host configuration for Prometheus, Grafana, Alloy, dashboards, firewall/bind policy, and file_sd sync
- infrastructure layer for OPNsense/PBS monitoring exceptions

## Current launch posture

Core lifecycle monitoring is launch-ready for game/dev add-remove visibility and basic metrics debugging.

The operational source of truth is `/etc/zlh-monitor`. Prometheus, Grafana provisioning, dashboard JSON, file_sd discovery output, and monitoring-host Alloy config should be managed from that layout rather than from legacy default paths.

Validated launch behavior includes:

- API discovery -> monitor sync -> Prometheus file_sd for game/dev lifecycle inventory
- new game/dev containers appear as `game-dev-alloy` targets
- deleted game/dev containers disappear, and the file_sd output can intentionally become `[]`
- container Alloy remote-writes metrics to Prometheus at `10.60.0.25:9090`
- container Alloy direct scrape health works on `0.0.0.0:12345`
- Grafana dashboards are provisioned from `/etc/zlh-monitor/grafana/provisioning`

Remaining future work is tracked in `OPEN_ITEMS.md`, especially centralized logs/Loki and optional OPNsense router-only monitoring. Those are known follow-ups, not blockers for the current core lifecycle monitoring path unless launch policy changes.
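As a rough illustration of the API discovery -> monitor sync -> file_sd path listed under validated launch behavior, a minimal Python sketch might look like the following. The inventory record shape (`id`, `address`, `kind`), the label names, and both function names are hypothetical and are not taken from `CONTRACT.md`; the scrape port 12345 is assumed from the container Alloy listen address. Only the output shape, a JSON list of `targets`/`labels` groups, follows the documented Prometheus file_sd format.

```python
import json
import os
import tempfile


def build_target_groups(containers, port=12345):
    """Turn discovery records into Prometheus file_sd target groups.

    Each record is assumed (hypothetically) to look like
    {"id": 7, "address": "10.0.0.2", "kind": "game"}.
    """
    groups = []
    # Sort so that repeated syncs of the same inventory produce identical files.
    for record in sorted(containers, key=lambda r: r["address"]):
        groups.append({
            "targets": [f"{record['address']}:{port}"],
            # Hypothetical labels; the real label contract lives elsewhere.
            "labels": {
                "container_id": str(record["id"]),
                "kind": record["kind"],
            },
        })
    return groups


def write_file_sd(path, groups):
    """Atomically replace the file_sd file so a reader never sees a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(groups, handle, indent=2)
            handle.write("\n")
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Calling `write_file_sd(path, build_target_groups([]))` writes a file containing `[]`, which matches the intentionally empty state after the last game/dev container is deleted. Writing to a temporary file and renaming it is one common way to avoid Prometheus reading a half-written file; whether the actual sync job does this is not stated here.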