From 2b3a5523587a8848e14c6b651b06d3c9063eca84 Mon Sep 17 00:00:00 2001
From: jester
Date: Fri, 1 May 2026 21:05:56 +0000
Subject: [PATCH] Update monitoring launch posture

---
 Codex/Monitoring/README.md | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/Codex/Monitoring/README.md b/Codex/Monitoring/README.md
index 8b54809..ab3b519 100644
--- a/Codex/Monitoring/README.md
+++ b/Codex/Monitoring/README.md
@@ -6,10 +6,10 @@ Monitoring implementation lives across infrastructure, API, Agent, Prometheus, G
 
 ## Files
 
-- `CURRENT_STATE.md` — observed monitoring state and launch-readiness posture
-- `CONTRACT.md` — intended monitoring/discovery/label contract
-- `OPEN_ITEMS.md` — active monitoring blockers and follow-up work
-- `VALIDATION.md` — smoke-test and acceptance checklist
+- `CURRENT_STATE.md` - observed monitoring state and launch-readiness posture
+- `CONTRACT.md` - intended monitoring/discovery/label contract
+- `OPEN_ITEMS.md` - active monitoring blockers and follow-up work
+- `VALIDATION.md` - smoke-test and acceptance checklist
 
 ## Ownership boundary
 
@@ -22,4 +22,17 @@ Monitoring implementation lives across infrastructure, API, Agent, Prometheus, G
 
 ## Current launch posture
 
-As of the latest monitoring audit, monitoring is **not launch-ready**. Core services are running, but public exposure, stale/failing discovery, missing dashboards, and missing centralized logs block launch-debug readiness.
+Core lifecycle monitoring is launch-ready for game/dev add-remove visibility and basic metrics debugging.
+
+The operational source of truth is `/etc/zlh-monitor`. Prometheus, Grafana provisioning, dashboard JSON, file_sd discovery output, and monitoring-host Alloy config should be managed from that layout rather than from legacy default paths.
+
+Validated launch behavior includes:
+
+- API discovery -> monitor sync -> Prometheus file_sd for game/dev lifecycle inventory
+- new game/dev containers appear as `game-dev-alloy` targets
+- deleted game/dev containers disappear and file_sd can intentionally become `[]`
+- container Alloy remote-writes metrics to Prometheus at `10.60.0.25:9090`
+- container Alloy direct scrape health works on `0.0.0.0:12345`
+- Grafana dashboards are provisioned from `/etc/zlh-monitor/grafana/provisioning`
+
+Remaining future work is tracked in `OPEN_ITEMS.md`, especially centralized logs/Loki and optional OPNsense router-only monitoring. Those are known follow-ups, not blockers to the current core lifecycle monitoring path unless launch policy changes.
\ No newline at end of file
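
The file_sd step of the lifecycle path described in this patch (API discovery -> monitor sync -> Prometheus file_sd) can be sketched in Python. This is a minimal sketch under stated assumptions, not the actual monitor sync implementation. The `Container` fields, the `container`/`kind` labels, and the output file name are hypothetical. Only the Alloy port `12345` and the behavior "an empty inventory becomes `[]`" come from the README text.

```python
"""Hypothetical sketch of the file_sd step: render game/dev inventory as
Prometheus file_sd JSON. Inventory fields and labels are assumptions."""
import json
import os
import tempfile
from dataclasses import dataclass

# Container Alloy listen port, taken from the README (0.0.0.0:12345).
ALLOY_PORT = 12345


@dataclass(frozen=True)
class Container:
    name: str  # hypothetical inventory field
    ip: str    # hypothetical inventory field
    kind: str  # hypothetical: "game" or "dev"


def render_file_sd(containers):
    """Return a list of file_sd target groups, sorted by container name.

    An empty inventory renders as [], i.e. no targets: the intentional
    state after the last game/dev container has been deleted.
    """
    return [
        {
            "targets": [f"{c.ip}:{ALLOY_PORT}"],
            "labels": {"container": c.name, "kind": c.kind},
        }
        for c in sorted(containers, key=lambda c: c.name)
    ]


def write_file_sd(path, containers):
    """Write the JSON atomically so a reader sees the old or new file, never a partial one."""
    data = json.dumps(render_file_sd(containers), indent=2) + "\n"
    directory = os.path.dirname(path) or "."
    # Temp file in the same directory so os.replace is an atomic rename.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        # mkstemp creates 0600 files; widen so a separate prometheus user can read it.
        os.chmod(tmp, 0o644)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Prometheus would pick the file up through a `file_sd_configs` entry (a `files:` list) in the scrape job. The real output location under `/etc/zlh-monitor` is not given in this patch, so the path is left to the caller. Writing to a temp file and renaming keeps Prometheus from ever seeing a half-written target list during add/remove churn.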