# Monitoring — Decisions

Settled monitoring choices / do-not-re-litigate notes.

## Runtime source of truth

`/etc/zlh-monitor` is the operational source of truth for the monitoring host.

- Prometheus runtime config: `/etc/zlh-monitor/prometheus/prometheus.yml`
- Prometheus service wiring: `/etc/default/prometheus`
- Grafana provisioning: `/etc/zlh-monitor/grafana/provisioning`
- Grafana dashboards: `/etc/zlh-monitor/grafana/dashboards`

Do not treat legacy `/etc/prometheus` or default Grafana sample provisioning as authoritative once the zlh-monitor layout is in place.

## Metrics model

Monitoring is not API-only.

- API discovery + monitor sync + file_sd owns lifecycle inventory and add/remove validation.
- Container Alloy remote-write to Prometheus is the canonical game/dev metrics path.
- Prometheus listens on `10.60.0.25:9090` so Alloy remote-write from hosts/containers can reach it.
- Grafana reads Prometheus for dashboards.

## Container Alloy model

Alloy is the standard collector for game/dev containers.

- Container Alloy listens on `0.0.0.0:12345`.
- `game-dev-alloy` scrape health is valid because container Alloy is reachable on the container IP.
- Container Alloy also remote-writes metrics to Prometheus.
- `node_exporter` is not used in game/dev containers.

Template note should reflect:

```text
Alloy Metrics/UI: :12345/metrics
Alloy Listen Addr: 0.0.0.0:12345
Node Exporter: not used
```

## Discovery behavior

Dynamic game/dev lifecycle discovery excludes non-lifecycle core dev systems by CIDR.

Core/dev infrastructure such as `9091 / zpack-dev-velocity / 10.60.0.220` must not appear as a lifecycle dev container.

Monitor sync allows empty discovery:

```text
ALLOW_EMPTY="true"
```

This is intentional so deleting the last game/dev container clears:

```text
/etc/zlh-monitor/prometheus/file_sd/game-dev-alloy.json -> []
```

## Grafana model

Grafana provisions from `/etc/zlh-monitor/grafana/provisioning` and dashboards from `/etc/zlh-monitor/grafana/dashboards`.


Live dashboards:

- Core Services - Overview
- Core Service - Detail
- Game Containers
- Dev Containers

The old node_exporter router dashboard is archived at:

```text
/etc/zlh-monitor/grafana/dashboards-archive/core-routers.json
```

## Logging model

No Loki is currently enabled.

Future direction:

- Loki should be centralized shared infrastructure.
- Do not run Loki per game/dev container.
- Containers should ship selected logs through Alloy when centralized logging is enabled.

## Platform exceptions

OPNsense and PBS remain explicit platform monitoring exceptions.

OPNsense may later need a router-only `node_exporter` job or another router-specific monitoring path. That should stay separate from the game/dev container Alloy contract.
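
## Discovery sketch (illustrative)

The discovery rules above (CIDR exclusion of core dev systems, plus an empty `file_sd` result being valid) can be sketched as follows. This is not the actual monitor sync implementation. The `/32` exclusion range, the `id`/`name`/`ip` record fields, the label names, and the function names are all assumptions made for illustration. Only the Alloy port `12345`, the excluded example host `10.60.0.220`, and the `ALLOW_EMPTY` behavior come from this document.

```python
import ipaddress
import json

# Assumption: the real excluded CIDR(s) are not stated in this document.
# A /32 covering the one named core dev host is used as a stand-in.
EXCLUDED_CIDRS = [ipaddress.ip_network("10.60.0.220/32")]

# From the Container Alloy model: Alloy listens on port 12345.
ALLOY_PORT = 12345


def is_lifecycle(ip: str) -> bool:
    """Return True if the IP is outside every excluded core/dev CIDR."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in EXCLUDED_CIDRS)


def build_file_sd(containers, allow_empty=True) -> str:
    """Render Prometheus file_sd JSON for lifecycle containers only.

    `containers` is a list of dicts with hypothetical keys id, name, ip.
    With allow_empty=True, zero remaining targets renders as "[]".
    """
    groups = [
        {
            "targets": [f"{c['ip']}:{ALLOY_PORT}"],
            "labels": {"id": str(c["id"]), "name": c["name"]},
        }
        for c in containers
        if is_lifecycle(c["ip"])
    ]
    if not groups and not allow_empty:
        raise ValueError("empty discovery result and ALLOW_EMPTY is not true")
    return json.dumps(groups, indent=2)
```

With empty discovery allowed, a discovery result that contains only the excluded core dev host renders as `[]`, which matches the cleared `game-dev-alloy.json` state described under Discovery behavior.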