diff --git a/Codex/Monitoring/CONTRACT.md b/Codex/Monitoring/CONTRACT.md
index a2c2f0c..36a9af4 100644
--- a/Codex/Monitoring/CONTRACT.md
+++ b/Codex/Monitoring/CONTRACT.md
@@ -148,7 +148,8 @@ Agent/container-side monitoring responsibilities:
 - keep local Alloy HTTP bound to loopback unless direct scrape health is intentionally adopted
 - write or refresh Alloy config after `POST /config`
 - keep labels minimal and consistent with dashboards/discovery
-- eventually emit structured logs suitable for Loki/Alloy log ingestion
+- eventually ship selected container logs through Alloy to centralized Loki
+- do not run Loki itself inside each game/dev container
 
 ## Monitoring host responsibilities
 
@@ -162,6 +163,7 @@ The monitoring host must:
 - alert on missing/stale remote-write metrics by VMID/type
 - alert on API discovery endpoint failure
 - store bearer tokens with tight permissions or systemd credentials
+- host or reach the centralized Loki service once log collection is enabled
 
 ## Grafana responsibilities
 
@@ -173,11 +175,33 @@ Grafana must have:
 - dev container dashboard installed
 - dashboard queries aligned with the canonical label contract
 - dashboards based on remote-write metric freshness for game/dev container health unless direct scrape health is intentionally enabled
+- Loki datasource installed once centralized logs are enabled
 
 Dashboard JSON existing on disk is not sufficient; dashboards must be visible in Grafana.
 
 ## Logs contract
 
+Loki should be centralized, not installed per game/dev container.
+
+Preferred launch direction:
+
+```text
+container/app logs
+  -> container Alloy log collection
+  -> centralized Loki
+  -> Grafana Explore / dashboard links
+```
+
+Container Alloy should tail only selected, useful logs and attach stable labels such as:
+
+- `vmid`
+- `instance`
+- `container_type`
+- `server_id` where safe/useful
+- `log_source`, for example `agent`, `minecraft`, `codeserver`, `provisioning`
+
+Do not turn every container into a Loki server. Loki belongs on the monitoring/logging side as shared infrastructure. Containers should run Alloy as the lightweight collector/shipper.
+
 Launch-ready debugging requires centralized logs for at least:
 
 - API application logs
@@ -189,7 +213,17 @@ Launch-ready debugging requires centralized logs for at least:
 - DNS/edge publish and cleanup actions
 - monitoring discovery sync
 
-Preferred direction: Loki or Alloy log ingestion. Until this exists, launch-debug readiness remains incomplete.
+Initial container log candidates:
+
+- Agent service logs
+- Minecraft/server stdout logs where useful
+- code-server logs for dev containers
+- provisioning/config application logs
+- backup/restore operation logs
+
+Avoid collecting noisy or sensitive logs by default. Review anything that may include customer secrets, tokens, console input, file contents, or environment variables before shipping to Loki.
+
+Until centralized Loki/Alloy log shipping exists, launch-debug readiness remains incomplete.
 
 ## Security contract
 
@@ -201,3 +235,4 @@ Required before launch:
 - API monitoring endpoints fail closed without token
 - no monitoring bearer token in world-readable files
 - no token leakage in labels, target URLs, dashboards, or logs
+- Loki, once added, is not publicly reachable and accepts writes only from intended Alloy collectors or trusted internal paths
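
Reviewer note: the stable-label list in the logs contract and the "no token leakage in labels" rule in the security contract could be backed by a small automated check. The following is a minimal sketch in Python, not part of this change. The function name `check_stream_labels`, the closed set of `log_source` values (the contract lists them only as examples), and the token-detection heuristic are all assumptions made for illustration.

```python
import re

# Label keys taken from the "Logs contract" section of the diff.
ALLOWED_LABEL_KEYS = {"vmid", "instance", "container_type", "server_id", "log_source"}

# Assumption: the contract gives these as examples; treating them as a
# closed set is a choice made for this sketch only.
KNOWN_LOG_SOURCES = {"agent", "minecraft", "codeserver", "provisioning"}

# Assumption: a crude heuristic for token-like values, matching either a
# "Bearer " prefix or a run of 32+ URL-safe characters.
TOKEN_LIKE = re.compile(r"^bearer\s|[A-Za-z0-9_\-]{32,}", re.IGNORECASE)


def check_stream_labels(labels: dict[str, str]) -> list[str]:
    """Return human-readable contract violations; an empty list means OK."""
    problems = []
    for key, value in labels.items():
        if key not in ALLOWED_LABEL_KEYS:
            problems.append(f"unexpected label key: {key}")
        if TOKEN_LIKE.search(value):
            problems.append(f"label {key} looks like it carries a token")
    source = labels.get("log_source")
    if source is not None and source not in KNOWN_LOG_SOURCES:
        problems.append(f"unknown log_source: {source}")
    return problems


if __name__ == "__main__":
    good = {"vmid": "101", "instance": "game-101",
            "container_type": "game", "log_source": "minecraft"}
    bad = {"vmid": "101", "token": "Bearer abc", "log_source": "nginx"}
    print(check_stream_labels(good))
    print(check_stream_labels(bad))
```

A check like this could run wherever the Alloy config is generated, so that a label set violating the contract is caught before any logs are shipped. The heuristic will flag long but harmless identifiers, so it is better suited to prompting a manual review than to hard-failing a deploy.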