Monitoring — Validation Checklist

Use this checklist to decide whether launch monitoring is ready.

For each item record:

Security/access validation

Prometheus is not reachable from untrusted networks.
Grafana is not reachable from untrusted networks except through the intended authenticated/admin path.
node_exporter is not reachable from untrusted networks.
UFW/nftables or equivalent host/network policy restricts monitoring ports.
/sd/exporters returns 401 without a token.
/monitoring/game-dev returns 401 without a token.
The bearer token is not present in labels, scrape URLs, dashboard JSON, or logs.
Bearer token files are readable only by the service that needs them (for example, mode 0600), or the token is supplied through systemd credentials.
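
The two token checks above can be scripted. The following is a minimal sketch, not the project's own tooling: the function names are hypothetical, and the base URL and token file path you pass in depend on your deployment. It treats only an HTTP 401 as a pass for the anonymous request, and it treats any group or other permission bit on the token file as a failure.

```python
import os
import stat
import urllib.error
import urllib.request

# Connect directly and ignore proxy environment variables, so the check
# exercises the endpoint itself rather than an intermediate proxy.
_direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def rejects_anonymous(url: str, timeout: float = 5.0) -> bool:
    """Return True only if a GET without credentials yields HTTP 401."""
    try:
        with _direct.open(url, timeout=timeout):
            pass
    except urllib.error.HTTPError as err:
        return err.code == 401
    except urllib.error.URLError:
        # Unreachable is not proof that the endpoint enforces auth.
        return False
    # A successful response means the endpoint answered without a token.
    return False


def token_file_is_private(path: str) -> bool:
    """Return True if the token file grants nothing to group or other."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

For example, call rejects_anonymous once with the URL of /sd/exporters and once with the URL of /monitoring/game-dev on the API host, and record both results against the checklist items.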

Discovery validation

Authenticated /sd/exporters returns the current active game/dev targets.
Authenticated /monitoring/game-dev returns the current active game/dev containers.
file_sd output is updated after each sync.
file_sd output is not empty unless there truly are no active game/dev containers.
Stale VMIDs are absent from file_sd after deletion and sync.
No stale target remains down without an explicit, recorded reason.
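
The file_sd checks above can be sketched as a comparison between the generated file and the set of containers that should exist. This is an illustration under assumptions: it relies on the standard Prometheus file_sd JSON layout (a list of objects with "targets" and "labels"), and it assumes each target group carries its container ID in a label named "vmid", which may differ in your setup. The function name is hypothetical.

```python
import json


def audit_file_sd(file_sd_text: str, active_vmids: set, label: str = "vmid") -> dict:
    """Compare VMIDs listed in a file_sd document with the active set."""
    groups = json.loads(file_sd_text)
    listed = {
        str(group["labels"][label])
        for group in groups
        if label in group.get("labels", {})
    }
    return {
        # Present in file_sd but no longer active: should vanish after sync.
        "stale": sorted(listed - active_vmids),
        # Active but not discoverable: discovery add did not work.
        "missing": sorted(active_vmids - listed),
        # An empty file is only acceptable when nothing is active.
        "unexpectedly_empty": not groups and bool(active_vmids),
    }
```

Run it once after creating a container and once after deleting it and syncing; a clean result has empty "stale" and "missing" lists both times.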

Create a test game container and verify:

Create a test dev container and verify:

API validation

API host/system metrics are present.
API application health target exists or equivalent health metric exists.
API discovery endpoint failures are visible in Prometheus or logs.
API create/provision/delete/backup/restore failures are visible in centralized logs or explicitly accepted as a launch gap.
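
One way to confirm that an API health target exists is to inspect the result of an instant query for the up metric, which Prometheus serves from its HTTP API at /api/v1/query. The sketch below only parses the JSON response body; fetching it, and the job name used for the API (shown as "api" in the usage note), are assumptions about your environment. The function name is hypothetical.

```python
import json


def summarize_up(query_response_text: str) -> dict:
    """Group scrape targets from an `up` instant-query response by state."""
    body = json.loads(query_response_text)
    if body.get("status") != "success":
        raise ValueError("Prometheus query failed: %s" % body.get("error", "unknown error"))
    up, down, jobs = [], [], set()
    for sample in body["data"]["result"]:
        metric = sample["metric"]
        instance = metric.get("instance", "<no instance label>")
        jobs.add(metric.get("job", ""))
        # Each value is [timestamp, "<string>"]; "1" means the last scrape succeeded.
        (up if sample["value"][1] == "1" else down).append(instance)
    return {"up": sorted(up), "down": sorted(down), "jobs": sorted(jobs)}
```

If the API is scraped under a job named "api", that name should appear in "jobs", and every instance listed under "down" needs either a fix or a recorded reason.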

Monitoring can be considered launch-ready only when:

monitoring endpoints are not publicly exposed
discovery add/remove works for game and dev containers
stale file_sd entries do not persist after successful sync
dashboards are installed and show current data for active targets
API/app health is visible
logs are centralized or the missing log surface is explicitly accepted as a launch risk
discovery/auth token handling is locked down