Monitoring — Open Items

Keep only unfinished monitoring/observability work here.

Active

Add centralized logs.
- No Loki currently.
- Future direction: centralized Loki as shared infrastructure.
- Containers should ship selected logs through Alloy; do not run Loki per game/dev container.
- Desired sources:
  - API application logs
  - Agent logs
  - provisioning logs
  - backup/restore logs
  - delete/teardown logs
  - Velocity/DNS/edge action logs
  - discovery sync logs

Decide whether OPNsense needs router-only monitoring.
- Current game/dev monitoring intentionally uses Alloy, not node_exporter.
- OPNsense may still need a router-only node_exporter job or another router-specific monitoring path later.
- Keep this separate from the game/dev container Alloy contract.

Add API app health scrape or equivalent application-level health signal.

Add lifecycle/debug telemetry panels or metrics for:
- ready
- connectable
- operationInProgress
- operationType
- backup/restore state
- code-server state
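
A minimal sketch of how the lifecycle fields above could be turned into scrapeable metrics, assuming they are available as a simple per-container record. The `LifecycleState` class, the `lifecycle_*` metric names, and the example label values are hypothetical; only the field names come from the list. Output is Prometheus text exposition format. Backup/restore state and code-server state are left out because their value sets are not decided yet.

```python
from dataclasses import dataclass


@dataclass
class LifecycleState:
    """Hypothetical per-container state; field names mirror the list above."""
    vmid: str
    instance: str
    container_type: str
    ready: bool
    connectable: bool
    operation_in_progress: bool
    operation_type: str  # empty string when no operation is running


def _labels(pairs: dict) -> str:
    # Render key="value" pairs, escaping backslash, quote and newline.
    def esc(value: str) -> str:
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return ",".join(f'{k}="{esc(v)}"' for k, v in pairs.items())


def render_metrics(state: LifecycleState) -> str:
    # Only the minimal required labels are attached to every sample.
    base = {
        "vmid": state.vmid,
        "instance": state.instance,
        "container_type": state.container_type,
    }
    lines = [
        f"lifecycle_ready{{{_labels(base)}}} {int(state.ready)}",
        f"lifecycle_connectable{{{_labels(base)}}} {int(state.connectable)}",
        f"lifecycle_operation_in_progress{{{_labels(base)}}} {int(state.operation_in_progress)}",
    ]
    if state.operation_type:
        # Info-style sample: the operation type is a label, the value is always 1.
        op = {**base, "operation_type": state.operation_type}
        lines.append(f"lifecycle_operation_info{{{_labels(op)}}} 1")
    return "\n".join(lines) + "\n"
```

Booleans are emitted as 0/1 gauges so panels and alerts can use them directly; the operation type is carried as a label on a separate sample instead of being encoded as a number.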
Standardize labels: prefer canonical container_type over mixed ctype / container_type.
Decide canonical label spelling for hostname/name/server identity.
Keep required game/dev labels minimal: vmid, instance, container_type.
Review any user/customer-identifying labels before dashboard or alert exposure.
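
A minimal sketch of a label check that could support the standardization items above, assuming a target's labels are available as a plain dict. The function names and the alias table are hypothetical; the required labels and the `ctype` / `container_type` pair come from this list.

```python
REQUIRED_LABELS = ("vmid", "instance", "container_type")

# Legacy spelling -> canonical spelling (assumed mapping, based on the item above).
LEGACY_ALIASES = {"ctype": "container_type"}


def normalize_labels(labels: dict) -> dict:
    """Return a copy with legacy label names rewritten to canonical ones."""
    out = {}
    for key, value in labels.items():
        canonical = LEGACY_ALIASES.get(key, key)
        # If both spellings are present with different values, refuse to guess.
        if canonical in out and out[canonical] != value:
            raise ValueError(f"conflicting values for {canonical!r}")
        out[canonical] = value
    return out


def missing_required(labels: dict) -> list:
    """List required labels that are absent after normalization."""
    normalized = normalize_labels(labels)
    return [name for name in REQUIRED_LABELS if name not in normalized]
```

For example, `missing_required({"vmid": "101", "ctype": "game"})` reports only `instance`, since `ctype` is accepted as `container_type` after normalization.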
Add runbooks for debugging discovery sync failures and failed game/dev provisioning using monitoring data only.