zlh-grind/Session_Summaries/2026-03-10_Agent_Observability_Update.md

960 B

Agent Observability Update

Date: 2026-03-10

This session improved agent observability without altering crash semantics or restart behavior.


Structured Logging

Operational logs now use component tags:

  • [process]
  • [http]
  • [files]
  • [console]
  • [state]

Each log entry includes the VMID when available.


Crash Metadata

The agent now records and exposes via /status:

  • lastCrashTime
  • lastCrashExitCode
  • lastCrashSignal
  • lastCrashUptimeSeconds
  • lastCrashLogTail

Readiness Timeout

A constant was introduced:

const ReadinessTimeout = 60 * time.Second

This centralizes the readiness policy and allows tuning without searching for hardcoded values.


Crash Semantics

Crash classification unchanged:

unexpected exit  →  crashed
intentional stop →  idle
readiness failure → error

This ensures crash metrics remain accurate and are not inflated by intentional operations.