zlh-grind/Session_Summaries/2026-03-10_Agent_Observability_Update.md

58 lines
960 B
Markdown

# Agent Observability Update
Date: 2026-03-10
This session improved agent observability without altering crash semantics or restart behavior.
---
## Structured Logging
Operational logs now use component tags:
- `[process]`
- `[http]`
- `[files]`
- `[console]`
- `[state]`
Each log entry includes the VMID when available.
---
## Crash Metadata
The agent now records and exposes via `/status`:
- `lastCrashTime`
- `lastCrashExitCode`
- `lastCrashSignal`
- `lastCrashUptimeSeconds`
- `lastCrashLogTail`
---
## Readiness Timeout
A constant was introduced:
```go
const ReadinessTimeout = 60 * time.Second
```
This centralizes the readiness policy and allows tuning without searching for hardcoded values.
---
## Crash Semantics
Crash classification unchanged:
```
unexpected exit → crashed
intentional stop → idle
readiness failure → error
```
This ensures crash metrics remain accurate and are not inflated by intentional operations.