Agent auth smoke results and remaining runtime deviations #3

Open
opened 2026-05-01 14:10:15 +00:00 by jester · 0 comments
Owner

Status

Agent fail-closed auth work has been implemented and verified.

Completed

  • internal/auth/auth.go: protected routes now return 401 when ZLH_AGENT_TOKEN is unset.
  • internal/http/agent_test.go: added route auth tests for /status, /ready, /config, lifecycle endpoints, files, backups, mods, update, metrics, and code-server endpoints.
  • Added WebSocket console unauthorized handshake test.
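
For reference, the fail-closed behavior described above can be sketched as a small middleware. This is a minimal illustration, not the code in internal/auth/auth.go: the `requireToken` and `probe` names and the `Authorization: Bearer` header scheme are assumptions, since this issue does not say how the token is presented.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireToken (hypothetical name) wraps next and answers 401 unless a token
// is configured AND the request presents exactly that token.
// The "Authorization: Bearer <token>" scheme is an assumption.
func requireToken(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		header := r.Header.Get("Authorization")
		hasScheme := strings.HasPrefix(header, "Bearer ")
		presented := strings.TrimPrefix(header, "Bearer ")
		match := subtle.ConstantTimeCompare([]byte(presented), []byte(token)) == 1
		// Fail closed: an unset token rejects every request, whatever it carries.
		if token == "" || !hasScheme || !match {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// probe sends one in-process GET /status through the middleware and returns
// the status code. configured is the server-side token, presented the client's.
func probe(configured, presented string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/status", nil)
	if presented != "" {
		req.Header.Set("Authorization", "Bearer "+presented)
	}
	rec := httptest.NewRecorder()
	requireToken(configured, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(probe("", "anything"))             // token unset on the server: 401
	fmt.Println(probe("live-token", ""))           // no token presented: 401
	fmt.Println(probe("live-token", "live-token")) // matching token: 200
}
```

The explicit `token == ""` check is the fail-closed part: without it, an unset token could be matched by an empty bearer value.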

Verification

  • go test ./...: PASS
  • Live localhost smoke with ZLH_AGENT_TOKEN=live-token: PASS
  • Missing-token requests returned 401 for protected HTTP routes and /console/stream.
  • Tokened smoke requests reached their handlers and returned the expected handler-level statuses (200, 400, 404, or 503), never 401.
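
The pass criterion used in the smoke run (401 without a token, anything except 401 with one) could be expressed roughly as below. This is a sketch under assumptions: it probes a stand-in handler in-process rather than the live Agent, every request is a GET, and `smokeResult`, `smoke`, `stubAgent`, and `allPass` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// smokeResult records the two statuses observed for one route.
type smokeResult struct {
	Route     string
	NoToken   int
	WithToken int
}

// passes applies the criterion: 401 without a token, and any
// handler-level status other than 401 with a token.
func (r smokeResult) passes() bool {
	return r.NoToken == http.StatusUnauthorized && r.WithToken != http.StatusUnauthorized
}

// status sends one in-process GET, optionally with a bearer token.
func status(h http.Handler, route, token string) int {
	req := httptest.NewRequest(http.MethodGet, route, nil)
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

// smoke probes every route twice: once without and once with the token.
func smoke(h http.Handler, token string, routes []string) []smokeResult {
	results := make([]smokeResult, 0, len(routes))
	for _, route := range routes {
		results = append(results, smokeResult{
			Route:     route,
			NoToken:   status(h, route, ""),
			WithToken: status(h, route, token),
		})
	}
	return results
}

// allPass reports whether every probed route met the criterion.
func allPass(results []smokeResult) bool {
	for _, r := range results {
		if !r.passes() {
			return false
		}
	}
	return true
}

// stubAgent is a stand-in for the real Agent (stub only, simplified auth check):
// /status answers 200, /ready answers 503, and anything else falls through to 404.
func stubAgent(token string) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/status", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if token == "" || r.Header.Get("Authorization") != "Bearer "+token {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		mux.ServeHTTP(w, r)
	})
}

func main() {
	routes := []string{"/status", "/ready", "/metrics/process"}
	for _, r := range smoke(stubAgent("live-token"), "live-token", routes) {
		fmt.Printf("%-18s no-token=%d with-token=%d pass=%v\n", r.Route, r.NoToken, r.WithToken, r.passes())
	}
}
```

Treating 400, 404, and 503 as passing is deliberate: the check only establishes that auth let the request through, not that the handler succeeded.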

PASS (routes and behaviors verified)

  • /status, /ready, /config, /start, /stop, /restart
  • /game/files/*, /game/backups/*, /game/mods/*
  • /agent/update, /metrics/process
  • code-server endpoints /dev/codeserver/start, /dev/codeserver/stop, /dev/codeserver/restart
  • WebSocket console requires token and rejects unauthorized access
  • Protected endpoints fail closed without token in production
  • Backup restore creates a pre-restore checkpoint and restart path is tested

Remaining deviations

  • Automatic rollback after restore/start failure is not implemented. The current restore code returns the pre-restore checkpoint, but nothing rolls back to it if restoreArchive or startServerReady fails.
  • Process reattachment after Agent restart is not implemented. Process state is held in memory via serverCmd/serverPTY; no persisted PID/PTY reattach path was found for game processes after Agent restart.
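
To make the first deviation concrete, one possible shape for automatic rollback is sketched below. It is an illustration of the control flow only. The `restoreSteps` struct, `restoreWithRollback`, and `simulate` are hypothetical, and the `restore` and `start` fields merely stand in for restoreArchive and startServerReady, whose real signatures are not shown in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// restoreSteps bundles the operations as plain functions so the flow can be
// exercised without the real Agent. All field names are hypothetical.
type restoreSteps struct {
	checkpoint func() (string, error)  // creates the pre-restore checkpoint, returns its ID
	restore    func(archive string) error // stands in for restoreArchive
	start      func() error               // stands in for startServerReady
	rollback   func(checkpointID string) error
}

// restoreWithRollback rolls back to the checkpoint when either the restore
// or the subsequent start fails. The original failure is always returned.
func restoreWithRollback(s restoreSteps, archive string) (string, error) {
	id, err := s.checkpoint()
	if err != nil {
		return "", fmt.Errorf("checkpoint: %w", err)
	}
	if err := s.restore(archive); err != nil {
		return id, rollbackAfter(s, id, fmt.Errorf("restore: %w", err))
	}
	if err := s.start(); err != nil {
		return id, rollbackAfter(s, id, fmt.Errorf("start: %w", err))
	}
	return id, nil
}

// rollbackAfter attempts the rollback and keeps the original cause in the
// returned error, noting a rollback failure if one occurs.
func rollbackAfter(s restoreSteps, id string, cause error) error {
	if rbErr := s.rollback(id); rbErr != nil {
		return fmt.Errorf("%w; rollback to %s also failed: %v", cause, id, rbErr)
	}
	return cause
}

// simulate runs the flow with injected failures and reports whether a
// rollback was attempted.
func simulate(restoreFails, startFails bool) (rolledBack bool, err error) {
	steps := restoreSteps{
		checkpoint: func() (string, error) { return "pre-restore-1", nil },
		restore: func(string) error {
			if restoreFails {
				return errors.New("archive unreadable")
			}
			return nil
		},
		start: func() error {
			if startFails {
				return errors.New("server not ready")
			}
			return nil
		},
		rollback: func(string) error { rolledBack = true; return nil },
	}
	_, err = restoreWithRollback(steps, "backup.tar.gz")
	return rolledBack, err
}

func main() {
	for _, c := range [][2]bool{{false, false}, {true, false}, {false, true}} {
		rb, err := simulate(c[0], c[1])
		fmt.Printf("restoreFails=%v startFails=%v rolledBack=%v err=%v\n", c[0], c[1], rb, err)
	}
}
```

Whether the server should be restarted after a successful rollback is left open here; that choice belongs to the launch-required versus follow-up decision listed under next steps.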

Notes

  • logs/backup_restore.log and state/update.json already had uncommitted changes (a dirty working tree) before verification.
  • Live /agent/update smoke touched state/update.json; pre-existing dirty files were not reverted.

Next steps

  • Decide whether automatic rollback is launch-required or follow-up hardening.
  • Add persistent process awareness/reattachment design for Agent restart recovery.
  • Keep fail-closed auth tests in the permanent launch regression suite.
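
As input to the reattachment design, a minimal sketch of persisted process awareness follows. It is an assumption-heavy illustration, not a proposal for the final design: the `processRecord` type, the `process.json` file name, and the helper names are hypothetical, and the liveness probe is Unix-only. It does not attempt PTY reattachment, and because PIDs can be reused, a live PID alone does not prove the process is the original game server.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// processRecord is a hypothetical on-disk record of the managed process.
type processRecord struct {
	PID int `json:"pid"`
}

// saveRecord writes the record via a temp file and rename, so a crash
// mid-write does not leave a truncated state file behind.
func saveRecord(path string, rec processRecord) error {
	data, err := json.Marshal(rec)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// loadRecord reads the record back, e.g. when the Agent starts up.
func loadRecord(path string) (processRecord, error) {
	var rec processRecord
	data, err := os.ReadFile(path)
	if err != nil {
		return rec, err
	}
	err = json.Unmarshal(data, &rec)
	return rec, err
}

// alive reports whether a process with this PID currently exists (Unix only).
// Signal 0 performs the existence check without delivering a signal; EPERM
// means the process exists but belongs to another user.
func alive(pid int) bool {
	if pid <= 0 {
		return false
	}
	proc, err := os.FindProcess(pid)
	if err != nil {
		return false
	}
	err = proc.Signal(syscall.Signal(0))
	return err == nil || errors.Is(err, syscall.EPERM)
}

// demo persists this program's own PID, reloads it, and checks liveness.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "zlh-state")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "process.json")
	if err := saveRecord(path, processRecord{PID: os.Getpid()}); err != nil {
		return false, err
	}
	rec, err := loadRecord(path)
	if err != nil {
		return false, err
	}
	return rec.PID == os.Getpid() && alive(rec.PID), nil
}

func main() {
	ok, err := demo()
	fmt.Println("record round-trips and process is alive:", ok, "err:", err)
}
```

A real design would likely store more than the PID (for example a start time or command line) so that a reused PID can be told apart from the original process.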
Reference: jester/zlh-agent#3