Agent Token Future State

A static ZLH_AGENT_TOKEN is acceptable as a launch hardening layer, but it should not be treated as the final trust model.

Launch / short term

For launch, keep the trust model simple and reliable:

API refuses to start in production without ZLH_AGENT_TOKEN.
Agent refuses protected requests in production without a configured ZLH_AGENT_TOKEN.
Portal never sees the Agent token.
API injects the token into every Agent request.
Protected Agent routes remain behind the token; only intentionally public endpoints such as /health and /version should be public.
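
The launch rules above can be sketched in Python. This sketch makes several assumptions that this note does not specify. The API sends the token in an "Authorization: Bearer <token>" header. Each process knows whether it is running in production. When no token is configured, non-production environments allow protected routes as a development convenience. The helper names are hypothetical.

```python
import hmac
from typing import Mapping, Optional

# Hypothetical helpers; names are illustrative, not an existing ZLH API.
# Endpoints this note says are intentionally public.
PUBLIC_PATHS = frozenset({"/health", "/version"})


class ConfigError(RuntimeError):
    """Raised when a production process is missing required security config."""


def require_agent_token(env: Mapping[str, str], production: bool) -> Optional[str]:
    """API startup check: refuse to start in production without ZLH_AGENT_TOKEN."""
    token = env.get("ZLH_AGENT_TOKEN", "").strip()
    if not token:
        if production:
            raise ConfigError("refusing to start: ZLH_AGENT_TOKEN is required in production")
        return None
    return token


def is_authorized(path: str, auth_header: Optional[str],
                  token: Optional[str], production: bool) -> bool:
    """Agent request check.

    Public paths always pass. Protected paths need the configured token. If no
    token is configured, they are refused in production and (assumed dev
    convenience) allowed elsewhere.
    """
    if path in PUBLIC_PATHS:
        return True
    if token is None:
        return not production
    if auth_header is None:
        return False
    expected = f"Bearer {token}"  # assumed header format
    # Constant-time comparison so response timing does not leak the token.
    return hmac.compare_digest(auth_header.encode("utf-8"), expected.encode("utf-8"))
```

On the API, a call such as require_agent_token(os.environ, production=True) would run once at startup, before any Agent traffic is sent.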

This closes the immediate API-to-Agent control-plane gap without adding clock, issuer, audience, or scope drift risk during final launch validation.

Longer-term target

The better long-term pattern is short-lived, signed request tokens that the API issues for each Agent call.

In that model:

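API holds the signing key: either a shared HMAC secret or an asymmetric private key.
Agent holds only what it needs for verification: the same shared secret, or the matching public key. With a shared secret the Agent could also mint tokens, so only the asymmetric option keeps signing capability exclusively on the API.
API signs a fresh token for each Agent request, valid for a short window, usually 30 to 120 seconds.
Agent verifies the signature, issuer, audience, expiry, VMID, and scope before executing the request.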

Example claims:

{
  "iss": "zpack-api",
  "aud": "zlh-agent",
  "vmid": 5202,
  "scope": "files:read",
  "iat": 1714500000,
  "exp": 1714500060,
  "jti": "request-id"
}

This limits the blast radius if a token is exposed. A leaked files:read token for VMID 5202 expires shortly (60 seconds in the example above). Until it expires, it can only read files on VMID 5202. It cannot stop a server, restore a backup, mutate files, or act on a different container.
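
The mint and verify steps can be sketched in Python using only the standard library, with the claim names from the example above. To avoid a third-party dependency, the sketch signs with HS256 (HMAC-SHA256). That means the Agent holds the same secret as the API. Getting the public-key property would require an asymmetric algorithm such as EdDSA or ES256 through a maintained JWT library. The function names, the 5-second clock leeway, and the use of uuid4 for jti are assumptions. Replay tracking of jti is left out.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid
from typing import Any, Dict, Optional

# Hypothetical helpers; claim values follow the example claims above.
ISSUER = "zpack-api"
AUDIENCE = "zlh-agent"


class TokenError(Exception):
    """Raised when a request token fails any verification step."""


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: bytes) -> bytes:
    return hmac.new(secret, signing_input, hashlib.sha256).digest()


def mint_token(secret: bytes, vmid: int, scope: str, ttl: int = 60,
               now: Optional[int] = None) -> str:
    """API side: sign a short-lived HS256 token bound to one VMID and one scope."""
    iat = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"iss": ISSUER, "aud": AUDIENCE, "vmid": vmid, "scope": scope,
              "iat": iat, "exp": iat + ttl, "jti": str(uuid.uuid4())}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = _sign(secret, signing_input.encode("ascii"))
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(secret: bytes, token: str, vmid: int, scope: str,
                 leeway: int = 5, now: Optional[int] = None) -> Dict[str, Any]:
    """Agent side: signature first, then issuer, audience, time window, VMID, scope."""
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        signing_input = f"{header_b64}.{claims_b64}".encode("ascii")
        if not hmac.compare_digest(_sign(secret, signing_input), _b64url_decode(sig_b64)):
            raise TokenError("bad signature")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(claims_b64))
        exp, iat = int(claims["exp"]), int(claims["iat"])
    except (ValueError, KeyError, TypeError) as exc:
        raise TokenError("malformed token") from exc
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise TokenError("unexpected algorithm")
    current = int(time.time()) if now is None else now
    if claims.get("iss") != ISSUER or claims.get("aud") != AUDIENCE:
        raise TokenError("wrong issuer or audience")
    # Leeway absorbs small clock drift between API and Agent hosts.
    if current >= exp + leeway or iat > current + leeway:
        raise TokenError("token expired or not yet valid")
    if claims.get("vmid") != vmid or claims.get("scope") != scope:
        raise TokenError("token not valid for this VMID or scope")
    return claims
```

The Agent would derive the vmid and scope arguments from the route it is about to execute, never from the token itself. Deriving them from the route is what makes a files:read token useless for a lifecycle:stop call.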

Roadmap

Phase 1: static ZLH_AGENT_TOKEN, fail-closed in production.
Phase 2: token rotation and deployment validation, so stale or missing tokens are caught before a deployment receives traffic.
Phase 3: short-lived API-signed request tokens for API-to-Agent calls.
Phase 4: optional mTLS between API and Agent for transport-level service identity.
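
One possible approach to Phase 2 can be sketched in Python under stated assumptions. During a rotation, the Agent briefly accepts both the new token and the previous one (a hypothetical ZLH_AGENT_TOKEN_PREVIOUS setting). A deploy step compares SHA-256 fingerprints of the API's token against the tokens the Agent accepts. A stale or missing token then fails the deploy before traffic arrives, and the raw secret never has to be logged. The helper names and the 12-character fingerprint length are illustrative. Fingerprinting is only safe for high-entropy random tokens.

```python
import hashlib
import hmac
from typing import Iterable, List, Optional

# Hypothetical rotation helpers; not an existing ZLH API.


def token_fingerprint(token: str) -> str:
    """Short SHA-256 prefix that identifies a token without revealing it.

    Only safe to log for high-entropy random tokens; a guessable token
    could be brute-forced from its fingerprint.
    """
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]


def accepted_tokens(current: str, previous: Optional[str]) -> List[str]:
    """Tokens the Agent accepts: the current one, plus the previous one during rotation."""
    return [t for t in (current, previous) if t]


def matches_any(presented: str, tokens: Iterable[str]) -> bool:
    """Constant-time comparison against every accepted token (no early exit)."""
    matched = False
    for token in tokens:
        matched |= hmac.compare_digest(presented.encode("utf-8"), token.encode("utf-8"))
    return matched


def validate_deployment(api_token: Optional[str], agent_fingerprints: Iterable[str]) -> None:
    """Pre-traffic check: fail the deploy if the API's token is missing or not accepted."""
    if not api_token:
        raise RuntimeError("API has no ZLH_AGENT_TOKEN configured")
    fingerprint = token_fingerprint(api_token)
    if fingerprint not in set(agent_fingerprints):
        raise RuntimeError(f"Agent does not accept API token {fingerprint}; stale token?")
```

Once every API instance has switched to the new token, the previous value is removed from the Agent so the rotation window closes.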

JWT vs HMAC request signatures

JWT is useful when explicit claims and public-key verification are desired.

A compact HMAC request-signing scheme may be simpler for the Agent:

X-ZLH-Agent-Timestamp: <unix timestamp>
X-ZLH-Agent-Vmid: <vmid>
X-ZLH-Agent-Scope: lifecycle:stop
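X-ZLH-Agent-Signature: HMAC(secret, method + "\n" + path + "\n" + timestamp + "\n" + vmid + "\n" + scope + "\n" + bodyHash)

The fields are joined with a separator so that adjacent values cannot run together and produce the same signed string for different requests. The Agent recomputes the signature, compares it in constant time, and rejects timestamps outside a small window so a captured request cannot be replayed later.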
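
The header scheme can be sketched in Python. The canonical string joins the fields with newlines and hashes the body with SHA-256, and the signature is hex-encoded HMAC-SHA256. The Agent also checks that the signed VMID and scope match the route it is about to execute. These choices, the 60-second skew window, the function names, and the example path are assumptions rather than a defined wire format.

```python
import hashlib
import hmac
import time
from typing import Dict, Mapping, Optional

# Hypothetical helpers; header names come from this note, everything else is assumed.
MAX_SKEW_SECONDS = 60  # assumed replay / clock-drift window


def canonical_string(method: str, path: str, timestamp: int, vmid: int,
                     scope: str, body: bytes) -> bytes:
    """Newline-joined fields, so adjacent values cannot run together."""
    body_hash = hashlib.sha256(body).hexdigest()
    fields = [method.upper(), path, str(timestamp), str(vmid), scope, body_hash]
    return "\n".join(fields).encode("utf-8")


def _mac(secret: bytes, message: bytes) -> str:
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_request(secret: bytes, method: str, path: str, vmid: int, scope: str,
                 body: bytes, now: Optional[int] = None) -> Dict[str, str]:
    """API side: produce the X-ZLH-Agent-* headers for one request."""
    ts = int(time.time()) if now is None else now
    return {
        "X-ZLH-Agent-Timestamp": str(ts),
        "X-ZLH-Agent-Vmid": str(vmid),
        "X-ZLH-Agent-Scope": scope,
        "X-ZLH-Agent-Signature": _mac(secret, canonical_string(method, path, ts, vmid, scope, body)),
    }


def verify_request(secret: bytes, method: str, path: str, body: bytes,
                   headers: Mapping[str, str], expected_vmid: int, expected_scope: str,
                   now: Optional[int] = None) -> bool:
    """Agent side: reject missing headers, stale timestamps, wrong target, or bad MACs."""
    try:
        ts = int(headers["X-ZLH-Agent-Timestamp"])
        vmid = int(headers["X-ZLH-Agent-Vmid"])
        scope = headers["X-ZLH-Agent-Scope"]
        presented = headers["X-ZLH-Agent-Signature"]
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if abs(current - ts) > MAX_SKEW_SECONDS:
        return False
    # The signed VMID and scope must match what this route is about to do.
    if vmid != expected_vmid or scope != expected_scope:
        return False
    expected = _mac(secret, canonical_string(method, path, ts, vmid, scope, body))
    return hmac.compare_digest(expected.encode("utf-8"), presented.encode("utf-8"))
```

A plain dict stands in for request headers here. A real server would read them through its framework's case-insensitive header map.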

Either is better than a forever bearer token once the launch system is stable.

Not a launch blocker

Do not jump directly to short-lived signed tokens before launch unless required. They add operational complexity:

clock drift
issuer/audience mismatch
token scope bugs
signing-key/config drift
harder provisioning debugging

For launch, static token plus fail-closed production behavior is the correct low-risk hardening step. For scale, short-lived signed API-to-Agent tokens are the cleaner end state.
