Add Agent token future-state guidance

2026-04-30 22:01:12 +00:00
commit 8e520d35cc
parent 32c281585f
1 changed file with 76 additions and 0 deletions
--- a/Codex/Agent/AGENT_TOKEN_FUTURE.md
+++ b/Codex/Agent/AGENT_TOKEN_FUTURE.md
@@ -0,0 +1,76 @@
+# Agent Token Future State
+
+Static `ZLH_AGENT_TOKEN` is acceptable as a launch hardening layer, but it should not be treated as the final trust model.
+
+## Launch / short term
+
+For launch, keep the trust model simple and reliable:
+
+- API refuses to start in production without `ZLH_AGENT_TOKEN`.
+- Agent refuses protected requests in production without a configured `ZLH_AGENT_TOKEN`.
+- Portal never sees the Agent token.
+- API injects the token into every Agent request.
+- Protected Agent routes remain behind the token; only intentionally public endpoints such as `/health` and `/version` should be reachable without it.
+
+This closes the immediate API-to-Agent control-plane gap without introducing clock-drift, issuer, audience, or scope-mismatch risks during final launch validation.
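
A minimal Python sketch of the fail-closed behavior listed above, not part of the committed file. The `APP_ENV` variable, the function names, and the choice to allow requests outside production when no token is set are all assumptions made for illustration; only `ZLH_AGENT_TOKEN`, `/health`, and `/version` come from the document.

```python
import hmac
from typing import Mapping, Optional

# Endpoints the document names as intentionally public.
PUBLIC_PATHS = {"/health", "/version"}


def load_agent_token(env: Mapping[str, str]) -> Optional[str]:
    """Read ZLH_AGENT_TOKEN; refuse to start in production if it is missing."""
    token = env.get("ZLH_AGENT_TOKEN", "").strip()
    # APP_ENV is a hypothetical name for the deployment-mode variable.
    if env.get("APP_ENV") == "production" and not token:
        raise RuntimeError("ZLH_AGENT_TOKEN is required in production")
    return token or None


def is_authorized(path: str, presented: Optional[str],
                  expected: Optional[str], production: bool) -> bool:
    """Decide whether the Agent should serve a request."""
    if path in PUBLIC_PATHS:
        return True
    if expected is None:
        # Fail closed in production. Allowing the request elsewhere is an
        # assumption; the document only specifies production behavior.
        return not production
    if presented is None:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

Called with `os.environ` at startup, `load_agent_token` stops the process before it accepts traffic, which is what makes a missing token a deployment failure rather than a silent open route.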
+
+## Longer-term target
+
+The better long-term pattern is short-lived signed API-to-Agent request tokens.
+
+In that model:
+
+1. API keeps a signing secret or private key.
+2. Agent knows only the verification secret or public key.
+3. API signs a short-lived token per Agent request, usually 30-120 seconds.
+4. Agent verifies issuer, audience, expiry, VMID, and scope before executing the request.
+
+Example claims:
+
+```json
+{
+  "iss": "zpack-api",
+  "aud": "zlh-agent",
+  "vmid": 5202,
+  "scope": "files:read",
+  "iat": 1714500000,
+  "exp": 1714500060,
+  "jti": "request-id"
+}
+```
+
+This limits the blast radius if a token is exposed, provided the Agent rejects any request whose VMID or scope does not match the claims. A stolen short-lived `files:read` token for VMID `5202` cannot be reused later to stop another server, restore a backup, or mutate files on a different container.
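
A minimal Python sketch of the sign-then-verify flow described above, not part of the committed file. It uses a shared-secret HMAC over a base64url payload rather than a real JWT library, so it is not a standards-compliant JWT and does not give the Agent a verify-only key; a public-key variant would need a third-party library. The function names, the 60-second default lifetime, and the 5-second leeway are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(secret: bytes, vmid: int, scope: str, jti: str,
               ttl: int = 60, now: Optional[int] = None) -> str:
    """API side: issue a short-lived token bound to one VMID and one scope."""
    issued = int(time.time()) if now is None else now
    claims = {"iss": "zpack-api", "aud": "zlh-agent", "vmid": vmid,
              "scope": scope, "iat": issued, "exp": issued + ttl, "jti": jti}
    payload = _b64(json.dumps(claims, separators=(",", ":")).encode())
    sig = _b64(hmac.new(secret, payload.encode(), hashlib.sha256).digest())
    return f"{payload}.{sig}"


def verify_token(secret: bytes, token: str, vmid: int, scope: str,
                 now: Optional[int] = None, leeway: int = 5) -> dict:
    """Agent side: check signature, issuer, audience, expiry, VMID, scope."""
    payload, _, sig = token.partition(".")
    expected = _b64(hmac.new(secret, payload.encode(), hashlib.sha256).digest())
    # Verify the signature before trusting anything inside the payload.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        raise PermissionError("bad signature")
    claims = json.loads(_unb64(payload))
    current = int(time.time()) if now is None else now
    if claims["iss"] != "zpack-api" or claims["aud"] != "zlh-agent":
        raise PermissionError("wrong issuer or audience")
    if current > claims["exp"] + leeway:
        raise PermissionError("token expired")
    if claims["vmid"] != vmid or claims["scope"] != scope:
        raise PermissionError("token not valid for this VMID or scope")
    return claims
```

The Agent passes the VMID and scope that the incoming route actually requires, so a token minted for `files:read` on one VMID fails verification on any other route or container. Rejecting a reused `jti` within the validity window would need a small replay cache, which this sketch leaves out.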
+
+## Roadmap
+
+- Phase 1: static `ZLH_AGENT_TOKEN`, fail-closed in production.
+- Phase 2: token rotation and deployment validation so stale or missing tokens are caught before traffic.
+- Phase 3: short-lived API-signed request tokens for API-to-Agent calls.
+- Phase 4: optional mTLS between API and Agent for transport-level service identity.
+
+## JWT vs HMAC request signatures
+
+JWT is useful when explicit claims and public-key verification are desired.
+
+A compact HMAC request-signing scheme may be simpler for the Agent:
+
+```text
+X-ZLH-Agent-Timestamp: <unix timestamp>
+X-ZLH-Agent-Vmid: <vmid>
+X-ZLH-Agent-Scope: lifecycle:stop
+X-ZLH-Agent-Signature: HMAC(secret, method + path + timestamp + vmid + scope + bodyHash)
+```
+
+Either is better than a static, never-expiring bearer token once the launch system is stable.
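
A minimal Python sketch of the HMAC header scheme above, not part of the committed file. The header names come from the document; the newline-delimited canonical string, HMAC-SHA256 with hex output, and the 60-second skew window are assumptions. The delimiter is a deliberate departure from plain concatenation, which would let different field combinations produce the same signed string.

```python
import hashlib
import hmac
import time
from typing import Mapping, Optional

MAX_SKEW_SECONDS = 60  # assumed tolerance for clock drift between API and Agent


def canonical_string(method: str, path: str, timestamp: int,
                     vmid: str, scope: str, body: bytes) -> bytes:
    """Build the exact bytes that both sides sign."""
    body_hash = hashlib.sha256(body).hexdigest()
    fields = [method.upper(), path, str(timestamp), str(vmid), scope, body_hash]
    # Newlines keep field boundaries unambiguous.
    return "\n".join(fields).encode()


def sign_request(secret: bytes, method: str, path: str, vmid: int,
                 scope: str, body: bytes, now: Optional[int] = None) -> dict:
    """API side: produce the headers to attach to an Agent request."""
    timestamp = int(time.time()) if now is None else now
    message = canonical_string(method, path, timestamp, str(vmid), scope, body)
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {
        "X-ZLH-Agent-Timestamp": str(timestamp),
        "X-ZLH-Agent-Vmid": str(vmid),
        "X-ZLH-Agent-Scope": scope,
        "X-ZLH-Agent-Signature": signature,
    }


def verify_request(secret: bytes, method: str, path: str,
                   headers: Mapping[str, str], body: bytes,
                   now: Optional[int] = None) -> bool:
    """Agent side: recompute the signature and reject stale timestamps."""
    try:
        timestamp = int(headers["X-ZLH-Agent-Timestamp"])
        vmid = headers["X-ZLH-Agent-Vmid"]
        scope = headers["X-ZLH-Agent-Scope"]
        presented = headers["X-ZLH-Agent-Signature"]
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > MAX_SKEW_SECONDS:
        return False
    message = canonical_string(method, path, timestamp, vmid, scope, body)
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(presented.encode(), expected.encode())
```

A passing signature only proves the headers are authentic. The Agent still has to compare the signed VMID and scope against what the requested route needs before executing it.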
+
+## Not a launch blocker
+
+Do not jump directly to short-lived signed tokens before launch unless required. They add operational complexity:
+
+- clock drift
+- issuer/audience mismatch
+- token scope bugs
+- signing-key/config drift
+- harder debugging of provisioning failures
+
+For launch, a static token plus fail-closed production behavior is the correct low-risk hardening step. For scale, short-lived signed API-to-Agent tokens are the cleaner end state.