# 🚀 Agent Release Runbook

**Applies To**: `zlh-agent` Go binary  
**Artifact Server**: `10.60.0.251` (`/opt/zlh/agents/`)  
**Agent HTTP Port**: `18888`  
**Last Updated**: February 22, 2026

---

## Overview

This runbook covers the full process for building, uploading, and rolling out a new `zlh-agent` release. Follow all steps in order. Do not skip the canary validation step before rolling out to remaining nodes.

---

## Step 1 — Choose Next Version

Always bump the version. Never reuse an existing version number or folder.

Use semantic versioning: `MAJOR.MINOR.PATCH`

Example: if current is `1.0.7`, next is `1.0.8`.

Check the current manifest to confirm what's already published:

```bash
curl -s http://10.60.0.251:8080/agents/manifest.json | jq '.latest'
```

---

## Step 2 — Build the Release Artifact

Always build via the release script. Do **not** use `go build -o zlh-agent` directly.

```bash
cd /opt/zlh-agent
./scripts/build-release.sh 1.0.8
```

This produces the binary and `.sha256` checksum under `dist/1.0.8/`.

---

## Step 3 — Verify Built Binary Version

Confirm the binary reports the correct version before uploading:

```bash
timeout 2s ./dist/1.0.8/zlh-agent-linux-amd64 2>&1 | head -n 1
```

**Expected output contains**: `starting ZeroLagHub Agent v1.0.8`

If the version string is wrong, stop here. Do not upload a mismatched binary.

---

## Step 4 — Upload Release Folder to Artifact Server

```bash
scp -r -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
  dist/1.0.8 root@10.60.0.251:/opt/zlh/agents/versions/
```

---

## Step 5 — Verify Files on Artifact Server

```bash
ssh root@10.60.0.251 'ls -lah /opt/zlh/agents/versions/1.0.8'
```

The directory must contain **both**:

- `zlh-agent-linux-amd64`
- `zlh-agent-linux-amd64.sha256`

If either is missing, re-upload before proceeding.

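
Beyond listing the directory, the binary and checksum can also be checked against each other. The following is a minimal sketch, not part of the release tooling: `verify_release_dir` is a hypothetical helper, and it assumes the `.sha256` file begins with the hex digest (either a bare digest or `sha256sum`-style output), since the format written by `build-release.sh` is not documented here.

```shell
#!/usr/bin/env bash
# Sketch: confirm a release folder holds the binary and a matching checksum.
# Assumption: the first field of the .sha256 file is the hex digest; the real
# format produced by build-release.sh may differ.
verify_release_dir() {
  local dir="$1"
  local bin="$dir/zlh-agent-linux-amd64"
  local sum="$bin.sha256"

  # Both files must be present (Step 5).
  [ -f "$bin" ] || { echo "missing: $bin" >&2; return 1; }
  [ -f "$sum" ] || { echo "missing: $sum" >&2; return 1; }

  # Compare the recorded digest with one computed from the binary.
  local expected actual
  expected="$(awk 'NR==1 {print $1}' "$sum")"
  actual="$(sha256sum "$bin" | awk '{print $1}')"

  if [ "$expected" != "$actual" ]; then
    echo "checksum mismatch for $bin" >&2
    return 1
  fi
  echo "ok: $dir"
}
```

It could be run as `verify_release_dir dist/1.0.8` before uploading, or on the artifact server against `/opt/zlh/agents/versions/1.0.8` after uploading.
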

---

## Step 6 — Update Manifest on Artifact Server

Edit `/opt/zlh/agents/manifest.json` on the artifact server. The `-t` flag allocates a terminal, which `nano` needs when launched over SSH:

```bash
ssh -t root@10.60.0.251 nano /opt/zlh/agents/manifest.json
```

Set the following fields:

- `"latest"` → `"1.0.8"`
- `"channels": { "stable": "1.0.8" }`
- Add a new entry under `"artifacts"`:

```json
"1.0.8": {
  "linux_amd64": {
    "binary": "versions/1.0.8/zlh-agent-linux-amd64",
    "sha256": "versions/1.0.8/zlh-agent-linux-amd64.sha256"
  }
}
```

Do not remove old version entries from `artifacts` — keep the full history.

---

## Step 7 — Validate Manifest Remotely

```bash
curl -s http://10.60.0.251:8080/agents/manifest.json | jq .
```

Check:

- `channels.stable` matches an existing key in `artifacts`
- `latest` matches `channels.stable`
- The new version entry has the correct paths

If anything looks wrong, fix the manifest before triggering any updates.

---

## Step 8 — Canary: Trigger Update on a Test Node

Pick one non-production or low-traffic node as the canary.

```bash
# Trigger the update
curl -s -X POST http://127.0.0.1:18888/agent/update | jq .

# Wait for the agent to restart
sleep 3

# Confirm new version is running
curl -s http://127.0.0.1:18888/version | jq .
```

**Expected**: version field shows `v1.0.8`

Do not proceed to Step 9 until the canary confirms the correct version.

---

## Step 9 — Verify Service Health and Logs on Canary

```bash
systemctl status zlh-agent --no-pager
journalctl -u zlh-agent -n 80 --no-pager
```

Look for:

- Service shows `active (running)`
- No crash/restart loops in logs
- No unexpected errors in the first 80 log lines

---

## Step 10 — Roll Out to Remaining Nodes

Trigger `/agent/update` from the API backend or orchestrator in batches. Do not broadcast to all nodes simultaneously — stagger to catch any issues early.

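
If the rollout is driven by a script rather than the orchestrator, the staggering could look roughly like the sketch below. Everything in it is illustrative: `rollout_in_batches` and `trigger_update` are hypothetical names, the batch size and pause are arbitrary, and it assumes port `18888` is reachable from the machine running the script, whereas this runbook only shows the endpoint being called on `127.0.0.1`.

```shell
#!/usr/bin/env bash
# Sketch of a staggered rollout; hosts, batch size, and pause are hypothetical.
# Assumption: each agent accepts POST /agent/update on port 18888 from this host.

# Send the update trigger to one node; -f makes curl fail on HTTP errors.
trigger_update() {
  curl -fsS -X POST "http://$1:18888/agent/update" >/dev/null
}

# Usage: rollout_in_batches BATCH_SIZE PAUSE_SECONDS HOST...
rollout_in_batches() {
  local batch_size="$1" pause="$2"
  shift 2
  local count=0 host
  for host in "$@"; do
    # Stop at the first failure so a bad release does not spread further.
    if ! trigger_update "$host"; then
      echo "update trigger failed on $host; stopping rollout" >&2
      return 1
    fi
    echo "triggered $host"
    count=$((count + 1))
    # Pause after each full batch, except after the final host.
    if [ $((count % batch_size)) -eq 0 ] && [ "$count" -lt "$#" ]; then
      echo "batch complete; pausing ${pause}s"
      sleep "$pause"
    fi
  done
}
```

A call such as `rollout_in_batches 5 120 node-a node-b node-c` (hostnames invented for illustration) would trigger five nodes at a time with a two-minute pause between batches. The pause is a natural point to repeat the version and log checks from Steps 8 and 9 on the nodes just updated.
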

The agent's `ZLH_AGENT_UPDATE_MODE` controls behavior:

- `auto` — agent self-updates when triggered
- `notify` — agent logs that an update is available but waits
- `off` — agent ignores update signals

---

## Important Rules

| Rule | Detail |
|------|--------|
| ✅ Always use build script | `./scripts/build-release.sh` |
| ✅ Always upload binary + `.sha256` as a pair | Never upload one without the other |
| ✅ Validate manifest before triggering updates | `channels.stable` must match an `artifacts` key |
| ✅ Canary first, fleet second | Always validate on one node before rolling out |
| ❌ Never use `go build -o zlh-agent` for releases | Bypasses version embedding and checksum generation |
| ❌ Never reuse a version number or path | Always bump; never overwrite an existing release folder |
| ❌ Never remove old `artifacts` entries from manifest | Keep full history for rollback reference |

---

## Rollback Procedure

If a release is bad, update the manifest to point `channels.stable` and `latest` back to the last known-good version. Trigger `/agent/update` on affected nodes.

The old binary is still present on the artifact server under its original version folder, so no re-upload is needed for a rollback.

---

## Related

- Agent endpoint specs: `Agent_Endpoint_Specifications_Phase1.md`
- Agent update mode env var: `ZLH_AGENT_UPDATE_MODE` (`auto|notify|off`)
- Update check interval: `ZLH_AGENT_UPDATE_INTERVAL` (default `30m`)
- Artifact server location: `10.60.0.251:/opt/zlh/agents/`