🚀 Agent Release Runbook
Applies To: zlh-agent Go binary
Artifact Server: 10.60.0.251 (/opt/zlh/agents/)
Agent HTTP Port: 18888
Last Updated: February 22, 2026
Overview
This runbook covers the full process for building, uploading, and rolling out a new zlh-agent release. Follow all steps in order. Do not skip the canary validation step before rolling out to remaining nodes.
Step 1 — Choose Next Version
Always bump the version. Never reuse an existing version number or folder.
Use semantic versioning: MAJOR.MINOR.PATCH
Example: if current is 1.0.7, next is 1.0.8.
Check the current manifest to confirm what's already published:
curl -s http://10.60.0.251:8080/agents/manifest.json | jq '.latest'
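The patch bump itself can be scripted so nobody retypes the version by hand. This is a minimal sketch, not part of the release tooling; the function name `next_patch_version` is hypothetical, and it assumes the manifest's `latest` value is a plain MAJOR.MINOR.PATCH string (an optional leading `v` is tolerated).

```shell
# Print the next PATCH version for a MAJOR.MINOR.PATCH input.
# Hypothetical helper; not part of scripts/build-release.sh.
next_patch_version() {
  current="${1#v}"   # tolerate "v1.0.7" as well as "1.0.7"
  # Require three dot-separated integers without leading zeros.
  if ! printf '%s\n' "$current" |
      grep -Eq '^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'; then
    echo "not a MAJOR.MINOR.PATCH version: $1" >&2
    return 1
  fi
  major="${current%%.*}"
  rest="${current#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  echo "${major}.${minor}.$((patch + 1))"
}

# Usage (jq -r strips the JSON quotes around the value):
#   next_patch_version "$(curl -s http://10.60.0.251:8080/agents/manifest.json | jq -r '.latest')"
```

The helper only handles patch releases; MINOR or MAJOR bumps remain a manual decision.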
Step 2 — Build the Release Artifact
Always build via the release script. Do not use go build -o zlh-agent directly.
cd /opt/zlh-agent
./scripts/build-release.sh 1.0.8
This produces the binary and .sha256 checksum under dist/1.0.8/.
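Before going further it can be worth confirming that the checksum file actually matches the binary it sits next to. The sketch below assumes the `.sha256` file uses the common `sha256sum` layout, where the hex digest is the first whitespace-separated field; the build script's real output format is not documented here, so treat that as an assumption. The function name `verify_release_checksum` is hypothetical.

```shell
# Compare the digest stored in <dir>/zlh-agent-linux-amd64.sha256 with the
# digest of the binary itself. Assumes the digest is the first field of the
# .sha256 file (sha256sum-style output).
verify_release_checksum() {
  dir="$1"
  bin="$dir/zlh-agent-linux-amd64"
  expected="$(awk '{print $1; exit}' "$bin.sha256")" || return 1
  actual="$(sha256sum "$bin" | awk '{print $1}')"
  # Fail if either digest is empty or if they differ.
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage:
#   verify_release_checksum dist/1.0.8 && echo "checksum OK"
```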
Step 3 — Verify Built Binary Version
Confirm the binary reports the correct version before uploading:
timeout 2s ./dist/1.0.8/zlh-agent-linux-amd64 2>&1 | head -n 1
Expected output contains: starting ZeroLagHub Agent v1.0.8
If the version string is wrong, stop here. Do not upload a mismatched binary.
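The visual check above can be turned into a pass/fail gate. A minimal sketch, assuming the startup line contains the exact text `starting ZeroLagHub Agent v<version>`; the function name `version_line_matches` is hypothetical.

```shell
# Succeed only if the startup line announces exactly the wanted version.
version_line_matches() {
  line="$1"
  want="$2"
  case "$line" in
    # A trailing digit means a longer version (e.g. v1.0.80 when 1.0.8 is wanted).
    *"starting ZeroLagHub Agent v${want}"[0-9]*) return 1 ;;
    *"starting ZeroLagHub Agent v${want}"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   line="$(timeout 2s ./dist/1.0.8/zlh-agent-linux-amd64 2>&1 | head -n 1)"
#   version_line_matches "$line" 1.0.8 || { echo "version mismatch: $line" >&2; exit 1; }
```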
Step 4 — Upload Release Folder to Artifact Server
scp -r -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
dist/1.0.8 root@10.60.0.251:/opt/zlh/agents/versions/
Step 5 — Verify Files on Artifact Server
ssh root@10.60.0.251 'ls -lah /opt/zlh/agents/versions/1.0.8'
The directory must contain both:
- zlh-agent-linux-amd64
- zlh-agent-linux-amd64.sha256
If either is missing, re-upload before proceeding.
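Reading an `ls` listing by eye is easy to get wrong, so the presence check can also be expressed as a small test. This sketch is meant to be run on the artifact server against the uploaded folder; the function name `release_dir_complete` is hypothetical, and treating an empty file as missing is an added assumption.

```shell
# Succeed only if both release files exist in the directory and are non-empty.
release_dir_complete() {
  dir="$1"
  for f in zlh-agent-linux-amd64 zlh-agent-linux-amd64.sha256; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  return 0
}

# Usage (on the artifact server):
#   release_dir_complete /opt/zlh/agents/versions/1.0.8 && echo "release folder complete"
```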
Step 6 — Update Manifest on Artifact Server
Edit /opt/zlh/agents/manifest.json on the artifact server:
ssh root@10.60.0.251
nano /opt/zlh/agents/manifest.json
Set the following fields:
- "latest" → "1.0.8"
- "channels": { "stable": "1.0.8" }
- Add a new entry under "artifacts":
"1.0.8": {
"linux_amd64": {
"binary": "versions/1.0.8/zlh-agent-linux-amd64",
"sha256": "versions/1.0.8/zlh-agent-linux-amd64.sha256"
}
}
Do not remove old version entries from artifacts — keep the full history.
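As an alternative to hand-editing, the same three changes can be applied with jq, which avoids JSON syntax slips. This is a sketch under the assumption that the manifest has exactly the `latest`, `channels.stable`, and `artifacts` layout shown above; the function name `manifest_with_release` is hypothetical. It refuses to touch a version that already has an `artifacts` entry, in line with the never-reuse rule.

```shell
# Read a manifest on stdin, write the updated manifest to stdout.
# Existing artifacts entries are left untouched.
manifest_with_release() {
  jq --arg v "$1" '
    if (.artifacts // {}) | has($v)
    then error("version \($v) already exists in artifacts")
    else .
    end
    | .latest = $v
    | .channels.stable = $v
    | .artifacts[$v] = {
        linux_amd64: {
          binary: "versions/\($v)/zlh-agent-linux-amd64",
          sha256: "versions/\($v)/zlh-agent-linux-amd64.sha256"
        }
      }'
}

# Usage (on the artifact server; write to a temp file first, then swap):
#   cd /opt/zlh/agents
#   manifest_with_release 1.0.8 < manifest.json > manifest.json.new \
#     && mv manifest.json.new manifest.json
```

Writing to a temporary file and renaming it means a failed jq run never leaves a truncated manifest.json behind.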
Step 7 — Validate Manifest Remotely
curl -s http://10.60.0.251:8080/agents/manifest.json | jq
Check:
- channels.stable matches an existing key in artifacts
- latest matches channels.stable
- The new version entry has the correct paths
If anything looks wrong, fix the manifest before triggering any updates.
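The three checks can also be run mechanically with `jq -e`, which exits non-zero when the expression is false. A minimal sketch, assuming artifact paths follow the `versions/<version>/zlh-agent-linux-amd64` pattern from Step 6; the function name `manifest_is_consistent` is hypothetical.

```shell
# Read a manifest on stdin; succeed only if latest, channels.stable, and the
# matching artifacts entry all agree.
manifest_is_consistent() {
  jq -e '
    .channels.stable as $s
    | .latest == $s
      and (.artifacts | has($s))
      and .artifacts[$s].linux_amd64.binary == "versions/\($s)/zlh-agent-linux-amd64"
      and .artifacts[$s].linux_amd64.sha256 == "versions/\($s)/zlh-agent-linux-amd64.sha256"
  ' >/dev/null
}

# Usage:
#   curl -s http://10.60.0.251:8080/agents/manifest.json | manifest_is_consistent \
#     && echo "manifest OK" || echo "manifest INVALID"
```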
Step 8 — Canary: Trigger Update on a Test Node
Pick one non-production or low-traffic node as the canary. Run the commands below on that node itself, since they target the agent at 127.0.0.1.
# Trigger the update
curl -s -X POST http://127.0.0.1:18888/agent/update | jq
# Wait for the agent to restart
sleep 3
# Confirm new version is running
curl -s http://127.0.0.1:18888/version | jq
Expected: version field shows v1.0.8
Do not proceed to Step 9 until the canary confirms the correct version.
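A fixed `sleep 3` can be too short on a slow node, so polling the version endpoint is one way to make the canary check less timing-sensitive. The sketch below assumes `/version` returns JSON with a `version` field, as described above; the function names `fetch_agent_version` and `wait_for_agent_version` and the default of 10 attempts are hypothetical.

```shell
# Ask the local agent which version it is running.
fetch_agent_version() {
  curl -s --max-time 2 http://127.0.0.1:18888/version | jq -r '.version'
}

# Poll once per second until the agent reports the wanted version,
# giving up after the given number of attempts (default 10).
wait_for_agent_version() {
  want="$1"
  tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    got="$(fetch_agent_version 2>/dev/null)"
    # Compare with any leading "v" removed from both sides.
    if [ -n "$got" ] && [ "${got#v}" = "${want#v}" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage (on the canary, after triggering the update):
#   wait_for_agent_version 1.0.8 30 || echo "canary did not reach 1.0.8" >&2
```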
Step 9 — Verify Service Health and Logs on Canary
systemctl status zlh-agent --no-pager
journalctl -u zlh-agent -n 80 --no-pager
Look for:
- Service shows active (running)
- No crash/restart loops in logs
- No unexpected errors in the most recent 80 log lines
Step 10 — Roll Out to Remaining Nodes
Trigger /agent/update from the API backend or orchestrator in batches. Do not broadcast to all nodes simultaneously — stagger to catch any issues early.
The agent's ZLH_AGENT_UPDATE_MODE controls behavior:
- auto: agent self-updates when triggered
- notify: agent logs that an update is available but waits
- off: agent ignores update signals
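Staggering can be as simple as a loop that pauses between batches and stops at the first failure. This is only a sketch: it assumes each agent's port 18888 is reachable from wherever the loop runs, which may not match how the API backend or orchestrator actually triggers updates. The names `trigger_update`, `rollout_in_batches`, and `nodes.txt` are hypothetical.

```shell
# Trigger the update on one node. -f makes curl fail on HTTP errors;
# stdin is redirected so curl cannot consume the node list.
trigger_update() {
  curl -sf --max-time 10 -X POST "http://$1:18888/agent/update" >/dev/null </dev/null
}

# Read node addresses from stdin (one per line) and trigger updates in
# batches, sleeping between batches and stopping at the first failure.
rollout_in_batches() {
  batch_size="$1"
  pause_seconds="$2"
  count=0
  while IFS= read -r node; do
    [ -n "$node" ] || continue   # skip blank lines
    if [ "$count" -gt 0 ] && [ $((count % batch_size)) -eq 0 ]; then
      sleep "$pause_seconds"
    fi
    if ! trigger_update "$node"; then
      echo "update trigger failed on $node, stopping rollout" >&2
      return 1
    fi
    count=$((count + 1))
  done
  return 0
}

# Usage: 5 nodes at a time, 2 minutes between batches.
#   rollout_in_batches 5 120 < nodes.txt
```

Stopping at the first failed trigger keeps a bad release from reaching later batches, but it only detects failed requests. Health checks between batches (as in Step 9) are still needed.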
Important Rules
| Rule | Detail |
|---|---|
| ✅ Always use build script | ./scripts/build-release.sh <version> |
| ✅ Always upload binary + .sha256 as a pair | Never upload one without the other |
| ✅ Validate manifest before triggering updates | channels.stable must match an artifacts key |
| ✅ Canary first, fleet second | Always validate on one node before rolling out |
| ❌ Never use go build -o zlh-agent for releases | Bypasses version embedding and checksum generation |
| ❌ Never reuse a version number or path | Always bump; never overwrite an existing release folder |
| ❌ Never remove old artifacts entries from manifest | Keep full history for rollback reference |
Rollback Procedure
If a release is bad, update the manifest to point channels.stable and latest back to the last known-good version. Trigger /agent/update on affected nodes.
The old binary is still present on the artifact server under its original version folder, so no re-upload is needed for a rollback.
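The manifest change for a rollback can be scripted the same way as a release. A minimal sketch, assuming the manifest layout from Step 6; the function name `manifest_rolled_back_to` is hypothetical. It refuses to point at a version that has no `artifacts` entry, so a typo cannot leave `channels.stable` dangling.

```shell
# Read a manifest on stdin and write one that points latest and
# channels.stable at an already-published version. No entries are removed.
manifest_rolled_back_to() {
  jq --arg v "$1" '
    if (.artifacts // {}) | has($v)
    then .latest = $v | .channels.stable = $v
    else error("no artifacts entry for \($v)")
    end'
}

# Usage (on the artifact server):
#   cd /opt/zlh/agents
#   manifest_rolled_back_to 1.0.7 < manifest.json > manifest.json.new \
#     && mv manifest.json.new manifest.json
```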
Related
- Agent endpoint specs: Agent_Endpoint_Specifications_Phase1.md
- Agent update mode env var: ZLH_AGENT_UPDATE_MODE (auto | notify | off)
- Update check interval: ZLH_AGENT_UPDATE_INTERVAL (default 30m)
- Artifact server location: 10.60.0.251:/opt/zlh/agents/