🚀 Agent Release Runbook

Applies To: zlh-agent Go binary
Artifact Server: 10.60.0.251 (/opt/zlh/agents/)
Agent HTTP Port: 18888
Last Updated: February 22, 2026


Overview

This runbook covers the full process for building, uploading, and rolling out a new zlh-agent release. Follow all steps in order. Do not skip the canary validation step before rolling out to remaining nodes.


Step 1 — Choose Next Version

Always bump the version. Never reuse an existing version number or folder.

Use semantic versioning: MAJOR.MINOR.PATCH
Example: if current is 1.0.7, next is 1.0.8.

Check the current manifest to confirm what's already published:

curl -s http://10.60.0.251:8080/agents/manifest.json | jq '.latest'
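
The patch bump described above can be computed rather than typed by hand. The following is a minimal sketch, not part of the release tooling; the function name next_patch is hypothetical, and it assumes the manifest's latest value is a plain MAJOR.MINOR.PATCH string (a leading "v" is tolerated).

```shell
# Sketch: print the next PATCH version for a MAJOR.MINOR.PATCH string.
# next_patch is a hypothetical helper, not part of the zlh-agent tooling.
next_patch() (
  current="${1#v}"                 # tolerate a leading "v"
  IFS=. read -r major minor patch <<EOF
$current
EOF
  # Require exactly three non-empty, all-digit fields without leading zeros.
  for part in "$major" "$minor" "$patch"; do
    case "$part" in
      ''|*[!0-9]*|0?*)
        echo "not a MAJOR.MINOR.PATCH version: $1" >&2
        exit 1
        ;;
    esac
  done
  echo "${major}.${minor}.$((patch + 1))"
)
```

For example, next_patch "$(curl -s http://10.60.0.251:8080/agents/manifest.json | jq -r '.latest')" prints the version to pass to the build script. MINOR and MAJOR bumps are still a manual decision.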

Step 2 — Build the Release Artifact

Always build via the release script. Do not run go build -o zlh-agent directly; it bypasses version embedding and checksum generation.

cd /opt/zlh-agent
./scripts/build-release.sh 1.0.8

This produces the binary and .sha256 checksum under dist/1.0.8/.


Step 3 — Verify Built Binary Version

Confirm the binary reports the correct version before uploading:

timeout 2s ./dist/1.0.8/zlh-agent-linux-amd64 2>&1 | head -n 1

Expected output contains: starting ZeroLagHub Agent v1.0.8

If the version string is wrong, stop here. Do not upload a mismatched binary.
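
The visual check above can also be made to fail loudly in a script. This is a small sketch under the assumption that the startup banner contains the literal text "ZeroLagHub Agent v" followed by the version, as in the expected output; banner_matches is a hypothetical helper name.

```shell
# Sketch: succeed only if the startup banner names the expected version.
# $1: first line printed by the binary, $2: expected version, e.g. 1.0.8
banner_matches() {
  case "$1" in
    # Plain substring match, so 1.0.8 would also match a 1.0.80 banner.
    *"ZeroLagHub Agent v$2"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible use: banner_matches "$(timeout 2s ./dist/1.0.8/zlh-agent-linux-amd64 2>&1 | head -n 1)" 1.0.8 || echo "version mismatch, do not upload" >&2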


Step 4 — Upload Release Folder to Artifact Server

scp -r -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
  dist/1.0.8 root@10.60.0.251:/opt/zlh/agents/versions/

Step 5 — Verify Files on Artifact Server

ssh root@10.60.0.251 'ls -lah /opt/zlh/agents/versions/1.0.8'

The directory must contain both:

  • zlh-agent-linux-amd64
  • zlh-agent-linux-amd64.sha256

If either is missing, re-upload before proceeding.
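
Beyond checking that both files exist, it may be worth confirming that the checksum still matches the uploaded binary. The sketch below assumes the .sha256 file starts with the hex digest (the format sha256sum writes); the runbook does not state the file format, so check what build-release.sh actually produces. checksum_ok is a hypothetical helper name.

```shell
# Sketch: compare a binary's SHA-256 with the digest stored next to it.
# Assumes <binary>.sha256 begins with the hex digest (sha256sum format).
checksum_ok() (
  binary="$1"
  sumfile="$1.sha256"
  # Both halves of the pair must be present.
  [ -f "$binary" ] && [ -f "$sumfile" ] || exit 1
  expected=$(awk '{print $1; exit}' "$sumfile")
  actual=$(sha256sum "$binary" | awk '{print $1}')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
)
```

Run on the artifact server, this could look like: checksum_ok /opt/zlh/agents/versions/1.0.8/zlh-agent-linux-amd64 || echo "checksum mismatch, re-upload" >&2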


Step 6 — Update Manifest on Artifact Server

Edit /opt/zlh/agents/manifest.json on the artifact server:

ssh root@10.60.0.251
nano /opt/zlh/agents/manifest.json

Set the following fields:

  • "latest": "1.0.8"
  • "channels": { "stable": "1.0.8" }
  • Add a new entry under "artifacts":
"1.0.8": {
  "linux_amd64": {
    "binary": "versions/1.0.8/zlh-agent-linux-amd64",
    "sha256": "versions/1.0.8/zlh-agent-linux-amd64.sha256"
  }
}

Do not remove old version entries from artifacts — keep the full history.
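
Hand-editing JSON in nano is easy to get wrong (a missing comma breaks the whole manifest). As an alternative, the same three changes can be expressed as a jq filter. This is a sketch only, assuming the manifest has exactly the latest, channels.stable, and artifacts fields shown above; manifest_with_release is a hypothetical helper name.

```shell
# Sketch: print an updated manifest for a new release; the input file is
# not modified. Existing entries under "artifacts" are left untouched.
manifest_with_release() {
  # $1: path to manifest.json, $2: new version, e.g. 1.0.8
  jq --arg v "$2" '
    .latest = $v
    | .channels.stable = $v
    | .artifacts[$v] = {
        linux_amd64: {
          binary: "versions/\($v)/zlh-agent-linux-amd64",
          sha256: "versions/\($v)/zlh-agent-linux-amd64.sha256"
        }
      }
  ' "$1"
}
```

Writing to a temporary file first keeps a failed jq run from truncating the live manifest, for example: manifest_with_release /opt/zlh/agents/manifest.json 1.0.8 > /tmp/manifest.json.new && mv /tmp/manifest.json.new /opt/zlh/agents/manifest.json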


Step 7 — Validate Manifest Remotely

curl -s http://10.60.0.251:8080/agents/manifest.json | jq

Check:

  • channels.stable matches an existing key in artifacts
  • latest matches channels.stable
  • The new version entry has the correct paths

If anything looks wrong, fix the manifest before triggering any updates.
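
The three checks above can be run mechanically instead of by eye. The following sketch assumes the manifest layout from Step 6 and only checks the linux_amd64 binary path; manifest_valid is a hypothetical helper name.

```shell
# Sketch: read manifest JSON on stdin; exit 0 only if
#   - latest equals channels.stable,
#   - channels.stable is a key in artifacts, and
#   - that entry's binary path matches the versions/<version>/ layout.
manifest_valid() {
  jq -e '
    .channels.stable as $s
    | .latest == $s
      and (.artifacts | has($s))
      and (.artifacts[$s].linux_amd64.binary
           == "versions/\($s)/zlh-agent-linux-amd64")
  ' >/dev/null
}
```

For example: curl -s http://10.60.0.251:8080/agents/manifest.json | manifest_valid || echo "manifest invalid, fix before triggering updates" >&2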


Step 8 — Canary: Trigger Update on a Test Node

Pick one non-production or low-traffic node as the canary and run the following on that node (the commands target the local agent on 127.0.0.1).

# Trigger the update
curl -s -X POST http://127.0.0.1:18888/agent/update | jq

# Wait for the agent to restart
sleep 3

# Confirm new version is running
curl -s http://127.0.0.1:18888/version | jq

Expected: version field shows v1.0.8

Do not proceed to Step 9 until the canary confirms the correct version.
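
A fixed sleep can be too short if the restart is slow. One option is to poll the version endpoint until it reports the new version or a retry limit is reached. This is a sketch, assuming /version returns JSON with a version field as described above; fetch_version and wait_for_version are hypothetical helper names, and the retry defaults are arbitrary.

```shell
# Returns the "version" field reported by the local agent.
fetch_version() {
  curl -s --max-time 2 http://127.0.0.1:18888/version | jq -r '.version'
}

# Sketch: poll until the agent reports the expected version.
# $1: expected version string, $2: attempts (default 10),
# $3: seconds between attempts (default 1)
wait_for_version() (
  expected="$1"
  attempts="${2:-10}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    [ "$(fetch_version 2>/dev/null)" = "$expected" ] && exit 0
    sleep "$delay"
    i=$((i + 1))
  done
  exit 1
)
```

After triggering the update, wait_for_version v1.0.8 15 2 || echo "canary did not reach v1.0.8" >&2 gives the agent up to roughly 30 seconds to come back.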


Step 9 — Verify Service Health and Logs on Canary

systemctl status zlh-agent --no-pager
journalctl -u zlh-agent -n 80 --no-pager

Look for:

  • Service shows active (running)
  • No crash/restart loops in logs
  • No unexpected errors in the last 80 log lines

Step 10 — Roll Out to Remaining Nodes

Trigger /agent/update from the API backend or orchestrator in batches. Do not broadcast to all nodes simultaneously — stagger to catch any issues early.

The agent's ZLH_AGENT_UPDATE_MODE environment variable controls how it responds:

  • auto — agent self-updates when triggered
  • notify — agent logs that an update is available but waits
  • off — agent ignores update signals
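
The staggering described above can be sketched as a loop that pauses between batches and stops at the first failed trigger. Everything about how nodes are reached is an assumption here: the runbook only shows the endpoint on 127.0.0.1, so trigger_update below (a hypothetical helper, like rollout_in_batches) assumes each agent also accepts the POST on port 18888 from the machine running the script. Replace it with whatever call your API backend or orchestrator actually uses.

```shell
# Hypothetical trigger: assumes the agent on host $1 is reachable on
# port 18888 from here. Swap in the orchestrator's real call if not.
trigger_update() {
  curl -sf --max-time 10 -X POST "http://$1:18888/agent/update" >/dev/null
}

# Sketch: trigger updates in batches, pausing between batches.
# $1: batch size, $2: pause in seconds, remaining args: node addresses
rollout_in_batches() (
  batch_size="$1"
  pause="$2"
  shift 2
  count=0
  for node in "$@"; do
    if ! trigger_update "$node"; then
      echo "update failed on $node; stopping rollout" >&2
      exit 1
    fi
    count=$((count + 1))
    # Pause after each full batch, except after the final node.
    if [ $((count % batch_size)) -eq 0 ] && [ "$count" -lt "$#" ]; then
      sleep "$pause"
    fi
  done
)
```

A call such as rollout_in_batches 5 120 node-a node-b node-c (node names are placeholders) would update five nodes at a time with a two-minute gap, leaving time to check health between batches.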

Important Rules

| Rule | Detail |
| --- | --- |
| Always use build script | ./scripts/build-release.sh <version> |
| Always upload binary + .sha256 as a pair | Never upload one without the other |
| Validate manifest before triggering updates | channels.stable must match an artifacts key |
| Canary first, fleet second | Always validate on one node before rolling out |
| Never use go build -o zlh-agent for releases | Bypasses version embedding and checksum generation |
| Never reuse a version number or path | Always bump; never overwrite an existing release folder |
| Never remove old artifacts entries from manifest | Keep full history for rollback reference |

Rollback Procedure

If a release is bad, edit the manifest so that channels.stable and latest both point back to the last known-good version, then trigger /agent/update on the affected nodes.

The old binary is still present on the artifact server under its original version folder, so no re-upload is needed for a rollback.
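
Because a rollback only works if the target version is still listed, it can help to check that before repointing. The sketch below assumes the manifest layout from Step 6 and refuses to produce output if the requested version has no artifacts entry; manifest_rolled_back is a hypothetical helper name.

```shell
# Sketch: print a manifest with latest and channels.stable repointed to an
# already-published version. Fails if that version is not in "artifacts".
manifest_rolled_back() {
  # $1: path to manifest.json, $2: last known-good version, e.g. 1.0.7
  jq --arg v "$2" '
    if (.artifacts | has($v))
    then .latest = $v | .channels.stable = $v
    else error("version \($v) is not in artifacts")
    end
  ' "$1"
}
```

As with any manifest edit, write the output to a temporary file and move it into place only if the command succeeds, then validate the served manifest before triggering updates.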


  • Agent endpoint specs: Agent_Endpoint_Specifications_Phase1.md
  • Agent update mode env var: ZLH_AGENT_UPDATE_MODE (auto|notify|off)
  • Update check interval: ZLH_AGENT_UPDATE_INTERVAL (default 30m)
  • Artifact server location: 10.60.0.251:/opt/zlh/agents/