Session Log – zlh-grind
Append-only execution log for GPT-assisted development work.
Do not rewrite or reorder past entries.
2025-12-14
- Goal: Stabilize zlh-agent provisioning pipeline for game + dev containers + addons without regressing game provisioning.
- Scope: Agent routing (ctype=dev), local artifacts strategy, addon skeleton, initial end-to-end tests.
- Adopted single base template approach (zlh-agent template) where agent installs roles at runtime.
2025-12-20
- Goal: Restore reliable end-to-end provisioning for game + dev without regressions; preserve the explicit orchestration steps/logging.
Current Architecture Reality (IMPORTANT)
- We are on single base template: `AGENT_TEMPLATE_VMID` is the only template in `.env`.
- In this workflow, any zlh-agent code change requires rebuilding + pushing the binary into the template container.
- That effectively means: agent change = new template promotion.
Key Findings / Fixes
- Game provisioning steps are functioning again and show clear step-by-step orchestration output (allocate VMID → clone → configure → start → IP → agent config → wait → DB save → edge publish).
- Dev provisioning failure was traced to a wire contract mismatch between API JSON and agent `state.Config` struct tags:
  - Agent expects: `container_type` (snake_case) on the JSON wire
  - API refactors were sending: `ctype` / `containerType` (camelCase) in some variants
  - Result: `cfg.ContainerType == ""` on decode → agent routes into game path → errors like: `unsupported container identity (containerType="" game="")`
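A minimal Go sketch of the mismatch described above. The `Config` type here is a hypothetical stand-in for the agent's `state.Config`; only the `container_type` tag comes from this log, and the `game` field is assumed. It illustrates that `encoding/json` silently ignores keys that do not match a struct tag, so the field stays empty and no decode error is raised.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical stand-in for the agent's state.Config.
// Only the container_type tag is taken from the log; the rest is assumed.
type Config struct {
	ContainerType string `json:"container_type"`
	Game          string `json:"game"`
}

// decodeContainerType unmarshals a payload and returns the decoded container type.
func decodeContainerType(payload string) (string, error) {
	var cfg Config
	if err := json.Unmarshal([]byte(payload), &cfg); err != nil {
		return "", err
	}
	return cfg.ContainerType, nil
}

func main() {
	payloads := []string{
		`{"container_type":"dev"}`, // matches the agent's struct tag
		`{"containerType":"dev"}`,  // camelCase variant: unknown key, ignored
		`{"ctype":"dev"}`,          // short variant: unknown key, ignored
	}
	for _, p := range payloads {
		ct, err := decodeContainerType(p)
		fmt.Printf("%-28s -> containerType=%q err=%v\n", p, ct, err)
	}
}
```

Because the decode succeeds either way, the failure only surfaces later in routing, which is consistent with the `unsupported container identity` error noted above.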
Decision (to minimize churn)
- Because agent changes force a template promotion in this pipeline, the correct short-term move is:
  - Update API to emit `container_type` to match the existing agent contract
  - Avoid touching agent code until we intentionally rev the template
Operational Guardrails
- Do NOT remove or "simplify" the explicit provisioning steps/logging; they are required for debugging and operator confidence.
- Treat `container_type` as the canonical wire key until the next planned template rev.
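For reference, the explicit step-by-step orchestration that these guardrails protect could be sketched roughly as below. This is an illustration only: the `step` type, `runSteps`, and the no-op step bodies are hypothetical, and only the step names come from the 2025-12-20 findings.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// step is one named stage of provisioning. The names mirror the log entry;
// the function bodies used below are placeholders, not the real implementation.
type step struct {
	name string
	run  func() error
}

// errDemo is a made-up failure used to show where the runner stops.
var errDemo = errors.New("demo failure")

// runSteps executes steps in order, logs each one, and stops at the first
// failure so the operator can see exactly which stage broke.
func runSteps(steps []step) ([]string, error) {
	var completed []string
	for i, s := range steps {
		log.Printf("[%d/%d] %s: start", i+1, len(steps), s.name)
		if err := s.run(); err != nil {
			log.Printf("[%d/%d] %s: FAILED: %v", i+1, len(steps), s.name, err)
			return completed, fmt.Errorf("step %q: %w", s.name, err)
		}
		log.Printf("[%d/%d] %s: ok", i+1, len(steps), s.name)
		completed = append(completed, s.name)
	}
	return completed, nil
}

func main() {
	ok := func() error { return nil }
	names := []string{
		"allocate VMID", "clone", "configure", "start", "IP",
		"agent config", "wait", "DB save", "edge publish",
	}
	steps := make([]step, 0, len(names))
	for _, n := range names {
		steps = append(steps, step{name: n, run: ok})
	}
	// Simulate a failure in "agent config" to show the early stop.
	steps[5].run = func() error { return errDemo }

	done, err := runSteps(steps)
	fmt.Printf("completed=%v err=%v\n", done, err)
}
```

Keeping each stage as a named, individually logged unit is what makes a failure such as the dev routing error traceable to a specific step.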
2025-12-21
- Goal: Restore reliable end-to-end provisioning for devcontainers and agent-managed installs.
- Observed repeated failures during devcontainer runtime installation (node, python, go, java).
- Initial assumption was installer regression; investigation showed installers were enforcing contract correctly.
- Root cause identified: agent was not exporting required runtime environment variables (notably `RUNTIME_VERSION`).
Devcontainer provisioning investigation
- Deep dive on zlh-agent devcontainer provisioning flow.
- Confirmed that all devcontainer installers intentionally require `RUNTIME_VERSION` and fail fast if missing.
- Clarified that payload JSON is not read by installers; agent must project intent via environment variables.
- Verified that installer logic itself (artifact naming, extraction, symlink layout) was correct.
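A rough Go sketch of what "projecting intent via environment variables" could look like on the agent side. `installerEnv` is a hypothetical helper, not the agent's actual code; `RUNTIME_VERSION` is the only variable name taken from this log, and the `sh -c` command merely stands in for a real installer.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// installerEnv returns the environment for an installer process: a copy of
// the base environment plus RUNTIME_VERSION. It refuses an empty version,
// mirroring the installers' fail-fast behaviour on the agent side.
func installerEnv(base []string, version string) ([]string, error) {
	if version == "" {
		return nil, fmt.Errorf("RUNTIME_VERSION is required but empty")
	}
	env := append([]string{}, base...) // copy so the caller's slice is untouched
	env = append(env, "RUNTIME_VERSION="+version)
	return env, nil
}

func main() {
	env, err := installerEnv(os.Environ(), "24")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Stand-in for an installer: it only sees what the environment carries.
	cmd := exec.Command("sh", "-c", `echo "installer sees RUNTIME_VERSION=${RUNTIME_VERSION}"`)
	cmd.Env = env
	out, err := cmd.CombinedOutput()
	fmt.Printf("%s(err=%v)\n", out, err)
}
```

Validating the value before launching the installer gives an error at the agent boundary instead of deep inside the script.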
Embedded installer execution findings
- Agent executes installers as embedded scripts, not filesystem paths.
- Identified critical requirement: shared installer logic (`common.sh`) and runtime installer must execute in the same shell session.
- Failure mode observed: `install_runtime: command not found` → caused by running runtime installer without `common.sh` loaded.
- Confirmed this explains missing runtime directories and lack of artifact downloads.
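The same-shell requirement can be illustrated with a small Go sketch. Everything here is assumed for illustration: the two script constants are tiny stand-ins for the embedded `common.sh` and a runtime installer, and `combineScripts` / `runEmbedded` are hypothetical helper names. The scripts are fed to `bash -s` on stdin, so nothing is written to the filesystem.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// Hypothetical stand-ins for the embedded scripts; the real contents differ.
const commonSh = `set -eu
install_runtime() {
  echo "install_runtime called for ${RUNTIME_NAME}"
}
`

const nodeInstaller = `RUNTIME_NAME=node
install_runtime
`

// combineScripts joins the shared library and one runtime installer into a
// single script so both are evaluated by the same shell process.
func combineScripts(common, installer string) string {
	var b strings.Builder
	b.WriteString(strings.TrimRight(common, "\n"))
	b.WriteString("\n")
	b.WriteString(installer)
	return b.String()
}

// runEmbedded pipes the script to bash on stdin and returns combined output.
func runEmbedded(script string) (string, error) {
	cmd := exec.Command("bash", "-s")
	cmd.Stdin = strings.NewReader(script)
	out, err := cmd.CombinedOutput()
	return string(out), err
}

func main() {
	// Shared logic and installer in one invocation: the function is defined.
	out, err := runEmbedded(combineScripts(commonSh, nodeInstaller))
	fmt.Printf("combined:       %q err=%v\n", out, err)

	// Installer alone: install_runtime was never defined in this shell.
	out, err = runEmbedded(nodeInstaller)
	fmt.Printf("installer only: %q err=%v\n", out, err)
}
```

Running the installer on its own reproduces the `command not found` failure mode, because shell functions defined in one bash process are not visible to another.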
Installer architecture changes
- Refactored installer model to:
  - `common.sh`: shared, strict, embedded-safe installation logic
  - per-runtime installers (`node`, `python`, `go`, `java`) as declarative descriptors only
- Established that runtime installers are intentionally minimal and declarative by design.
- Confirmed that this preserves existing runtime layout: `/opt/zlh/runtime/<language>/<version>/current`
Artifact layout update
- Artifact naming and layout standardized to simplified form:
  - `node-24.tar.xz`
  - `python-3.12.tar.xz`
  - `go-1.22.tar.gz`
  - `jdk-21.tar.gz`
- Identified mismatch between runtime name and archive prefix (notably Java).
- Introduced `ARCHIVE_PREFIX` as a runtime-level variable to resolve naming cleanly.
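A small Go sketch of how an archive prefix can decouple the runtime name from the artifact file name. The `runtimeSpec` type and its fields are hypothetical, and the fallback of an empty prefix to the runtime name is an assumption rather than documented behaviour; the resulting file names match the list above.

```go
package main

import "fmt"

// runtimeSpec is a hypothetical descriptor for one runtime. ArchivePrefix
// plays the role of ARCHIVE_PREFIX; falling back to Name when it is empty
// is an assumption made for this sketch.
type runtimeSpec struct {
	Name          string // runtime name, e.g. "java"
	ArchivePrefix string // archive prefix, e.g. "jdk"
	Version       string // e.g. "21"
	Ext           string // archive extension, e.g. "tar.gz"
}

// archiveName builds "<prefix>-<version>.<ext>".
func (r runtimeSpec) archiveName() string {
	prefix := r.ArchivePrefix
	if prefix == "" {
		prefix = r.Name
	}
	return fmt.Sprintf("%s-%s.%s", prefix, r.Version, r.Ext)
}

func main() {
	specs := []runtimeSpec{
		{Name: "node", Version: "24", Ext: "tar.xz"},
		{Name: "python", Version: "3.12", Ext: "tar.xz"},
		{Name: "go", Version: "1.22", Ext: "tar.gz"},
		{Name: "java", ArchivePrefix: "jdk", Version: "21", Ext: "tar.gz"},
	}
	for _, s := range specs {
		fmt.Printf("%-7s -> %s\n", s.Name, s.archiveName())
	}
}
```

Only Java needs an explicit prefix in this sketch; the other runtimes reuse their own name.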
Final conclusions
- No regression in installer logic; failures were execution-order and environment-projection issues.
- Correct fix is agent-side:
  - concatenate `common.sh` + runtime installer into one bash invocation
  - inject `RUNTIME_VERSION` (and related vars) into environment
- Architecture now supports deterministic, artifact-driven, embedded-safe installs.
Status: Root cause resolved; implementation pending agent patch & installer updates.