# Session Log: zlh-grind
Append-only execution log for GPT-assisted development work.
Do not rewrite or reorder past entries.
---
## 2025-12-14
- Goal: Stabilize zlh-agent provisioning pipeline for game + dev containers + addons without regressing game provisioning.
- Scope: Agent routing (ctype=dev), local artifacts strategy, addon skeleton, initial end-to-end tests.
- Adopted single base template approach (zlh-agent template) where agent installs roles at runtime.
---
## 2025-12-20
- Goal: Restore reliable end-to-end provisioning for **game + dev** without regressions; preserve the explicit orchestration steps/logging.
### Current Architecture Reality (IMPORTANT)
- We are on **single base template**: `AGENT_TEMPLATE_VMID` is the only template in `.env`.
- In this workflow, **any zlh-agent code change requires rebuilding + pushing the binary into the template container**.
- That effectively means: **agent change = new template promotion**.
### Key Findings / Fixes
- Game provisioning steps are functioning again and show clear step-by-step orchestration output (allocate VMID → clone → configure → start → IP → agent config → wait → DB save → edge publish).
- Dev provisioning failure was traced to a **wire contract mismatch** between API JSON and agent `state.Config` struct tags:
  - Agent expects: `container_type` (snake_case) on the JSON wire
  - API refactors were sending: `ctype` (abbreviated) or `containerType` (camelCase) in some variants
  - Result: `cfg.ContainerType == ""` on decode → agent routes into game path → errors like:
    - `unsupported container identity (containerType="" game="")`
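
A minimal Go sketch of the decode behaviour described above, assuming the agent decodes with the standard `encoding/json` package. The `Config` type and `decodeContainerType` helper are hypothetical stand-ins for `state.Config`; only the `container_type` tag comes from this entry. `json.Unmarshal` ignores unknown keys silently, which would explain why the mismatch shows up as an empty `ContainerType` rather than a decode error.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical stand-in for the agent's state.Config.
// Only the container_type tag is taken from the log entry.
type Config struct {
	ContainerType string `json:"container_type"`
}

// decodeContainerType unmarshals a JSON payload and returns the
// container type the agent would see after decoding.
func decodeContainerType(payload string) (string, error) {
	var cfg Config
	if err := json.Unmarshal([]byte(payload), &cfg); err != nil {
		return "", err
	}
	return cfg.ContainerType, nil
}

func main() {
	payloads := []string{
		`{"container_type":"dev"}`, // matches the agent contract
		`{"containerType":"dev"}`,  // camelCase variant: key is ignored
		`{"ctype":"dev"}`,          // abbreviated variant: key is ignored
	}
	for _, p := range payloads {
		ct, err := decodeContainerType(p)
		fmt.Printf("%-28s -> ContainerType=%q err=%v\n", p, ct, err)
	}
}
```

Only the first payload yields a non-empty `ContainerType`; the other two decode without error and leave the field at its zero value.
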
### Decision (to minimize churn)
- Because agent changes force a template promotion in this pipeline, the correct short-term move is:
  - **Update API to emit `container_type`** to match the existing agent contract
  - Avoid touching agent code until we intentionally rev the template
### Operational Guardrails
- Do NOT remove or "simplify" the explicit provisioning steps/logging; they are required for debugging and operator confidence.
- Treat `container_type` as the canonical wire key until the next planned template rev.
---
## 2025-12-21
- Goal: Restore reliable end-to-end provisioning for devcontainers and agent-managed installs.
- Observed repeated failures during devcontainer runtime installation (node, python, go, java).
- Initial assumption was installer regression; investigation showed installers were enforcing contract correctly.
- Root cause identified: agent was not exporting required runtime environment variables (notably `RUNTIME_VERSION`).
### Devcontainer provisioning investigation
- Deep dive on zlh-agent devcontainer provisioning flow.
- Confirmed that all devcontainer installers intentionally require `RUNTIME_VERSION` and fail fast if missing.
- Clarified that payload JSON is not read by installers; agent must project intent via environment variables.
- Verified that installer logic itself (artifact naming, extraction, symlink layout) was correct.
### Embedded installer execution findings
- Agent executes installers as **embedded scripts**, not filesystem paths.
- Identified critical requirement: shared installer logic (`common.sh`) and runtime installer must execute in the **same shell session**.
- Failure mode observed: `install_runtime: command not found` → caused by running runtime installer without `common.sh` loaded.
- Confirmed this explains missing runtime directories and lack of artifact downloads.
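
The same-shell requirement can be sketched in Go as follows. This is an illustration under stated assumptions, not the agent's actual code: the script bodies in `commonSh` and `nodeSh`, the `RUNTIME_NAME` variable, and the helper names `composeScript` and `runInstaller` are hypothetical, and `bash` is assumed to be on `PATH`. Only `install_runtime`, `common.sh`, and `RUNTIME_VERSION` are taken from this log.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// Hypothetical stand-in for the embedded common.sh: fails fast when
// RUNTIME_VERSION is missing and defines install_runtime.
const commonSh = `set -euo pipefail
: "${RUNTIME_VERSION:?RUNTIME_VERSION is required}"
install_runtime() { echo "install ${RUNTIME_NAME} ${RUNTIME_VERSION}"; }
`

// Hypothetical stand-in for a declarative per-runtime installer.
const nodeSh = `RUNTIME_NAME=node
install_runtime
`

// composeScript joins the shared helpers and one runtime installer into a
// single script so both run in the same shell session.
func composeScript(common, runtime string) string {
	return strings.TrimRight(common, "\n") + "\n" + runtime
}

// runInstaller feeds the script to one bash process on stdin and projects
// the requested version into that process's environment.
func runInstaller(script, version string) (string, error) {
	cmd := exec.Command("bash", "-s")
	cmd.Stdin = strings.NewReader(script)
	cmd.Env = append(os.Environ(), "RUNTIME_VERSION="+version)
	out, err := cmd.CombinedOutput()
	return string(out), err
}

func main() {
	// Shared logic and runtime installer in one invocation.
	out, err := runInstaller(composeScript(commonSh, nodeSh), "24")
	fmt.Printf("combined: %q err=%v\n", out, err)

	// Runtime installer alone: install_runtime is undefined because
	// common.sh was never loaded in this shell.
	out, err = runInstaller(nodeSh, "24")
	fmt.Printf("alone:    %q err=%v\n", out, err)
}
```

Feeding the script on stdin keeps the installers embedded, with no files written to disk, which fits the "embedded scripts, not filesystem paths" constraint noted above.
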
### Installer architecture changes
- Refactored installer model to:
  - `common.sh`: shared, strict, embedded-safe installation logic
  - per-runtime installers (`node`, `python`, `go`, `java`) as declarative descriptors only
- Established that runtime installers are intentionally minimal and declarative by design.
- Confirmed that this preserves existing runtime layout:
  - `/opt/zlh/runtime/<language>/<version>/current`
### Artifact layout update
- Artifact naming and layout standardized to simplified form:
  - `node-24.tar.xz`
  - `python-3.12.tar.xz`
  - `go-1.22.tar.gz`
  - `jdk-21.tar.gz`
- Identified mismatch between runtime name and archive prefix (notably Java).
- Introduced `ARCHIVE_PREFIX` as a runtime-level variable to resolve naming cleanly.
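
One way to picture the naming rule, sketched in Go. The `descriptor` type, its field names, and the fallback of an empty prefix to the runtime name are assumptions made for illustration; the log only states that `ARCHIVE_PREFIX` is a runtime-level variable and lists the four archive names.

```go
package main

import "fmt"

// descriptor is a hypothetical model of a per-runtime installer's
// declarative settings.
type descriptor struct {
	Runtime       string // runtime name, e.g. "java"
	ArchivePrefix string // assumed: empty means "same as Runtime"
	Ext           string // archive extension without the leading dot
}

// artifactName builds "<prefix>-<version>.<ext>", using ArchivePrefix
// when the archive prefix differs from the runtime name.
func artifactName(d descriptor, version string) string {
	prefix := d.ArchivePrefix
	if prefix == "" {
		prefix = d.Runtime
	}
	return fmt.Sprintf("%s-%s.%s", prefix, version, d.Ext)
}

func main() {
	cases := []struct {
		d       descriptor
		version string
	}{
		{descriptor{Runtime: "node", Ext: "tar.xz"}, "24"},
		{descriptor{Runtime: "python", Ext: "tar.xz"}, "3.12"},
		{descriptor{Runtime: "go", Ext: "tar.gz"}, "1.22"},
		{descriptor{Runtime: "java", ArchivePrefix: "jdk", Ext: "tar.gz"}, "21"},
	}
	for _, c := range cases {
		fmt.Println(c.d.Runtime, "->", artifactName(c.d, c.version))
	}
}
```

Java is the only case here where the prefix (`jdk`) differs from the runtime name, which is the mismatch the variable was introduced to resolve.
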
### Final conclusions
- No regression in installer logic; failures were execution-order and environment-projection issues.
- Correct fix is agent-side:
  - concatenate `common.sh` + runtime installer into one bash invocation
  - inject `RUNTIME_VERSION` (and related vars) into environment
- Architecture now supports deterministic, artifact-driven, embedded-safe installs.
Status: **Root cause resolved; implementation pending agent patch & installer updates.**
---