# Session Log – zlh-grind

Append-only execution log for GPT-assisted development work.
Do not rewrite or reorder past entries.

---

## 2025-12-14

- Goal: Stabilize zlh-agent provisioning pipeline for game + dev containers + addons without regressing game provisioning.
- Scope: Agent routing (ctype=dev), local artifacts strategy, addon skeleton, initial end-to-end tests.
- Adopted single base template approach (zlh-agent template) where agent installs roles at runtime.

---

## 2025-12-20

- Goal: Restore reliable end-to-end provisioning for **game + dev** without regressions; preserve the explicit orchestration steps/logging.

### Current Architecture Reality (IMPORTANT)

- We are on **single base template**: `AGENT_TEMPLATE_VMID` is the only template in `.env`.
- In this workflow, **any zlh-agent code change requires rebuilding + pushing the binary into the template container**.
- That effectively means: **agent change = new template promotion**.

### Key Findings / Fixes

- Game provisioning steps are functioning again and show clear step-by-step orchestration output (allocate VMID → clone → configure → start → IP → agent config → wait → DB save → edge publish).
- Dev provisioning failure was traced to a **wire contract mismatch** between API JSON and agent `state.Config` struct tags:
  - Agent expects: `container_type` (snake_case) on the JSON wire
  - API refactors were sending: `ctype` (abbreviated) or `containerType` (camelCase) in some variants
  - Result: `cfg.ContainerType == ""` on decode → agent routes into game path → errors like:
    - `unsupported container identity (containerType="" game="")`
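
The mismatch above can be reproduced in isolation. This is a minimal sketch, assuming the agent decodes with Go's standard `encoding/json`; the `Config` type here is a hypothetical stand-in for `state.Config`, and only the `container_type` tag comes from this log.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical stand-in for the agent's state.Config.
// Only the container_type tag is taken from the log entry.
type Config struct {
	ContainerType string `json:"container_type"`
}

// decodeContainerType returns the container type the agent would see
// after decoding the given JSON payload.
func decodeContainerType(payload string) (string, error) {
	var cfg Config
	if err := json.Unmarshal([]byte(payload), &cfg); err != nil {
		return "", err
	}
	return cfg.ContainerType, nil
}

func main() {
	payloads := []string{
		`{"container_type":"dev"}`, // matches the struct tag
		`{"containerType":"dev"}`,  // unknown key, silently ignored
		`{"ctype":"dev"}`,          // unknown key, silently ignored
	}
	for _, p := range payloads {
		ct, err := decodeContainerType(p)
		fmt.Printf("%s -> ContainerType=%q err=%v\n", p, ct, err)
	}
}
```

Because `encoding/json` ignores unknown keys by default, the two mismatched payloads decode without error and leave `ContainerType` empty, which is consistent with the empty `containerType=""` in the error above.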

### Decision (to minimize churn)

- Because agent changes force a template promotion in this pipeline, the correct short-term move is:
  - **Update API to emit `container_type`** to match the existing agent contract
  - Avoid touching agent code until we intentionally rev the template

### Operational Guardrails

- Do NOT remove or "simplify" the explicit provisioning steps/logging; they are required for debugging and operator confidence.
- Treat `container_type` as the canonical wire key until the next planned template rev.
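
For reference, one way to keep the provisioning steps explicit and individually logged is an ordered step list. This is only a sketch: the step names come from the orchestration output recorded in this entry, while the `step` type, `runSteps`, and the no-op step bodies are hypothetical and do not reflect the actual orchestrator code.

```go
package main

import (
	"fmt"
	"log"
)

// step is a hypothetical named unit of provisioning work.
type step struct {
	name string
	run  func() error
}

// runSteps executes steps in order, logs each one before it runs,
// and stops at the first failure so the failing step is obvious.
func runSteps(steps []step) error {
	for i, s := range steps {
		log.Printf("[%d/%d] %s", i+1, len(steps), s.name)
		if err := s.run(); err != nil {
			return fmt.Errorf("step %q failed: %w", s.name, err)
		}
	}
	return nil
}

func main() {
	// Placeholder body; real steps would call the hypervisor, DB, and edge APIs.
	noop := func() error { return nil }
	names := []string{
		"allocate VMID", "clone", "configure", "start", "IP",
		"agent config", "wait", "DB save", "edge publish",
	}
	steps := make([]step, 0, len(names))
	for _, n := range names {
		steps = append(steps, step{name: n, run: noop})
	}
	if err := runSteps(steps); err != nil {
		log.Fatal(err)
	}
}
```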

---

## 2025-12-21

- Goal: Restore reliable end-to-end provisioning for devcontainers and agent-managed installs.
- Observed repeated failures during devcontainer runtime installation (node, python, go, java).
- Initial assumption was installer regression; investigation showed installers were enforcing contract correctly.
- Root cause identified: agent was not exporting required runtime environment variables (notably `RUNTIME_VERSION`).

### Devcontainer provisioning investigation

- Deep dive on zlh-agent devcontainer provisioning flow.
- Confirmed that all devcontainer installers intentionally require `RUNTIME_VERSION` and fail fast if missing.
- Clarified that payload JSON is not read by installers; agent must project intent via environment variables.
- Verified that installer logic itself (artifact naming, extraction, symlink layout) was correct.
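
Projecting intent through the environment could look roughly like the following. This is a sketch under assumptions: `RUNTIME_VERSION` is the variable named in this entry, but the `runtimeEnv` helper and its signature are hypothetical, and the "related vars" are not enumerated here because the log does not list them.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// runtimeEnv builds the environment for an installer process.
// It mirrors the installers' fail-fast behaviour by refusing an empty version.
// The function name and signature are hypothetical.
func runtimeEnv(base []string, version string) ([]string, error) {
	if version == "" {
		return nil, errors.New("runtime version is empty; refusing to run installer")
	}
	env := make([]string, 0, len(base)+1)
	env = append(env, base...)
	// Appended last: os/exec uses the last value when a key is duplicated.
	env = append(env, "RUNTIME_VERSION="+version)
	return env, nil
}

func main() {
	env, err := runtimeEnv(os.Environ(), "24")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(env[len(env)-1])
}
```

The returned slice is meant to be assigned to `exec.Cmd.Env` before the installer is started.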

### Embedded installer execution findings

- Agent executes installers as **embedded scripts**, not filesystem paths.
- Identified critical requirement: shared installer logic (`common.sh`) and runtime installer must execute in the **same shell session**.
- Failure mode observed: `install_runtime: command not found` → caused by running runtime installer without `common.sh` loaded.
- Confirmed this explains missing runtime directories and lack of artifact downloads.
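
The same-shell-session requirement can be met by concatenating both scripts and feeding them to a single `bash` process on stdin. The sketch below assumes that approach; the script contents, `RUNTIME_NAME`, `combineScripts`, and `runInstaller` are hypothetical placeholders, and only `common.sh`, `install_runtime`, and `RUNTIME_VERSION` are names taken from this log.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// Hypothetical stand-ins for the embedded scripts; real contents differ.
const commonSh = `install_runtime() { echo "installing ${RUNTIME_NAME} ${RUNTIME_VERSION}"; }`

const nodeSh = `RUNTIME_NAME=node
install_runtime`

// combineScripts places the shared logic first so that the functions it
// defines are visible when the runtime installer runs in the same shell.
func combineScripts(common, runtime string) string {
	return common + "\n" + runtime + "\n"
}

// runInstaller pipes the combined script into one bash process via stdin.
func runInstaller(script string, env []string) ([]byte, error) {
	cmd := exec.Command("bash", "-s")
	cmd.Stdin = strings.NewReader(script)
	cmd.Env = env
	return cmd.CombinedOutput()
}

func main() {
	script := combineScripts(commonSh, nodeSh)
	env := append(os.Environ(), "RUNTIME_VERSION=24")
	out, err := runInstaller(script, env)
	if err != nil {
		fmt.Println("installer failed:", err)
	}
	fmt.Print(string(out))
}
```

Running the runtime installer on its own, without the `common.sh` text in front of it, leaves `install_runtime` undefined in that shell, which matches the `command not found` failure recorded above.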

### Installer architecture changes

- Refactored installer model to:
  - `common.sh`: shared, strict, embedded-safe installation logic
  - per-runtime installers (`node`, `python`, `go`, `java`) as declarative descriptors only
- Established that runtime installers are intentionally minimal and declarative by design.
- Confirmed that this preserves existing runtime layout:
  `/opt/zlh/runtime/<language>/<version>/current`

### Artifact layout update

- Artifact naming and layout standardized to simplified form:
  - `node-24.tar.xz`
  - `python-3.12.tar.xz`
  - `go-1.22.tar.gz`
  - `jdk-21.tar.gz`
- Identified mismatch between runtime name and archive prefix (notably Java).
- Introduced `ARCHIVE_PREFIX` as a runtime-level variable to resolve naming cleanly.
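
The naming rule can be illustrated with a small descriptor. This is a sketch only: the archive names are the ones listed above, but the `runtimeSpec` type, its field names, and the "empty prefix falls back to the runtime name" default are assumptions, not the installers' actual behaviour.

```go
package main

import "fmt"

// runtimeSpec is a hypothetical declarative descriptor for one runtime.
type runtimeSpec struct {
	Name          string // runtime name, e.g. "java"
	ArchivePrefix string // mirrors ARCHIVE_PREFIX; empty means "use Name" (assumed default)
	Version       string
	Ext           string // archive extension without leading dot
}

// archiveName builds "<prefix>-<version>.<ext>".
func (s runtimeSpec) archiveName() string {
	prefix := s.ArchivePrefix
	if prefix == "" {
		prefix = s.Name
	}
	return fmt.Sprintf("%s-%s.%s", prefix, s.Version, s.Ext)
}

func main() {
	specs := []runtimeSpec{
		{Name: "node", Version: "24", Ext: "tar.xz"},
		{Name: "python", Version: "3.12", Ext: "tar.xz"},
		{Name: "go", Version: "1.22", Ext: "tar.gz"},
		{Name: "java", ArchivePrefix: "jdk", Version: "21", Ext: "tar.gz"},
	}
	for _, s := range specs {
		fmt.Printf("%-6s -> %s\n", s.Name, s.archiveName())
	}
}
```

Keeping the prefix separate from the runtime name lets the Java runtime keep its `java` name while its archive uses the `jdk-` prefix.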

### Final conclusions

- No regression in installer logic; failures were execution-order and environment-projection issues.
- Correct fix is agent-side:
  - concatenate `common.sh` + runtime installer into one bash invocation
  - inject `RUNTIME_VERSION` (and related vars) into environment
- Architecture now supports deterministic, artifact-driven, embedded-safe installs.

Status: **Root cause resolved; implementation pending agent patch & installer updates.**

---