diff --git a/SESSION_LOG.md b/SESSION_LOG.md
index e5b30c3..6d2a422 100644
--- a/SESSION_LOG.md
+++ b/SESSION_LOG.md
@@ -41,36 +41,37 @@ Blocker:
 ## 2026-03-15
 
-Architecture review session. No code changes. Key decisions and findings:
+Architecture review session. Key decisions and findings:
 
 Dev container model:
 - 1 server / 1 container / 1 world confirmed as correct model
 - Dev containers: full R/W access under /home/dev/workspace, no allowlist
 - Multiverse/multi-world via plugins is customer-managed, not a platform concern
 - Port exposure (dev-.zerolaghub.com) identified as next major dev feature — future work
-- Wildcard DNS pre-planning needed before port exposure implementation
 - dotnet SDK covers all C# game modding (Valheim, Core Keeper, Vintage Story, Rust/Oxide)
 - Code Server confirmed as correct browser IDE approach given single public IP constraint
+- Traefik dynamic file provider confirmed as correct routing approach — no plugin needed, no SRV records needed
 
 Agent review (zlh-agent commit 6019d0bc — 2026-03-15):
 - Catalog transition confirmed correct — ValidateRuntimeSelection gates all dev provisions
 - Scripts unchanged — embedded script execution via bash stdin pipe, no 126 risk from runtime installs
 - devcontainer/common.go is clean and complete
-- node/verify.go has hardcoded /opt/zlh/runtime/node/bin/node — wrong path, pre-existing issue, not a regression
-- node/python/go/java install packages still use old version-unaware marker pattern — pre-existing, not a regression from catalog work
-- node exporter baked into containers — disk/CPU/memory/network already covered by Prometheus
-- Promtail present but status unknown
+- node/verify.go has hardcoded /opt/zlh/runtime/node/bin/node — wrong path, pre-existing issue
+- node/python/go/java install packages still use old version-unaware marker pattern — pre-existing, not a regression
 
-Agent future work identified (not yet implemented, priority order):
-1. Unified structured logging (slog) — Promtail/Loki integration needs structured fields to be queryable
-2. Dev container status/readiness — /status needs provisioningComplete + provisioningError for dev containers
-3. Crash recovery with backoff — auto-restart on crash with increasing delay (30s/60s/120s), max 3 attempts, then error state
-4. Graceful shutdown verification — confirm SIGTERM + wait before SIGKILL for Minecraft world save safety
-5. Agent restart/process reattachment — detect existing process on agent restart, reattach rather than double-start
-6. Disk pressure warning in /status — agent-level signal before node exporter threshold alerts
+Agent future work (priority order):
+1. Unified structured logging (slog) — Promtail/Loki integration needs structured fields
+2. Dev container /status — provisioningComplete + provisioningError fields
+3. Crash recovery with backoff — 30s/60s/120s, max 3 attempts, then error state
+4. Graceful shutdown verification — SIGTERM + wait before SIGKILL for Minecraft world save safety
+5. Agent restart/process reattachment — detect existing process on restart
 
-Explicitly out of scope (not Pterodactyl):
-- User management, permissions, billing
-- Multi-container orchestration
-- Plugin/extension systems
-- Anything owned by API or portal
+Code-server routing:
+- Artifact fix confirmed working 2026-03-15
+- Binary confirmed present at /opt/zlh/services/code-server/bin/code-server
+- Root cause of ERR_CONNECTION_CLOSED identified: code-server is installed but never launched
+- Port conflict: Node runtime is binding 6000, code-server cannot share the port
+- Two fixes needed:
+  1. Assign code-server a port that won't conflict with Node (6000 taken)
+  2. Add launch step to addon install script — install != start, binary must be daemonized after provisioning
+- Suggested launch: nohup /opt/zlh/services/code-server/bin/code-server --bind-addr 0.0.0.0: --auth none /home/dev/workspace > /opt/zlh-agent/logs/code-server.log 2>&1 &