# Session Log — zlh-grind

Append-only execution log for GPT-assisted development work.
Do not rewrite or reorder past entries.

---

## 2025-12-20

- Goal: Restore reliable end-to-end provisioning for devcontainers and agent-managed installs.
- Observed repeated failures during devcontainer runtime installation (node, python, go, java).
- Initial assumption was installer regression; investigation showed installers were enforcing contract correctly.
- Root cause identified: agent was not exporting required runtime environment variables (notably `RUNTIME_VERSION`).

---

## 2025-12-21

- Deep dive on zlh-agent devcontainer provisioning flow.
- Confirmed that all devcontainer installers intentionally require `RUNTIME_VERSION` and fail fast if missing.
- Clarified that payload JSON is not read by installers; agent must project intent via environment variables.
- Verified that installer logic itself (artifact naming, extraction, symlink layout) was correct.

### Embedded installer execution findings

- Agent executes installers as **embedded scripts**, not filesystem paths.
- Identified critical requirement: shared installer logic (`common.sh`) and runtime installer must execute in the **same shell session**.
- Failure mode observed: `install_runtime: command not found` → caused by running runtime installer without `common.sh` loaded.
- Confirmed this explains missing runtime directories and lack of artifact downloads.

### Installer architecture changes

- Refactored installer model to:
  - `common.sh`: shared, strict, embedded-safe installation logic
  - per-runtime installers (`node`, `python`, `go`, `java`) as declarative descriptors only
- Established that runtime installers are intentionally minimal and declarative by design.
- Confirmed that this preserves existing runtime layout: `/opt/zlh/runtime///current`

### Artifact layout update

- Artifact naming and layout changed to simplified form:
  - `node-24.tar.xz`
  - `python-3.12.tar.xz`
  - `go-1.22.tar.gz`
  - `jdk-21.tar.gz`
- Identified mismatch between runtime name and archive prefix (notably Java).
- Introduced `ARCHIVE_PREFIX` as a runtime-level variable to resolve naming cleanly.

### Final conclusions

- No regression in installer logic; failures were execution-order and environment-projection issues.
- Correct fix is agent-side:
  - concatenate `common.sh` + runtime installer into one bash invocation
  - inject `RUNTIME_VERSION` (and related vars) into environment
- Architecture now supports deterministic, artifact-driven, embedded-safe installs.

Status: **Root cause resolved; implementation pending agent patch & installer updates.**

---

## 2025-12-26

- Goal: Enable SSH access to dev containers via bastion for external users.
- Split DNS architecture confirmed working: Cloudflare (public) + Technitium (internal).

### Dev container SSH (internal success)

- Root cause of initial SSH failures: **missing SSH host keys** in unprivileged LXC containers.
- Fixed by ensuring host keys exist before sshd starts.
- Verified internal SSH access works from WireGuard/LAN:
  ```
  ssh -J zlh@10.100.0.48 root@dev-6038.zerolaghub.dev
  ```
- Confirmed not a firewall issue (iptables default ACCEPT, sshd listening on :22).

### Agent SSH provisioning requirements identified

- Agent must automate SSH setup for new containers:
  - Install and enable sshd
  - Generate SSH host keys if missing (add to `common.sh` or bootstrap)
  - Create `devuser` with sudo access
  - Configure authorized_keys for key-based auth

### zlh-cli progress

- Built successfully in Go, deployed to bastion at `/usr/local/bin/zlh`.
- **Known bugs when running ON bastion**:
  - Incorrectly attempts to jump via `zlh-bastion.zerolaghub.dev` (should use localhost/direct)
  - User/host targeting logic needs fixes (was targeting bastion instead of dev container)
- Goal: reduce `ssh -J` complexity to simple `zlh ssh 6038` command.

### Bastion public SSH blocker (ACTIVE)

- **Critical issue**: Public SSH to bastion fails immediately.
- Symptoms:
  - TCP connection succeeds (SYN/ACK completes)
  - SSH handshake never proceeds: `kex_exchange_identification: Connection closed by remote host`
  - Not authentication failure - connection closes before banner exchange
- Verified:
  - Cloudflare DNS: `zlh-bastion.zerolaghub.dev` → `139.64.165.248` (DNS-only)
  - OPNsense NAT rule forwards :22 to 10.100.0.48
  - Internal SSH to bastion works perfectly
  - Fail2ban shows no blocks
  - Tried disabling OPNsense SSH, changing ports - same failure
- **This is NOT the container host-key issue** (that was internal and fixed).

### Next debug steps for bastion blocker

1. tcpdump on bastion during external connection attempt
2. OPNsense live firewall log during external attempt
3. Confirm NAT truly reaches bastion sshd (not terminating upstream)
4. Check for ISP/modem interference or hairpin NAT issues

Status: **Dev container SSH working internally; bastion public access blocked at network layer.**

---

## 2025-12-28 — APIv2 Auth + Portal Alignment Session

### Work Completed

- APIv2 auth route verified functional (JWT-based)
- bcrypt password verification confirmed
- `/api/instances` endpoint verified working without auth
- Portal/API boundary clarified: portal owns identity UX, API owns validation + DB
- Confirmed no CSRF or cookie-based auth required (stateless JWT)

### Key Findings

- Portal still contains APIv1 / Pterodactyl assumptions
- `zlh-grind` is documentation + constraint repo only (no code)
- Instances endpoint behavior was correct; earlier failures were route misuse

### Decisions

- APIv2 auth will remain stateless (JWT only)
- No CSRF protection will be implemented
- Portal must fully remove APIv1 and Pterodactyl patterns

### Next Actions

- Enforce `requireAuth` selectively in APIv2
- Update portal login to match APIv2 contract
- Track portal migration progress in OPEN_THREADS