Session Log: ZeroLagHub (Execution Ledger)

Chronological ledger of notable execution work. Keep this high signal.


2026-03-01 Upload Pipeline + Filesystem Consolidation

Completed

  • Implemented direct runtime upload model
  • Removed staging and symlink exploration
  • Added .zlh_metadata.json provenance tracking
  • Extended stat response with source
  • Implemented raw http.request streaming in API
  • Confirmed agent upload via direct curl
  • Identified API upload proxy transport timing issue
  • Switched from fetch() to native piping
  • Consolidated filesystem architecture docs (filesystem-and-file-browser.md, mod-deployment-safety.md)
  • Full grind repo consolidation pass (README, CONSTRAINTS, OPEN_THREADS, SESSION_START, ANTI_DRIFT, UPSTREAMS)
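
A minimal sketch of the native piping approach, assuming the API forwards the incoming request body to the agent over plain HTTP. The helper name `streamUpload`, its option names, and the 120 s default timeout are illustrative assumptions, not the shipped implementation.

```javascript
const http = require("node:http");

// Hypothetical helper: streams `source` (any Readable, such as the incoming
// API request) to the agent without buffering the whole file in memory.
function streamUpload(source, { host, port, path, headers = {}, timeoutMs = 120000 }) {
  return new Promise((resolve, reject) => {
    const upstream = http.request({ host, port, path, method: "POST", headers }, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () => {
        resolve({ status: res.statusCode, body: Buffer.concat(chunks).toString("utf8") });
      });
      res.on("error", reject);
    });

    // Socket idle timeout: surface a clear error instead of hanging when the agent stalls.
    upstream.setTimeout(timeoutMs, () => {
      upstream.destroy(new Error(`upload timed out after ${timeoutMs} ms`));
    });
    upstream.on("error", reject);

    // If the client side of the upload breaks, tear down the agent request too.
    source.on("error", (err) => upstream.destroy(err));

    // pipe() applies backpressure and ends the agent request when the source ends.
    source.pipe(upstream);
  });
}
```

Because the timeout is explicit and produces a named error, it can be raised or logged independently, which may help with the transport diagnostics listed under Remaining Focus.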

System State

  • Read: Stable
  • Write: Stable with shadow backup
  • Delete: Constrained and safe
  • Upload: Implemented, transport tuning in progress
  • Console: Stable

Remaining Focus

  • Improve upload transport diagnostics
  • Increase upload timeout
  • Dev server filesystem model planning

2026-02-23 Mod System Now Operational

Full mod lifecycle implemented and confirmed end-to-end.

What shipped

  • Modrinth search integrated (GET /api/game/mods/search)
  • Full install/list/enable/disable/delete routes live behind auth + ownership
  • Agent host allowlist expanded to include cdn.modrinth.com
  • Filename validation updated to allow + (required for Modrinth filenames like sodium-neoforge-0.6.13+mc1.21.4.jar)
  • Engine metadata bug fixed: engineType="neoforge", engineVersion="1.21.4" (was "minecraft" / "neoforge-1.21.4")
  • API → Agent payload contract corrected (field names and source field)
  • Frontend: mod search drawer, installed panel, enable/disable/delete, toast notifications

System state

  • Full lifecycle via API: working
  • Frontend search + install: working
  • Enable/disable/delete: working
  • Installed flag displayed: working
  • Soft delete to /mods-removed: working
  • No retention automation: intentional
  • No install queue: intentional
  • Filesystem authoritative: confirmed

Known tradeoffs locked in

  • No deterministic Modrinth project ID persistence (installed detection is heuristic)
  • No DB tracking of mod state
  • Soft-deleted mods cannot be restored until the file browser supports manual restore

Pending

  • Response corruption investigation (one early curl output appeared malformed)
  • API error mapping: "mod already exists" → 409 (currently 502)
  • Portal UI wiring for install flow

2026-02-22 Architecture strategy session (Claude)

Artifact server strategy decided

  • Moved away from self-hosting all mod artifacts.
  • zlh-artifactd Phase 1 scope: mod resolver only (Modrinth API + NeoForge maven).
    • No local mod storage in Phase 1; agent fetches from upstream, verifies hash.
    • Local caching added in Phase 2 based on real usage patterns.
  • Full artifact orchestrator (game jars, runtimes, mods unified) is the Phase 3 vision.
  • Rationale: single-operator sustainability; learn ingestion patterns from mods before automating game jars.

Mod platform decision

  • Modrinth as primary source (open API, no key required, hash provided, NeoForge/Fabric/Forge/Quilt filter support).
  • CurseForge: manual SFTP upload by user (their friction, not ours).
  • Modrinth API base: api.modrinth.com/v2, rate limit 300 req/min, requires User-Agent header.
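
A sketch of how a loader-filtered search request against the Modrinth v2 API could be built. The function name and the User-Agent value are placeholders; the facet layout follows Modrinth's documented search parameters, where the loader is matched through `categories`.

```javascript
const MODRINTH_BASE = "https://api.modrinth.com/v2";

// Builds the URL and headers for a Modrinth project search.
// The User-Agent value is a placeholder; use a real contact string in practice.
function buildModrinthSearch(query, { loader, gameVersion, limit = 20 } = {}) {
  // Facet groups in the outer array are AND-ed together.
  const facets = [["project_type:mod"]];
  if (loader) facets.push([`categories:${loader}`]);
  if (gameVersion) facets.push([`versions:${gameVersion}`]);

  const url = new URL(`${MODRINTH_BASE}/search`);
  url.searchParams.set("query", query);
  url.searchParams.set("limit", String(limit));
  url.searchParams.set("facets", JSON.stringify(facets));

  return {
    url: url.toString(),
    headers: { "User-Agent": "zlh-example/0.0 (contact@example.invalid)" },
  };
}
```

For example, `buildModrinthSearch("sodium", { loader: "neoforge", gameVersion: "1.21.4" })` yields a request restricted to NeoForge mods for 1.21.4.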

NeoForge versions for artifact server

  • 1.21.1: highest mod ecosystem density (16,000+ mods)
  • 1.21.4: next significant stable adoption
  • 1.21.11: current latest
  • 1.20.x NOT needed for NeoForge; existing Forge covers that ecosystem.

Vanilla MC versions to add

  • Currently at 1.21.7; need to add 1.21.8 through 1.21.11.
  • Source: Mojang version manifest at piston-meta.mojang.com/mc/game/version_manifest_v2.json.
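
Selecting the missing releases from the manifest could look like the sketch below. It assumes the manifest has already been fetched and parsed, and relies on its `versions` array of entries with `id` and `type` fields; the function names are illustrative.

```javascript
// Compares dotted versions numerically, so "1.21.11" sorts after "1.21.8".
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Returns release entries whose id lies in [min, max], oldest first.
function releasesInRange(manifest, min, max) {
  return manifest.versions
    .filter((v) => v.type === "release") // drop snapshots before comparing ids
    .filter((v) => compareVersions(v.id, min) >= 0 && compareVersions(v.id, max) <= 0)
    .sort((x, y) => compareVersions(x.id, y.id));
}
```

Numeric comparison matters here: a plain string sort would place 1.21.10 and 1.21.11 before 1.21.8.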

Confirmed loader coverage at launch

  • Vanilla, Paper, Purpur, Forge, Quilt (covers Fabric mods too) already in place.
  • NeoForge added (3 versions above).
  • No Fabric standalone needed; Quilt provides compatibility.

2026-02-21 → 2026-02-22 Game telemetry routing + agent Phase 1 feature set

API (control plane)

  • Added src/routes/game.js, mounted at /api/game.
  • Added:
    • GET /api/game/servers/:id/players (Redis-backed, normalized response)
    • GET /api/game/servers/:id/update-status (agent version + update state)
  • Extended src/utils/agentPoller.js:
    • poll agent GET /game/players for game containers
    • poll agent GET /version + GET /agent/update/status
    • cache agentVersion, updateStatus, updateAvailable
  • Persisted engineType/engineVersion in ContainerInstance on provisioning.
  • Rolled back provisioning-time agent update triggers (back to direct /config flow).

Frontend (portal)

  • Game console polling switched to:
    • GET /api/game/servers/:id/players
    • GET /api/game/servers/:id/update-status
  • UI shows player count + optional name list.
  • Version context rendered as Version: X or Version: X -> Y.
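
The version label logic can be sketched as follows. `agentVersion` and `updateAvailable` match the cached fields named above; `latestVersion` and the "unknown" fallback are assumptions made for this example.

```javascript
// Renders "Version: X" or "Version: X -> Y" from the update-status payload.
function formatVersionLabel({ agentVersion, updateAvailable, latestVersion } = {}) {
  if (!agentVersion) return "Version: unknown"; // assumed fallback
  if (updateAvailable && latestVersion && latestVersion !== agentVersion) {
    return `Version: ${agentVersion} -> ${latestVersion}`;
  }
  return `Version: ${agentVersion}`;
}
```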

Agent (runtime)

  • /game/players hardened for more reliable Minecraft presence (port/protocol fallbacks).
  • Added periodic update checks:
    • ZLH_AGENT_UPDATE_MODE=notify|auto|off
    • ZLH_AGENT_UPDATE_INTERVAL (default 30m)
  • Implemented Phase 1 mod management:
    • GET /game/mods
    • POST /game/mods/install (allowlist + redirect constraints + size cap + sha256 verify)
    • PATCH /game/mods/:mod_id (enable/disable)
    • DELETE /game/mods/:mod_id (moves to mods-removed/)
  • Implemented GET /metrics/process (PID + VmRSS/VmSize; restart_count mapped to crashCount).
  • Added Minecraft readiness probe after start/restart and stop-wait-before-restart sync to reduce restart races.
  • Added lifecycle.log for lifecycle/probe diagnostics.
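
The install-time checks (host allowlist, size cap, sha256 verification) can be illustrated with the sketch below. It is written in JavaScript for consistency with the other examples and does not reflect the agent's actual code; the 100 MiB cap, the function names, and the reason strings are assumptions, and the allowlist contains only the host confirmed in this log.

```javascript
const crypto = require("node:crypto");

// Assumed allowlist; the log only confirms cdn.modrinth.com.
const ALLOWED_HOSTS = new Set(["cdn.modrinth.com"]);
const MAX_MOD_BYTES = 100 * 1024 * 1024; // illustrative cap, not the real limit

// Accepts only https URLs whose exact hostname is on the allowlist.
function isAllowedDownloadUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;
  }
  return url.protocol === "https:" && ALLOWED_HOSTS.has(url.hostname);
}

// Checks the size cap first, then compares the sha256 digest in hex.
function verifyDownload(buffer, expectedSha256) {
  if (buffer.length > MAX_MOD_BYTES) return { ok: false, reason: "too_large" };
  const actual = crypto.createHash("sha256").update(buffer).digest("hex");
  if (actual !== String(expectedSha256).toLowerCase()) {
    return { ok: false, reason: "sha256_mismatch" };
  }
  return { ok: true, sha256: actual };
}
```

Matching on the exact hostname, rather than a suffix or substring, keeps look-alike hosts such as `cdn.modrinth.com.evil.example` out. The same check would need to be repeated on every redirect hop to honor the redirect constraints.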

Artifacts

  • Expanded Minecraft artifact tree to include NeoForge under /opt/zlh/zpacks/minecraft/neoforge/<mc_version>/.
  • Encountered installer naming contract mismatch; Phase 1 fix is to normalize the NeoForge installer filename for predictable installs.

2026-02-07 → 2026-02-08 Console stabilization

  • Migrated console from log tailing to PTY-backed interactive console over WebSockets.
  • Hardened WS stability:
    • moved to a single-writer pattern
    • ensured exactly one PTY read loop per {vmid, container_type}
  • Fixed "it works but not on prod" issues caused by stale binaries still running.
  • Confirmed systemd points to /opt/zlh-agent/current/zlh-agent with current -> releases/<ver> layout.
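
The "exactly one PTY read loop per {vmid, container_type}" rule can be sketched as a keyed registry. The class and method names are hypothetical, and `startSession` stands in for whatever actually opens the PTY and starts its read loop.

```javascript
// Hypothetical registry: at most one session (and so one read loop) per key.
class ConsoleRegistry {
  constructor(startSession) {
    this.startSession = startSession; // (vmid, containerType) => session object
    this.sessions = new Map();
  }

  static key(vmid, containerType) {
    return `${vmid}:${containerType}`;
  }

  // Returns the existing session or starts exactly one new one.
  acquire(vmid, containerType) {
    const key = ConsoleRegistry.key(vmid, containerType);
    let session = this.sessions.get(key);
    if (!session) {
      session = this.startSession(vmid, containerType);
      this.sessions.set(key, session);
    }
    return session;
  }

  // Forgets the session so the next acquire starts a fresh one.
  release(vmid, containerType) {
    return this.sessions.delete(ConsoleRegistry.key(vmid, containerType));
  }
}
```

Every WebSocket client for the same container then attaches to the session returned by `acquire`, so reconnects do not spawn additional read loops.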