docs: v2 — incorporate direct upload model, provenance metadata, API streaming, failure categories
parent
7d7e2378f5
commit
ca105227c3
@ -1,238 +1,216 @@
|
|||||||
# ZeroLagHub – Mod Deployment Safety & Self-Healing Specification
|
# Mod Deployment Safety Model
|
||||||
|
|
||||||
**Version:** 1.0
|
**Version:** 2.0
|
||||||
**Phase:** Phase 1 Core Stability
|
**Updated:** 2026-03-01
|
||||||
|
**Phase:** Phase 1 — Active
|
||||||
**Applies To:** Game Containers
|
**Applies To:** Game Containers
|
||||||
**Owner:** Agent Layer
|
**Owner:** Agent Layer
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 1. Purpose
|
## Goal
|
||||||
|
|
||||||
Define the deterministic, self-healing deployment model for:
|
Allow user uploads and automated mod installs while:
|
||||||
|
|
||||||
- Modrinth-based automated mod installs
|
- Preventing path escape
|
||||||
- Dev artifact promotion installs
|
- Preserving runtime integrity
|
||||||
|
- Tracking provenance
|
||||||
The system must:
|
- Avoiding hidden deployment layers
|
||||||
|
|
||||||
- Prevent catastrophic mod deployment failures
|
|
||||||
- Automatically recover from crash loops
|
|
||||||
- Avoid operator intervention
|
|
||||||
- Preserve world/player data
|
|
||||||
- Avoid state explosion and disk bloat
|
|
||||||
|
|
||||||
> This system is **not** a backup solution. It is a **deployment safety mechanism**.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 2. Scope
|
## Deployment Types
|
||||||
|
|
||||||
### Included in Safety Snapshot
|
| Type | Path | Extension |
|
||||||
|
|------|------|-----------|
|
||||||
- `/mods`
|
| Mod (Modrinth) | `mods/<file>.jar` | `.jar` |
|
||||||
- `/config`
|
| Mod (user upload) | `mods/<file>.jar` | `.jar` |
|
||||||
- `server.properties`
|
| Datapack | `world/datapacks/<file>.zip` | `.zip` |
|
||||||
|
|
||||||
### Explicitly Excluded
|
|
||||||
|
|
||||||
- `/world`
|
|
||||||
- `/logs`
|
|
||||||
- Cache directories
|
|
||||||
- Runtime temp files
|
|
||||||
|
|
||||||
World backup is handled by a separate backup system.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 3. Core Concepts
|
## Direct Runtime Writes
|
||||||
|
|
||||||
### 3.1 Shadow Copy (File-Level Protection)
|
Uploads and automated installs are written directly into runtime directories.
|
||||||
|
|
||||||
Used for fast rollback of a single mod change.
|
No:
|
||||||
|
- Staging
|
||||||
|
- Symlinking
|
||||||
|
- Delayed deployment
|
||||||
|
|
||||||
- Only applies to the most recently modified mod
|
---
|
||||||
- Only one shadow exists at any time
|
|
||||||
- Shadow lifetime is limited to the stabilization window
|
|
||||||
|
|
||||||
**Example:**
|
## Atomic Write Process
|
||||||
|
|
||||||
```
|
1. Create temp file in target directory
|
||||||
mods/
|
2. Stream multipart body (or downloaded artifact) into temp file
|
||||||
coolmod.jar
|
3. Enforce size limit while streaming
|
||||||
coolmod.jar.shadow
|
4. `os.Rename()` temp → final filename
|
||||||
|
5. Update `.zlh_metadata.json`
|
||||||
|
|
||||||
|
If `overwrite=false` and file exists → `409`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Size Limits
|
||||||
|
|
||||||
|
Configured in agent:
|
||||||
|
|
||||||
|
| Type | Limit |
|
||||||
|
|------|-------|
|
||||||
|
| Mods | 250MB |
|
||||||
|
| Datapacks | 100MB |
|
||||||
|
|
||||||
|
Enforced by the agent during streaming, not at the API layer.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Overwrite Behavior
|
||||||
|
|
||||||
|
- `overwrite=false` (default) → reject if file exists → `409`
|
||||||
|
- `overwrite=true` → replace file, update `uploaded_at` in metadata
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Agent-Side Guarantees
|
||||||
|
|
||||||
|
The agent guarantees:
|
||||||
|
|
||||||
|
- Final path is a file (not directory)
|
||||||
|
- Target is not a symlink
|
||||||
|
- Path resolves within runtime root
|
||||||
|
- Parent directory exists
|
||||||
|
- `.zlh_metadata.json` is hidden from file listing API
|
||||||
|
- `.zlh-shadow` is hidden from file listing API
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Provenance Metadata
|
||||||
|
|
||||||
|
All user uploads are marked `source: "user"` in `.zlh_metadata.json`.
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"mods/sodium.jar": {
|
||||||
|
"source": "user",
|
||||||
|
"uploaded_at": "2026-03-01T22:37:01Z"
|
||||||
|
}
|
||||||
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### 3.2 Snapshot (State-Level Protection)
|
Modrinth-installed mods do not currently write provenance. Future curated installs may optionally write `source: "curated"`; this is not yet implemented.
|
||||||
|
|
||||||
Lightweight archive of mod/config state. Created before any mod deployment.
|
No automatic inference of source from filename or path.
|
||||||
|
|
||||||
**Example path:**
|
---
|
||||||
|
|
||||||
```
|
## Failure Categories
|
||||||
.zlh_snapshots/deploy-<timestamp>.tar.gz
|
|
||||||
|
| Code | Meaning |
|
||||||
|
|------|---------|
|
||||||
|
| `409` | File exists (`overwrite=false`) |
|
||||||
|
| `413` | Size limit exceeded |
|
||||||
|
| `403` | Path rejected by allowlist |
|
||||||
|
| `502` | API transport failure (not agent validation) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## API Transport Considerations
|
||||||
|
|
||||||
|
The API acts strictly as a streaming proxy for uploads.
|
||||||
|
|
||||||
|
```js
|
||||||
|
req.pipe(proxyReq)
|
||||||
|
proxyRes.pipe(res)
|
||||||
```
|
```
|
||||||
|
|
||||||
**Contains:**
|
- Uses raw `http.request` piping — not `fetch`
|
||||||
|
- Does not buffer file contents
|
||||||
|
- Does not re-validate upload policy
|
||||||
|
- Upload timeout must be substantially larger than normal routes
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Mod Lifecycle (Current Phase 1 State)
|
||||||
|
|
||||||
|
**Install (Modrinth):**
|
||||||
|
Modrinth resolver → API → Agent → Verified download → `<serverRoot>/mods`
|
||||||
|
|
||||||
|
**Install (User Upload):**
|
||||||
|
Portal → API (streaming proxy) → Agent → Atomic write → `<serverRoot>/mods`
|
||||||
|
|
||||||
|
**Enable/Disable:**
|
||||||
|
Filesystem rename: `.jar` ↔ `.jar.disabled`
|
||||||
|
|
||||||
|
**Delete:**
|
||||||
|
Soft delete to `<serverRoot>/mods-removed` (no auto-purge, no retention policy)
|
||||||
|
|
||||||
|
Filesystem is canonical state. Agent cache invalidated after every mutation.
|
||||||
|
|
||||||
|
Restore of deleted mods handled manually via file browser (see `OPEN_THREADS.md`).
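
The enable/disable rename in this lifecycle can be illustrated with a small Go helper. The name `toggledName` is hypothetical; a caller would pass its result to `os.Rename` and then invalidate the agent cache.

```go
package main

import (
	"fmt"
	"strings"
)

// toggledName returns the filename a mod is renamed to when toggled:
// "x.jar" becomes "x.jar.disabled", and "x.jar.disabled" becomes "x.jar".
func toggledName(name string) (string, error) {
	switch {
	case strings.HasSuffix(name, ".jar.disabled"):
		return strings.TrimSuffix(name, ".disabled"), nil
	case strings.HasSuffix(name, ".jar"):
		return name + ".disabled", nil
	default:
		return "", fmt.Errorf("not a mod file: %q", name)
	}
}
```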
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Self-Healing Model (Automated Installs)
|
||||||
|
|
||||||
|
Applies to Modrinth installs and dev artifact promotions. Does **not** apply to user uploads — users assume responsibility for files they upload directly.
|
||||||
|
|
||||||
|
### Deployment States
|
||||||
|
|
||||||
|
```
|
||||||
|
IDLE → DEPLOYING → STABILIZING → STABLE → IDLE
|
||||||
|
↓
|
||||||
|
ROLLBACK_FILE → ROLLBACK_SNAPSHOT → FAILED_RECOVERY
|
||||||
|
```
|
||||||
|
|
||||||
|
### Snapshot Scope
|
||||||
|
|
||||||
|
Included:
|
||||||
- `mods/`
|
- `mods/`
|
||||||
- `config/`
|
- `config/`
|
||||||
- `server.properties`
|
- `server.properties`
|
||||||
|
|
||||||
Only one active snapshot exists at a time.
|
Excluded:
|
||||||
|
- `world/` (separate backup system)
|
||||||
|
- `logs/`
|
||||||
|
- Cache and temp files
|
||||||
|
|
||||||
|
### Stabilization Window
|
||||||
|
|
||||||
|
Hardcoded Phase 1: **3–5 minutes**
|
||||||
|
|
||||||
|
Server is stable when:
|
||||||
|
- Readiness probe succeeds
|
||||||
|
- No crash restart during the window
|
||||||
|
- Server runs continuously for full window duration
|
||||||
|
|
||||||
|
### Failure Triggers
|
||||||
|
|
||||||
|
| Trigger | Action |
|
||||||
|
|---------|--------|
|
||||||
|
| Early boot crash (exit within ~30s) | → `ROLLBACK_FILE` |
|
||||||
|
| Crash loop (≥3 crashes in window) | → `ROLLBACK_SNAPSHOT` |
|
||||||
|
| Readiness timeout (full window) | → `ROLLBACK_SNAPSHOT` |
|
||||||
|
|
||||||
|
### Recovery
|
||||||
|
|
||||||
|
**File rollback:** Restore `.jar.shadow` → `.jar`, monitor again. If stable → `IDLE`. If not → escalate.
|
||||||
|
|
||||||
|
**Snapshot restore:** Extract `.zlh_snapshots/deploy-<timestamp>.tar.gz`, monitor. If stable → `IDLE`. If not → `FAILED_RECOVERY`.
|
||||||
|
|
||||||
|
### Safety Constraints
|
||||||
|
|
||||||
|
- Only one file rollback attempt
|
||||||
|
- Only one snapshot restore attempt
|
||||||
|
- Both fail → `FAILED_RECOVERY`, stop automation, surface to API + UI
|
||||||
|
- World data never touched
|
||||||
|
- Only one active snapshot at a time
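
The escalation rules and the one-attempt constraints above can be modeled as a small bounded state machine. The following Go sketch is illustrative only: the `Recovery` type and its `Next` method are hypothetical names, and the sketch assumes a shadow copy exists whenever file rollback is chosen.

```go
package main

// State names follow the deployment states listed in this spec.
type State string

const (
	Idle             State = "IDLE"
	Deploying        State = "DEPLOYING"
	Stabilizing      State = "STABILIZING"
	Stable           State = "STABLE"
	RollbackFile     State = "ROLLBACK_FILE"
	RollbackSnapshot State = "ROLLBACK_SNAPSHOT"
	FailedRecovery   State = "FAILED_RECOVERY"
)

// Trigger is a failure observed during the stabilization window.
type Trigger int

const (
	EarlyBootCrash   Trigger = iota // process exits within ~30s
	CrashLoop                       // >= 3 crashes in the window
	ReadinessTimeout                // readiness probe never succeeds
)

// Recovery tracks attempts so each recovery step runs at most once.
type Recovery struct {
	FileRollbackUsed    bool
	SnapshotRestoreUsed bool
}

// Next returns the state to enter after a failure trigger.
func (r *Recovery) Next(t Trigger) State {
	switch {
	case r.SnapshotRestoreUsed:
		// Snapshot restore was the last resort; stop automation.
		return FailedRecovery
	case t == EarlyBootCrash && !r.FileRollbackUsed:
		r.FileRollbackUsed = true
		return RollbackFile
	default:
		// Crash loop, readiness timeout, or a failed file rollback.
		r.SnapshotRestoreUsed = true
		return RollbackSnapshot
	}
}
```

Because each flag is set at most once and never cleared during a deployment, the machine reaches `FAILED_RECOVERY` after at most two recovery attempts, which keeps recovery bounded.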
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 4. Deployment State Machine
|
## Metadata Tracking (Automated Installs)
|
||||||
|
|
||||||
### 4.1 States
|
Agent-level fields, cleared after stabilization success:
|
||||||
|
|
||||||
```
|
|
||||||
IDLE
|
|
||||||
DEPLOYING
|
|
||||||
STABILIZING
|
|
||||||
STABLE
|
|
||||||
ROLLBACK_FILE
|
|
||||||
ROLLBACK_SNAPSHOT
|
|
||||||
FAILED_RECOVERY
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4.2 Deployment Flow
|
|
||||||
|
|
||||||
**Step 1 – Stop Server**
|
|
||||||
- Ensure clean shutdown
|
|
||||||
- Reset crash counters
|
|
||||||
|
|
||||||
**Step 2 – Create Snapshot**
|
|
||||||
- Archive `mods/`, `config/`, `server.properties`
|
|
||||||
- Mark snapshot status = `pending`
|
|
||||||
|
|
||||||
**Step 3 – Prepare Shadow** *(if replacing existing mod)*
|
|
||||||
- If target mod exists, rename:
|
|
||||||
```
|
|
||||||
coolmod.jar → coolmod.jar.shadow
|
|
||||||
```
|
|
||||||
|
|
||||||
**Step 4 – Install New Mod**
|
|
||||||
- Validate SHA256 (if Modrinth source)
|
|
||||||
- Write file to `/mods`
|
|
||||||
- Set deployment metadata:
|
|
||||||
- `lastChangedMod`
|
|
||||||
- `changeTimestamp`
|
|
||||||
- `changeSource`
|
|
||||||
|
|
||||||
**Step 5 – Start Server**
|
|
||||||
- Transition to `STABILIZING` state
|
|
||||||
- Begin stabilization timer
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 5. Stabilization Window
|
|
||||||
|
|
||||||
### Duration
|
|
||||||
|
|
||||||
Hardcoded Phase 1 value: **3–5 minutes**
|
|
||||||
|
|
||||||
### Stability Conditions
|
|
||||||
|
|
||||||
Server is considered stable when:
|
|
||||||
|
|
||||||
- Readiness probe returns success
|
|
||||||
- Server remains running continuously for the full window duration
|
|
||||||
- No crash restart occurred during the window
|
|
||||||
|
|
||||||
### On Stability Confirmed
|
|
||||||
|
|
||||||
- Delete snapshot
|
|
||||||
- Delete shadow copy
|
|
||||||
- Clear deployment metadata
|
|
||||||
- Transition: `STABLE` → `IDLE`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 6. Failure Detection
|
|
||||||
|
|
||||||
### 6.1 Early Boot Crash
|
|
||||||
|
|
||||||
**Condition:** Process exits within X seconds (e.g., 30s) during stabilization window
|
|
||||||
|
|
||||||
**Action:** Transition to `ROLLBACK_FILE`
|
|
||||||
|
|
||||||
### 6.2 Crash Loop
|
|
||||||
|
|
||||||
**Condition:** ≥ N crashes (e.g., 3) within stabilization window
|
|
||||||
|
|
||||||
**Action:** Transition to `ROLLBACK_SNAPSHOT`
|
|
||||||
|
|
||||||
### 6.3 Readiness Timeout
|
|
||||||
|
|
||||||
**Condition:** Server fails readiness probe for the entire stabilization window
|
|
||||||
|
|
||||||
**Action:** Transition to `ROLLBACK_SNAPSHOT`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 7. Recovery Logic
|
|
||||||
|
|
||||||
### 7.1 File-Level Rollback (`ROLLBACK_FILE`)
|
|
||||||
|
|
||||||
**Conditions:**
|
|
||||||
- `lastChangedMod` exists
|
|
||||||
- Shadow copy exists
|
|
||||||
|
|
||||||
**Action:**
|
|
||||||
1. Stop server
|
|
||||||
2. Replace: `coolmod.jar.shadow` → `coolmod.jar`
|
|
||||||
3. Start server
|
|
||||||
4. Monitor again
|
|
||||||
|
|
||||||
**If stable:**
|
|
||||||
- Delete snapshot
|
|
||||||
- Delete shadow
|
|
||||||
- Return to `IDLE`
|
|
||||||
|
|
||||||
**If still unstable:**
|
|
||||||
- Escalate to snapshot restore
|
|
||||||
|
|
||||||
### 7.2 Snapshot Restore (`ROLLBACK_SNAPSHOT`)
|
|
||||||
|
|
||||||
**Conditions:**
|
|
||||||
- Snapshot exists
|
|
||||||
- Snapshot has not already been restored
|
|
||||||
|
|
||||||
**Action:**
|
|
||||||
1. Stop server
|
|
||||||
2. Extract snapshot archive
|
|
||||||
3. Start server
|
|
||||||
4. Mark snapshot as `restored`
|
|
||||||
|
|
||||||
**If stable:**
|
|
||||||
- Delete snapshot
|
|
||||||
- Clear metadata
|
|
||||||
- Return to `IDLE`
|
|
||||||
|
|
||||||
**If still unstable:**
|
|
||||||
- Transition to `FAILED_RECOVERY`
|
|
||||||
- Stop all automation
|
|
||||||
- Surface failure state to API and UI
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 8. Safety Constraints
|
|
||||||
|
|
||||||
- Never attempt infinite recovery loops
|
|
||||||
- Only **one** file rollback attempt allowed
|
|
||||||
- Only **one** snapshot restore attempt allowed
|
|
||||||
- If both fail → mark `FAILED_RECOVERY`
|
|
||||||
- No world data is touched under any circumstance
|
|
||||||
- No multiple snapshot retention
|
|
||||||
- Only one snapshot exists during the stabilization window
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 9. Metadata Tracking (Agent-Level)
|
|
||||||
|
|
||||||
Track the following fields (cleared after stabilization success):
|
|
||||||
|
|
||||||
```
|
```
|
||||||
lastChangedMod
|
lastChangedMod
|
||||||
@ -245,9 +223,22 @@ snapshotId
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 10. Logging Requirements
|
## Why No Symlinks
|
||||||
|
|
||||||
Structured logs must emit events for:
|
Symlink-based deployment was rejected because it:
|
||||||
|
|
||||||
|
- Complicates mod loader behavior
|
||||||
|
- Breaks server-side loader compatibility
|
||||||
|
- Introduces unexpected runtime indirection
|
||||||
|
- Makes provenance ambiguous
|
||||||
|
|
||||||
|
Direct writes plus metadata are simpler and safer.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Logging Requirements
|
||||||
|
|
||||||
|
Structured logs must emit:
|
||||||
|
|
||||||
- Deployment started
|
- Deployment started
|
||||||
- Snapshot created
|
- Snapshot created
|
||||||
@ -258,41 +249,45 @@ Structured logs must emit events for:
|
|||||||
- Snapshot restore triggered
|
- Snapshot restore triggered
|
||||||
- Deployment stabilized
|
- Deployment stabilized
|
||||||
- Recovery failed
|
- Recovery failed
|
||||||
|
- User upload received
|
||||||
|
- User upload rejected (with reason)
|
||||||
|
|
||||||
All events must be visible via agent logs and surfaced through API status endpoints.
|
All events are visible via agent logs and surfaced through API status endpoints.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 11. Non-Goals (Phase 1)
|
## Operational Guarantees
|
||||||
|
|
||||||
- No multi-mod version history
|
- A broken automated mod install will not permanently brick the server
|
||||||
- No dependency resolution
|
- Most automated failures auto-recover without operator intervention
|
||||||
- No world snapshotting
|
- User-uploaded files bypass self-healing (user responsibility)
|
||||||
- No long-term snapshot retention
|
|
||||||
- No manual snapshot management UI
|
|
||||||
- No automatic dependency analysis
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## 12. Operational Guarantees
|
|
||||||
|
|
||||||
This system guarantees:
|
|
||||||
|
|
||||||
- A broken mod will not permanently brick the server
|
|
||||||
- Most failures will auto-recover without operator intervention
|
|
||||||
- Deployment state will not accumulate disk artifacts
|
- Deployment state will not accumulate disk artifacts
|
||||||
- The system remains deterministic and bounded
|
- The system remains deterministic and bounded
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 13. Future Extensions (Phase 2+)
|
## Non-Goals (Phase 1)
|
||||||
|
|
||||||
|
- No multi-mod version history
|
||||||
|
- No dependency resolution
|
||||||
|
- No world snapshotting
|
||||||
|
- No long-term snapshot retention
|
||||||
|
- No manual snapshot UI
|
||||||
|
- No automatic dependency analysis
|
||||||
|
- No virus/malware scanning of user uploads
|
||||||
|
- No retention policy for `mods-removed/`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Future Extensions (Phase 2+)
|
||||||
|
|
||||||
- Snapshot retention policies
|
- Snapshot retention policies
|
||||||
- Manual snapshot restore via UI
|
- Manual snapshot restore via UI
|
||||||
- Multi-mod change grouping
|
- Multi-mod change grouping
|
||||||
- Deployment audit history
|
- Deployment audit history
|
||||||
- Version diffing
|
- Version diffing
|
||||||
- Dependency resolution
|
- Provenance for Modrinth installs
|
||||||
|
- Retention automation for `mods-removed/`
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -300,31 +295,13 @@ This system guarantees:
|
|||||||
|
|
||||||
| Decision | Value |
|
| Decision | Value |
|
||||||
|----------|-------|
|
|----------|-------|
|
||||||
| Shadow copies | Single, file-level only |
|
| Upload model | Direct runtime write, atomic via `os.Rename()` |
|
||||||
| Snapshots | Single, temporary only |
|
| Staging | None |
|
||||||
|
| Symlinks | None |
|
||||||
|
| Shadow copies | Single, automated installs only |
|
||||||
|
| Snapshots | Single, temporary, automated installs only |
|
||||||
| Escalation model | File rollback → snapshot restore → FAILED_RECOVERY |
|
| Escalation model | File rollback → snapshot restore → FAILED_RECOVERY |
|
||||||
| World data | Excluded, never touched |
|
| World data | Excluded, never touched |
|
||||||
| Stabilization window | Fixed, hardcoded Phase 1 |
|
| Stabilization window | Fixed, hardcoded Phase 1 |
|
||||||
| Autonomy | Agent-level, no operator required |
|
| Provenance | `.zlh_metadata.json` at runtime root |
|
||||||
|
| User upload self-healing | Not applicable — user responsibility |
|
||||||
---
|
|
||||||
|
|
||||||
## Current Mod Lifecycle Model (2026-02)
|
|
||||||
|
|
||||||
This section reflects the **implemented Phase 1 state** as of February 2026.
|
|
||||||
|
|
||||||
**Install:**
|
|
||||||
Modrinth resolver → API → Agent → Verified download → `<serverRoot>/mods`
|
|
||||||
|
|
||||||
**Enable/Disable:**
|
|
||||||
Filesystem rename: `.jar` ↔ `.jar.disabled`
|
|
||||||
|
|
||||||
**Delete:**
|
|
||||||
Soft delete to `<serverRoot>/mods-removed` (no auto-purge, no retention policy)
|
|
||||||
|
|
||||||
**Filesystem is canonical state.**
|
|
||||||
Agent cache invalidated after every mutation.
|
|
||||||
|
|
||||||
Restore of deleted mods handled manually via file browser (planned — see `OPEN_THREADS.md`).
|
|
||||||
|
|
||||||
No retention policy implemented. No install queue. No DB tracking of mod state.
|
|
||||||
|
|||||||