# Mod Deployment Safety Model

**Version:** 2.0
**Updated:** 2026-03-01
**Phase:** Phase 1 - Active
**Applies To:** Game Containers
**Owner:** Agent Layer

---

## Goal

Allow user uploads and automated mod installs while:

- Preventing path escape
- Preserving runtime integrity
- Tracking provenance
- Avoiding hidden deployment layers

---

## Deployment Types

| Type | Path | Extension |
|------|------|-----------|
| Mod (Modrinth) | `mods/<file>.jar` | `.jar` |
| Mod (user upload) | `mods/<file>.jar` | `.jar` |
| Datapack | `world/datapacks/<file>.zip` | `.zip` |

---

## Direct Runtime Writes

Uploads and automated installs are written directly into runtime directories.

No:
- Staging
- Symlinking
- Delayed deployment

---

## Atomic Write Process

1. Create temp file in target directory
2. Stream multipart body (or downloaded artifact) into temp file
3. Enforce size limit while streaming
4. `os.Rename()` temp → final filename
5. Update `.zlh_metadata.json`

If `overwrite=false` and file exists → `409`

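
The numbered steps above can be sketched in Go, the language implied by `os.Rename()`. This is a minimal sketch under assumptions, not the agent's actual code: `writeAtomic`, `writeBytes`, `errExists`, `errTooLarge`, and the `.upload-*.tmp` pattern are hypothetical names, step 5 (the metadata update) is left out, and the existence check is not race-free against a concurrent upload of the same name. Creating the temp file in the target directory keeps the rename on a single filesystem, which is what makes the final step atomic.

```go
package main

import (
	"bytes"
	"errors"
	"io"
	"os"
	"path/filepath"
)

// Hypothetical sentinel errors; an HTTP layer would map them to 409 and 413.
var (
	errExists   = errors.New("file exists and overwrite=false")
	errTooLarge = errors.New("size limit exceeded")
)

// writeAtomic streams body into a temp file created in dir, enforces limit
// while copying, then renames the temp file onto the final name.
func writeAtomic(dir, name string, body io.Reader, limit int64, overwrite bool) error {
	final := filepath.Join(dir, name)
	if !overwrite {
		if _, err := os.Lstat(final); err == nil {
			return errExists
		}
	}

	// Step 1: temp file in the target directory, so the rename never
	// crosses a filesystem boundary.
	tmp, err := os.CreateTemp(dir, ".upload-*.tmp")
	if err != nil {
		return err
	}
	// Clean up on failure; after a successful rename the temp name no
	// longer exists and the error from Remove is ignored.
	defer os.Remove(tmp.Name())

	// Steps 2-3: copy at most limit+1 bytes; seeing the extra byte means
	// the body is over the limit.
	n, err := io.Copy(tmp, io.LimitReader(body, limit+1))
	if cerr := tmp.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		return err
	}
	if n > limit {
		return errTooLarge
	}

	// Step 4: publish the file under its final name.
	return os.Rename(tmp.Name(), final)
}

// writeBytes is a convenience wrapper for callers that already hold the data.
func writeBytes(dir, name string, data []byte, limit int64, overwrite bool) error {
	return writeAtomic(dir, name, bytes.NewReader(data), limit, overwrite)
}
```
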
---

## Size Limits

Configured in agent:

| Type | Limit |
|------|-------|
| Mods | 250MB |
| Datapacks | 100MB |

Enforced by the agent while streaming, not at the API layer.

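
As an illustration of how the deployment types and size limits could be combined into one lookup, here is a short Go sketch. It rests on assumptions: `limitFor`, `modLimit`, and `datapackLimit` are hypothetical names, and "MB" is read as MiB (1<<20 bytes) because the source does not say which unit is meant. Paths are treated as runtime-relative and slash-separated, as in the Deployment Types table.

```go
package main

import (
	"path"
	"strings"
)

// Assumption: "MB" in the table means MiB (1<<20 bytes).
const (
	modLimit      int64 = 250 << 20
	datapackLimit int64 = 100 << 20
)

// limitFor returns the byte limit for a runtime-relative upload path, or
// false when the path matches no known deployment type.
func limitFor(rel string) (int64, bool) {
	dir, file := path.Split(path.Clean(rel))
	switch {
	case dir == "mods/" && strings.HasSuffix(file, ".jar"):
		return modLimit, true
	case dir == "world/datapacks/" && strings.HasSuffix(file, ".zip"):
		return datapackLimit, true
	}
	return 0, false
}
```
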
---

## Overwrite Behavior

- `overwrite=false` (default) → reject if file exists → `409`
- `overwrite=true` → replace file, update `uploaded_at` in metadata

---

## Agent-Side Guarantees

The agent guarantees:

- Final path is a file (not directory)
- Target is not a symlink
- Path resolves within runtime root
- Parent directory exists
- `.zlh_metadata.json` is hidden from file listing API
- `.zlh-shadow` is hidden from file listing API

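
The path-related guarantees above could be checked roughly as follows. This is a hedged Go sketch, not the agent's implementation: `resolveInRoot`, `checkTarget`, and `errPathRejected` are hypothetical names. The containment check is purely lexical, so it does not catch a symlinked intermediate directory; a stricter version might additionally compare against `filepath.EvalSymlinks` of the parent.

```go
package main

import (
	"errors"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// Hypothetical sentinel; an HTTP layer would map it to 403.
var errPathRejected = errors.New("path rejected")

// resolveInRoot joins a client-supplied relative path onto root and rejects
// anything that would land on or outside root.
func resolveInRoot(root, rel string) (string, error) {
	if rel == "" || filepath.IsAbs(rel) {
		return "", errPathRejected
	}
	full := filepath.Join(root, rel) // Join also cleans ".." segments
	back, err := filepath.Rel(root, full)
	if err != nil || back == "." || back == ".." ||
		strings.HasPrefix(back, ".."+string(filepath.Separator)) {
		return "", errPathRejected
	}
	return full, nil
}

// checkTarget applies the on-disk checks: the parent must be an existing
// directory, and an existing target must be a regular file (which excludes
// both symlinks and directories, because Lstat does not follow links).
func checkTarget(full string) error {
	parent, err := os.Stat(filepath.Dir(full))
	if err != nil || !parent.IsDir() {
		return errPathRejected
	}
	info, err := os.Lstat(full)
	if errors.Is(err, fs.ErrNotExist) {
		return nil // new file: nothing to inspect yet
	}
	if err != nil || !info.Mode().IsRegular() {
		return errPathRejected
	}
	return nil
}
```
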
---

## Provenance Metadata

All user uploads are marked `source: "user"` in `.zlh_metadata.json`.

```json
{
  "mods/sodium.jar": {
    "source": "user",
    "uploaded_at": "2026-03-01T22:37:01Z"
  }
}
```

Modrinth-installed mods do not currently write provenance. Future curated installs may optionally write `source: "curated"`; this is not currently implemented.

Source is never inferred automatically from filename or path.

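
To make the metadata shape concrete, the following Go sketch updates a document of the form shown above for one user upload. It is an assumption-laden illustration: `entry` and `recordUpload` are hypothetical names, the timestamp is passed as Unix seconds only to keep the function easy to call, and reading and writing the file itself (ideally with the same temp-file-plus-rename approach) is left to the caller.

```go
package main

import (
	"encoding/json"
	"time"
)

// entry mirrors one value in the .zlh_metadata.json example.
type entry struct {
	Source     string `json:"source"`
	UploadedAt string `json:"uploaded_at"`
}

// recordUpload returns the metadata document with relPath marked as a user
// upload at the given time. raw may be empty when no metadata file exists yet.
// An existing entry for relPath is replaced, which refreshes uploaded_at.
func recordUpload(raw []byte, relPath string, unixSeconds int64) ([]byte, error) {
	meta := map[string]entry{}
	if len(raw) > 0 {
		if err := json.Unmarshal(raw, &meta); err != nil {
			return nil, err
		}
	}
	meta[relPath] = entry{
		Source:     "user",
		UploadedAt: time.Unix(unixSeconds, 0).UTC().Format(time.RFC3339),
	}
	return json.MarshalIndent(meta, "", "  ")
}
```
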
---

## Failure Categories

| Code | Meaning |
|------|---------|
| `409` | File exists (`overwrite=false`) |
| `413` | Size limit exceeded |
| `403` | Path rejected by allowlist |
| `502` | API transport failure (not agent validation) |

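
The agent-side rows of the table above map naturally onto a single error-to-status function. The Go sketch below is illustrative only: the sentinel errors and `statusFor` are hypothetical names, and the `200` and `500` fallbacks are assumptions the source does not specify. `502` is absent on purpose, since the table attributes it to the API transport rather than to agent validation.

```go
package main

import (
	"errors"
	"net/http"
)

// Hypothetical sentinel errors returned by agent-side validation.
var (
	errExists       = errors.New("file exists")
	errTooLarge     = errors.New("size limit exceeded")
	errPathRejected = errors.New("path rejected by allowlist")
)

// statusFor maps an agent-side validation error to its HTTP status code.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK // assumed success code
	case errors.Is(err, errExists):
		return http.StatusConflict // 409
	case errors.Is(err, errTooLarge):
		return http.StatusRequestEntityTooLarge // 413
	case errors.Is(err, errPathRejected):
		return http.StatusForbidden // 403
	default:
		return http.StatusInternalServerError // assumed fallback
	}
}
```
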
---

## API Transport Considerations

The API acts strictly as a streaming proxy for uploads.

```js
// Forward the upload body to the agent without buffering it.
req.pipe(proxyReq)
// Stream the agent's response back to the client.
proxyRes.pipe(res)
```

- Uses raw `http.request` piping, not `fetch`
- Does not buffer file contents
- Does not re-validate upload policy
- Upload timeout must be substantially larger than normal routes

---

## Mod Lifecycle (Current Phase 1 State)

**Install (Modrinth):**
Modrinth resolver → API → Agent → Verified download → `<serverRoot>/mods`

**Install (User Upload):**
Portal → API (streaming proxy) → Agent → Atomic write → `<serverRoot>/mods`

**Enable/Disable:**
Filesystem rename: `.jar` ↔ `.jar.disabled`

**Delete:**
Soft delete to `<serverRoot>/mods-removed` (no auto-purge, no retention policy)

The filesystem is the canonical state. The agent cache is invalidated after every mutation.

Restoring deleted mods is handled manually via the file browser (see `OPEN_THREADS.md`).

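
The enable/disable step above is a plain rename, which a few lines of Go can illustrate. This is a sketch with hypothetical names (`toggledName`, `toggleMod`, `disabledSuffix`); cache invalidation after the rename, which the lifecycle requires, is left to the caller.

```go
package main

import (
	"os"
	"path/filepath"
	"strings"
)

const disabledSuffix = ".disabled"

// toggledName flips a mod filename between "x.jar" and "x.jar.disabled".
// ok is false for names that are neither.
func toggledName(name string) (next string, ok bool) {
	switch {
	case strings.HasSuffix(name, ".jar"+disabledSuffix):
		return strings.TrimSuffix(name, disabledSuffix), true
	case strings.HasSuffix(name, ".jar"):
		return name + disabledSuffix, true
	}
	return "", false
}

// toggleMod renames the file inside modsDir to its toggled name.
func toggleMod(modsDir, name string) error {
	next, ok := toggledName(name)
	if !ok {
		return os.ErrInvalid
	}
	return os.Rename(filepath.Join(modsDir, name), filepath.Join(modsDir, next))
}
```
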
---

## Self-Healing Model (Automated Installs)

Applies to Modrinth installs and dev artifact promotions. Does **not** apply to user uploads; users assume responsibility for files they upload directly.

### Deployment States

```
IDLE → DEPLOYING → STABILIZING → STABLE → IDLE
                        ↓
                   ROLLBACK_FILE → ROLLBACK_SNAPSHOT → FAILED_RECOVERY
```

### Snapshot Scope

Included:
- `mods/`
- `config/`
- `server.properties`

Excluded:
- `world/` (separate backup system)
- `logs/`
- Cache and temp files

### Stabilization Window

Hardcoded Phase 1: **3–5 minutes**

Server is stable when:
- Readiness probe succeeds
- No crash restart during the window
- Server runs continuously for full window duration

### Failure Triggers

| Trigger | Action |
|---------|--------|
| Early boot crash (exit within ~30s) | → `ROLLBACK_FILE` |
| Crash loop (≥3 crashes in window) | → `ROLLBACK_SNAPSHOT` |
| Readiness timeout (full window) | → `ROLLBACK_SNAPSHOT` |

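
The trigger table above can be read as a small decision function. The Go sketch below is one possible reading, not the agent's logic: `observation`, `decide`, and the constant names are hypothetical, the crash-loop rule is checked before the early-boot rule by assumption, and a late crash below the crash-loop threshold is left as `noDecision` because the table does not cover it.

```go
package main

type action string

const (
	noDecision       action = "NONE" // keep monitoring, or a case the table does not cover
	markStable       action = "STABLE"
	rollbackFile     action = "ROLLBACK_FILE"
	rollbackSnapshot action = "ROLLBACK_SNAPSHOT"
)

const (
	earlyBootSeconds   = 30 // "~30s" in the table
	crashLoopThreshold = 3  // "≥3 crashes in window"
)

// observation is a hypothetical summary of what has been seen in the window.
type observation struct {
	crashes          int  // crash restarts seen so far in the window
	firstExitSeconds int  // uptime at the first exit; ignored when crashes == 0
	ready            bool // readiness probe has succeeded
	windowElapsed    bool // the full stabilization window has passed
}

// decide applies the trigger table to the current observation.
func decide(o observation) action {
	switch {
	case o.crashes >= crashLoopThreshold:
		return rollbackSnapshot
	case o.crashes > 0 && o.firstExitSeconds <= earlyBootSeconds:
		return rollbackFile
	case o.windowElapsed && !o.ready:
		return rollbackSnapshot
	case o.windowElapsed && o.crashes == 0:
		return markStable
	}
	return noDecision
}
```
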
### Recovery

**File rollback:** Restore `.jar.shadow` → `.jar`, monitor again. If stable → `IDLE`. If not → escalate.

**Snapshot restore:** Extract `.zlh_snapshots/deploy-<timestamp>.tar.gz`, monitor. If stable → `IDLE`. If not → `FAILED_RECOVERY`.

### Safety Constraints

- Only one file rollback attempt
- Only one snapshot restore attempt
- Both fail → `FAILED_RECOVERY`, stop automation, surface to API + UI
- World data never touched
- Only one active snapshot at a time

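
The one-attempt-each rule above falls out of a transition function with no backward edges. The following Go sketch shows that idea under assumptions: `state` and `afterRecovery` are hypothetical names, and only the recovery portion of the state chain is modeled. Because `ROLLBACK_FILE` can only move forward to `ROLLBACK_SNAPSHOT` and then to `FAILED_RECOVERY`, neither recovery step can run twice within one deployment.

```go
package main

type state string

const (
	idle             state = "IDLE"
	rollbackFile     state = "ROLLBACK_FILE"
	rollbackSnapshot state = "ROLLBACK_SNAPSHOT"
	failedRecovery   state = "FAILED_RECOVERY"
)

// afterRecovery returns the state that follows a recovery step once
// monitoring reports whether the server came back stable.
func afterRecovery(cur state, stable bool) state {
	switch {
	case cur != rollbackFile && cur != rollbackSnapshot:
		return cur // not in a recovery step; nothing to escalate
	case stable:
		return idle
	case cur == rollbackFile:
		return rollbackSnapshot // escalate once
	default:
		return failedRecovery // snapshot restore also failed; stop automation
	}
}
```
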
---

## Metadata Tracking (Automated Installs)

Agent-level fields, cleared after stabilization success:

```
lastChangedMod
lastChangeTimestamp
lastChangeSource   # modrinth | dev
deploymentState
crashCount
snapshotId
```

---

## Why No Symlinks

Symlink-based deployment was rejected because:

- Complicates mod loader behavior
- Breaks server-side loader compatibility
- Introduces unexpected runtime indirection
- Makes provenance ambiguous

Direct writes + metadata is simpler and safer.

---

## Logging Requirements

Structured logs must emit:

- Deployment started
- Snapshot created
- Shadow created
- Stabilization started
- Crash detected
- File rollback triggered
- Snapshot restore triggered
- Deployment stabilized
- Recovery failed
- User upload received
- User upload rejected (with reason)

All events visible via agent logs and surfaced through API status endpoints.

---

## Operational Guarantees

- A broken automated mod install will not permanently brick the server
- Most automated failures auto-recover without operator intervention
- User-uploaded files bypass self-healing (user responsibility)
- Deployment state will not accumulate disk artifacts
- The system remains deterministic and bounded

---

## Non-Goals (Phase 1)

- No multi-mod version history
- No dependency resolution
- No world snapshotting
- No long-term snapshot retention
- No manual snapshot UI
- No automatic dependency analysis
- No virus/malware scanning of user uploads
- No retention policy for `mods-removed/`

---

## Future Extensions (Phase 2+)

- Snapshot retention policies
- Manual snapshot restore via UI
- Multi-mod change grouping
- Deployment audit history
- Version diffing
- Provenance for Modrinth installs
- Retention automation for `mods-removed/`

---

## Final Decision Lock

| Decision | Value |
|----------|-------|
| Upload model | Direct runtime write, atomic via `os.Rename()` |
| Staging | None |
| Symlinks | None |
| Shadow copies | Single, automated installs only |
| Snapshots | Single, temporary, automated installs only |
| Escalation model | File rollback → snapshot restore → FAILED_RECOVERY |
| World data | Excluded, never touched |
| Stabilization window | Fixed, hardcoded Phase 1 |
| Provenance | `.zlh_metadata.json` at runtime root |
| User upload self-healing | Not applicable (user responsibility) |