Add self-healing as core design principle - solo operator requirement

jester 2026-04-07 22:08:33 +00:00
parent d58713bb45
commit 1bcee5f180


@@ -68,6 +68,29 @@ Closer to a lightweight Render, Railway, or Fly.io — but:
---
## Self-Healing as a Core Design Principle
Ghost Shell is being built by a solo operator. That constraint shapes every architectural decision in the platform and must be carried forward into Ghost Shell itself.
A platform designed for solo operation must run without continuous human supervision. This means self-healing is not a feature — it is a foundational requirement of the architecture.
**The three properties every layer must have:**
1. **Self-diagnosing** — the system knows when something is wrong without a human watching
2. **Self-recovering** — common failure modes are handled automatically
3. **Loud escalation** — the system pages a human only when it genuinely cannot recover
This is the same philosophy used in industrial control systems: design for the absence of the operator, not their presence. The operator is the exception handler, not the normal execution path.
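The three properties can be sketched as a single supervision pass. This is a minimal illustration, not Ghost Shell's actual supervisor: `Supervised`, `supervise_once`, the restart limit, and the `escalate` callback are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Supervised:
    """A hypothetical supervised component; every name here is illustrative."""
    name: str
    is_healthy: Callable[[], bool]  # self-diagnosing: a probe, no human watching
    restart: Callable[[], None]     # self-recovering: the automatic remedy
    max_restarts: int = 3           # assumed limit before a human is paged
    restarts: int = 0


def supervise_once(component: Supervised, escalate: Callable[[str], None]) -> str:
    """Run one supervision pass and return the action taken."""
    if component.is_healthy():
        component.restarts = 0      # a healthy pass resets the failure count
        return "ok"
    if component.restarts < component.max_restarts:
        component.restarts += 1
        component.restart()
        return "restarted"
    # Loud escalation: only reached once automatic recovery is exhausted.
    escalate(f"{component.name}: still unhealthy after {component.max_restarts} restarts")
    return "escalated"
```

Calling `supervise_once` on a timer gives the operator-as-exception-handler behavior: the human hears about a component only after the bounded restart attempts have failed.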
What this means for Ghost Shell specifically:
- Agent supervision and crash recovery must be generic, not game-specific
- Lifecycle state must always be recoverable from the database alone
- Any component restart must trigger automatic re-registration and re-routing
- Quota and resource enforcement must be automatic — unbounded consumption is a 3am problem
- Notifications must surface failures without requiring dashboard monitoring
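As one illustration of the "recoverable from DB alone" requirement, the sketch below rebuilds a recovery plan after a restart using only what is persisted. The `workloads` table, its `desired_state` column, and both function names are assumptions made for this example, not the project's real schema.

```python
import sqlite3
from typing import Dict, Set


def load_desired_state(conn: sqlite3.Connection) -> Dict[str, str]:
    """Read the desired state of every workload from the (hypothetical) table."""
    rows = conn.execute("SELECT id, desired_state FROM workloads").fetchall()
    return {workload_id: state for workload_id, state in rows}


def plan_recovery(desired: Dict[str, str], running: Set[str]) -> Dict[str, str]:
    """Compare persisted intent with what is actually running after a restart."""
    actions: Dict[str, str] = {}
    for workload_id, state in desired.items():
        if state == "running" and workload_id not in running:
            actions[workload_id] = "start"   # crashed or lost during the restart
        elif state == "stopped" and workload_id in running:
            actions[workload_id] = "stop"    # should not be running
    for workload_id in running - set(desired):
        actions[workload_id] = "stop"        # orphan with no record in the DB
    return actions
```

Because the plan depends only on the database and a fresh observation of what is running, no in-memory state has to survive a component restart.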
---
## Strategic Direction
**The honest answer as of April 2026: finish ZLH first, then decide.**
@@ -104,6 +127,7 @@ For Ghost Shell to remain viable, these rules apply to ZLH development today:
2. **Keep the agent generic** — it runs any workload, not just game servers
3. **Keep template definitions abstract** — game types are plugins, not hardcoded paths
4. **Treat application layer as pluggable** — Layer 3 must be swappable
5. **Build self-healing at every layer** — the system must operate without continuous supervision
If Minecraft becomes embedded in lifecycle logic, Ghost Shell dies.
If Minecraft is a plug-in workload definition, Ghost Shell lives.
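The difference between "embedded in lifecycle logic" and "plug-in workload definition" can be sketched with a small registry. Everything here is hypothetical: `WorkloadDefinition`, `REGISTRY`, `register`, and `launch_plan` are illustrative names, and the start commands are placeholder examples rather than the project's real templates.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WorkloadDefinition:
    """Data-only description of a workload; the lifecycle code never special-cases it."""
    name: str
    start_command: List[str]
    ports: List[int]


REGISTRY: Dict[str, WorkloadDefinition] = {}


def register(definition: WorkloadDefinition) -> None:
    """Add a workload type without touching lifecycle code."""
    REGISTRY[definition.name] = definition


def launch_plan(workload_type: str) -> dict:
    """Generic lifecycle step: no branch here names a specific game."""
    definition = REGISTRY[workload_type]
    return {"argv": list(definition.start_command), "ports": list(definition.ports)}


# Minecraft is just one registered definition among others (illustrative values).
register(WorkloadDefinition("minecraft", ["java", "-jar", "server.jar", "nogui"], [25565]))
register(WorkloadDefinition("static-site", ["python3", "-m", "http.server", "8080"], [8080]))
```

A quick review check follows from this shape: if `launch_plan` or any other lifecycle function ever needs an `if workload_type == "minecraft"` branch, the constraint has been violated.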
@@ -118,9 +142,11 @@ These constraints should be checked in every architectural review.
Phase 1 — ZLH as application layer (NOW)
Build ZLH on top of what will become Ghost Shell.
Do not abstract yet.
Close the self-healing gaps (pre-launch blockers).
Phase 2 — Stabilize core (NEXT)
ZLH reaches: stable, operational, revenue generating.
Runs without continuous supervision.
Minimal rewrite pressure.
Phase 3 — Extract core into Ghost Shell
@@ -138,6 +164,8 @@ Phase 4 — ZLH becomes one implementation of Ghost Shell
## Connection to Owner Background
This architecture emerged naturally from controls engineering thinking applied to cloud infrastructure. The layer model above is structurally identical to how industrial control systems are organized (field devices → controllers → supervisory → operator interface). The self-healing requirement mirrors how industrial systems are designed: the operator is the exception handler, not the normal execution path.
That is not accidental — it is trained instinct applied to a new domain.
See `OWNER_PROFILE.md` for full context.