# ZeroLagHub Developer Container Architecture
This document describes how developer containers are provisioned and managed by the zlh-agent.
It is intended for engineers and AI assistants working on the ZeroLagHub platform.

---
## Overview
Developer containers provide ephemeral development environments for building and testing software.
They support multiple runtimes and optional development tooling.
Provisioning is performed by the zlh-agent using an artifact-driven runtime system.

---
## Dev Container Lifecycle
Provisioning flow:
1. Portal sends dev container request
2. API builds agent provisioning payload
3. Agent validates request
4. Agent creates dev environment
5. Agent installs runtime from artifact server
6. Agent optionally installs addons
7. Agent marks container ready
8. API applies dev routing if code-server is enabled

High-level architecture:
```
Portal
  ↓
zlh-api
  ↓
zlh-agent
  ↓
Artifact Server
```
If `enable_code_server=true`, the API additionally performs:
- Cloudflare DNS record creation
- Technitium DNS record creation
- Traefik dynamic config write on `zlh-zpack-proxy`

This routing path is additive and does not modify the game publish flow.

---
## Dev Provisioning Payload
The API sends configuration to the agent via:
```
POST http://<agent-ip>:18888/config
```
Dev payload example:
```json
{
  "container_type": "dev",
  "runtime": "node",
  "version": "22",
  "memory_mb": 2048,
  "enable_code_server": true
}
```
Fields:
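| Field | Description |
|-------|-------------|
| `container_type` | Container type; must be `"dev"` |
| `runtime` | Runtime id, as listed in the runtime catalog |
| `version` | Runtime version; must be listed for that runtime in the catalog |
| `memory_mb` | Container memory, in megabytes |
| `enable_code_server` | Optional; `true` installs the code-server addon |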
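As a rough illustration of the payload above, the following Python sketch builds the request body and prepares the `POST` to the agent. The field names and port come from this document; the validation rules, the `Content-Type` header, the helper names, and the agent IP are assumptions made for the example, not the API's actual implementation.

```python
import json
import urllib.request

AGENT_PORT = 18888  # port from the /config endpoint shown above


def build_dev_payload(runtime, version, memory_mb, enable_code_server=False):
    """Build a dev provisioning payload using the documented field names."""
    if not runtime or not version:
        raise ValueError("runtime and version are required")
    # Assumed rule for this sketch: memory must be a positive integer.
    if isinstance(memory_mb, bool) or not isinstance(memory_mb, int) or memory_mb <= 0:
        raise ValueError("memory_mb must be a positive integer")
    return {
        "container_type": "dev",  # must be "dev" for dev containers
        "runtime": runtime,
        "version": str(version),
        "memory_mb": memory_mb,
        "enable_code_server": bool(enable_code_server),
    }


def build_config_request(agent_ip, payload):
    """Prepare (but do not send) the POST to the agent's /config endpoint."""
    return urllib.request.Request(
        url=f"http://{agent_ip}:{AGENT_PORT}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


payload = build_dev_payload("node", "22", 2048, enable_code_server=True)
request = build_config_request("10.0.0.5", payload)  # hypothetical agent IP
# Sending it would be: urllib.request.urlopen(request, timeout=10)
```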
---
## Runtime Catalog
All dev runtimes are defined by the artifact server catalog.
Catalog location:
```
http://<artifact-server>/devcontainer/_catalog.json
```
Example catalog:
```json
{
  "runtimes": [
    { "id": "go", "versions": ["1.22", "1.25"] },
    { "id": "java", "versions": ["17", "19", "21"] },
    { "id": "node", "versions": ["20", "22", "24"] },
    { "id": "python", "versions": ["3.10", "3.11", "3.12", "3.13"] },
    { "id": "dotnet", "versions": ["8.0", "10.0"] }
  ]
}
```
The agent validates runtime/version against this catalog before installation.
Invalid combinations cause provisioning to fail.
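The catalog check can be pictured as a simple lookup. This Python sketch assumes the catalog has exactly the shape of the example above; the function and exception names are hypothetical, and the real agent may validate differently.

```python
import json

# Example catalog copied from this document; a real agent would fetch
# _catalog.json from the artifact server instead.
CATALOG_JSON = """
{
  "runtimes": [
    { "id": "go", "versions": ["1.22", "1.25"] },
    { "id": "java", "versions": ["17", "19", "21"] },
    { "id": "node", "versions": ["20", "22", "24"] },
    { "id": "python", "versions": ["3.10", "3.11", "3.12", "3.13"] },
    { "id": "dotnet", "versions": ["8.0", "10.0"] }
  ]
}
"""


class UnsupportedRuntimeError(ValueError):
    """Raised when a runtime/version pair is not in the catalog."""


def validate_runtime(catalog, runtime, version):
    """Return normally if the pair is listed; raise otherwise."""
    for entry in catalog.get("runtimes", []):
        if entry.get("id") == runtime:
            if version in entry.get("versions", []):
                return
            raise UnsupportedRuntimeError(
                f"{runtime} {version} is not in the catalog"
            )
    raise UnsupportedRuntimeError(f"unknown runtime: {runtime}")


catalog = json.loads(CATALOG_JSON)
validate_runtime(catalog, "node", "22")  # listed, so no exception is raised
```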
---
## Artifact Server Layout
Dev runtime artifacts:
```
devcontainer/
  _catalog.json
  go/
  node/
  python/
  java/
  dotnet/
```
Example runtime artifact:
```
devcontainer/node/22/node-22.tar.xz
```
Addon artifacts:
```
addons/
  code-server/
```
Artifacts are downloaded at provisioning time.
No runtimes or addons are preinstalled inside containers.

---
## Runtime Installation
Runtime install path:
```
/opt/zlh/runtimes/<runtime>/<version>
```
Examples:
```
/opt/zlh/runtimes/node/22
/opt/zlh/runtimes/python/3.12
/opt/zlh/runtimes/go/1.25
/opt/zlh/runtimes/dotnet/8.0
```
Install guards prevent reinstallation: if the target directory already exists, installation is skipped.
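A minimal Python sketch of the path layout and the install guard follows. The `installer` callable and both function names are hypothetical stand-ins for whatever the agent uses to download and unpack an artifact. The cleanup on failure is an addition of this sketch, included so that a partial install does not satisfy the guard later.

```python
import shutil
from pathlib import Path

RUNTIMES_ROOT = Path("/opt/zlh/runtimes")


def runtime_install_path(runtime, version, root=RUNTIMES_ROOT):
    """Return <root>/<runtime>/<version>, matching the documented layout."""
    return Path(root) / runtime / version


def install_runtime(runtime, version, installer, root=RUNTIMES_ROOT):
    """Install a runtime unless its directory already exists.

    `installer` is a hypothetical callable that unpacks the artifact into
    the directory it is given. Returns True if an install was performed.
    """
    target = runtime_install_path(runtime, version, root)
    if target.exists():
        return False  # install guard: directory present, skip
    target.mkdir(parents=True)
    try:
        installer(target)
    except Exception:
        # Assumption of this sketch: remove partial installs so the guard
        # does not treat a failed attempt as a completed one.
        shutil.rmtree(target, ignore_errors=True)
        raise
    return True
```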
---
## Dev Environment
Every dev container has a dedicated development user.
```
user: dev
home: /home/dev
workspace: /home/dev/workspace
ownership: dev:dev
```
The workspace is where developers store source code.

---
## Console Behavior
Dev console sessions run as the dev user.
Shell properties:
```
user: dev
cwd: /home/dev/workspace
HOME=/home/dev
TERM=xterm-256color
```
As a result, console sessions in development environments never run as root.

---
## File System Access
Dev containers expose a file browser rooted at:
```
/home/dev/workspace
```
Portal displays this as `workspace/`.
Uploads and file operations are restricted to this directory.
Inside `/home/dev/workspace`, dev containers have unrestricted read/write access; there is no path allowlist. The only hard rule is the root sandbox: no file operation may escape the workspace.
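One common way to enforce this kind of root sandbox is to resolve every requested path and confirm it still lies under the workspace. The Python sketch below illustrates that technique; it is not the agent's actual code, and the function name is hypothetical.

```python
import os

WORKSPACE_ROOT = "/home/dev/workspace"


def resolve_in_workspace(user_path, root=WORKSPACE_ROOT):
    """Map a file-browser path onto the workspace, rejecting escapes."""
    root_real = os.path.realpath(root)
    # Treat every incoming path as relative to the workspace root.
    candidate = os.path.realpath(
        os.path.join(root_real, user_path.lstrip("/"))
    )
    # realpath collapses ".." segments and follows symlinks, so the
    # comparison is made on fully resolved paths.
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise PermissionError(f"path escapes workspace: {user_path!r}")
    return candidate
```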
---
## Dotnet Runtime
Dotnet uses the official installer script.
Installer source:
```
http://<artifact-server>/devcontainer/dotnet/dotnet-install.sh
```
Installation:
```bash
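# --install-dir targets the documented path; without it the script uses its default location
./dotnet-install.sh --channel 8.0 --install-dir /opt/zlh/runtimes/dotnet/8.0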
```
Installed to:
```
/opt/zlh/runtimes/dotnet/8.0
```
Supported channels: `8.0`, `10.0`

---
## Code Server Addon
Code-server provides a browser IDE.
Enabled via provisioning flag:
```json
"enable_code_server": true
```
Artifact location:
```
http://<artifact-server>/addons/code-server/code-server.zip
```
Installed to:
```
/opt/zlh/services/code-server
```
Launch behavior:
- process runs inside the container
- binds to `0.0.0.0:6000`
- workspace root is `/home/dev/workspace`
- current auth mode observed at runtime is password-based

Observed process shape:
```bash
/opt/zlh/services/code-server/lib/node /opt/zlh/services/code-server \
--bind-addr 0.0.0.0:6000 \
--auth password \
/home/dev/workspace
```
---
## Dev Routing
When code-server is enabled, the API creates a dev-only routing path.
Current implementation:
- creates Technitium A record for `dev-<vmid>.<domain>`
- creates Cloudflare A record for `dev-<vmid>.<domain>`
- writes Traefik dynamic config on the proxy VM via SSH service account `zlh`
- dynamic file path: `/etc/traefik/dynamic/dev-<vmid>.yml`

Current backend target model:
```
Host(`dev-<vmid>.<domain>`)
→ Traefik (websecure)
→ http://<container-ip>:6000
```
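For orientation, a Traefik file-provider config matching this target model might be rendered as in the Python sketch below. The hostname pattern, file path, entry point, and port come from this document; the router and service names, the `tls: {}` entry, and the helper functions are assumptions, and the file the API actually writes may differ.

```python
from textwrap import dedent

CODE_SERVER_PORT = 6000  # code-server bind port from this document


def dev_hostname(vmid, domain):
    """Hostname pattern used for dev routing: dev-<vmid>.<domain>."""
    return f"dev-{vmid}.{domain}"


def traefik_dynamic_path(vmid):
    """Path of the per-container dynamic config file on the proxy VM."""
    return f"/etc/traefik/dynamic/dev-{vmid}.yml"


def render_traefik_config(vmid, domain, container_ip):
    """Render a dynamic config routing the dev hostname to code-server."""
    name = f"dev-{vmid}"  # router/service name: an assumption of this sketch
    return dedent(f"""\
        http:
          routers:
            {name}:
              rule: "Host(`{dev_hostname(vmid, domain)}`)"
              entryPoints:
                - websecure
              service: {name}
              tls: {{}}
          services:
            {name}:
              loadBalancer:
                servers:
                  - url: "http://{container_ip}:{CODE_SERVER_PORT}"
        """)


# Hypothetical vmid, domain, and container IP.
config = render_traefik_config(101, "example.com", "10.0.0.5")
```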
Status: routing generation is implemented, but external browser access remains under active validation.

---
## Agent Status Model
Status delivery model is unchanged:
- API polls agent `/status`
- agent does not push state to API

Status content now includes dev/container fields:
- `workspaceRoot`
- `serverRoot`
- `runtimeInstallPath`
- `runtimeInstalled`
- `devProvisioned`
- `devReadyAt`
- `codeServerInstalled`
- `codeServerRunning`
- `lastCrashClassification`

The API now exposes this polled state back to the frontend through a server status endpoint so console and host-state UI can update correctly.
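The pull model can be sketched as a small polling loop on the API side. The field names below are taken from the list above, but treating them as booleans, the readiness rule, and the `fetch_status` callable are all assumptions of this Python example rather than documented behavior.

```python
import time


def is_dev_ready(status, code_server_enabled):
    """Interpret a polled /status document (fields assumed to be booleans)."""
    ready = bool(status.get("runtimeInstalled")) and bool(
        status.get("devProvisioned")
    )
    if code_server_enabled:
        ready = ready and bool(status.get("codeServerRunning"))
    return ready


def poll_until_ready(fetch_status, code_server_enabled,
                     attempts=30, interval_s=2.0, sleep=time.sleep):
    """Poll the agent until the container is ready or attempts run out.

    `fetch_status` is a hypothetical callable that returns the parsed JSON
    from the agent's /status endpoint. Returns the ready status, or None.
    """
    for attempt in range(attempts):
        status = fetch_status()
        if is_dev_ready(status, code_server_enabled):
            return status
        if attempt < attempts - 1:
            sleep(interval_s)  # the agent never pushes, so wait and re-poll
    return None
```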
---
## Security Model
Dev containers are isolated LXC containers.
Security controls:
- Runtime installs limited to `/opt/zlh`
- File browser limited to workspace
- Shell runs as non-root `dev` user
- Artifacts fetched only from trusted artifact server
---
## Design Principles
1. No local runtime assumptions
2. All runtimes are artifact-driven
3. Runtime catalog is the source of truth
4. Installs must be idempotent
5. Containers must remain reproducible
---
## Future Enhancements
- Runtime checksum validation
- Runtime upgrades / removal
- Artifact metadata support
- Service port auto-detection
- Dev service routing / proxy exposure
- IDE launch integration from portal
---
## Summary
Developer containers in ZeroLagHub provide isolated development environments with multiple runtime support, artifact-driven installs, optional browser IDE on port 6000, and consistent reproducible provisioning.