Update DEV_CONTAINER_SPEC — remove Traefik model, document API proxy + Headscale access

jester 2026-03-15 22:59:20 +00:00
parent d128d92d15
commit 1c28ecbb7c


@ -1,18 +1,22 @@
# ZeroLagHub Developer Container Architecture
This document describes how developer containers are provisioned and managed by the zlh-agent.
It is intended for engineers and AI assistants working on the ZeroLagHub platform.
This document describes how developer containers are provisioned and accessed.
---
## Overview
Developer containers provide ephemeral development environments for building and testing software.
Developer containers provide ephemeral development environments.
They support multiple runtimes and optional development tooling.
Provisioning is performed by the **zlh-agent** using runtime artifacts.
Provisioning is performed by the zlh-agent using an artifact-driven runtime system.
Supported runtimes:
- node
- python
- go
- java
- dotnet
---
@ -20,135 +24,41 @@ Provisioning is performed by the zlh-agent using an artifact-driven runtime syst
Provisioning flow:
1. Portal sends dev container request
2. API builds agent provisioning payload
3. Agent validates request
4. Agent creates dev environment
5. Agent installs runtime from artifact server
6. Agent optionally installs addons
7. Agent marks container ready
8. API applies dev routing if code-server is enabled
1. Portal sends dev request
2. API builds provisioning payload
3. Agent installs runtime
4. Agent installs addons
5. Agent marks container ready
High-level architecture:
Architecture:
```
Portal
zlh-api
zpack-api
zlh-agent
Artifact Server
```
If `enable_code_server=true`, the API additionally performs:
- Cloudflare DNS record creation
- Technitium DNS record creation
- Traefik dynamic config write on `zlh-zpack-proxy`
This routing path is additive and does not modify the game publish flow.
---
## Dev Provisioning Payload
The API sends configuration to the agent via:
## Dev Environment
```
POST http://<agent-ip>:18888/config
user: dev
home: /home/dev
workspace: /home/dev/workspace
```
Dev payload example:
```json
{
"container_type": "dev",
"runtime": "node",
"version": "22",
"memory_mb": 2048,
"enable_code_server": true
}
```
Fields:
| Field | Description |
|-------|-------------|
| `container_type` | must be `"dev"` |
| `runtime` | runtime id |
| `version` | runtime version |
| `memory_mb` | container memory |
| `enable_code_server` | optional addon |
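The fields above can be sketched as a small payload builder. This is only an illustration of the request shape, not the API's actual code: the `DevPayload` class, the `build_config_request` helper, and the "memory must be positive" check are assumptions introduced here; the endpoint and field names come from this document.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DevPayload:
    """Hypothetical container for the dev payload fields listed in the table."""
    runtime: str
    version: str
    memory_mb: int
    enable_code_server: bool = False
    container_type: str = "dev"

    def validate(self) -> None:
        # container_type must be "dev" per the field table.
        if self.container_type != "dev":
            raise ValueError('container_type must be "dev"')
        if not self.runtime or not self.version:
            raise ValueError("runtime and version are required")
        # Assumption: a non-positive memory size is never valid.
        if self.memory_mb <= 0:
            raise ValueError("memory_mb must be positive")


def build_config_request(agent_ip: str, payload: DevPayload) -> tuple[str, str]:
    """Return the (url, json_body) pair for POST http://<agent-ip>:18888/config."""
    payload.validate()
    url = f"http://{agent_ip}:18888/config"
    return url, json.dumps(asdict(payload), sort_keys=True)
```

Building the request separately from sending it keeps the validation testable without a running agent.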
---
## Runtime Catalog
All dev runtimes are defined by the artifact server catalog.
Catalog location:
```
http://<artifact-server>/devcontainer/_catalog.json
```
Example catalog:
```json
{
"runtimes": [
{ "id": "go", "versions": ["1.22", "1.25"] },
{ "id": "java", "versions": ["17", "19", "21"] },
{ "id": "node", "versions": ["20", "22", "24"] },
{ "id": "python", "versions": ["3.10", "3.11", "3.12", "3.13"] },
{ "id": "dotnet", "versions": ["8.0", "10.0"] }
]
}
```
The agent validates runtime/version against this catalog before installation.
Invalid combinations cause provisioning to fail.
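The catalog check described above might look like the following minimal sketch. It assumes the catalog has already been fetched and parsed; `is_supported` and `require_supported` are hypothetical names, and the agent's real implementation may differ.

```python
import json

# Same shape as the example catalog in this document.
EXAMPLE_CATALOG = json.loads("""
{
  "runtimes": [
    { "id": "go", "versions": ["1.22", "1.25"] },
    { "id": "java", "versions": ["17", "19", "21"] },
    { "id": "node", "versions": ["20", "22", "24"] },
    { "id": "python", "versions": ["3.10", "3.11", "3.12", "3.13"] },
    { "id": "dotnet", "versions": ["8.0", "10.0"] }
  ]
}
""")


def is_supported(catalog: dict, runtime: str, version: str) -> bool:
    """Return True only if the runtime id exists and lists this exact version."""
    for entry in catalog.get("runtimes", []):
        if entry.get("id") == runtime:
            return version in entry.get("versions", [])
    return False


def require_supported(catalog: dict, runtime: str, version: str) -> None:
    """Fail provisioning early for combinations the catalog does not list."""
    if not is_supported(catalog, runtime, version):
        raise ValueError(f"unsupported runtime/version: {runtime} {version}")
```

Versions are compared as strings, so `"8.0"` and `"8"` are treated as different entries; that matches the catalog format shown above but is a choice of this sketch.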
---
## Artifact Server Layout
Dev runtime artifacts:
```
devcontainer/
_catalog.json
go/
node/
python/
java/
dotnet/
```
Example runtime artifact:
```
devcontainer/node/22/node-22.tar.xz
```
Addon artifacts:
```
addons/
code-server/
```
Artifacts are downloaded at provisioning time.
Nothing is preinstalled inside containers.
Console sessions run as the dev user.
---
## Runtime Installation
Runtime install path:
Runtime path:
```
/opt/zlh/runtimes/<runtime>/<version>
@ -167,110 +77,19 @@ Install guards prevent reinstall — if the directory already exists, installati
---
## Dev Environment
Every dev container has a dedicated development user.
```
user: dev
home: /home/dev
workspace: /home/dev/workspace
ownership: dev:dev
```
The workspace is where developers store source code.
---
## Console Behavior
Dev console sessions run as the dev user.
Shell properties:
```
user: dev
cwd: /home/dev/workspace
HOME=/home/dev
TERM=xterm-256color
```
This prevents root access in development environments.
---
## File System Access
Dev containers expose a file browser rooted at:
```
/home/dev/workspace
```
Portal displays this as `workspace/`.
Uploads and file operations are restricted to this directory.
Dev containers have unrestricted read/write access inside `/home/dev/workspace`; there is no allowlist. The only hard rule is the root sandbox: no file operation may resolve to a path outside the workspace.
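One way to enforce the workspace sandbox is to normalize every requested path and reject anything that lands outside the root. The sketch below is an assumption about how such a check could be written, not the agent's actual code; `resolve_in_workspace` is a hypothetical helper, and the check is purely lexical (a real implementation would also need to account for symlinks).

```python
import posixpath

WORKSPACE_ROOT = "/home/dev/workspace"


def resolve_in_workspace(user_path: str) -> str:
    """Map a portal-supplied path to an absolute path inside the workspace.

    Raises PermissionError if the normalized path escapes the workspace root.
    """
    # Treat every incoming path as relative to the workspace root.
    joined = posixpath.join(WORKSPACE_ROOT, user_path.lstrip("/"))
    # normpath collapses "." and ".." segments without touching the filesystem.
    candidate = posixpath.normpath(joined)
    inside = candidate == WORKSPACE_ROOT or candidate.startswith(WORKSPACE_ROOT + "/")
    if not inside:
        raise PermissionError(f"path escapes workspace: {user_path!r}")
    return candidate
```

Comparing against `WORKSPACE_ROOT + "/"` rather than the bare prefix avoids accepting sibling directories such as `/home/dev/workspace2`.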
---
## Dotnet Runtime
Dotnet uses the official installer script.
Installer source:
```
http://artifact-server/devcontainer/dotnet/dotnet-install.sh
```
Installation:
```bash
./dotnet-install.sh --channel 8.0
```
Installed to:
```
/opt/zlh/runtimes/dotnet/8.0
```
Supported channels: `8.0`, `10.0`
---
## Code Server Addon
Code-server provides a browser IDE.
Code-server provides browser IDE access.
Enabled via provisioning flag:
```json
"enable_code_server": true
```
Artifact location:
```
http://artifact-server/addons/code-server/code-server.zip
```
Installed to:
Install location:
```
/opt/zlh/services/code-server
```
Launch behavior:
Port: `6000`
- process runs inside the container
- binds to `0.0.0.0:6000`
- workspace root is `/home/dev/workspace`
- current auth mode observed in runtime is password-based
Observed process shape:
Observed process:
```bash
/opt/zlh/services/code-server/lib/node /opt/zlh/services/code-server \
@ -281,62 +100,100 @@ Observed process shape:
---
## Dev Routing
## Dev IDE Access Model
When code-server is enabled, the API creates a dev-only routing path.
Current implementation:
- creates Technitium A record for `dev-<vmid>.<domain>`
- creates Cloudflare A record for `dev-<vmid>.<domain>`
- writes Traefik dynamic config on the proxy VM via SSH service account `zlh`
- dynamic file path: `/etc/traefik/dynamic/dev-<vmid>.yml`
Current backend target model:
```
Host(`dev-<vmid>.<domain>`)
→ Traefik (websecure)
→ http://<container-ip>:6000
```
Status: routing generation is implemented, but external browser access remains under active validation.
The previous model using Cloudflare DNS, Traefik, and per-container subdomains has been removed.
---
## Agent Status Model
### Browser IDE Access (Primary)
Status delivery model is unchanged:
The IDE is accessed through the API proxy.
- API polls agent `/status`
- agent does not push state to API
```
Browser
Portal
API
container:6000
```
Status content now includes dev/container fields:
URL format: `/dev/<vmid>/ide`
- `workspaceRoot`
- `serverRoot`
- `runtimeInstallPath`
- `runtimeInstalled`
- `devProvisioned`
- `devReadyAt`
- `codeServerInstalled`
- `codeServerRunning`
- `lastCrashClassification`
Required code-server launch flags:
The API now exposes this polled state back to the frontend through a server status endpoint so console and host-state UI can update correctly.
```
--base-path /dev/<vmid>/ide --auth none
```
Portal JWT authentication gates access. The API verifies container ownership before proxying.
WebSocket support is mandatory because code-server relies heavily on WebSockets. The API proxy requires:
- `http-proxy-middleware` with `ws: true`
- `server.on('upgrade', proxy.upgrade)` must be wired up
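The routing decision behind the API proxy (match `/dev/<vmid>/ide`, verify ownership, pick the container target on port 6000) can be sketched independently of the proxy library. Everything here is illustrative: `resolve_ide_upstream`, the in-memory `owners` and `container_ips` mappings, and the assumption that `<vmid>` is numeric are not taken from the API source.

```python
import re

# Matches /dev/<vmid>/ide and anything below it; assumes a numeric vmid.
IDE_ROUTE = re.compile(r"^/dev/(?P<vmid>\d+)/ide(?P<rest>/.*)?$")
CODE_SERVER_PORT = 6000


def resolve_ide_upstream(
    path: str,
    user_id: str,
    owners: dict[int, str],
    container_ips: dict[int, str],
) -> str:
    """Return the upstream origin for an IDE request, or raise.

    LookupError: the path is not an IDE route.
    PermissionError: the authenticated user does not own the container.
    """
    match = IDE_ROUTE.match(path)
    if match is None:
        raise LookupError("not an IDE route")
    vmid = int(match["vmid"])
    # Ownership is verified before any traffic is proxied.
    if vmid not in owners or owners[vmid] != user_id:
        raise PermissionError(f"user does not own container {vmid}")
    return f"http://{container_ips[vmid]}:{CODE_SERVER_PORT}"
```

The same check would need to run on the WebSocket upgrade path as well as on plain HTTP requests, since both reach the container through the proxy.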
---
### Local Development Access (Advanced Users)
Advanced users may connect via **Headscale/Tailscale**.
Benefits:
- SSH
- VS Code Remote
- full local tooling
Implementation:
- dev container installs `tailscaled` as an addon
- API generates Headscale auth key on provisioning
- customer joins the tailnet once and gets a stable container IP
Restrictions:
- no exit nodes
- `magic_dns: false`
- no DNS takeover on customer machine
Headscale server: `zlh-ctl` (status to be confirmed)
---
## Security Model
Dev containers are isolated LXC containers.
Security controls:
- Runtime installs limited to `/opt/zlh`
- File browser limited to workspace
- Portal authentication controls IDE access
- API verifies container ownership before proxying
- Containers are never exposed directly to the public internet
- Shell runs as non-root `dev` user
- Artifacts fetched only from trusted artifact server
- File browser limited to workspace root
---
## File Browser
Workspace root: `/home/dev/workspace`
Portal displays this as `workspace/`.
Uploads cannot escape this directory.
---
## Agent Status
The API polls the agent's `/status` endpoint; the agent does not push state.
Dev fields in `/status`:
- `workspaceRoot`
- `runtimeInstallPath`
- `runtimeInstalled`
- `codeServerInstalled`
- `codeServerRunning`
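As an illustration of how a consumer might interpret a polled `/status` response, the sketch below derives a coarse IDE state from the dev fields listed above. It assumes the three `runtimeInstalled`, `codeServerInstalled`, and `codeServerRunning` fields are booleans; the `ide_state` function and its state names are hypothetical.

```python
def ide_state(status: dict) -> str:
    """Summarize dev fields from a polled /status payload.

    Missing fields are treated as False, so a partial response
    never reports the IDE as ready.
    """
    if status.get("runtimeInstalled") is not True:
        return "runtime-pending"
    if status.get("codeServerInstalled") is not True:
        return "ide-not-installed"
    if status.get("codeServerRunning") is not True:
        return "ide-stopped"
    return "ide-ready"
```

Because the agent does not push state, any such summary is only as fresh as the most recent poll.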
---
@ -347,6 +204,7 @@ Security controls:
3. Runtime catalog is the source of truth
4. Installs must be idempotent
5. Containers must remain reproducible
6. Dev services are never publicly exposed directly
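Principle 4 (idempotent installs) combined with the `/opt/zlh/runtimes/<runtime>/<version>` layout suggests a directory-based install guard. The following is a minimal sketch under that assumption; `install_once`, the `installer` callback, and the staging-directory approach are choices made for this example and are not taken from the agent.

```python
import os
import shutil
import tempfile
from typing import Callable

DEFAULT_RUNTIME_ROOT = "/opt/zlh/runtimes"


def install_once(
    runtime: str,
    version: str,
    installer: Callable[[str], None],
    root: str = DEFAULT_RUNTIME_ROOT,
) -> bool:
    """Install into <root>/<runtime>/<version> unless it already exists.

    Returns True if the installer ran, False if the guard skipped it.
    """
    target = os.path.join(root, runtime, version)
    if os.path.isdir(target):
        return False  # Guard: an existing directory means "already installed".
    parent = os.path.dirname(target)
    os.makedirs(parent, exist_ok=True)
    # Install into a staging directory first so a failed install does not
    # leave a half-populated target that the guard would later accept.
    staging = tempfile.mkdtemp(prefix=f".{version}-", dir=parent)
    try:
        installer(staging)
        os.rename(staging, target)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return True
```

Calling `install_once("node", "22", unpack)` twice with a hypothetical `unpack` callback would run the callback only on the first call.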
---
@ -355,12 +213,5 @@ Security controls:
- Runtime checksum validation
- Runtime upgrades / removal
- Artifact metadata support
- Service port auto-detection
- Dev service routing / proxy exposure
- IDE launch integration from portal
---
## Summary
Developer containers in ZeroLagHub provide isolated development environments with multiple runtime support, artifact-driven installs, optional browser IDE on port 6000, and consistent reproducible provisioning.
- Tailscale addon implementation
- Headscale auth key portal UI