Integrated Portal/API/Agent smoke test plan #4

jester · 2026-05-01T14:17:57Z

Goal

Run the remaining launch validation as an integrated system smoke test rather than isolated repo checks.

Portal, API, and Agent depend on one another at runtime, so testing one repo in isolation can produce misleading failures. For example, Portal lifecycle checks are blocked without auth/test credentials, API probes can disagree with the actual Portal route contract, and Agent handlers may return handler-level errors unless they are invoked through the API with the right ownership and token context.

Source repos

- `jester/zpack-portal`
- `jester/zpack-api`
- `jester/zlh-agent`

Current code state

All known code-side changes are committed.

Portal

- Branding work is complete and related issues were closed.
- The fix for the server create endpoint regression has been committed.
- The Portal smoke issue remains open because authenticated lifecycle verification is still needed.
- Game and dev servers have been observed as visible and ready in Portal screenshots.
- Remaining Portal follow-ups:
  - authenticated lifecycle verification
  - console WebSocket unauth probe timeout
  - console WebSocket session token in query string

API

- The creation concern is fixed/superseded by successful game/dev creation verification.
- Remaining API deviations:
  - duplicate provisioning is not truly blocked/idempotent
  - failed provisioning after DB persistence can leave stale DB rows

Agent

- Fail-closed `ZLH_AGENT_TOKEN` auth is implemented and verified.
- Remaining Agent deviations:
  - automatic rollback after restore/start failure is not implemented
  - process reattachment after Agent restart is not implemented

Integrated smoke sequence

Use a real test user or test bearer token and run these through the actual deployed path:

Login/auth
- Verify Portal login/refresh works.
- Capture/confirm valid test bearer token.
Create game server
- Create a Minecraft/game server through Portal.
- Confirm API creates/provisions successfully.
- Confirm Agent receives config via authenticated API-to-Agent call.
- Confirm Portal shows Ready/Host Online/Connect hostname only when appropriate.
Game operations
- Open console via Portal.
- Confirm WebSocket works for authenticated owner.
- Confirm unauth/non-owner console fails cleanly.
- Upload/download/list files.
- Install/remove mod or datapack where applicable.
- Create backup.
- Restore backup.
- Delete backup.
- Start/stop/restart server.
- Delete server.
- Confirm DB, Proxmox/LXC, DNS/edge/Velocity cleanup.
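
The cleanup confirmation in the last bullet can be recorded as a single comparison across systems. This is a minimal sketch, assuming each system's inventory has already been collected by hand or by a separate script into a set of server ids. The function name `leftover_systems` and the inventory keys are hypothetical and do not come from the repos.

```python
def leftover_systems(server_id: str, inventories: dict[str, set[str]]) -> list[str]:
    """Return the names of systems whose inventory still references server_id.

    `inventories` maps a system label (for example "db", "proxmox", "dns",
    "velocity"; labels are illustrative) to the set of server ids that the
    system still knows about after the delete.
    """
    return sorted(name for name, ids in inventories.items() if server_id in ids)
```

An empty list means the deleted server left no trace in any of the inventories that were checked. Any name that is returned goes into the "cleanup result" field for that step.
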
Create dev server
- Create a dev container through Portal.
- Confirm API/Agent provisioning succeeds.
- Open hosted IDE/code-server.
- Stop/restart code-server.
- Stop/restart/delete dev container.
Negative/security checks
- Direct Agent protected route without token returns 401.
- API internal routes without internal token fail closed.
- Non-owner user cannot access another user's server/files/backups/console.
- Portal/browser does not expose ZLH_AGENT_TOKEN, internal token, Prometheus token, Velocity shared secret, or infra credentials.
- Console WebSocket unauth behavior should fail cleanly, not hang.
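
The fail-closed expectations above (a `401` from a tokenless Agent route, and a console WebSocket that is rejected instead of hanging) can be judged with one small classifier. This is a minimal sketch, assuming the probe itself is run separately and only its outcome is passed in. `ProbeOutcome`, `classify_unauth_probe`, and the 5-second budget are hypothetical and are not taken from the repos.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical time budget: anything slower counts as "hanging".
MAX_REJECT_SECONDS = 5.0


@dataclass
class ProbeOutcome:
    """Outcome of one unauthenticated probe (HTTP request or WebSocket handshake)."""
    status: Optional[int]    # response/handshake status; None if nothing came back
    elapsed_seconds: float   # time until the response or the client-side timeout
    timed_out: bool = False


def classify_unauth_probe(outcome: ProbeOutcome, expected=(401,)) -> str:
    """Return "PASS" only if the probe was rejected promptly with an expected status."""
    if outcome.timed_out or outcome.status is None:
        return "FAIL"  # a hang or silent drop is not a clean failure
    if outcome.elapsed_seconds > MAX_REJECT_SECONDS:
        return "FAIL"  # rejected, but too slowly to count as clean
    return "PASS" if outcome.status in expected else "FAIL"
```

The default `expected=(401,)` matches the Agent check above. Which statuses count as fail-closed for API internal routes or the console WebSocket is left to the caller, because the plan does not name them.
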
Provisioning consistency checks
- Attempt duplicate create/retry and verify it does not create duplicate LXCs.
- Simulate or force a provisioning failure after DB persistence and verify row is marked failed or cleaned.
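
Both consistency checks reduce to comparing the DB against the container inventory after the test. This is a minimal sketch, assuming the two snapshots have already been exported. The function names, the VMID-to-server-id mapping, and the literal status string `"failed"` are hypothetical and may not match the real schema.

```python
def duplicate_containers(server_by_vmid: dict[int, str]) -> dict[str, list[int]]:
    """Return server ids that are backed by more than one LXC, with their VMIDs."""
    vmids_by_server: dict[str, list[int]] = {}
    for vmid, server_id in sorted(server_by_vmid.items()):
        vmids_by_server.setdefault(server_id, []).append(vmid)
    return {sid: vmids for sid, vmids in vmids_by_server.items() if len(vmids) > 1}


def stale_rows(status_by_server: dict[str, str], live_server_ids: set[str]) -> list[str]:
    """Return server ids whose DB row is neither marked failed nor backed by a container."""
    return sorted(
        sid
        for sid, status in status_by_server.items()
        if status != "failed" and sid not in live_server_ids  # "failed" is an assumed value
    )
```

An empty result from both functions corresponds to "no duplicate/stale provisioning artifacts" for the servers that were inspected.
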

Output format

For each step record:

- PASS / FAIL / BLOCKED
- repo(s) involved
- endpoint/action
- observed response/status
- created VMID/server id if applicable
- cleanup result
- follow-up issue link if needed
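
The fields above map naturally onto a small record type, which keeps results comparable across steps. This is a sketch only. The class and field names, and the pipe-separated line format, are assumptions made for illustration and are not an agreed reporting format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    BLOCKED = "BLOCKED"


@dataclass
class StepResult:
    step: str                         # name of the step in the smoke sequence
    outcome: Outcome                  # PASS / FAIL / BLOCKED
    repos: list[str]                  # repo(s) involved
    action: str                       # endpoint or user action exercised
    observed: str                     # observed response/status
    created_id: Optional[str] = None  # created VMID/server id, if applicable
    cleanup: Optional[str] = None     # cleanup result
    follow_up: Optional[str] = None   # follow-up issue link, if needed

    def to_line(self) -> str:
        """Render the record as one pipe-separated line for pasting into a comment."""
        return " | ".join([
            self.outcome.value,
            self.step,
            "repos=" + ",".join(self.repos),
            "action=" + self.action,
            "observed=" + self.observed,
            "id=" + (self.created_id or "n/a"),
            "cleanup=" + (self.cleanup or "n/a"),
            "follow-up=" + (self.follow_up or "none"),
        ])
```

Optional fields default to `None` so that steps which create nothing, such as login, can still be recorded without placeholder values.
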

Success criteria

- One full game lifecycle passes from Portal to API to Agent and cleanup.
- One full dev lifecycle passes from Portal to API to Agent and cleanup.
- Internal auth boundaries remain fail-closed.
- No duplicate/stale provisioning artifacts remain after test.
- Any remaining deviations are explicitly accepted as follow-up hardening rather than launch blockers.
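
One way to apply these criteria mechanically is to separate steps that must pass from deviations that may be accepted. This is a minimal sketch, assuming results are kept as a mapping from step name to `"PASS"`, `"FAIL"`, or `"BLOCKED"`. The function name and the step labels used with it are hypothetical.

```python
def launch_blockers(
    results: dict[str, str],
    required: list[str],
    accepted_deviations: set[str],
) -> list[str]:
    """Return the steps that still block launch.

    A required step blocks launch unless it is recorded as PASS; it cannot be
    waived. Any other non-PASS step blocks launch unless it was explicitly
    accepted as follow-up hardening.
    """
    blockers = [step for step in required if results.get(step) != "PASS"]
    for step, outcome in results.items():
        if step in required:
            continue
        if outcome != "PASS" and step not in accepted_deviations:
            blockers.append(step)
    return blockers
```

Listing the two lifecycles and the auth-boundary checks under `required` keeps them from being waived, while the known API and Agent deviations can be listed under `accepted_deviations` once they have been accepted.
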

Reference: jester/zlh-grind#4