Add launch autonomy billing support update summary

Commit 6eba66e317 (parent b77f68688d)

# ZLH Session Summary — Launch Autonomy, Billing, and Support Update

Date: 2026-05-03

## Summary

This session moved several major launch blockers from design/risk into implemented and validated status. The biggest completed areas were asynchronous provisioning, controller/reconciler foundation, billing enforcement, and the support ticket path.

## Completed / Validated

### Provisioning worker / async create

Status: launch-ready.

Implemented and validated:

```text
POST /api/instances now creates/reuses a durable ProvisioningOperation
API returns 202 Accepted quickly
BullMQ provisioning worker consumes jobs from provisioning queue
zpack-provision-worker.service installed and running under systemd
Portal async pending cards show queued/running phases and replace with real server cards
Game provisioning through worker validated
Dev provisioning through worker validated
API teardown still works for worker-created servers
Duplicate/idempotency guards validated
Controlled failure handling validated
```

Important behavior:

```text
Provisioning is no longer run inside the HTTP request lifecycle.
Portal sends Idempotency-Key.
Operation state is pollable.
Worker concurrency remains 1.
Unsafe automatic retries remain disabled.
```
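
The create path described above can be illustrated with a minimal in-memory sketch. It assumes the Idempotency-Key is scoped per owner and that a replayed key returns the existing ProvisioningOperation instead of queueing a second job. `OperationStore`, `acceptCreate`, the `enqueue` callback, and all field names are hypothetical; the real persistence layer and the BullMQ enqueue call are not shown in this summary.

```typescript
// Hypothetical shapes; the real ProvisioningOperation model is not shown here.
type OperationPhase = "queued" | "running" | "succeeded" | "failed";

interface ProvisioningOperation {
  id: string;
  ownerId: string;
  idempotencyKey: string;
  phase: OperationPhase;
}

interface AcceptResult {
  status: number; // HTTP status returned to the Portal
  operation: ProvisioningOperation; // what a polling client would read back
  enqueued: boolean; // true only when a new job was queued
}

class OperationStore {
  private byKey = new Map<string, ProvisioningOperation>();
  private nextId = 1;

  // Create the operation on first sight of (ownerId, key); reuse it afterwards.
  createOrReuse(ownerId: string, idempotencyKey: string): { op: ProvisioningOperation; created: boolean } {
    const mapKey = JSON.stringify([ownerId, idempotencyKey]);
    const existing = this.byKey.get(mapKey);
    if (existing !== undefined) return { op: existing, created: false };
    const op: ProvisioningOperation = { id: `op-${this.nextId++}`, ownerId, idempotencyKey, phase: "queued" };
    this.byKey.set(mapKey, op);
    return { op, created: true };
  }
}

// Stand-in for the POST /api/instances handler: record the operation, hand the
// work to the queue, and answer 202 without provisioning inside the request.
function acceptCreate(
  store: OperationStore,
  enqueue: (operationId: string) => void,
  ownerId: string,
  idempotencyKey: string,
): AcceptResult {
  const { op, created } = store.createOrReuse(ownerId, idempotencyKey);
  if (created) enqueue(op.id); // a replayed request must not queue a second job
  return { status: 202, operation: op, enqueued: created };
}
```

In this sketch a double-submitted form with the same key yields the same operation id and a single queued job, which is the property the duplicate/idempotency validation above is meant to cover.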

### Controller / reconciler foundation

Status: implemented, validated, currently conservative.

Implemented and validated:

```text
zlh-controller.service exists as singleton controller/reconciler with Redis lock
zpack-repair-worker.service handles Level 1 repair jobs
Discord notifications wired
RepairEvent persistence added
clear_stale_operation_lock validated
live Cloudflare SRV drift detection validated
edge_republish restored deleted Cloudflare SRV record through existing edge publish path
Level 2 and Level 3 repairs remain disabled
```
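
A singleton guarded by a Redis lock is commonly built on `SET <key> <token> NX PX <ttl>`, where only the instance holding the token may renew or release the lease. The following is an in-memory stand-in for those semantics, not the controller's actual code: `LeaseLock`, the token values, and the TTL are assumptions for illustration.

```typescript
// In-memory stand-in for a Redis lock taken with SET <key> <token> NX PX <ttl>.
// Key name, TTL, and token format are assumptions, not values from zlh-controller.
class LeaseLock {
  private holder: { token: string; expiresAt: number } | null = null;

  // Mirrors NX + PX: succeed only if no unexpired holder exists.
  tryAcquire(token: string, ttlMs: number, nowMs: number): boolean {
    const current = this.holder;
    if (current !== null && current.expiresAt > nowMs) return false;
    this.holder = { token, expiresAt: nowMs + ttlMs };
    return true;
  }

  // Extend the lease, but only for the instance that currently holds it.
  renew(token: string, ttlMs: number, nowMs: number): boolean {
    const current = this.holder;
    if (current === null || current.token !== token || current.expiresAt <= nowMs) return false;
    current.expiresAt = nowMs + ttlMs;
    return true;
  }

  // Release only on a token match, so a stale holder cannot drop a successor's lock.
  release(token: string): boolean {
    const current = this.holder;
    if (current === null || current.token !== token) return false;
    this.holder = null;
    return true;
  }
}
```

Against a real Redis instance the compare-and-delete in `release` has to be atomic, which is typically done with a small Lua script; that part is outside this sketch.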

Current operating posture:

```text
Controller is expected to remain in dry-run unless deliberately enabling Level 1 repairs.
Repair worker is live.
Level 1 repair path is proven.
No destructive repairs are automatic.
```
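
The posture above amounts to a fail-closed gate in front of every repair. A minimal sketch of such a gate follows. It assumes the two repair actions named in this summary are Level 1, and that unknown actions are rejected outright; `decideRepair`, `ControllerPosture`, and the decision strings are hypothetical names.

```typescript
type RepairLevel = 1 | 2 | 3;
type RepairDecision = "execute" | "dry_run_only" | "rejected";

interface ControllerPosture {
  dryRun: boolean;
  enabledLevels: ReadonlyArray<RepairLevel>;
}

// Assumption: both actions named in this summary are Level 1 repairs.
const REPAIR_ACTION_LEVELS = new Map<string, RepairLevel>([
  ["clear_stale_operation_lock", 1],
  ["edge_republish", 1],
]);

function decideRepair(action: string, posture: ControllerPosture): RepairDecision {
  const level = REPAIR_ACTION_LEVELS.get(action);
  if (level === undefined) return "rejected"; // unknown actions never run
  if (!posture.enabledLevels.includes(level)) return "rejected"; // level not enabled
  return posture.dryRun ? "dry_run_only" : "execute"; // dry-run reports, never acts
}
```

With `dryRun: true` the controller would only report what it would have done, which matches the soak posture described for issue #13.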

### Billing enforcement / overdue handling

Status: backend launch-ready.

Implemented and validated:

```text
BillingEnforcementState
BillingEnforcementEvent
StripeEventLog
Stripe event idempotency
payment_failed warning flow
final warning / backup block state
suspension / shutdown state
payment restored flow
API billing gates while suspended
controller does not repair suspended game servers
billing worker installed and running under systemd
billing announcements visible in Portal
```
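
The list above combines two ideas: each Stripe event is applied at most once, and enforcement moves through a small set of states. The sketch below shows both under stated assumptions. The state names, the `escalate` trigger (for example a grace timer run by the billing worker), and the class itself are hypothetical; `invoice.payment_failed` and `invoice.paid` are real Stripe event types, but which events the project actually subscribes to is not stated here.

```typescript
type EnforcementState = "active" | "warning" | "final_warning" | "suspended";

interface StripeEventLike {
  id: string; // Stripe event id, e.g. "evt_..."
  type: string;
}

class BillingEnforcementSketch {
  private seenEventIds = new Set<string>(); // stand-in for StripeEventLog
  private state: EnforcementState = "active";

  currentState(): EnforcementState {
    return this.state;
  }

  // Returns true if the event was applied, false if it was a duplicate delivery.
  handle(event: StripeEventLike): boolean {
    if (this.seenEventIds.has(event.id)) return false;
    this.seenEventIds.add(event.id);
    if (event.type === "invoice.payment_failed" && this.state === "active") {
      this.state = "warning";
    } else if (event.type === "invoice.paid") {
      this.state = "active"; // payment restored clears enforcement
    }
    return true;
  }

  // Assumed escalation step: warning -> final warning -> suspended.
  escalate(): EnforcementState {
    if (this.state === "warning") this.state = "final_warning";
    else if (this.state === "final_warning") this.state = "suspended";
    return this.state;
  }
}
```

Because duplicates are dropped before any transition runs, a webhook redelivery cannot push an account further down the enforcement path.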

Service:

```text
zpack-billing-worker.service installed and clean under systemd
```

Safety guarantees validated:

```text
No customer data deleted
No backups deleted
No DNS records deleted
No Velocity records deleted
No containers deleted
Destructive billing actions are rejected and audited
Suspended servers are not repaired back to connectable/running state
```
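
One way to get the "rejected and audited" guarantee is an allow-list: only explicitly non-destructive actions can run, and every attempt is recorded either way. The sketch below assumes that approach. The action names, `applyBillingAction`, and the audit entry shape are all hypothetical; the summary does not say how the real guard is implemented.

```typescript
// Hypothetical non-destructive actions; anything else is treated as destructive.
const ALLOWED_BILLING_ACTIONS = new Set<string>(["warn", "block_backups", "suspend", "restore"]);

interface BillingAuditEntry {
  action: string;
  serverId: string;
  outcome: "applied" | "rejected";
}

// Records every attempt, then reports whether the action may proceed.
function applyBillingAction(action: string, serverId: string, audit: BillingAuditEntry[]): boolean {
  const allowed = ALLOWED_BILLING_ACTIONS.has(action);
  audit.push({ action, serverId, outcome: allowed ? "applied" : "rejected" });
  return allowed;
}
```

An allow-list fails closed: a new destructive action added elsewhere in the codebase stays blocked until someone deliberately lists it.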

Remaining billing follow-ups are fixture validation only:

```text
File read/list against a responsive Agent
Backup mutation route validation with a game backup fixture
```

### Support ticket path

Status: launch-ready.

Implemented and validated:

```text
POST /api/support/create exists
SupportTicket DB model and migration added
Human-readable ticket number: ZLH-YYYYMMDD-XXXX
Portal form submits successfully
Customer acknowledgement email received
Discord #support alert received
SupportTicket DB row created
```
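
The `ZLH-YYYYMMDD-XXXX` pattern can be sketched as follows. The summary gives only the pattern, so two choices here are assumptions: the date part uses UTC, and `XXXX` is a zero-padded daily sequence (it could equally be a random or alphanumeric suffix). `formatTicketNumber` and `TICKET_NUMBER_PATTERN` are hypothetical names.

```typescript
// Formats ZLH-YYYYMMDD-XXXX using the UTC date and a zero-padded sequence.
function formatTicketNumber(createdAt: Date, dailySequence: number): string {
  if (!Number.isInteger(dailySequence) || dailySequence < 0 || dailySequence > 9999) {
    throw new RangeError("dailySequence must be an integer between 0 and 9999");
  }
  const yyyy = createdAt.getUTCFullYear().toString().padStart(4, "0");
  const mm = (createdAt.getUTCMonth() + 1).toString().padStart(2, "0"); // months are 0-based
  const dd = createdAt.getUTCDate().toString().padStart(2, "0");
  const seq = dailySequence.toString().padStart(4, "0");
  return `ZLH-${yyyy}${mm}${dd}-${seq}`;
}

// Matches the numeric form produced above; adjust if XXXX is alphanumeric.
const TICKET_NUMBER_PATTERN = /^ZLH-\d{8}-\d{4}$/;
```

For example, a ticket created on 2026-05-03 (UTC) with sequence 42 would format as `ZLH-20260503-0042` under these assumptions.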

Post-launch enhancements only:

```text
Admin ticket list/view
Support triage diagnostics
Self-hosted helpdesk integration
Inbound email reply parsing
Attachments
```

## Current launch service set

```text
zpack-api.service
zpack-provision-worker.service
zpack-repair-worker.service
zlh-controller.service
zpack-billing-worker.service
```

Launch guardrail:

```text
Do not add more worker/systemd services before launch unless there is a strong safety-boundary reason.
```

## Remaining launch-active work

### Portal terminal reliability

Issue: console can hang at Connecting.

Required:

```text
WebSocket connect timeout
Error path clears socket refs
isStreaming resets on closed/error/idle
Button recovers to Open Console/Reconnect
Validate console still works after fix
```
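
The required behavior can be expressed as a small state machine in which a connect timeout is just another event, so "Connecting" always has an exit. This is a sketch under assumptions, not the Portal's code: the state and event names, `nextConsoleState`, `viewFor`, and `replayConsole` are hypothetical, and button labels for the connecting and streaming states are left unspecified because the summary does not give them.

```typescript
type ConsoleState = "idle" | "connecting" | "streaming" | "closed" | "error";
type ConsoleEvent = "open_clicked" | "socket_open" | "connect_timeout" | "socket_error" | "socket_closed";

interface ConsoleView {
  state: ConsoleState;
  isStreaming: boolean;
  hasSocketRef: boolean; // whether the component should still hold a socket
  buttonLabel: string | null; // null where the summary gives no label
}

function nextConsoleState(state: ConsoleState, event: ConsoleEvent): ConsoleState {
  if (event === "open_clicked") {
    // Ignore repeat clicks while a connection is in flight or live.
    return state === "connecting" || state === "streaming" ? state : "connecting";
  }
  if (event === "socket_open") return state === "connecting" ? "streaming" : state;
  if (event === "connect_timeout") return state === "connecting" ? "error" : state;
  if (event === "socket_error") return "error";
  return state === "error" ? "error" : "closed"; // socket_closed
}

function viewFor(state: ConsoleState): ConsoleView {
  const live = state === "connecting" || state === "streaming";
  let buttonLabel: string | null = null;
  if (state === "idle") buttonLabel = "Open Console";
  else if (state === "closed" || state === "error") buttonLabel = "Reconnect";
  return { state, isStreaming: state === "streaming", hasSocketRef: live, buttonLabel };
}

// Folds a sequence of events from the idle state into the resulting view.
function replayConsole(events: ConsoleEvent[]): ConsoleView {
  let state: ConsoleState = "idle";
  for (const event of events) state = nextConsoleState(state, event);
  return viewFor(state);
}
```

Deriving `isStreaming`, the socket reference, and the button label from one state value means they cannot drift apart, which is the failure mode behind a console stuck at Connecting.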

### Monitoring / observability readiness

Still a major infrastructure item.

Remaining:

```text
Restrict Prometheus/Grafana/node_exporter exposure
Fix game/dev discovery sync
Remove stale file_sd targets
Install Grafana dashboards
Add API health/app scrape
Add lifecycle visibility
Add or explicitly defer centralized logs/Loki
Tighten monitoring token storage
Add/verify queue staleness visibility for provisioning, repair, billing_enforcement
```
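
For the queue staleness item, one simple signal is the age of the oldest waiting job per queue. The sketch below assumes that signal is available as a timestamp; how it would be read from BullMQ and exported to Prometheus is not covered, and `QueueSnapshot`, `staleQueues`, and the threshold are hypothetical.

```typescript
interface QueueSnapshot {
  name: string;
  oldestWaitingEnqueuedAtMs: number | null; // null when nothing is waiting
}

// Returns the names of queues whose oldest waiting job exceeds maxWaitMs.
function staleQueues(snapshots: QueueSnapshot[], nowMs: number, maxWaitMs: number): string[] {
  return snapshots
    .filter((q) => q.oldestWaitingEnqueuedAtMs !== null && nowMs - q.oldestWaitingEnqueuedAtMs > maxWaitMs)
    .map((q) => q.name);
}
```

An empty queue is never stale in this sketch, so a stopped worker only shows up once a job is waiting; a separate worker-liveness check would be needed to cover the idle case.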

### Patch management / maintenance window policy

Needs written policy/runbook:

```text
Normal maintenance window cadence/timezone
Emergency maintenance behavior
Customer notification expectations
Patch order and rollback expectations
```

### Notepad / messaging retest

Announcements are validated for billing and support context. Still validate:

```text
notepad/notes load and save
persistence after reload/login
permissions
empty/error states
```

### Final integrated smoke test

Run after the above launch blockers are clean:

```text
Game lifecycle: create -> ready/connectable -> console -> files -> backup -> restore -> delete -> DNS/Velocity/Cloudflare cleanup
Dev lifecycle: create -> hosted IDE -> stop/restart/delete -> cleanup
Security: Agent auth fail-closed, non-owner blocked, browser does not expose internal secrets
```

## Issues likely ready to close or supersede

```text
#9 Support email/ticket path: resolved
#11 Provisioning worker / async create: resolved
#10 Multi-create modal confusion: likely resolved by async inline cards; quick two-create validation or close as covered by #11
#6 Non-payment grace flow: superseded by #14 billing enforcement
```

## Issues still active

```text
#12 Portal terminal reliability
#5 Monitoring / observability readiness
#7 Patch management / maintenance window policy
#8 Notepad / announcements / messaging retest
#4 Integrated Portal/API/Agent smoke test
#13 Controller/reconciler: keep dry-run soak / decide Level 1 live posture
#14 Billing enforcement: core resolved; minor fixture validation remains
```

## Notes

- Support is launch-ready with ZLH-native DB ticket + customer email + Discord alert.
- A self-hosted helpdesk such as FreeScout/Zammad can be considered post-launch, but ZLH should keep its SupportTicket intake/audit record either way.
- Controller should not parse support ticket text or auto-run repairs from free text at launch. Post-launch support triage may add tags, read-only diagnostics, and suggested actions.