# ZLH Session Summary — Launch Autonomy, Billing, and Support Update

Date: 2026-05-03

## Summary

This session moved several major launch blockers from design/risk into implemented and validated status.

The biggest completed areas were asynchronous provisioning, controller/reconciler foundation, billing enforcement, and the support ticket path.

## Completed / Validated

### Provisioning worker / async create

Status: launch-ready.

Implemented and validated:

```text
POST /api/instances now creates/reuses a durable ProvisioningOperation
API returns 202 Accepted quickly
BullMQ provisioning worker consumes jobs from provisioning queue
zpack-provision-worker.service installed and running under systemd
Portal async pending cards show queued/running phases and replace with real server cards
Game provisioning through worker validated
Dev provisioning through worker validated
API teardown still works for worker-created servers
Duplicate/idempotency guards validated
Controlled failure handling validated
```

Important behavior:

```text
Provisioning is no longer run inside the HTTP request lifecycle.
Portal sends Idempotency-Key.
Operation state is pollable.
Worker concurrency remains 1.
Unsafe automatic retries remain disabled.
```

### Controller / reconciler foundation

Status: implemented, validated, currently conservative.

Implemented and validated:

```text
zlh-controller.service exists as singleton controller/reconciler with Redis lock
zpack-repair-worker.service handles Level 1 repair jobs
Discord notifications wired
RepairEvent persistence added
clear_stale_operation_lock validated live
Cloudflare SRV drift detection validated
edge_republish restored deleted Cloudflare SRV record through existing edge publish path
Level 2 and Level 3 repairs remain disabled
```

Current operating posture:

```text
Controller is expected to remain in dry-run unless deliberately enabling Level 1 repairs.
Repair worker is live.
Level 1 repair path is proven.
No destructive repairs are automatic.
```

### Billing enforcement / overdue handling

Status: backend launch-ready.

Implemented and validated:

```text
BillingEnforcementState
BillingEnforcementEvent
StripeEventLog
Stripe event idempotency
payment_failed warning flow
final warning / backup block state
suspension / shutdown state
payment restored flow
API billing gates while suspended
controller does not repair suspended game servers
billing worker installed and running under systemd
billing announcements visible in Portal
```

Service:

```text
zpack-billing-worker.service installed and clean under systemd
```

Safety guarantees validated:

```text
No customer data deleted
No backups deleted
No DNS records deleted
No Velocity records deleted
No containers deleted
Destructive billing actions are rejected and audited
Suspended servers are not repaired back to connectable/running state
```

Remaining billing follow-ups are fixture validation only:

```text
File read/list against a responsive Agent
Backup mutation route validation with a game backup fixture
```

### Support ticket path

Status: launch-ready.

Implemented and validated:

```text
POST /api/support/create exists
SupportTicket DB model and migration added
Human-readable ticket number: ZLH-YYYYMMDD-XXXX
Portal form submits successfully
Customer acknowledgement email received
Discord #support alert received
SupportTicket DB row created
```

Post-launch enhancements only:

```text
Admin ticket list/view
Support triage diagnostics
Self-hosted helpdesk integration
Inbound email reply parsing
Attachments
```

## Current launch service set

```text
zpack-api.service
zpack-provision-worker.service
zpack-repair-worker.service
zlh-controller.service
zpack-billing-worker.service
```

Launch guardrail:

```text
Do not add more worker/systemd services before launch unless there is a strong safety-boundary reason.
```

## Remaining launch-active work

### Portal terminal reliability

Issue: console can hang at Connecting.
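The console gets stuck at Connecting when a socket neither opens nor errors. One way to rule this out is to send every exit (connect timeout, error, close, idle) through a single teardown. That teardown clears the socket ref and resets `isStreaming`. The TypeScript below is a minimal sketch under assumed names: `ConsoleSession`, `SocketLike`, `CONNECT_TIMEOUT_MS`, the timeout value, and the button labels are hypothetical and are not Portal code. It models connection state only, not rendering.

```typescript
// Sketch only: SocketLike, ConsoleSession, and CONNECT_TIMEOUT_MS are
// hypothetical names, not the Portal's actual identifiers.

// Anything with close(), e.g. a browser WebSocket.
interface SocketLike {
  close(): void;
}

type Phase = "idle" | "connecting" | "streaming" | "closed" | "error";

// Assumed value; tune against real connect latency.
const CONNECT_TIMEOUT_MS = 10_000;

class ConsoleSession {
  phase: Phase = "idle";
  isStreaming = false;
  socket: SocketLike | null = null;
  private timer: ReturnType<typeof setTimeout> | null = null;

  // Open a socket and arm a timeout so the UI cannot sit at "Connecting" forever.
  connect(openSocket: () => SocketLike, timeoutMs: number = CONNECT_TIMEOUT_MS): void {
    // Ignore repeat clicks while a connection is already in flight or live.
    if (this.phase === "connecting" || this.phase === "streaming") return;
    this.socket = openSocket();
    this.phase = "connecting";
    this.timer = setTimeout(() => this.onTimeout(), timeoutMs);
  }

  // Wire to the socket's open event.
  onOpen(): void {
    if (this.phase !== "connecting") return;
    this.clearTimer();
    this.phase = "streaming";
    this.isStreaming = true;
  }

  // Fired by the connect timer; only meaningful while still connecting.
  onTimeout(): void {
    if (this.phase === "connecting") this.teardown("error");
  }

  // Wire to the socket's error and close events, and to any idle timeout.
  onError(): void {
    this.teardown("error");
  }
  onClose(): void {
    this.teardown("closed");
  }
  onIdle(): void {
    this.teardown("closed");
  }

  // Once the session leaves connecting/streaming, the button offers a retry.
  get buttonLabel(): string {
    switch (this.phase) {
      case "idle":
        return "Open Console";
      case "connecting":
        return "Connecting";
      case "streaming":
        return "Connected";
      default:
        return "Reconnect";
    }
  }

  // Single exit path: clear the timer, drop the socket ref, reset isStreaming.
  private teardown(next: "closed" | "error"): void {
    const socket = this.socket;
    if (socket === null) return; // already torn down (e.g. close fired after error)
    this.clearTimer();
    this.socket = null;
    this.isStreaming = false;
    this.phase = next;
    socket.close(); // may re-enter onClose; the null check above makes that a no-op
  }

  private clearTimer(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}
```

In a browser, `connect` would take something like `() => new WebSocket(url)`. The socket's `onopen`, `onerror`, and `onclose` handlers would then forward to the matching methods. The socket ref is nulled before `close()` is called, so a late close event after an error or timeout does nothing.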
Required:

```text
WebSocket connect timeout
Error path clears socket refs
isStreaming resets on closed/error/idle
Button recovers to Open Console/Reconnect
Validate console still works after fix
```

### Monitoring / observability readiness

Still a major infrastructure item.

Remaining:

```text
Restrict Prometheus/Grafana/node_exporter exposure
Fix game/dev discovery sync
Remove stale file_sd targets
Install Grafana dashboards
Add API health/app scrape
Add lifecycle visibility
Add or explicitly defer centralized logs/Loki
Tighten monitoring token storage
Add/verify queue staleness visibility for provisioning, repair, billing_enforcement
```

### Patch management / maintenance window policy

Needs written policy/runbook:

```text
Normal maintenance window cadence/timezone
Emergency maintenance behavior
Customer notification expectations
Patch order and rollback expectations
```

### Notepad / messaging retest

Announcements are validated for billing and support context.

Still validate:

```text
notepad/notes load and save
persistence after reload/login
permissions
empty/error states
```

### Final integrated smoke test

Run after the above launch blockers are clean:

```text
Game lifecycle:
create -> ready/connectable -> console -> files -> backup -> restore -> delete -> DNS/Velocity/Cloudflare cleanup

Dev lifecycle:
create -> hosted IDE -> stop/restart/delete -> cleanup

Security:
Agent auth fail-closed, non-owner blocked, browser does not expose internal secrets
```

## Issues likely ready to close or supersede

```text
#9 Support email/ticket path — resolved
#11 Provisioning worker / async create — resolved
#10 Multi-create modal confusion — likely resolved by async inline cards; quick two-create validation or close as covered by #11
#6 Non-payment grace flow — superseded by #14 billing enforcement
```

## Issues still active

```text
#12 Portal terminal reliability
#5 Monitoring / observability readiness
#7 Patch management / maintenance window policy
#8 Notepad / announcements / messaging retest
#4 Integrated Portal/API/Agent smoke test
#13 Controller/reconciler — keep dry-run soak / decide Level 1 live posture
#14 Billing enforcement — core resolved; minor fixture validation remains
```

## Notes

- Support is launch-ready with ZLH-native DB ticket + customer email + Discord alert.
- A self-hosted helpdesk such as FreeScout/Zammad can be considered post-launch, but ZLH should keep its SupportTicket intake/audit record either way.
- Controller should not parse support ticket text or auto-run repairs from free text at launch. Post-launch support triage may add tags, read-only diagnostics, and suggested actions.
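The support ticket path issues human-readable numbers in the form `ZLH-YYYYMMDD-XXXX`. This summary does not say what `XXXX` is. The sketch below assumes it is a per-day sequence zero-padded to four digits, with the date taken in UTC. `formatTicketNumber` and `parseTicketNumber` are hypothetical helpers, not existing ZLH functions. A strict parser like this also lets later triage tooling act on a structured ticket reference instead of free text.

```typescript
// Sketch only: assumes XXXX is a per-day sequence zero-padded to four digits
// and that dates are taken in UTC. The real API may use a different suffix
// (e.g. random characters); formatTicketNumber and parseTicketNumber are
// hypothetical helpers, not existing ZLH functions.

const TICKET_PATTERN = /^ZLH-(\d{4})(\d{2})(\d{2})-(\d{4})$/;

interface ParsedTicket {
  date: string; // YYYY-MM-DD
  seq: number;
}

// Build ZLH-YYYYMMDD-XXXX from the creation time and that day's sequence number.
function formatTicketNumber(createdAt: Date, dailySeq: number): string {
  if (!Number.isInteger(dailySeq) || dailySeq < 1 || dailySeq > 9999) {
    throw new RangeError(`daily sequence out of range: ${dailySeq}`);
  }
  const yyyy = String(createdAt.getUTCFullYear()).padStart(4, "0");
  const mm = String(createdAt.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(createdAt.getUTCDate()).padStart(2, "0");
  const seq = String(dailySeq).padStart(4, "0");
  return `ZLH-${yyyy}${mm}${dd}-${seq}`;
}

// Shape check only: returns null for anything not matching the pattern.
// It does not verify that the date is a real calendar date.
function parseTicketNumber(ticket: string): ParsedTicket | null {
  const m = TICKET_PATTERN.exec(ticket);
  if (m === null) return null;
  return { date: `${m[1]}-${m[2]}-${m[3]}`, seq: Number(m[4]) };
}
```

For example, `formatTicketNumber(new Date(Date.UTC(2026, 4, 3, 12)), 7)` yields `ZLH-20260503-0007`. The sketch does not show how the daily sequence is allocated. One option is a per-day counter incremented in the same transaction that inserts the SupportTicket row, but that depends on the real schema.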