Launch bug: harden Portal terminal connection state and xterm lifecycle #12

Open
opened 2026-05-03 11:47:03 +00:00 by jester · 0 comments

Launch bug

The Portal console/terminal is still flaky and occasionally hangs on Connecting....

Repo review confirms this does not look like an Agent language/runtime issue. Most of the console lifecycle architecture is already present:

  • zpack-portal uses @xterm/xterm and @xterm/addon-fit
  • TerminalView.tsx has WebSocket open/message/close/error handlers
  • reconnect/backoff behavior exists
  • cleanup on unmount exists
  • write buffering exists
  • stale PTY-ish error detection exists for messages like file already closed and /dev/ptmx
  • ServerConsole.tsx gates streaming by game connectability / dev agent-online state

Likely remaining issue

The hang appears to be a frontend connection-state trap rather than missing terminal architecture.

Observed risk from Portal code:

  • ServerConsole.tsx can leave isStreaming=true while connectionStatus is error, closed, or stuck at connecting.
  • The Open Console button label can remain Connecting... and stay disabled even after the WebSocket path failed.
  • TerminalView.tsx avoids opening a new socket if the current socket is OPEN or CONNECTING, but there is no obvious connect/open timeout to force-close a stale CONNECTING socket.
  • The WebSocket error path should aggressively close/clear the current socket and sender callback instead of relying on a clean close event.
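The trap in the first two bullets can be modelled with two small pure functions. This is only a sketch, not the actual ServerConsole.tsx code: the `ConnectionStatus` union, `nextIsStreaming`, `deriveButton`, and the "Console Open" label are hypothetical names chosen for illustration, while `isStreaming`, the status values, and the Open Console / Reconnect / Connecting... labels come from this issue.

```typescript
// Hypothetical status union; the real Portal type may contain other values.
type ConnectionStatus = "idle" | "connecting" | "open" | "closed" | "error";

interface ButtonState {
  label: string;
  disabled: boolean;
}

// Intended to be called from the terminal status-change callback.
// Terminal states release the streaming flag so the UI cannot stay latched.
function nextIsStreaming(isStreaming: boolean, status: ConnectionStatus): boolean {
  if (status === "closed" || status === "error" || status === "idle") {
    return false;
  }
  return isStreaming;
}

// Derives the button from the status first, so a failed socket can never
// render a disabled "Connecting..." button.
function deriveButton(isStreaming: boolean, status: ConnectionStatus): ButtonState {
  if (status === "error" || status === "closed") {
    return { label: "Reconnect", disabled: false };
  }
  if (isStreaming && status === "connecting") {
    return { label: "Connecting...", disabled: true };
  }
  if (isStreaming && status === "open") {
    // "Console Open" is a placeholder label, not taken from the Portal UI.
    return { label: "Console Open", disabled: true };
  }
  return { label: "Open Console", disabled: false };
}
```

Note that `nextIsStreaming` is meant to run when the status changes, not on every render. Otherwise a click that sets `isStreaming=true` while the status is still `idle` would be undone immediately.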

Required launch-safe fix

  1. Add a WebSocket connect timeout in TerminalView.tsx.

    • Suggested timeout: 10–15 seconds.
    • If the socket does not reach open, close it, clear socketRef, call onSend(null), and emit onStatus("error") or onStatus("closed").
  2. Harden the WebSocket error path.

    • If the errored socket is current, close it, clear current refs/callbacks, and allow reconnect/manual retry.
  3. Reset isStreaming in ServerConsole.tsx when terminal status becomes:

    • closed
    • error
    • idle
  4. Make the button recoverable.

    • Do not leave users permanently at Connecting....
    • After failure, button should return to Open Console or Reconnect.
  5. Consider an explicit backend/Agent readiness signal later.

    • Portal currently treats browser WebSocket open as connected, but that only proves Browser ↔ API opened.
    • Future improvement: emit an explicit console_ready / terminal_attached frame when API ↔ Agent and Agent ↔ console/session are actually attached.
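Items 1 and 2 could be sketched as a small guard wrapped around each new socket. This is a minimal illustration under stated assumptions, not the TerminalView.tsx implementation: `guardConnect`, `SocketLike`, `GuardOptions`, `isCurrent`, and `clearCurrent` are hypothetical names. `clearCurrent` stands in for "clear socketRef and call onSend(null)", and `onStatus` mirrors the callback named above.

```typescript
// Minimal structural stand-in for a WebSocket, so the sketch has no DOM dependency.
interface SocketLike {
  readyState: number;
  close(): void;
  addEventListener(type: "open" | "error" | "close", listener: () => void): void;
}

const WS_OPEN = 1; // numeric value of WebSocket.OPEN

type GuardStatus = "open" | "closed" | "error";

interface GuardOptions {
  timeoutMs: number; // the issue suggests 10-15 seconds
  isCurrent: () => boolean; // hypothetical: does socketRef still point at this socket?
  clearCurrent: () => void; // hypothetical: clear socketRef and call onSend(null)
  onStatus: (status: GuardStatus) => void;
}

// Attaches a connect timeout plus hardened error/close handling to one socket.
// Returns a dispose function for unmount cleanup.
function guardConnect(socket: SocketLike, opts: GuardOptions): () => void {
  let done = false; // true once this socket was torn down or the guard disposed

  function teardown(status: "error" | "closed"): void {
    if (done) return;
    done = true;
    clearTimeout(timer);
    if (!opts.isCurrent()) return; // a newer socket took over; leave it alone
    opts.clearCurrent(); // clear refs first so a late close event is ignored
    try {
      socket.close();
    } catch (_err) {
      // closing an already-dead socket is not interesting here
    }
    opts.onStatus(status);
  }

  // Force-close a socket that never leaves CONNECTING.
  const timer = setTimeout(() => {
    if (socket.readyState !== WS_OPEN) teardown("error");
  }, opts.timeoutMs);

  socket.addEventListener("open", () => {
    clearTimeout(timer);
    if (!done && opts.isCurrent()) opts.onStatus("open");
  });
  // Do not wait for a clean close event after an error.
  socket.addEventListener("error", () => teardown("error"));
  socket.addEventListener("close", () => teardown("closed"));

  return () => {
    done = true;
    clearTimeout(timer);
  };
}
```

Because refs are cleared before `close()` is called and `done` is latched, the `close` event that follows an error or timeout does not emit a second status, and reconnect or manual retry can start from a clean state.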

xterm/addon notes

Current addons:

@xterm/addon-fit

Not currently using:

@xterm/addon-attach
@xterm/addon-web-links
@xterm/addon-search
@xterm/addon-unicode11
@xterm/addon-serialize

Do not add addon-attach as an automatic fix. The custom WebSocket layer is intentional because it handles auth, reconnect, buffering, stale session handling, and Portal status callbacks.

Launch expectation

This is a targeted Portal terminal reliability fix. Do not rewrite the Agent or switch languages for this issue.

Reference: jester/zlh-grind#12