🤖 fix: remote mux server gap fixes and cleanup#2233
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a17f98c6cf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Remote workspaces can have paths that refer to a different machine. Avoid generating local editor deep links unless the workspace is an SSH runtime. --- _Generated with `mux` • Model: `openai:gpt-5.2` • Thinking: `xhigh` • Cost: ``_ <!-- mux-attribution: model=openai:gpt-5.2 thinking=xhigh costs=unknown -->
The federation middleware intercepts ALL routes whose input contains a remote-encoded ID key and proxies them upstream, never calling next() for remote IDs. Therefore, the explicit resolveRemoteWorkspaceProxy() early-return blocks in individual handlers were unreachable dead code. Removed: - All 16 resolveRemoteWorkspaceProxy call sites and their surrounding if-blocks from handler functions - Unused imports: resolveRemoteWorkspaceProxy, rewriteRemoteFrontendWorkspaceMetadataIds, rewriteRemoteTaskToolPartsInMessage, rewriteRemoteWorkspaceChatMessageIds, decodeRemoteWorkspaceId Not touched: - The 4 multi-server aggregation handlers (workspace.list, workspace.onMetadata, workspace.activity.list, workspace.activity.subscribe) which fan out to ALL servers - federationMiddleware.ts and remoteMuxProxying.ts
The UI correctly hides remote features when the experiment is disabled, but the backend unconditionally fetched from all configured remote servers. This adds early-return guards in two locations: 1. getRemoteServersForWorkspaceViews: returns [] when experiment disabled, preventing fan-out to remote servers for workspace list views. 2. createFederationMiddleware: falls through to options.next() when experiment disabled, preventing remote-encoded ID interception.
Add optional timeoutMs parameter to createRemoteClient that applies AbortSignal.timeout() to each fetch call. When an existing signal is present, the two are composed via AbortSignal.any() so either caller cancellation or the deadline can abort the request. All non-streaming call sites (federation middleware, workspace list, workspace create, activity list) now pass timeoutMs: 30_000 (30s). Streaming subscriptions (onMetadata, onChat, activity.subscribe) are deliberately excluded — they have their own stall detection.
When pumpRemoteMetadata's connection drops and the retry loop reconnects, previously-visible workspaces were never removed from the frontend. Emit metadata: null for each stale workspace ID and clear the visibleWorkspaceIds set at the top of every loop iteration. On the first iteration the set is empty (no-op); on retries it flushes stale entries so the fresh stream can re-add any that still exist.
Before this change, removing a remote server while it still had project mappings could orphan workspace IDs in the frontend metadata cache. Any subsequent operation on those remote-encoded workspace IDs would hit the assert(server, ...) in the federation middleware and crash the app. Add a guard in the remoteServers.remove handler that checks whether the server has project mappings before allowing removal. Project mappings are the mechanism that causes remote workspaces to appear in the UI via onMetadata subscriptions — if they exist, the server may still be actively referenced. When blocked, the handler returns Err() with a descriptive message telling the user to remove project mappings first.
The federation middleware assumed every batch request targets a single remote server. When getSessionUsageBatch receives an array of workspace IDs that mix local and multi-server remote IDs, the middleware would send everything to one remote — local IDs got lost and IDs spanning multiple servers only reached one. Fix: 1. Skip the federation middleware for this route (it handles its own remote splitting). 2. In the handler, partition workspace IDs into local vs per-remote- server buckets using decodeRemoteWorkspaceId, fan out in parallel, re-key remote results with encodeRemoteWorkspaceId, and merge.
The federation middleware already rewrites metadata fields via duck-typing in rewriteFederationOutputValue, but workspace.fork returns a top-level projectPath that is a remote filesystem path. This adds a path-specific rewrite after the generic rewriter to map it to the local equivalent using the existing remoteProjectPathMap.
Terminal operations (close, resize, sendInput, onOutput, attach, onExit) use sessionId as their only input ID field, not workspaceId. Since sessionId was not in FEDERATION_ID_KEYS, the federation middleware did not intercept these calls, causing them to fail on the local server after terminal.create was proxied via workspaceId. Changes: - Add sessionId to FEDERATION_ID_KEYS so the middleware decodes remote-encoded sessionId in terminal route inputs and proxies them to the correct remote server. - Skip local-only Electron terminal routes (openWindow, closeWindow, openNative) from federation — these manage local windows and must not be called on remote servers. - Add output VALUE re-encoding in rewriteFederationOutputValue: string values under FEDERATION_ID_KEYS (e.g. sessionId from terminal.create) are re-encoded so the frontend can round-trip them back through federation. Already-encoded values are skipped via isRemoteWorkspaceId. - Add path-specific rewriting for terminal.listSessions, which returns a bare string[] of session IDs that the generic object-key rewriter does not touch.
BranchSelector and GitStatusIndicator run local git commands that silently fail for remote workspaces (no local repo). Hide them when the workspace is remote using the existing remoteWorkspaceInfo variable (already computed via decodeRemoteWorkspaceId). - WorkspaceHeader: wrap BranchSelector + GitStatusIndicator div - WorkspaceListItem: add !remoteWorkspaceInfo to existing conditional
… to tests/ipc/setup.ts - Extract the copy-pasted buildOrpcContext(env: TestEnvironment): ORPCContext helper from archiveRemoteWorkspace.test.ts and agentSkillsRemoteWorkspace.test.ts into tests/ipc/setup.ts as a shared export. - Add enableExperimentForTesting() helper that overrides ExperimentsService.isExperimentEnabled() on a TestEnvironment instance. In tests, PostHog is unavailable so experiments always return false. This helper allows federation tests to enable REMOTE_MUX_SERVERS. - Fix existing remote workspace tests (archiveRemoteWorkspace, agentSkillsRemoteWorkspace) that were broken by a prior commit adding experiment gates to the federation middleware. Both now call enableExperimentForTesting(localEnv, EXPERIMENT_IDS.REMOTE_MUX_SERVERS).
…perations Add two new test files covering remote workspace proxy paths: sendMessageRemoteWorkspace.test.ts: - workspace.sendMessage proxies remote-encoded workspaceIds through federation - Verifies user message is persisted on remote workspace's chat history - workspace.interruptStream proxies remote-encoded workspaceIds terminalRemoteWorkspace.test.ts: - terminal.listSessions proxies remote workspaceIds (no PTY required) - terminal.create returns remote-encoded sessionIds (integration only) - terminal.sendInput/close proxy remote-encoded sessionIds (integration only) PTY-dependent tests are gated behind TEST_INTEGRATION=1 because node-pty throws async ESPIPE errors in bun's non-integration test runner environment. The listSessions test runs unconditionally since it doesn't spawn PTY processes.
The experiment gates in federationMiddleware.ts and remoteMuxProxying.ts only checked backend experiment state (PostHog). Users can enable the experiment via UI override (stored in localStorage), but that override is never propagated to the backend experimentsService. In environments without PostHog (telemetry disabled, tests), the experiment is always disabled on the backend even when enabled via Settings. The UI-level gate is sufficient: if the experiment is off, the Settings page doesn't show Remote Servers, so no remote server configs can be created. Without configs, getRemoteServersForWorkspaceViews() already returns [] and the federation middleware has nothing to proxy.
Normalize metadata.projectPath with stripTrailingSlashes before looking it up in remoteProjectPathMap in both workspace.activity.list and workspace.activity.subscribe handlers. Without this, a remote server returning a project path with trailing slashes would cause the lookup to miss, silently excluding that workspace's activity from results.
a17f98c to
34856d4
Compare
|
@codex review Addressed both review comments:
Also rebased onto latest |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 34856d4934
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ervers Add a layered experiment check to getRemoteServersForWorkspaceViews() and createFederationMiddleware() that prevents remote network traffic when the experiment is disabled: 1. Config override "off" → force-disable (admin kill-switch) 2. Config override "on" → force-enable (self-hosted without PostHog) 3. Config override "default" → defer to experimentsService (PostHog) This ensures users with stale remoteServers config entries don't trigger remote traffic when the experiment is off, while still allowing self-hosted deployments to opt in via featureFlagOverrides.
During a long-lived remote activity subscription, the allowedWorkspaceIds set was built once per reconnect attempt. Workspaces created after the subscription started were permanently excluded from the activity stream. Add a 30-second periodic refresh inside the for-await loop that re-fetches workspace lists and adds any new IDs to the allowlist. The refresh is best-effort (wrapped in try-catch) and only adds to the set, never removes, so existing entries are never disrupted.
|
@codex review Addressed both round 2 comments:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0fcfb71525
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const client = createRemoteClient<AnyOrpcClient>({ | ||
| baseUrl: server.baseUrl, | ||
| authToken, | ||
| timeoutMs: 30_000, |
There was a problem hiding this comment.
Remove hard timeout from federated streaming requests
This middleware creates the remote client with timeoutMs: 30_000 for every proxied route, including long-lived subscriptions such as workspace.onChat, workspace.onMetadata, workspace.activity.subscribe, and terminal streams. Because AbortSignal.timeout() is wall-clock based, healthy remote streams are forcibly aborted at 30s, which can terminate active chat/terminal sessions and lose in-flight streamed output. Apply the timeout only to non-streaming procedures (or bypass it for async-iterable routes).
Useful? React with 👍 / 👎.
| } | ||
|
|
||
| if (typeof next.sectionId === "string") { | ||
| next.sectionId = encodeRemoteIdBestEffort(serverId, next.sectionId); |
There was a problem hiding this comment.
Preserve sectionId when rewriting remote workspace metadata
Encoding sectionId as a remote ID breaks section matching on the frontend, which compares workspace.sectionId to plain section IDs from project sections (sectionIds.has(workspace.sectionId)). After this rewrite, remote workspaces no longer match their actual section IDs and are treated as unsectioned, causing incorrect grouping/placement behavior in the sidebar. sectionId should remain a section identifier, not be rewritten as a remote workspace/task ID.
Useful? React with 👍 / 👎.
Summary
Comprehensive gap fixes and cleanup for the remote mux server feature branch. Addresses 12 issues identified during deep review of the ~5600 LoC remote server implementation, spanning federation middleware fixes, dead code removal, safety guards, and integration tests.
Background
The remote server architecture uses two proxy mechanisms: (1) a federation middleware that intercepts routes with remote-encoded IDs, and (2) explicit per-handler proxy blocks. The review identified that the per-handler blocks are dead code (federation runs first), terminal sessions aren't federated, mixed local/remote ID batches break, and several edge cases need hardening.
Implementation
Dead code removal (−194 LoC)
resolveRemoteWorkspaceProxycall sites from router.ts — unreachable since the federation middleware intercepts firstFederation fixes
sessionIdtoFEDERATION_ID_KEYS, output value re-encoding for ID fields, path-specific handling forterminal.listSessions, and skipping local-only routes (openWindow,closeWindow,openNative)getSessionUsageBatchnow splits mixed local/remote IDs across servers with proper result mergingworkspace.forktop-levelprojectPathnow mapped from remote to local viaremoteProjectPathMapREMOTE_MUX_SERVERSexperiment flag (was only checked in UI)Safety & robustness
pumpRemoteMetadataemitsmetadata: nullfor stale workspaces on reconnectAbortSignal.timeout)editConfignow usesAsyncMutexto prevent concurrent read-modify-write racesUI guards
BranchSelectorandGitStatusIndicatorhidden for remote workspaces (no local git repo)Tests
buildOrpcContexthelper totests/ipc/setup.tsenableExperimentForTestinghelper (fixes existing remote tests broken by experiment gating)sendMessage,interruptStream,terminal.listSessions,terminal.create,terminal.sendInput,terminal.closefederationValidation
make static-checkpasses (typecheck, lint, fmt, docs links, shellcheck)Risks
FEDERATION_ID_KEYSwith a non-encoded string value gets re-encoded. This is correct behavior but could interact with future FEDERATION_ID_KEYS additions.📋 Implementation Plan
Remote Mux Server: Gap Fixes & Cleanup
Context
This branch (~5600 LoC, 42 files) adds remote mux server support: a desktop Electron app connects to a
mux serverinstance, proxies workspace operations, and displays remote workspaces alongside local ones. The architecture isFrontend ↔ Local mux ↔ Remote mux serverwith two proxy mechanisms:src/node/orpc/federationMiddleware.ts) — catch-all.use()on the router that intercepts any route whose input contains a remote-encoded workspace/task ID, proxies it to the remote server, and rewrites output IDs back.resolveRemoteWorkspaceProxy— explicit early-return blocks in ~15 handlers that do the same thing (and are now dead code since federation runs first).A deep review identified several gaps. This plan addresses them in priority order. The plan preserves the existing
assert()for missing servers (crash-on-invariant-violation philosophy) but ensures the invariant is enforced upstream.Estimated net LoC: +350 / −250 (product code), +400 (tests)
1. Remove dead per-handler proxy blocks (cleanup, ~−200 LoC)
The federation middleware already intercepts all routes with
workspaceId/taskIdin input and never callsnext()— so the explicitresolveRemoteWorkspaceProxyblocks in handlers are unreachable dead code.Delete the early-return proxy blocks from these 15 handlers in
src/node/orpc/router.ts:agents.listagents.getagentSkills.listagentSkills.listDiagnosticsagentSkills.getworkspace.archiveworkspace.unarchiveworkspace.sendMessageworkspace.answerAskUserQuestionworkspace.resumeStreamworkspace.interruptStreamworkspace.getInfoworkspace.getFullReplayworkspace.getSubagentTranscriptworkspace.onChatThen clean up imports from
router.ts:resolveRemoteWorkspaceProxyimport (only used in deleted blocks)rewriteRemoteWorkspaceChatMessageIdsimport (still used infederationMiddleware.ts)rewriteRemoteFrontendWorkspaceMetadataIdsimport (still used infederationMiddleware.ts)rewriteRemoteTaskToolPartsInMessageimport (still used infederationMiddleware.ts)Keep the 4 multi-server aggregation handlers (
workspace.list,workspace.onMetadata,workspace.activity.list,workspace.activity.subscribe) — these fan out to ALL servers and can't use federation.Validation: All existing remote workspace tests must still pass. The federation middleware tests (
src/node/orpc/federationMiddleware.test.ts) are the source of truth.2. Fix terminal sessions for remote workspaces (~+120 LoC)
Terminal operations
close,resize,sendInput,onOutput,attach,onExitusesessionId(notworkspaceId), so the federation middleware doesn't intercept them. Afterterminal.createis proxied to the remote, subsequent operations fail on the local server.Approach: Encode server ID into session IDs
Use the same
remote.{b64server}.{b64id}encoding for session IDs. The codec inremoteMuxIds.tsis generic enough to reuse.Step 1: Add
"sessionId"toFEDERATION_ID_KEYSIn
src/node/orpc/federationMiddleware.tsL26–37:Step 2: Encode
sessionIdinterminal.createresponse rewritingterminal.createreturns{ sessionId: string }. The federation middleware's generic output rewriter needs to re-encode thesessionIdfrom the remote response. Add a federation output rewrite forsessionIdfields — either:"sessionId"to the set of keys thatrewriteFederationOutputValuere-encodes (in the generic object walker atfederationMiddleware.tsL289–365), OR["terminal", "create"]that encodes the returnedsessionId.The generic approach is cleaner: in
rewriteFederationOutputValue, when walking object keys, if a key is inFEDERATION_ID_KEYSand the value is a string, re-encode it asencodeRemoteWorkspaceId(serverId, value). This already partially happens for record keys (L340–352); extend it to also rewrite values under known ID keys.Step 3: Handle
terminal.listSessionsoutputterminal.listSessions(input:workspaceId) returns{ sessions: Array<{ id: string, ... }> }. The federation middleware proxies it, but the returned sessionidvalues need encoding. Ensure the generic output rewriter handles nestedidfields inside sessions — or add a path-specific rewrite.Validation: Write a test in
tests/ipc/that:terminal.createwith the encoded workspace IDsessionIdis remote-encodedterminal.sendInputwith that encoded session IDterminal.onOutputwith that encoded session IDterminal.closewith that encoded session ID3. Gate backend on
REMOTE_MUX_SERVERSexperiment (~+10 LoC)The UI correctly hides remote features when the experiment is disabled, but the backend unconditionally fetches from all configured remote servers.
File:
src/node/orpc/remoteMuxProxying.ts—getRemoteServersForWorkspaceViews()at L155–158:File:
src/node/orpc/federationMiddleware.ts—createFederationMiddleware()at L404+:Validation: With experiment disabled,
workspace.listshould return only local workspaces. Sending a remote-encoded workspace ID should fall through to the local handler (which will fail to find it — correct behavior).4. Fix
getSessionUsageBatchmixed ID routing (~+40 LoC)Input is
{ workspaceIds: string[] }. If the array mixes local and remote IDs, the federation middleware sends everything to one remote server — local IDs get lost.Fix: Add explicit handler-level splitting in
src/node/orpc/router.tsat L3662–3667.Replace the simple passthrough with:
Also add this route to the federation middleware skip list — the federation middleware should not intercept
getSessionUsageBatchsince the handler handles splitting itself. Add tofederationMiddleware.ts:5. Prevent server removal with orphaned workspaces (~+20 LoC)
When a server is removed, the
assert(server, ...)inresolveRemoteWorkspaceProxy(and the federation middleware) will crash on any operation targeting orphaned workspaces. Enforce the invariant upstream.File:
src/node/orpc/router.ts— theremoteServers.removehandler (find it near L600+):Before calling
context.remoteServersService.remove(), check for active workspaces:If the
removeendpoint currently returnsvoid, update the schema to returnResult<void>so it can communicate failure.6. Add timeout to remote oRPC client (~+15 LoC)
src/node/remote/remoteOrpcClient.tshas no request timeout. A hanging remote server blocks the request indefinitely.Add a
timeoutMsoption (default 30s for normal calls, no timeout for streaming):Then at call sites in
federationMiddleware.tsandrouter.ts, passtimeoutMs: 30_000for non-streaming calls. For streaming subscriptions (onMetadata,onChat,activity.subscribe), omit it (they have their own stall detection).7. Fix stale workspaces after remote disconnect/reconnect (~+15 LoC)
In
src/node/orpc/router.ts, thepumpRemoteMetadatafunction (insideworkspace.onMetadatahandler, ~L3124–3207) reconnects on failure but doesn't clear stale workspace entries.Add cleanup at reconnect point (~L3131, before creating new client):
8. Add
editConfigmutex (~+5 LoC)src/node/config.tsL455–462:editConfigis read-modify-write without locking. Concurrent async calls can lose updates.9. Disable UI controls for remote workspaces (~+30 LoC)
Several UI controls silently fail or reference non-existent local paths for remote workspaces.
9a. Terminal button —
src/browser/components/WorkspaceHeader.tsxAfter terminal sessions are fixed (item 2), the terminal button should work for remote workspaces. No UI guard needed if item 2 is done first. If item 2 is deferred, add a disabled state:
9b. BranchSelector & GitStatusIndicator —
src/browser/components/WorkspaceHeader.tsxThese run local git commands against
namedWorkspacePath. For remote workspaces, the path is rewritten but may not have a local git repo. Guard them:Same pattern in
WorkspaceListItem.tsxL514–520 for the GitStatusIndicator.9c. Explorer tab —
src/browser/components/RightSidebar/ExplorerTab.tsxThe file browser runs bash scripts via API. For remote workspaces, these would proxy to the remote (which is correct behavior). No guard needed — the federation middleware will proxy
executeBashto the remote server. The explorer should work for remote workspaces.However, verify that
workspacePathis correctly mapped for remote workspaces. IfExplorerTabreceives a local path that doesn't exist, it'll show an empty tree. Check that the path mapping fromrewriteRemoteFrontendWorkspaceMetadataForLocalProjectprovides a usable path.10. Handle
workspace.forkfor remote workspaces (~+10 LoC)workspace.forkinput hassourceWorkspaceId(inFEDERATION_ID_KEYS) and returns{ metadata, projectPath }. The federation middleware already:sourceWorkspaceIdin input ✅metadataviarewriteRemoteFrontendMetadataBestEffort(which includes project path mapping) ✅metadata.id✅The gap: The top-level
projectPathin the response is a remote path and isn't rewritten.Fix in
src/node/orpc/federationMiddleware.ts— add a path-specific rewrite:Where
mapRemoteProjectPathToLocaluses the server'sprojectMappingsto find the corresponding local path (similar to whatrewriteRemoteFrontendWorkspaceMetadataForLocalProjectdoes for metadata).11. Log warning for auth tokens over HTTP (~+5 LoC)
File:
src/node/services/remoteServersService.ts— inupsert()(~L154–180):12. Write integration tests for core proxy paths (~+400 LoC)
Using the pattern from
tests/ipc/archiveRemoteWorkspace.test.ts(two real test environments, one acting as remote), add tests for:File:
tests/ipc/sendMessageRemoteWorkspace.test.tsonChatvia local → verify events arrive with encoded IDsinterruptStreamvia local → verify it reaches remoteFile:
tests/ipc/terminalRemoteWorkspace.test.ts(after item 2)terminal.create→ verify encoded session ID returnedterminal.sendInputwith encoded session ID → verify proxiedterminal.onOutputsubscription → verify output arrivesterminal.close→ verify cleanupFile:
tests/ipc/resumeStreamRemoteWorkspace.test.tsresumeStreamvia localresumeStreamwhile remote is already streaming → verify no error, no duplicateExtract shared helper: The
buildOrpcContext()function is duplicated acrossagentSkillsRemoteWorkspace.test.ts,archiveRemoteWorkspace.test.ts, and others. Extract totests/ipc/helpers.ts.Implementation Order
Execute in this order to minimize conflicts and enable incremental validation:
Items 1–7 can be done in a single commit/PR. Items 8–12 could be a second pass.
Generated with
mux• Model:anthropic:claude-opus-4-6• Thinking:xhigh• Cost:$330.15