

github-actions bot commented Feb 3, 2026

Cherry-picked from #60317

…#60317)

Root cause:
1. On the same host, the old BE (e.g. heartbeat port 9050) and the new BE (e.g. heartbeat port 9051) run in parallel during a smooth upgrade.
2. migrateTablets switches the CloudReplica primary from the old BE to the new BE. At that point the new BE may not yet have registered or sent heartbeats, so FE does not consider it alive.
3. updateClusterToPrimaryBe(clusterId, dstBe) also clears the secondary for that cluster, so no fallback BE remains.
4. On query: the primary (new BE) is not alive -> getSecondaryBackend() returns null -> the code falls back to hashReplicaToBe().
5. hashReplicaToBe() only considers BEs with be.isQueryAvailable() && !be.isSmoothUpgradeSrc(). The old BE is excluded because it is the smooth-upgrade source; the new BE is excluded because it is not alive. Result: no BE is available, and the query fails with COMPUTE_GROUPS_NO_ALIVE_BE.
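The failure path in steps 4 and 5 can be reproduced with a small Java model. This is a simplified sketch, not the Doris FE code: the Backend record, pickBackend, and the modulo-by-tablet-id hashing are hypothetical stand-ins for the real backend, getSecondaryBackend() and hashReplicaToBe() logic, and isQueryAvailable() is reduced to a plain liveness check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of FE-side backend selection for one replica.
// Names and logic are illustrative assumptions, not the actual Doris classes.
public class ReplicaSelectionSketch {

    // A backend as seen by FE; "alive" only after it has registered and heartbeated.
    public record Backend(long id, int heartbeatPort, boolean alive, boolean smoothUpgradeSrc) {
        // Simplified assumption: query availability is just liveness here.
        public boolean isQueryAvailable() {
            return alive;
        }
    }

    // Try the primary, then the secondary, then a hash-based fallback.
    public static Optional<Backend> pickBackend(Backend primary, Backend secondary,
                                                List<Backend> clusterBackends, long tabletId) {
        if (primary != null && primary.isQueryAvailable()) {
            return Optional.of(primary);
        }
        if (secondary != null && secondary.isQueryAvailable()) {
            return Optional.of(secondary);
        }
        // Stand-in for hashReplicaToBe(): skip dead BEs and smooth-upgrade sources.
        List<Backend> candidates = new ArrayList<>();
        for (Backend be : clusterBackends) {
            if (be.isQueryAvailable() && !be.smoothUpgradeSrc()) {
                candidates.add(be);
            }
        }
        if (candidates.isEmpty()) {
            return Optional.empty(); // corresponds to COMPUTE_GROUPS_NO_ALIVE_BE
        }
        int idx = (int) Math.floorMod(tabletId, (long) candidates.size());
        return Optional.of(candidates.get(idx));
    }

    public static void main(String[] args) {
        Backend oldBe = new Backend(1, 9050, true, true);   // alive, but the upgrade source
        Backend newBe = new Backend(2, 9051, false, false); // not yet heartbeated
        // State after migrateTablets without the fix: primary = new BE, secondary cleared.
        Optional<Backend> chosen = pickBackend(newBe, null, List.of(oldBe, newBe), 10001L);
        System.out.println(chosen.isPresent() ? "BE " + chosen.get().id() : "COMPUTE_GROUPS_NO_ALIVE_BE");
    }
}
```

Running main prints COMPUTE_GROUPS_NO_ALIVE_BE. Passing oldBe as the secondary instead of null makes pickBackend return the old BE, since the secondary check in this model only tests isQueryAvailable().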

Fix:
- After switching the primary to the new BE in migrateTablets, set the old BE (srcBe) as the secondary for that cluster.
- Expose updateClusterToSecondaryBe in CloudReplica so the rebalancer can set this fallback.
- Queries then use the primary when possible and fall back to the secondary (old BE) when the primary is not alive. Because isQueryAvailable() does not exclude the old BE via isSmoothUpgradeSrc(), the old BE can still serve reads until the new BE becomes alive.
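The bookkeeping change in migrateTablets can be sketched as follows. The method names updateClusterToPrimaryBe and updateClusterToSecondaryBe come from the description; everything else (the ReplicaRouting class, string cluster ids, long BE ids, map-based storage, and the migrate helper) is a hypothetical simplification, not the actual CloudReplica implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-cluster primary/secondary bookkeeping for one replica.
// Method names mirror the description; bodies are simplified assumptions.
public class MigrateTabletsSketch {

    public static class ReplicaRouting {
        private final Map<String, Long> primaryBe = new HashMap<>();
        private final Map<String, Long> secondaryBe = new HashMap<>();

        // Switching the primary also clears the secondary (step 3 of the root cause).
        public void updateClusterToPrimaryBe(String clusterId, long beId) {
            primaryBe.put(clusterId, beId);
            secondaryBe.remove(clusterId);
        }

        // Exposed so the rebalancer can record a fallback BE.
        public void updateClusterToSecondaryBe(String clusterId, long beId) {
            secondaryBe.put(clusterId, beId);
        }

        public Long primary(String clusterId) { return primaryBe.get(clusterId); }

        public Long secondary(String clusterId) { return secondaryBe.get(clusterId); }
    }

    // Simplified migrateTablets step for a single replica with the fix applied:
    // switch the primary first, then keep the source BE as the secondary.
    public static void migrate(ReplicaRouting replica, String clusterId, long srcBe, long dstBe) {
        replica.updateClusterToPrimaryBe(clusterId, dstBe);
        replica.updateClusterToSecondaryBe(clusterId, srcBe);
    }

    public static void main(String[] args) {
        ReplicaRouting replica = new ReplicaRouting();
        replica.updateClusterToPrimaryBe("cg1", 1L); // replica initially on the old BE
        migrate(replica, "cg1", 1L, 2L);             // move it to the new BE
        System.out.println("primary=" + replica.primary("cg1") + ", secondary=" + replica.secondary("cg1"));
    }
}
```

Running main prints primary=2, secondary=1. The order of the two calls matters in this sketch: because updateClusterToPrimaryBe clears the secondary, setting the secondary before the primary switch would wipe the fallback immediately.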
github-actions bot requested a review from yiguolei as a code owner February 3, 2026 08:36

yiguolei commented Feb 4, 2026

run buildall

yiguolei closed this Feb 4, 2026
yiguolei reopened this Feb 4, 2026
github-actions bot added the approved label Feb 4, 2026

github-actions bot commented Feb 4, 2026

PR approved by at least one committer and no changes requested.


github-actions bot commented Feb 4, 2026

PR approved by anyone and no changes requested.

yiguolei merged commit 8ef32a7 into branch-4.0 Feb 4, 2026
26 of 29 checks passed
github-actions bot deleted the auto-pick-60317-branch-4.0 branch February 4, 2026 09:05

Labels

approved (Indicates a PR has been approved by one committer.), reviewed


2 participants