@becomeStar commented Feb 7, 2026

This PR fixes a race condition in RetriableStream where, under certain retry and deadline timings, the response future may never be completed.

When a deadline cancellation occurs concurrently with retry commit, inFlightSubStreams may not be decremented, causing ClientCallImpl.ClientStreamListenerImpl.closed to never be invoked. As a result, blockingUnaryCall can hang indefinitely.

After this change, the inFlightSubStreams counting is consistent whenever a scheduled retry is committed, ensuring the close signal is always delivered.

I verified this with the reproduction code from the issue reporter. Before this change, blockingUnaryCall hung and the reproduction eventually hit a TimeoutException because a while loop never progressed. After this change, the same reproduction code no longer hangs and continues as expected without timing out.

Fixes #12620

Under certain retry and deadline timings, RetriableStream could
leave the response future incomplete because inFlightSubStreams
was not decremented consistently during commit.

The change ensures inFlightSubStreams is decremented whenever a
scheduled retry is committed, restoring correct close signaling
without altering retry semantics.
    final Future<?> retryFuture;
    final boolean retryWasScheduled = scheduledRetry != null;
    if (scheduledRetry != null) {
      retryFuture = scheduledRetry.markCancelled();
Contributor Author

markCancelled() may return null if the scheduled retry was cancelled before setFuture() was called.

Contributor

Yes. When the hang occurs, I see that markCancelled() has returned null, which causes the code in CommitTask to skip closing the master listener.
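
The interaction described in this thread can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual RetriableStream code: the names RetryCommitSketch, ScheduledRetryHolder, commitReachesZero, and decrementOnScheduledFlag are hypothetical, and the counter stands in for inFlightSubStreams. It shows why keying the decrement on "a retry was scheduled" rather than on "markCancelled() returned a non-null future" lets the count reach zero even when setFuture() has not run yet.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified, hypothetical model of the commit path; not the real gRPC classes. */
final class RetryCommitSketch {

  /** Holds the future of a scheduled retry (assumed shape, for illustration only). */
  static final class ScheduledRetryHolder {
    private final Object lock = new Object();
    private Future<?> future;
    private boolean cancelled;

    void setFuture(Future<?> f) {
      synchronized (lock) {
        if (!cancelled) {
          future = f;
        }
      }
    }

    /** Returns the future to cancel, or null if setFuture() has not been called yet. */
    Future<?> markCancelled() {
      synchronized (lock) {
        cancelled = true;
        return future;
      }
    }
  }

  /**
   * Simulates a commit while one retry is scheduled and reports whether the
   * in-flight count reaches zero (the condition for delivering the close signal).
   *
   * @param futureSetBeforeCommit whether setFuture() ran before the commit
   * @param decrementOnScheduledFlag true: decrement whenever a retry was scheduled;
   *     false: decrement only if markCancelled() returned a non-null future
   */
  static boolean commitReachesZero(
      boolean futureSetBeforeCommit, boolean decrementOnScheduledFlag) {
    AtomicInteger inFlight = new AtomicInteger(1); // the scheduled retry counts as one
    ScheduledRetryHolder scheduledRetry = new ScheduledRetryHolder();
    if (futureSetBeforeCommit) {
      scheduledRetry.setFuture(new CompletableFuture<Void>());
    }

    final boolean retryWasScheduled = true; // a holder exists in this scenario
    final Future<?> retryFuture = scheduledRetry.markCancelled();

    if (retryFuture != null) {
      retryFuture.cancel(false);
    }
    boolean shouldDecrement =
        decrementOnScheduledFlag ? retryWasScheduled : retryFuture != null;
    if (shouldDecrement) {
      inFlight.decrementAndGet();
    }
    return inFlight.get() == 0;
  }
}
```

In this model, commitReachesZero(false, false) leaves the count at one, which corresponds to the hang, while commitReachesZero(false, true) brings it to zero.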

@kannanjgithub

/gcbrun

@kannanjgithub added the kokoro:run label Feb 9, 2026
@grpc-kokoro removed the kokoro:run label Feb 9, 2026
@kannanjgithub added and removed the kokoro:run label Feb 9, 2026
@kannanjgithub merged commit 73abb48 into grpc:master Feb 9, 2026
17 checks passed
Successfully merging this pull request may close these issues.

Java gRPC client can stuck forever if retries are enabled
