HDDS-14724. Fix infinite CPU spin loop in ECBlockInputStream #9833

Open
stuxuhai wants to merge 1 commit into apache:master from stuxuhai:HDDS-14724

@stuxuhai
Contributor

What changes were proposed in this pull request?

This PR fixes a bug in ECBlockInputStream that causes an infinite spin loop at 100% CPU during transient network unavailability or DataNode unreadiness (e.g., when the underlying NIO channel returns 0 bytes).

Currently, ECBlockInputStream#readFromStream only checks if actualRead == -1 (EOF). If 0 bytes are returned while expectedRead > 0, the stream fails to advance its position but remains in the while loop, causing an infinite CPU spin and thread starvation.

Proposed Solution:
Instead of introducing complex timeouts or backoff loops, this patch aligns the EC read path with the traditional replica read path (BlockInputStream#readWithStrategy).
By strictly checking for actualRead != expectedRead (or explicitly intercepting a 0-byte read), readFromStream throws an IOException immediately on an inconsistent read.
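
A minimal sketch of the strict check, for illustration only. It is not the code in this patch: `ZeroReadGuardSketch` and `ChunkSource` are hypothetical stand-ins for the real stream types, and the sketch assumes the strict `actualRead != expectedRead` variant rather than a dedicated 0-byte check.

```java
import java.io.IOException;

public class ZeroReadGuardSketch {

  /** Hypothetical stand-in for the underlying block stream (not an Ozone type). */
  public interface ChunkSource {
    /** Returns the number of bytes read, 0 if nothing was available, or -1 at EOF. */
    int read(byte[] buf, int off, int len) throws IOException;
  }

  /**
   * Reads exactly expectedRead bytes or throws. A short read, including a
   * 0-byte read, is reported as an error instead of being retried, so the
   * caller's loop can never iterate without making progress.
   */
  public static int readFromStream(ChunkSource source, byte[] buf, int off,
      int expectedRead) throws IOException {
    int actualRead = source.read(buf, off, expectedRead);
    if (actualRead == -1) {
      throw new IOException("Unexpected EOF: expected " + expectedRead + " bytes");
    }
    if (actualRead != expectedRead) {
      throw new IOException("Inconsistent read: expected " + expectedRead
          + " bytes, got " + actualRead);
    }
    return actualRead;
  }

  public static void main(String[] args) {
    // Simulates a channel that keeps returning 0 bytes.
    ChunkSource stalled = (buf, off, len) -> 0;
    try {
      readFromStream(stalled, new byte[16], 0, 16);
      System.out.println("unexpected: read succeeded");
    } catch (IOException e) {
      System.out.println("read failed fast: " + e.getMessage());
    }
  }
}
```

With the original EOF-only check, the stalled source above would return 0 on every call and the enclosing loop would never advance; with the strict check, the first call throws.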

This naturally integrates with the existing Ozone client architecture:

  1. ECBlockInputStream throws IOException.
  2. readWithStrategy wraps it into BadDataLocationException.
  3. ECBlockInputStreamProxy catches it and fails over via failoverToReconstructionRead.
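
The three steps above can be sketched roughly as follows. This is a simplified model, not the Ozone classes themselves: `EcFailoverSketch`, `ReadPath`, `proxyRead`, and the local `BadDataLocationException` are hypothetical stand-ins that only mirror the shape of the exception flow described in this PR.

```java
import java.io.IOException;

public class EcFailoverSketch {

  /** Hypothetical stand-in for Ozone's BadDataLocationException. */
  public static class BadDataLocationException extends IOException {
    public BadDataLocationException(String msg, Throwable cause) {
      super(msg, cause);
    }
  }

  /** Hypothetical abstraction over one read path; not an Ozone interface. */
  public interface ReadPath {
    String read() throws IOException;
  }

  /** Step 2: the direct EC read path wraps any IOException from the stream. */
  public static String readWithStrategy(ReadPath directRead)
      throws BadDataLocationException {
    try {
      return directRead.read();
    } catch (IOException e) {
      throw new BadDataLocationException("direct EC read failed", e);
    }
  }

  /** Step 3: the proxy tries the direct path, then fails over to reconstruction. */
  public static String proxyRead(ReadPath directRead, ReadPath reconstructionRead)
      throws IOException {
    try {
      return readWithStrategy(directRead);
    } catch (BadDataLocationException e) {
      return reconstructionRead.read();
    }
  }

  public static void main(String[] args) throws IOException {
    // Step 1: the stream throws instead of spinning on a 0-byte read.
    ReadPath stalled = () -> {
      throw new IOException("Inconsistent read: expected 16 bytes, got 0");
    };
    ReadPath reconstruction = () -> "data recovered via reconstruction";
    System.out.println(proxyRead(stalled, reconstruction));
  }
}
```

The point of the design is that the fix only has to surface an IOException; the wrapping and the failover already exist in the callers.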

This minimal change eliminates the spin loop and relies on the client's existing reconstruction/failover mechanisms, without modifying the Proxy class.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14724

How was this patch tested?

Added a JUnit test, testZeroByteReadTriggersFailoverException, to verify that a 0-byte read in ECBlockInputStream throws an IOException (which translates to BadDataLocationException) and so breaks out of the loop immediately.

@adoroszlai adoroszlai requested a review from sodonnel February 26, 2026 10:50