HDDS-14652. Handle NoSuchFileException during the bootstrap tarball transfer. #9784

Merged
aswinshakil merged 12 commits into apache:master from sadanand48:HDDS-14652
Feb 26, 2026

@sadanand48
Contributor

What changes were proposed in this pull request?

While the contents of the SST backup directory are being transferred, the DAG pruner can delete non-leaf nodes (SST files that are present on disk) without removing them from the compaction log.
We iterate over the compaction log to build the list of backup SST files to transfer, so we can end up trying to transfer a file that the pruner has already cleaned up, which throws NoSuchFileException. The bootstrap then fails and is retriggered, only to hit the same error again in a loop.
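
As a rough illustration of the failure mode described above (not the actual Ozone code), the sketch below copies a list of files that was computed earlier and records any entry that has since disappeared instead of aborting. The class name `TolerantSstCopy` and the method `copySkippingMissing` are hypothetical; only the `java.nio.file` calls are real APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Apache Ozone. */
final class TolerantSstCopy {
  private TolerantSstCopy() { }

  /**
   * Copies each listed file into targetDir and returns the entries that were
   * listed but no longer exist on disk (for example, pruned after listing).
   */
  static List<Path> copySkippingMissing(List<Path> listed, Path targetDir)
      throws IOException {
    // Create the target up front so a missing target is never mistaken
    // for a missing source file below.
    Files.createDirectories(targetDir);
    List<Path> missing = new ArrayList<>();
    for (Path src : listed) {
      try {
        Files.copy(src, targetDir.resolve(src.getFileName()),
            StandardCopyOption.REPLACE_EXISTING);
      } catch (NoSuchFileException e) {
        // The source vanished between listing and copying: record and continue.
        missing.add(src);
      }
    }
    return missing;
  }
}
```

Returning the skipped entries, rather than silently dropping them, leaves the caller free to log them or to decide that too many files are missing.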

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14652

How was this patch tested?

will add

@sadanand48 sadanand48 closed this Feb 18, 2026
@sadanand48 sadanand48 reopened this Feb 18, 2026
@sadanand48
Contributor Author

@SaketaChalamchala Could you please take a look?

@jojochuang jojochuang added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Feb 19, 2026
@jojochuang jojochuang marked this pull request as ready for review February 25, 2026 17:58
Contributor

@sadanand48 I'm taking another pass at the review, and I think we should not handle NoSuchFileException for snapshots and the OM DB. If an expected file is missing from the snapshots or the OM checkpoint, an error should be thrown and the bootstrap retried, right?
We should handle the exception only for the SST backup directory, because we know that non-L0 SST files can be pruned from that directory.

Let me know what you think.
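
One way to picture the scoping suggested here, as a minimal sketch under stated assumptions: a missing file is tolerated only when the source lives under the SST backup directory, and the exception is rethrown for anything else (snapshots, OM checkpoint). The names `ScopedMissingFilePolicy`, `copyOrSkip`, and the `sstBackupDir` parameter are hypothetical and do not reflect how the merged change is structured.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration; not part of Apache Ozone. */
final class ScopedMissingFilePolicy {
  private ScopedMissingFilePolicy() { }

  /**
   * Copies src to dest. Returns false if src is missing AND lies under
   * sstBackupDir; rethrows NoSuchFileException in every other case.
   */
  static boolean copyOrSkip(Path src, Path dest, Path sstBackupDir)
      throws IOException {
    try {
      Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
      return true;
    } catch (NoSuchFileException e) {
      Path normalizedSrc = src.toAbsolutePath().normalize();
      Path normalizedBackup = sstBackupDir.toAbsolutePath().normalize();
      // Tolerate only a vanished source inside the backup directory.
      if (!Files.exists(src) && normalizedSrc.startsWith(normalizedBackup)) {
        return false;
      }
      // Anything else (other directories, missing destination) is an error.
      throw e;
    }
  }
}
```

With this shape, a missing snapshot or checkpoint file still fails the transfer, so the bootstrap is retried as the comment above proposes.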

Contributor Author

Yeah, I was thinking about that too. This should get fixed after HDDS-14651. However, since that is not yet implemented, we can selectively catch it only for this particular case, i.e. sstBackupDir. Makes sense.

@sadanand48 sadanand48 marked this pull request as draft February 26, 2026 09:18
@sadanand48 sadanand48 marked this pull request as ready for review February 26, 2026 13:32
Contributor

@SaketaChalamchala SaketaChalamchala left a comment

LGTM. Thanks @sadanand48

Member

@aswinshakil aswinshakil left a comment

I'm good with the changes. I wonder whether this could be simplified by checking each file's existence in extractSSTFilesFromCompactionLog.

Since we hold the bootstrap write lock, SST pruning shouldn't happen after that point, which gives us a consistent state of the SST backup dir throughout the bootstrap.

Not really needed for this PR, just a suggestion.
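
The alternative suggested in this comment could look roughly like the sketch below: filter the names read from the compaction log down to files that still exist before any transfer starts. It assumes the caller already holds the bootstrap write lock, so the check and the later copy see the same directory contents. The signature of the real extractSSTFilesFromCompactionLog is not shown in this conversation, so `ExistingSstFilter` and `existingOnly` are hypothetical stand-ins.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper for illustration; not part of Apache Ozone. */
final class ExistingSstFilter {
  private ExistingSstFilter() { }

  /**
   * Resolves each file name against sstBackupDir and keeps only the ones
   * that are present as regular files. Assumes nothing deletes files from
   * the directory while the result is in use (caller holds a lock).
   */
  static List<Path> existingOnly(Collection<String> namesFromLog,
      Path sstBackupDir) {
    List<Path> present = new ArrayList<>();
    for (String name : namesFromLog) {
      Path candidate = sstBackupDir.resolve(name);
      if (Files.isRegularFile(candidate)) {
        present.add(candidate);
      }
    }
    return present;
  }
}
```

Without the lock, an existence check like this would still race with the pruner, which is why the exception-handling approach remains the safer default.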

@aswinshakil aswinshakil merged commit 96cc223 into apache:master Feb 26, 2026
140 of 143 checks passed
4 participants