Optimise CLEM Data Transfer and Registration #707
Merged
…ith certain string patterns from being transferred
…gs; added 'substrings_blacklist' as an attribute to the DirWatcher class
…d of based on keywords in the file path
Codecov Report
@@ Coverage Diff @@
##             main    #707    +/-   ##
=========================================
- Coverage   38.51%  38.25%  -0.26%
=========================================
  Files          99      99
  Lines       11663   11880    +217
  Branches     1542    1611     +69
=========================================
+ Hits         4492    4545     +53
- Misses       6973    7113    +140
- Partials      198     222     +24
…or the CLEM workflow with an active blacklist
stephen-riggs approved these changes on Dec 8, 2025
stephen-riggs (Contributor) left a comment:
Looks good to me, couple of minor comments.
… and added the threshold at which a dataset is considered an Atlas database entry to it
stephen-riggs pushed a commit that referenced this pull request on Jan 8, 2026:
With the implementation of a substrings_blacklist key to the MachineConfig Pydantic model as part of PR #707, we need to ensure that any files that are blacklisted do not then get transferred over to the destination by the RSyncer when a session is being finalised. This PR resolves that issue by passing the MachineConfig through the MultigridController to any instantiated RSyncers. When finalising the session, the RSyncers will delete the files and folders matching the blacklist criteria, while backing up and transferring other files.
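The split between files to delete and files to transfer at finalisation could be sketched as follows. This is a minimal illustration, not the RSyncer's actual code: it assumes the blacklist is a flat list of strings matched against each path component, and the function names `is_blacklisted` and `partition_for_finalisation` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def is_blacklisted(path: Path, substrings_blacklist: Iterable[str]) -> bool:
    """Return True if any non-empty blacklisted substring occurs in a path component."""
    # Materialise once so generators are not exhausted; skip empty strings,
    # which would otherwise match every path.
    blacklist = tuple(s for s in substrings_blacklist if s)
    return any(s in part for part in path.parts for s in blacklist)


def partition_for_finalisation(
    paths: Iterable[Path], substrings_blacklist: Iterable[str]
) -> Tuple[List[Path], List[Path]]:
    """Split paths into (to_delete, to_transfer) according to the blacklist.

    Hypothetical helper: it only decides what goes where and performs no
    filesystem or rsync operations itself.
    """
    blacklist = tuple(substrings_blacklist)
    to_delete: List[Path] = []
    to_transfer: List[Path] = []
    for path in paths:
        if is_blacklisted(path, blacklist):
            to_delete.append(path)
        else:
            to_transfer.append(path)
    return to_delete, to_transfer
```

Keeping the decision separate from the deletion and transfer steps makes the blacklist logic easy to unit-test without touching the filesystem.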
Labels
client: Relates to the client component
cryo-clem: Part of the cryo-CLEM pipeline extension
enhancement: New feature or request
server: Relates to the server component
After a recent incident in which Murfey was overwhelmed by the sheer number of files generated by the CLEM workflow (the CLEM app saves a lot of helper files used for rendering UI components that are not necessary for the data processing), it became evident that some safeguards were needed to filter the files being transferred.
Additionally, datasets were registered as either a GridSquare or Atlas in ISPyB based on keywords in their file paths ("Overview" for Atlas, and everything else for GridSquare). However, it was recently determined that the "Overview" dataset is generated at the end of a session after collecting individual positions as part of a tile scanning protocol, and that it is also possible to collect an atlas using the tile scanning protocol directly. The collection of an Overview after having run a grid-spanning tile scan thus just results in data duplication, and the use of file path keywords to determine if the dataset is an Atlas or GridSquare is thus unwieldy.
This PR fixes these issues by implementing a substrings blacklist in the MachineConfig Pydantic model that can be passed to the DirWatcher class. Files and folders containing strings present in the blacklist will be excluded from the RSyncer and Analyser, thereby preventing junk files from slowing down the file transfer and analysis processes.
Additionally, the logic for registering CLEM data was optimised slightly. Classification as Atlas or GridSquare is now determined by the image dimensions: a sample grid is ~3mm, so if an image is at least 1.5mm in width or height, it should be considered an Atlas. This number can be adjusted as the CLEM workflow continues to mature.
NOTE: Many of the new line insertions are due to additional tests written to improve coverage of existing code.
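The dimension-based classification described above could look roughly like this. It is a sketch under stated assumptions rather than the code in this PR: the function names, the constant name, and the idea of deriving the physical extent from a pixel count and a pixel size in micrometres are all hypothetical; only the 1.5mm threshold and the "width or height" rule come from the description.

```python
# Threshold from the PR description: a sample grid is ~3 mm across, so an image
# spanning at least 1.5 mm in either dimension is treated as an Atlas.
ATLAS_THRESHOLD_MM = 1.5  # hypothetical constant name


def physical_extent_mm(num_pixels: int, pixel_size_um: float) -> float:
    """Convert a pixel count and pixel size (micrometres) to an extent in millimetres."""
    return num_pixels * pixel_size_um / 1000.0


def classify_dataset(
    width_mm: float, height_mm: float, threshold_mm: float = ATLAS_THRESHOLD_MM
) -> str:
    """Return "Atlas" if either dimension reaches the threshold, else "GridSquare"."""
    if max(width_mm, height_mm) >= threshold_mm:
        return "Atlas"
    return "GridSquare"


# Usage: a 4096 x 4096 image at 0.5 um per pixel spans about 2 mm per side.
side_mm = physical_extent_mm(4096, 0.5)
label = classify_dataset(side_mm, side_mm)
```

Exposing the threshold as a parameter keeps it adjustable as the CLEM workflow matures, as the description anticipates.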