Contributor

@tieneupin tieneupin commented Dec 5, 2025

A recent incident in which Murfey was overwhelmed by the sheer number of files generated by the CLEM workflow made it evident that safeguards are needed to filter the files being transferred. The CLEM app saves many helper files for rendering UI components, and these are not needed for data processing.

Additionally, datasets were registered in ISPyB as either a GridSquare or an Atlas based on keywords in their file paths ("Overview" for Atlas, everything else for GridSquare). However, it was recently determined that the "Overview" dataset is generated at the end of a session, after individual positions have been collected as part of a tile scanning protocol, and that an atlas can also be collected directly using the tile scanning protocol. Collecting an Overview after running a grid-spanning tile scan therefore just duplicates data, and relying on file path keywords to decide whether a dataset is an Atlas or a GridSquare is unwieldy.

This PR addresses the file-volume problem by adding a substrings blacklist to the MachineConfig Pydantic model, which can be passed to the DirWatcher class. Files and folders whose paths contain any string in the blacklist are excluded from the RSyncer and Analyser, which prevents junk files from slowing down file transfer and analysis.
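As a rough illustration of the filtering described above (not the code in this PR), the sketch below checks each component of a path against a list of substrings. The names `is_blacklisted` and `filter_paths`, and the flat list shape of the blacklist, are assumptions made for the example; the actual shape of the MachineConfig field and how the DirWatcher consumes it may differ.

```python
from pathlib import Path
from typing import Iterable, List


def is_blacklisted(path: Path, blacklist: Iterable[str]) -> bool:
    """Return True if any component of `path` contains a blacklisted substring.

    Hypothetical helper: checking every path component means a match on a
    folder name also excludes everything underneath that folder.
    """
    # Ignore empty strings, which would otherwise match every path.
    patterns = [s for s in blacklist if s]
    return any(s in part for part in path.parts for s in patterns)


def filter_paths(paths: Iterable[Path], blacklist: Iterable[str]) -> List[Path]:
    """Keep only the paths that do not match the blacklist."""
    patterns = list(blacklist)
    return [p for p in paths if not is_blacklisted(p, patterns)]
```

Matching on path components rather than only on the file name is one possible reading of "files and folders containing strings present in the blacklist"; a different matching rule would only change the body of `is_blacklisted`.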

Additionally, the logic for registering CLEM data was optimised slightly. Classification as Atlas or GridSquare is now determined by the image dimensions: a sample grid is roughly 3 mm across, so an image that is at least 1.5 mm in width or height is considered an Atlas. This threshold can be adjusted as the CLEM workflow continues to mature.
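The size rule above can be sketched as follows. This is an illustrative version only: the function name `classify_clem_dataset`, its parameters, and the choice of micrometres as the unit are assumptions for the example, not the names or units used in the Murfey code.

```python
# Threshold from the PR description: half of a ~3 mm grid, in micrometres.
ATLAS_THRESHOLD_UM = 1500.0


def classify_clem_dataset(
    width_px: int,
    height_px: int,
    pixel_size_um: float,
    threshold_um: float = ATLAS_THRESHOLD_UM,
) -> str:
    """Classify a dataset by its physical extent (hypothetical helper).

    Returns "Atlas" if either dimension reaches the threshold,
    otherwise "GridSquare".
    """
    width_um = width_px * pixel_size_um
    height_um = height_px * pixel_size_um
    if width_um >= threshold_um or height_um >= threshold_um:
        return "Atlas"
    return "GridSquare"
```

For example, a 2048 x 2048 image at 1.0 µm per pixel spans 2048 µm and would be classed as an Atlas, while the same image at 0.1 µm per pixel spans about 205 µm and would be classed as a GridSquare. Keeping the threshold as a parameter reflects the note that the number may change.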

NOTE: Many of the inserted lines are additional tests written to improve coverage of existing code.

@tieneupin tieneupin self-assigned this Dec 5, 2025
@tieneupin tieneupin added labels Dec 5, 2025: enhancement (New feature or request), server (Relates to the server component), client (Relates to the client component), cryo-clem (Part of the cryo-CLEM pipeline extension)

codecov bot commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 94.44444% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 38.25%. Comparing base (c02c9da) to head (e76c32d).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #707      +/-   ##
==========================================
- Coverage   38.51%   38.25%   -0.26%     
==========================================
  Files          99       99              
  Lines       11663    11880     +217     
  Branches     1542     1611      +69     
==========================================
+ Hits         4492     4545      +53     
- Misses       6973     7113     +140     
- Partials      198      222      +24     

@tieneupin tieneupin marked this pull request as ready for review December 8, 2025 11:51
Contributor

@stephen-riggs stephen-riggs left a comment

Looks good to me, couple of minor comments.

@tieneupin tieneupin merged commit 13f35a7 into main Dec 8, 2025
17 checks passed
@tieneupin tieneupin deleted the optimise-clem-logic branch December 8, 2025 12:48
stephen-riggs pushed a commit that referenced this pull request Jan 8, 2026
With the addition of a substrings_blacklist key to the MachineConfig Pydantic model in PR #707, we need to ensure that blacklisted files are not transferred to the destination by the RSyncer when a session is being finalised.

This PR resolves that issue by passing the MachineConfig through the MultigridController to any instantiated RSyncers. When finalising the session, the RSyncers will delete the files and folders matching the blacklist criteria, while backing up and transferring other files.
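The finalisation behaviour described above could look something like the sketch below. It is a stand-in under stated assumptions: `finalise_directory` is a hypothetical function, the blacklist is assumed to be a flat list of substrings, and `shutil.copy2` is used in place of the actual rsync-based transfer and backup steps, which are not reproduced here.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Tuple


def finalise_directory(
    source: Path, destination: Path, blacklist: Iterable[str]
) -> Tuple[List[Path], List[Path]]:
    """Delete blacklisted files under `source` and copy the rest to `destination`.

    Returns (deleted, transferred) as lists of paths relative to `source`.
    """
    # Ignore empty strings, which would otherwise match every path.
    patterns = [s for s in blacklist if s]
    deleted: List[Path] = []
    transferred: List[Path] = []
    # Collect the file list up front so that deletions do not disturb the walk.
    for path in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = path.relative_to(source)
        if any(s in part for part in relative.parts for s in patterns):
            path.unlink()  # junk file: remove instead of transferring
            deleted.append(relative)
            continue
        target = destination / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # stand-in for the real transfer step
        transferred.append(relative)
    return deleted, transferred
```

This sketch leaves emptied blacklisted folders in place on the source side; removing them as well would need an extra pass over the directories after the files are handled.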