@BradyAJohnston
Member

@BradyAJohnston BradyAJohnston commented Jan 6, 2026

Fixes #5200

Changes made in this Pull Request:

  • Refactor count_by_time() so that it returns per-frame counts when the analysis was run with a non-linear FrameIterator

LLM / AI generated code disclosure

LLMs or other AI-powered tools (beyond simple IDE use cases) were used in this contribution: no

PR Checklist

  • Issue raised/referenced?
  • Tests updated/added?
  • Documentation updated/added?
  • package/CHANGELOG file updated?
  • Is your name in package/AUTHORS? (If it is not, add it!)
  • LLM/AI disclosure was updated.

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5202.org.readthedocs.build/en/5202/

@BradyAJohnston
Member Author

Oops, it seems my auto-formatter went a bit wild, despite the result still passing Black. Will clean up.

@codecov

codecov bot commented Jan 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.72%. Comparing base (528b512) to head (5900d3e).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #5202   +/-   ##
========================================
  Coverage    92.72%   92.72%           
========================================
  Files          180      180           
  Lines        22475    22475           
  Branches      3190     3190           
========================================
+ Hits         20840    20841    +1     
  Misses        1177     1177           
+ Partials       458      457    -1     

Member

@orbeckst orbeckst left a comment

Looks good!

Optionally, look at the performance; perhaps there's a faster way to do the lookup.

Comment on lines 923 to 924
count_lookup = dict(zip(indices, tmp_counts))
return np.array([count_lookup.get(i, 0) for i in range(len(self.frames))])
Member

Looking up each frame in a Python loop looks slow. Perhaps there's some NumPy magic (np.take?) that could do this in one vectorized step.

Member Author

The only really faster approach I could figure out would be this:

  if self.start is None:
      counts = np.zeros(len(self.frames), dtype=int)
      positions = np.searchsorted(self.frames, indices)
      counts[positions] = tmp_counts
      return counts

But this assumes that self.frames is sorted. Would that always be the case, given that the FrameIterator could be an unsorted sequence of frames?

Member

I am not sure whether self.frames is sorted; possibly not when using run(frames=[2, 3, 0, 7, 6]). Maybe do a quick test?

Perhaps one could sort frames and rearrange counts in the same way and then un-sort everything again before returning?
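
The sort-then-unsort idea can be sketched with the `sorter` argument of `np.searchsorted`, which avoids building a sorted copy of the frames. This is only an illustration, not the code in this PR: the helper name `counts_for_frames` is hypothetical, and it assumes that the frame list contains no duplicates and that every hydrogen-bond frame index also appears in the frame list.

```python
import numpy as np


def counts_for_frames(frames, hbond_frames):
    """Count entries of hbond_frames for each frame, in the order of frames.

    Hypothetical helper for illustration. Assumes `frames` has no
    duplicates and that every value in `hbond_frames` occurs in `frames`.
    """
    frames = np.asarray(frames, dtype=int)
    indices, tmp_counts = np.unique(
        np.asarray(hbond_frames).astype(int), return_counts=True
    )

    # Permutation that would sort `frames`.
    order = np.argsort(frames)
    # Position of each hbond frame within the sorted view of `frames`.
    pos_sorted = np.searchsorted(frames, indices, sorter=order)

    counts = np.zeros(len(frames), dtype=int)
    # order[pos_sorted] maps sorted positions back to the original order.
    counts[order[pos_sorted]] = tmp_counts
    return counts


frames = [2, 3, 0, 7, 6]  # unsorted, as in run(frames=[2, 3, 0, 7, 6])
hbond_frames = np.array([3.0, 3.0, 7.0, 0.0, 0.0, 0.0])
result = counts_for_frames(frames, hbond_frames)
print(result)  # [0 2 3 1 0]
```

Frames 2 and 6 have no hydrogen bonds in this toy input, so they keep the initial count of zero.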

Member Author

Okay, I just revisited it, and the whole function can actually be simplified. I just pushed that change:

    def count_by_time(self):
        """Counts the number of hydrogen bonds per timestep.

        Returns
        -------
        counts : numpy.ndarray
             Contains the total number of hydrogen bonds found at each timestep.
             Can be used along with :attr:`HydrogenBondAnalysis.times` to plot
             the number of hydrogen bonds over time.
        """
        hbond_frames = self.results.hbonds[:, 0].astype(int)
        frame_unique, frame_counts = np.unique(hbond_frames, return_counts=True)

        counts = np.zeros(max(self.frames) + 1, dtype=int)
        counts[frame_unique] = frame_counts
        return counts[self.frames]
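
To see how this behaves on an unsorted frame list, the same logic can be tried outside the analysis class. The following is a minimal sketch under stated assumptions: `count_by_frame` is a hypothetical free function, and the plain arrays `hbond_frames` and `frames` stand in for `self.results.hbonds[:, 0]` and `self.frames`.

```python
import numpy as np


def count_by_frame(hbond_frames, frames):
    """Standalone version of the simplified lookup (hypothetical helper)."""
    frames = np.asarray(frames, dtype=int)
    frame_unique, frame_counts = np.unique(
        np.asarray(hbond_frames).astype(int), return_counts=True
    )

    # Dense scratch array indexed directly by frame number.
    counts = np.zeros(frames.max() + 1, dtype=int)
    counts[frame_unique] = frame_counts
    # Fancy indexing returns the counts in the order of `frames`,
    # so no sorting assumption is needed.
    return counts[frames]


frames = [2, 3, 0, 7, 6]
hbond_frames = np.array([3.0, 3.0, 7.0, 0.0, 0.0, 0.0])
result = count_by_frame(hbond_frames, frames)
print(result)  # [0 2 3 1 0]
```

The scratch array has `max(frames) + 1` entries, so its size grows with the largest frame number rather than with the number of frames analysed.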

Member Author

Alternatively, we could make the intermediate counts array only as large as necessary:

    def count_by_time(self):
        """Counts the number of hydrogen bonds per timestep.

        Returns
        -------
        counts : numpy.ndarray
             Contains the total number of hydrogen bonds found at each timestep.
             Can be used along with :attr:`HydrogenBondAnalysis.times` to plot
             the number of hydrogen bonds over time.
        """
        hbond_frames = self.results.hbonds[:, 0].astype(int)
        frame_unique, frame_counts = np.unique(hbond_frames, return_counts=True)

        frame_min = min(self.frames)
        frame_max = max(self.frames)

        counts = np.zeros(frame_max - frame_min + 1, dtype=int)
        counts[frame_unique - frame_min] = frame_counts
        return counts[self.frames - frame_min]

Member Author

I ended up updating it to use the second approach, with min and max.
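
The effect of offsetting by the minimum frame can be illustrated with a short standalone sketch. The function name `count_by_frame_offset` and the returned scratch size are hypothetical and exist only for this demonstration; the inputs again stand in for `self.results.hbonds[:, 0]` and `self.frames`.

```python
import numpy as np


def count_by_frame_offset(hbond_frames, frames):
    """Offset variant of the lookup (hypothetical helper).

    Returns the per-frame counts and the size of the scratch array.
    """
    frames = np.asarray(frames, dtype=int)
    frame_unique, frame_counts = np.unique(
        np.asarray(hbond_frames).astype(int), return_counts=True
    )

    frame_min = frames.min()
    frame_max = frames.max()

    # Scratch array spans only the analysed range of frame numbers.
    counts = np.zeros(frame_max - frame_min + 1, dtype=int)
    counts[frame_unique - frame_min] = frame_counts
    return counts[frames - frame_min], counts.size


frames = [1002, 1000, 1001]
hbond_frames = np.array([1000.0, 1000.0, 1002.0])
per_frame, scratch_size = count_by_frame_offset(hbond_frames, frames)
print(per_frame)     # [1 2 0]
print(scratch_size)  # 3
```

For these frames, a scratch array of length `max(frames) + 1` would hold 1003 entries, whereas the offset version needs only 3.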

@orbeckst orbeckst self-assigned this Jan 6, 2026
@BradyAJohnston
Member Author

Codecov seems to be getting it wrong; the lines are definitely covered if you click through.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

H Bond count_by_time gives a dtype error when analysis was run with a FrameIterator

2 participants