
test: Enable test_snapshot_pruning_removes_outdated_records #1735

Open

Pijukatel wants to merge 3 commits into master from fix-test-snapshot-pruning

Conversation

@Pijukatel (Collaborator) commented Feb 11, 2026

Description:

  • Update test_snapshot_pruning_removes_outdated_records to remove the suspected source of the flaky behavior.
  • The root cause of the flaky test behavior is a design flaw in snapshot handling:
    event_manager.on(event=Event.SYSTEM_INFO, listener=self._snapshot_cpu) registers the self._snapshot_cpu listener, but the event manager runs sync listeners through asyncio.to_thread. self._snapshot_cpu modifies an instance list (for example self._cpu_snapshots) in place: it performs bisect.insort and del operations on the same list from several threads, which creates an opportunity for a race condition (see the sketch below).
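
For illustration only (this is not the project's code; the listener name, the snapshot type, and the 2-hour window are simplified stand-ins), the problematic pattern boils down to several worker threads mutating the same plain list with bisect.insort and del:

import bisect
import threading
from datetime import datetime, timedelta, timezone

HISTORY = timedelta(hours=2)
cpu_snapshots: list[datetime] = []  # stands in for self._cpu_snapshots

def snapshot_listener(created_at: datetime) -> None:
    # Unsynchronized insort + del on a shared list: two threads running this
    # concurrently can interleave and corrupt the ordering or the pruning.
    bisect.insort(cpu_snapshots, created_at)
    cutoff = created_at - HISTORY
    while cpu_snapshots and cpu_snapshots[0] < cutoff:
        del cpu_snapshots[0]

now = datetime.now(timezone.utc)
threads = [
    threading.Thread(target=snapshot_listener, args=(now - timedelta(hours=h),))
    for h in (5, 3, 2, 0)
]
for t in threads:
    t.start()
for t in threads:
    t.join()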

Issues:

Closes: #1734

@github-actions github-actions bot added this to the 134th sprint - Tooling team milestone Feb 11, 2026
@github-actions github-actions bot added the t-tooling (Issues with this label are in the ownership of the tooling team) and tested (Temporary label used only programmatically for some analytics) labels Feb 11, 2026
codecov bot commented Feb 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.48%. Comparing base (4163413) to head (dced3e3).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1735      +/-   ##
==========================================
+ Coverage   92.41%   92.48%   +0.06%     
==========================================
  Files         156      156              
  Lines       10628    10628              
==========================================
+ Hits         9822     9829       +7     
+ Misses        806      799       -7     
Flag Coverage Δ
unit 92.48% <100.00%> (+0.06%) ⬆️


@Pijukatel Pijukatel requested a review from vdusek February 11, 2026 10:42
@Pijukatel Pijukatel marked this pull request as ready for review February 11, 2026 10:43
@vdusek (Collaborator) left a comment

alright let's see :)

@vdusek (Collaborator) left a comment

FAILED tests/unit/_autoscaling/test_snapshotter.py::test_snapshot_pruning_removes_outdated_records - assert 3 == 2

  • where 3 = len([CpuSnapshot(used_ratio=0.5, max_used_ratio=0.95, created_at=datetime.datetime(2026, 2, 11, 7, 39, 32, 789908, tzinfo=datetime.timezone.utc)), CpuSnapshot(used_ratio=0.5, max_used_ratio=0.95, created_at=datetime.datetime(2026, 2, 11, 9, 39, 32, 789908, tzinfo=datetime.timezone.utc)), CpuSnapshot(used_ratio=0.5, max_used_ratio=0.95, created_at=datetime.datetime(2026, 2, 11, 10, 39, 32, 789908, tzinfo=datetime.timezone.utc))])
    ============ 1 failed, 1615 passed, 8 skipped in 345.52s (0:05:45) =============
    Error: Process completed with exit code 1.

@Pijukatel Pijukatel force-pushed the fix-test-snapshot-pruning branch from d7783c5 to 63e8468 on February 11, 2026 15:20
@Pijukatel (Collaborator, Author) commented

@Mantisus, @vdusek, please check the root cause in the issue description. The suggested fix should be OK, but I fear it is too implicit and relies on non-obvious EventManager behavior. Would you be in favor of a more explicit (and probably more complex) solution that uses threading locks to prevent such a race condition?
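
For reference, a rough, self-contained sketch of what the explicit-lock variant could look like; everything here (the class and field names, the snapshot type, the 2-hour window) is a simplified stand-in, not the real Snapshotter API:

from __future__ import annotations

import bisect
import threading
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(order=True)
class CpuSnapshotSketch:
    created_at: datetime
    used_ratio: float = field(compare=False)


class SnapshotterSketch:
    _SNAPSHOT_HISTORY = timedelta(hours=2)

    def __init__(self) -> None:
        self._cpu_snapshots: list[CpuSnapshotSketch] = []
        self._snapshots_lock = threading.Lock()

    def _snapshot_cpu(self, used_ratio: float, created_at: datetime) -> None:
        snapshot = CpuSnapshotSketch(created_at=created_at, used_ratio=used_ratio)
        with self._snapshots_lock:
            # Prune and insert as a single critical section, so listener threads
            # never interleave their mutations of the same list.
            cutoff = created_at - self._SNAPSHOT_HISTORY
            self._cpu_snapshots[:] = [s for s in self._cpu_snapshots if s.created_at >= cutoff]
            bisect.insort(self._cpu_snapshots, snapshot)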

@Mantisus (Collaborator) commented Feb 11, 2026

@Mantisus, @vdusek, please check the root cause in the issue description.

The root cause hypothesis seems incorrect. If the cause were an error in processing the 2-hour snapshot, then there would be 1 snapshot in the failed test, not 3.

It might be worth trying:

for event_data in events_data:
    event_manager.emit(event=Event.SYSTEM_INFO, event_data=event_data)
    await event_manager.wait_for_all_listeners_to_complete()

Perhaps the reason is that event_manager generates events out of chronological order.

@Pijukatel (Collaborator, Author) commented

The root cause hypothesis seems incorrect.

Yes, the hypothesis in the issue description seems incorrect. I was referring to the root cause description in this PR.

@Pijukatel Pijukatel requested a review from vdusek February 13, 2026 07:31
@vdusek (Collaborator) commented Feb 13, 2026

Please check the root cause in the issue description. The suggested fix should be OK, but I fear it is too implicit and relies on non-obvious EventManager behavior. Would you be in favor of a more explicit (and probably more complex) solution that uses threading locks to prevent such a race condition?

I'd leave it up to you. If you believe the current approach is stable, I'm okay with it. Otherwise, let's implement the thread locks.

@Mantisus (Collaborator) commented Feb 13, 2026

I was referring to the root cause description in this PR

Got it. Perhaps the race condition also affects this.

I hypothesize that when event_manager.wait_for_all_listeners_to_complete() is executed, all 5 events are processed concurrently, with no guarantee that they are handled in the same order in which they were created. If the result is [now-3, now-2, now, now-5], the last snapshot will not be deleted.
This is not a problem in normal operation, because if this happened, the snapshot would be deleted when the next one was recorded. But it is a problem for this test.

@Pijukatel (Collaborator, Author) commented

I was referring to the root cause description in this PR

Got it. Perhaps the race condition also affects this.

I hypothesize that when event_manager.wait_for_all_listeners_to_complete() is executed, all 5 events are processed concurrently, with no guarantee that they are handled in the same order in which they were created. If the result is [now-3, now-2, now, now-5], the last snapshot will not be deleted. This is not a problem in normal operation, because if this happened, the snapshot would be deleted when the next one was recorded. But it is a problem for this test.

Different insertion order is not a problem, as it should be handled by SortedSnapshotList (which orders automatically when add is called), so we do not need to rely on the snapshots arriving time-ordered. The problem is that the same SortedSnapshotList can be sorted and deleted from in place by different threads. This happens often in tests, but it can also happen in a real scenario, although it is extremely unlikely due to the usual time difference between snapshots. Still, there is no guarantee that the snapshots will arrive "slowly enough" to ensure that only one snapshot is processed at a time.
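
A tiny illustration of that point (plain timestamps standing in for snapshots, not the real SortedSnapshotList): bisect.insort keeps the list chronological no matter in which order the items arrive, so ordering by itself is not the issue.

import bisect
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)
created_ats: list[datetime] = []
for delta in [3, 2, 5, 0]:  # out-of-order arrival, as in the test
    bisect.insort(created_ats, now - timedelta(hours=delta))

assert created_ats == sorted(created_ats)  # chronological regardless of arrival order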

@Mantisus (Collaborator) commented Feb 13, 2026

Different insertion order is not a problem, as it should be handled by using SortedSnapshotList

The snapshot is added to SortedSnapshotList after _prune_snapshots.
https://github.com/apify/crawlee-python/blob/master/src/crawlee/_autoscaling/snapshotter.py#L272

Therefore, it is important which snapshot will be processed last.
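
A simplified, hypothetical sketch of that ordering (not the real Snapshotter code; the 2-hour window and the names are stand-ins): pruning uses the created_at of the snapshot currently being processed, so whichever snapshot is processed last determines the cutoff.

from datetime import datetime, timedelta, timezone

SNAPSHOT_HISTORY = timedelta(hours=2)

def handle_snapshot(snapshots: list[datetime], created_at: datetime) -> None:
    cutoff = created_at - SNAPSHOT_HISTORY
    snapshots[:] = [s for s in snapshots if s >= cutoff]  # prune first ...
    snapshots.append(created_at)                          # ... then add
    snapshots.sort()

now = datetime.now(timezone.utc)
snapshots = sorted(now - timedelta(hours=d) for d in [5, 2, 0])

# A "now - 3h" snapshot processed last gives cutoff = now - 5h, so nothing is
# old enough to be pruned and the list grows to 4 entries, as in the test below.
handle_snapshot(snapshots, now - timedelta(hours=3))
assert len(snapshots) == 4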

In the test below, 0 snapshots will be deleted, because the last snapshot processed is now-3.

async def test_without_pruning(
    snapshotter: Snapshotter, event_manager: LocalEventManager, default_memory_info: MemoryInfo
) -> None:
    snapshotter._SNAPSHOT_HISTORY = timedelta(hours=2)
    now = datetime.now(timezone.utc)

    snapshots = [
        CpuSnapshot(used_ratio=0.5, max_used_ratio=0.95, created_at=now - timedelta(hours=delta)) for delta in [5, 2, 0]
    ]

    for snapshot in snapshots:  # Pre-populate the `SortedSnapshotList` with 3 elements
        snapshotter._cpu_snapshots.add(snapshot)

    event_manager.emit(event=Event.SYSTEM_INFO, event_data=EventSystemInfoData(
        cpu_info=CpuInfo(used_ratio=0.5, created_at=now - timedelta(hours=3)),
        memory_info=default_memory_info,
    ))

    await event_manager.wait_for_all_listeners_to_complete()

    cpu_snapshots = cast('list[CpuSnapshot]', snapshotter.get_cpu_sample())

    assert len(cpu_snapshots) == 4  # None of the snapshots were removed

@Pijukatel (Collaborator, Author) commented

@Mantisus good point!

To ensure that a different insertion order is not a problem, we need to run _snapshot_cpu uninterrupted (without anyone else modifying self._cpu_snapshots). That guarantees the assumption that pruning happens on a sorted list.

cpu_info=CpuInfo(used_ratio=0.5, created_at=now - timedelta(hours=delta)),
memory_info=default_memory_info,
)
for delta in [3, 2, 5, 0]  # Out-of-order timestamps. Snapshotter cannot rely on natural ordering.
@Pijukatel (Collaborator, Author) commented on the diff above

@Mantisus I meant that this order should not matter


Labels

t-tooling: Issues with this label are in the ownership of the tooling team.
tested: Temporary label used only programmatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: test_snapshot_pruning_removes_outdated_records fails with assert 3 == 2

3 participants