[BUG] Flaky test_101_update_execution on Humble - action creation returns 500 due to concurrent goal race condition #222

@bburda

Bug report

Steps to reproduce

  1. Push to main branch (or any branch triggering CI)
  2. Observe the build-and-test (humble, ubuntu:jammy) job
  3. test_101_update_execution in test_integration.test.py fails intermittently

The test can be reproduced locally on ROS 2 Humble by running integration tests:

source /opt/ros/humble/setup.bash
colcon build --symlink-install && source install/setup.bash
colcon test --packages-select ros2_medkit_gateway --ctest-args -R test_integration
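Because the failure is intermittent, a single local run may pass. A minimal sketch of a repeat loop follows; it assumes the script is launched from a shell where the ROS 2 and workspace setup files are already sourced. The helper name run_until_failure is hypothetical, and the --return-code-on-test-failure flag is added on the assumption that colcon test otherwise exits 0 even when a test fails.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_until_failure(cmd: Sequence[str], max_runs: int = 20) -> Optional[int]:
    """Run cmd up to max_runs times.

    Returns the 1-based index of the first run that exits non-zero,
    or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return attempt
    return None


if __name__ == "__main__":
    # Same test selection as the command above; the extra flag is an
    # assumption so that a test failure shows up in the exit code.
    colcon_cmd = [
        "colcon", "test",
        "--packages-select", "ros2_medkit_gateway",
        "--return-code-on-test-failure",
        "--ctest-args", "-R", "test_integration",
    ]
    failed_at = run_until_failure(colcon_cmd, max_runs=20)
    if failed_at is None:
        print("No failure observed")
    else:
        print(f"First failure on run {failed_at}")
        sys.exit(1)
```

The run count of 20 is arbitrary; the failure rate on Humble is not stated in this report.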

Expected behavior

test_101_update_execution should pass consistently on all supported ROS 2 distros. The test creates a new action execution via POST /apps/{app_id}/operations/long_calibration/executions and expects HTTP 202 (Accepted).

Actual behavior

On ROS 2 Humble, the action execution creation sometimes returns HTTP 500 instead of 202.

Assertion error (line 3697 of test_integration.test.py):

AssertionError: 500 != 202 : Expected 202 for action creation, got 500

Root cause analysis:

The test (test_101) creates a second action execution (Fibonacci with order=10) while a previous execution from test_100 (Fibonacci with order=5) may still be in progress on the same action server (/powertrain/engine/long_calibration).

Gateway logs show the SendGoal request IS dispatched and the demo action server DOES receive and start executing the goal:

[gateway_node-1] SendGoal request type: example_interfaces/action/Fibonacci_SendGoal_Request, JSON: {"goal":{"order":10},...}
[gateway_node-1] Sending action goal: /powertrain/engine/long_calibration (type: example_interfaces/action/Fibonacci)
[demo_long_calibration_action-9] Received calibration goal request with order 10
[demo_long_calibration_action-9] Executing calibration...

However, unlike successful executions, there is no subsequent "Action goal accepted with ID: ..." log line from the gateway. The gateway returns 500 to the test client, suggesting an exception or timeout in the action client's SendGoal response handling on Humble.

For comparison, a later test (test_41) successfully creates the same action goal on the same action server, and the gateway correctly logs "Action goal accepted with ID: ..." and returns 202.

Flakiness pattern:

  • Fails only on Humble (ubuntu:jammy) - passes consistently on Jazzy and Rolling
  • The same Humble job also failed on a previous run (#22103100298) on the feat/91-rate-limiting branch
  • Likely a race condition in ROS 2 Humble's rclcpp_action client when sending a new goal while a previous goal on the same action server is still executing

Environment

Additional information

Failing job summary:

  • Job: build-and-test (humble, ubuntu:jammy) - failure
  • Job: build-and-test (jazzy, ubuntu:noble) - success
  • Job: build-and-test (rolling, ubuntu:noble) - success
  • Job: coverage - success

Test results: 1260 total tests, 1 integration test failure (test_101_update_execution), 7 skipped

Possible fixes:

  1. Add a small delay or wait for the previous action to complete before creating a new execution in test_101
  2. Cancel the previous execution explicitly at the start of test_101
  3. Use a separate action server / different operation for test_101 to avoid contention
  4. Investigate if the gateway's action client needs retry logic or better error handling for concurrent goals on Humble
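Option 1 could be sketched as a polling helper that test_101 calls before creating its execution. This is only an illustration: wait_for_terminal_status, the get_status callable, and the status names in TERMINAL_STATUSES are hypothetical and not taken from the gateway or the existing test suite.

```python
import time
from typing import Callable, Iterable

# Hypothetical terminal status names; the gateway's real values may differ.
TERMINAL_STATUSES = frozenset({"succeeded", "canceled", "aborted"})


def wait_for_terminal_status(
    get_status: Callable[[], str],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.2,
    terminal: Iterable[str] = TERMINAL_STATUSES,
) -> str:
    """Poll get_status() until it reports a terminal status.

    Returns the terminal status, or raises TimeoutError if the deadline
    passes while the execution is still in a non-terminal state.
    """
    terminal_set = frozenset(terminal)
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal_set:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"execution still '{status}' after {timeout_s}s"
            )
        time.sleep(poll_interval_s)
```

In test_101, get_status would wrap whatever request the test suite already uses to read the status of the execution created in test_100. Polling with a deadline is preferable to a fixed sleep because it returns as soon as the previous goal finishes and fails loudly if it never does. It only removes the contention from the test; it does not address whether the gateway should handle concurrent goals on Humble (option 4).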
