Description
Bug report
Steps to reproduce
- Push to main branch (or any branch triggering CI)
- Observe the build-and-test (humble, ubuntu:jammy) job
- test_101_update_execution in test_integration.test.py fails intermittently
The test can be reproduced locally on ROS 2 Humble by running integration tests:
source /opt/ros/humble/setup.bash
colcon build --symlink-install && source install/setup.bash
colcon test --packages-select ros2_medkit_gateway --ctest-args -R test_integration
Expected behavior
test_101_update_execution should pass consistently on all supported ROS 2 distros. The test creates a new action execution via POST /apps/{app_id}/operations/long_calibration/executions and expects HTTP 202 (Accepted).
Actual behavior
On ROS 2 Humble, the action execution creation sometimes returns HTTP 500 instead of 202.
Assertion error (line 3697 of test_integration.test.py):
AssertionError: 500 != 202 : Expected 202 for action creation, got 500
Root cause analysis:
The test (test_101) creates a second action execution (Fibonacci with order=10) while a previous execution from test_100 (Fibonacci with order=5) may still be in progress on the same action server (/powertrain/engine/long_calibration).
Gateway logs show the SendGoal request IS dispatched and the demo action server DOES receive and start executing the goal:
[gateway_node-1] SendGoal request type: example_interfaces/action/Fibonacci_SendGoal_Request, JSON: {"goal":{"order":10},...}
[gateway_node-1] Sending action goal: /powertrain/engine/long_calibration (type: example_interfaces/action/Fibonacci)
[demo_long_calibration_action-9] Received calibration goal request with order 10
[demo_long_calibration_action-9] Executing calibration...
However, unlike successful executions, there is no subsequent "Action goal accepted with ID: ..." log line from the gateway. The gateway returns 500 to the test client, suggesting an exception or timeout in the action client's SendGoal response handling on Humble.
For comparison, a later test (test_41) successfully creates the same action goal on the same action server and the gateway correctly logs Action goal accepted with ID: ... and returns 202.
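The log comparison above can be automated when triaging further CI runs. The following is a minimal sketch, assuming the gateway log is available as a list of text lines and that the two message prefixes quoted in this report are stable. The function name `unacknowledged_goals` is hypothetical and not part of ros2_medkit.

```python
import re

# Patterns taken from the log excerpts in this report. The text after
# "ID:" is not shown in the report, so only the prefix is matched.
SENT = re.compile(r"Sending action goal: (\S+)")
ACCEPTED = re.compile(r"Action goal accepted with ID:")


def unacknowledged_goals(log_lines):
    """Return action names whose 'Sending action goal' line was never
    followed by an 'Action goal accepted with ID:' line."""
    pending = []  # goals sent but not yet acknowledged, oldest first
    for line in log_lines:
        sent = SENT.search(line)
        if sent:
            pending.append(sent.group(1))
        elif ACCEPTED.search(line) and pending:
            # Assumption: acknowledgements arrive in send order.
            pending.pop(0)
    return pending


failing_excerpt = [
    "[gateway_node-1] Sending action goal: /powertrain/engine/long_calibration "
    "(type: example_interfaces/action/Fibonacci)",
    "[demo_long_calibration_action-9] Received calibration goal request with order 10",
    "[demo_long_calibration_action-9] Executing calibration...",
]
print(unacknowledged_goals(failing_excerpt))
```

A non-empty result for a run points at the same symptom described here: the goal was dispatched but the gateway never logged its acceptance.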
Flakiness pattern:
- Fails only on Humble (ubuntu:jammy) - passes consistently on Jazzy and Rolling
- The same Humble job also failed on a previous run (#22103100298) on the feat/91-rate-limiting branch
- Likely a race condition in ROS 2 Humble's rclcpp_action client when sending a new goal while a previous goal on the same action server is still executing
Environment
- ros2_medkit version: 2531c67 (main, 2026-02-17)
- ROS 2 distro: Humble (fails) / Jazzy, Rolling (pass)
- OS: Ubuntu 22.04 (Jammy) in GitHub Actions CI
- CI run: https://github.com/selfpatch/ros2_medkit/actions/runs/22112666500/job/63912660542
Additional information
Failing job summary:
- Job: build-and-test (humble, ubuntu:jammy) - failure
- Job: build-and-test (jazzy, ubuntu:noble) - success
- Job: build-and-test (rolling, ubuntu:noble) - success
- Job: coverage - success
Test results: 1260 total tests, 1 integration test failure (test_101_update_execution), 7 skipped
Possible fixes:
- Add a small delay or wait for the previous action to complete before creating a new execution in test_101
- Cancel the previous execution explicitly at the start of test_101
- Use a separate action server / different operation for test_101 to avoid contention
- Investigate if the gateway's action client needs retry logic or better error handling for concurrent goals on Humble
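The first option (waiting for the previous execution to finish) could be sketched as a small polling helper instead of a fixed sleep. This is only an illustration: `wait_until` is a hypothetical helper, and the status lookup in the usage comment is an assumption, since this report does not show how the test queries execution state.

```python
import time


def wait_until(predicate, timeout_s=30.0, interval_s=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Hypothetical usage at the start of test_101. get_execution_status,
# previous_id and TERMINAL_STATES are assumed names, not taken from
# the existing test suite:
#
#   finished = wait_until(
#       lambda: get_execution_status(previous_id) in TERMINAL_STATES)
#   self.assertTrue(finished, "previous execution did not finish in time")
```

Polling with a deadline keeps the test fast when the previous goal finishes quickly, and it fails with a clear message rather than an unrelated HTTP 500 when it does not.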