fix: heal orphaned function calls to prevent 400 BadRequest crash loops #4055
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Summary of Changes

Hello @donggyun112, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a significant stability issue where incomplete function call executions could lead to persistent session failures and API errors. By introducing an intelligent auto-healing mechanism, the system can now detect these unresponded function calls and automatically inject synthetic error messages, allowing the conversation flow to continue without crashing. This enhancement improves the robustness and user experience of interactions involving tool use with LLMs.
Response from ADK Triaging Agent

Hello @donggyun112, thank you for your contribution! It looks like the Contributor License Agreement (CLA) check has failed. Before we can merge this pull request, you'll need to sign the CLA. You can find more information and sign it at https://cla.developers.google.com/. Once the CLA is signed, we can proceed with the review. Thanks!
Code Review
This pull request introduces a robust solution to handle orphaned function calls, preventing potential crash loops. The implementation is well-integrated into the existing logic, and the choice to extend _rearrange_events_for_async_function_responses_in_history is well-justified for performance reasons. The addition of comprehensive unit tests covering various scenarios, including partial failures and logging, is commendable and ensures the reliability of the fix. I have a couple of minor suggestions to improve maintainability and code style, but overall, this is an excellent contribution.
    orphaned_calls: list[types.FunctionCall],
) -> Event:
  """Create synthetic error responses for orphaned function calls."""
  error_response = {'error': 'Tool execution was interrupted.'}
For better maintainability, it's recommended to define this hardcoded dictionary as a module-level constant (e.g., _ORPHANED_CALL_ERROR_RESPONSE). This makes the error message easier to find and modify, and promotes consistency. Please define the constant at the module level and reference it here.
Suggested change:
- error_response = {'error': 'Tool execution was interrupted.'}
+ error_response = _ORPHANED_CALL_ERROR_RESPONSE
Code Review
This pull request introduces a robust mechanism to handle "orphaned" function calls, which occur when an execution is interrupted after a tool call is made but before its response is recorded. By injecting synthetic error responses, this change prevents sessions from entering unrecoverable crash loops with APIs that enforce strict call-response pairing. The implementation is cleanly integrated into the existing event processing logic for efficiency, and it is supported by a comprehensive set of new unit tests that validate the healing behavior across various scenarios. My feedback includes a couple of suggestions to improve maintainability.
def _create_synthetic_response_for_orphaned_calls(
    event: Event,
    orphaned_calls: list[types.FunctionCall],
) -> Event:
  """Create synthetic error responses for orphaned function calls."""
  error_response = {'error': 'Tool execution was interrupted.'}
  parts: list[types.Part] = []

  for func_call in orphaned_calls:
    logger.warning(
        'Auto-healing orphaned function_call (id=%s, name=%s). '
        'This indicates execution was interrupted before tool completion.',
        func_call.id,
        func_call.name,
    )
    part = types.Part.from_function_response(
        name=func_call.name,
        response=error_response,
    )
    part.function_response.id = func_call.id
    parts.append(part)

  return Event(
      invocation_id=event.invocation_id,
      author=event.author,
      content=types.Content(role='user', parts=parts),
      branch=event.branch,
  )
In this new function, the error response dictionary (line 85) and the warning log message format string (lines 89-91) are hardcoded. It's a best practice to define these as module-level constants to improve maintainability, reusability, and readability.
For example:
# At module level
_SYNTHETIC_ERROR_RESPONSE = {'error': 'Tool execution was interrupted.'}
_ORPHANED_CALL_HEALING_WARNING_MSG = (
    'Auto-healing orphaned function_call (id=%s, name=%s). '
    'This indicates execution was interrupted before tool completion.'
)

You can then reference these constants within the _create_synthetic_response_for_orphaned_calls function.
Code Review
This pull request provides a well-designed fix for a critical issue where orphaned function calls could lead to unrecoverable crash loops. The approach of injecting synthetic error responses is robust, and integrating it into the existing _rearrange_events_for_async_function_responses_in_history function is an efficient choice that avoids redundant processing. The changes are clearly documented, and the new unit tests are comprehensive, covering various scenarios including single, multiple, and partial orphans, which ensures the fix is both effective and safe from regressions. Overall, this is a high-quality contribution. I have a couple of minor suggestions to make the tests even more specific.
    synthetic_response = synthetic_content.parts[0].function_response
    assert synthetic_response.id == "orphaned_123"
    assert synthetic_response.name == "get_weather"
    assert "error" in synthetic_response.response
This assertion correctly checks for the presence of an error key, but it could be more specific by also validating its value. Asserting against the exact error response dictionary (_ORPHANED_CALL_ERROR_RESPONSE) will make the test more robust and ensures the correct error message is injected.
| assert "error" in synthetic_response.response | |
| assert synthetic_response.response == contents._ORPHANED_CALL_ERROR_RESPONSE |
    # Fourth content should be the synthetic response for orphaned call
    synthetic_content = llm_request.contents[3]
    assert synthetic_content.parts[0].function_response.id == "orphaned_456"
    assert "error" in synthetic_content.parts[0].function_response.response
Similar to the suggestion in the other test, this assertion can be made more specific. By comparing against the exact error response dictionary, the test becomes more robust and clearly documents the expected error structure and message for an orphaned call.
| assert "error" in synthetic_content.parts[0].function_response.response | |
| assert synthetic_content.parts[0].function_response.response == contents._ORPHANED_CALL_ERROR_RESPONSE |
Force-pushed from f6b31dd to 9c50f84
Code Review
This is an excellent pull request that introduces a robust "auto-healing" mechanism for orphaned function calls. The solution effectively prevents unrecoverable session crash loops by injecting synthetic error responses, a critical fix for integrations with APIs like Anthropic and OpenAI. The chosen implementation approach is efficient, integrating the logic into an existing event processing function to avoid redundant work. The changes are well-contained, backward-compatible, and thoroughly validated with a comprehensive suite of new unit tests that cover various success and failure scenarios. The code is clear, well-documented, and demonstrates a thoughtful approach to solving a tricky state-related problem.
Force-pushed from 9c50f84 to a9643c1
When execution is interrupted (e.g., server restart or connection loss) after a function_call but before the function_response is saved, the session becomes unrecoverable because Anthropic/OpenAI require tool_calls to be immediately followed by tool_results. This change detects orphaned function_calls and injects synthetic error responses to gracefully recover the session.

Changes:
- Add _ORPHANED_CALL_ERROR_RESPONSE constant for error responses
- Add _create_synthetic_response_for_orphaned_calls helper function
- Extend _rearrange_events_for_async_function_responses_in_history with heal_orphaned_calls parameter
- Add 5 comprehensive test cases for auto-healing behavior

Fixes google#3971
Force-pushed from a9643c1 to f68c225
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
2. Or, if no issue exists, describe the change:
Problem:
When execution is interrupted (e.g., server restart, browser refresh, connection loss) after a function_call but before the function_response is saved, the session becomes permanently unrecoverable. Anthropic and OpenAI APIs require tool_calls to be immediately followed by tool_results, so subsequent requests fail with 400 BadRequest, creating a crash loop.

Solution:

Detect orphaned function_calls (calls without matching responses) during content processing and inject synthetic error responses to gracefully recover the session.
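To make the failure and the fix concrete, here is a purely illustrative example of what a healed history looks like, built with google.genai types. The id, tool name, and error payload come from this PR's tests; the user text and args are made up, and the flat list is only an illustration, not ADK's internal representation:

from google.genai import types

history = [
    types.Content(role='user', parts=[types.Part(text='What is the weather in Paris?')]),
    # The model issued a function_call, but execution was interrupted before
    # the tool result was saved, leaving the call orphaned.
    types.Content(
        role='model',
        parts=[
            types.Part(
                function_call=types.FunctionCall(
                    id='orphaned_123', name='get_weather', args={'city': 'Paris'}
                )
            )
        ],
    ),
    # Auto-healing injects a synthetic error response so the next request
    # satisfies the strict call/response pairing instead of failing with
    # 400 BadRequest.
    types.Content(
        role='user',
        parts=[
            types.Part(
                function_response=types.FunctionResponse(
                    id='orphaned_123',
                    name='get_weather',
                    response={'error': 'Tool execution was interrupted.'},
                )
            )
        ],
    ),
]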
Why this approach:
Two approaches were considered:

1. Separate _find_orphaned_function_calls() + _create_synthetic_response_event() functions, called after _rearrange_events_for_async_function_responses_in_history()
2. Extend _rearrange_events_for_async_function_responses_in_history() with a heal_orphaned_calls param and detect orphaned calls during the existing loop, reusing the function_call_id_to_response_events_index mapping (no duplicate iteration, better performance)

Decision: Chose Approach 2 for the following reasons:

- _rearrange_events_for_async_function_responses_in_history() already builds a function_call_id_to_response_events_index mapping - reusing it avoids redundant work
- heal_orphaned_calls=False default maintains backward compatibility

A simplified sketch of the detection pass is shown below.
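The following is a simplified, hypothetical sketch of that detection pass. It assumes the index built by the rearrangement loop is keyed by function_call id, as described above; it is not the actual ADK implementation, just the shape of the check:

from typing import Any

from google.adk.events import Event
from google.genai import types


def find_orphaned_calls(
    events: list[Event],
    call_id_to_response_index: dict[str, Any],
) -> list[tuple[Event, list[types.FunctionCall]]]:
  """Collects function_calls whose ids never received a function_response."""
  orphaned: list[tuple[Event, list[types.FunctionCall]]] = []
  for event in events:
    if not event.content or not event.content.parts:
      continue
    calls = [p.function_call for p in event.content.parts if p.function_call]
    # A call is orphaned when no event in history carries a matching response id.
    missing = [c for c in calls if c.id not in call_id_to_response_index]
    if missing:
      orphaned.append((event, missing))
  return orphaned

Each (event, calls) pair found this way would then be handed to _create_synthetic_response_for_orphaned_calls to produce the synthetic response event.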
Testing Plan

Unit Tests:

$ uv run pytest tests/unittests/flows/llm_flows/test_contents_function.py -v
# 12 passed (7 existing + 5 new)

New test cases:

- test_auto_healing_single_orphaned_function_call - single orphaned call
- test_auto_healing_multiple_orphaned_function_calls - multiple orphaned calls in one event
- test_auto_healing_partial_orphaned_function_calls - mix of completed and orphaned calls
- test_auto_healing_no_healing_when_responses_exist - no false positives
- test_auto_healing_logs_warning - warning log verification

Manual End-to-End (E2E) Tests:
Reproduced the issue using a test script that sends broken message history (tool_call without tool_result) to Anthropic/OpenAI/Gemini APIs. Before the fix, all non-Gemini models returned 400 BadRequest. After the fix, the synthetic error response allows the session to continue.
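For readers who want the shape of these tests without opening the diff, here is a condensed, hypothetical variant of the single-orphan case that calls the helper directly; the actual tests in this PR drive the full content-processing path instead, and the test name below is made up:

from google.adk.events import Event
from google.adk.flows.llm_flows import contents
from google.genai import types


def test_synthetic_response_for_single_orphan():
  orphan = types.FunctionCall(id='orphaned_123', name='get_weather', args={})
  source_event = Event(
      invocation_id='inv_1',
      author='model',
      content=types.Content(role='model', parts=[types.Part(function_call=orphan)]),
  )
  healed = contents._create_synthetic_response_for_orphaned_calls(
      source_event, [orphan]
  )
  response = healed.content.parts[0].function_response
  assert response.id == 'orphaned_123'
  assert response.name == 'get_weather'
  assert response.response == {'error': 'Tool execution was interrupted.'}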
Checklist
Additional context
Changes:
- src/google/adk/flows/llm_flows/contents.py: add the _create_synthetic_response_for_orphaned_calls() helper, extend _rearrange_events_for_async_function_responses_in_history() with a heal_orphaned_calls parameter
- tests/unittests/flows/llm_flows/test_contents_function.py: add 5 new auto-healing test cases

Key implementation details:
- Synthetic error response is {'error': 'Tool execution was interrupted.'} (follows the existing error response pattern in the codebase)
- Each healed call logs a warning, Auto-healing orphaned function_call (id=..., name=...), for debugging/monitoring
- Healing runs during _get_contents() content processing

Known Limitations & Future Work:
Synthetic responses not persisted to session: Generated at LLM request time only, not saved to session storage. UI/logs/telemetry may still show orphaned calls as "pending". Future consideration: should synthetic events be persisted? This requires a policy decision affecting session history integrity, replay scenarios, and multi-client sync.
Repeated warning logs: logger.warning() is emitted each time _get_contents() processes an orphaned call. If the session resumes multiple times before progressing, the same warning repeats. Future options: persist synthetic responses, deduplicate by call ID, or demote to logger.info() after the first occurrence (a sketch of the deduplication option is shown below).

These are intentionally left for future PRs to keep this fix focused and minimal.
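For illustration only, the deduplicate-by-call-ID option could look roughly like the sketch below; the module-level set and helper name are hypothetical and not part of this change:

import logging

logger = logging.getLogger(__name__)

# Hypothetical: remember which orphaned call ids have already been reported so
# repeated session resumes log each orphan only once.
_warned_orphan_call_ids: set[str] = set()


def _warn_once_for_orphaned_call(call_id: str, name: str) -> None:
  if call_id in _warned_orphan_call_ids:
    return
  _warned_orphan_call_ids.add(call_id)
  logger.warning(
      'Auto-healing orphaned function_call (id=%s, name=%s). '
      'This indicates execution was interrupted before tool completion.',
      call_id,
      name,
  )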