Conversation

@donggyun112

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

2. Or, if no issue exists, describe the change:

Problem:

When execution is interrupted (e.g., server restart, browser refresh, connection loss) after a function_call but before the function_response is saved, the session becomes permanently unrecoverable. Anthropic and OpenAI APIs require tool_calls to be immediately followed by tool_results, so subsequent requests fail with 400 BadRequest, creating a crash loop.

Solution:

Detect orphaned function_calls (calls without matching responses) during content processing and inject synthetic error responses to gracefully recover the session.
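To make the failure mode concrete, here is a minimal sketch of orphan detection on a simplified event history. Plain dicts stand in for ADK's `Event` type, and `find_orphaned_call_ids` is a hypothetical illustrative helper, not the PR's actual code:

```python
def find_orphaned_call_ids(events):
  """Return IDs of function_calls that have no matching function_response."""
  call_ids = set()
  response_ids = set()
  for event in events:
    for part in event.get('parts', []):
      if 'function_call' in part:
        call_ids.add(part['function_call']['id'])
      if 'function_response' in part:
        response_ids.add(part['function_response']['id'])
  return call_ids - response_ids


# A history interrupted after the call but before the response was saved:
history = [
    {'author': 'user', 'parts': [{'text': 'What is the weather?'}]},
    {'author': 'model', 'parts': [
        {'function_call': {'id': 'call_1', 'name': 'get_weather'}},
    ]},
    # <- execution interrupted here; no function_response event exists
]

orphans = find_orphaned_call_ids(history)
print(orphans)  # {'call_1'}
```

Sending a history like this to an API that enforces strict call/response pairing is what produces the 400 BadRequest loop described above.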

Why this approach:

Two approaches were considered:

| Approach | Description | Pros | Cons |
| --- | --- | --- | --- |
| 1. Separation of Concerns | Separate `_find_orphaned_function_calls()` + `_create_synthetic_response_event()` functions, called after `_rearrange_events_for_async_function_responses_in_history()` | Clear responsibility separation, easier to test independently, self-documenting code | Extra O(N) event iteration, duplicates ID mapping logic already in the rearrange function |
| 2. Single-Pass Integration | Extend `_rearrange_events_for_async_function_responses_in_history()` with a `heal_orphaned_calls` param, detect orphaned calls during the existing loop | Reuses the existing `function_call_id_to_response_events_index` mapping, no duplicate iteration, better performance | Slightly increases function complexity, mixed responsibilities |

Decision: Chose Approach 2 for the following reasons:

  1. The existing _rearrange_events_for_async_function_responses_in_history() already builds a function_call_id_to_response_events_index mapping; reusing it avoids redundant work
  2. Orphaned call detection is logically part of the "rearrangement" process (pairing calls with responses)
  3. Avoids extra O(N) iteration over events
  4. The heal_orphaned_calls=False default maintains backward compatibility
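A rough sketch of the chosen single-pass shape, again on simplified dict events. The real change lives in `_rearrange_events_for_async_function_responses_in_history()`; `rearrange_events` below is an illustrative stand-in, and appending the synthetic event at the end is a simplification of the actual placement:

```python
import logging

logger = logging.getLogger(__name__)

_ORPHANED_CALL_ERROR_RESPONSE = {'error': 'Tool execution was interrupted.'}


def rearrange_events(events, heal_orphaned_calls=False):
  """Pair function_calls with responses; optionally heal orphans (sketch)."""
  # Stands in for the existing function_call_id_to_response_events_index
  # mapping: one pass collects every response id that exists in the history.
  response_ids = {
      part['function_response']['id']
      for event in events
      for part in event.get('parts', [])
      if 'function_response' in part
  }
  result = list(events)
  for event in events:
    orphaned = [
        part['function_call']
        for part in event.get('parts', [])
        if 'function_call' in part
        and part['function_call']['id'] not in response_ids
    ]
    if heal_orphaned_calls and orphaned:
      parts = []
      for call in orphaned:
        logger.warning(
            'Auto-healing orphaned function_call (id=%s, name=%s)',
            call['id'],
            call['name'],
        )
        parts.append({'function_response': {
            'id': call['id'],
            'name': call['name'],
            'response': _ORPHANED_CALL_ERROR_RESPONSE,
        }})
      result.append({'author': 'user', 'parts': parts})
  return result


history = [
    {'author': 'user', 'parts': [{'text': 'What is the weather?'}]},
    {'author': 'model', 'parts': [
        {'function_call': {'id': 'call_1', 'name': 'get_weather'}},
    ]},
]

# Default (heal_orphaned_calls=False) leaves the history untouched,
# which is the backward-compatibility point in reason 4 above.
assert rearrange_events(history) == history

healed = rearrange_events(history, heal_orphaned_calls=True)
print(len(healed))  # 3: the original two events plus one synthetic response event
```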

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.
```shell
$ uv run pytest tests/unittests/flows/llm_flows/test_contents_function.py -v
# 12 passed (7 existing + 5 new)
```

New test cases:

  • test_auto_healing_single_orphaned_function_call - single orphaned call
  • test_auto_healing_multiple_orphaned_function_calls - multiple orphaned calls in one event
  • test_auto_healing_partial_orphaned_function_calls - mix of completed and orphaned calls
  • test_auto_healing_no_healing_when_responses_exist - no false positives
  • test_auto_healing_logs_warning - warning log verification

Manual End-to-End (E2E) Tests:

Reproduced the issue using a test script that sends broken message history (tool_call without tool_result) to Anthropic/OpenAI/Gemini APIs. Before the fix, all non-Gemini models returned 400 BadRequest. After the fix, the synthetic error response allows the session to continue.

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Changes:

| File | Description |
| --- | --- |
| `src/google/adk/flows/llm_flows/contents.py` | Add `_create_synthetic_response_for_orphaned_calls()` helper, extend `_rearrange_events_for_async_function_responses_in_history()` with a `heal_orphaned_calls` parameter |
| `tests/unittests/flows/llm_flows/test_contents_function.py` | Add 5 test cases for auto-healing behavior |

Key implementation details:

  1. Synthetic response format: {'error': 'Tool execution was interrupted.'} (follows existing error response pattern in codebase)
  2. Warning log: Auto-healing orphaned function_call (id=..., name=...) for debugging/monitoring
  3. Location: As specified by maintainer, detection occurs around line 445 in _get_contents()

Known Limitations & Future Work:

  1. Synthetic responses not persisted to session: Generated at LLM request time only, not saved to session storage. UI/logs/telemetry may still show orphaned calls as "pending". Future consideration: should synthetic events be persisted? This requires a policy decision affecting session history integrity, replay scenarios, and multi-client sync.

  2. Repeated warning logs: logger.warning() is emitted each time _get_contents() processes an orphaned call. If the session resumes multiple times before progressing, the same warning repeats. Future options: persist synthetic responses, deduplicate by call ID, or demote to logger.info() after the first occurrence.

These are intentionally left for future PRs to keep this fix focused and minimal.
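For limitation 2, one possible shape for the "deduplicate by call ID" option, sketched as a hypothetical helper that is not part of this PR:

```python
import logging

logger = logging.getLogger(__name__)

# Process-local memory of call ids we have already warned about.
_warned_call_ids: set[str] = set()


def warn_orphaned_call_once(call_id: str, name: str) -> bool:
  """Log at WARNING on first sight of a call id, at DEBUG afterwards.

  Returns True if a WARNING was emitted (convenient for tests).
  """
  if call_id in _warned_call_ids:
    logger.debug(
        'Orphaned function_call seen again (id=%s, name=%s)', call_id, name
    )
    return False
  _warned_call_ids.add(call_id)
  logger.warning(
      'Auto-healing orphaned function_call (id=%s, name=%s)', call_id, name
  )
  return True


first = warn_orphaned_call_once('call_1', 'get_weather')
second = warn_orphaned_call_once('call_1', 'get_weather')
print(first, second)  # True False
```

A process-local set loses its memory on restart, which is exactly the interruption scenario this PR handles, so persisting synthetic responses remains the more complete option.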

@google-cla

google-cla bot commented Dec 31, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist
Contributor

Summary of Changes

Hello @donggyun112, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a significant stability issue where incomplete function call executions could lead to persistent session failures and API errors. By introducing an intelligent auto-healing mechanism, the system can now detect these unresponded function calls and automatically inject synthetic error messages, allowing the conversation flow to continue without crashing. This enhancement improves the robustness and user experience of interactions involving tool use with LLMs.

Highlights

  • Problem Addressed: Fixes a critical issue where interrupted function call executions (e.g., server restart, connection loss) leave sessions in an unrecoverable state, leading to 400 BadRequest crash loops with APIs like Anthropic and OpenAI.
  • Solution Implemented: Introduces an auto-healing mechanism that detects 'orphaned' function calls (calls without matching responses) during content processing and injects synthetic error responses to gracefully recover the session.
  • Implementation Approach: The auto-healing logic is integrated into the existing _rearrange_events_for_async_function_responses_in_history() function, reusing existing ID mapping and avoiding redundant iterations for better performance.
  • New Helper Function: A new helper function, _create_synthetic_response_for_orphaned_calls(), was added to generate the standardized error response {'error': 'Tool execution was interrupted.'}.
  • Testing: Five new unit tests were added to cover various scenarios including single, multiple, and partial orphaned calls, ensuring no false positives, and verifying warning log generation. Manual end-to-end tests confirmed the fix across different LLM APIs.


@adk-bot added the core label on Dec 31, 2025
@adk-bot
Collaborator

adk-bot commented Dec 31, 2025

Response from ADK Triaging Agent

Hello @donggyun112, thank you for your contribution!

It looks like the Contributor License Agreement (CLA) check has failed. Before we can merge this pull request, you'll need to sign the CLA. You can find more information and sign it at https://cla.developers.google.com/.

Once the CLA is signed, we can proceed with the review. Thanks!

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a robust solution to handle orphaned function calls, preventing potential crash loops. The implementation is well-integrated into the existing logic, and the choice to extend _rearrange_events_for_async_function_responses_in_history is well-justified for performance reasons. The addition of comprehensive unit tests covering various scenarios, including partial failures and logging, is commendable and ensures the reliability of the fix. I have a couple of minor suggestions to improve maintainability and code style, but overall, this is an excellent contribution.

```python
    orphaned_calls: list[types.FunctionCall],
) -> Event:
  """Create synthetic error responses for orphaned function calls."""
  error_response = {'error': 'Tool execution was interrupted.'}
```
Contributor


medium

For better maintainability, it's recommended to define this hardcoded dictionary as a module-level constant (e.g., _ORPHANED_CALL_ERROR_RESPONSE). This makes the error message easier to find and modify, and promotes consistency. Please define the constant at the module level and reference it here.

Suggested change

```diff
- error_response = {'error': 'Tool execution was interrupted.'}
+ error_response = _ORPHANED_CALL_ERROR_RESPONSE
```

```python
async def test_auto_healing_logs_warning(caplog):
  """Test that auto-healing logs a warning for each orphaned call."""
  import logging
```

Contributor


medium

According to PEP 8, imports should be at the top of the file. Please move import logging to the file's top level to adhere to standard Python style conventions.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a robust mechanism to handle "orphaned" function calls, which occur when an execution is interrupted after a tool call is made but before its response is recorded. By injecting synthetic error responses, this change prevents sessions from entering unrecoverable crash loops with APIs that enforce strict call-response pairing. The implementation is cleanly integrated into the existing event processing logic for efficiency, and it is supported by a comprehensive set of new unit tests that validate the healing behavior across various scenarios. My feedback includes a couple of suggestions to improve maintainability.

Comment on lines 80 to 110

```python
def _create_synthetic_response_for_orphaned_calls(
    event: Event,
    orphaned_calls: list[types.FunctionCall],
) -> Event:
  """Create synthetic error responses for orphaned function calls."""
  error_response = {'error': 'Tool execution was interrupted.'}
  parts: list[types.Part] = []

  for func_call in orphaned_calls:
    logger.warning(
        'Auto-healing orphaned function_call (id=%s, name=%s). '
        'This indicates execution was interrupted before tool completion.',
        func_call.id,
        func_call.name,
    )
    part = types.Part.from_function_response(
        name=func_call.name,
        response=error_response,
    )
    part.function_response.id = func_call.id
    parts.append(part)

  return Event(
      invocation_id=event.invocation_id,
      author=event.author,
      content=types.Content(role='user', parts=parts),
      branch=event.branch,
  )
```
Contributor


medium

In this new function, the error response dictionary (line 85) and the warning log message format string (lines 89-91) are hardcoded. It's a best practice to define these as module-level constants to improve maintainability, reusability, and readability.

For example:

```python
# At module level
_SYNTHETIC_ERROR_RESPONSE = {'error': 'Tool execution was interrupted.'}
_ORPHANED_CALL_HEALING_WARNING_MSG = (
    'Auto-healing orphaned function_call (id=%s, name=%s). '
    'This indicates execution was interrupted before tool completion.'
)
```

You can then reference these constants within the _create_synthetic_response_for_orphaned_calls function.

```python
@pytest.mark.asyncio
async def test_auto_healing_logs_warning(caplog):
  """Test that auto-healing logs a warning for each orphaned call."""
  import logging
```
Contributor


medium

The import logging statement is located inside the test function. Per PEP 8 style guidelines, all imports should be placed at the top of the file. This improves code readability and makes dependencies clear. Please move this import to the file's header.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request provides a well-designed fix for a critical issue where orphaned function calls could lead to unrecoverable crash loops. The approach of injecting synthetic error responses is robust, and integrating it into the existing _rearrange_events_for_async_function_responses_in_history function is an efficient choice that avoids redundant processing. The changes are clearly documented, and the new unit tests are comprehensive, covering various scenarios including single, multiple, and partial orphans, which ensures the fix is both effective and safe from regressions. Overall, this is a high-quality contribution. I have a couple of minor suggestions to make the tests even more specific.

```python
  synthetic_response = synthetic_content.parts[0].function_response
  assert synthetic_response.id == "orphaned_123"
  assert synthetic_response.name == "get_weather"
  assert "error" in synthetic_response.response
```
Contributor


medium

This assertion correctly checks for the presence of an error key, but it could be more specific by also validating its value. Asserting against the exact error response dictionary (_ORPHANED_CALL_ERROR_RESPONSE) will make the test more robust and ensures the correct error message is injected.

Suggested change

```diff
- assert "error" in synthetic_response.response
+ assert synthetic_response.response == contents._ORPHANED_CALL_ERROR_RESPONSE
```

```python
  # Fourth content should be the synthetic response for orphaned call
  synthetic_content = llm_request.contents[3]
  assert synthetic_content.parts[0].function_response.id == "orphaned_456"
  assert "error" in synthetic_content.parts[0].function_response.response
```
Contributor


medium

Similar to the suggestion in the other test, this assertion can be made more specific. By comparing against the exact error response dictionary, the test becomes more robust and clearly documents the expected error structure and message for an orphaned call.

Suggested change

```diff
- assert "error" in synthetic_content.parts[0].function_response.response
+ assert synthetic_content.parts[0].function_response.response == contents._ORPHANED_CALL_ERROR_RESPONSE
```

@donggyun112 force-pushed the fix/orphaned-tool-calls-crash-loop branch from f6b31dd to 9c50f84 on December 31, 2025 13:04
Contributor

@gemini-code-assist bot left a comment


Code Review

This is an excellent pull request that introduces a robust "auto-healing" mechanism for orphaned function calls. The solution effectively prevents unrecoverable session crash loops by injecting synthetic error responses, a critical fix for integrations with APIs like Anthropic and OpenAI. The chosen implementation approach is efficient, integrating the logic into an existing event processing function to avoid redundant work. The changes are well-contained, backward-compatible, and thoroughly validated with a comprehensive suite of new unit tests that cover various success and failure scenarios. The code is clear, well-documented, and demonstrates a thoughtful approach to solving a tricky state-related problem.

@donggyun112 force-pushed the fix/orphaned-tool-calls-crash-loop branch from 9c50f84 to a9643c1 on December 31, 2025 13:51
When execution is interrupted (e.g., server restart or connection loss)
after a function_call but before the function_response is saved, the
session becomes unrecoverable because Anthropic/OpenAI require tool_calls
to be immediately followed by tool_results.

This change detects orphaned function_calls and injects synthetic error
responses to gracefully recover the session.

Changes:
- Add _ORPHANED_CALL_ERROR_RESPONSE constant for error responses
- Add _create_synthetic_response_for_orphaned_calls helper function
- Extend _rearrange_events_for_async_function_responses_in_history with
  heal_orphaned_calls parameter
- Add 5 comprehensive test cases for auto-healing behavior

Fixes google#3971
@donggyun112 force-pushed the fix/orphaned-tool-calls-crash-loop branch from a9643c1 to f68c225 on December 31, 2025 13:53
@donggyun112 deleted the fix/orphaned-tool-calls-crash-loop branch on December 31, 2025 13:54
@donggyun112 restored the fix/orphaned-tool-calls-crash-loop branch on December 31, 2025 13:54
@ryanaiagent self-assigned this on Jan 1, 2026

Labels

core [Component] This issue is related to the core interface and implementation


Development

Successfully merging this pull request may close these issues.

Persistent crash loop caused by missing tool_result in conversation history after interrupted execution

3 participants