[BUG]: optimize_data_file=True does not persist inline_data replacement - CSV sent on every LLM call #4012

@osushinekotan

Description

Describe the bug
When optimize_data_file=True is set on a custom CodeExecutor (extending BaseCodeExecutor), the inline_data (CSV file) is not permanently replaced with the text placeholder "Available file: 'xxx.csv'". Instead, the original inline_data is restored on every subsequent LLM call, causing the full CSV to be sent to the LLM every turn.

This means the token-saving feature of optimize_data_file does not work as intended.

To Reproduce

  1. Create a custom CodeExecutor extending BaseCodeExecutor with optimize_data_file=True
  2. Upload a CSV file and send a message to the agent
  3. Observe debug logs showing inline_data being replaced with "Available file: ..." in _extract_and_replace_inline_files
  4. On the next turn, observe that inline_data has returned (the full CSV is sent again)

Debug logs show:

=== AFTER _extract_and_replace_inline_files ===
[0][1] replaced with: Available file: `data_1_2.csv`

=== LLM Request Contents (in before_model_callback) ===
[0][1] inline_data: mime=text/csv, size=61194   ← inline_data is back!

Additionally, comparing object IDs reveals that llm_request.contents[0] is a different object between _extract_and_replace_inline_files and before_model_callback:

=== _extract_and_replace_inline_files ===
contents[0] id: 5998228768   ← replaced this object

=== before_model_callback ===
contents[0] id: 4631206624   ← different object!
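The differing object ids are consistent with `copy.deepcopy` semantics. A minimal self-contained sketch (plain dicts standing in for the ADK content types, which are not reproduced here) shows why a mutation made on the copy never reaches the original:

```python
import copy

# Plain-dict stand-in for an event's content; the real ADK types differ.
original = {"parts": [{"inline_data": {"mime": "text/csv", "data": b"a,b\n1,2\n"}}]}

# Rebuilding a request deep-copies the content, yielding a new object graph...
rebuilt = copy.deepcopy(original)
print(id(original) != id(rebuilt))  # True: different ids, as in the logs above

# ...so replacing inline_data on the copy leaves the original untouched.
rebuilt["parts"][0] = {"text": "Available file: `data_1_2.csv`"}
print("inline_data" in original["parts"][0])  # True: the CSV is still in the original
```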

Expected behavior
Once _extract_and_replace_inline_files replaces inline_data with "Available file: 'xxx.csv'", this replacement should persist for all subsequent LLM calls in the session. The full CSV should not be sent to the LLM on every turn.
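One way to visualize the expected persistence (a sketch with hypothetical plain-dict stand-ins, not a proposed patch): if the placeholder were applied to the session-level content rather than only to a per-turn copy, every rebuilt request would keep it:

```python
import copy

# Hypothetical stand-in for session.events; the real ADK types differ.
session_events = [{"content": {"parts": [{"inline_data": {"mime": "text/csv"}}]}}]

# Expected: the replacement reaches the source of truth, not just a copy.
session_events[0]["content"]["parts"][0] = {"text": "Available file: `data_1_2.csv`"}

# Any later rebuild of the request contents now preserves the placeholder.
contents = [copy.deepcopy(e["content"]) for e in session_events]
print(contents[0]["parts"][0])  # {'text': 'Available file: `data_1_2.csv`'}
```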

Root Cause (Hypothesis)
Based on code analysis, the suspected cause is:

  1. _code_execution.request_processor yields Events (e.g. for "Processing input file" and code_execution_result)
  2. These Events do not satisfy is_final_response(), so the while True loop in run_async() (base_llm_flow.py:367) continues
  3. _run_one_step_async() creates a fresh LlmRequest() on each iteration (base_llm_flow.py:383)
  4. contents.request_processor rebuilds llm_request.contents by calling copy.deepcopy(event.content) on each event in session.events (contents.py:444)
  5. Because the original session.events[0].content was never modified (only the deep copy in llm_request.contents was), the inline_data is restored on every rebuild
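The steps above can be simulated with plain data structures (hypothetical names, not the actual ADK classes): each "turn" rebuilds the request contents by deep-copying the unmodified session events, so the replacement made on the previous turn's copy is lost:

```python
import copy

# Hypothetical stand-ins for session.events and llm_request.contents.
session_events = [{"content": {"parts": [{"inline_data": {"mime": "text/csv", "size": 61194}}]}}]

def build_llm_request_contents(events):
    # Mirrors step 4: deep-copy each event's content into a fresh request.
    return [copy.deepcopy(e["content"]) for e in events]

# Turn 1: the replacement lands on the per-request copy only.
contents = build_llm_request_contents(session_events)
contents[0]["parts"][0] = {"text": "Available file: `data_1_2.csv`"}

# Turn 2: contents are rebuilt from the never-modified session events,
# so the full inline_data reappears (step 5).
contents = build_llm_request_contents(session_events)
print("inline_data" in contents[0]["parts"][0])  # True: the CSV is back
```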

Desktop (please complete the following information):

  • OS: macOS
  • Python version: 3.12
  • ADK version: 1.21.0

Model Information:

  • Are you using LiteLLM: Yes
  • Which model is being used: Claude (via Bedrock)

Labels

core [Component]: This issue is related to the core interface and implementation
