run_live() HITL behavior inconsistency between Text and Audio modes #4002

@bashiryounis

Description

When using run_live() with LongRunningFunctionTool and request_confirmation(), behavior differs between text and audio input modes in two ways.

Environment

  • ADK Version: latest
  • Python Version: 3.12
  • Models tested:
    • gemini-2.0-flash-live (text mode)
    • gemini-2.5-flash-native-audio-preview-09-2025 (audio mode)
  • Streaming Mode: StreamingMode.BIDI
  • Tool Configuration: LongRunningFunctionTool(func=book_hotel)

Issue 1: HITL Detection - long_running_tool_ids not populated in audio mode

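| Mode  | long_running_tool_ids in event | Detection method                          |
|-------|--------------------------------|-------------------------------------------|
| Text  | ✅ Populated                   | Detected via event.long_running_tool_ids  |
| Audio | ❌ Not populated (always None) | Must fall back to polling session.events  |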

When using text input via send_content(), the long_running_tool_ids field is correctly populated in the yielded events, allowing real-time HITL detection.

When using audio input via send_realtime(), the long_running_tool_ids field is NOT populated in events. The tool executes (confirmed via logs inside the tool function), but the HITL indicator is missing from the event stream.

Log evidence (audio mode):
[Downstream DEBUG] Event #336 received
[Downstream DEBUG] - author: live_hotel_booking_agent
[Downstream DEBUG] - long_running_tool_ids: None
[Downstream DEBUG] - actions: ... requested_tool_confirmations={}
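
For reference, the real-time check that works in text mode can be sketched as below. This is a minimal illustration, not ADK code: the helper names are hypothetical, the events are stand-in objects rather than real ADK events, and "call-1" is a made-up tool call ID. The only assumption about real events is that they carry a long_running_tool_ids attribute that is either None or a collection of IDs, as the logs above suggest.

```python
from types import SimpleNamespace
from typing import Any


def long_running_ids(event: Any) -> set[str]:
    """Return the event's long_running_tool_ids as a set (empty if None or missing)."""
    return set(getattr(event, "long_running_tool_ids", None) or ())


def is_hitl_pause(event: Any) -> bool:
    """True when the event flags at least one long-running tool call."""
    return bool(long_running_ids(event))


# Stand-in events mirroring the two cases described above (hypothetical data).
text_mode_event = SimpleNamespace(long_running_tool_ids={"call-1"})
audio_mode_event = SimpleNamespace(long_running_tool_ids=None)

print(is_hitl_pause(text_mode_event))   # True
print(is_hitl_pause(audio_mode_event))  # False
```

In audio mode every yielded event looks like audio_mode_event, so this check never fires even though the tool has run.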

Issue 2: FunctionResponse not processed in audio mode

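| Mode  | FunctionResponse via send_content() | Result                                                       |
|-------|-------------------------------------|--------------------------------------------------------------|
| Text  | ✅ Works                            | Agent receives confirmation, processes it, returns response  |
| Audio | ❌ Does not work                    | Agent returns empty turnComplete, no response generated      |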

When resuming after HITL approval:

Text mode: Sending a FunctionResponse via send_content() works correctly. The agent processes the confirmation and continues with its response.

Audio mode: Sending the same FunctionResponse via send_content() does NOT work. The agent acknowledges the turn but returns empty content.

Log evidence (audio mode after sending FunctionResponse):

[HITL Handler] Sending FunctionResponse to live_request_queue: approved=True
Deprecation warning shows the message was sent:
DeprecationWarning: The session.send method is deprecated... Please use one of the more specific methods: send_client_content, send_realtime_input, or send_tool_response instead.
But agent returns empty turnComplete:
[Downstream DEBUG] Event #338 received
[Downstream DEBUG] - author: live_hotel_booking_agent
[Downstream DEBUG] - long_running_tool_ids: None
[Downstream DEBUG] - actions: ... requestedToolConfirmations: {}
[Downstream DEBUG] - content: None or no parts
[Downstream DEBUG] - AFTER HITL (hitl was at event #336)
Final event shows turnComplete but NO response content:
{'turnComplete': True, 'invocationId': '...', 'author': 'live_hotel_booking_agent', ...}

Potential Root Cause

The deprecation warning suggests using send_tool_response for FunctionResponse:
Please use one of the more specific methods: send_client_content, send_realtime_input, or send_tool_response instead.

However, LiveRequestQueue only exposes these methods:

  • send_content() - for text/content
  • send_realtime() - for audio blobs
  • send() - deprecated generic method

send_tool_response() is NOT available on LiveRequestQueue, so there's no way to properly send a FunctionResponse in audio mode.
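
One way to confirm which send methods a given queue object exposes is plain introspection. The sketch below is an assumption-laden illustration: exposed_send_methods is a hypothetical helper, and FakeQueue is a stand-in that only mirrors the three methods listed above, not the real LiveRequestQueue.

```python
def exposed_send_methods(obj: object) -> list[str]:
    """List callable attributes of obj whose names start with 'send', sorted."""
    return sorted(
        name
        for name in dir(obj)
        if name.startswith("send") and callable(getattr(obj, name, None))
    )


class FakeQueue:
    """Hypothetical stand-in exposing the three methods listed above."""

    def send(self, req): ...

    def send_content(self, content): ...

    def send_realtime(self, blob): ...


methods = exposed_send_methods(FakeQueue())
print(methods)                          # ['send', 'send_content', 'send_realtime']
print("send_tool_response" in methods)  # False
```

Passing a real LiveRequestQueue instance instead of FakeQueue would show what the installed ADK version actually exposes, which may differ from the list in this report.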

Expected Behavior

  1. long_running_tool_ids should be populated in events for both text and audio input modes
  2. FunctionResponse sent via send_content() should be processed by the agent in both modes
  3. Alternatively, LiveRequestQueue should expose a send_tool_response() method for sending a FunctionResponse

Workarounds Currently Used

  1. For HITL detection: Poll session.events for requested_tool_confirmations as a fallback (works for both modes)
  2. For audio resume: No working workaround found - FunctionResponse is ignored in audio mode
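
The polling fallback from workaround 1 can be sketched roughly as follows. The helper name, the start parameter, and the stand-in events are hypothetical; the sketch only assumes that each stored event has an actions object with a requested_tool_confirmations mapping, as seen in the logs. Re-fetching the session from the session service on each poll is left out.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def pending_confirmations(events: Iterable[Any], start: int = 0) -> list[tuple[int, dict]]:
    """Return (index, confirmations) for each event at or after `start`
    whose actions.requested_tool_confirmations is non-empty."""
    found = []
    for index, event in enumerate(events):
        if index < start:
            continue
        actions = getattr(event, "actions", None)
        confirmations = getattr(actions, "requested_tool_confirmations", None)
        if confirmations:
            found.append((index, dict(confirmations)))
    return found


# Stand-in for session.events (hypothetical data, not real ADK events).
events = [
    SimpleNamespace(actions=SimpleNamespace(requested_tool_confirmations={})),
    SimpleNamespace(
        actions=SimpleNamespace(
            requested_tool_confirmations={"call-1": {"hint": "Please confirm the booking"}}
        )
    ),
]

print(pending_confirmations(events))
# [(1, {'call-1': {'hint': 'Please confirm the booking'}})]
```

Tracking the last scanned index and passing it as start avoids reporting the same confirmation on every poll.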

Reproduction Steps

  1. Create an agent with LongRunningFunctionTool:
from google.adk.agents import Agent
from google.adk.tools.long_running_tool import LongRunningFunctionTool

live_booking_agent = Agent(
    name="live_hotel_booking_agent",  # agent name as it appears in the logs above
    model="gemini-2.5-flash-native-audio-preview-09-2025",
    tools=[LongRunningFunctionTool(func=book_hotel)],
)
  2. The tool calls request_confirmation():
from google.adk.tools.tool_context import ToolContext

def book_hotel(tool_context: ToolContext, ...):
    # Ask for human approval before the booking is finalized.
    tool_context.request_confirmation(
        hint="Please confirm the booking",
        payload={"status": "pending", "invoice": {...}},
    )
    # Return immediately; the booking stays pending until it is approved.
    return {"status": "pending"}
  3. Use run_live() with audio input via send_realtime()
  4. Observe that long_running_tool_ids is None in all events
  5. Send a FunctionResponse via send_content() after external approval
  6. Observe that the agent returns an empty turnComplete without processing the response

Labels: live ([Component] This issue is related to live, voice and video chat)