
feat: add ask_user tool with host-side LLM interception #1

Open
yash-scaleai wants to merge 3 commits into scaleapi:scale-customizations from yash-scaleai:yash/ask-user-host-interception

Conversation

yash-scaleai commented Jan 28, 2026

Summary

Adds an ask_user tool that allows agents to request clarification on underspecified tasks. The tool simulates user responses using an LLM based on the complete task definition.

Key changes:

  • Add ask_user tool in tools/ask_user/
  • Add TaskDefinitionInjectionHook to inject task definitions into containers
  • Auto-detect task_definitions.json in run_batch.py
  • Critical fix: Intercept ask_user commands on HOST side in agents.py

Why host-side interception?

The original implementation (Bryan's PR #17144) runs the LLM call inside the Modal container. However, Modal containers cannot reach internal API endpoints like litellm.ml-serving-internal.scale.com.

This PR intercepts ask_user commands in agents.py before they are sent to the container, makes the LLM call on the host (where the agent's own LLM calls are made), and returns the response directly.

Flow (a minimal code sketch follows the list):

1. Agent outputs: ask_user "what's the cert validation behavior?"
2. Host intercepts command in handle_action()
3. Host makes LLM call (can reach internal API)
4. Response returned directly to agent
5. Container's ask_user binary never runs
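
A minimal sketch of this short-circuit, in the spirit of the description above. `_parse_ask_user_command` and `_load_task_definitions` are named in the review; `_simulate_user_response` and `_forward_to_container` are hypothetical stand-ins for the surrounding logic:

```python
# Sketch only: hypothetical shape of the host-side short-circuit in
# DefaultAgent.handle_action(); the real code sets step.observation.
def handle_action(self, action: str) -> str:
    parsed = self._parse_ask_user_command(action)  # (question, context) or None
    if parsed is not None:
        question, context = parsed
        # Answer on the host: the container's ask_user binary can't reach
        # the internal LLM endpoint, so it never runs.
        task_defs = self._load_task_definitions(self._traj_path)
        return self._simulate_user_response(question, context, task_defs)
    # Normal path: forward the command to the container.
    return self._forward_to_container(action)
```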

Based on Bryan's PR #17144 with architectural modification for internal API compatibility.

Test plan

  • Single task test with scripts/test_ask_user_single.py
  • Verified ask_user interception in logs
  • Verified response in trajectory file

🤖 Generated with Claude Code

Greptile Summary

Adds an ask_user tool that lets agents request clarification on underspecified tasks by simulating user responses via an LLM. The key architectural decision is to intercept ask_user commands on the host side (in agents.py) rather than inside the container, since Modal containers cannot reach internal API endpoints.

  • Adds host-side interception in DefaultAgent.handle_action() that parses ask_user commands, loads task definitions from a JSON file, and calls litellm.completion to simulate a user response (a sketch of this call follows the list)
  • Adds TaskDefinitionInjectionHook to inject task definitions into containers (for the in-container fallback binary)
  • Auto-detects task_definitions.json in run_batch.py and registers the hook
  • Includes container-side ask_user binary and llm.py helper (effectively dead code since host intercepts first)
  • New tool_use_with_ask_user.yaml config variant
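
Expanding on the first bullet, a sketch of what the simulator call might look like. The `primary_task`/`underspecified_task` field names come from the sequence diagram below; the function name, prompt wording, and the `gpt-4o` fallback are assumptions:

```python
import os

import litellm

def _simulate_user_response(question: str, context: str | None, task_def: dict) -> str:
    # Build the simulated-user prompt from the full task definition
    # (field names per the sequence diagram; prompt text is illustrative).
    system = (
        "You are the user who wrote this task. Answer the agent's clarifying "
        "question using only the task definition below.\n\n"
        f"Primary task: {task_def['primary_task']}\n"
        f"Underspecified task: {task_def['underspecified_task']}"
    )
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": question + (f"\n\nContext: {context}" if context else "")},
    ]
    # USER_SIMULATOR_MODEL is read from the environment elsewhere in the PR;
    # the default here is a placeholder.
    model = os.environ.get("USER_SIMULATOR_MODEL", "gpt-4o")
    response = litellm.completion(model=model, messages=messages)
    return response.choices[0].message.content
```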

Issues found:

  • litellm.drop_params = True in agents.py:189 is a global mutation that permanently affects all litellm.completion calls in the process, including the main agent model calls. This is especially problematic in multi-worker mode.
  • Silent exception swallowing in _load_task_definitions (except Exception: pass) makes it impossible to debug task definition loading failures.
  • Unnecessary credential injection into the container environment (OPENAI_API_KEY, LLM_API_KEY) despite host-side interception making the in-container binary redundant — this is unnecessary credential exposure.

Confidence Score: 2/5

  • The litellm.drop_params global mutation can silently alter behavior of the main agent LLM calls across all workers.
  • Score of 2 reflects the global side-effect bug with litellm.drop_params which can affect correctness of the main agent model calls, combined with silent error swallowing that will make production debugging difficult. The core interception logic is sound but these issues need fixing before merge.
  • Pay close attention to sweagent/agent/agents.py (global state mutation, silent error handling) and sweagent/run/hooks/task_definition_injection.py (unnecessary credential exposure).

Important Files Changed

| Filename | Overview |
| --- | --- |
| sweagent/agent/agents.py | Adds ~180 lines for host-side ask_user interception. Contains a global side-effect bug (litellm.drop_params = True) that affects all LLM calls, and silently swallows task definition load errors. |
| sweagent/run/hooks/task_definition_injection.py | New RunHook that injects task definitions into containers. Unnecessarily injects API credentials into the container despite host-side interception making the in-container binary redundant. |
| sweagent/run/run_batch.py | Auto-detects task_definitions.json in output_dir and adds the TaskDefinitionInjectionHook. Clean integration following existing hook patterns. |
| tools/ask_user/bin/ask_user | Container-side ask_user binary (now effectively dead code since the host intercepts). Contains debug print statements that leak the API base URL to stderr. |
| tools/ask_user/bin/llm.py | Simple OpenAI client wrapper for the container-side tool. Module-level client singleton without thread safety, but acceptable since this only runs in a single-threaded container context. |
| tools/ask_user/config.yaml | Tool definition for ask_user with question (required) and context (optional) arguments. Correctly follows the existing tool config schema. |
| config/tool_use_with_ask_user.yaml | New agent config variant adding the ask_user tool bundle alongside existing tool bundles. Clean config following existing patterns. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent as Agent LLM
    participant Host as Host (agents.py)
    participant Cache as Task Definitions Cache
    participant LiteLLM as LiteLLM (User Simulator)
    participant Container as Container (never reached)

    Agent->>Host: ask_user "question" "context"
    Host->>Host: _parse_ask_user_command()
    Host->>Cache: _load_task_definitions(traj_path)
    Cache-->>Host: task_defs dict
    Host->>Host: Build system prompt with primary_task + underspecified_task
    Host->>LiteLLM: litellm.completion(model, messages)
    LiteLLM-->>Host: Simulated user response
    Host-->>Agent: step.observation = response
    Note over Container: Container binary never executes<br/>(host intercepts first)
```

Last reviewed commit: 36d9a40


Adds an ask_user tool that allows agents to request clarification on
underspecified tasks. The tool simulates user responses using an LLM
based on the complete task definition.

Key changes:
- Add ask_user tool in tools/ask_user/
- Add TaskDefinitionInjectionHook to inject task definitions into containers
- Auto-detect task_definitions.json in run_batch.py
- **Critical fix**: Intercept ask_user commands on HOST side in agents.py

The host-side interception is necessary because Modal containers cannot
reach internal API endpoints (e.g., litellm.ml-serving-internal.scale.com).
By handling the LLM call on the host (where the agent's own LLM calls are
made), we avoid timeout issues while maintaining the same functionality.

Based on Bryan's PR #17144 with architectural modification for internal
API compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
socket-security bot commented Jan 28, 2026

Review the following changes in direct dependencies.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | openai@2.21.0 | 96 | 100 | 100 | 100 | 100 |
| Updated | swe-rex@1.2.0 ⏵ 1.4.0 | 98 | 100 | 100 | 100 | 100 |


yash-scaleai and others added 2 commits February 20, 2026 00:50
Includes post-experiment working tree fixes (traj_path refactor,
_find_task_definitions_file search) that ran during Phase B.

Safety fixes (no behavior change; sketched after this commit message):
- Add threading.Lock around task definitions cache
- Tighten _parse_ask_user_command to exact word boundary
- Guard result against None before slicing for log
- Remove no-op @Retry(stop_after_attempt(1)) in container llm.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
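
A sketch of what the first two safety fixes above might look like; the regex and `shlex` handling are assumptions, not the commit's exact code:

```python
import re
import shlex
import threading

# Guards the shared task-definitions cache across batch workers.
_TASK_DEFS_LOCK = threading.Lock()

# \b enforces an exact word boundary, so e.g. "ask_user_foo" no longer matches.
_ASK_USER_RE = re.compile(r"^\s*ask_user\b")

def _parse_ask_user_command(action: str):
    """Return (question, context) if the action is an ask_user call, else None."""
    if not _ASK_USER_RE.match(action):
        return None
    parts = shlex.split(action)  # respects the quoted question/context arguments
    question = parts[1] if len(parts) > 1 else ""
    context = parts[2] if len(parts) > 2 else None
    return question, context
```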
Infrastructure fixes applied during experiment runs:
- pyproject.toml: use jeff-da/SWE-ReX sweap-support branch for Modal
- models.py: handle top_p=None in model ID string (was crashing; sketched below)
- justfile: mount data/ volume for container access

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
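
A purely illustrative guess at the shape of that models.py fix; the real function and format string may differ:

```python
# Hypothetical illustration: f-string float formatting raises TypeError on None.
def _model_id_suffix(temperature: float, top_p: float | None) -> str:
    tp = "None" if top_p is None else f"{top_p:.2f}"  # was: f"{top_p:.2f}", crashing on None
    return f"t-{temperature:.2f}__p-{tp}"
```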
greptile-apps bot left a comment

9 files reviewed, 3 comments


In sweagent/agent/agents.py (around line 189):

```python
    try:
        logger.info(f"ask_user intercepted on host: question='{question[:100]}...'")
        # Drop unsupported params for models like GPT-5 that don't support temperature
        litellm.drop_params = True
```

Global side-effect on all litellm calls

litellm.drop_params is a module-level global. Setting it to True here permanently affects all subsequent litellm.completion calls in the process, including the main agent model calls in models.py:693. This means unsupported parameters will be silently dropped for the agent's own LLM calls too, which could mask configuration errors or change behavior unexpectedly.

This is especially problematic in multi-worker mode (ThreadPoolExecutor in run_batch.py:288) where a single ask_user call from one worker mutates global state for all workers.

Consider passing drop_params as a per-call parameter instead:

Suggested change:

```diff
-        litellm.drop_params = True
+        litellm.drop_params = True  # TODO: this mutates global state; move to per-call param if litellm supports it
```

Or scope the change more carefully by saving and restoring the original value.
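
A sketch of that save-and-restore scoping; the helper name is hypothetical:

```python
import contextlib
import threading

import litellm

_DROP_PARAMS_LOCK = threading.Lock()  # serializes ask_user simulator calls

@contextlib.contextmanager
def scoped_drop_params():
    """Temporarily enable litellm.drop_params, restoring the previous value.

    Caveat: the flag is still a process-wide global, so concurrent agent
    completions in other threads can observe the temporary value; this
    narrows the window rather than eliminating it.
    """
    with _DROP_PARAMS_LOCK:
        previous = litellm.drop_params
        litellm.drop_params = True
        try:
            yield
        finally:
            litellm.drop_params = previous

# Usage in the interception path:
# with scoped_drop_params():
#     response = litellm.completion(model=model, messages=messages)
```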


In sweagent/agent/agents.py (lines 111-112):

```python
        except Exception:
            pass
```

Silent exception swallowing hides load failures

The bare except Exception: pass here silently swallows all errors when loading task definitions (malformed JSON, permission errors, encoding issues, etc.). When ask_user later fails with "No task definition found for instance X", there will be no indication that the file existed but couldn't be parsed.

At minimum, log the exception:

Suggested change:

```diff
-        except Exception:
-            pass
+        except Exception as e:
+            logging.getLogger(__name__).warning("Failed to load task definitions from %s: %s", task_def_file, e)
```
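
Going one step further, a sketch of a loader that both logs failures and caches under a lock (tying in with the threading.Lock added in a later commit); the names and cache shape are assumptions:

```python
import json
import logging
import threading

logger = logging.getLogger(__name__)

_TASK_DEFS_CACHE: dict[str, dict] = {}
_TASK_DEFS_CACHE_LOCK = threading.Lock()

def _load_task_definitions(task_def_file: str) -> dict:
    """Load (and cache) task definitions, logging failures instead of hiding them."""
    with _TASK_DEFS_CACHE_LOCK:
        if task_def_file not in _TASK_DEFS_CACHE:
            try:
                with open(task_def_file, encoding="utf-8") as f:
                    _TASK_DEFS_CACHE[task_def_file] = json.load(f)
            except Exception:
                logger.warning("Failed to load task definitions from %s", task_def_file, exc_info=True)
                _TASK_DEFS_CACHE[task_def_file] = {}
        return _TASK_DEFS_CACHE[task_def_file]
```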

In sweagent/run/hooks/task_definition_injection.py (lines 104-116):

```python
# Pass through API credentials for the user simulator LLM
api_key = os.environ.get("OPENAI_API_KEY") or os.environ.get("LLM_API_KEY")
base_url = os.environ.get("OPENAI_BASE_URL") or os.environ.get("LLM_BASE_URL")
simulator_model = os.environ.get("USER_SIMULATOR_MODEL")

if api_key:
    env_vars["OPENAI_API_KEY"] = api_key
    env_vars["LLM_API_KEY"] = api_key
if base_url:
    env_vars["OPENAI_BASE_URL"] = base_url
    env_vars["LLM_BASE_URL"] = base_url
if simulator_model:
    env_vars["USER_SIMULATOR_MODEL"] = simulator_model
```

Unnecessary credential injection into container

Since ask_user commands are now intercepted on the host side (in agents.py) before reaching the container, the container's ask_user binary never actually runs. Injecting API keys (OPENAI_API_KEY, LLM_API_KEY) into the container environment is unnecessary credential exposure. If the container is compromised or the agent runs arbitrary code, these credentials could be exfiltrated.

Consider removing the API credential injection since the host-side interception makes it redundant, or add a comment explaining why it's still needed (e.g., as a fallback).
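
One way to act on this: gate the credential pass-through behind an explicit opt-in, so keys only enter the container when the fallback is actually wanted. `ASK_USER_CONTAINER_FALLBACK` is a hypothetical flag, not part of the PR:

```python
import os

env_vars: dict[str, str] = {}  # stands in for the hook's container env mapping

# Only inject simulator credentials when the in-container fallback is
# explicitly requested; ASK_USER_CONTAINER_FALLBACK is a hypothetical flag.
if os.environ.get("ASK_USER_CONTAINER_FALLBACK") == "1":
    api_key = os.environ.get("OPENAI_API_KEY") or os.environ.get("LLM_API_KEY")
    if api_key:
        env_vars["OPENAI_API_KEY"] = api_key
        env_vars["LLM_API_KEY"] = api_key
# Task definitions themselves are not secrets and can be injected unconditionally.
```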

