feat: add ask_user tool with host-side LLM interception #1

yash-scaleai wants to merge 3 commits into scaleapi:scale-customizations
Conversation
Adds an `ask_user` tool that allows agents to request clarification on underspecified tasks. The tool simulates user responses using an LLM based on the complete task definition.

Key changes:
- Add `ask_user` tool in `tools/ask_user/`
- Add `TaskDefinitionInjectionHook` to inject task definitions into containers
- Auto-detect `task_definitions.json` in `run_batch.py`
- **Critical fix**: Intercept `ask_user` commands on the HOST side in `agents.py`

The host-side interception is necessary because Modal containers cannot reach internal API endpoints (e.g., litellm.ml-serving-internal.scale.com). By handling the LLM call on the host (where the agent's own LLM calls are made), we avoid timeout issues while maintaining the same functionality.

Based on Bryan's PR #17144 with an architectural modification for internal API compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Includes post-experiment working tree fixes (traj_path refactor, `_find_task_definitions_file` search) that ran during Phase B.

Safety fixes (no behavior change):
- Add `threading.Lock` around the task definitions cache
- Tighten `_parse_ask_user_command` to an exact word boundary
- Guard `result` against `None` before slicing for the log
- Remove no-op `@Retry(stop_after_attempt(1))` in container `llm.py`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
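For reference, a minimal sketch of what a lock-guarded task definitions cache like the one described above might look like; the names `_TASK_DEFS_CACHE`, `_TASK_DEFS_LOCK`, and the function body are illustrative, not the actual identifiers from the diff:

```python
import json
import threading
from pathlib import Path

# Illustrative module-level cache; the real code may structure this differently.
_TASK_DEFS_CACHE: dict[str, dict] = {}
_TASK_DEFS_LOCK = threading.Lock()


def _load_task_definitions(task_def_file: Path) -> dict:
    """Load and cache task definitions, guarding the cache with a lock.

    The lock matters in multi-worker mode (ThreadPoolExecutor in run_batch.py),
    where several workers may read and populate the cache concurrently.
    """
    key = str(task_def_file)
    with _TASK_DEFS_LOCK:
        if key not in _TASK_DEFS_CACHE:
            _TASK_DEFS_CACHE[key] = json.loads(task_def_file.read_text())
        return _TASK_DEFS_CACHE[key]
```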
Infrastructure fixes applied during experiment runs:
- pyproject.toml: use jeff-da/SWE-ReX sweap-support branch for Modal
- models.py: handle `top_p=None` in the model ID string (was crashing)
- justfile: mount the data/ volume for container access

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```python
try:
    logger.info(f"ask_user intercepted on host: question='{question[:100]}...'")
    # Drop unsupported params for models like GPT-5 that don't support temperature
    litellm.drop_params = True
```
**Global side-effect on all litellm calls** (`sweagent/agent/agents.py`, line 189)

`litellm.drop_params` is a module-level global. Setting it to `True` here permanently affects all subsequent `litellm.completion` calls in the process, including the main agent model calls in `models.py:693`. This means unsupported parameters will be silently dropped for the agent's own LLM calls too, which could mask configuration errors or change behavior unexpectedly.

This is especially problematic in multi-worker mode (`ThreadPoolExecutor` in `run_batch.py:288`), where a single `ask_user` call from one worker mutates global state for all workers.

Consider passing `drop_params` as a per-call parameter instead:
```suggestion
litellm.drop_params = True  # TODO: this mutates global state; move to per-call param if litellm supports it
```
Or scope the change more carefully by saving and restoring the original value.
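A small sketch of the save-and-restore option; the helper name `_simulate_user_response` is illustrative, and whether the installed litellm version also accepts `drop_params` as a per-call keyword should be verified before relying on it:

```python
import litellm


def _simulate_user_response(model: str, messages: list[dict]) -> str:
    """Call the user-simulator LLM without leaking drop_params to other callers."""
    previous = litellm.drop_params  # remember whatever the rest of the process uses
    litellm.drop_params = True  # drop unsupported params (e.g. temperature) for this call only
    try:
        response = litellm.completion(model=model, messages=messages)
        return response.choices[0].message.content
    finally:
        litellm.drop_params = previous  # restore the global even if the call raises
```

Note that this still touches the global briefly, so it narrows but does not eliminate the multi-worker race; a true per-call parameter, if supported, would.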
```python
except Exception:
    pass
```
**Silent exception swallowing hides load failures** (`sweagent/agent/agents.py`, lines 111-112)

The bare `except Exception: pass` here silently swallows all errors when loading task definitions (malformed JSON, permission errors, encoding issues, etc.). When `ask_user` later fails with "No task definition found for instance X", there will be no indication that the file existed but couldn't be parsed.

At minimum, log the exception:
```suggestion
except Exception as e:
    logging.getLogger(__name__).warning("Failed to load task definitions from %s: %s", task_def_file, e)
```
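For context, a sketch of how that logging could sit inside the loader; the function body and the name `task_def_file` are illustrative rather than copied from the diff:

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def _load_task_definitions(task_def_file: Path) -> dict:
    """Return parsed task definitions, or an empty dict if loading fails.

    Logging the failure lets a later "No task definition found for instance X"
    error be traced back to a malformed or unreadable file.
    """
    try:
        return json.loads(task_def_file.read_text())
    except Exception as e:
        logger.warning("Failed to load task definitions from %s: %s", task_def_file, e)
        return {}
```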
```python
# Pass through API credentials for the user simulator LLM
api_key = os.environ.get("OPENAI_API_KEY") or os.environ.get("LLM_API_KEY")
base_url = os.environ.get("OPENAI_BASE_URL") or os.environ.get("LLM_BASE_URL")
simulator_model = os.environ.get("USER_SIMULATOR_MODEL")

if api_key:
    env_vars["OPENAI_API_KEY"] = api_key
    env_vars["LLM_API_KEY"] = api_key
if base_url:
    env_vars["OPENAI_BASE_URL"] = base_url
    env_vars["LLM_BASE_URL"] = base_url
if simulator_model:
    env_vars["USER_SIMULATOR_MODEL"] = simulator_model
```
**Unnecessary credential injection into container** (`sweagent/run/hooks/task_definition_injection.py`, lines 104-116)

Since `ask_user` commands are now intercepted on the host side (in `agents.py`) before reaching the container, the container's `ask_user` binary never actually runs. Injecting API keys (`OPENAI_API_KEY`, `LLM_API_KEY`) into the container environment is unnecessary credential exposure. If the container is compromised or the agent runs arbitrary code, these credentials could be exfiltrated.

Consider removing the API credential injection since the host-side interception makes it redundant, or add a comment explaining why it's still needed (e.g., as a fallback).
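If the in-container fallback is worth keeping, one option is to gate the credential pass-through behind an explicit opt-in; the flag name `ASK_USER_CONTAINER_FALLBACK` and the `env_vars` mapping below are assumptions for illustration, not names from this PR:

```python
import os

env_vars: dict[str, str] = {}  # stand-in for the container environment built by the hook

# Only forward credentials when the in-container fallback is explicitly requested;
# by default the host-side interception handles ask_user and the container never
# needs these secrets.
if os.environ.get("ASK_USER_CONTAINER_FALLBACK") == "1":
    api_key = os.environ.get("OPENAI_API_KEY") or os.environ.get("LLM_API_KEY")
    if api_key:
        env_vars["OPENAI_API_KEY"] = api_key
        env_vars["LLM_API_KEY"] = api_key
```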
Summary

Adds an `ask_user` tool that allows agents to request clarification on underspecified tasks. The tool simulates user responses using an LLM based on the complete task definition.

Key changes:
- Add `ask_user` tool in `tools/ask_user/`
- Add `TaskDefinitionInjectionHook` to inject task definitions into containers
- Auto-detect `task_definitions.json` in `run_batch.py`
- Intercept `ask_user` commands on the HOST side in `agents.py`

Why host-side interception?
The original implementation (Bryan's PR #17144) runs the LLM call inside the Modal container. However, Modal containers cannot reach internal API endpoints like litellm.ml-serving-internal.scale.com.

This PR intercepts `ask_user` commands in `agents.py` before they are sent to the container, makes the LLM call on the host (where the agent's own LLM calls are made), and returns the response directly.

Flow:
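A minimal, self-contained sketch of that flow; the real code uses helpers such as `_parse_ask_user_command` and a cached, lock-guarded loader, and the regex, prompt wording, and `gpt-4o` model name below are illustrative assumptions rather than the actual implementation:

```python
import json
import re
from pathlib import Path

import litellm


def handle_ask_user(action: str, task_def_file: Path, instance_id: str) -> str | None:
    """Return a simulated user reply, or None so the action runs in the container as usual."""
    # Exact word-boundary match so that e.g. "task_user ..." is not intercepted.
    match = re.match(r"^ask_user\b\s+(.*)$", action.strip(), flags=re.DOTALL)
    if match is None:
        return None
    question = match.group(1)

    # Load the task definition for this instance (the real code caches this on the host).
    task_defs = json.loads(task_def_file.read_text())
    task = task_defs.get(instance_id)
    if task is None:
        return f"No task definition found for instance {instance_id}"

    # Simulate the user on the host, where internal API endpoints are reachable.
    response = litellm.completion(
        model="gpt-4o",  # illustrative; the simulator model is configurable in practice
        messages=[
            {"role": "system", "content": f"You are the user who wrote this task:\n{json.dumps(task)}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```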
Based on Bryan's PR #17144 with an architectural modification for internal API compatibility.
Test plan
- `scripts/test_ask_user_single.py`

🤖 Generated with Claude Code
Greptile Summary
Adds an `ask_user` tool that lets agents request clarification on underspecified tasks by simulating user responses via an LLM. The key architectural decision is to intercept `ask_user` commands on the host side (in `agents.py`) rather than inside the container, since Modal containers cannot reach internal API endpoints.

- Interception in `DefaultAgent.handle_action()` that parses `ask_user` commands, loads task definitions from a JSON file, and calls `litellm.completion` to simulate a user response
- `TaskDefinitionInjectionHook` to inject task definitions into containers (for the in-container fallback binary)
- Auto-detection of `task_definitions.json` in `run_batch.py`, which also registers the hook
- The `ask_user` binary and `llm.py` helper (effectively dead code since the host intercepts first)
- A `tool_use_with_ask_user.yaml` config variant

Issues found:
- `litellm.drop_params = True` in `agents.py:189` is a global mutation that permanently affects all `litellm.completion` calls in the process, including the main agent model calls. This is especially problematic in multi-worker mode.
- Silent exception swallowing in `_load_task_definitions` (`except Exception: pass`) makes it impossible to debug task definition loading failures.
- The hook injects API credentials (`OPENAI_API_KEY`, `LLM_API_KEY`) into the container despite host-side interception making the in-container binary redundant — this is unnecessary credential exposure.

Confidence Score: 2/5
- The `litellm.drop_params` global mutation can silently alter behavior of the main agent LLM calls across all workers.
- The main concern is the global mutation of `litellm.drop_params`, which can affect correctness of the main agent model calls, combined with silent error swallowing that will make production debugging difficult. The core interception logic is sound, but these issues need fixing before merge.
- Files needing the closest attention: `sweagent/agent/agents.py` (global state mutation, silent error handling) and `sweagent/run/hooks/task_definition_injection.py` (unnecessary credential exposure).

Important Files Changed
- `sweagent/agent/agents.py`: sets a process-wide flag (`litellm.drop_params = True`) that affects all LLM calls, and silently swallows task definition load errors.

Sequence Diagram
```mermaid
sequenceDiagram
    participant Agent as Agent LLM
    participant Host as Host (agents.py)
    participant Cache as Task Definitions Cache
    participant LiteLLM as LiteLLM (User Simulator)
    participant Container as Container (never reached)
    Agent->>Host: ask_user "question" "context"
    Host->>Host: _parse_ask_user_command()
    Host->>Cache: _load_task_definitions(traj_path)
    Cache-->>Host: task_defs dict
    Host->>Host: Build system prompt with primary_task + underspecified_task
    Host->>LiteLLM: litellm.completion(model, messages)
    LiteLLM-->>Host: Simulated user response
    Host-->>Agent: step.observation = response
    Note over Container: Container binary never executes<br/>(host intercepts first)
```

Last reviewed commit: 36d9a40