RFD: Add Elicitation specification for structured user input #376
Signed-off-by: Yordis Prieto <yordis.prieto@gmail.com>
Apply fixes from code review and GitHub research:
- Fix client capabilities to use ClientCapabilities pattern (like fs, terminal)
- Add complete turn response example showing elicitation + content integration
- Define single-elicitation-per-turn design decision for v1
- Clarify URL-mode OAuth is ACP-specific, not fully MCP-aligned
- Expand validation behavior FAQ with client/server responsibility split

These changes align with existing protocol patterns and clarify architectural decisions identified in code review.
Add comprehensive elicitation system allowing agents to request structured user input during conversation turns. Includes:
- ElicitationRequest: Request types (text, number, select, multiselect, boolean, password, URL)
- ElicitationSchema: JSON Schema constraints for validation
- ElicitationOption: Choices for select/multiselect types
- ElicitationResponse: User responses with convenient builder methods
- ElicitationCapability: Client capability negotiation
- StopReason.ElicitationRequested: Stop reason for elicitation requests
- Integration with PromptResponse and PromptRequest

All types support serialization, JSON Schema generation, and include comprehensive tests. Feature-gated under unstable_elicitation flag.
Add 14 new tests covering:
- Schema constraints (min/max values, enum values)
- URL mode with OAuth return formats
- Metadata handling for options and responses
- All ElicitationType variants serialization
- Multiselect array responses
- Optional field serialization behavior
- Custom capability configurations

Total test count: 37 (13 new elicitation tests)
|
Hey! Thanks for contributing. Is it true that we can handle things like https://github.com/orgs/agentclientprotocol/discussions/371 with this feature?
Add StreamMessage, StreamMessageDirection, and StreamMessageContent types for monitoring and debugging RPC message flow. These types enable implementations to observe incoming/outgoing requests, responses, and notifications. Includes:
- StreamMessageDirection enum (Incoming/Outgoing)
- StreamMessageContent enum (Request/Response/Notification variants)
- StreamMessage struct wrapping content and direction
- StreamSender/StreamReceiver type aliases using async-broadcast
- Helper constructors (::incoming(), ::outgoing())
- 5 comprehensive tests for serialization and variants

Also adds async-broadcast v0.7 dependency for async multi-consumer broadcast channel support.
|
Just for some context, OpenCode's question tool schema:
|
Hey @ignatov, this type of spec should allow that kind of "question" form. In fact, my existing ACP needs are all about such a feature. I am trying to write the spec AND figure out whether I can get the Zed GUI to implement it. Overall, the intent is to allow structured input from the user, such as questionnaires or buttons for specific actions (at least that is my immediate need).
benbrandt
left a comment
We definitely need something like this, and adopting the same pattern as MCP also allows us to forward MCP elicitation requests which is nice
…like permissions

Changed elicitation from embedded in session/prompt flow to a separate request/response method pattern matching permissions design. This provides clearer protocol semantics and consistent handling of structured user input requests.

PROTOCOL CHANGES:
- Moved from PromptRequest.elicitation_response to RequestElicitationRequest
- Moved from PromptResponse.elicitation to RequestElicitationResponse
- New method: session/elicitation (separate like session/request_permission)

API CHANGES:
- Removed elicitation_response field from PromptRequest
- Removed elicitation field from PromptResponse
- Added RequestElicitationRequest wrapper struct
- Added RequestElicitationResponse wrapper struct
- Added SESSION_ELICITATION_METHOD_NAME constant

This aligns with @benbrandt feedback on consistency with permission request/response pattern.
|
@benbrandt is it OK to ask you to ignore the Rust changes for now? I know it could be annoying, but I am trying to get Zed working to see the full picture, and I just realized that I committed those changes as well while trying to fix the request/response situation. Since I am in active development and want to see Zed working, I am asking you to ignore those files and focus only on the markdown until we are ready to merge. Otherwise, totally cool, I can create another branch for myself; it just makes things a bit more difficult, since I would have to switch between the branches and keep them synchronized.
Updated RFD to document the refactored architecture where elicitation uses a separate session/elicitation request/response method (matching permissions pattern) instead of being embedded in session/prompt flow.

KEY CHANGES:
- Clarified that elicitation is triggered by stopReason: "elicitation_requested"
- Updated flow to show separate session/elicitation method call
- Aligned with permission request/response pattern for consistency
- Added complete JSON-RPC examples with method names and full message structure

This addresses @benbrandt's feedback about consistency between permission and elicitation request/response mechanisms.
|
@benbrandt I think that we have to work on that in the next wave |
|
@ignatov tomorrow I am back home, so I can continue the work. I got distracted trying to actually build a Zed GUI that allows us to do an OpenCode-style questionnaire as a PoC. Let me know whatever you would like to see happen; I am available in 24 hours.
benbrandt
left a comment
OK left some notes.
I would also appreciate having the RFD merged separately from the Rust changes, as some of the things you have added would fit better within the SDK (this crate should be mostly limited to schema types).
So before we can merge this, I would want these separated, to focus the review and also save you a bunch of extra work.
| - **Selections**: select (single), multiselect (multiple) with enum-based options | ||
| - **Sensitive inputs**: password, URL-mode for out-of-band OAuth flows (addressing PR #330 authentication pain points) | ||
| 3. **Work in turn context**: Elicitation requests are triggered when a turn ends with `stopReason: "elicitation_requested"`, allowing agents to ask questions naturally within the conversation flow. Agents send elicitation requests via a separate `session/elicitation` method (following the same request/response pattern as `session/request_permission`). Unlike Session Config Options (which are persistent), elicitation requests are transient and turn-specific. |
Does the agent need to return a stop_reason? For example, when permissions are requested, there isn't a stop reason; the agent may just await a response before continuing.
| 4. **Support client capability negotiation**: Clients declare what elicitation types they support (similar to the client capabilities pattern emerging in the protocol). Agents handle gracefully when clients don't support elicitation. |
I definitely think we should follow the MCP capability model here:
{
  "capabilities": {
    "elicitation": {
      "form": {},
      "url": {}
    }
  }
}
Where we distinguish between the two forms so we can also better map and pass this along to agents to pass to their MCP clients, and also support clients who may only be able to offer one or the other for various reasons
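A minimal sketch of how an agent might read this capability shape before choosing a mode. The helper name `supported_elicitation_modes` and the rule that a present key (even with an empty object) means "supported" are assumptions for illustration, not something the RFD or MCP defines in these words.

```python
from typing import Any


def supported_elicitation_modes(init_params: dict[str, Any]) -> set[str]:
    """Return the elicitation modes a client declared ("form", "url").

    Hypothetical helper: a mode counts as supported when its key is
    present, even if the value is an empty object.
    """
    capabilities = init_params.get("capabilities", {})
    elicitation = capabilities.get("elicitation")
    if not isinstance(elicitation, dict):
        return set()
    return {mode for mode in ("form", "url") if mode in elicitation}


# Three hypothetical clients: both modes, form only, nothing declared.
both = {"capabilities": {"elicitation": {"form": {}, "url": {}}}}
form_only = {"capabilities": {"elicitation": {"form": {}}}}
undeclared = {"capabilities": {}}

for params in (both, form_only, undeclared):
    print(sorted(supported_elicitation_modes(params)))
```

An agent could use the returned set to skip URL-mode requests for a form-only client, or to avoid elicitation entirely when the set is empty.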
| ### Elicitation Request Structure | ||
| When a turn ends with `stopReason: "elicitation_requested"`, the agent sends a separate elicitation request (following the same pattern as permission requests). Example 1 (User Selection - from PR #340): |
Again, I don't think the agent needs to end their turn, the client would just respond to a request, same as auth
| - `select` - Single-choice selection from a list | ||
| - `multiselect` - Multiple-choice selection | ||
| - `boolean` - Yes/no choice | ||
| - `password` - Masked text input (for sensitive credentials) |
Some of these don't seem to match MCP, like this one isn't here (in fact MCP explicitly says not to ask for passwords in these forms)
I also think stuff like this would be a `format` field on a field of `type: string` (this is JSON Schema after all)
I think the client needs to know it needs to support the restricted json schema and can decide how to represent that. We don't necessarily need to specify specific input types in the protocol definition in my opinion (again, also just looking to the MCP specification here)
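To illustrate the point that the client, not the protocol, picks the input widget, here is a rough sketch of a client-side mapping from a restricted JSON Schema property to a UI control. The widget names and the function `widget_for` are made up for this example, and the set of handled `format` values is an assumption, not a list taken from the RFD.

```python
from typing import Any


def widget_for(prop: dict[str, Any]) -> str:
    """Pick a (hypothetical) widget name for one schema property."""
    kind = prop.get("type")
    if kind == "boolean":
        return "checkbox"
    if kind in ("number", "integer"):
        return "number-input"
    if kind == "array":
        return "multi-select"
    if kind == "string":
        # Enumerated strings become a single-choice control.
        if "enum" in prop or "oneOf" in prop:
            return "dropdown"
        # Otherwise the JSON Schema `format` hint refines the text input.
        by_format = {
            "email": "email-input",
            "uri": "url-input",
            "date": "date-picker",
        }
        return by_format.get(prop.get("format"), "text-input")
    return "unsupported"


print(widget_for({"type": "string", "format": "email"}))
print(widget_for({"type": "string", "enum": ["a", "b"]}))
print(widget_for({"type": "object"}))
```

A different client could map the same schema to entirely different controls, which is the flexibility the comment argues for.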
| **Not supported** (to keep initial implementation simple): | ||
| - Complex nested objects/arrays | ||
| - `allOf`, `anyOf`, `oneOf` |
We need oneOf to represent single select enums: https://modelcontextprotocol.io/specification/2025-11-25/client/elicitation#requested-schema
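As a sketch of why `oneOf` matters here: a single-select field can pair each machine value (`const`) with a display label (`title`), which a bare `enum` cannot express. The field below and the helpers `option_labels` and `is_valid_choice` are hypothetical, and the `const`/`title` shape is my reading of the linked MCP schema rather than text from this RFD.

```python
from typing import Any

# Hypothetical single-select field: one entry per selectable option.
environment_field: dict[str, Any] = {
    "type": "string",
    "title": "Deployment target",
    "oneOf": [
        {"const": "staging", "title": "Staging"},
        {"const": "prod", "title": "Production"},
    ],
}


def option_labels(field: dict[str, Any]) -> dict[str, str]:
    """Map each allowed value to the label a client would display."""
    return {opt["const"]: opt["title"] for opt in field.get("oneOf", [])}


def is_valid_choice(field: dict[str, Any], value: Any) -> bool:
    """Check a submitted value against the field's allowed constants."""
    return isinstance(value, str) and value in option_labels(field)


print(option_labels(environment_field))
print(is_valid_choice(environment_field, "prod"))
print(is_valid_choice(environment_field, "Production"))
```

The last call is rejected because the label is for display only; the response carries the `const` value.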
| |--------|------------------------|-------------| | ||
| | **Lifecycle** | Persistent, pre-declared at session init | Transient, appears during turns | | ||
| | **Scope** | Session-wide configuration | Single turn/decision point | | ||
| | **Defaults** | Required (agents must have defaults) | Required (agents should always provide) | |
Are defaults required on elicitation? I don't think so? I think the point is the json schema can allow for required fields?
| ### Can agents use elicitation for information required before responding? | ||
| Yes. An agent can include an elicitation request in a turn response with a default value and continue, then incorporate the user's response into the next turn. This is how agents can guide users through multi-step workflows. |
again, I think this is where tying it to turns is an anti-pattern.
By modeling it as a request / response, the agent can decide in its own control flow whether or not to wait for a response before doing something else
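A small sketch of that control flow, using `asyncio` to stand in for the JSON-RPC transport. `fake_client_elicitation` is a made-up stub for a client answering a `session/elicitation` request, and the `action`/`content` response shape is assumed (MCP-style), not taken from the RFD.

```python
import asyncio
from typing import Any


async def fake_client_elicitation(question: str) -> dict[str, Any]:
    """Stand-in for a client answering an elicitation request."""
    await asyncio.sleep(0.01)  # simulated user think time
    return {"action": "accept", "content": {"answer": "yes"}}


async def agent_turn() -> list[str]:
    log: list[str] = []
    # Send the request without ending the turn.
    pending = asyncio.create_task(fake_client_elicitation("Proceed?"))
    # The agent may keep working while the user decides...
    log.append("indexed files")
    # ...and block only at the point where the answer is needed.
    response = await pending
    log.append(f"user said {response['content']['answer']}")
    return log


result = asyncio.run(agent_turn())
print(result)
```

An agent that needs the answer immediately would simply `await` the request right away; either way, no stop reason is involved.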
| ### What if a user doesn't respond to an elicitation request? | ||
| The agent's default value is used (which agents must always provide). If an agent truly requires user input and wants to block, it should fail the turn and let the client handle retry logic. |
I think it is fair to say that an elicitation request requires a response, even if that response is "cancelled" (we should allow for request cancellation, we can tie it together with the request cancellation changes in that RFD)
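One way to picture "every request gets a response" is a response that always carries an explicit outcome. The sketch below borrows the three action values used by MCP elicitation (`accept`, `decline`, `cancel`) as an assumption; the function `resolve_elicitation` and its return labels are hypothetical.

```python
from typing import Any


def resolve_elicitation(response: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Turn an elicitation response into an (outcome, data) pair.

    Assumed shape: {"action": "accept" | "decline" | "cancel", "content": {...}}
    """
    action = response.get("action")
    if action == "accept":
        # Only an accepted response carries user-provided data.
        return ("proceed", response.get("content", {}))
    if action == "decline":
        return ("declined", {})
    if action == "cancel":
        return ("cancelled", {})
    raise ValueError(f"unknown elicitation action: {action!r}")


print(resolve_elicitation({"action": "accept", "content": {"name": "demo"}}))
print(resolve_elicitation({"action": "cancel"}))
```

With this shape the agent never has to fall back on a default value: a missing answer arrives as an explicit `cancel` rather than as silence.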
| ### Should elicitation support complex nested data structures? | ||
| For the initial version: no. We're focusing on simple types (strings, numbers, booleans, arrays of those). Complex nested structures can be added in future versions if use cases emerge. This keeps the initial scope manageable and lets us learn from real-world usage. |
I think the key point is we are supporting whatever MCP supports here
| ### Can we extend this to replace the existing Permission-Request mechanism? | ||
| Potentially, but that's out of scope for this RFD. PR #210 discussed that elicitation "could potentially even replace the Permission-Request mechanism" (Phil65), but that requires separate analysis of the permission request use cases and whether elicitation's constraints (no complex nesting, simpler lifecycle) are sufficient. |
A point in favor of keeping these separate: since permission requests are more of a security concern, they should be handled separately so that the Client can offer a consistent experience.
Me deciding to allow a tool call should be distinct from the model asking for clarification. Also a reason that was brought up to keep the auth flow distinct. Maybe we reuse some types, but I don't think we should necessarily conflate the features
That will happen for sure! Please ignore for now 🙏🏻. I will revert the code tomorrow when I wake up and move it to another branch for my own sake.
Signed-off-by: Yordis Prieto yordis.prieto@gmail.com