feat(librarian): add documentation discovery workflow for targeted doc investigation #377
base: dev
…c investigation
- Add Phase 0.5 (Documentation Discovery) before TYPE A and D requests
- Sequential flow: websearch → version check → sitemap → targeted investigation
- Enables version-specific documentation lookup when user specifies version
- Sitemap discovery helps understand doc structure before searching
- Update tool reference with sitemap and doc page fetching
- Add failure recovery for sitemap/versioned docs not found cases
All contributors have signed the CLA. Thank you! ✅
I have read the CLA Document and I hereby sign the CLA
Greptile Summary
Adds Phase 0.5 Documentation Discovery workflow to improve documentation lookup accuracy before TYPE A (Conceptual) and TYPE D (Comprehensive) requests. The new workflow discovers official docs URL, verifies versioned documentation, fetches sitemap.xml to understand doc structure, then performs targeted investigation instead of random searching.
Issue Found: Changed tool name from
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant Librarian
participant WebSearch as websearch_exa
participant Sitemap as webfetch(sitemap)
participant Docs as webfetch(doc_pages)
participant Context7
participant GrepApp as grep_app
Note over User,GrepApp: TYPE A or TYPE D Request
User->>Librarian: "How do I use React 18?"
rect rgb(230, 240, 255)
Note right of Librarian: Phase 0.5: Documentation Discovery (SEQUENTIAL)
Librarian->>WebSearch: "React official documentation site"
WebSearch-->>Librarian: https://react.dev
Librarian->>WebSearch: "React v18 documentation"
WebSearch-->>Librarian: Versioned URL confirmed
Librarian->>Sitemap: GET /sitemap.xml
Sitemap-->>Librarian: Parse doc structure
Note right of Librarian: Identify relevant sections from sitemap
end
rect rgb(240, 255, 240)
Note right of Librarian: Phase 1: Main Investigation (PARALLEL)
par Parallel Execution
Librarian->>Context7: resolve-library-id("react")
Context7-->>Librarian: library_id
Librarian->>Context7: query-docs(id, "hooks")
Context7-->>Librarian: Official docs
and
Librarian->>Docs: GET /docs/hooks.html (from sitemap)
Docs-->>Librarian: Targeted doc page
and
Librarian->>GrepApp: searchGitHub("React hooks usage")
GrepApp-->>Librarian: Code examples
end
end
Librarian->>User: Synthesized answer with permalinks
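The ordering in the diagram (sequential discovery first, then a parallel investigation) can be sketched as a small planning function. This is only an illustration under assumptions: `planDocRequest`, `Step`, and `Plan` are hypothetical names that do not appear in `src/agents/librarian.ts`, and the tool labels are copied from the diagram rather than from real tool schemas.

```typescript
// Hypothetical types for illustration; not taken from librarian.ts.
interface Step {
  tool: string;  // tool label as it appears in the sequence diagram
  input: string; // query or URL placeholder
}

interface Plan {
  discovery: Step[];     // Phase 0.5: run one at a time, in this order
  investigation: Step[]; // Phase 1: independent steps that may run in parallel
}

// Builds a plan for a TYPE A or TYPE D documentation request.
function planDocRequest(library: string, topic: string, version?: string): Plan {
  const discovery: Step[] = [
    { tool: "websearch_exa", input: `${library} official documentation site` },
  ];
  // The version check only happens when the user named a version.
  if (version !== undefined) {
    discovery.push({ tool: "websearch_exa", input: `${library} v${version} documentation` });
  }
  discovery.push({ tool: "webfetch", input: "<docs base URL>/sitemap.xml" });

  const investigation: Step[] = [
    { tool: "context7", input: `${library}: ${topic}` },
    { tool: "webfetch", input: "<page chosen from sitemap>" },
    { tool: "grep_app", input: `${library} ${topic} usage` },
  ];
  return { discovery, investigation };
}
```

Calling `planDocRequest("React", "hooks", "18")` yields three discovery steps followed by three investigation steps; without a version, the version check is omitted.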
Additional Comments (3)
- src/agents/librarian.ts, line 114 (link) syntax: tool name changed from `context7_get-library-docs` to `context7_query-docs` but `src/hooks/agent-usage-reminder/constants.ts:18` still references the old name
- src/agents/librarian.ts, line 181 (link) syntax: tool name changed from `context7_get-library-docs` to `context7_query-docs` but `src/hooks/agent-usage-reminder/constants.ts:18` still references the old name
- src/agents/librarian.ts, line 237 (link) syntax: tool name changed from `context7_get-library-docs` to `context7_query-docs` but `src/hooks/agent-usage-reminder/constants.ts:18` still references the old name
1 file reviewed, 3 comments
4 issues found across 1 file
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/agents/librarian.ts">
<violation number="1" location="src/agents/librarian.ts:53">
P2: TYPE A table says `context7 + websearch_exa`, but TYPE A instructions now use `webfetch(...)` from the sitemap instead. Align the table with the actual workflow so the agent follows a single, consistent playbook.</violation>
<violation number="2" location="src/agents/librarian.ts:76">
P3: The `webfetch(... + "/versions" or "/v{version}")` example is ambiguous/invalid pseudo-code. Split into explicit alternatives like you do for sitemap fallbacks.</violation>
<violation number="3" location="src/agents/librarian.ts:96">
P1: `context7_query-docs` appears to be an incorrect/non-existent tool name (only used in this prompt, while the repo references `context7_get-library-docs`). This will likely break the librarian’s Context7 step or confuse tool usage; use the real tool name consistently.</violation>
<violation number="4" location="src/agents/librarian.ts:114">
P2: Context7 call examples are inconsistent (`query:` vs `topic:`). Pick one argument shape and use it everywhere to avoid invalid tool-call schemas.</violation>
</file>
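One way to avoid the name and argument drift flagged in violations 3 and 4 is to render every Context7 example in the prompt from a single constant and a single argument type. The sketch below is an assumption about how that could be done, not how the repository does it: `CONTEXT7_QUERY_TOOL`, `Context7QueryArgs`, and `renderContext7Call` are hypothetical names, and the library id in the usage note is made up.

```typescript
// Single source of truth for the tool name used in prompt examples.
// The value follows the name the PR settles on; verify it against the MCP tool definition.
const CONTEXT7_QUERY_TOOL = "context7_query-docs";

// One argument shape everywhere: `query`, never `topic`.
interface Context7QueryArgs {
  libraryId: string;
  query: string;
}

// Renders a prompt example line for the Context7 call.
function renderContext7Call(args: Context7QueryArgs): string {
  const id = JSON.stringify(args.libraryId);
  const query = JSON.stringify(args.query);
  return `${CONTEXT7_QUERY_TOOL}(libraryId: ${id}, query: ${query})`;
}
```

For example, `renderContext7Call({ libraryId: "/facebook/react", query: "hooks" })` produces `context7_query-docs(libraryId: "/facebook/react", query: "hooks")`, so a later rename only touches the constant.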
| Type | Trigger Examples | Tools |
|------|------------------|-------|
-| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | context7 + websearch_exa (parallel) |
+| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | Doc Discovery → context7 + websearch_exa |
P2: TYPE A table says `context7 + websearch_exa`, but TYPE A instructions now use `webfetch(...)` from the sitemap instead. Align the table with the actual workflow so the agent follows a single, consistent playbook.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/agents/librarian.ts, line 53:
<comment>TYPE A table says `context7 + websearch_exa`, but TYPE A instructions now use `webfetch(...)` from the sitemap instead. Align the table with the actual workflow so the agent follows a single, consistent playbook.</comment>
<file context>
@@ -50,10 +50,56 @@ Classify EVERY request into one of these categories before taking action:
| Type | Trigger Examples | Tools |
|------|------------------|-------|
-| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | context7 + websearch_exa (parallel) |
+| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | Doc Discovery → context7 + websearch_exa |
| **TYPE B: IMPLEMENTATION** | "How does X implement Y?", "Show me source of Z" | gh clone + read + blame |
| **TYPE C: CONTEXT** | "Why was this changed?", "History of X?" | gh issues/prs + git log/blame |
</file context>
The table shows "Doc Discovery →" which includes the webfetch sitemap workflow. The context7 + websearch_exa still applies after Doc Discovery phase. The table is a summary, while the detailed workflow is in PHASE 0.5 and TYPE A sections.
src/agents/librarian.ts
Outdated
With sitemap knowledge, fetch the SPECIFIC documentation pages relevant to the query:
\`\`\`
webfetch(specific_doc_page_from_sitemap)
context7_query-docs(libraryId, query: "specific topic")
P1: `context7_query-docs` appears to be an incorrect/non-existent tool name (only used in this prompt, while the repo references `context7_get-library-docs`). This will likely break the librarian’s Context7 step or confuse tool usage; use the real tool name consistently.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/agents/librarian.ts, line 96:
<comment>`context7_query-docs` appears to be an incorrect/non-existent tool name (only used in this prompt, while the repo references `context7_get-library-docs`). This will likely break the librarian’s Context7 step or confuse tool usage; use the real tool name consistently.</comment>
<file context>
@@ -50,10 +50,56 @@ Classify EVERY request into one of these categories before taking action:
+With sitemap knowledge, fetch the SPECIFIC documentation pages relevant to the query:
+\`\`\`
+webfetch(specific_doc_page_from_sitemap)
+context7_query-docs(libraryId, query: "specific topic")
+\`\`\`
+
</file context>
Fixed in commit 3853456. The tool name context7_query-docs is correct (matches the MCP tool definition). Also updated src/hooks/agent-usage-reminder/constants.ts to use the same name for consistency.
- Fix ambiguous webfetch pseudo-code (split /versions or /v{version})
- Unify context7_query-docs parameter format (query: instead of topic:)
- Update agent-usage-reminder constants to use context7_query-docs
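Splitting the ambiguous `"/versions" or "/v{version}"` pseudo-code into explicit alternatives could look like the sketch below. It is an illustration only: `versionedDocCandidates` is a hypothetical helper, and the two path patterns are the ones named in the review comment, not a guarantee that any given docs site uses them.

```typescript
// Returns explicit candidate URLs to try, in order, when looking for versioned docs.
// Assumption: the docs site exposes either a /versions index or a /v{version} prefix.
function versionedDocCandidates(baseUrl: string, version: string): string[] {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const v = version.replace(/^v/i, "");     // accept "18" or "v18"
  return [`${root}/versions`, `${root}/v${v}`];
}
```

Each candidate can then be fetched separately, which keeps every `webfetch(...)` example a single valid call instead of an either/or expression.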
@sisyphus-dev-ai review this. How do you think this will change the librarian's behavior? Will it be slower? More precise? More likely to answer with deprecated material? ulw to share your opinion
Review: Documentation Discovery Workflow for Librarian
Great question @code-yeongyu! Let me break down the behavioral impact of this change.
TL;DR
Detailed Analysis
Speed Trade-off
Before: Immediate parallel fire (3+ calls for TYPE A, 6+ for TYPE D)
After: Sequential discovery → then parallel fire
This adds ~2-4 seconds latency on TYPE A/D requests. The main investigation phase remains parallel.
Precision Improvement (The Real Win)
The current librarian does "hopeful parallel searching" - it fires multiple tools simultaneously without knowing WHERE the relevant documentation lives. This leads to:
The new flow:
This is like the difference between grepping blindly vs. understanding the codebase structure first.
Deprecated Content (Counter-intuitive!)
You asked if this increases deprecated content risk. It's actually the opposite. The new Phase 0.5 explicitly:
Before this change, the librarian had zero version awareness. It would happily return Next.js 13 App Router docs when the user was asking about Next.js 14.
Trade-off Summary
Verdict
This is a good trade-off. Documentation questions benefit more from precision than raw speed. Users asking "How do I use X in version Y?" care about getting the right answer, not getting any answer 2 seconds faster. The sequential discovery is a one-time upfront cost that dramatically improves the quality of the parallel phase that follows.
Minor Suggestion
Consider adding a skip condition for well-known libraries with consistent doc structures (React, Next.js, etc.) where we already know the sitemap location. This could recover some latency for common cases. But this is a nice-to-have, not a blocker.
LGTM - The precision improvement justifies the latency cost.
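The skip condition suggested in the review could be as small as a lookup table consulted before Phase 0.5. The sketch below is hypothetical: `KNOWN_SITEMAPS` and `knownSitemap` are invented names, and the single entry is a placeholder built from the `react.dev` host and `/sitemap.xml` path shown in the sequence diagram, so it should be verified before use.

```typescript
// Placeholder table of libraries whose sitemap location is already known.
// Assumption: entries are maintained by hand and verified separately.
const KNOWN_SITEMAPS: Record<string, string> = {
  react: "https://react.dev/sitemap.xml",
};

// Returns the known sitemap URL, or undefined when discovery must still run.
function knownSitemap(library: string): string | undefined {
  const key = library.trim().toLowerCase();
  // hasOwnProperty guards against inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(KNOWN_SITEMAPS, key)
    ? KNOWN_SITEMAPS[key]
    : undefined;
}
```

When `knownSitemap(library)` returns a URL, the two web searches can be skipped and the sitemap fetched directly; otherwise the full sequential discovery runs as before.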
Summary
Changes
New Documentation Discovery Flow (Phase 0.5)
Why This Matters
Previously, the librarian would immediately fire parallel searches without understanding the documentation structure. This led to:
Now the flow is:
Updated Components
Summary by cubic
Adds a Documentation Discovery phase before TYPE A and D requests to locate official, versioned docs and fetch targeted pages. Improves accuracy and cuts down random, unfocused searches.
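To make "fetch targeted pages" concrete, the sketch below extracts page URLs from an already-fetched sitemap and keeps those that mention the topic. It is a simplification under stated assumptions: `sitemapUrls` and `pagesForTopic` are hypothetical helpers, the regex handles only plain `<loc>` entries (no sitemap index files, no XML entity decoding), and matching on the URL path is a naive stand-in for whatever relevance judgment the agent applies.

```typescript
// Extracts every <loc> URL from sitemap XML text.
// Regex-based on purpose to stay dependency-free; not a full XML parser.
function sitemapUrls(xml: string): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    const url = match[1];
    if (url) urls.push(url);
  }
  return urls;
}

// Keeps only URLs whose text contains the topic (case-insensitive).
function pagesForTopic(xml: string, topic: string): string[] {
  const needle = topic.toLowerCase();
  return sitemapUrls(xml).filter((url) => url.toLowerCase().indexOf(needle) !== -1);
}
```

The resulting list is what a targeted `webfetch` step would iterate over, instead of issuing unfocused searches.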
New Features
Refactors
Written for commit 3853456. Summary will update on new commits.