@gtg7784
Contributor

@gtg7784 gtg7784 commented Dec 31, 2025

Summary

  • Add Phase 0.5 (Documentation Discovery) workflow before TYPE A and D requests
  • Improves documentation lookup accuracy by discovering doc structure before searching

Changes

New Documentation Discovery Flow (Phase 0.5)

  1. Find Official Documentation: Web search to identify the official docs URL
  2. Version Check: If user specifies a version (e.g., "React 18", "Next.js 14"), verify versioned documentation
  3. Sitemap Discovery: Fetch sitemap.xml to understand documentation structure
  4. Targeted Investigation: Use sitemap knowledge to fetch specific relevant pages
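
Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the code in `src/agents/librarian.ts` (the librarian is prompt-driven); the helper names `parseSitemapUrls` and `pickRelevantPages` and the `docs.example.com` sitemap are hypothetical.

```typescript
// Hypothetical helpers illustrating sitemap discovery and targeted page selection.

/** Extract every <loc> URL from a sitemap.xml document. */
function parseSitemapUrls(xml: string): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

/** Keep only the pages whose URL path mentions one of the query keywords. */
function pickRelevantPages(urls: string[], keywords: string[]): string[] {
  const needles = keywords.map((k) => k.toLowerCase());
  return urls.filter((url) => {
    // Drop scheme and host so a keyword cannot match the domain name.
    const path = url.replace(/^https?:\/\/[^/]+/, "").toLowerCase();
    return needles.some((needle) => path.includes(needle));
  });
}

// Made-up sitemap used only for this example.
const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.example.com/reference/hooks</loc></url>
  <url><loc>https://docs.example.com/reference/components</loc></url>
  <url><loc>https://docs.example.com/blog/release-notes</loc></url>
</urlset>`;

const allPages = parseSitemapUrls(sampleSitemap);
const hookPages = pickRelevantPages(allPages, ["hooks"]);
```

Matching on the URL path is a deliberately crude relevance filter; a real agent would more likely reason over the full list of paths.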

Why This Matters

Previously, the librarian would immediately fire parallel searches without understanding the documentation structure. This led to:

  • Random page fetching that might miss relevant content
  • No version awareness (could return wrong version's docs)
  • Inefficient searching without knowing where to look

Now the flow is:

  • Doc Discovery is SEQUENTIAL (websearch → version check → sitemap → investigate)
  • Main phase is PARALLEL once you know where to look
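
The sequential-then-parallel shape might look like the sketch below. Every function passed in is a hypothetical stand-in for a real tool call (web search, version check, sitemap fetch); only the ordering is the point.

```typescript
// Hypothetical orchestration sketch; the tool callbacks are injected stand-ins.
interface DiscoveryResult {
  docsUrl: string;
  versionNote: string;
  sitemapPages: string[];
}

async function runLibrarian(
  findDocsUrl: () => Promise<string>,
  checkVersion: (docsUrl: string) => Promise<string>,
  fetchSitemap: (docsUrl: string) => Promise<string[]>,
  investigations: Array<(d: DiscoveryResult) => Promise<string>>,
): Promise<string[]> {
  // Doc Discovery: each step needs the previous result, so await in order.
  const docsUrl = await findDocsUrl();
  const versionNote = await checkVersion(docsUrl);
  const sitemapPages = await fetchSitemap(docsUrl);
  const discovery: DiscoveryResult = { docsUrl, versionNote, sitemapPages };

  // Main phase: the lookups are independent, so start them all at once.
  return Promise.all(investigations.map((task) => task(discovery)));
}

// Usage with stubbed tools that record the order in which they were called.
const callLog: string[] = [];
const demo = runLibrarian(
  async () => { callLog.push("websearch"); return "https://docs.example.com"; },
  async () => { callLog.push("version check"); return "latest"; },
  async (url) => { callLog.push("sitemap"); return [`${url}/reference/hooks`]; },
  [
    async (d) => `context7 lookup for ${d.docsUrl}`,
    async (d) => `fetched ${d.sitemapPages[0]}`,
  ],
);
```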

Updated Components

  • Classification table now shows "Doc Discovery →" for TYPE A and D
  • Tool reference includes sitemap discovery and doc page fetching
  • Failure recovery handles the cases where the sitemap or versioned docs are not found
  • Parallel execution requirements table now shows Doc Discovery column

Summary by cubic

Adds a Documentation Discovery phase before TYPE A and D requests to locate official, versioned docs and fetch targeted pages. Improves accuracy and cuts down random, unfocused searches.

  • New Features

    • Sequential doc discovery: web search → version check → sitemap → targeted fetch.
    • Version-aware lookups when a version is specified.
    • Targeted page fetching using sitemap; main investigation runs in parallel after discovery.
    • Failure recovery for missing sitemaps or versioned docs.
  • Refactors

    • Update classification to show “Doc Discovery →” for TYPE A and D.
    • Switch from context7_get-library-docs to context7_query-docs.
    • Expand tool reference for sitemap and doc page fetching.
    • Add Doc Discovery column to parallel execution requirements.

Written for commit 3853456. Summary will update on new commits.

…c investigation

- Add Phase 0.5 (Documentation Discovery) before TYPE A and D requests
- Sequential flow: websearch → version check → sitemap → targeted investigation
- Enables version-specific documentation lookup when user specifies version
- Sitemap discovery helps understand doc structure before searching
- Update tool reference with sitemap and doc page fetching
- Add failure recovery for sitemap/versioned docs not found cases
@github-actions
Contributor

github-actions bot commented Dec 31, 2025

All contributors have signed the CLA. Thank you! ✅
Posted by the CLA Assistant Lite bot.

@gtg7784
Contributor Author

gtg7784 commented Dec 31, 2025

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Dec 31, 2025
@greptile-apps
greptile-apps bot commented Dec 31, 2025

Greptile Summary

Adds Phase 0.5 Documentation Discovery workflow to improve documentation lookup accuracy before TYPE A (Conceptual) and TYPE D (Comprehensive) requests. The new workflow discovers official docs URL, verifies versioned documentation, fetches sitemap.xml to understand doc structure, then performs targeted investigation instead of random searching.

  • Sequential Doc Discovery: websearch → version check → sitemap → targeted investigation
  • Main Phase Remains Parallel: Once doc structure is known, parallel tool execution continues
  • Classification table updated: Shows "Doc Discovery →" for TYPE A and D
  • New tool entries: Added sitemap discovery and doc page fetching with webfetch
  • Failure recovery: Handles sitemap/versioned docs not found cases

Issue Found: Changed tool name from context7_get-library-docs to context7_query-docs in three locations (lines 114, 181, 237), but src/hooks/agent-usage-reminder/constants.ts:18 still references the old tool name. This creates an inconsistency where the agent usage reminder hook won't recognize the tool being used.

Confidence Score: 3/5

  • This PR is safe to merge after fixing the tool name inconsistency
  • The workflow logic and documentation improvements are sound, but the tool name change from context7_get-library-docs to context7_query-docs creates a runtime inconsistency with the agent-usage-reminder hook that still references the old name. This must be corrected either by reverting the tool name changes or updating the hook configuration.
  • Check src/hooks/agent-usage-reminder/constants.ts to ensure tool name matches the changes in src/agents/librarian.ts

Important Files Changed

| Filename | Overview |
|----------|----------|
| src/agents/librarian.ts | Added Phase 0.5 Documentation Discovery workflow with sitemap-based targeted investigation. Changed tool name from context7_get-library-docs to context7_query-docs which creates inconsistency with agent-usage-reminder hook. |

Sequence Diagram

sequenceDiagram
    participant User
    participant Librarian
    participant WebSearch as websearch_exa
    participant Sitemap as webfetch(sitemap)
    participant Docs as webfetch(doc_pages)
    participant Context7
    participant GrepApp as grep_app
    
    Note over User,GrepApp: TYPE A or TYPE D Request
    
    User->>Librarian: "How do I use React 18?"
    
    rect rgb(230, 240, 255)
        Note right of Librarian: Phase 0.5: Documentation Discovery (SEQUENTIAL)
        Librarian->>WebSearch: "React official documentation site"
        WebSearch-->>Librarian: https://react.dev
        
        Librarian->>WebSearch: "React v18 documentation"
        WebSearch-->>Librarian: Versioned URL confirmed
        
        Librarian->>Sitemap: GET /sitemap.xml
        Sitemap-->>Librarian: Parse doc structure
        
        Note right of Librarian: Identify relevant sections from sitemap
    end
    
    rect rgb(240, 255, 240)
        Note right of Librarian: Phase 1: Main Investigation (PARALLEL)
        par Parallel Execution
            Librarian->>Context7: resolve-library-id("react")
            Context7-->>Librarian: library_id
            Librarian->>Context7: query-docs(id, "hooks")
            Context7-->>Librarian: Official docs
        and
            Librarian->>Docs: GET /docs/hooks.html (from sitemap)
            Docs-->>Librarian: Targeted doc page
        and
            Librarian->>GrepApp: searchGitHub("React hooks usage")
            GrepApp-->>Librarian: Code examples
        end
    end
    
    Librarian->>User: Synthesized answer with permalinks

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. src/agents/librarian.ts, line 114 (link)

    syntax: tool name changed from context7_get-library-docs to context7_query-docs but src/hooks/agent-usage-reminder/constants.ts:18 still references the old name

  2. src/agents/librarian.ts, line 181 (link)

    syntax: tool name changed from context7_get-library-docs to context7_query-docs but src/hooks/agent-usage-reminder/constants.ts:18 still references the old name

  3. src/agents/librarian.ts, line 237 (link)

    syntax: tool name changed from context7_get-library-docs to context7_query-docs but src/hooks/agent-usage-reminder/constants.ts:18 still references the old name

1 file reviewed, 3 comments

@cubic-dev-ai cubic-dev-ai bot left a comment

4 issues found across 1 file

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/agents/librarian.ts">

<violation number="1" location="src/agents/librarian.ts:53">
P2: TYPE A table says `context7 + websearch_exa`, but TYPE A instructions now use `webfetch(...)` from the sitemap instead. Align the table with the actual workflow so the agent follows a single, consistent playbook.</violation>

<violation number="2" location="src/agents/librarian.ts:76">
P3: The `webfetch(... + "/versions" or "/v{version}")` example is ambiguous/invalid pseudo-code. Split into explicit alternatives like you do for sitemap fallbacks.</violation>

<violation number="3" location="src/agents/librarian.ts:96">
P1: `context7_query-docs` appears to be an incorrect/non-existent tool name (only used in this prompt, while the repo references `context7_get-library-docs`). This will likely break the librarian’s Context7 step or confuse tool usage; use the real tool name consistently.</violation>

<violation number="4" location="src/agents/librarian.ts:114">
P2: Context7 call examples are inconsistent (`query:` vs `topic:`). Pick one argument shape and use it everywhere to avoid invalid tool-call schemas.</violation>
</file>

| Type | Trigger Examples | Tools |
|------|------------------|-------|
-| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | context7 + websearch_exa (parallel) |
+| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | Doc Discovery → context7 + websearch_exa |

@cubic-dev-ai cubic-dev-ai bot Dec 31, 2025

P2: TYPE A table says context7 + websearch_exa, but TYPE A instructions now use webfetch(...) from the sitemap instead. Align the table with the actual workflow so the agent follows a single, consistent playbook.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/agents/librarian.ts, line 53:

<comment>TYPE A table says `context7 + websearch_exa`, but TYPE A instructions now use `webfetch(...)` from the sitemap instead. Align the table with the actual workflow so the agent follows a single, consistent playbook.</comment>

<file context>
@@ -50,10 +50,56 @@ Classify EVERY request into one of these categories before taking action:
 | Type | Trigger Examples | Tools |
 |------|------------------|-------|
-| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | context7 + websearch_exa (parallel) |
+| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | Doc Discovery → context7 + websearch_exa |
 | **TYPE B: IMPLEMENTATION** | "How does X implement Y?", "Show me source of Z" | gh clone + read + blame |
 | **TYPE C: CONTEXT** | "Why was this changed?", "History of X?" | gh issues/prs + git log/blame |
</file context>

Contributor Author

The table shows "Doc Discovery →" which includes the webfetch sitemap workflow. The context7 + websearch_exa still applies after Doc Discovery phase. The table is a summary, while the detailed workflow is in PHASE 0.5 and TYPE A sections.

With sitemap knowledge, fetch the SPECIFIC documentation pages relevant to the query:
\`\`\`
webfetch(specific_doc_page_from_sitemap)
context7_query-docs(libraryId, query: "specific topic")

@cubic-dev-ai cubic-dev-ai bot Dec 31, 2025

P1: context7_query-docs appears to be an incorrect/non-existent tool name (only used in this prompt, while the repo references context7_get-library-docs). This will likely break the librarian’s Context7 step or confuse tool usage; use the real tool name consistently.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/agents/librarian.ts, line 96:

<comment>`context7_query-docs` appears to be an incorrect/non-existent tool name (only used in this prompt, while the repo references `context7_get-library-docs`). This will likely break the librarian’s Context7 step or confuse tool usage; use the real tool name consistently.</comment>

<file context>
@@ -50,10 +50,56 @@ Classify EVERY request into one of these categories before taking action:
+With sitemap knowledge, fetch the SPECIFIC documentation pages relevant to the query:
+\`\`\`
+webfetch(specific_doc_page_from_sitemap)
+context7_query-docs(libraryId, query: "specific topic")
+\`\`\`
+
</file context>

Contributor Author

Fixed in commit 3853456. The tool name context7_query-docs is correct (matches the MCP tool definition). Also updated src/hooks/agent-usage-reminder/constants.ts to use the same name for consistency.
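
One way to keep the prompt and the hook from drifting apart again would be to derive both from a single constant. A minimal sketch follows; the constant, the variable names, and the idea of a shared module are assumptions for illustration, not the repository's actual layout.

```typescript
// Hypothetical shared constant; the real code may organize this differently.
const CONTEXT7_QUERY_TOOL = "context7_query-docs";

// The prompt example and the hook's tool list are both built from the constant,
// so renaming the tool requires a change in exactly one place.
const librarianPromptLine = `${CONTEXT7_QUERY_TOOL}(libraryId, query: "specific topic")`;
const reminderTargetTools: readonly string[] = [CONTEXT7_QUERY_TOOL];

/** Returns true when the usage-reminder hook should recognize this tool name. */
function isTrackedTool(toolName: string): boolean {
  return reminderTargetTools.includes(toolName);
}
```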

@gtg7784 gtg7784 changed the base branch from master to dev January 2, 2026 05:33
- Fix ambiguous webfetch pseudo-code (split /versions or /v{version})
- Unify context7_query-docs parameter format (query: instead of topic:)
- Update agent-usage-reminder constants to use context7_query-docs
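
The first item, splitting the `/versions` or `/v{version}` pseudo-code into explicit alternatives, can be illustrated like this. The function name and the example base URL are hypothetical; the two path shapes are the ones named in the review comment.

```typescript
// Hypothetical helper: list each versioned-docs URL to try as a separate candidate.
function versionedDocCandidates(docsBaseUrl: string, version: string): string[] {
  // Trim trailing slashes so the joined paths never contain "//".
  const base = docsBaseUrl.replace(/\/+$/, "");
  return [
    `${base}/versions`,     // alternative 1: a page listing all versions
    `${base}/v${version}`,  // alternative 2: a version-prefixed docs root
  ];
}

const candidates = versionedDocCandidates("https://docs.example.com/", "18");
```

Each candidate would then be fetched on its own, in order, instead of being passed to a single ambiguous `webfetch(...)` call.
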
@code-yeongyu
Owner

@sisyphus-dev-ai review this

And how do you think this will change the librarian's behavior? Will it be slower? More precise? More likely to answer with deprecated stuff?

ulw to share your opinion

@sisyphus-dev-ai sisyphus-dev-ai added the sisyphus: working Sisyphus is currently working on this label Jan 3, 2026
@sisyphus-dev-ai
Collaborator

Review: Documentation Discovery Workflow for Librarian

Great question @code-yeongyu! Let me break down the behavioral impact of this change.

TL;DR

| Aspect | Impact | Why |
|--------|--------|-----|
| Speed | Slightly slower | Sequential discovery phase (3-4 calls) before parallel execution |
| Precision | Significantly better | Targeted searching informed by doc structure |
| Deprecated content | Less likely, not more | Version-aware lookups actively prevent wrong-version docs |

Detailed Analysis

Speed Trade-off

Before: Immediate parallel fire (3+ calls for TYPE A, 6+ for TYPE D)

After: Sequential discovery → then parallel fire

websearch (find docs URL)
   ↓
webfetch (version check if specified)
   ↓
webfetch (sitemap.xml)
   ↓
[NOW parallel phase begins]

This adds roughly 2-4 seconds of latency to TYPE A/D requests. The main investigation phase remains parallel.

Precision Improvement (The Real Win)

The current librarian does "hopeful parallel searching" - it fires multiple tools simultaneously without knowing WHERE the relevant documentation lives. This leads to:

  • Random page hits that miss relevant content
  • No understanding of doc organization
  • Luck-dependent results

The new flow:

  1. Finds official docs URL - not blogs, not tutorials
  2. Verifies version - React 18 docs, not React 17
  3. Parses sitemap - understands the doc structure
  4. Fetches targeted pages - informed by sitemap knowledge

This is like the difference between grepping blindly and understanding the codebase structure first.

Deprecated Content (Counter-intuitive!)

You asked if this increases deprecated content risk. It's actually the opposite.

The new Phase 0.5 explicitly:

  • Checks for versioned documentation when user specifies a version
  • Falls back to latest with explicit acknowledgment: "Versioned docs not found → Fall back to latest version, note this in response"
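
The fallback rule can be sketched as a small decision function. The type, the function name, and the wording of the note are hypothetical; the behavior (use versioned docs when found, otherwise use latest and say so) is the one quoted above.

```typescript
// Hypothetical sketch of the "versioned docs not found" recovery rule.
interface DocsResolution {
  url: string;
  usedFallback: boolean;
  note: string | null; // text to surface in the final response, if any
}

function resolveDocs(
  latestUrl: string,
  requestedVersion: string | null,
  versionedUrl: string | null, // result of the version check, null if not found
): DocsResolution {
  if (requestedVersion === null) {
    // No version was asked for, so the latest docs are the right target.
    return { url: latestUrl, usedFallback: false, note: null };
  }
  if (versionedUrl !== null) {
    return { url: versionedUrl, usedFallback: false, note: null };
  }
  // A version was requested but not found: fall back and acknowledge it.
  return {
    url: latestUrl,
    usedFallback: true,
    note: `Versioned docs for ${requestedVersion} were not found; this answer uses the latest documentation.`,
  };
}

const fallback = resolveDocs("https://docs.example.com", "18", null);
```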

Before this change, the librarian had zero version awareness. It would happily return Next.js 13 App Router docs when the user was asking about Next.js 14.


Trade-off Summary

| Before | After |
|--------|-------|
| Lower latency | Higher latency (+2-4s) |
| Random parallel hits | Informed targeted hits |
| No version awareness | Explicit version checking |
| Variable precision | Consistent precision |
| Lower token usage | Higher token usage (sitemap parsing) |

Verdict

This is a good trade-off. Documentation questions benefit more from precision than raw speed. Users asking "How do I use X in version Y?" care about getting the right answer, not getting any answer 2 seconds faster.

The sequential discovery is a one-time upfront cost that dramatically improves the quality of the parallel phase that follows.

Minor Suggestion

Consider adding a skip condition for well-known libraries with consistent doc structures (React, Next.js, etc.) where we already know the sitemap location. This could recover some latency for common cases.

// Potential optimization
if (library in KNOWN_DOC_STRUCTURES) {
  // Skip websearch, go directly to sitemap
  webfetch(KNOWN_DOC_STRUCTURES[library].sitemap)
}

But this is a nice-to-have, not a blocker.


LGTM - The precision improvement justifies the latency cost.

@sisyphus-dev-ai sisyphus-dev-ai removed the sisyphus: working Sisyphus is currently working on this label Jan 3, 2026

3 participants