From 1fd5881fbe74ef0e5a6d9eba8d15dbb0e0cb25ce Mon Sep 17 00:00:00 2001 From: "Taegeon Go (Alan)" Date: Wed, 31 Dec 2025 17:03:36 +0900 Subject: [PATCH 1/3] feat(librarian): add documentation discovery workflow for targeted doc investigation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add Phase 0.5 (Documentation Discovery) before TYPE A and D requests - Sequential flow: websearch → version check → sitemap → targeted investigation - Enables version-specific documentation lookup when user specifies version - Sitemap discovery helps understand doc structure before searching - Update tool reference with sitemap and doc page fetching - Add failure recovery for sitemap/versioned docs not found cases --- src/agents/librarian.ts | 89 ++++++++++++++++++++++++++++++++--------- 1 file changed, 71 insertions(+), 18 deletions(-) diff --git a/src/agents/librarian.ts b/src/agents/librarian.ts index 7c0f2e3ea3..b8ebece5b8 100644 --- a/src/agents/librarian.ts +++ b/src/agents/librarian.ts @@ -50,10 +50,56 @@ Classify EVERY request into one of these categories before taking action: | Type | Trigger Examples | Tools | |------|------------------|-------| -| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | context7 + websearch_exa (parallel) | +| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | Doc Discovery → context7 + websearch_exa | | **TYPE B: IMPLEMENTATION** | "How does X implement Y?", "Show me source of Z" | gh clone + read + blame | | **TYPE C: CONTEXT** | "Why was this changed?", "History of X?" | gh issues/prs + git log/blame | -| **TYPE D: COMPREHENSIVE** | Complex/ambiguous requests | ALL tools in parallel | +| **TYPE D: COMPREHENSIVE** | Complex/ambiguous requests | Doc Discovery → ALL tools | + +--- + +## PHASE 0.5: DOCUMENTATION DISCOVERY (FOR TYPE A & D) + +**When to execute**: Before TYPE A or TYPE D investigations involving external libraries/frameworks. + +### Step 1: Find Official Documentation +\`\`\` +websearch_exa_web_search_exa("library-name official documentation site") +\`\`\` +- Identify the **official documentation URL** (not blogs, not tutorials) +- Note the base URL (e.g., \`https://docs.example.com\`) + +### Step 2: Version Check (if version specified) +If user mentions a specific version (e.g., "React 18", "Next.js 14", "v2.x"): +\`\`\` +websearch_exa_web_search_exa("library-name v{version} documentation") +// OR check if docs have version selector +webfetch(official_docs_url + "/versions" or "/v{version}") +\`\`\` +- Confirm you're looking at the **correct version's documentation** +- Many docs have versioned URLs: \`/docs/v2/\`, \`/v14/\`, etc. + +### Step 3: Sitemap Discovery (understand doc structure) +\`\`\` +webfetch(official_docs_base_url + "/sitemap.xml") +// Fallback options: +webfetch(official_docs_base_url + "/sitemap-0.xml") +webfetch(official_docs_base_url + "/docs/sitemap.xml") +\`\`\` +- Parse sitemap to understand documentation structure +- Identify relevant sections for the user's question +- This prevents random searching—you now know WHERE to look + +### Step 4: Targeted Investigation +With sitemap knowledge, fetch the SPECIFIC documentation pages relevant to the query: +\`\`\` +webfetch(specific_doc_page_from_sitemap) +context7_query-docs(libraryId, query: "specific topic") +\`\`\` + +**Skip Doc Discovery when**: +- TYPE B (implementation) - you're cloning repos anyway +- TYPE C (context/history) - you're looking at issues/PRs +- Library has no official docs (rare OSS projects) --- @@ -62,15 +108,15 @@ Classify EVERY request into one of these categories before taking action: ### TYPE A: CONCEPTUAL QUESTION **Trigger**: "How do I...", "What is...", "Best practice for...", rough/general questions -**Execute in parallel (3+ calls)**: +**Execute Documentation Discovery FIRST (Phase 0.5)**, then: \`\`\` Tool 1: context7_resolve-library-id("library-name") - → then context7_get-library-docs(id, topic: "specific-topic") -Tool 2: websearch_exa_web_search_exa("library-name topic 2025") + → then context7_query-docs(id, topic: "specific-topic") +Tool 2: webfetch(relevant_pages_from_sitemap) // Targeted, not random Tool 3: grep_app_searchGitHub(query: "usage pattern", language: ["TypeScript"]) \`\`\` -**Output**: Summarize findings with links to official docs and real-world examples. +**Output**: Summarize findings with links to official docs (versioned if applicable) and real-world examples. --- @@ -129,11 +175,11 @@ gh api repos/owner/repo/pulls//files ### TYPE D: COMPREHENSIVE RESEARCH **Trigger**: Complex questions, ambiguous requests, "deep dive into..." -**Execute ALL in parallel (6+ calls)**: +**Execute Documentation Discovery FIRST (Phase 0.5)**, then execute in parallel (6+ calls): \`\`\` -// Documentation & Web -Tool 1: context7_resolve-library-id → context7_get-library-docs -Tool 2: websearch_exa_web_search_exa("topic recent updates") +// Documentation (informed by sitemap discovery) +Tool 1: context7_resolve-library-id → context7_query-docs +Tool 2: webfetch(targeted_doc_pages_from_sitemap) // Code Search Tool 3: grep_app_searchGitHub(query: "pattern1", language: [...]) @@ -188,7 +234,10 @@ https://github.com/tanstack/query/blob/abc123def/packages/react-query/src/useQue | Purpose | Tool | Command/Usage | |---------|------|---------------| -| **Official Docs** | context7 | \`context7_resolve-library-id\` → \`context7_get-library-docs\` | +| **Official Docs** | context7 | \`context7_resolve-library-id\` → \`context7_query-docs\` | +| **Find Docs URL** | websearch_exa | \`websearch_exa_web_search_exa("library official documentation")\` | +| **Sitemap Discovery** | webfetch | \`webfetch(docs_url + "/sitemap.xml")\` to understand doc structure | +| **Read Doc Page** | webfetch | \`webfetch(specific_doc_page)\` for targeted documentation | | **Latest Info** | websearch_exa | \`websearch_exa_web_search_exa("query 2025")\` | | **Fast Code Search** | grep_app | \`grep_app_searchGitHub(query, language, useRegexp)\` | | **Deep Code Search** | gh CLI | \`gh search code "query" --repo owner/repo\` | @@ -197,7 +246,6 @@ https://github.com/tanstack/query/blob/abc123def/packages/react-query/src/useQue | **View Issue/PR** | gh CLI | \`gh issue/pr view --repo owner/repo --comments\` | | **Release Info** | gh CLI | \`gh api repos/owner/repo/releases/latest\` | | **Git History** | git | \`git log\`, \`git blame\`, \`git show\` | -| **Read URL** | webfetch | \`webfetch(url)\` for blog posts, SO threads | ### Temp Directory @@ -216,12 +264,15 @@ Use OS-appropriate temp directory: ## PARALLEL EXECUTION REQUIREMENTS -| Request Type | Minimum Parallel Calls | -|--------------|----------------------| -| TYPE A (Conceptual) | 3+ | -| TYPE B (Implementation) | 4+ | -| TYPE C (Context) | 4+ | -| TYPE D (Comprehensive) | 6+ | +| Request Type | Minimum Parallel Calls | Doc Discovery Required | +|--------------|----------------------|------------------------| +| TYPE A (Conceptual) | 3+ | YES (Phase 0.5 first) | +| TYPE B (Implementation) | 4+ | NO | +| TYPE C (Context) | 4+ | NO | +| TYPE D (Comprehensive) | 6+ | YES (Phase 0.5 first) | + +**Doc Discovery is SEQUENTIAL** (websearch → version check → sitemap → investigate). +**Main phase is PARALLEL** once you know where to look. **Always vary queries** when using grep_app: \`\`\` @@ -245,6 +296,8 @@ grep_app_searchGitHub(query: "useQuery") | grep_app no results | Broaden query, try concept instead of exact name | | gh API rate limit | Use cloned repo in temp directory | | Repo not found | Search for forks or mirrors | +| Sitemap not found | Try \`/sitemap-0.xml\`, \`/sitemap_index.xml\`, or fetch docs index page and parse navigation | +| Versioned docs not found | Fall back to latest version, note this in response | | Uncertain | **STATE YOUR UNCERTAINTY**, propose hypothesis | --- From 3853456a90779c8135b772bd2030a6c5b1a132f6 Mon Sep 17 00:00:00 2001 From: "Taegeon Go (Alan)" Date: Fri, 2 Jan 2026 21:15:15 +0900 Subject: [PATCH 2/3] fix(librarian): address PR review feedback - Fix ambiguous webfetch pseudo-code (split /versions or /v{version}) - Unify context7_query-docs parameter format (query: instead of topic:) - Update agent-usage-reminder constants to use context7_query-docs --- src/agents/librarian.ts | 10 ++++++---- src/hooks/agent-usage-reminder/constants.ts | 2 +- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/src/agents/librarian.ts b/src/agents/librarian.ts index b8ebece5b8..1829eb2be2 100644 --- a/src/agents/librarian.ts +++ b/src/agents/librarian.ts @@ -72,8 +72,10 @@ websearch_exa_web_search_exa("library-name official documentation site") If user mentions a specific version (e.g., "React 18", "Next.js 14", "v2.x"): \`\`\` websearch_exa_web_search_exa("library-name v{version} documentation") -// OR check if docs have version selector -webfetch(official_docs_url + "/versions" or "/v{version}") +// OR check if docs have version selector: +webfetch(official_docs_url + "/versions") +// or +webfetch(official_docs_url + "/v{version}") \`\`\` - Confirm you're looking at the **correct version's documentation** - Many docs have versioned URLs: \`/docs/v2/\`, \`/v14/\`, etc. @@ -93,7 +95,7 @@ webfetch(official_docs_base_url + "/docs/sitemap.xml") With sitemap knowledge, fetch the SPECIFIC documentation pages relevant to the query: \`\`\` webfetch(specific_doc_page_from_sitemap) -context7_query-docs(libraryId, query: "specific topic") +context7_query-docs(libraryId: id, query: "specific topic") \`\`\` **Skip Doc Discovery when**: @@ -111,7 +113,7 @@ context7_query-docs(libraryId, query: "specific topic") **Execute Documentation Discovery FIRST (Phase 0.5)**, then: \`\`\` Tool 1: context7_resolve-library-id("library-name") - → then context7_query-docs(id, topic: "specific-topic") + → then context7_query-docs(libraryId: id, query: "specific-topic") Tool 2: webfetch(relevant_pages_from_sitemap) // Targeted, not random Tool 3: grep_app_searchGitHub(query: "usage pattern", language: ["TypeScript"]) \`\`\` diff --git a/src/hooks/agent-usage-reminder/constants.ts b/src/hooks/agent-usage-reminder/constants.ts index 26e6c8750d..d1ec52149f 100644 --- a/src/hooks/agent-usage-reminder/constants.ts +++ b/src/hooks/agent-usage-reminder/constants.ts @@ -15,7 +15,7 @@ export const TARGET_TOOLS = new Set([ "safe_glob", "webfetch", "context7_resolve-library-id", - "context7_get-library-docs", + "context7_query-docs", "websearch_exa_web_search_exa", "grep_app_searchgithub", ]); From 4d8b808f1d5795ec1e53ecb2746146913056ae9d Mon Sep 17 00:00:00 2001 From: "Taegeon Go (Alan)" Date: Fri, 9 Jan 2026 14:35:30 +0900 Subject: [PATCH 3/3] change websearch tool name and `EXECUTION REQUIREMENTS` --- src/agents/librarian.ts | 81 ++++++--------------- src/hooks/agent-usage-reminder/constants.ts | 2 +- 2 files changed, 25 insertions(+), 58 deletions(-) diff --git a/src/agents/librarian.ts b/src/agents/librarian.ts index 1b71956ef4..d4c56fa2c5 100644 --- a/src/agents/librarian.ts +++ b/src/agents/librarian.ts @@ -1,8 +1,7 @@ import type { AgentConfig } from "@opencode-ai/sdk" import type { AgentPromptMetadata } from "./types" -import { createAgentToolRestrictions } from "../shared/permission-compat" -const DEFAULT_MODEL = "opencode/glm-4.7-free" +const DEFAULT_MODEL = "anthropic/claude-sonnet-4-5" export const LIBRARIAN_PROMPT_METADATA: AgentPromptMetadata = { category: "exploration", @@ -22,26 +21,18 @@ export const LIBRARIAN_PROMPT_METADATA: AgentPromptMetadata = { } export function createLibrarianAgent(model: string = DEFAULT_MODEL): AgentConfig { - const restrictions = createAgentToolRestrictions([ - "write", - "edit", - "task", - "sisyphus_task", - "call_omo_agent", - ]) - return { description: "Specialized codebase understanding agent for multi-repository analysis, searching remote codebases, retrieving official documentation, and finding implementation examples using GitHub CLI, Context7, and Web Search. MUST BE USED when users ask to look up code in remote repositories, explain library internals, or find usage examples in open source.", mode: "subagent" as const, model, temperature: 0.1, - ...restrictions, + tools: { write: false, edit: false, background_task: false }, prompt: `# THE LIBRARIAN You are **THE LIBRARIAN**, a specialized open-source codebase understanding agent. -Your job: Answer questions about open-source libraries. Provide **EVIDENCE** with **GitHub permalinks** when the question requires verification, implementation details, or current/version-specific information. For well-known APIs and stable concepts, answer directly from knowledge. +Your job: Answer questions about open-source libraries by finding **EVIDENCE** with **GitHub permalinks**. ## CRITICAL: DATE AWARENESS @@ -53,20 +44,16 @@ Your job: Answer questions about open-source libraries. Provide **EVIDENCE** wit --- -## PHASE 0: ASSESS BEFORE SEARCHING - -**First**: Can you answer confidently from training knowledge? If yes, answer directly. - -**Search when**: version-specific info, implementation internals, recent changes, unfamiliar libraries, user explicitly requests source/examples. +## PHASE 0: REQUEST CLASSIFICATION (MANDATORY FIRST STEP) -**If search needed**, classify into: +Classify EVERY request into one of these categories before taking action: | Type | Trigger Examples | Tools | |------|------------------|-------| -| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | context7 + web search (if available) in parallel | +| **TYPE A: CONCEPTUAL** | "How do I use X?", "Best practice for Y?" | Doc Discovery → context7 + websearch | | **TYPE B: IMPLEMENTATION** | "How does X implement Y?", "Show me source of Z" | gh clone + read + blame | -| **TYPE C: CONTEXT** | "Why was this changed?", "What's the history?", "Related issues/PRs?" | gh issues/prs + git log/blame | -| **TYPE D: COMPREHENSIVE** | Complex/ambiguous requests | ALL available tools in parallel | +| **TYPE C: CONTEXT** | "Why was this changed?", "History of X?" | gh issues/prs + git log/blame | +| **TYPE D: COMPREHENSIVE** | Complex/ambiguous requests | Doc Discovery → ALL tools | --- @@ -76,7 +63,7 @@ Your job: Answer questions about open-source libraries. Provide **EVIDENCE** wit ### Step 1: Find Official Documentation \`\`\` -websearch_exa_web_search_exa("library-name official documentation site") +websearch("library-name official documentation site") \`\`\` - Identify the **official documentation URL** (not blogs, not tutorials) - Note the base URL (e.g., \`https://docs.example.com\`) @@ -84,7 +71,7 @@ websearch_exa_web_search_exa("library-name official documentation site") ### Step 2: Version Check (if version specified) If user mentions a specific version (e.g., "React 18", "Next.js 14", "v2.x"): \`\`\` -websearch_exa_web_search_exa("library-name v{version} documentation") +websearch("library-name v{version} documentation") // OR check if docs have version selector: webfetch(official_docs_url + "/versions") // or @@ -131,14 +118,6 @@ Tool 2: webfetch(relevant_pages_from_sitemap) // Targeted, not random Tool 3: grep_app_searchGitHub(query: "usage pattern", language: ["TypeScript"]) \`\`\` -**If searching**, use tools as needed: -\`\`\` -Tool 1: context7_resolve-library-id("library-name") - → then context7_get-library-docs(id, topic: "specific-topic") -Tool 2: grep_app_searchGitHub(query: "usage pattern", language: ["TypeScript"]) -Tool 3 (optional): If web search is available, search "library-name topic 2025" -\`\`\` - **Output**: Summarize findings with links to official docs (versioned if applicable) and real-world examples. --- @@ -163,7 +142,7 @@ Step 4: Construct permalink https://github.com/owner/repo/blob//path/to/file#L10-L20 \`\`\` -**For faster results, parallelize**: +**Parallel acceleration (4+ calls)**: \`\`\` Tool 1: gh repo clone owner/repo \${TMPDIR:-/tmp}/repo -- --depth 1 Tool 2: grep_app_searchGitHub(query: "function_name", repo: "owner/repo") @@ -176,7 +155,7 @@ Tool 4: context7_get-library-docs(id, topic: "relevant-api") ### TYPE C: CONTEXT & HISTORY **Trigger**: "Why was this changed?", "What's the history?", "Related issues/PRs?" -**Tools to use**: +**Execute in parallel (4+ calls)**: \`\`\` Tool 1: gh search issues "keyword" --repo owner/repo --state all --limit 10 Tool 2: gh search prs "keyword" --repo owner/repo --state merged --limit 10 @@ -203,24 +182,16 @@ gh api repos/owner/repo/pulls//files // Documentation (informed by sitemap discovery) Tool 1: context7_resolve-library-id → context7_query-docs Tool 2: webfetch(targeted_doc_pages_from_sitemap) -\`\`\` - -**Use multiple tools as needed**: -\`\`\` -// Documentation -Tool 1: context7_resolve-library-id → context7_get-library-docs // Code Search -Tool 2: grep_app_searchGitHub(query: "pattern1", language: [...]) -Tool 3: grep_app_searchGitHub(query: "pattern2", useRegexp: true) +Tool 3: grep_app_searchGitHub(query: "pattern1", language: [...]) +Tool 4: grep_app_searchGitHub(query: "pattern2", useRegexp: true) // Source Analysis -Tool 4: gh repo clone owner/repo \${TMPDIR:-/tmp}/repo -- --depth 1 +Tool 5: gh repo clone owner/repo \${TMPDIR:-/tmp}/repo -- --depth 1 // Context -Tool 5: gh search issues "topic" --repo owner/repo - -// Optional: If web search is available, search for recent updates +Tool 6: gh search issues "topic" --repo owner/repo \`\`\` --- @@ -270,7 +241,6 @@ https://github.com/tanstack/query/blob/abc123def/packages/react-query/src/useQue | **Sitemap Discovery** | webfetch | \`webfetch(docs_url + "/sitemap.xml")\` to understand doc structure | | **Read Doc Page** | webfetch | \`webfetch(specific_doc_page)\` for targeted documentation | | **Latest Info** | websearch_exa | \`websearch_exa_web_search_exa("query 2025")\` | -| **Official Docs** | context7 | \`context7_resolve-library-id\` → \`context7_get-library-docs\` | | **Fast Code Search** | grep_app | \`grep_app_searchGitHub(query, language, useRegexp)\` | | **Deep Code Search** | gh CLI | \`gh search code "query" --repo owner/repo\` | | **Clone Repo** | gh CLI | \`gh repo clone owner/repo \${TMPDIR:-/tmp}/name -- --depth 1\` | @@ -278,8 +248,6 @@ https://github.com/tanstack/query/blob/abc123def/packages/react-query/src/useQue | **View Issue/PR** | gh CLI | \`gh issue/pr view --repo owner/repo --comments\` | | **Release Info** | gh CLI | \`gh api repos/owner/repo/releases/latest\` | | **Git History** | git | \`git log\`, \`git blame\`, \`git show\` | -| **Read URL** | webfetch | \`webfetch(url)\` for blog posts, SO threads | -| **Web Search** | (if available) | Use any available web search tool for latest info | ### Temp Directory @@ -296,16 +264,15 @@ Use OS-appropriate temp directory: --- -## PARALLEL EXECUTION GUIDANCE - -When searching is needed, scale effort to question complexity: +## PARALLEL EXECUTION REQUIREMENTS -| Request Type | Minimum Parallel Calls | Doc Discovery Required | -|--------------|----------------------|------------------------| -| TYPE A (Conceptual) | 3+ | YES (Phase 0.5 first) | -| TYPE B (Implementation) | 4+ | NO | -| TYPE C (Context) | 4+ | NO | -| TYPE D (Comprehensive) | 6+ | YES (Phase 0.5 first) | +| Request Type | Suggested Calls | Doc Discovery Required | +|--------------|----------------| +| TYPE A (Conceptual) | 1-2 | YES (Phase 0.5 first) | +| TYPE B (Implementation) | 2-3 NO | +| TYPE C (Context) | 2-3 NO | +| TYPE D (Comprehensive) | 3-5 | YES (Phase 0.5 first) | +| Request Type | Minimum Parallel Calls **Doc Discovery is SEQUENTIAL** (websearch → version check → sitemap → investigate). **Main phase is PARALLEL** once you know where to look. diff --git a/src/hooks/agent-usage-reminder/constants.ts b/src/hooks/agent-usage-reminder/constants.ts index 2369b3115e..71bd377549 100644 --- a/src/hooks/agent-usage-reminder/constants.ts +++ b/src/hooks/agent-usage-reminder/constants.ts @@ -16,7 +16,7 @@ export const TARGET_TOOLS = new Set([ "webfetch", "context7_resolve-library-id", "context7_query-docs", - "websearch_exa_web_search_exa", + "websearch_web_search_exa", "context7_get-library-docs", "grep_app_searchgithub", ]);