feat: add webhook_url parameter to crawler endpoint (#71)
Conversation
Add support for webhook notifications when crawl jobs complete. This allows users to receive POST notifications at a specified URL when their crawl job finishes processing.

Changes:
- Python SDK: Added webhook_url to CrawlRequest model with validation
- Python SDK: Updated sync and async client crawl methods
- JavaScript SDK: Added webhookUrl option to crawl function

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Pull request overview
Adds support for providing a webhook URL so users can receive POST notifications when crawl jobs complete.
Changes:
- Python SDK: Extend `CrawlRequest` with `webhook_url` and add validation.
- Python SDK: Add `webhook_url` parameter to sync/async `crawl()` methods and include it in request payloads.
- JavaScript SDK: Add `webhookUrl` option and map it to `webhook_url` in the crawl payload.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scrapegraph-py/scrapegraph_py/models/crawl.py | Adds webhook_url field to CrawlRequest plus URL-format validation. |
| scrapegraph-py/scrapegraph_py/client.py | Adds webhook_url to sync client crawl() signature, logging, and request JSON. |
| scrapegraph-py/scrapegraph_py/async_client.py | Adds webhook_url to async client crawl() signature and request JSON. |
| scrapegraph-js/src/crawl.js | Adds webhookUrl option and sends it as webhook_url in the request payload. |
```python
@model_validator(mode="after")
def validate_webhook_url(self) -> "CrawlRequest":
    """Validate webhook URL format if provided"""
    if self.webhook_url is not None:
        if not self.webhook_url.strip():
            raise ValueError("Webhook URL cannot be empty")
        if not (
            self.webhook_url.startswith("http://")
            or self.webhook_url.startswith("https://")
        ):
            raise ValueError(
                "Invalid webhook URL - must start with http:// or https://"
            )
    return self
```
Missing test coverage for the newly added webhook_url behavior (accepts valid http(s) URLs, rejects empty/whitespace, rejects non-http(s) schemes, and is included/excluded correctly in model_dump(exclude_none=True)). There are already CrawlRequest validation/serialization tests (e.g., scrapegraph-py/tests/test_crawl_path_filtering.py) that should be extended to cover this field.
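A minimal sketch of the cases such tests should exercise, using a standalone function that mirrors the validator's rules for illustration (the real tests would construct `CrawlRequest` directly; this helper is hypothetical):

```python
def validate_webhook_url(webhook_url):
    """Standalone mirror of the CrawlRequest webhook_url rules shown in the diff above."""
    if webhook_url is None:
        return None  # field is optional and should be excluded from model_dump(exclude_none=True)
    if not webhook_url.strip():
        raise ValueError("Webhook URL cannot be empty")
    if not (webhook_url.startswith("http://") or webhook_url.startswith("https://")):
        raise ValueError("Invalid webhook URL - must start with http:// or https://")
    return webhook_url

# Accepts valid http(s) URLs:
assert validate_webhook_url("https://example.com/hook") == "https://example.com/hook"
assert validate_webhook_url("http://example.com/hook") == "http://example.com/hook"

# None is allowed (field omitted from the serialized payload):
assert validate_webhook_url(None) is None

# Rejects empty, whitespace-only, and non-http(s) values:
for bad in ["", "   ", "ftp://example.com/hook"]:
    try:
        validate_webhook_url(bad)
    except ValueError:
        pass
    else:
        raise AssertionError(f"expected ValueError for {bad!r}")
```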
```python
    request_data["exclude_paths"] = exclude_paths
if webhook_url is not None:
    request_data["webhook_url"] = webhook_url
```
The client now adds webhook_url to request_data, but there are no assertions in the existing crawl request-body tests to ensure this field is actually sent when provided (and omitted when None). Please extend the existing crawl tests that inspect request JSON (e.g., in scrapegraph-py/tests/test_crawl_polling.py) to cover webhook_url.
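A sketch of the assertion the extended tests could make, using a hypothetical helper that mirrors how `crawl()` assembles its request body:

```python
def build_request_data(url, webhook_url=None):
    """Hypothetical mirror of the payload assembly in the sync client's crawl()."""
    request_data = {"url": url}
    if webhook_url is not None:
        request_data["webhook_url"] = webhook_url
    return request_data

# webhook_url is sent when provided:
sent = build_request_data("https://example.com", webhook_url="https://hooks.example.com/cb")
assert sent["webhook_url"] == "https://hooks.example.com/cb"

# and omitted when None:
assert "webhook_url" not in build_request_data("https://example.com")
```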
```python
    request_data["include_paths"] = include_paths
if exclude_paths is not None:
    request_data["exclude_paths"] = exclude_paths
if webhook_url is not None:
    request_data["webhook_url"] = webhook_url
```
Async crawl currently supports webhook_url, but the async client tests don’t verify the outgoing request payload includes webhook_url when set. Consider adding an async test that asserts the POST body contains webhook_url (and omits it when None) to prevent regressions.
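One way to assert the outgoing POST body without hitting the network is to inject a stub transport. This is a hedged sketch: the `StubTransport` and the local `crawl()` coroutine are illustrative stand-ins, not the SDK's real async client API:

```python
import asyncio

class StubTransport:
    """Captures the JSON body that would be POSTed."""
    def __init__(self):
        self.sent = None

    async def post(self, path, json):
        self.sent = json
        return {"status": "queued"}

async def crawl(transport, url, webhook_url=None):
    # Hypothetical mirror of the async client's payload assembly.
    request_data = {"url": url}
    if webhook_url is not None:
        request_data["webhook_url"] = webhook_url
    return await transport.post("/crawl", json=request_data)

# Body contains webhook_url when set:
transport = StubTransport()
asyncio.run(crawl(transport, "https://example.com", webhook_url="https://hooks.example.com/cb"))
assert transport.sent["webhook_url"] == "https://hooks.example.com/cb"

# Body omits webhook_url when None:
transport = StubTransport()
asyncio.run(crawl(transport, "https://example.com"))
assert "webhook_url" not in transport.sent
```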
```javascript
if (webhookUrl) {
  payload.webhook_url = webhookUrl;
```
`if (webhookUrl)` silently ignores explicitly provided empty-string values and lets whitespace-only strings through (both are likely invalid URLs). Prefer treating "provided" as `webhookUrl != null`, validating that the value is a non-empty string after trimming and starts with http:// or https://, and then setting `payload.webhook_url` (or throwing a clear error) to avoid surprising no-op behavior.
Suggested change:

```diff
-if (webhookUrl) {
-  payload.webhook_url = webhookUrl;
+if (webhookUrl != null) {
+  if (typeof webhookUrl !== 'string') {
+    throw new Error('webhookUrl must be a string starting with "http://" or "https://".');
+  }
+  const trimmedWebhookUrl = webhookUrl.trim();
+  if (!trimmedWebhookUrl) {
+    throw new Error('webhookUrl must be a non-empty string.');
+  }
+  if (!trimmedWebhookUrl.startsWith('http://') && !trimmedWebhookUrl.startsWith('https://')) {
+    throw new Error('webhookUrl must start with "http://" or "https://".');
+  }
+  payload.webhook_url = trimmedWebhookUrl;
```
🎉 This PR is included in version 1.45.0 🎉 The release is available on:

Your semantic-release bot 📦🚀