Summary
❗ Note: This is a starting point for discussion, not a final decision. The team should review and adjust before any implementation begins. ❗
Crawlee JS defines a structured exception hierarchy that controls retry behavior and crawler lifecycle, while Crawlee Python is missing several key exception types. Before implementing anything, this needs to be discussed with the whole team to agree on the target state — the final structure is not decided yet.
This issue is for discussion and planning, not immediate implementation.
Current State Comparison
JS Error Hierarchy (packages/core/src/errors.ts)
Error (native)
├── NonRetryableError # Never retried
│ └── CriticalError # Shuts down the crawler
│ ├── MissingRouteError # No route found — fatal
│ ├── ContextPipelineCleanupError # Cleanup failure — fatal
│ └── BrowserLaunchError # Browser launch failure — fatal
├── RetryRequestError # Always retried (overrides maxRequestRetries)
│ └── SessionError # Triggers session rotation
├── ContextPipelineInterruptedError
├── ContextPipelineInitializationError
├── RequestHandlerError
└── CookieParseError
Python Error Hierarchy (src/crawlee/errors.py)
Exception
├── UserDefinedErrorHandlerError
│ └── UserHandlerTimeoutError
├── SessionError # ✅ Parity
│ └── ProxyError # Python ahead (JS has no dedicated ProxyError)
├── ServiceConflictError
├── HttpStatusCodeError
│ └── HttpClientStatusCodeError
├── RequestHandlerError [Generic] # Python ahead (wraps with crawling context)
├── ContextPipelineInitializationError # ✅ Parity
├── ContextPipelineFinalizationError # ✅ Parity (named differently)
├── ContextPipelineInterruptedError # ✅ Parity
├── RequestCollisionError
└── AbortError (internal)
Gap Analysis
| Exception | JS | Python | Status |
|---|---|---|---|
| `RetryRequestError` | ✅ Always retried, overrides `maxRequestRetries` | ❌ Missing | Gap |
| `NonRetryableError` | ✅ Never retried | ❌ Missing | Gap |
| `CriticalError` | ✅ Shuts down crawler | ❌ Missing | Gap |
| `MissingRouteError` | ✅ Extends `CriticalError`, thrown by `Router` | ❌ Missing | Gap |
| `BrowserLaunchError` | ✅ Extends `CriticalError` | ❌ Missing | Gap |
| `CookieParseError` | ✅ Dedicated type | ❌ Missing | Gap |
| `SessionError` | ✅ Extends `RetryRequestError` | ✅ Standalone | Parity (different base) |
| `ProxyError` | ❌ Part of `SessionError` | ✅ Extends `SessionError` | Python ahead |
| `RequestHandlerError` | ✅ Simple wrapper | ✅ Generic with crawling context | Python ahead |
What's Missing and Why It Matters
1. RetryRequestError — Force unlimited retries
In JS, throwing RetryRequestError in a handler overrides maxRequestRetries and forces the request to be retried. Python has no equivalent — users cannot signal "keep retrying this request" from within a handler.
In JS, SessionError extends RetryRequestError, which means session errors are also always retried (with a separate maxSessionRotations limit). In Python, SessionError already has special handling, but there's no general-purpose "always retry" error.
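A minimal sketch of what this could look like on the Python side. All names below are hypothetical — `RetryRequestError` does not exist in Crawlee Python, and the `SessionError` re-parenting shown here is only the JS arrangement, not a decided design:

```python
# Hypothetical sketch: RetryRequestError forces a retry regardless of
# max_request_retries, and SessionError inherits that behavior (as in JS).
# None of these classes/signatures exist in Crawlee Python yet.


class RetryRequestError(Exception):
    """Always retried, overriding max_request_retries."""


class SessionError(RetryRequestError):
    """Blocked session; also always retried (subject to a separate
    max_session_rotations limit in the real crawler)."""


def should_retry(error: Exception, retry_count: int, max_request_retries: int) -> bool:
    """Decide whether a failed request goes back to the queue."""
    if isinstance(error, RetryRequestError):
        return True  # the handler explicitly asked to keep retrying
    return retry_count < max_request_retries


# A request that has already exhausted its normal retry budget:
assert should_retry(RetryRequestError(), retry_count=9, max_request_retries=3)
assert should_retry(SessionError(), retry_count=9, max_request_retries=3)
assert not should_retry(ValueError(), retry_count=3, max_request_retries=3)
```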
2. NonRetryableError — Skip retries entirely
In JS, throwing NonRetryableError marks the request as failed immediately without any retries. Python has no way for users to signal from a handler that an error should not be retried.
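The intended semantics can be sketched in a few lines (hypothetical names — `NonRetryableError` does not exist in Crawlee Python yet):

```python
# Hypothetical sketch: raising NonRetryableError from a handler fails the
# request immediately, with zero retries.


class NonRetryableError(Exception):
    """Fail the request immediately, without any retries."""


def remaining_retries(error: Exception, max_request_retries: int) -> int:
    """How many retries a request gets after this error."""
    return 0 if isinstance(error, NonRetryableError) else max_request_retries


assert remaining_retries(NonRetryableError('permanently gone'), 3) == 0
assert remaining_retries(TimeoutError(), 3) == 3
```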
3. CriticalError — Shut down the crawler
In JS, CriticalError extends NonRetryableError and causes the entire crawler to abort. This is used for unrecoverable situations (e.g., no route found, browser won't launch). Python has no equivalent — unrecoverable errors don't trigger a clean crawler shutdown.
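A rough sketch of the decision logic this would imply, assuming the JS inheritance is mirrored (neither class exists in Crawlee Python yet, and the outcome names are illustrative only):

```python
# Hypothetical sketch: CriticalError extends NonRetryableError, so it is
# never retried, and it additionally tells the crawler to shut down.


class NonRetryableError(Exception):
    """Never retried; the request fails, the crawl continues."""


class CriticalError(NonRetryableError):
    """Never retried and aborts the whole crawler run."""


def handle_failure(error: Exception) -> str:
    if isinstance(error, CriticalError):
        return 'abort_crawler'  # e.g. no route found, browser won't launch
    if isinstance(error, NonRetryableError):
        return 'fail_request'   # skip retries, keep crawling
    return 'retry'


assert handle_failure(CriticalError()) == 'abort_crawler'
assert handle_failure(NonRetryableError()) == 'fail_request'
assert handle_failure(RuntimeError()) == 'retry'
```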
4. MissingRouteError — Router fails loudly
In JS, if no route matches a request label and there's no default handler, a MissingRouteError (extending CriticalError) is thrown, shutting down the crawler immediately. This makes misconfigured routers fail fast and visibly. In Python, this situation is handled differently (no dedicated error type).
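A fail-fast router along these lines could be sketched as follows. The names mirror the JS behavior; neither `MissingRouteError` nor this `Router` shape exists in Crawlee Python:

```python
# Hypothetical sketch: when no handler matches the request label and no
# default handler is registered, the router raises MissingRouteError
# (a CriticalError) so the crawler aborts instead of silently skipping.


class CriticalError(Exception):
    """Shuts down the crawler."""


class MissingRouteError(CriticalError):
    """No matching route and no default handler."""


class Router:
    def __init__(self):
        self._handlers = {}  # label (or None for the default) -> handler

    def add(self, label, handler):
        self._handlers[label] = handler

    def resolve(self, label):
        handler = self._handlers.get(label, self._handlers.get(None))
        if handler is None:
            raise MissingRouteError(f'No route for label {label!r} and no default handler.')
        return handler


router = Router()
router.add('DETAIL', lambda: 'detail page')
assert router.resolve('DETAIL')() == 'detail page'

try:
    router.resolve('LISTING')  # no route, no default -> fatal
except MissingRouteError:
    pass  # misconfiguration surfaces immediately
```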
5. BrowserLaunchError / CookieParseError — Domain-specific errors
Lower priority, but useful for users to catch and handle specific failure modes.
Discussion Points
Before implementation, we need to agree on:
- Should we mirror the JS hierarchy exactly, or adapt it for Python idioms?
  - JS: `SessionError` extends `RetryRequestError` — should Python do the same, or keep `SessionError` standalone with special handling?
  - Python already has `ProxyError` extends `SessionError`, which JS lacks — do we keep this?
- Naming conventions — Python uses both `*Error` and `*Exception` in the standard library. Should we stick with `*Error` for consistency with JS?
- What about Python-specific exceptions we already have? `UserDefinedErrorHandlerError`, `HttpStatusCodeError`, `ServiceConflictError`, `RequestCollisionError` — these don't exist in JS. Should they stay as-is?
- Integration with `BasicCrawler` error handling logic:
  - Adding `RetryRequestError` and `NonRetryableError` requires changes to the retry logic in `BasicCrawler._handle_request_function()`.
  - `CriticalError` needs integration with `AutoscaledPool` to trigger shutdown.
- Crawlee JS v4 direction — The v4 branch has the same error hierarchy as v3. Should we wait for v4 to stabilize, or align with the current state?
Proposed Target Hierarchy (for discussion)
Exception
├── RetryRequestError # NEW: Always retried
│ └── SessionError # CHANGED: Re-parent under RetryRequestError
│ └── ProxyError # KEEP: Python-specific
├── NonRetryableError # NEW: Never retried
│ └── CriticalError # NEW: Shuts down crawler
│ ├── MissingRouteError # NEW: No route found
│ └── BrowserLaunchError # NEW: Browser launch failure
├── UserDefinedErrorHandlerError # KEEP
│ └── UserHandlerTimeoutError # KEEP
├── ServiceConflictError # KEEP
├── HttpStatusCodeError # KEEP
│ └── HttpClientStatusCodeError # KEEP
├── RequestHandlerError [Generic] # KEEP
├── ContextPipelineInitializationError # KEEP
├── ContextPipelineFinalizationError # KEEP
├── ContextPipelineInterruptedError # KEEP
├── RequestCollisionError # KEEP
├── CookieParseError # NEW
└── AbortError (internal) # KEEP