Python: Add auto_retry.py sample for rate limit handling #4223

Draft
Copilot wants to merge 2 commits into main from copilot/add-rate-limiting-handling

Conversation

Contributor

Copilot AI commented Feb 24, 2026

Motivation and Context

Every model inference API is rate-limited. Without built-in retry support in the framework, every consumer must write the same boilerplate. This sample shows two clean, copy-paste-ready patterns using tenacity so developers can focus on agent logic instead.

Description

Adds python/samples/02-agents/auto_retry.py demonstrating two approaches to automatic retry on RateLimitError (HTTP 429):

Approach 1 – Client subclass
Override get_response() in a subclass of any chat client. Non-streaming calls are wrapped in a tenacity.AsyncRetrying loop so each retry re-issues the full HTTP request (not a re-await of a stale coroutine). Streaming falls back to the parent with a note that streaming retry requires more delicate handling.

```python
import logging

from agent_framework.azure import AzureOpenAIChatClient
from openai import RateLimitError
from tenacity import (
    AsyncRetrying,
    before_sleep_log,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

logger = logging.getLogger(__name__)


class AzureOpenAIChatClientWithRetry(AzureOpenAIChatClient):
    retry_attempts: int = 3

    def get_response(self, *args, **kwargs):
        # Streaming needs more delicate handling; defer to the parent as-is.
        if kwargs.get("stream", False):
            return super().get_response(*args, **kwargs)

        async def _with_retry():
            async for attempt in AsyncRetrying(
                stop=stop_after_attempt(self.retry_attempts),
                wait=wait_exponential(multiplier=1, min=4, max=10),
                retry=retry_if_exception_type(RateLimitError),
                reraise=True,
                before_sleep=before_sleep_log(logger, logging.WARNING),
            ):
                with attempt:
                    # Explicit super() form: the zero-argument form does not
                    # work inside a nested function. Each pass re-issues the
                    # full request via a fresh coroutine.
                    return await super(AzureOpenAIChatClientWithRetry, self).get_response(*args, **kwargs)

        return _with_retry()
```
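The control flow above can also be illustrated without the framework or tenacity. The sketch below is a hypothetical, dependency-free version (`RateLimitError` here is a local stand-in for `openai.RateLimitError`); its key point matches the sample: every attempt calls a factory to obtain a fresh coroutine instead of re-awaiting a stale one.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for openai.RateLimitError (HTTP 429)."""


async def retry_on_rate_limit(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    min_wait: float = 4.0,
    max_wait: float = 10.0,
) -> T:
    """Retry a coroutine factory on RateLimitError with capped exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            # Calling the factory creates a *new* coroutine each time,
            # so every retry re-issues the full request.
            return await make_call()
        except RateLimitError:
            if attempt == attempts:
                raise  # out of attempts: surface the 429 to the caller
            wait = max(min_wait, min(2.0 ** attempt, max_wait))
            await asyncio.sleep(wait)
    raise AssertionError("unreachable")


# Demo: a fake call that returns 429 twice, then succeeds.
calls = {"n": 0}


async def flaky_call() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429")
    return "ok"
```

Driving it with `asyncio.run(retry_on_rate_limit(flaky_call))` succeeds on the third attempt after two simulated 429s.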

Approach 2 – Chat middleware (two styles)

Class-based: `RateLimitRetryMiddleware(ChatMiddleware)` wraps `await call_next()` in `AsyncRetrying`.
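A sketch of the class-based style is below. `ChatMiddleware`, `ChatContext`, and `RateLimitError` are defined as minimal local stand-ins so it runs in isolation; the real hook name and signature come from agent_framework and may differ, and a plain loop stands in for tenacity's `AsyncRetrying`.

```python
import asyncio
from typing import Awaitable, Callable


class RateLimitError(Exception):
    """Local stand-in for openai.RateLimitError (HTTP 429)."""


class ChatContext:
    """Minimal stand-in for agent_framework's ChatContext."""


class ChatMiddleware:
    """Minimal stand-in for agent_framework's ChatMiddleware base class."""

    async def process(self, context: ChatContext, call_next: Callable[[], Awaitable[None]]) -> None:
        await call_next()


class RateLimitRetryMiddleware(ChatMiddleware):
    """Retries call_next() on RateLimitError with capped exponential backoff."""

    def __init__(self, attempts: int = 3, min_wait: float = 4.0, max_wait: float = 10.0) -> None:
        self.attempts = attempts
        self.min_wait = min_wait
        self.max_wait = max_wait

    async def process(self, context: ChatContext, call_next: Callable[[], Awaitable[None]]) -> None:
        for attempt in range(1, self.attempts + 1):
            try:
                await call_next()  # re-invokes the downstream pipeline each attempt
                return
            except RateLimitError:
                if attempt == self.attempts:
                    raise  # out of attempts: surface the 429 to the caller
                wait = max(self.min_wait, min(2.0 ** attempt, self.max_wait))
                await asyncio.sleep(wait)
```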

Function-based: a `@chat_middleware`-decorated function with an inner async function decorated by `@retry`, which calls `call_next()`:

```python
@chat_middleware
async def rate_limit_retry_middleware(context: ChatContext, call_next: Callable[[], Awaitable[None]]) -> None:
    @retry(stop=stop_after_attempt(3), wait=wait_exponential(...),
           retry=retry_if_exception_type(RateLimitError), reraise=True)
    async def _call_next_with_retry() -> None:
        await call_next()

    await _call_next_with_retry()
```
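For intuition about the `wait_exponential(multiplier=1, min=4, max=10)` setting used throughout: per tenacity's documented example it waits roughly `multiplier * 2**n` seconds before the n-th retry, clamped into `[min, max]`. The small reimplementation below is an illustration of that schedule, not tenacity itself.

```python
def backoff(attempt: int, multiplier: float = 1.0, min_wait: float = 4.0, max_wait: float = 10.0) -> float:
    """Clamped exponential backoff: multiplier * 2**attempt, kept within [min_wait, max_wait]."""
    return max(min_wait, min(multiplier * 2.0 ** attempt, max_wait))


# The first few waits ramp from the floor to the ceiling:
print([backoff(n) for n in range(1, 5)])  # [4.0, 4.0, 8.0, 10.0]
```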

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.
Original prompt

This section details the original issue you should resolve

<issue_title>Python: Add typical rate limiting handling for model clients</issue_title>
<issue_description>Every model inference API is rate limited these days, so any practical use of the method AzureOpenAIChatClient.create_agent needs to deal with retry logic, leading to boilerplate code.

I would like Agent Framework to solve this typical pattern, helping the developer focus on the rest. A simple implementation using the tenacity library would be as follows (but streaming needs more delicate handling):

```python
class AzureOpenAIChatClientWithRetry(AzureOpenAIChatClient):
    """Azure OpenAI Chat Client with built-in retry logic for handling rate limits."""

    retry_attempts = 3
    """Number of retry attempts for rate limit errors."""

    @staticmethod
    def _before_sleep_log(retry_state: RetryCallState) -> None:
        """Log when rate limiting is reached and retry is about to sleep."""
        attempt_number = retry_state.attempt_number
        wait_time = retry_state.next_action.sleep if retry_state.next_action else 0
        logger.warning(
            "Rate limiting reached. Attempt %d failed. Retrying in %.2f seconds...",
            attempt_number,
            wait_time,
        )

    @override
    @retry(
        stop=stop_after_attempt(retry_attempts),
        wait=wait_exponential(multiplier=1, min=4, max=10),
        retry=retry_if_exception_type(RateLimitError),
        reraise=True,
        before_sleep=_before_sleep_log
    )
    def get_response(self, *args, **kwargs):
        """Get response with retry on rate limit errors (429 status code only)."""
        return super().get_response(*args, **kwargs)
```
</issue_description>

<agent_instructions>Create a new sample in python/samples/02-agents called `auto_retry.py` and build a retry setup in two ways: 1) the way outlined in this issue, with a class wrapper that can take any ChatClient and wraps `get_response` with the `@retry` decorator, and 2) with a chat middleware that wraps the call to `call_next`.</agent_instructions>

## Comments on the Issue (you are @copilot in this section)

<comments>
<comment_new><author>@eavanvalkenburg</author><body>
Given how simple this is to write, I don't think we need to do this uniformly, would you be willing to create a sample out of this @santiagxf ?</body></comment_new>
<comment_new><author>@eavanvalkenburg</author><body>
The reason we do not want to include this directly in is because there is no single standard (even though tenacity does come close) and so we won't want to get into a situation that means we have to support a number of variants, while the base implementation with any of them is quite simple.</body></comment_new>
</comments>


Co-authored-by: eavanvalkenburg <13749212+eavanvalkenburg@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add rate limiting handling for model clients" to "Python: Add auto_retry.py sample for rate limit handling" on Feb 24, 2026.


Development

Successfully merging this pull request may close these issues.

Python: Add typical rate limiting handling for model clients
