feat(auto_cl): add error rate threshold for punishment attenuation #3219

Open
feng-y wants to merge 5 commits into apache:master from feng-y:feat/auto-cl-error-rate-threshold

feng-y commented Feb 9, 2026

Add a new GFlag, `auto_cl_error_rate_punish_threshold`, to enable error-rate-based punishment attenuation in AutoConcurrencyLimiter.

Problem: Low error rates (e.g., 1.3% sporadic timeouts) cause disproportionate avg_latency inflation (+31%), leading the limiter to mistakenly shrink max_concurrency and trigger ELIMIT rejections.
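
As a rough illustration of why a small error rate can move the average this much, assume (the PR text does not state this) that the window's average latency is the total successful latency plus a punishment equal to the total failed latency, divided by the number of successful requests. With error rate $p$, mean successful latency $L_s$ and mean failed latency $L_f$ (all three symbols are introduced here for illustration):

```math
\text{avg\_latency} \approx \frac{(1-p)\,L_s + p\,L_f}{1-p}
  = L_s \left( 1 + \frac{p}{1-p} \cdot \frac{L_f}{L_s} \right)
```

Under that assumption, $p = 0.013$ and a $+31\%$ inflation would correspond to $L_f / L_s \approx 0.31 \times 0.987 / 0.013 \approx 23.5$, which is plausible when the failed requests are timeouts.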

Solution: Inspired by Alibaba Sentinel's threshold-based approach:

  • threshold=0 (default): Original behavior preserved (backward compat)
  • threshold>0 (e.g., 0.1): Error rates at or below the threshold produce zero punishment; above it, punishment scales linearly from 0 at the threshold to full punishment at a 100% error rate

Example: With threshold=0.1, a 5% error rate produces no punishment, while a 50% error rate produces 44% of the original punishment.
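
The scaling rule described above can be sketched as a small standalone function. This is a minimal illustration, not the PR's actual code: the name `PunishAttenuation` is hypothetical, and treating `threshold >= 1` as "disabled" is an assumption made here to avoid dividing by zero.

```cpp
#include <cassert>

// Hypothetical helper (not taken from the PR): returns the factor in [0, 1]
// by which the failed-request punishment would be multiplied.
inline double PunishAttenuation(double error_rate, double threshold) {
    assert(error_rate >= 0.0 && error_rate <= 1.0);
    // threshold <= 0 keeps the original behavior: full punishment.
    // threshold >= 1 is also treated as disabled here (assumption).
    if (threshold <= 0.0 || threshold >= 1.0) {
        return 1.0;
    }
    // Dead zone: error rates at or below the threshold are not punished.
    if (error_rate <= threshold) {
        return 0.0;
    }
    // Linear ramp from 0 at the threshold to 1 at a 100% error rate.
    return (error_rate - threshold) / (1.0 - threshold);
}
```

With `threshold = 0.1`, an error rate of 0.05 gives a factor of 0, and an error rate of 0.5 gives (0.5 - 0.1) / (1 - 0.1) ≈ 0.44, matching the 44% figure in the example.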

What problem does this PR solve?

Issue Number: resolve

Problem Summary:

What is changed and the side effects?

Changed:

Side effects:

  • Performance effects:

  • Breaking backward compatibility:


Check List:

Add new GFlag `auto_cl_error_rate_punish_threshold` to enable
error-rate-based punishment attenuation in AutoConcurrencyLimiter.

Problem: Low error rates (e.g., 1.3% sporadic timeouts) cause
disproportionate avg_latency inflation (+31%), leading the limiter
to mistakenly shrink max_concurrency and trigger ELIMIT rejections.

Solution: Inspired by Alibaba Sentinel's threshold-based approach:
- threshold=0 (default): Original behavior preserved (backward compat)
- threshold>0 (e.g., 0.1): Error rates below threshold produce zero
  punishment; above it, punishment scales linearly from 0 to full

Example: With threshold=0.1, a 5% error rate produces no punishment,
while a 50% error rate produces 44% of the original punishment.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Adds a configurable “dead-zone” for AutoConcurrencyLimiter’s failed-request punishment so low error rates don’t disproportionately inflate computed latency and shrink max concurrency.

Changes:

  • Introduces new GFlag auto_cl_error_rate_punish_threshold to attenuate/disable failed-request punishment below a configured error-rate threshold.
  • Applies threshold-based linear scaling of failed_punish in AutoConcurrencyLimiter::UpdateMaxConcurrency.
  • Adds unit tests covering threshold=0 (backward compatible), below/at/above-threshold behavior, and scaling cases.
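
To make the effect on the window update concrete, here is a hedged sketch of how an attenuated `failed_punish` could feed into the average latency. It is not the code from `UpdateMaxConcurrency`: the `WindowTotals` struct, the `AvgLatencyUs` function, the `fail_punish_ratio` parameter, and the definition of error rate as failed / (succeeded + failed) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical per-window totals; the field names are illustrative only.
struct WindowTotals {
    int64_t succ_count;
    int64_t failed_count;
    int64_t total_succ_us;
    int64_t total_failed_us;
};

// Sketch: average latency with threshold-attenuated failure punishment.
// fail_punish_ratio and threshold stand in for GFlag values.
inline int64_t AvgLatencyUs(const WindowTotals& w, double fail_punish_ratio,
                            double threshold) {
    assert(w.succ_count > 0);
    double failed_punish = w.total_failed_us * fail_punish_ratio;
    if (threshold > 0.0 && threshold < 1.0) {
        // Assumption: error rate = failed / (succeeded + failed).
        double error_rate = static_cast<double>(w.failed_count) /
                            (w.succ_count + w.failed_count);
        double scale = error_rate <= threshold
                           ? 0.0
                           : (error_rate - threshold) / (1.0 - threshold);
        failed_punish *= scale;
    }
    // Round up, so any fractional microsecond bumps the result by one.
    return static_cast<int64_t>(
        std::ceil((failed_punish + w.total_succ_us) / w.succ_count));
}
```

For a window of 95 successes at 1000 us each and 5 failures at 10000 us each, this sketch yields 1527 us with the threshold disabled and 1000 us with `threshold = 0.1`, since the 5% error rate falls inside the dead zone.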

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/brpc/policy/auto_concurrency_limiter.cpp` | Adds the new GFlag and applies threshold-based attenuation to failed-request punishment during window updates. |
| `test/brpc_auto_concurrency_limiter_unittest.cpp` | Adds new unit tests validating the threshold attenuation behavior. |

feng-y and others added 2 commits February 9, 2026 18:18
- Use synthetic timestamps instead of sleeping for deterministic tests
- Fix trigger sample counting to preserve exact error rates
- Consolidate 7 tests to 4 core tests with two-sided assertions
- Add expected value range validation in assertions
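
The synthetic-timestamp idea in this commit can be illustrated with a toy window that receives the current time as an argument instead of reading a clock. This is only a sketch of the testing technique: `SampleWindow` and its members are hypothetical and are not the limiter's real interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a sampling window; not brpc's actual class.
struct SampleWindow {
    int64_t window_us;
    int64_t start_us = 0;
    int succ_count = 0;
    int failed_count = 0;

    explicit SampleWindow(int64_t w) : window_us(w) { assert(w > 0); }

    // Records one sample at a caller-supplied time. Returns true once the
    // window has elapsed; the triggering sample is counted before the check,
    // so it is included in the error rate.
    bool AddSample(bool ok, int64_t now_us) {
        if (succ_count + failed_count == 0) {
            start_us = now_us;
        }
        if (ok) {
            ++succ_count;
        } else {
            ++failed_count;
        }
        return now_us - start_us >= window_us;
    }

    double ErrorRate() const {
        int total = succ_count + failed_count;
        return total == 0 ? 0.0 : static_cast<double>(failed_count) / total;
    }
};
```

Because the caller controls `now_us`, a test can feed 100 samples with exactly 5 failures and place the last sample precisely on the window boundary, with no sleeping and no timing-dependent flakiness.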

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Document error punishment related GFlags
- Add detailed explanation for auto_cl_error_rate_punish_threshold
- Include table of all configurable parameters with defaults

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

- Fix avg_latency comments to reflect std::ceil() rounding behavior
- Add cc_test target in BUILD.bazel for Bazel CI coverage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@feng-y feng-y force-pushed the feat/auto-cl-error-rate-threshold branch from b90b9fb to 1a01df3 Compare February 9, 2026 11:58
@feng-y feng-y requested a review from Copilot February 9, 2026 12:00

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

@feng-y feng-y force-pushed the feat/auto-cl-error-rate-threshold branch from 1a01df3 to dbd4089 Compare February 9, 2026 13:16
- Skip attenuation logic when threshold <= 0 or >= 1
- Update GFlag description to document valid range (0, 1)
- Add documentation for the new parameter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@feng-y feng-y force-pushed the feat/auto-cl-error-rate-threshold branch from dbd4089 to a18e2bf Compare February 9, 2026 13:17
@feng-y feng-y requested a review from Copilot February 9, 2026 13:18

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

feng-y commented Feb 9, 2026

@wwbmmm fix
