feat(auto_cl): add error rate threshold for punishment attenuation#3219
feat(auto_cl): add error rate threshold for punishment attenuation#3219feng-y wants to merge 5 commits intoapache:masterfrom
Conversation
Add new GFlag `auto_cl_error_rate_punish_threshold` to enable error-rate-based punishment attenuation in AutoConcurrencyLimiter. Problem: Low error rates (e.g., 1.3% sporadic timeouts) cause disproportionate avg_latency inflation (+31%), leading the limiter to mistakenly shrink max_concurrency and trigger ELIMIT rejections. Solution: Inspired by Alibaba Sentinel's threshold-based approach: - threshold=0 (default): Original behavior preserved (backward compat) - threshold>0 (e.g., 0.1): Error rates below threshold produce zero punishment; above it, punishment scales linearly from 0 to full Example: With threshold=0.1, a 5% error rate produces no punishment, while a 50% error rate produces 44% of the original punishment. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds a configurable “dead-zone” for AutoConcurrencyLimiter’s failed-request punishment so low error rates don’t disproportionately inflate computed latency and shrink max concurrency.
Changes:
- Introduces new GFlag
auto_cl_error_rate_punish_thresholdto attenuate/disable failed-request punishment below a configured error-rate threshold. - Applies threshold-based linear scaling of
failed_punishinAutoConcurrencyLimiter::UpdateMaxConcurrency. - Adds unit tests covering threshold=0 (backward compatible), below/at/above-threshold behavior, and scaling cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/brpc/policy/auto_concurrency_limiter.cpp |
Adds the new GFlag and applies threshold-based attenuation to failed-request punishment during window updates. |
test/brpc_auto_concurrency_limiter_unittest.cpp |
Adds new unit tests validating the threshold attenuation behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Use synthetic timestamps instead of sleeping for deterministic tests - Fix trigger sample counting to preserve exact error rates - Consolidate 7 tests to 4 core tests with two-sided assertions - Add expected value range validation in assertions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Document error punishment related GFlags - Add detailed explanation for auto_cl_error_rate_punish_threshold - Include table of all configurable parameters with defaults Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fix avg_latency comments to reflect std::ceil() rounding behavior - Add cc_test target in BUILD.bazel for Bazel CI coverage Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
b90b9fb to
1a01df3
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1a01df3 to
dbd4089
Compare
- Skip attenuation logic when threshold <= 0 or >= 1 - Update GFlag description to document valid range (0, 1) - Add documentation for the new parameter Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
dbd4089 to
a18e2bf
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@wwbmmm fix |
Add new GFlag
auto_cl_error_rate_punish_thresholdto enable error-rate-based punishment attenuation in AutoConcurrencyLimiter.Problem: Low error rates (e.g., 1.3% sporadic timeouts) cause disproportionate avg_latency inflation (+31%), leading the limiter to mistakenly shrink max_concurrency and trigger ELIMIT rejections.
Solution: Inspired by Alibaba Sentinel's threshold-based approach:
Example: With threshold=0.1, a 5% error rate produces no punishment, while a 50% error rate produces 44% of the original punishment.
What problem does this PR solve?
Issue Number: resolve
Problem Summary:
What is changed and the side effects?
Changed:
Side effects:
Performance effects:
Breaking backward compatibility:
Check List: