<fix>[host]: add reconnect jitter and concurrency control [ZSTAC-61971]#3392
<fix>[host]: add reconnect jitter and concurrency control [ZSTAC-61971]#3392zstack-robot-1 wants to merge 1 commit into5.5.12from
Conversation
…cale clusters Resolves: ZSTAC-61971 Change-Id: I267818cfc24ad3b358087f47274749c65ad3fc66
概述向主机全局配置中添加两个新配置项(重连抖动和并发限制),并在主机追踪模块中实现相应的信号量并发控制和随机延迟机制。 变更
代码审查工作量估计🎯 3 (中等) | ⏱️ ~25 分钟 诗歌
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java`:
- Around line 76-80: The reconnectSemaphore is read twice from the field (once
for tryAcquire and again for release), which can cause acquire/release to happen
on different Semaphore instances after a hot-reload; fix by capturing the field
into a local final variable (e.g., final Semaphore localReconnectSemaphore =
this.reconnectSemaphore) immediately before tryAcquire in the HostTrackImpl
method that does reconnect handling, use that local variable for both tryAcquire
and release (and any early returns) so the same Semaphore instance is used
regardless of concurrent updates.
ℹ️ Review info
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
compute/src/main/java/org/zstack/compute/host/HostGlobalConfig.javacompute/src/main/java/org/zstack/compute/host/HostTrackImpl.java
| if (!reconnectSemaphore.tryAcquire()) { | ||
| logger.debug(String.format("[Host Tracker]: reconnect concurrency limit reached, deferring reconnect for host[uuid:%s]", uuid)); | ||
| completion.success(); | ||
| return; | ||
| } |
There was a problem hiding this comment.
动态配置更新后可能释放到错误的 Semaphore,导致并发上限失真。
Line 76 获取许可时使用的是“当前实例”,但 Line 89 释放时再次读取 reconnectSemaphore 字段。若期间触发 Line 381-385 的热更新替换实例,释放会落到新实例,造成旧实例许可泄漏、新实例许可超发。
🔧 建议修复(确保 acquire/release 使用同一实例)
private void reconnectNow(String uuid, Completion completion) {
- if (!reconnectSemaphore.tryAcquire()) {
+ final Semaphore acquiredSemaphore = reconnectSemaphore;
+ if (!acquiredSemaphore.tryAcquire()) {
logger.debug(String.format("[Host Tracker]: reconnect concurrency limit reached, deferring reconnect for host[uuid:%s]", uuid));
completion.success();
return;
}
ReconnectHostMsg msg = new ReconnectHostMsg();
msg.setHostUuid(uuid);
msg.setSkipIfHostConnected(true);
bus.makeTargetServiceIdByResourceUuid(msg, HostConstant.SERVICE_ID, uuid);
- bus.send(msg, new CloudBusCallBack(completion) {
- `@Override`
- public void run(MessageReply reply) {
- reconnectSemaphore.release();
- if (reply.isSuccess()) {
- completion.success();
- } else {
- completion.fail(reply.getError());
- }
- }
- });
+ try {
+ bus.send(msg, new CloudBusCallBack(completion) {
+ `@Override`
+ public void run(MessageReply reply) {
+ acquiredSemaphore.release();
+ if (reply.isSuccess()) {
+ completion.success();
+ } else {
+ completion.fail(reply.getError());
+ }
+ }
+ });
+ } catch (Throwable t) {
+ acquiredSemaphore.release();
+ throw t;
+ }
}Also applies to: 89-89, 381-385
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java` around
lines 76 - 80, The reconnectSemaphore is read twice from the field (once for
tryAcquire and again for release), which can cause acquire/release to happen on
different Semaphore instances after a hot-reload; fix by capturing the field
into a local final variable (e.g., final Semaphore localReconnectSemaphore =
this.reconnectSemaphore) immediately before tryAcquire in the HostTrackImpl
method that does reconnect handling, use that local variable for both tryAcquire
and release (and any early returns) so the same Semaphore instance is used
regardless of concurrent updates.
Summary
connection.reconnectMaxConcurrency)connection.reconnectJitterMaxSeconds)Changes
HostGlobalConfig.java: 新增RECONNECT_JITTER_MAX_SECONDS和RECONNECT_MAX_CONCURRENCY配置项HostTrackImpl.java:reconnectNow()增加 Semaphore acquire/releasestartTrack()增加随机 jitter 延迟Test Plan
Resolves: ZSTAC-61971
sync from gitlab !9240