<fix>[ceph]: filter stale watchers from disconnected hosts before VM start#3375
ZStack-Robot wants to merge 1 commit into 5.5.12
…start Resolves: ZSTAC-73476 Change-Id: Ifb12d3d457f4f1ff803f2540d20d3d2460bee2bc
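Walkthrough
Adds filtering logic to the Ceph primary storage factory that removes RBD watchers associated with disconnected hosts. A new helper method extracts the IP address from a watcher string, and the existing flow is updated to recompute the watcher list.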
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 1 warning)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageFactory.java`:
- Around line 1237-1256: The current stale-watcher filtering builds
disconnectedHostIps only from HostVO_.managementIp, missing additional addresses
stored in HostSystemTags.EXTRA_IPS; update the logic in
CephPrimaryStorageFactory where disconnectedHostIps is constructed so it also
queries and parses EXTRA_IPS for each non-Connected HostVO (split/tag value
parsing as done for Ceph Monitor disconnect handling), add those extra IPs into
the disconnectedHostIps set, and then continue to use that enriched set in the
existing loop that calls extractWatcherIp(watcher) to filter watchers so stale
watchers originating from EXTRA_IPS are correctly removed.
// Filter out stale watchers from disconnected hosts (ZSTAC-73476)
// When libvirt hangs on a host, ceph rbd watchers become stale but persist,
// blocking VM start on other hosts. Only watchers from connected hosts are valid.
Set<String> disconnectedHostIps = new HashSet<>(Q.New(HostVO.class)
        .select(HostVO_.managementIp)
        .notEq(HostVO_.status, HostStatus.Connected)
        .listValues());
if (!disconnectedHostIps.isEmpty()) {
    List<String> activeWatchers = new ArrayList<>();
    for (String watcher : watchers) {
        String watcherIp = extractWatcherIp(watcher);
        if (watcherIp != null && disconnectedHostIps.contains(watcherIp)) {
            logger.info(String.format("filtered stale watcher from disconnected host[ip:%s] for volume[uuid:%s]: %s",
                    watcherIp, msg.getVolumeUuid(), watcher));
            continue;
        }
        activeWatchers.add(watcher);
    }
    watchers = activeWatchers;
}
🧩 Analysis chain
🏁 Script executed:
# Search for EXTRA_IPS usage in Ceph plugin and related Ceph connection code
rg -n "EXTRA_IPS" plugin/ceph --type=java -A 3 -B 3
Repository: MatheMatrix/zstack
Length of output: 4661
🏁 Script executed:
# Search for Ceph client connection IP configuration or setup
rg -n -C 5 "monitorAddress|ceph.*ip|RBDClient.*connect|storageNetworkInterface" plugin/ceph --type=java
Repository: MatheMatrix/zstack
Length of output: 10579
🏁 Script executed:
# Look for extractWatcherIp implementation and watcher format documentation
rg -n "extractWatcherIp|watchers|rbd status" plugin/ceph --type=java -A 5 -B 2
Repository: MatheMatrix/zstack
Length of output: 12838
🏁 Script executed:
# Search for how watchers are obtained and their expected format
rg -n "GetVolumeWatchers|ceph.*watcher|libvirt.*watcher" plugin/ceph --type=java -A 3
Repository: MatheMatrix/zstack
Length of output: 8241
🏁 Script executed:
# Check if there are existing comments or tests about watcher IP sources
rg -n "watcher.*ip|disconnected.*host|stale.*watcher" plugin/ceph --type=java -A 5 -B 2
Repository: MatheMatrix/zstack
Length of output: 3726
🏁 Script executed:
# Search in the full repository for how Ceph is configured to use specific networks
rg -n "CephMonitorAddress|CephMonitor.*ip" . --type=java | head -20
Repository: MatheMatrix/zstack
Length of output: 44
🏁 Script executed:
# Look at the actual file around the changed lines to get more context
sed -n '1220,1300p' plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageFactory.java
Repository: MatheMatrix/zstack
Length of output: 3590
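Stale-watcher filtering for disconnected hosts must also consider EXTRA_IPS
Lines 1240-1243 build the set of disconnected-host IPs from HostVO_.managementIp only, whereas the Ceph Monitor disconnect handling at lines 537-540 uses both managementIp and HostSystemTags.EXTRA_IPS. If the Ceph RBD client actually uses a data-network or storage-network IP (recorded in EXTRA_IPS), stale watchers from those IPs will not be filtered out and VM start will still be blocked.
Suggest adding EXTRA_IPS handling, consistent with the Ceph Mon disconnect logic:
Proposed fix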
- Set<String> disconnectedHostIps = new HashSet<>(Q.New(HostVO.class)
-         .select(HostVO_.managementIp)
-         .notEq(HostVO_.status, HostStatus.Connected)
-         .listValues());
+ Set<String> disconnectedHostIps = new HashSet<>();
+ List<HostVO> disconnectedHosts = Q.New(HostVO.class)
+         .notEq(HostVO_.status, HostStatus.Connected)
+         .list();
+ for (HostVO host : disconnectedHosts) {
+     disconnectedHostIps.add(host.getManagementIp());
+     String extraIps = HostSystemTags.EXTRA_IPS.getTokenByResourceUuid(
+             host.getUuid(), HostSystemTags.EXTRA_IPS_TOKEN);
+     if (!Strings.isEmpty(extraIps)) {
+         disconnectedHostIps.addAll(Arrays.asList(extraIps.split(",")));
+     }
+ }
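The proposed fix splits the EXTRA_IPS tag value on commas and adds the pieces directly to the set. The sketch below shows one way that parsing step could be hardened against stray whitespace and empty tokens. It assumes the tag value is a comma-separated list of addresses, as the suggested `split(",")` implies; `ExtraIpsSketch` and `parseExtraIps` are hypothetical names, not part of ZStack.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the ZStack code base.
class ExtraIpsSketch {
    // Splits a comma-separated tag value into trimmed, non-empty IP strings.
    // Assumption: EXTRA_IPS holds addresses separated by commas.
    static Set<String> parseExtraIps(String extraIps) {
        Set<String> ips = new LinkedHashSet<>();
        if (extraIps == null) {
            return ips;
        }
        for (String token : extraIps.split(",")) {
            String ip = token.trim();
            if (!ip.isEmpty()) {
                ips.add(ip);
            }
        }
        return ips;
    }
}
```

Trimming matters here because the filter compares strings exactly: an entry such as `" 10.0.0.2"` would never match a watcher IP of `10.0.0.2`, and the stale watcher would silently survive.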
Root Cause
`CephPrimaryStorageFactory.preInstantiateVmResource()` performs a binary watcher check: if ANY ceph rbd watchers exist on a volume, VM start is blocked to prevent split-brain. However, when libvirt hangs on a host, stale watchers from that host's QEMU process persist in ceph even after the host becomes disconnected. These stale watchers block VM start on other healthy hosts.
Solution
After retrieving watchers via `rbd status`, filter out stale watchers from disconnected hosts before making the block decision:
- Collect the management IPs of hosts with `status != Connected` (disconnected/connecting hosts)
- Extract each watcher's IP from the `rbd status` output format (`watcher=IP:port/nonce client.ID cookie=COOKIE`)
Safety: watchers from unknown IPs (not matching any known host) are conservatively kept to preserve the anti-split-brain protection.
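The `extractWatcherIp` helper itself is not shown on this page. As a rough illustration of the parsing step, the following sketch pulls the IP out of a watcher string in the `watcher=IP:port/nonce client.ID cookie=COOKIE` format quoted above. It is an assumption about how such a helper might look, not the PR's implementation: the class name is hypothetical, the regex handles IPv4 only, and the sample watcher string is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the PR's extractWatcherIp helper.
class WatcherIpSketch {
    // Matches "watcher=<ipv4>:<port>/<nonce>"; IPv6 addresses are not handled.
    private static final Pattern WATCHER_ADDR =
            Pattern.compile("watcher=(\\d{1,3}(?:\\.\\d{1,3}){3}):\\d+/\\d+");

    // Returns the watcher's IP, or null when the string does not match.
    static String extractWatcherIp(String watcher) {
        if (watcher == null) {
            return null;
        }
        Matcher m = WATCHER_ADDR.matcher(watcher);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Illustrative input only; real values come from `rbd status`.
        String sample = "watcher=192.168.1.10:0/123456789 client.4567 cookie=1";
        System.out.println(extractWatcherIp(sample));    // prints 192.168.1.10
        System.out.println(extractWatcherIp("garbage")); // prints null
    }
}
```

Returning `null` for anything unparseable fits the safety rule: the filtering loop only drops a watcher when its IP is non-null and belongs to a disconnected host, so a watcher that cannot be parsed keeps blocking VM start.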
Testing
Compile verified: `mvn compile -pl plugin/ceph -am -Dmaven.test.skip` SUCCESS.
Logic review: filtering only removes watchers from known-disconnected hosts; connected host watchers and unknown IP watchers still block VM start (preserving anti-split-brain safety).
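Since testing so far is compile-only, the logic review could be backed by a small standalone check. The sketch below reimplements the filtering loop without any ZStack dependencies and runs it on one watcher of each kind. All names, the simplified IP extraction, and the sample addresses are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Standalone illustration of the filtering rule; not ZStack code.
class StaleWatcherFilterSketch {
    // Simplified extraction: the text between "watcher=" and the next ':'.
    static String watcherIp(String watcher) {
        int start = watcher.indexOf("watcher=");
        if (start < 0) {
            return null;
        }
        start += "watcher=".length();
        int end = watcher.indexOf(':', start);
        return end < 0 ? null : watcher.substring(start, end);
    }

    // Drops only watchers whose IP belongs to a known-disconnected host.
    static List<String> filterStale(List<String> watchers, Set<String> disconnectedHostIps) {
        List<String> active = new ArrayList<>();
        for (String watcher : watchers) {
            String ip = watcherIp(watcher);
            if (ip != null && disconnectedHostIps.contains(ip)) {
                continue; // stale watcher from a disconnected host
            }
            active.add(watcher); // connected, unknown, or unparseable: keep
        }
        return active;
    }

    public static void main(String[] args) {
        Set<String> disconnected = new HashSet<>(Arrays.asList("10.0.0.2"));
        List<String> watchers = Arrays.asList(
                "watcher=10.0.0.1:0/111 client.1 cookie=1",   // connected host
                "watcher=10.0.0.2:0/222 client.2 cookie=2",   // disconnected host
                "watcher=172.16.0.9:0/333 client.3 cookie=3", // unknown IP
                "unparseable watcher line");
        // Only the 10.0.0.2 entry is removed; the other three remain.
        filterStale(watchers, disconnected).forEach(System.out::println);
    }
}
```

If any watcher remains after filtering, the VM start is still blocked, which is the behavior the logic review describes for connected-host and unknown-IP watchers.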
Jira
Resolves: ZSTAC-73476
sync from gitlab !9210