Failed to create RBD storage pool after KVM agent upgrade from 4.20 to 4.22: "org.libvirt.LibvirtException: failed to create the RBD IoCTX" #12154
Replies: 7 comments 1 reply
-
@tuanhoangth1603
-
Hello,
-
@tuanhoangth1603
-
I'll try, thank you!
-
@tuanhoangth1603
-
Yes,
-
@weizhouapache Today I just discovered a strange issue which is related to this one. The output of virsh pool-list --all showed the line:
The other agents are still connected normally, so I think the issue is not on the Ceph side.
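A quick way to narrow this down, assuming the libvirt pool and secret names below are just placeholders for whatever your host actually uses, is to compare libvirt's view of the RBD pool and its ceph secret on the affected host against a still-healthy host:

```bash
# Compare these outputs between the broken host and a working one.
# Note: CloudStack usually names the libvirt pool after the primary-storage UUID,
# not the Ceph pool name, so use whatever "virsh pool-list --all" reports.
virsh pool-list --all                   # is the RBD pool defined, and is it active?
virsh pool-dumpxml <libvirt-pool-name>  # monitors, auth username, and the secret UUID it references
virsh secret-list                       # does a ceph-type secret with that UUID still exist?
virsh secret-dumpxml <secret-uuid>      # usage name the secret was defined with
```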
-
problem
After upgrading the KVM agent on a compute node from CloudStack 4.20 to 4.22, the agent fails to recreate or connect to the existing RBD storage pool. The error shows up in the agent logs as a LibvirtException during pool initialization, when the agent checks whether the RBD pool exists (it does exist on the Ceph cluster). This prevents the host from fully reconnecting and from handling VM operations (e.g., volume attach/detach).
The issue appears to be tied to changes in libvirt (8.0+) or the Ceph client libraries after the upgrade, causing IoCTX creation to fail due to a temporary secret/cached-state mismatch. Notably, a full reboot of the compute node resolves the issue immediately and allows clean recreation of the pool and secret. However, a reboot means downtime for the VMs running on that node, which is unacceptable in production.
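One possible way to avoid the full reboot, sketched here only as an untested assumption (the ceph auth user client.cloudstack, the secret UUID, and the libvirt pool name are placeholders, not values confirmed by this report), is to remove the stale libvirt pool/secret definitions, re-create the secret with the key from Ceph, and let the agent rebuild the pool when it restarts:

```bash
# Untested, hedged sketch of a non-reboot recovery. All names/UUIDs below are placeholders.
SECRET_UUID="<uuid-referenced-in-the-pool-xml>"   # find it via: virsh pool-dumpxml <libvirt-pool-name>
CEPH_USER="cloudstack"                            # assumed CloudStack Ceph auth user (client.cloudstack)

# Drop the possibly stale definitions.
virsh pool-destroy  "<libvirt-pool-name>" 2>/dev/null || true
virsh pool-undefine "<libvirt-pool-name>" 2>/dev/null || true
virsh secret-undefine "$SECRET_UUID"      2>/dev/null || true

# Re-define the ceph secret under the same UUID and reload its value from the cluster.
cat > /tmp/ceph-secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>${SECRET_UUID}</uuid>
  <usage type='ceph'>
    <name>client.${CEPH_USER} secret</name>
  </usage>
</secret>
EOF
virsh secret-define --file /tmp/ceph-secret.xml
virsh secret-set-value --secret "$SECRET_UUID" \
  --base64 "$(ceph auth get-key client.${CEPH_USER})"

# The agent normally re-defines the storage pool itself when it reconnects.
systemctl restart cloudstack-agent
```

The idea is only to reproduce, without a reboot, the clean pool/secret state that the reboot apparently provides.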
versions
Environment
CloudStack version: Management server upgraded to 4.22.0 (from 4.20.0)
Agent version: KVM agent upgraded from 4.20.0 to 4.22.0 on compute nodes
Hypervisor: KVM
Primary Storage: Ceph RBD (pool name: cloudstack-zone1; Ceph version: 14)
OS on compute nodes: Ubuntu 20.04
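Not part of the original report, but since the suspicion above is a libvirt or Ceph client-library change introduced by the upgrade, recording the exact versions on an affected node and on a healthy node makes the comparison concrete (Ubuntu/Debian packaging assumed):

```bash
# Capture these on both an affected and a healthy compute node and diff the results.
virsh version                                  # libvirt daemon and library versions
ceph --version                                 # ceph client CLI version
dpkg -l | grep -E 'libvirt|librbd|librados'    # installed libvirt/ceph client packages (Ubuntu/Debian)
```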
The steps to reproduce the bug
I have also run these commands on the Ceph side, but the error persists.
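The exact commands are not shown here; for reference, the usual Ceph-side sanity checks in this situation look roughly like the following, assuming client.cloudstack is the CloudStack auth user and cloudstack-zone1 is the pool from this report:

```bash
# Run from a node that has the CloudStack keyring and ceph.conf available.
ceph -s                                      # overall cluster health
ceph osd pool ls | grep cloudstack-zone1     # the RBD pool is still present
ceph auth get client.cloudstack              # the CloudStack user still exists with the expected caps
rbd ls -p cloudstack-zone1 --id cloudstack   # that user can actually list images in the pool
```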
Expected Behavior
The agent should successfully redefine the RBD storage pool using the existing Ceph configuration (monitors, secrets) without failure, allowing seamless host reconnection post-upgrade.
Actual Behavior
Agent logs show repeated failures to create the RBD IoCTX, followed by cleanup of the libvirt secret. The host status remains "Disconnected" or "Alert" in the UI until manual intervention. A full reboot of the compute node resolves the issue immediately, but that is a very poor workaround.
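For anyone hitting the same thing, the failure is easiest to spot by following the agent and libvirt logs during the reconnect attempt (the agent log path below is the usual default for Ubuntu packages, so treat it as an assumption):

```bash
# Watch the reconnect attempt live; adjust the log path if your packaging differs.
tail -f /var/log/cloudstack/agent/agent.log | grep -iE 'IoCTX|LibvirtException|secret'
journalctl -u libvirtd -f                    # libvirt's side of the pool/secret operations
```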