Failed to create RBD storage pool after KVM agent upgrade from 4.20 to 4.22: "org.libvirt.LibvirtException: failed to create the RBD IoCTX" #12154
Replies: 7 comments 1 reply
-
@tuanhoangth1603
-
Hello,
-
@tuanhoangth1603
-
I'll try, thank you!
-
@tuanhoangth1603
-
Yes,
-
@weizhouapache Today I just discovered a strange issue which is related to this one. The output of virsh pool-list --all showed the line:
The other agents are still connected normally, so I think the issue is not on the Ceph side.
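A quick way to narrow this down, assuming the libvirt pool and secret names below are just placeholders for whatever your host actually uses, is to compare libvirt's view of the RBD pool and its ceph secret on the affected host against a still-healthy host:

```bash
# Compare these outputs between the broken host and a working one.
# Note: CloudStack usually names the libvirt pool after the primary-storage UUID,
# not the Ceph pool name, so use whatever "virsh pool-list --all" reports.
virsh pool-list --all                   # is the RBD pool defined, and is it active?
virsh pool-dumpxml <libvirt-pool-name>  # monitors, auth username, and the secret UUID it references
virsh secret-list                       # does a ceph-type secret with that UUID still exist?
virsh secret-dumpxml <secret-uuid>      # usage name the secret was defined with
```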
-
problem
After upgrading the KVM agent on a compute node from CloudStack 4.20 to 4.22, the agent fails to recreate or connect to the existing RBD storage pool. The error shows up in the agent logs as a LibvirtException during pool initialization, when the agent checks whether the RBD pool exists (it does exist on the Ceph cluster). This prevents the host from fully reconnecting and from handling VM operations (e.g., volume attach/detach).
The issue appears to be tied to changes in libvirt (8.0+) or the Ceph client libraries after the upgrade, causing IoCTX creation to fail due to a temporary secret/cached-state mismatch. Notably, a full reboot of the compute node resolves the issue immediately and allows clean recreation of the pool and secret. However, a reboot means downtime for the VMs running on that node, which is unacceptable in production.
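One possible way to avoid the full reboot, sketched here only as an untested assumption (the ceph auth user client.cloudstack, the secret UUID, and the libvirt pool name are placeholders, not values confirmed by this report), is to remove the stale libvirt pool/secret definitions, re-create the secret with the key from Ceph, and let the agent rebuild the pool when it restarts:

```bash
# Untested, hedged sketch of a non-reboot recovery. All names/UUIDs below are placeholders.
SECRET_UUID="<uuid-referenced-in-the-pool-xml>"   # find it via: virsh pool-dumpxml <libvirt-pool-name>
CEPH_USER="cloudstack"                            # assumed CloudStack Ceph auth user (client.cloudstack)

# Drop the possibly stale definitions.
virsh pool-destroy  "<libvirt-pool-name>" 2>/dev/null || true
virsh pool-undefine "<libvirt-pool-name>" 2>/dev/null || true
virsh secret-undefine "$SECRET_UUID"      2>/dev/null || true

# Re-define the ceph secret under the same UUID and reload its value from the cluster.
cat > /tmp/ceph-secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>${SECRET_UUID}</uuid>
  <usage type='ceph'>
    <name>client.${CEPH_USER} secret</name>
  </usage>
</secret>
EOF
virsh secret-define --file /tmp/ceph-secret.xml
virsh secret-set-value --secret "$SECRET_UUID" \
  --base64 "$(ceph auth get-key client.${CEPH_USER})"

# The agent normally re-defines the storage pool itself when it reconnects.
systemctl restart cloudstack-agent
```

The idea is only to reproduce, without a reboot, the clean pool/secret state that the reboot apparently provides.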
versions
Environment
CloudStack version: Management server upgraded to 4.22.0 (from 4.20.0)
Agent version: KVM agent upgraded from 4.20.0 to 4.22.0 on compute nodes
Hypervisor: KVM
Primary Storage: Ceph RBD (pool name: cloudstack-zone1; Ceph version: 14)
OS on compute nodes: Ubuntu 20.04
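Not part of the original report, but since the suspicion above is a libvirt or Ceph client-library change introduced by the upgrade, recording the exact versions on an affected node and on a healthy node makes the comparison concrete (Ubuntu/Debian packaging assumed):

```bash
# Capture these on both an affected and a healthy compute node and diff the results.
virsh version                                  # libvirt daemon and library versions
ceph --version                                 # ceph client CLI version
dpkg -l | grep -E 'libvirt|librbd|librados'    # installed libvirt/ceph client packages (Ubuntu/Debian)
```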
The steps to reproduce the bug
I have also run these commands on the Ceph side, but the error persists.
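The exact commands are not shown here; for reference, the usual Ceph-side sanity checks in this situation look roughly like the following, assuming client.cloudstack is the CloudStack auth user and cloudstack-zone1 is the pool from this report:

```bash
# Run from a node that has the CloudStack keyring and ceph.conf available.
ceph -s                                      # overall cluster health
ceph osd pool ls | grep cloudstack-zone1     # the RBD pool is still present
ceph auth get client.cloudstack              # the CloudStack user still exists with the expected caps
rbd ls -p cloudstack-zone1 --id cloudstack   # that user can actually list images in the pool
```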
Expected Behavior
The agent should successfully redefine the RBD storage pool using the existing Ceph configuration (monitors, secrets) without failure, allowing seamless host reconnection post-upgrade.
Actual Behavior
Agent logs show repeated failures to create the RBD IoCTX, followed by cleanup of the libvirt secret. The host status remains "Disconnected" or "Alert" in the UI until manual intervention. A full reboot of the compute node resolves the issue immediately, but that is a very poor workaround.
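For anyone hitting the same thing, the failure is easiest to spot by following the agent and libvirt logs during the reconnect attempt (the agent log path below is the usual default for Ubuntu packages, so treat it as an assumption):

```bash
# Watch the reconnect attempt live; adjust the log path if your packaging differs.
tail -f /var/log/cloudstack/agent/agent.log | grep -iE 'IoCTX|LibvirtException|secret'
journalctl -u libvirtd -f                    # libvirt's side of the pool/secret operations
```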