@stuggi
Contributor

@stuggi stuggi commented Jan 29, 2026

Add support for pausing the OpenStackControlPlane deployment after infrastructure creation. This lets a user perform an action, such as a database restore, before the OpenStack services are created. It is useful for backup/restore scenarios where databases must be restored to empty infrastructure before the services initialize fresh schemas.

conditions.go:

  • Add OpenStackControlPlaneInfrastructureReadyCondition type
  • Add condition messages (Init, Ready, Running, Error, Waiting, Paused)
  • Add OpenStackControlPlaneInfrastructureReadyWaitingMessage to show blocking components
  • Add OpenStackControlPlaneInfrastructureReadyPausedMessage for infrastructure-only mode
  • Infrastructure includes: CAs, DNSMasq, RabbitMQ, Galera, Memcached, OVN

openstackcontrolplane_types.go:

  • Add DeploymentStageAnnotation constant ("core.openstack.org/deployment-stage")
  • Add DeploymentStageInfrastructureOnly constant ("infrastructure-only")
  • Add InfrastructureReady condition to InitConditions()
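
For reviewers, a minimal sketch of how the two constants might be consumed. The constant names and values come from the list above; the helper `isInfrastructureOnly` and its signature are hypothetical and only illustrate the check.

```go
package main

// Constant names and values are taken from the PR description.
const (
	DeploymentStageAnnotation         = "core.openstack.org/deployment-stage"
	DeploymentStageInfrastructureOnly = "infrastructure-only"
)

// isInfrastructureOnly is a hypothetical helper. It reports whether the
// annotations ask for a pause after the infrastructure stage. Reading from
// a nil map is safe in Go and yields the empty string.
func isInfrastructureOnly(annotations map[string]string) bool {
	return annotations[DeploymentStageAnnotation] == DeploymentStageInfrastructureOnly
}
```

Any other value, or a missing annotation, is treated as a normal deployment in this sketch.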

Enhanced infrastructure status reporting:

  • isInfrastructureReady(): Returns ready status AND list of not-ready components
    • Always checks: CAs (no enabled flag)
    • Conditionally checks: DNS, RabbitMQ, Galera, Memcached, OVN (only if enabled)
    • Returns which components are blocking when infrastructure not ready
    • Fixes test failures when OVN or other components are disabled
  • InfrastructureReady condition shows detailed waiting message:
    • "Infrastructure in progress - waiting for: RabbitMQs, Galeras"
    • Makes debugging deployment issues much easier
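
The readiness check described above can be sketched roughly as follows. The `component` type and `waitingMessage` helper are hypothetical simplifications; the actual function inspects conditions on the CR, which is not reproduced here. CAs have no enabled flag, so in this sketch they would simply be passed with `Enabled: true`.

```go
package main

import (
	"fmt"
	"strings"
)

// component is a simplified, hypothetical stand-in for one infrastructure
// piece (CAs, DNS, RabbitMQ, Galera, Memcached, OVN).
type component struct {
	Name    string
	Enabled bool
	Ready   bool
}

// isInfrastructureReady returns the overall ready status and the names of
// the enabled components that are not ready yet. Disabled components are
// skipped, so they can never block readiness.
func isInfrastructureReady(components []component) (bool, []string) {
	var notReady []string
	for _, c := range components {
		if c.Enabled && !c.Ready {
			notReady = append(notReady, c.Name)
		}
	}
	return len(notReady) == 0, notReady
}

// waitingMessage renders the waiting text in the format quoted above.
func waitingMessage(notReady []string) string {
	return fmt.Sprintf("Infrastructure in progress - waiting for: %s",
		strings.Join(notReady, ", "))
}
```

Returning the blocking names alongside the boolean is what allows the condition message to name the components still pending.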

Ready condition handling in infrastructure-only mode:

  • Defer block now checks for infrastructure-only mode
  • When infrastructure-only AND infrastructure ready:
    • Mirror InfrastructureReady pause message to Ready condition (as False)
    • Prevents service conditions (Unknown/Init) from leaking into Ready
  • When infrastructure not ready OR normal mode:
    • Use default mirror behavior (first not-ready condition)
  • Result: Ready condition shows appropriate message at each stage
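
A rough sketch of the mirroring rule above, for illustration only. The `cond` type and `readyMessage` function are hypothetical; the operator's real condition library and defer block are not reproduced here.

```go
package main

// cond is a simplified, hypothetical condition record.
type cond struct {
	Type    string
	True    bool
	Message string
}

// readyMessage picks the message the top-level Ready condition should show.
// ok is true only when Ready itself may be set to True.
func readyMessage(infraOnly bool, infra cond, others []cond) (msg string, ok bool) {
	if infraOnly && infra.True {
		// Paused: mirror the InfrastructureReady message and keep Ready
		// False, so Unknown/Init service conditions do not leak into Ready.
		return infra.Message, false
	}
	// Default behavior: mirror the first condition that is not ready.
	for _, c := range append([]cond{infra}, others...) {
		if !c.True {
			return c.Message, false
		}
	}
	// Everything is ready; the success message is left to the caller.
	return "", true
}
```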

Staged deployment logic:

  • Move OVN reconciliation to infrastructure section (before services)
  • Check deployment-stage annotation after infrastructure reconciliation
  • When annotation = "infrastructure-only":
    • Set InfrastructureReady condition with pause message
    • Set Ready condition to False with pause message (via defer block)
    • Return early (skip service reconciliation)
    • Message: "Infrastructure ready - deployment paused. Remove annotation to resume deployment of OpenStack services"
  • When annotation not set (normal deployment):
    • Set InfrastructureReady condition with standard message
    • Continue with full service reconciliation
    • Message: "Infrastructure ready"
  • When infrastructure still deploying:
    • Set InfrastructureReady = False/Requested
    • Message: "Infrastructure in progress - waiting for: <components>"
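
The three cases above can be summarized in a small decision function. This is a sketch under stated assumptions: `stageResult` and `decideStage` are hypothetical names, and the sketch only marks the deployment as paused when the annotation is set and the infrastructure is ready. What the reconciler does with services while infrastructure is still deploying is not stated above, so it is left out.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	deploymentStageAnnotation = "core.openstack.org/deployment-stage"
	infrastructureOnly        = "infrastructure-only"

	readyMessage   = "Infrastructure ready"
	pausedMessage  = "Infrastructure ready - deployment paused. Remove annotation to resume deployment of OpenStack services"
	waitingMessage = "Infrastructure in progress - waiting for: %s"
)

// stageResult is a hypothetical summary of one reconcile pass.
type stageResult struct {
	InfraReady bool   // status of the InfrastructureReady condition
	Paused     bool   // true means: return early, skip service reconciliation
	Message    string // message set on InfrastructureReady
}

// decideStage maps the annotation and the list of blocking components to
// the condition state described in the list above.
func decideStage(annotations map[string]string, notReady []string) stageResult {
	switch {
	case len(notReady) > 0:
		msg := fmt.Sprintf(waitingMessage, strings.Join(notReady, ", "))
		return stageResult{InfraReady: false, Message: msg}
	case annotations[deploymentStageAnnotation] == infrastructureOnly:
		return stageResult{InfraReady: true, Paused: true, Message: pausedMessage}
	default:
		return stageResult{InfraReady: true, Message: readyMessage}
	}
}
```

Removing the annotation moves the same input from the paused case to the default case on the next reconcile, which is how the deployment resumes.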

Kuttl test for staged deployment:

  • New test: test/kuttl/tests/ctlplane-staged-deployment/
  • Validates full workflow:
    1. Deploy with infrastructure-only annotation
    2. Assert infrastructure ready, services Unknown, Ready shows pause message
    3. Remove annotation
    4. Assert full controlplane reaches Ready
  • Tests the pause/resume cycle for backup/restore scenarios

Update all kuttl test assertions to expect InfrastructureReady condition:

  • common/assert-sample-deployment.yaml
  • ctlplane-basic-deployment/03-assert-deploy-custom-cacert.yaml
  • ctlplane-collapsed/01-assert-collapsed-cell.yaml
  • ctlplane-galera-3replicas/01-assert-galera-3replicas.yaml
  • ctlplane-tls-cert-rotation/00-assert-deploy-openstack.yaml
  • ctlplane-tls-cert-rotation/03-assert-new-certs.yaml
  • ctlplane-tls-custom-issuers/01-assert-deploy-openstack.yaml
  • ctlplane-tls-custom-issuers/09-assert-deploy-openstack.yaml
  • ctlplane-tls-custom-route/03-assert-deploy-openstack.yaml

This enables the following workflow, which can be used for backup/restore:

  1. Apply OpenStackControlPlane CR with annotation: core.openstack.org/deployment-stage: infrastructure-only

  2. Wait for InfrastructureReady condition: oc wait --for=condition=InfrastructureReady openstackcontrolplane/openstack

  3. Restore databases (MariaDB, OVN) to empty infrastructure

  4. Restore RabbitMQ user credentials for EDPM compatibility

  5. Remove annotation to resume deployment: oc annotate openstackcontrolplane openstack core.openstack.org/deployment-stage-

  6. Services start with already-restored databases

Jira: OSPRH-25752

@openshift-ci
Contributor

openshift-ci bot commented Jan 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stuggi


@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/8b0af494d3c547e197f3fe193603b345

openstack-k8s-operators-content-provider FAILURE in 8m 39s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider


Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
@stuggi stuggi force-pushed the backup_restore_pause branch from 31812ba to 1a160db on January 29, 2026 17:04
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/46f2d4d97dc546199a60bc57abd32047

openstack-k8s-operators-content-provider FAILURE in 7m 31s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@stuggi
Contributor Author

stuggi commented Jan 30, 2026

recheck

@stuggi
Contributor Author

stuggi commented Jan 30, 2026

/retest

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c79127330ded44b4a0de90b89defc3ec

openstack-k8s-operators-content-provider FAILURE in 6m 54s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@stuggi
Contributor Author

stuggi commented Jan 30, 2026

/retest

@openshift-ci
Contributor

openshift-ci bot commented Jan 30, 2026

@stuggi: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/openstack-operator-build-deploy-kuttl-4-18 | 1a160db | link | true | /test openstack-operator-build-deploy-kuttl-4-18 |
| ci/prow/openstack-operator-build-deploy-kuttl | 1a160db | link | true | /test openstack-operator-build-deploy-kuttl |

