@hjensas commented Jan 28, 2026

Resources may not be immediately available in the API after oc apply completes, causing wait commands to fail with NotFound errors. This adds retry logic with 5 attempts and 3-second delays to handle transient errors during resource registration.

Assisted-By: Claude Code/claude-4.5-sonnet
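For readers who want the retry behavior in concrete terms, here is a minimal sketch in Python, not the Ansible change itself. The function name `run_with_retry` and its `runner`/`sleep` parameters are hypothetical and exist only to make the sketch testable; the defaults of 5 attempts and a 3-second delay come from the description above. The sketch retries on any non-zero exit, which is a simplifying assumption.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=5, delay=3.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0 or the attempts are exhausted.

    Hypothetical helper: retries on any non-zero exit code and
    returns the result of the last attempt.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        # Sleep only between attempts, not after the final one.
        if attempt < attempts:
            sleep(delay)
    return result
```

With the defaults, a command that keeps failing is tried 5 times with 4 pauses in between, so the worst case adds roughly 12 seconds on top of the command's own run time.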

openshift-ci bot commented Jan 28, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign eshulman2 for approval. For more information see the Code Review Process.

@hjensas force-pushed the retry-wait-cmd branch 2 times, most recently from c4ad46a to cd9109c, January 28, 2026 13:12
evallesp previously approved these changes Jan 29, 2026
_wait_cmd_result.stderr is defined and
not (_wait_cmd_result.stderr is search('no matching resources found', ignorecase=True) or
_wait_cmd_result.stderr is search('NotFound') or
_wait_cmd_result.stderr is search('timed out.*condition.*clusterserviceversions/openstack-operator', ignorecase=True) or

Does this mean we'll still treat all actual wait timeouts as errors, except when waiting for the OpenStack operator to install times out?

@hjensas (Contributor Author) commented Jan 29, 2026

Yes. A wait for the CSV, for example `oc wait -n openstack-operators csv -l operators.coreos.com/openstack-operator.openstack-operators= --for jsonpath='{.status.phase}'=Succeeded --timeout=300s`, would end up being retried on timeout.

I do wonder, however, whether this is redundant here, because the operators are installed via a separate tasks file, roles/kustomize_deploy/tasks/install_operators.yml.

Let me update the patch and remove that regex.
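To make the discussion above concrete, the stderr matching in the quoted condition can be modeled in Python as a rough sketch. The three regular expressions are copied from the reviewed lines; the function name `is_transient`, the `include_csv_timeout` flag, and the sample stderr strings are hypothetical and only illustrate the effect of keeping or dropping the CSV timeout regex. How the real task combines these matches with the rest of its condition is not shown in the excerpt, so this sketch covers the pattern matching only.

```python
import re

# Patterns copied from the reviewed condition.
TRANSIENT_PATTERNS = [
    re.compile(r"no matching resources found", re.IGNORECASE),
    re.compile(r"NotFound"),
]
CSV_TIMEOUT_PATTERN = re.compile(
    r"timed out.*condition.*clusterserviceversions/openstack-operator",
    re.IGNORECASE,
)


def is_transient(stderr, include_csv_timeout=False):
    """Return True if stderr matches one of the transient-error patterns.

    include_csv_timeout is a hypothetical switch that models whether the
    CSV timeout regex is kept (True) or removed (False).
    """
    patterns = list(TRANSIENT_PATTERNS)
    if include_csv_timeout:
        patterns.append(CSV_TIMEOUT_PATTERN)
    return any(p.search(stderr) for p in patterns)


# Illustrative stderr strings, made up for this sketch.
not_found = 'Error from server (NotFound): deployments.apps "foo" not found'
csv_timeout = ("error: timed out waiting for the condition on "
               "clusterserviceversions/openstack-operator.v1")
other_timeout = "error: timed out waiting for the condition on deployments/foo"

print(is_transient(not_found))                              # True
print(is_transient(other_timeout))                          # False
print(is_transient(csv_timeout))                            # False
print(is_transient(csv_timeout, include_csv_timeout=True))  # True
```

With the CSV regex removed, every wait timeout is treated the same way, and only the registration-related messages count as transient.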

Resources may not be immediately available in the API after `oc apply`
completes, causing wait commands to fail with NotFound errors. This adds
retry logic with 5 attempts and 3-second delays to handle transient errors
during resource registration.

Assisted-By: Claude Code/claude-4.5-sonnet
Signed-off-by: Harald Jensås <hjensas@redhat.com>