🐛 Add teardown on revision archival #2502
perdasilva wants to merge 1 commit into operator-framework:main from
Conversation
✅ Deploy Preview for olmv1 ready!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: (none yet). The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
perdasilva force-pushed from 01760b0 to 06afd4a
Pull request overview
Updates ClusterExtensionRevision teardown behavior to actively tear down Boxcutter-managed resources when a revision is archived, and extends test coverage/data to validate resource removal across bundle versions.
Changes:
- Invoke RevisionEngine.Teardown during teardown for archived revisions, with retry + requeue when teardown is incomplete (sketched below).
- Expand deletion/archival reconciliation tests to cover incomplete teardown and teardown failures (including expected requeue behavior).
- Add a dummy ConfigMap manifest to the v1.0.0 test bundle to validate removal of resources that disappear in later bundle versions.
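For orientation, here is a minimal sketch of what that retry + requeue branch might look like. It assumes the archive() helper and machinery.RevisionTeardownResult types shown in the diff hunks later in this thread, a hypothetical IsComplete() accessor on the teardown result, and the 5-second requeue interval mentioned in the PR description; the real control flow lives in clusterextensionrevision_controller.go.

```go
// Hedged sketch of the archived-revision branch inside reconcile()
// (fragment; fmt, time, and ctrl are already imported in the controller).
tres, err := c.archive(ctx, cer)
if err != nil {
	// Returning the error lets controller-runtime retry with backoff.
	return ctrl.Result{}, fmt.Errorf("tearing down archived revision: %v", err)
}
if !tres.IsComplete() {
	// Teardown is still in progress: requeue and check again shortly.
	return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
}
// Teardown finished; the teardown finalizer can now be removed.
```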
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| internal/operator-controller/controllers/clusterextensionrevision_controller.go | Calls Boxcutter teardown for archived revisions and requeues while teardown is incomplete. |
| internal/operator-controller/controllers/clusterextensionrevision_controller_test.go | Adds test cases for archived teardown retry/error paths and asserts reconcile results. |
| testdata/images/bundles/test-operator/v1.0.0/manifests/dummy.configmap.yaml | Introduces a version-specific manifest used to test removal of resources across upgrades. |
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main    #2502   +/-   ##
=======================================
+ Coverage   69.46%   73.22%   +3.76%
=======================================
  Files         103      103
  Lines        8509     8524      +15
=======================================
+ Hits         5911     6242     +331
+ Misses       2129     1801     -328
- Partials      469      481      +12
```

☔ View full report in Codecov by Sentry.
perdasilva force-pushed from 06afd4a to 5ca3f6e
perdasilva force-pushed from 5ca3f6e to 6e1e134
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
perdasilva force-pushed from 6e1e134 to 7c6037b
perdasilva force-pushed from 7c6037b to b51648b
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
pedjak left a comment:
what if two revisions are sharing the same resource? how do we ensure that we do not delete something accidentally?
```go
		setRetryingConditions(rev, err.Error())
		return ctrl.Result{}, fmt.Errorf("converting to boxcutter revision: %v", err)
	}
```
why do we need to move this further down?
depends on whether we want to remove revision creation from the deletion path or not. Since we want the garbage collector to handle deletion, it could avoid transient (or maybe even persistent) errors from getting in the way. If so, it might also make sense to move it closer to where the revision is first needed
```go
}

func (c *ClusterExtensionRevisionReconciler) archive(ctx context.Context, cer *ocv1.ClusterExtensionRevision) (machinery.RevisionTeardownResult, error) {
	revision, _, err := c.toBoxcutterRevision(ctx, cer)
```
why do we convert this multiple times? the caller can pass it
it was a suggestion by copilot to remove revision creation from the deletion path
```go
	if err != nil {
		return nil, fmt.Errorf("converting to boxcutter revision: %v", err)
	}
	revisionEngine, err := c.RevisionEngineFactory.CreateRevisionEngine(ctx, cer)
```
the revision engine is also already available on the caller side?
I think this is a similar argument to the one about creating the revision. It's not needed in the deletion path. Maybe we could just separate the deletion and archival paths further. Then we only repeat finalizer removal.
Actually, what do you think about removing the owner ref from archived revisions? They would still carry the owner labels we apply, so they are still attached. But, it gives a cleaner kubectl lineage output XD and might avoid CE reconciliations (although we don't really expect any changes to archived revisions...)
The revision engine also has a dependency on the service account, which might be deleted together with the revision. So, it's probably safer to not have revision engine creation on the deletion path.
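To make the "separate paths" idea concrete, a minimal sketch of a deletion path that never constructs the Boxcutter revision or the revision engine might look like the following. This is an assumed shape, not the PR's exact code: TrackingCache.Free and the finalizer name come from hunks later in this thread, the reconciler is assumed to hold a client.Client, and resource cleanup is left entirely to owner-reference garbage collection.

```go
// Hedged sketch: deletion path without revision/engine construction.
func (c *ClusterExtensionRevisionReconciler) delete(ctx context.Context, cer *ocv1.ClusterExtensionRevision) (ctrl.Result, error) {
	// Stop the informers tracking this revision's objects.
	if err := c.TrackingCache.Free(ctx, cer); err != nil {
		return ctrl.Result{}, fmt.Errorf("error stopping informers: %v", err)
	}
	// Drop the teardown finalizer; owned resources are left to Kubernetes
	// garbage collection via their owner references.
	if controllerutil.RemoveFinalizer(cer, clusterExtensionRevisionTeardownFinalizer) {
		if err := c.Client.Update(ctx, cer); err != nil {
			return ctrl.Result{}, fmt.Errorf("removing teardown finalizer: %v", err)
		}
	}
	return ctrl.Result{}, nil
}
```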
test/e2e/features/update.feature
```gherkin
    And ClusterExtensionRevision "${NAME}-2" reports Available as True with Reason ProbesSucceeded
    And ClusterExtensionRevision "${NAME}-1" is archived
    # dummy-configmap exists only in v1.0.0 - once the v1.0.0 revision is archived, it should be gone from the cluster
    And resource "configmap/dummy-configmap" is eventually not found
```
should we check if other resources are there?
should we move that check to a more general upgrade test?
I mean, there is overlap between the resources. So, some of the v1.0.0 resources would still be present.
> I mean, there is overlap between the resources. So, some of the v1.0.0 resources would still be present.

right, should we have steps that check for that explicitly in this test?
Claude and I updated things to check that the revision-1 resources either don't exist or are no longer in the object's ownerrefs
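A self-contained sketch of that per-object assertion, written against the controller-runtime client rather than the kubectl-based step the PR actually adds (all names here are illustrative): an object passes if it is gone from the cluster, or if it still exists but no longer lists the archived revision among its ownerReferences.

```go
import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// notFoundOrNotOwned reports whether obj was torn down, or still exists but
// its ownership has moved to a newer revision.
func notFoundOrNotOwned(ctx context.Context, c client.Client, obj client.Object, revisionUID types.UID) (bool, error) {
	if err := c.Get(ctx, client.ObjectKeyFromObject(obj), obj); err != nil {
		if apierrors.IsNotFound(err) {
			return true, nil // resource was removed on archival
		}
		return false, err
	}
	for _, ref := range obj.GetOwnerReferences() {
		if ref.UID == revisionUID {
			return false, nil // still owned by the archived revision
		}
	}
	return true, nil
}
```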
perdasilva force-pushed from b51648b to fdcc6d9
perdasilva force-pushed from fdcc6d9 to 9105af6
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
```diff
 }
 
-func (c *ClusterExtensionRevisionReconciler) reconcile(ctx context.Context, rev *ocv1.ClusterExtensionRevision) (ctrl.Result, error) {
+func (c *ClusterExtensionRevisionReconciler) reconcile(ctx context.Context, cer *ocv1.ClusterExtensionRevision) (ctrl.Result, error) {
```
is it really needed in this PR to rename the arg from rev to cer? It makes the diff bigger than it needs to be.
I feel it reads better, especially when you have two parallel revision objects. It makes it easier to know that you're operating over the CER vs the Boxcutter revision. It might even be worth going forward and renaming revision to boxcutterRevision or bcRev, or something like that.
If it would be helpful, I could revert it for this PR and do a follow-up with just that change
test/e2e/steps/hooks.go
```diff
 	metricsResponse map[string]string
 
-	extensionObjects []client.Object
+	extensionObjects [][]client.Object
```
can you document the semantics of the indices?
oh whoops - I was supposed to have removed that. My bad.
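For reference, the index semantics pedjak asked about could be documented with something like the following sketch. The field and helper names come from the PR description; whether rollouts are recorded 1-based (matching the "${NAME}-1", "${NAME}-2" revision naming) is an assumption the real code should state.

```go
// Hedged sketch of the per-rollout bookkeeping in scenarioContext.
// extensionObjects[i] holds the objects applied by the (i+1)-th revision
// rollout, i.e. index 0 corresponds to ClusterExtensionRevision "${NAME}-1".
type scenarioContext struct {
	extensionObjects [][]client.Object
}

// GetClusterExtensionObjectsForRevision returns the objects recorded for
// revision number rev (1-based, matching the "${NAME}-<rev>" naming), or
// nil if that rollout was never recorded.
func (s *scenarioContext) GetClusterExtensionObjectsForRevision(rev int) []client.Object {
	if rev < 1 || rev > len(s.extensionObjects) {
		return nil
	}
	return s.extensionObjects[rev-1]
}
```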
perdasilva force-pushed from 9105af6 to b306567
perdasilva force-pushed from b306567 to d9f672a
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
internal/operator-controller/controllers/clusterextensionrevision_controller_test.go:659
- The "teardown finalizer is removed" test case doesn't set a `DeletionTimestamp` or `LifecycleState=Archived`, but the reconciler only removes the teardown finalizer in `delete()` (when `DeletionTimestamp` is set) or after archival teardown completes. As written, this test appears to be exercising the normal reconcile path and should keep the finalizer, so the assertion is likely incorrect.
  Suggestion: set `rev1.DeletionTimestamp` in `existingObjs()` (to cover the delete path) or adjust the expectation to match the non-deletion reconcile behavior.
```go
name: "teardown finalizer is removed",
revisionResult: mockRevisionResult{},
existingObjs: func() []client.Object {
	ext := newTestClusterExtension()
	rev1 := newTestClusterExtensionRevision(t, clusterExtensionRevisionName, ext, testScheme)
	rev1.Finalizers = []string{
		"olm.operatorframework.io/teardown",
	}
	return []client.Object{ext, rev1}
},
```
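If the first suggested fix were taken, the change might be as small as the following hedged sketch. One caveat worth noting: recent controller-runtime fake clients only accept an object that has a deletion timestamp when it also carries a finalizer, which rev1 does here.

```go
// Mark rev1 as deleting so reconcile() takes the delete() path, which is
// the branch that actually removes the teardown finalizer.
now := metav1.Now()
rev1.DeletionTimestamp = &now
```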
```go
	if err := c.TrackingCache.Free(ctx, cer); err != nil {
		markAsAvailableUnknown(cer, ocv1.ClusterExtensionRevisionReasonReconciling, err.Error())
		return ctrl.Result{}, fmt.Errorf("error stopping informers: %v", err)
	}
```
reconcile() currently calls ensureFinalizer() unconditionally for non-deleted revisions, including revisions already in LifecycleState=Archived. Once an archived revision completes teardown and archive() removes the finalizer, the next reconcile will re-add the finalizer (via ensureFinalizer) and then remove it again in archive(), generating update events and potentially causing an infinite reconcile loop / API churn on archived revisions.
Suggestion: only add the teardown finalizer for non-archived (active) revisions, or gate ensureFinalizer so it does not re-add the finalizer when LifecycleState=Archived and teardown is already complete. Another option is to check controllerutil.ContainsFinalizer and if archived+no-finalizer, skip archive/teardown entirely and return.
```go
	}
}
// If the teardown finalizer has already been removed, teardown/archive is complete.
// In that case, skip calling archive() to avoid unnecessary updates and reconcile churn.
if !controllerutil.ContainsFinalizer(cer, clusterExtensionRevisionTeardownFinalizer) {
	return ctrl.Result{}, nil
}
```
I've restructured things to only ensureFinalizer in the Active code path
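A sketch of that restructuring, gating the finalizer on the Active state; the constant and helper names here are assumptions, not the PR's exact code.

```go
// Hedged sketch: only Active revisions (re-)gain the teardown finalizer,
// so a fully archived revision no longer churns through add/remove cycles.
switch cer.Spec.LifecycleState {
case ocv1.ClusterExtensionRevisionLifecycleStateActive:
	if err := c.ensureFinalizer(ctx, cer); err != nil {
		return ctrl.Result{}, err
	}
	// ...continue with the normal reconcile path...
case ocv1.ClusterExtensionRevisionLifecycleStateArchived:
	if !controllerutil.ContainsFinalizer(cer, clusterExtensionRevisionTeardownFinalizer) {
		// Teardown already completed on an earlier pass; nothing to do.
		return ctrl.Result{}, nil
	}
	// ...run archive()/teardown and remove the finalizer when complete...
}
```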
```go
obj := phaseObj.Object
kind := obj.GetKind()
name := obj.GetName()
namespace := obj.GetNamespace()

waitFor(ctx, func() bool {
	args := []string{"get", kind, name, "--ignore-not-found", "-o", "json"}
	if namespace != "" {
		args = append(args, "-n", namespace)
	}
	out, err := k8sClient(args...)
```
In ClusterExtensionRevisionObjectsNotFoundOrNotOwned, phase objects may omit metadata.namespace (common for namespaced manifests; the controller applies them into spec.namespace/install namespace). When namespace == "", this step runs kubectl get <kind> <name> without -n, which will query the kubectl context’s default namespace and can incorrectly report the object as "not found" even if it exists in ${TEST_NAMESPACE}.
Suggestion: when namespace is empty, resolve the correct namespace (e.g., default to the scenario’s ${TEST_NAMESPACE} / install namespace annotation on the CER), and also handle truly cluster-scoped objects by falling back to a non-namespaced get if the namespaced get fails due to scope.
I think this should be fine as the revision resources should always contain the namespace they are to be placed in - and they won't necessarily always be the install namespace
When a ClusterExtensionRevision transitions to the Archived lifecycle state, invoke the revision engine's Teardown method to clean up managed resources that are no longer part of the active revision. This ensures resources removed between bundle versions (e.g. a ConfigMap present in v1.0.0 but absent in v1.2.0) are deleted from the cluster.

Changes:
- Split teardown() into delete() (CER deletion) and archive() (CER archival); only archive() calls revisionEngine.Teardown()
- Move RevisionEngine creation and watch establishment before the archive check so they are available for both archive and reconcile paths
- Handle incomplete teardown (requeue after 5s) and teardown errors (return error for controller retry)
- Rename rev variable to cer for consistency with the type name
- Update unit tests: rename to ArchivalAndDeletion, add test cases for incomplete teardown requeue, teardown error propagation, and factory creation error during archived teardown
- Add dummy-configmap to v1.0.0 test bundle (absent in v1.2.0) to validate teardown on archival in e2e tests
- Add e2e step "ClusterExtensionRevision phase objects are not found or not owned by the revision": fetches all objects from the CER's phases and asserts each one either does not exist or does not list the CER in its ownerReferences
- Change extensionObjects in scenarioContext from []client.Object to [][]client.Object to track objects per revision rollout; add GetClusterExtensionObjectsForRevision helper
- Replace specific configmap-not-found assertion in the update e2e scenario with the new generic phase-objects step

Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com>
Signed-off-by: Per G. da Silva <pegoncal@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Per G. da Silva <pegoncal@redhat.com>
perdasilva force-pushed from d9f672a to ea6fe54
Description
When a ClusterExtensionRevision transitions to the Archived lifecycle state, invoke the revision engine's Teardown method to clean up managed resources that are no longer part of the active revision. This ensures resources removed between bundle versions (e.g. a ConfigMap present in v1.0.0 but absent in v1.2.0) are deleted from the cluster.
- Split teardown() into delete() (CER deletion) and archive() (CER archival); only archive() calls revisionEngine.Teardown()
- Move RevisionEngine creation and watch establishment before the archive check so they are available for both archive and reconcile paths
- Rename rev variable to cer for consistency with the type name
- Add e2e step ClusterExtensionRevision "<name>" phase objects are not found or not owned by the revision: fetches all objects from the CER's phases and asserts each one either does not exist on the cluster or does not list the CER in its ownerReferences
- Change extensionObjects in scenarioContext from []client.Object to [][]client.Object to track objects per revision rollout; add GetClusterExtensionObjectsForRevision helper
- Replace the specific configmap/dummy-configmap not-found assertion in the update e2e scenario with the new generic phase-objects step

Test plan
- Unit tests (Test_ClusterExtensionRevisionReconciler_Reconcile_ArchivalAndDeletion)

🤖 Generated with Claude Code
Reviewer Checklist