Skip to content

Comments

feat: improve status on team operator resources#94

Open
ian-flores wants to merge 14 commits intomainfrom
improve-status-fields
Open

feat: improve status on team operator resources#94
ian-flores wants to merge 14 commits intomainfrom
improve-status-fields

Conversation

@ian-flores
Copy link
Collaborator

@ian-flores ian-flores commented Feb 19, 2026

Summary

  • Add Kubernetes-native conditions-based status reporting across all product CRDs
  • Add CommonProductStatus type with Conditions, ObservedGeneration, and Version fields, embedded in all status structs including PostgresDatabase
  • Add internal/status helper package with SetReady(), SetProgressing(), IsReady(), ExtractVersion() and reason constants (ReasonReconcileComplete, ReasonDeploymentReady/NotReady, ReasonStatefulSetReady/NotReady, ReasonAllComponentsReady, ReasonComponentsNotReady, ReasonDatabaseReady)
  • Add deployment health monitoring to Connect, Workbench, PackageManager, Flightdeck controllers and statefulset health monitoring to Chronicle
  • Progressing condition correctly reflects rollout state: True during active rollouts, False with ReasonReconcileComplete when done
  • Add conditions to PostgresDatabase for database provisioning success/failure tracking
  • Add aggregate Site status with per-component readiness bools (ConnectReady, WorkbenchReady, etc.) and Owns() watches on child CRs
  • Add Ready, Version, and Age printcolumns to all CRDs for kubectl get output
  • Migrate status updates from Status().Update() to Status().Patch() with MergeFrom to avoid conflicts
  • Mark existing Ready bool field as +optional for patch compatibility; value now derived from the Ready condition
  • Update fake test client with WithStatusSubresource for all CRD types
  • ExtractVersion handles registry ports (e.g., localhost:5000/image:v1) and digest references (image:tag@sha256:...)

Closes #25

Test plan

  • just build passes
  • just mgenerate regenerates CRDs and deepcopy without errors
  • just helm-generate keeps Helm chart in sync
  • go fmt produces no changes
  • Flightdeck controller tests pass with status patching
  • Deploy to test cluster and verify kubectl get connects shows READY/VERSION/AGE columns
  • Verify conditions populate correctly on reconciliation
  • Verify Site aggregate status reflects child component health
  • Verify status reverts to not-ready when deployments become unhealthy

@claude
Copy link

claude bot commented Feb 19, 2026

Claude finished @ian-flores's task —— View job


Review: feat: improve status on team operator resources

Overall this is a well-structured change that follows Kubernetes conventions for conditions-based status reporting. The CommonProductStatus type, the internal/status helper package, and the migration from Status().Update() to Status().Patch() with MergeFrom are all solid patterns. Here are findings organized by priority:


Critical: ExtractVersion breaks on registry ports

internal/status/status.go:58-71

ExtractVersion uses strings.LastIndex(image, ":") which will incorrectly parse image references that include a registry port. For example, localhost:5000/myimage:v1.0 would extract 5000/myimage after the first :, not v1.0.

Fix by isolating the last path segment before checking for the tag:

func ExtractVersion(image string) string {
    if idx := strings.LastIndex(image, "@"); idx != -1 {
        image = image[:idx]
    }
    lastSlash := strings.LastIndex(image, "/")
    nameTag := image
    if lastSlash != -1 {
        nameTag = image[lastSlash+1:]
    }
    if idx := strings.LastIndex(nameTag, ":"); idx != -1 {
        tag := nameTag[idx+1:]
        if tag == "latest" {
            return ""
        }
        return tag
    }
    return ""
}

Important: Site aggregate readiness assumes all 5 components must exist

internal/controller/core/site_controller.go:97

allReady := s.Status.ConnectReady && s.Status.WorkbenchReady && s.Status.PackageManagerReady && s.Status.ChronicleReady && s.Status.FlightdeckReady

If a Site doesn't configure all five components, the missing child CRs won't be found, defaulting to false, so the Site can never become Ready. Consider making the aggregate check aware of which components are actually enabled in the Site spec.


Important: Repeated error-path status boilerplate across all controllers

Every controller has 5+ instances of this pattern:

status.SetReady(&c.Status.Conditions, c.Generation, metav1.ConditionFalse, status.ReasonReconcileError, err.Error())
status.SetProgressing(&c.Status.Conditions, c.Generation, metav1.ConditionFalse, status.ReasonReconcileError, err.Error())
_ = r.Status().Patch(ctx, c, patchBase)

Consider extracting a helper to reduce the boilerplate, e.g.:

func PatchErrorStatus(ctx context.Context, statusWriter client.StatusWriter, obj client.Object, patchBase client.Patch, conditions *[]metav1.Condition, generation int64, err error) {
    SetReady(conditions, generation, metav1.ConditionFalse, ReasonReconcileError, err.Error())
    SetProgressing(conditions, generation, metav1.ConditionFalse, ReasonReconcileError, err.Error())
    _ = statusWriter.Patch(ctx, obj, patchBase)
}

Suggestion: Inconsistent JSONPath quoting in printcolumn annotations

Some CRDs use single quotes ('Ready') and some use double quotes ("Ready") in the JSONPath filter expression. Both work, but pick one style for consistency across all CRDs. Connect, PackageManager, PostgresDatabase, and Site use 'Ready'; Chronicle, Flightdeck, and Workbench use "Ready".


Suggestion: DeploymentReady reason used for StatefulSet

internal/controller/core/chronicle_controller.go:139-140

Chronicle uses a StatefulSet but the condition reason is ReasonDeploymentReady / ReasonDeploymentNotReady. Consider adding ReasonStatefulSetReady or generalizing to ReasonReplicasReady.


Suggestion: Inline reason string for PostgresDatabase

internal/controller/core/postgresdatabase_controller.go:99

"DatabaseReady" is used as an inline string while all other reason values are centralized as constants in internal/status/status.go. Add it as a constant there for consistency.


Suggestion: Stale TODO in PackageManager

internal/controller/core/package_manager.go:221-222

The comment // TODO: Verify if PackageManager has an Image field is stale -- pm.Spec.Image is used unconditionally on the happy path. Remove the TODO.


Suggestion: PostgresDatabase duplicates Conditions/ObservedGeneration

api/core/v1beta1/postgresdatabase_types.go:51-62

PostgresDatabaseStatus defines its own Conditions and ObservedGeneration fields instead of embedding CommonProductStatus. This works but could drift over time. Consider embedding CommonProductStatus and simply not populating Version.


Suggestion: No tests for internal/status package

The internal/status package has no test file. Given the parsing logic in ExtractVersion (and the registry-port bug above), unit tests are warranted.


Suggestion: Document child-name-lookup assumption

internal/controller/core/site_controller.go:433

aggregateChildStatus uses site.Name as the object key for every child CR. A brief comment documenting this naming-convention assumption would help future maintainers.


Checklist (per review guidelines)

API Changes:

  • Kubebuilder annotations are correct
  • New fields have sensible defaults (all +optional)
  • Backward compatible (existing Ready bool field kept)
  • JSONPath quoting consistency (minor)

Controller Changes:

  • Reconciliation is idempotent
  • Error handling reports status correctly
  • Migration from Update to Patch with MergeFrom is correct
  • Unit/integration tests exist (no tests for internal/status)

CRDs:

  • CRDs are up to date (regenerated)
  • PrintColumns added correctly

|

@ian-flores
Copy link
Collaborator Author

Tested on ganso01-staging

Deployed adhoc image ghcr.io/posit-dev/team-operator:adhoc-improve-status-fields-v1.12.0-16-g7c8d4b5 to ganso01-staging and verified end-to-end.

kubectl get printcolumns

=== Connects ===
NAME   READY   VERSION                AGE
alt    True    ubuntu2204-2026.01.1   16d
main   True    ubuntu2204-2026.01.1   16d

=== Workbenches ===
NAME   READY   VERSION                AGE
alt    False   ubuntu2204-2026.01.0   16d
main   True    ubuntu2204-2026.01.0   16d

=== PackageManagers ===
NAME   READY   VERSION   AGE
alt    False             16d
main                     16d

=== Chronicles ===
NAME   READY   VERSION     AGE
alt    False               16d
main   True    2026.01.0   16d

=== Flightdecks ===
NAME   READY   VERSION   AGE
alt    True              16d
main   True              16d

=== Sites ===
NAME   READY   AGE
alt    False   16d
main   False   16d

=== PostgresDatabases ===
NAME                  READY   AGE
alt-connect                   16d
main-connect                  16d

Condition detail (Connect main)

status:
  conditions:
  - lastTransitionTime: "2026-02-19T22:09:21Z"
    message: Reconciliation complete
    observedGeneration: 2
    reason: ReconcileComplete
    status: "False"
    type: Progressing
  - lastTransitionTime: "2026-02-19T22:09:21Z"
    message: Deployment has minimum availability
    observedGeneration: 2
    reason: DeploymentReady
    status: "True"
    type: Ready
  keySecretRef: {}
  observedGeneration: 2
  ready: true
  version: ubuntu2204-2026.01.1

Observations

  • Printcolumns (READY, VERSION, AGE) working across all CRDs
  • Conditions correctly reflect deployment/statefulset health
  • Progressing=False with ReasonReconcileComplete on healthy resources
  • Site Ready=False is correct — alt site has unhealthy children (workbench, packagemanager, chronicle)
  • Version extracted from image tags where available
  • Backward-compatible ready bool derived from Ready condition
  • PostgresDatabases haven't been re-reconciled yet so conditions are empty — they'll populate on next spec change

@ian-flores ian-flores marked this pull request as ready for review February 19, 2026 22:18
@timtalbot
Copy link
Contributor

Nice! It looks like the comment Site aggregate readiness assumes all 5 components must exist is still true? Will we address that after #93?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve Status on the Team Operator Resources

2 participants