Skip to content

Conversation

@qweeah
Copy link
Contributor

@qweeah qweeah commented Dec 24, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:

This PR adds support for service-account-based image pull authentication (aka Identity Binding) from Azure Container Registry (ACR), implementing KEP-4412 projected service account tokens for kubelet image credential providers.

Changes

Data Model

Added ServiceAccountImagePullProfile to SecurityProfile with fields:

  • Enabled: Enable/disable service account-based image pull
  • DefaultClientID: Default managed identity client ID
  • DefaultTenantID: Default managed identity tenant ID
  • LocalAuthoritySNI: SNI endpoint for Identity Bindings Local Authority

Added getter methods to SecurityProfile for null-safe access.

Implementation Paths

1. Legacy CSE (Template-based)

  • pkg/agent/variables.go: Template variables for CSE scripts (using the naming convention serviceAccountImagePull...)
  • parts/linux/cloud-init/artifacts/cse_cmd.sh: Environment variable declarations (e.g., SERVICE_ACCOUNT_IMAGE_PULL_ENABLED, SERVICE_ACCOUNT_IMAGE_PULL_DEFAULT_CLIENT_ID)
  • parts/linux/cloud-init/artifacts/cse_config.sh: Credential provider config generation

2. AKSNodeConfig (Proto-based)

  • aks-node-controller/proto/aksnodeconfig/v1/security_profile.proto: Proto definitions (ServiceAccountImagePullProfile)
  • aks-node-controller/parser/parser.go: Environment variable generation
  • aks-node-controller/parser/helper.go: Null-safe helper functions

Both paths converge at cse_config.sh to generate /var/lib/kubelet/credential-provider-config.yaml with identity binding arguments (--ib-default-client-id, --ib-default-tenant-id, --ib-sni-name) and token attributes.

Testing

  • Unit Tests: Updated tests across variables_test.go, datamodel/types_test.go, and parser/helper_test.go to reflect the new naming convention.
  • E2E Tests: Verified scenarios (enabled, disabled, network isolated, Azure Linux V3).
  • Test Data: Regenerated pkg/agent/testdata (117 files) to reflect the new SERVICE_ACCOUNT_IMAGE_PULL_ environment variables.

All tests validate that the credential provider config file contains the correct identity binding configuration.

Cluster Support

  • Public cloud (Azure Commercial)
  • AKS Custom Cloud
  • Network Isolated (NI/airgap) clusters

Which issue(s) this PR fixes:

Fixes #

Requirements:

  • uses conventional commit messages
  • includes documentation
  • adds unit tests
  • adds e2e tests
  • tested upgrade from previous version
  • commits are GPG signed and Github marks them as verified

Special notes for your reviewer:

Dual implementation approach (legacy CSE + modern AKSNodeConfig) for backward compatibility. Both paths are fully tested and generate identical credential provider configuration. The renaming from ImagePullIdentity to ServiceAccountImagePull ensures consistency with the upstream feature name.

Release note:

AgentBaker now supports service-account-based image pull authentication from Azure Container Registry (ACR) via ServiceAccountImagePullProfile in SecurityProfile, using projected service account tokens (KEP-4412) for authentication.

@qweeah qweeah changed the title feat: Add ImagePullIdentityProfile to SecurityProfile for identity binding-based image pull feat: add ImagePullIdentityProfile for identity binding-based image pull Dec 24, 2025
@qweeah qweeah marked this pull request as ready for review December 29, 2025 00:35
@norshtein
Copy link
Member

This PR's logic LGTM. But I don't quiet understand what is step 2 "RP integration and configuration flow". Agentbaker won't rely on anything from AKS RP, I think you could make all changes in the same PR and add E2E for it? Here is an example PR: #7059 .

@github-actions
Copy link
Contributor

github-actions bot commented Dec 29, 2025

The latest Buf updates on your PR. Results from workflow Buf CI / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJan 7, 2026, 10:26 AM

@qweeah qweeah changed the title feat: add ImagePullIdentityProfile for identity binding-based image pull feat: add ServiceAccountImagePullProfile for identity-binding-based image pull Jan 7, 2026
@qweeah qweeah force-pushed the qweeah/ibip branch 2 times, most recently from 1a47af0 to 67be361 Compare January 7, 2026 03:59
@github-actions github-actions bot added the components This pull request updates cached components on Linux or Windows VHDs label Jan 7, 2026
type SecurityProfile struct {
PrivateEgress *PrivateEgress `json:"privateEgress,omitempty"`
PrivateEgress *PrivateEgress `json:"privateEgress,omitempty"`
ServiceAccountImagePullProfile *ServiceAccountImagePullProfile `json:"serviceAccountImagePullProfile,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very sure, but this profile belongs to ContainerService.Properties, it's not very common(?) to directly edit this struct, we typically append NodeBootstrappingConfiguration for feature implementation. Node sig reviewers may have more insight into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw the comment here mentioning that Properties stands for the AKS cluster definition. Therefore, I thought that AKS-related settings ought to be placed in properties?

- kubernetes.azure.com/acr-client-id"
IB_ARGS="
- --ib-sni-name=${IDENTITY_BINDINGS_LOCAL_AUTHORITY_SNI}
- --ib-default-client-id=${SERVICE_ACCOUNT_IMAGE_PULL_DEFAULT_CLIENT_ID}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new parameter, requires specific version credential provider binary. We need to ensure new version credential provider binary is present when these parameters are used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I also have a question, for old VHDs which does not have new credential provider binary, this feature shall not be available. Where can we verify and guard this? CC @cameronmeissner do you have any insight into this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial idea is to have $SERVICE_ACCOUNT_IMAGE_PULL_ENABLED completely indicate credential provider readiness by:

  1. Allowing this feature to be enabled only for k8s version >= v1.35 (as specified in PRD) and enforcing the version verification in RP
  2. Ensuring that the credential provider released for 1.35 supports this flag

qweeah added 2 commits January 7, 2026 10:20
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

components This pull request updates cached components on Linux or Windows VHDs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants