Audit and unify usage of alias vs validation_alias in Pydantic models #807

@vdusek

Problem

The codebase uses alias= and validation_alias= in Pydantic Field() definitions inconsistently. These have different semantics:

  • alias: affects both validation (input parsing) and serialization. With model_dump(by_alias=True), the field serializes under the alias name instead of the Python field name.
  • validation_alias: affects only validation (input parsing). The field still serializes under its Python name, even with by_alias=True.
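
The difference can be seen in a minimal sketch, assuming Pydantic v2. The model and field names here are hypothetical and are not taken from the codebase:

```python
from pydantic import BaseModel, Field


class Example(BaseModel):
    # `alias` is used for input parsing and for by-alias serialization.
    with_alias: int = Field(alias='WITH_ALIAS')
    # `validation_alias` is used for input parsing only.
    with_validation_alias: int = Field(validation_alias='WITH_VALIDATION_ALIAS')


# Both fields are populated from their alias names on input.
model = Example.model_validate({'WITH_ALIAS': 1, 'WITH_VALIDATION_ALIAS': 2})

# Only the `alias` field changes its key when dumping by alias.
dumped_by_alias = model.model_dump(by_alias=True)

# Without by_alias, both fields use their Python names.
dumped_default = model.model_dump()

print(dumped_by_alias)
print(dumped_default)
```

Only the key of the field declared with alias= changes between the two dumps.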

Currently the choice between them appears accidental rather than intentional.

Where

src/apify/_configuration.py

This is the most problematic file. It mixes both patterns:

Fields using validation_alias=AliasChoices(...) (correct for env var parsing — multiple legacy names, no serialization impact):

  • actor_id, actor_run_id, default_dataset_id, default_key_value_store_id, default_request_queue_id, input_key, started_at, timeout_at, token, api_base_url, etc.

Fields using alias='...' (single env var, but also changes serialization name):

  • fact, is_at_home, proxy_hostname, proxy_password, proxy_port, proxy_status_url, max_paid_dataset_items, max_total_charge_usd, test_pay_per_event, meta_origin, metamorph_after_sleep, log_format, disable_outdated_warning, input_secrets_private_key_file, input_secrets_private_key_passphrase, charged_event_counts, actor_pricing_info, etc.

The consequence: config.model_dump(by_alias=True) serializes is_at_home as "apify_is_at_home" but started_at as "started_at". This inconsistency also affects get_env() in _actor.py (lines 816-827), which has to handle both paths with branching logic.
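
A reduced reproduction of the mixed output might look like the sketch below. MiniConfiguration is a hypothetical stand-in for the real Configuration class: it uses a plain BaseModel rather than the actual base class, and the AliasChoices names for started_at are assumptions made for illustration. Only the "apify_is_at_home" alias comes from the description above.

```python
from typing import Optional

from pydantic import AliasChoices, BaseModel, Field


class MiniConfiguration(BaseModel):
    # Declared with `alias`: the alias also becomes the by-alias output key.
    is_at_home: bool = Field(default=False, alias='apify_is_at_home')
    # Declared with `validation_alias`: input names only (names assumed here).
    started_at: Optional[str] = Field(
        default=None,
        validation_alias=AliasChoices('actor_started_at', 'apify_started_at'),
    )


config = MiniConfiguration.model_validate(
    {'apify_is_at_home': True, 'apify_started_at': '2024-01-01T00:00:00Z'}
)

# One key is the alias, the other is the Python field name.
dumped = config.model_dump(by_alias=True)
print(sorted(dumped))
```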

src/apify/_models.py, src/apify/events/_types.py, src/apify/storage_clients/_apify/_models.py

These consistently use alias= for camelCase mapping (e.g., Field(alias='memAvgBytes')). This is correct for API response/request models that need round-trip serialization with camelCase keys.
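
As an illustration of why alias= fits these models, here is a small round-trip sketch. RunStats and the field name mem_avg_bytes are hypothetical. Only the 'memAvgBytes' alias is taken from the example above.

```python
from pydantic import BaseModel, Field


class RunStats(BaseModel):
    # camelCase on the wire, snake_case in Python code.
    mem_avg_bytes: float = Field(alias='memAvgBytes')


payload = {'memAvgBytes': 1024.0}

# Parse the camelCase payload...
stats = RunStats.model_validate(payload)

# ...and serialize back to the same camelCase shape.
round_tripped = stats.model_dump(by_alias=True)

print(stats.mem_avg_bytes, round_tripped)
```

With validation_alias= instead, the model would parse the same payload but dump it as mem_avg_bytes, so the round trip would not preserve the API's key names.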

Suggested approach

  1. Configuration fields — For fields that only need env-var-based input parsing, switch from alias= to validation_alias=. For single-alias fields, validation_alias='env_var_name' is sufficient (no AliasChoices needed when there's only one name).

  2. API models — Keep alias= as-is. These models deserialize from and serialize to JSON with camelCase keys, so alias (affecting both directions) is the right choice.

  3. Document the convention — Add a brief comment or note in the codebase (e.g., in CLAUDE.md or as a module-level comment) stating:

    • Use validation_alias for Configuration fields (env var parsing only)
    • Use alias for API/event models (camelCase round-trip serialization)

  4. Review get_env(): after unifying, the branching logic in _actor.py:820-827 can potentially be simplified.
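
The change proposed in step 1 can be sketched as a before/after pair. Both classes are hypothetical reductions of a single Configuration field, built on a plain BaseModel; the real class may behave differently in details that this sketch does not cover.

```python
from pydantic import BaseModel, Field


class Before(BaseModel):
    # Current pattern: `alias` also renames the field in by-alias dumps.
    is_at_home: bool = Field(default=False, alias='apify_is_at_home')


class After(BaseModel):
    # Proposed pattern: a plain string is enough for a single input name.
    is_at_home: bool = Field(default=False, validation_alias='apify_is_at_home')


env_like_input = {'apify_is_at_home': True}

before = Before.model_validate(env_like_input)
after = After.model_validate(env_like_input)

# Input parsing is unchanged by the switch...
same_parsed_value = before.is_at_home == after.is_at_home

# ...but only the new variant dumps under the Python field name.
before_dump = before.model_dump(by_alias=True)
after_dump = after.model_dump(by_alias=True)

print(before_dump, after_dump)
```

If every Configuration field follows the second pattern, by-alias and default dumps use the same keys, which is what would allow consumers such as get_env() to drop their special-casing.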

Context

Noticed during review of #797, which adds a new actor_storages field using alias='actor_storages_json'.

Labels: t-tooling (issues with this label are owned by the tooling team)
