Description
Problem
The codebase uses alias= and validation_alias= in Pydantic Field() definitions inconsistently. These have different semantics:
- alias: affects both validation (input parsing) and serialization (model_dump(by_alias=True)). With by_alias=True, the field serializes under the alias name, not the Python field name.
- validation_alias: affects only validation (input parsing). The field still serializes under its Python name.
Currently the choice between them appears accidental rather than intentional.
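A minimal sketch of the difference, assuming Pydantic v2. The two model classes are hypothetical and use plain BaseModel rather than the real Configuration class; only the field name and alias string come from this issue.

```python
from pydantic import BaseModel, Field


class WithAlias(BaseModel):
    # `alias` is used for validation and, with by_alias=True, for serialization.
    is_at_home: bool = Field(default=False, alias='apify_is_at_home')


class WithValidationAlias(BaseModel):
    # `validation_alias` is used only when parsing input.
    is_at_home: bool = Field(default=False, validation_alias='apify_is_at_home')


# Both models accept the same input key.
a = WithAlias.model_validate({'apify_is_at_home': True})
b = WithValidationAlias.model_validate({'apify_is_at_home': True})

# They differ only in what by_alias=True serialization produces.
alias_dump = a.model_dump(by_alias=True)
validation_alias_dump = b.model_dump(by_alias=True)

print(alias_dump)             # {'apify_is_at_home': True}
print(validation_alias_dump)  # {'is_at_home': True}
```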
Where
src/apify/_configuration.py
This is the most problematic file. It mixes both patterns:
Fields using validation_alias=AliasChoices(...) (correct for env var parsing — multiple legacy names, no serialization impact):
actor_id, actor_run_id, default_dataset_id, default_key_value_store_id, default_request_queue_id, input_key, started_at, timeout_at, token, api_base_url, etc.
Fields using alias='...' (single env var, but also changes serialization name):
fact, is_at_home, proxy_hostname, proxy_password, proxy_port, proxy_status_url, max_paid_dataset_items, max_total_charge_usd, test_pay_per_event, meta_origin, metamorph_after_sleep, log_format, disable_outdated_warning, input_secrets_private_key_file, input_secrets_private_key_passphrase, charged_event_counts, actor_pricing_info, etc.
The consequence: config.model_dump(by_alias=True) would serialize is_at_home as "apify_is_at_home" but started_at as "started_at". This inconsistency also affects get_env() in _actor.py (lines 816-827), which has to handle both paths with branching logic.
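The mixed output can be reproduced with a small stand-in model. This is a sketch assuming Pydantic v2: MixedConfig is hypothetical, and the two names passed to AliasChoices are illustrative placeholders, not necessarily the ones used in _configuration.py.

```python
from typing import Optional

from pydantic import AliasChoices, BaseModel, Field


class MixedConfig(BaseModel):
    # Pattern 1: plain alias (also changes the by_alias serialization key).
    is_at_home: bool = Field(default=False, alias='apify_is_at_home')
    # Pattern 2: validation-only aliases (placeholder legacy names).
    started_at: Optional[str] = Field(
        default=None,
        validation_alias=AliasChoices('actor_started_at', 'apify_started_at'),
    )


config = MixedConfig.model_validate(
    {'apify_is_at_home': True, 'apify_started_at': '2024-01-01T00:00:00Z'}
)

# One key comes out under its alias, the other under its Python name.
dumped = config.model_dump(by_alias=True)
print(sorted(dumped))  # ['apify_is_at_home', 'started_at']
```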
src/apify/_models.py, src/apify/events/_types.py, src/apify/storage_clients/_apify/_models.py
These consistently use alias= for camelCase mapping (e.g., Field(alias='memAvgBytes')). This is correct for API response/request models that need round-trip serialization with camelCase keys.
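For comparison, a sketch of why alias= suits these models, assuming Pydantic v2. The class name MemoryStats and the snake_case field name are hypothetical; only the alias 'memAvgBytes' is taken from the example above.

```python
from pydantic import BaseModel, Field


class MemoryStats(BaseModel):
    # Python code uses snake_case; the API payload uses camelCase.
    mem_avg_bytes: float = Field(alias='memAvgBytes')


payload = {'memAvgBytes': 1024.0}

# Parse an API-style payload, then serialize it back with the same keys.
stats = MemoryStats.model_validate(payload)
round_tripped = stats.model_dump(by_alias=True)

print(stats.mem_avg_bytes)  # 1024.0
print(round_tripped)        # {'memAvgBytes': 1024.0}
```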
Suggested approach
- Configuration fields: For fields that only need env-var-based input parsing, switch from alias= to validation_alias=. For single-alias fields, validation_alias='env_var_name' is sufficient (no AliasChoices needed when there's only one name).
- API models: Keep alias= as-is. These models deserialize from and serialize to JSON with camelCase keys, so alias (affecting both directions) is the right choice.
- Document the convention: Add a brief comment or note in the codebase (e.g., in CLAUDE.md or as a module-level comment) stating:
  - Use validation_alias for Configuration fields (env var parsing only)
  - Use alias for API/event models (camelCase round-trip serialization)
- Review get_env(): After unifying, the branching logic in _actor.py:820-827 can potentially be simplified.
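A sketch of what the Configuration side might look like after the change, assuming Pydantic v2. UnifiedConfig is a hypothetical stand-in, and the AliasChoices names are placeholders. It does not reproduce get_env(); it only shows that serialized keys become uniform, which is the property any simplification there would rely on.

```python
from typing import Optional

from pydantic import AliasChoices, BaseModel, Field


class UnifiedConfig(BaseModel):
    # Single env var name: a plain string is enough.
    is_at_home: bool = Field(default=False, validation_alias='apify_is_at_home')
    # Several accepted names (placeholders): use AliasChoices.
    started_at: Optional[str] = Field(
        default=None,
        validation_alias=AliasChoices('actor_started_at', 'apify_started_at'),
    )


unified = UnifiedConfig.model_validate(
    {'apify_is_at_home': True, 'actor_started_at': '2024-01-01T00:00:00Z'}
)

# Every key is now the Python field name, with or without by_alias.
unified_dump = unified.model_dump(by_alias=True)
keys_match_fields = set(unified_dump) == set(UnifiedConfig.model_fields)

print(unified_dump)
print(keys_match_fields)  # True
```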
Context
Noticed during review of #797, which adds a new actor_storages field using alias='actor_storages_json'.