feat: Support Actor schema storages with Alias mechanism #797
base: master
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -262,3 +262,41 @@ async def _get_default_kvs_client(configuration: Configuration) -> KeyValueStore | |||||
| raise ValueError("'Configuration.default_key_value_store_id' must be set.") | ||||||
|
|
||||||
| return apify_client_async.key_value_store(key_value_store_id=configuration.default_key_value_store_id) | ||||||
|
|
||||||
| @classmethod | ||||||
| async def register_aliases(cls, configuration: Configuration) -> None: | ||||||
|
Contributor
# register_aliases: no error handling
existing_mapping = ((await client.get_record(...)) or {'value': {}}).get('value', {})
await client.set_record(cls._ALIAS_MAPPING_KEY, existing_mapping)
Consider wrapping in try/except with a warning, consistent with |
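A minimal sketch of what the requested error handling could look like. `FakeKvsClient`, `persist_alias_mapping`, and the value of `ALIAS_MAPPING_KEY` are hypothetical stand-ins introduced only for illustration; they are not part of the PR or of the Apify client API.

```python
import asyncio
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)

# Hypothetical key name; the PR refers to it only as `cls._ALIAS_MAPPING_KEY`.
ALIAS_MAPPING_KEY = '__STORAGE_ALIASES_MAPPING'


class FakeKvsClient:
    """Hypothetical in-memory stand-in for the default key-value store client."""

    def __init__(self, *, fail: bool = False) -> None:
        self._records: dict[str, Any] = {}
        self._fail = fail

    async def get_record(self, key: str) -> Optional[dict[str, Any]]:
        if self._fail:
            raise ConnectionError('simulated API failure')
        if key not in self._records:
            return None
        return {'key': key, 'value': self._records[key]}

    async def set_record(self, key: str, value: Any) -> None:
        if self._fail:
            raise ConnectionError('simulated API failure')
        self._records[key] = value


async def persist_alias_mapping(client: FakeKvsClient, configuration_mapping: dict[str, str]) -> bool:
    """Merge the mapping into the stored record; log a warning instead of raising."""
    try:
        record = await client.get_record(ALIAS_MAPPING_KEY)
        existing_mapping = (record or {'value': {}}).get('value', {})
        existing_mapping.update(configuration_mapping)
        await client.set_record(ALIAS_MAPPING_KEY, existing_mapping)
    except Exception as exc:  # Broad on purpose: alias registration is best effort here.
        logger.warning('Failed to persist alias mapping to the default KVS: %s', exc)
        return False
    return True
```

Whether a failure here should be swallowed or should abort Actor startup is a design decision for the maintainers; the sketch only shows the warn-and-continue variant the comment asks about.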
||||||
| """Load alias mapping from configuration to the default kvs.""" | ||||||
|
|
||||||
| if configuration.actor_storages is None: | ||||||
| return | ||||||
|
|
||||||
| configuration_mapping = {} | ||||||
|
|
||||||
| if configuration.default_dataset_id != configuration.actor_storages.datasets.get('default'): | ||||||
|
Contributor
Conflict check is only for datasets
if configuration.default_dataset_id != configuration.actor_storages.datasets.get('default'):
logger.warning(...)
# No similar check for KVS or RQ |
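One way the check could be extended to all three storage types, as a sketch. `ConfigurationStub` and `StoragesStub` are hypothetical stand-ins for the real configuration classes, and `default_request_queue_id` is assumed by analogy with the two default-ID fields visible in the diff.

```python
import logging
from dataclasses import dataclass, field
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class StoragesStub:
    """Hypothetical stand-in for `configuration.actor_storages`."""

    datasets: dict[str, str] = field(default_factory=dict)
    key_value_stores: dict[str, str] = field(default_factory=dict)
    request_queues: dict[str, str] = field(default_factory=dict)


@dataclass
class ConfigurationStub:
    """Hypothetical stand-in for `Configuration`."""

    default_dataset_id: Optional[str] = None
    default_key_value_store_id: Optional[str] = None
    default_request_queue_id: Optional[str] = None  # Assumed field name.
    actor_storages: StoragesStub = field(default_factory=StoragesStub)


def find_default_id_conflicts(configuration: ConfigurationStub) -> list[str]:
    """Warn about every storage type whose default ID disagrees with the mapping."""
    storages = configuration.actor_storages
    checks = (
        ('dataset', configuration.default_dataset_id, storages.datasets),
        ('key-value store', configuration.default_key_value_store_id, storages.key_value_stores),
        ('request queue', configuration.default_request_queue_id, storages.request_queues),
    )
    conflicts = []
    for name, configured_id, mapping in checks:
        mapped_id = mapping.get('default')
        if configured_id != mapped_id:
            logger.warning(
                'Conflicting default %s ids: configured=%r, actor_storages=%r', name, configured_id, mapped_id
            )
            conflicts.append(name)
    return conflicts
```

Returning the list of conflicting types is an addition for testability; the code in the diff only logs.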
||||||
| logger.warning( | ||||||
| f'Conflicting default dataset ids: {configuration.default_dataset_id=},' | ||||||
| f" {configuration.actor_storages.datasets.get('default')=}" | ||||||
| ) | ||||||
|
|
||||||
| for mapping, storage_type in ( | ||||||
| (configuration.actor_storages.key_value_stores, 'KeyValueStore'), | ||||||
| (configuration.actor_storages.datasets, 'Dataset'), | ||||||
| (configuration.actor_storages.request_queues, 'RequestQueue'), | ||||||
| ): | ||||||
| for storage_alias, storage_id in mapping.items(): | ||||||
|
Contributor
Redundant
# Current — creates N instances just for the key:
for storage_alias, storage_id in mapping.items():
configuration_mapping[
cls(
storage_type=storage_type,
alias='__default__' if storage_alias == 'default' else storage_alias,
configuration=configuration,
)._storage_key
] = storage_id |
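A sketch of how the keys might be computed without constructing a resolver per alias. The actual format of `_storage_key` is not shown in the diff, so `build_storage_key`, `KEY_SEPARATOR`, and the `cache_key` argument are assumptions made purely for illustration.

```python
from typing import Mapping

# Hypothetical separator; the real `_storage_key` format is not visible in the diff.
KEY_SEPARATOR = ','


def build_storage_key(storage_type: str, alias: str, cache_key: str) -> str:
    """Hypothetical pure helper that would mirror `_storage_key` without an instance."""
    normalized_alias = '__default__' if alias == 'default' else alias
    return KEY_SEPARATOR.join((storage_type, normalized_alias, cache_key))


def build_configuration_mapping(
    storages: Mapping[str, Mapping[str, str]],
    cache_key: str,
) -> dict[str, str]:
    """Build the alias-key to storage-ID mapping for every storage type in one pass."""
    return {
        build_storage_key(storage_type, alias, cache_key): storage_id
        for storage_type, mapping in storages.items()
        for alias, storage_id in mapping.items()
    }
```

If the instance property and such a helper both exist, the property should delegate to the helper so the key format is defined in one place.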
||||||
| configuration_mapping[ | ||||||
| cls( # noqa: SLF001# It is ok in own classmethod. | ||||||
|
Contributor
Malformed noqa comment
Suggested change
|
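The text of the suggested change was not preserved on this page. The sketch below shows one conventional way to write such a directive, assuming the intent was to separate the rule code from the explanation and to place the directive on the line that performs the private access. `Resolver` and `key_for` are hypothetical names.

```python
class Resolver:
    """Minimal hypothetical stand-in for the class that owns `_storage_key`."""

    def __init__(self, storage_type: str, alias: str) -> None:
        self._storage_type = storage_type
        self._alias = alias

    @property
    def _storage_key(self) -> str:
        return f'{self._storage_type},{self._alias}'

    @classmethod
    def key_for(cls, storage_type: str, alias: str) -> str:
        # The rule code is followed by whitespace before the explanatory comment,
        # and the directive sits on the line where the private attribute is read.
        return cls(storage_type, alias)._storage_key  # noqa: SLF001  # It is ok in own classmethod.
```

In the diff, `SLF001# It is ok ...` has no whitespace between the rule code and the explanation, and the directive is attached to the `cls(` line while the private access `._storage_key` happens several lines later.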
||||||
| storage_type=storage_type, | ||||||
| alias='__default__' if storage_alias == 'default' else storage_alias, | ||||||
| configuration=configuration, | ||||||
| )._storage_key | ||||||
| ] = storage_id | ||||||
|
|
||||||
| # Bulk update the mapping in the default KVS with the configuration mapping. | ||||||
| client = await cls._get_default_kvs_client(configuration=configuration) | ||||||
| existing_mapping = ((await client.get_record(cls._ALIAS_MAPPING_KEY)) or {'value': {}}).get('value', {}) | ||||||
|
|
||||||
| # Update the existing mapping with the configuration mapping. | ||||||
| existing_mapping.update(configuration_mapping) | ||||||
| # Store the updated mapping back in the KVS and in memory. | ||||||
| await client.set_record(cls._ALIAS_MAPPING_KEY, existing_mapping) | ||||||
|
Comment on lines +295 to +301
|
||||||
| cls._alias_map.update(existing_mapping) | ||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,24 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Contributor
spaces please |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "actorSpecification": 1, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "version": "0.0", | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "storages": { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "datasets": { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "default": { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "actorSpecification": 1, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "fields": { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "properties": { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "id": { "type": "string" } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| }, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "custom": { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "actorSpecification": 1, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "fields": { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "properties": { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "id": { "type": "string" } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines +6 to +20
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| "default": { | |
| "actorSpecification": 1, | |
| "fields": { | |
| "properties": { | |
| "id": { "type": "string" } | |
| } | |
| } | |
| }, | |
| "custom": { | |
| "actorSpecification": 1, | |
| "fields": { | |
| "properties": { | |
| "id": { "type": "string" } | |
| } | |
| } | |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| from apify import Actor | ||
|
|
||
|
|
||
| async def main() -> None: | ||
| async with Actor: | ||
| assert Actor.configuration.actor_storages | ||
| assert (await Actor.open_dataset(alias='custom')).id == Actor.configuration.actor_storages.datasets['custom'] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| from __future__ import annotations | ||
|
|
||
| from pathlib import Path | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| if TYPE_CHECKING: | ||
| from ..conftest import MakeActorFunction, RunActorFunction | ||
|
|
||
| _ACTOR_SOURCE_DIR = Path(__file__).parent / 'actor_source' | ||
|
|
||
|
|
||
| def read_actor_source(filename: str) -> str: | ||
| return (_ACTOR_SOURCE_DIR / filename).read_text() | ||
|
|
||
|
|
||
| async def test_configuration_storages(make_actor: MakeActorFunction, run_actor: RunActorFunction) -> None: | ||
| actor = await make_actor( | ||
| label='schema_storages', | ||
| source_files={ | ||
| 'src/main.py': read_actor_source('main.py'), | ||
| '.actor/actor.json': read_actor_source('actor.json'), | ||
| }, | ||
| ) | ||
| run_result = await run_actor(actor) | ||
|
|
||
| assert run_result.status == 'SUCCEEDED' |
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,14 +1,17 @@ | ||||||||||||||
| from __future__ import annotations | ||||||||||||||
|
|
||||||||||||||
| import asyncio | ||||||||||||||
| from typing import cast | ||||||||||||||
|
|
||||||||||||||
| import pytest | ||||||||||||||
|
|
||||||||||||||
| from crawlee import service_locator | ||||||||||||||
| from crawlee.storages import Dataset, KeyValueStore, RequestQueue | ||||||||||||||
|
|
||||||||||||||
| from apify import Actor, Configuration | ||||||||||||||
| from apify._configuration import ActorStorages | ||||||||||||||
| from apify.storage_clients import ApifyStorageClient, MemoryStorageClient, SmartApifyStorageClient | ||||||||||||||
| from apify.storage_clients._apify._alias_resolving import AliasResolver | ||||||||||||||
|
|
||||||||||||||
|
|
||||||||||||||
| @pytest.mark.parametrize( | ||||||||||||||
|
|
@@ -125,3 +128,53 @@ async def test_actor_implicit_storage_init(apify_token: str) -> None: | |||||||||||||
| assert await Actor.open_dataset() is not await Actor.open_dataset(force_cloud=True) | ||||||||||||||
| assert await Actor.open_key_value_store() is not await Actor.open_key_value_store(force_cloud=True) | ||||||||||||||
| assert await Actor.open_request_queue() is not await Actor.open_request_queue(force_cloud=True) | ||||||||||||||
|
|
||||||||||||||
|
|
||||||||||||||
| async def test_actor_storages_alias_resolving(apify_token: str) -> None: | ||||||||||||||
| """Test that `AliasResolver.register_aliases` correctly updates default KVS with Actor storages.""" | ||||||||||||||
|
|
||||||||||||||
| # Actor storages | ||||||||||||||
| datasets = {'default': 'default_dataset_id', 'custom': 'custom_dataset_id'} | ||||||||||||||
| request_queues = {'default': 'default_dataset_id', 'custom': 'custom_dataset_id'} | ||||||||||||||
| key_value_stores = {'default': 'default_dataset_id', 'custom': 'custom_dataset_id'} | ||||||||||||||
|
Comment on lines +137 to +139
Contributor
Suggested change
|
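The body of this suggestion was not preserved. One plausible reading is that all three mappings reuse the `*_dataset_id` values, so the test could not detect IDs being routed to the wrong storage type. A sketch with distinct IDs per type follows; `make_storage_ids` is a hypothetical helper, not part of the PR.

```python
def make_storage_ids(storage_name: str) -> dict[str, str]:
    """Hypothetical helper producing IDs that name the storage type they belong to."""
    return {alias: f'{alias}_{storage_name}_id' for alias in ('default', 'custom')}


# Each storage type gets its own, non-overlapping set of IDs.
datasets = make_storage_ids('dataset')
request_queues = make_storage_ids('request_queue')
key_value_stores = make_storage_ids('key_value_store')
```

With non-overlapping values, an assertion on the stored mapping fails if, for example, a request queue alias ends up pointing at a dataset ID.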
||||||||||||||
|
|
||||||||||||||
| # Set up the configuration and storage client for the test | ||||||||||||||
| configuration = Configuration( | ||||||||||||||
| default_key_value_store_id='default_kvs_id', | ||||||||||||||
| token=apify_token, | ||||||||||||||
| actor_storages=ActorStorages( | ||||||||||||||
| datasets=datasets, request_queues=request_queues, key_value_stores=key_value_stores | ||||||||||||||
| ), | ||||||||||||||
| ) | ||||||||||||||
| storage_client = ApifyStorageClient() | ||||||||||||||
| service_locator.set_configuration(configuration) | ||||||||||||||
| service_locator.set_storage_client(storage_client) | ||||||||||||||
|
|
||||||||||||||
| client_cache_key = cast('tuple', storage_client.get_storage_client_cache_key(configuration))[-1] | ||||||||||||||
|
||||||||||||||
| client_cache_key = cast('tuple', storage_client.get_storage_client_cache_key(configuration))[-1] | |
| cache_key = storage_client.get_storage_client_cache_key(configuration) | |
| client_cache_key = cache_key[-1] |
Integration test doesn't clean up _alias_map
`tests/integration/test_storages.py:176-180`: The `finally` block only drops the KVS but doesn't reset `AliasResolver._alias_map`. While the test isolation fixture does this between tests, it's good practice to clean up what you dirty, especially since `_alias_map` is a class variable that persists across the process.
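A sketch of the cleanup pattern being asked for, using a hypothetical `ResolverStub` in place of `AliasResolver` so that it runs without the Apify platform. In the real test the reset would sit next to the existing KVS drop in the `finally` block.

```python
import asyncio
from typing import ClassVar


class ResolverStub:
    """Hypothetical stand-in for a resolver that keeps a class-level alias map."""

    _alias_map: ClassVar[dict[str, str]] = {}

    @classmethod
    async def register_aliases(cls, mapping: dict[str, str]) -> None:
        cls._alias_map.update(mapping)


async def run_test_body() -> dict[str, str]:
    """Exercise the resolver and always restore the class-level state afterwards."""
    try:
        await ResolverStub.register_aliases({'Dataset,custom': 'custom_dataset_id'})
        # Copy the map so the caller can inspect what the test observed.
        return dict(ResolverStub._alias_map)
    finally:
        # Class variables outlive the test, so reset them explicitly.
        ResolverStub._alias_map.clear()
```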
This change closes #762, but the issue description specifies the platform-provided env var name as `ACTOR_STORAGE_IDS` (object with `datasets`, `keyValueStores`, `requestQueues`). The new field only declares `alias='actor_storages_json'` (env `ACTOR_STORAGES_JSON`). If the platform actually uses `ACTOR_STORAGE_IDS`, configuration loading will silently miss the mapping. Consider supporting `ACTOR_STORAGE_IDS` via `validation_alias=AliasChoices(...)` (keeping backward compatibility if `ACTOR_STORAGES_JSON` is intentional).
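The fallback idea can be illustrated with the standard library alone. This is not pydantic's `AliasChoices` mechanism that the comment proposes, only a sketch of the same lookup order: try `ACTOR_STORAGE_IDS` first, then `ACTOR_STORAGES_JSON`. `load_actor_storages` and `ENV_VAR_CANDIDATES` are hypothetical names, and which variable the platform really sets is left open, as in the comment.

```python
import json
from typing import Mapping, Optional

# Checked in order; the first variable that is set and non-empty wins.
ENV_VAR_CANDIDATES = ('ACTOR_STORAGE_IDS', 'ACTOR_STORAGES_JSON')


def load_actor_storages(env: Mapping[str, str]) -> Optional[dict]:
    """Return the parsed storages mapping from the first matching variable, if any."""
    for name in ENV_VAR_CANDIDATES:
        raw = env.get(name)
        if raw:
            return json.loads(raw)
    return None
```

Passing `os.environ` as `env` would read the real process environment; taking the mapping as a parameter keeps the function easy to test.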