Skip to content

feat: Make list methods of clients iterable#769

Open
Pijukatel wants to merge 1 commit intomasterfrom
make-list-methods-iterable
Open

feat: Make list methods of clients iterable#769
Pijukatel wants to merge 1 commit intomasterfrom
make-list-methods-iterable

Conversation

@Pijukatel
Copy link
Copy Markdown
Contributor

Description

  • All collection clients list method returns an iterator as well.
  • All async collection clients list method returns an async iterator as well.
  • List of modified clients (same for async clients):
    • ActorCollectionClient
    • BuildCollectionClient
    • RunCollectionClient
    • ScheduleCollectionClient
    • TaskCollectionClient
    • WebhookCollectionClient
    • WebhookDispatchCollectionClient
    • DatasetCollectionClient
    • KeyValueStoreCollectionClient
    • RequestQueueCollectionClient
    • StoreCollectionClient
    • ActorEnvVarCollectionClient
    • ActorVersionCollectionClient
  • Additionally, the following storage-related list methods were modified to support iteration as well:
    • DatasetClient.list_items (and marking Dataset.iterate_items as deprecated)
    • KeyValueStoreClient.list_keys (and marking Dataset.iterate_items as deprecated)
    • RequestQueueClient.list_requests

Example usage

...
# Sync
datasets_client = ApifyClient(token='...').datasets()

# Same as before
list_page = datasets_client.list(...)

# New functionality
individual_items = [item for item in datasets_client.list(...)]

...
# Async
datasets_client = ApifyClientAsync(token='...').datasets()

# Same as before
list_page = await datasets_client.list(...)

# New functionality
individual_items = [item async for item in datasets_client.list(...)]

Issues

Testing

  • Unit tests
  • Manual API tests

Checklist

  • CI passed

@github-actions github-actions Bot added this to the 139th sprint - Tooling team milestone Apr 29, 2026
@github-actions github-actions Bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Apr 29, 2026
sort_by: Literal['created_at', 'last_run_started_at'] | None = 'created_at',
timeout: Timeout = 'medium',
) -> ListOfActors:
) -> IterablePageOfActors:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming parity with JS is not 100% as there it was possible to use generics, but the typing there allows more flexibility.

JS name:
PaginatedIterator<ActorCollectionListItem>

@Pijukatel Pijukatel force-pushed the make-list-methods-iterable branch from 775d3ca to fec16e9 Compare April 29, 2026 08:19
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 97.02602% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.92%. Comparing base (ff9817c) to head (fec16e9).

Files with missing lines Patch % Lines
src/apify_client/_pagination.py 81.81% 16 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master     #769    +/-   ##
========================================
  Coverage   95.92%   95.92%            
========================================
  Files          48       50     +2     
  Lines        5226     5600   +374     
========================================
+ Hits         5013     5372   +359     
- Misses        213      228    +15     
Flag Coverage Δ
integration 95.92% <97.02%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.



@dataclass
class IterablePageOfActorsAsync(AwaitablePage[ActorShort], AsyncIterableOf[ActorShort]): ...
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried different approaches with generics, but this turned out to be the best approach:

  • It truly describes all the attributes
  • It has both static safety, and it benefits from run-time validation when these dataclasses are created from Pydantic models
  • It has as little code duplication as possible due to the autogenerated composable init of dataclass
  • It gives a clear overview of which endpoints are "weird" - not having the normal pagination metadata

STORAGE_CLIENTS = DATASET_CLIENTS | RQ_CLIENTS | KVS_CLIENTS
ALL_CLIENTS = COLLECTION_CLIENTS | NO_OPTIONS_CLIENTS | STORAGE_CLIENTS

TEST_CASES = (
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test setup is somewhat complex and this is due to test code being nearly identical for each endpoint, but different endpoints not being consistent in which pagination it supports (different params and different pagination metadata).

Through this parameter definition, over 200 test cases is generated based on just a few actual tests


@pytest.fixture(autouse=True)
def _relax_item_validation() -> Any:
"""Relax only the element type of `items` on paginated list models for the test run.
Copy link
Copy Markdown
Contributor Author

@Pijukatel Pijukatel Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are only about pagination logic and constructing each item correctly is totally out of scope, so in all Pydantic models the items: list[whatever] validation is replaced by items: list[dict]

@Pijukatel Pijukatel requested review from janbuchar and vdusek April 29, 2026 08:37
@Pijukatel Pijukatel marked this pull request as ready for review April 29, 2026 08:38
Copy link
Copy Markdown
Contributor

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I'm really not sure about merging pagination and iteration into a single list method. These are two distinct use cases - you either want a page (with metadata), or you want to stream items - and choosing the right method is a way to express that intent.

I would say this also goes against the single responsibility principle - the method now serves two different concerns, which makes the API worse in terms of DX.

If we look beyond the interface and go to the implementation, it becomes even worse. The current approach introduces a 14 IterablePageOf... classes plus mixins just to make the typing work. The return type carries both responsibilities; the async variant is no longer a coroutine but a lazy awaitable-iterable wrapper, and each resource client ends up using a closure-over-callback pattern (where there used to be two simple lines before).

Could we keep list() for pagination and iterate() for iteration? It's a more intuitive interface; callers don't have to learn the quirks of a hybrid object, it will be more understandable for type checkers, and the implementation would be significantly simpler without this mess.

import IterateItemsSyncExample from '!!raw-loader!./code/08_iterate_items_sync.py';


Most methods named `list` or `list_something` in the Apify client return a <ApiLink to="class/ListPage">`ListPage`</ApiLink> object. This object provides a consistent interface for working with paginated data and includes the following properties:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still have ListPage? Could you ask Claude to also review and update the docs?

@Pijukatel
Copy link
Copy Markdown
Contributor Author

Could we keep list() for pagination and iterate() for iteration? It's a more intuitive interface; callers don't have to learn the quirks of a hybrid object, it will be more understandable for type checkers, and the implementation would be significantly simpler without this mess.

If we agree on having it split into two methods in Python client, then it will be way simpler and cleaner internally.
@janbuchar , what do you think?

@janbuchar
Copy link
Copy Markdown
Contributor

janbuchar commented Apr 29, 2026

Could we keep list() for pagination and iterate() for iteration? It's a more intuitive interface; callers don't have to learn the quirks of a hybrid object, it will be more understandable for type checkers, and the implementation would be significantly simpler without this mess.

If we agree on having it split into two methods in Python client, then it will be way simpler and cleaner internally. @janbuchar , what do you think?

Well, in https://github.com/apify/apify-client-js, we have a list (or listItems for dataset items) method that returns a PaginatedIterator. This object can either be await-ed to receive a single page of the results or async-iterated to process all items in all pages.

Our current doctrine is pretty much "1:1 parity unless we have a clearly superior approach that we want to eventually use in the other flavor as well". Also, we mostly tend to sacrifice maintainability if it means that the API will become easier to use.

Thus the one thing to consider from @vdusek response is if returning a "polymorphic" object that is both an async iterable and a simple promise is strictly worse DX than having two distinct methods for each list-shaped endpoint. If it is, then we should also decide if this is something that we'd like to adjust in the JS version or if there is a compelling argument for letting the JS version and the Python version diverge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants