Commit 701185e

fix: raise NotFoundError on ambiguous 404 responses (#755)
Follow-up to #737 (comment).

`.get()` previously collapsed every 404 into `None` via `catch_not_found_or_throw`. That works for direct, ID-identified fetches (`client.dataset(id).get()`), where a 404 unambiguously means the named resource is missing. It is misleading whenever a 404 is ambiguous: the client can't tell which resource in the path is actually gone, and silently returning `None` hides the cause.

This PR identifies three categories where a 404 is ambiguous and propagates `NotFoundError` instead:

1. **Chained calls without an ID** (`run.dataset()`, `run.key_value_store()`, `run.request_queue()`, `run.log()`): a 404 could mean either the parent run is missing or the default sub-resource. Covered by the base `_get` / `_delete` via a `resource_id is None → raise` guard, so `.delete()` on a chained client also raises now; direct `dataset(id).delete()` keeps its idempotent-DELETE semantics.
2. **Chained `LogClient`**: `run.log().get()` / `.get_as_bytes()` / `.stream()` raise; direct `client.log(id).get()` still returns `None`.
3. **Singleton sub-path endpoints**: `ScheduleClient.get_log`, `TaskClient.get_input`, `DatasetClient.get_statistics`, `UserClient.monthly_usage`, `UserClient.limits`, `WebhookClient.test`. These hit a fixed path (`/.../{id}/log`, `/.../{id}/input`, etc.) where a 404 effectively always means the parent is missing. Return types moved from `T | None` to `T`.

Record-by-key lookups (`KeyValueStoreClient.get_record(key)`, `RequestQueueClient.get_request(request_id)`) keep the existing "None on missing" behavior: the 404 is specifically about the record/request, which is the natural meaning.

A shared helper `catch_not_found_for_resource_or_throw(exc, resource_id)` in `_utils.py` centralizes the `resource_id is None → raise` pattern across all 10 call sites.

The v3 upgrade guide documents the new semantics. Sync/async tests cover every code path, including `client.actor('id').last_run().dataset().get()` (happy path + ambiguous-404 case).
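The body of the shared helper in `_utils.py` is not part of the hunks shown on this page, so the following is only a minimal sketch of the `resource_id is None → raise` guard described above. The two exception classes are simplified stand-ins for the ones in `apify_client.errors` (their constructors and the `status_code` attribute are assumptions), and `get_or_none` is a hypothetical wrapper that mimics how `_get()` uses the helper.

```python
from __future__ import annotations


class ApifyApiError(Exception):
    """Stand-in for `apify_client.errors.ApifyApiError`; the constructor here is simplified."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class NotFoundError(ApifyApiError):
    """Stand-in for `apify_client.errors.NotFoundError` (always a 404)."""

    def __init__(self, message: str = 'Resource not found') -> None:
        super().__init__(message, status_code=404)


def catch_not_found_for_resource_or_throw(exc: ApifyApiError, resource_id: str | None) -> None:
    """Swallow a 404 only when the client targets a resource by ID; re-raise everything else."""
    if resource_id is None:
        # Chained client: the 404 may refer to the parent or to the default sub-resource.
        raise exc
    if exc.status_code != 404:
        # Not a "not found" error at all, so it is never swallowed.
        raise exc
    # ID-identified client and a 404: fall through so the caller can return None.


def get_or_none(resource_id: str | None, error: ApifyApiError) -> None:
    """Hypothetical wrapper: mimic a `_get()` whose HTTP call fails with `error`."""
    try:
        raise error
    except ApifyApiError as exc:
        catch_not_found_for_resource_or_throw(exc, resource_id)
    return None
```

With these stand-ins, `get_or_none('dataset-id', NotFoundError())` returns `None`, while `get_or_none(None, NotFoundError())` raises `NotFoundError`, matching the direct versus chained split in the list above.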
1 parent 0995ca0 commit 701185e

10 files changed

Lines changed: 414 additions & 212 deletions

docs/04_upgrading/upgrading_to_v3.mdx

Lines changed: 28 additions & 3 deletions
@@ -218,11 +218,36 @@ except RateLimitError:
     ...
 ```

-### Behavior change: `.get()` now returns `None` on any 404
+### Behavior change: 404 on ambiguous endpoints now raises `NotFoundError`

-As a consequence of the dispatch above, `.get()`-style convenience methods, which use `catch_not_found_or_throw` internally to swallow 404 responses and return `None`, now swallow **every** 404, regardless of the `error.type` string in the response body. Previously only 404 responses carrying the types `record-not-found` or `record-or-token-not-found` were swallowed; any other 404 was re-raised as `ApifyApiError`.
+Direct, ID-identified fetches like `client.dataset(id).get()` or `client.run(id).get()` continue to swallow 404 into `None`: a 404 there unambiguously means the named resource does not exist. Similarly, `.delete()` on an ID-identified client keeps its idempotent behavior (404 is silently swallowed).

-In practice this matters only if you relied on a `.get()` call raising for a 404 with an unusual error type; such cases now return `None` instead. If your code needs to distinguish between "resource missing" and "404 with an unexpected type", inspect `.type` on the returned response or catch <ApiLink to="class/NotFoundError">`NotFoundError`</ApiLink> from non-`.get()` calls that do not use `catch_not_found_or_throw`.
+For calls where a 404 is *ambiguous*, the client now propagates <ApiLink to="class/NotFoundError">`NotFoundError`</ApiLink> instead of returning `None` / silently succeeding. Three categories of endpoints are affected:
+
+1. **Chained calls that target a default sub-resource without an ID**: `run.dataset()`, `run.key_value_store()`, `run.request_queue()`, `run.log()`. A 404 here could mean the parent run is missing OR the default sub-resource is missing, and the API body does not disambiguate. Applies to both `.get()` and `.delete()`.
+2. **`.get()` / `.get_as_bytes()` / `.stream()` on a chained `LogClient`**, e.g. `run.log().get()`. Direct `client.log(build_or_run_id).get()` still returns `None` on 404.
+3. **Singleton sub-resource endpoints fetched via a fixed path**: <ApiLink to="class/ScheduleClient#get_log">`ScheduleClient.get_log()`</ApiLink>, <ApiLink to="class/TaskClient#get_input">`TaskClient.get_input()`</ApiLink>, <ApiLink to="class/DatasetClient#get_statistics">`DatasetClient.get_statistics()`</ApiLink>, <ApiLink to="class/UserClient#monthly_usage">`UserClient.monthly_usage()`</ApiLink>, <ApiLink to="class/UserClient#limits">`UserClient.limits()`</ApiLink>, <ApiLink to="class/WebhookClient#test">`WebhookClient.test()`</ApiLink>. These hit paths like `/schedules/{id}/log` or `/actor-tasks/{id}/input`, so a 404 effectively means the parent is missing. Return types moved from `T | None` to `T`.
+
+```python
+from apify_client import ApifyClient
+from apify_client.errors import NotFoundError
+
+client = ApifyClient(token='MY-APIFY-TOKEN')
+
+try:
+    dataset = client.run('some-run-id').dataset().get()
+except NotFoundError:
+    # Previously this returned `None`; now you must handle it explicitly.
+    dataset = None
+
+try:
+    schedule_log = client.schedule('some-schedule-id').get_log()
+except NotFoundError:
+    # `get_log()` previously returned `None` when the schedule was missing; now it raises.
+    schedule_log = None
+```
+
+Direct `.get()` also now swallows every 404 regardless of the `error.type` string in the response body (previously only `record-not-found` and `record-or-token-not-found` types were swallowed). If your code needs to distinguish between "resource missing" and "404 with an unexpected type", inspect `.type` on a caught <ApiLink to="class/NotFoundError">`NotFoundError`</ApiLink> from a non-`.get()` call path.

 ## Snake_case `sort_by` values on `actors().list()`
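For call sites that genuinely want the old "`None` on 404" result from a chained or singleton call, one option is a small wrapper around the call. The sketch below is an illustration only: `none_if_not_found` is a hypothetical helper that is not part of `apify_client`, and the local `NotFoundError` class is a stand-in so the snippet runs without the library installed.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar('T')


class NotFoundError(Exception):
    """Stand-in for `apify_client.errors.NotFoundError` so the sketch is self-contained."""


def none_if_not_found(call: Callable[[], T]) -> Optional[T]:
    """Run `call` and translate a NotFoundError into `None` (hypothetical helper)."""
    try:
        return call()
    except NotFoundError:
        # The caller explicitly opted in to treating an ambiguous 404 as "missing".
        return None


def _existing() -> dict:
    # Simulates a successful fetch.
    return {'id': 'some-dataset-id'}


def _missing() -> dict:
    # Simulates a chained call that hits an ambiguous 404.
    raise NotFoundError('run or default dataset not found')


found = none_if_not_found(_existing)
absent = none_if_not_found(_missing)
```

With the real client, the same wrapper could be applied as `none_if_not_found(lambda: client.run('some-run-id').dataset().get())`. Keeping the opt-in at the call site keeps the ambiguity visible to the reader of the calling code.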

src/apify_client/_resource_clients/_resource_client.py

Lines changed: 31 additions & 9 deletions
@@ -10,7 +10,13 @@
 from apify_client._docs import docs_group
 from apify_client._logging import WithLogDetailsClient
 from apify_client._types import ActorJobResponse
-from apify_client._utils import catch_not_found_or_throw, response_to_dict, to_safe_id, to_seconds
+from apify_client._utils import (
+    catch_not_found_for_resource_or_throw,
+    catch_not_found_or_throw,
+    response_to_dict,
+    to_safe_id,
+    to_seconds,
+)
 from apify_client.errors import ApifyApiError

 if TYPE_CHECKING:
@@ -194,7 +200,11 @@ def __init__(
         )

     def _get(self, *, timeout: Timeout) -> dict | None:
-        """Perform a GET request for this resource, returning the parsed response or None if not found."""
+        """Perform a GET request for this resource, returning the parsed response or None if not found.
+
+        404s collapse to `None` only for ID-identified clients. Chained clients without a `resource_id`
+        (e.g. `run.dataset()`) propagate `NotFoundError`; see `catch_not_found_for_resource_or_throw`.
+        """
         try:
             response = self._http_client.call(
                 url=self._build_url(),
@@ -204,7 +214,7 @@ def _get(self, *, timeout: Timeout) -> dict | None:
             )
             return response_to_dict(response)
         except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            catch_not_found_for_resource_or_throw(exc, self._resource_id)
         return None

     def _update(self, *, timeout: Timeout, **kwargs: Any) -> dict:
@@ -219,7 +229,11 @@ def _update(self, *, timeout: Timeout, **kwargs: Any) -> dict:
         return response_to_dict(response)

     def _delete(self, *, timeout: Timeout) -> None:
-        """Perform a DELETE request to delete this resource, ignoring 404 errors."""
+        """Perform a DELETE request to delete this resource.
+
+        404s are swallowed (idempotent DELETE) only for ID-identified clients. Chained clients without a
+        `resource_id` propagate `NotFoundError`; see `catch_not_found_for_resource_or_throw`.
+        """
         try:
             self._http_client.call(
                 url=self._build_url(),
@@ -228,7 +242,7 @@ def _delete(self, *, timeout: Timeout) -> None:
                 timeout=timeout,
             )
         except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            catch_not_found_for_resource_or_throw(exc, self._resource_id)

     def _list(self, *, timeout: Timeout, **kwargs: Any) -> dict:
         """Perform a GET request to list resources."""
@@ -374,7 +388,11 @@ def __init__(
         )

     async def _get(self, *, timeout: Timeout) -> dict | None:
-        """Perform a GET request for this resource, returning the parsed response or None if not found."""
+        """Perform a GET request for this resource, returning the parsed response or None if not found.
+
+        404s collapse to `None` only for ID-identified clients. Chained clients without a `resource_id`
+        (e.g. `run.dataset()`) propagate `NotFoundError`; see `catch_not_found_for_resource_or_throw`.
+        """
         try:
             response = await self._http_client.call(
                 url=self._build_url(),
@@ -384,7 +402,7 @@ async def _get(self, *, timeout: Timeout) -> dict | None:
             )
             return response_to_dict(response)
         except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            catch_not_found_for_resource_or_throw(exc, self._resource_id)
         return None

     async def _update(self, *, timeout: Timeout, **kwargs: Any) -> dict:
@@ -399,7 +417,11 @@ async def _update(self, *, timeout: Timeout, **kwargs: Any) -> dict:
         return response_to_dict(response)

     async def _delete(self, *, timeout: Timeout) -> None:
-        """Perform a DELETE request to delete this resource, ignoring 404 errors."""
+        """Perform a DELETE request to delete this resource.
+
+        404s are swallowed (idempotent DELETE) only for ID-identified clients. Chained clients without a
+        `resource_id` propagate `NotFoundError`; see `catch_not_found_for_resource_or_throw`.
+        """
         try:
             await self._http_client.call(
                 url=self._build_url(),
@@ -408,7 +430,7 @@ async def _delete(self, *, timeout: Timeout) -> None:
                 timeout=timeout,
             )
         except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            catch_not_found_for_resource_or_throw(exc, self._resource_id)

     async def _list(self, *, timeout: Timeout, **kwargs: Any) -> dict:
         """Perform a GET request to list resources."""

src/apify_client/_resource_clients/dataset.py

Lines changed: 26 additions & 32 deletions
@@ -10,12 +10,10 @@
 from apify_client._models import Dataset, DatasetResponse, DatasetStatistics, DatasetStatisticsResponse
 from apify_client._resource_clients._resource_client import ResourceClient, ResourceClientAsync
 from apify_client._utils import (
-    catch_not_found_or_throw,
     create_storage_content_signature,
     response_to_dict,
     response_to_list,
 )
-from apify_client.errors import ApifyApiError

 if TYPE_CHECKING:
     from collections.abc import AsyncIterator, Iterator
@@ -628,7 +626,7 @@ def push_items(self, items: JsonSerializable, *, timeout: Timeout = 'medium') ->
             timeout=timeout,
         )

-    def get_statistics(self, *, timeout: Timeout = 'short') -> DatasetStatistics | None:
+    def get_statistics(self, *, timeout: Timeout = 'short') -> DatasetStatistics:
         """Get the dataset statistics.

         https://docs.apify.com/api/v2#tag/DatasetsStatistics/operation/dataset_statistics_get
@@ -637,21 +635,19 @@ def get_statistics(self, *, timeout: Timeout = 'short') -> DatasetStatistics | N
             timeout: Timeout for the API HTTP request.

         Returns:
-            The dataset statistics or None if the dataset does not exist.
-        """
-        try:
-            response = self._http_client.call(
-                url=self._build_url('statistics'),
-                method='GET',
-                params=self._build_params(),
-                timeout=timeout,
-            )
-            result = response_to_dict(response)
-            return DatasetStatisticsResponse.model_validate(result).data
-        except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            The dataset statistics.

-        return None
+        Raises:
+            NotFoundError: If the dataset does not exist.
+        """
+        response = self._http_client.call(
+            url=self._build_url('statistics'),
+            method='GET',
+            params=self._build_params(),
+            timeout=timeout,
+        )
+        result = response_to_dict(response)
+        return DatasetStatisticsResponse.model_validate(result).data

     def create_items_public_url(
         self,
@@ -1208,7 +1204,7 @@ async def push_items(self, items: JsonSerializable, *, timeout: Timeout = 'mediu
             timeout=timeout,
         )

-    async def get_statistics(self, *, timeout: Timeout = 'short') -> DatasetStatistics | None:
+    async def get_statistics(self, *, timeout: Timeout = 'short') -> DatasetStatistics:
         """Get the dataset statistics.

         https://docs.apify.com/api/v2#tag/DatasetsStatistics/operation/dataset_statistics_get
@@ -1217,21 +1213,19 @@ async def get_statistics(self, *, timeout: Timeout = 'short') -> DatasetStatisti
             timeout: Timeout for the API HTTP request.

         Returns:
-            The dataset statistics or None if the dataset does not exist.
-        """
-        try:
-            response = await self._http_client.call(
-                url=self._build_url('statistics'),
-                method='GET',
-                params=self._build_params(),
-                timeout=timeout,
-            )
-            result = response_to_dict(response)
-            return DatasetStatisticsResponse.model_validate(result).data
-        except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            The dataset statistics.

-        return None
+        Raises:
+            NotFoundError: If the dataset does not exist.
+        """
+        response = await self._http_client.call(
+            url=self._build_url('statistics'),
+            method='GET',
+            params=self._build_params(),
+            timeout=timeout,
+        )
+        result = response_to_dict(response)
+        return DatasetStatisticsResponse.model_validate(result).data

     async def create_items_public_url(
         self,
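Since `get_statistics()` now returns `DatasetStatistics` rather than `DatasetStatistics | None`, calling code can drop its `None` check and handle the missing-dataset case with `except NotFoundError`. The sketch below shows that caller-side shape under stated assumptions: `FakeDatasetClient`, the trimmed `DatasetStatistics` dataclass, and its `item_count` field are all invented for illustration and do not mirror the real models.

```python
from dataclasses import dataclass
from typing import Optional


class NotFoundError(Exception):
    """Stand-in for `apify_client.errors.NotFoundError`."""


@dataclass
class DatasetStatistics:
    """Hypothetical, trimmed-down stand-in; the `item_count` field is invented."""

    item_count: int


class FakeDatasetClient:
    """Hypothetical client that mimics the new `get_statistics()` contract."""

    def __init__(self, stats: Optional[DatasetStatistics] = None) -> None:
        self._stats = stats

    def get_statistics(self) -> DatasetStatistics:
        # New contract: never returns None; a missing dataset raises instead.
        if self._stats is None:
            raise NotFoundError('dataset not found')
        return self._stats


def describe(client: FakeDatasetClient) -> str:
    """Caller-side handling: no `is None` branch, only an explicit except."""
    try:
        stats = client.get_statistics()
    except NotFoundError:
        return 'dataset is missing'
    return f'{stats.item_count} items'
```

A type checker no longer needs narrowing on `stats` here, which is the practical effect of moving the return type from `T | None` to `T`.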

src/apify_client/_resource_clients/log.py

Lines changed: 13 additions & 7 deletions
@@ -5,7 +5,7 @@

 from apify_client._docs import docs_group
 from apify_client._resource_clients._resource_client import ResourceClient, ResourceClientAsync
-from apify_client._utils import catch_not_found_or_throw
+from apify_client._utils import catch_not_found_for_resource_or_throw
 from apify_client.errors import ApifyApiError

 if TYPE_CHECKING:
@@ -39,6 +39,9 @@ def get(self, *, raw: bool = False, timeout: Timeout = 'long') -> str | None:

         https://docs.apify.com/api/v2#/reference/logs/log/get-log

+        404s collapse to `None` only when this client targets a specific log by ID (e.g. `client.log(run_id).get()`).
+        For chained clients without a `resource_id` (e.g. `run.log().get()`), a 404 is ambiguous and propagates.
+
         Args:
             raw: If true, the log will include formatting. For example, coloring character sequences.
             timeout: Timeout for the API HTTP request.
@@ -57,7 +60,7 @@ def get(self, *, raw: bool = False, timeout: Timeout = 'long') -> str | None:
             return response.text  # noqa: TRY300

         except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            catch_not_found_for_resource_or_throw(exc, self._resource_id)

         return None

@@ -84,7 +87,7 @@ def get_as_bytes(self, *, raw: bool = False, timeout: Timeout = 'long') -> bytes
             return response.content  # noqa: TRY300

         except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            catch_not_found_for_resource_or_throw(exc, self._resource_id)

         return None

@@ -113,7 +116,7 @@ def stream(self, *, raw: bool = False, timeout: Timeout = 'long') -> Iterator[Ht

             yield response
         except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            catch_not_found_for_resource_or_throw(exc, self._resource_id)
             yield None
         finally:
             if response:
@@ -144,6 +147,9 @@ async def get(self, *, raw: bool = False, timeout: Timeout = 'long') -> str | No

         https://docs.apify.com/api/v2#/reference/logs/log/get-log

+        404s collapse to `None` only when this client targets a specific log by ID (e.g. `client.log(run_id).get()`).
+        For chained clients without a `resource_id` (e.g. `run.log().get()`), a 404 is ambiguous and propagates.
+
         Args:
             raw: If true, the log will include formatting. For example, coloring character sequences.
             timeout: Timeout for the API HTTP request.
@@ -162,7 +168,7 @@ async def get(self, *, raw: bool = False, timeout: Timeout = 'long') -> str | No
             return response.text  # noqa: TRY300

         except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            catch_not_found_for_resource_or_throw(exc, self._resource_id)

         return None

@@ -189,7 +195,7 @@ async def get_as_bytes(self, *, raw: bool = False, timeout: Timeout = 'long') ->
             return response.content  # noqa: TRY300

         except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            catch_not_found_for_resource_or_throw(exc, self._resource_id)

         return None

@@ -218,7 +224,7 @@ async def stream(self, *, raw: bool = False, timeout: Timeout = 'long') -> Async

             yield response
         except ApifyApiError as exc:
-            catch_not_found_or_throw(exc)
+            catch_not_found_for_resource_or_throw(exc, self._resource_id)
             yield None
         finally:
             if response:
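The `stream()` hunks above only show the tail of a generator-based context manager, so the following is a self-contained sketch of the same idea rather than the library's code: yield the open response, yield `None` on a 404 for an ID-identified client, propagate the 404 for a chained client, and always close the response. `stream_log`, `FakeResponse`, and the `open_stream` callable are hypothetical names, and `NotFoundError` is a local stand-in. The sketch deliberately keeps the `yield` outside the `except` block so that an exception raised inside the caller's `with` body is not mistaken for a 404 from the request.

```python
from __future__ import annotations

from contextlib import contextmanager
from typing import Callable, Iterator


class NotFoundError(Exception):
    """Stand-in for `apify_client.errors.NotFoundError`."""


class FakeResponse:
    """Hypothetical streamed response that records whether it was closed."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.closed = False

    def close(self) -> None:
        self.closed = True


@contextmanager
def stream_log(
    open_stream: Callable[[], FakeResponse],
    resource_id: str | None,
) -> Iterator[FakeResponse | None]:
    """Yield the open response, or None on a 404 for an ID-identified client."""
    response = None
    try:
        try:
            response = open_stream()
        except NotFoundError:
            if resource_id is None:
                # Chained client: the 404 is ambiguous, so let it propagate.
                raise
            # ID-identified client: fall through and yield None below.
        yield response
    finally:
        # Close the stream whether the caller's block succeeded or raised.
        if response is not None:
            response.close()
```

A caller would then write `with stream_log(opener, 'run-id') as response:` and check `response is None` only on the direct, ID-identified path.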
