Summary
The SPARQL construct executor retries transient errors via pRetry, but isTransientError (packages/pipeline/src/sparql/executor.ts) only classifies network failures and HTTP 502/503/504 as transient. HTTP 429 (Too Many Requests) is not retried, so a rate-limited endpoint drops the stage on the first 429 instead of backing off and trying again.
Why
429 is the standard, unambiguous signal that the client should slow down and retry — it is transient by definition. Unlike a 403 (which usually means a permanent “forbidden”), a 429 carries no risk of retrying something that can never succeed. Adding it to the transient set is low-risk and clearly correct.
This came up while harvesting a dataset whose endpoint rate-limits a burst of heavy analysis queries: the first one or two stage queries succeed, then the endpoint blocks the rest. In that specific case the endpoint returned a generic 403 rather than a 429, and that 403 case is handled separately. Any endpoint that signals rate-limiting the standard way (429) should still be retried rather than dropped.
Proposed change
In isTransientError, treat HTTP 429 as transient alongside 502/503/504:
return status === 429 || status === 502 || status === 503 || status === 504;
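For illustration, the status check could also be written as a set lookup. This is only a sketch: isTransientStatus and TRANSIENT_STATUSES are hypothetical names, and the existing network-failure branch of isTransientError is left out because its implementation is not shown here.

```typescript
// Hypothetical names; not taken from executor.ts.
// Status codes treated as transient: 429 joins the existing 502/503/504.
const TRANSIENT_STATUSES: ReadonlySet<number> = new Set([429, 502, 503, 504]);

// Returns true when the HTTP status alone marks the failure as retryable.
// An undefined status (no HTTP response) is not classified here.
function isTransientStatus(status: number | undefined): boolean {
  return status !== undefined && TRANSIENT_STATUSES.has(status);
}
```

With this shape, 403 and 500 stay non-transient, so the change only widens the retry set by one code.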
Consider giving 429 a longer/jittered backoff than the 5xx path, since the server is explicitly asking us to slow down rather than reporting a momentary glitch. Respecting the Retry-After header is tracked separately.
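One possible shape for the longer, jittered 429 backoff is sketched below. The helper name retryDelayMs, the base delays, and the caps are all assumptions chosen for illustration, not values from the executor, and wiring the result into pRetry is omitted because the executor's retry options are not shown here.

```typescript
// Hypothetical helper: pick a retry delay in milliseconds for a failed attempt.
// `attempt` is 1 for the first retry. `random` is injectable so the jitter
// can be made deterministic in tests.
function retryDelayMs(
  status: number | undefined,
  attempt: number,
  random: () => number = Math.random,
): number {
  // Assumed values: 429 starts slower and is allowed to wait longer than 5xx.
  const baseMs = status === 429 ? 5000 : 1000;
  const capMs = status === 429 ? 120000 : 30000;

  // Exponential growth, doubling per attempt, bounded by the cap.
  const ceilingMs = Math.min(capMs, baseMs * Math.pow(2, attempt - 1));

  // Jitter: spread the delay uniformly over the upper half of the window,
  // so concurrent stages do not all retry at the same instant.
  return Math.round(ceilingMs / 2 + random() * (ceilingMs / 2));
}
```

Keeping the delay calculation separate from the transient check means the 5xx path can stay unchanged while the 429 parameters are tuned independently.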
Scope / non-goals
- This does not address 403 responses that mask rate-limiting — those are ambiguous (usually a genuine permanent deny) and need a more conservative approach.
- It also does not add request throttling to avoid tripping limiters in the first place; that is a separate, complementary improvement.