feat: implement async scheduling admission control#661
Conversation
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Resolve the second review pass over plans/645 by making the Markdown spec and UML source agree on task admission, request admission, capacity, telemetry, benchmark, migration, and issue-map contracts. Key updates include canonical event names, richer AsyncCapacityPlan fields, request waiter and cancellation semantics, timed wakeups, retry/salvage lease ordering, and clearer public/internal documentation boundaries. Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
|
MkDocs preview: https://81ff1f64.dd-docs-preview.pages.dev Fern preview: https://nvidia-preview-pr-661.docs.buildwithfern.com/nemo/datadesigner
|
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
|
Thanks for putting this together, @eric-tramel — this is a substantial reshaping of the runtime control surfaces and the new module ownership reads cleanly. Here are my thoughts. SummaryThis PR splits runtime control into explicit scheduler task admission ( FindingsWarnings — Worth addressing
from data_designer.engine.dataset_builders.scheduling.resources import stable_task_id
# ...
self._frontier = {task for task in self._frontier if stable_task_id(task) not in wanted}
except Exception:
logger.warning("Admission event sink raised; dropping event", exc_info=True)
return
def acquire_sync(self, item: RequestAdmissionItem) -> RequestAdmissionLease:
try:
asyncio.get_running_loop()
except RuntimeError:
pass
else:
raise RuntimeError(
"acquire_sync would block the running event loop; use acquire_async instead."
)
...
@dataclass
class _EndpointBucket:
aliases: list[str] = field(default_factory=list)
caps: list[int] = field(default_factory=list)
endpoints: dict[tuple[str, str, str], _EndpointBucket] = {}
...
bucket = endpoints.setdefault(endpoint, _EndpointBucket())
bucket.aliases.append(alias)
bucket.caps.append(cap)This drops the
Suggestions — Take it or leave it
Duplication between
What Looks Good
VerdictNeeds changes — the duplicated This review was generated by an AI assistant. |
- add adaptive row-group admission and request-pressure advisory telemetry - add idle regression suite, HTML reports, and Perfetto export tooling - add combined adaptive/request-pressure benchmark guardrails - expand scheduler, request admission, and benchmark tests
Idle-time optimization and observability passThis pass adds a layer on top of the original async scheduling plan PR. The original PR established task admission, request admission, scheduling metadata, correlated observability, and capacity snapshots. This follow-up focused on making scheduler idle time visible, measurable, and tunable. What changed
What improvedLatest quick idle regression result: PASS,
One important caveat: the combined case did not improve wall time in the latest quick run ( Verification run in this pass
Raw timeline/perfetto artifacts remain local; the PR includes the reusable tooling and HTML reports rather than committing the large artifact trees. |
|
Did one more shallow final pass on this, plus an independent Claude cross-review at head A few non-blocking nits / follow-ups worth considering:
Overall I'm comfortable with these as non-blocking, assuming #682 lands before we approve/merge this epic PR. The remaining items look like doc hardening, sink-contract polish, or targeted follow-up tests rather than reasons to hold the whole async scheduling PR. |
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
…scheduling-yolo # Conflicts: # plans/645/module-ownership.md
Greptile SummaryThis PR implements async scheduling admission control (issue #645), splitting runtime control into explicit task-admission and request-admission layers. It introduces AIMD-backed request leases, a fair task queue with virtual-time scheduling, adaptive row-group admission, and request-pressure advisory selection.
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py | Central scheduler rewired to use FairTaskQueue + TaskAdmissionController for all work; adds adaptive row-group admission, request-pressure advisory skip, generator call wrapper (_run_generator_call), and dataframe result validation (_require_dataframe_result). Logic for dropped-row handling, AIMD feedback, and advisory skip all verified correct. |
| packages/data-designer-engine/src/data_designer/engine/models/request_admission/controller.py | New AIMD request-admission controller with threading.Condition-based sync path, asyncio.Event-based async path, CancelledError cleanup, and fair waiter queue. Lease lifecycle (acquire/release), startup ramp, and AIMD increase/decrease logic are correct; the commit-returns-None branch in _admit_waiters_locked is unreachable dead code under the lock. |
| packages/data-designer-engine/src/data_designer/engine/models/clients/model_request_executor.py | New model-call boundary wrapping the adapter with request-admission acquire/release; correctly classifies CancelledError, ProviderError, TimeoutError, and unexpected exceptions in both sync and async paths; rate-limit errors propagate to ModelFacade which converts them to ModelRateLimitError for scheduler retry. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/scheduling/completion.py | Completion tracker updated with FrontierDelta and graph-backed frontier; _enqueue_downstream correctly guards against adding downstream tasks for dropped rows (row_index not in rg_dropped check), and _reevaluate_batch_tasks handles batch unblocking on row drop. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/scheduling/queue.py | New FairTaskQueue with virtual-time fair scheduling, lazy discard via _purge_queue_head, and sequence-version-based stale-commit detection. select_next correctly avoids mutating queue state; commit's sequence_version check is sound. |
| packages/data-designer-engine/src/data_designer/engine/models/request_admission/queue.py | New RequestFairQueue with virtual-time fair ordering for request waiters; remove() increments _sequence_version ensuring commit() consistently rejects stale selections; _first_valid_waiter handles lazy cleanup of removed waiters. |
| packages/data-designer-engine/src/data_designer/engine/models/clients/factory.py | Updated to wrap the adapter with ModelRequestExecutor when request_admission is provided; transport-level retries are disabled (_NO_TRANSPORT_RETRY_CONFIG) when ModelRequestExecutor handles application-level retries, correctly preventing double-retry on transient errors. |
Sequence Diagram
sequenceDiagram
participant S as AsyncScheduler
participant FAQ as FairTaskQueue
participant TAC as TaskAdmissionController
participant MF as ModelFacade
participant MRE as ModelRequestExecutor
participant RAC as AdaptiveRequestAdmissionController
participant API as Provider API
S->>FAQ: enqueue(schedulable_tasks)
loop dispatch loop
S->>FAQ: select_next(is_dispatch_eligible)
FAQ-->>S: QueueSelection
S->>TAC: try_acquire(item, view)
TAC-->>S: TaskAdmissionLease
S->>FAQ: commit(selection)
S->>S: spawn_worker(_execute_task)
end
Note over S,API: Worker execution
S->>MF: acompletion(request)
MF->>MRE: acompletion(request)
MRE->>RAC: acquire_async(item)
RAC-->>MRE: RequestAdmissionLease
MRE->>API: HTTP POST /completions
alt success
API-->>MRE: 200 OK
MRE->>RAC: release(lease, success)
RAC->>RAC: AIMD increase (streak++)
else rate limited
API-->>MRE: 429 Too Many Requests
MRE->>RAC: release(lease, rate_limited)
RAC->>RAC: AIMD decrease + cooldown
MRE-->>MF: ProviderError(RATE_LIMIT)
MF-->>S: ModelRateLimitError (retryable)
end
S->>TAC: release(task_lease)
S->>S: wake_event.set()
Reviews (8): Last reviewed commit: "Merge origin/main into scheduling-yolo" | Re-trigger Greptile
|
Final pass after #682 merged: no runtime blockers from my local review or Claude cross-review. The request-controller release semantics now look good, and focused checks passed locally. Two non-blocking cleanup/docs items worth addressing before final merge or explicitly tracking as follow-ups:
Validation on current head |
|
One UX concern: this PR removes the public I agree with moving runtime request control behind Could we either:
Right now the behavior change is easy to miss because the adaptive logic still has tunables internally, but users no longer have a way to adjust them. |
|
I think two compatibility points are important enough to handle in this PR, especially because we just made plugin APIs stable. More broadly, I’d prefer that this backend refactor not introduce public API/UX breakage unless we explicitly decide that in the release plan.
Both seem compatible with the new backend architecture: keep request admission and scheduling metadata internally, but preserve the public config/plugin UX with deprecation/migration guidance rather than hard-breaking it here. |
|
Responding to @andreatgretel's final-pass notes from May 19. I pushed follow-up commits
Addressed before this pass: #682 is already folded into the branch. I also rechecked that this follow-up did not touch or regress the request-controller release / ceiling behavior.
Addressed in
Addressed in
Addressed in
Addressed across
Validation for the latest follow-up:
|
|
Follow-up for @nabinchha's compatibility concern:
Added in The shim maps the old throttle fields into the new request-admission DTO:
It emits a Also added regression coverage for:
Validation for the shim commit:
|
|
Follow-up for Nabin's
Updated in
I also updated |
📋 Summary
Implements the issue 645 async scheduling epic by splitting runtime control into explicit scheduler task admission and concrete model-request admission, with typed scheduling metadata, AIMD-backed request leases, capacity snapshots, and correlated observability.
This PR now also includes the idle-time optimization pass on top of the original plan PR: adaptive row-group admission, request-pressure-aware task selection, richer scheduler telemetry, Perfetto trace export, and a canonical idle regression/report suite for future scheduler tuning.
🔗 Related Issue
Refs #645
🔄 Changes
✨ Added
SchedulingMetadataand validation in the config package.ModelRequestExecutorandAdaptiveRequestAdmissionControllerfor per-attempt provider/model/domain admission.scripts/benchmarks/run_async_scheduling_idle_regression.pyscripts/benchmarks/generate_async_scheduling_idle_report.pyscripts/benchmarks/export_async_scheduling_perfetto.pyreports/async-scheduling-idle-regression.htmlreports/async-scheduling-idle-analysis.html🔧 Changed
🗑️ Removed
📈 Idle Optimization Evidence
Latest quick idle regression:
PASS (0 errors, 0 warnings),22cases,367checks.10.5%to67.7%llm utilization and reduced frontier/dependency idle from81.3%to14.7%.a_pressuredtoz_openand reduced leased request wait from49.8msto3.2msin the dedicated pressure case.82.4%to84.4%, request utilization from85.6%to89.2%, and request idle by3.7 pp, with193advisory skip events. Wall time did not improve in that combined case, so this is currently a flow/utilization improvement rather than a proven throughput improvement.🔍 Attention Areas
async_scheduler.py— central runtime control flow, lease lifecycle, adaptive row-group admission, request-pressure advisory, and scheduler telemetry.model_request_executor.py— concrete model-call attempt boundary and release outcome classification.controller.py— AIMD request admission state machine and exact request lease accounting.observability.py— scheduler/request event contracts used by benchmark evidence and Perfetto export.benchmark_async_scheduling.py— deterministic benchmark scenarios and derived idle/request metrics.async-scheduling-idle-regression.html— current regression report with adaptation and combined-case figures.async-scheduling-epic-benchmark-report.html— high-level QA and live benchmark report.🧪 Testing
.venv/bin/ruff check packages scripts tests_e2e.venv/bin/ruff format --check packages scripts tests_e2egit diff --checkuv run ruff check scripts/benchmarks/benchmark_async_scheduling.py scripts/benchmarks/generate_async_scheduling_idle_report.py scripts/benchmarks/run_async_scheduling_idle_regression.py packages/data-designer-engine/tests/engine/test_async_scheduling_benchmark.pyuv run ruff format --check scripts/benchmarks/benchmark_async_scheduling.py scripts/benchmarks/generate_async_scheduling_idle_report.py scripts/benchmarks/run_async_scheduling_idle_regression.py packages/data-designer-engine/tests/engine/test_async_scheduling_benchmark.pyuv run pytest packages/data-designer-engine/tests/engine/test_async_scheduling_benchmark.py -q— 17 passeduv run python scripts/benchmarks/run_async_scheduling_idle_regression.py --quick ...— PASS, 0 errors, 0 warningsmake testas a single aggregate command was not rerun; equivalent package suites above passed earlier, and focused checks passed for this optimization pass✅ Checklist
Notes
Raw live benchmark traces and large artifact trees remain local because the full artifact tree is large. This PR includes condensed reports and reusable benchmark tooling so reviewers can inspect the evidence and rerun the canonical regression suite without committing raw JSONL/timeline dumps.
Description updated with AI