[ET Device Support] Device-aware memory planning: separate buffers per device type by Gasoonjia · Pull Request #18375 · pytorch/executorch

Gasoonjia · 2026-03-20T18:21:59Z

Stack from ghstack (oldest at bottom):

[ET Device Support] MethodMeta: expose per-buffer device placement API #18474
[ET Device Support] DeviceMemoryBuffer RAII class for device memory lifetime management #18473
[ET Device Support] Emitter reads non_const_buffer_device from graph meta #18472
-> [ET Device Support] Device-aware memory planning: separate buffers per device type #18375
[ET Device Support] Add NonConstBufferDevice schema for per-buffer device mapping #19497
[ET Device Support] DeviceAllocator interface and DeviceAllocatorRegistry #19496

Extends memory planning to separate device tensors from CPU tensors into distinct
memory buffers. Non-CPU TensorSpecs (e.g., CUDA) are pre-assigned device-specific
mem_ids before the greedy/naive algorithm runs, ensuring they get planned into
independent memory buffers that never share space with CPU tensors.

Differential Revision: D97447105

…r device type Extends memory planning to separate device tensors from CPU tensors into distinct memory buffers. Non-CPU TensorSpecs (e.g., CUDA) are pre-assigned device-specific mem_ids before the greedy/naive algorithm runs, ensuring they get planned into independent memory buffers that never share space with CPU tensors. Differential Revision: [D97447105](https://our.internmc.facebook.com/intern/diff/D97447105/) [ghstack-poisoned]

pytorch-bot · 2026-03-20T18:22:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18375

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Run pull jobs on OSDC in pull requests shadow mode

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…r device type Extends memory planning to separate device tensors from CPU tensors into distinct memory buffers. Non-CPU TensorSpecs (e.g., CUDA) are pre-assigned device-specific mem_ids before the greedy/naive algorithm runs, ensuring they get planned into independent memory buffers that never share space with CPU tensors. Differential Revision: [D97447105](https://our.internmc.facebook.com/intern/diff/D97447105/) ghstack-source-id: 355133801 Pull Request resolved: #18375

github-actions · 2026-03-20T18:22:45Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

… buffers per device type" Extends memory planning to separate device tensors from CPU tensors into distinct memory buffers. Non-CPU TensorSpecs (e.g., CUDA) are pre-assigned device-specific mem_ids before the greedy/naive algorithm runs, ensuring they get planned into independent memory buffers that never share space with CPU tensors. Differential Revision: [D97447105](https://our.internmc.facebook.com/intern/diff/D97447105/) [ghstack-poisoned]

digantdesai

Review automatically exported from Phabricator review in Meta.

JacobSzwejbka

Review automatically exported from Phabricator review in Meta.

… buffers per device type" Extends memory planning to separate device tensors from CPU tensors into distinct memory buffers. Non-CPU TensorSpecs (e.g., CUDA) are pre-assigned device-specific mem_ids before the greedy/naive algorithm runs, ensuring they get planned into independent memory buffers that never share space with CPU tensors. Differential Revision: [D97447105](https://our.internmc.facebook.com/intern/diff/D97447105/) [ghstack-poisoned]

…stry (#19496) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #18474 * #18473 * #18472 * #18375 * #19497 * __->__ #19496 This diff introduces the `DeviceAllocator` abstract interface and `DeviceAllocatorRegistry` for device-specific memory allocation. This is a foundational abstraction that enables the runtime to dispatch memory operations to the appropriate device backend other than CPU (CUDA, etc.). **DeviceAllocator interface provides:** - `allocate()` / `deallocate()` - Dynamic device memory allocation - `copy_host_to_device()` / `copy_device_to_host()` - Data transfer between host and device - `device_type()` - Returns the device type this allocator handles **DeviceAllocatorRegistry provides:** - Singleton registry mapping DeviceType → DeviceAllocator - `register_allocator()` / `get_allocator()` methods - Fixed-size array indexed by device type (no dynamic allocation, embedded-friendly) **Design notes:** - Registry stores raw pointers (non-owning) - allocators are expected to be singletons with static lifetime - Follows ExecuTorch's embedded-first philosophy (no std::unique_ptr, no heap allocation in registry) - Convenience free functions `register_device_allocator()` and `get_device_allocator()` for ease of use Differential Revision: [D93635656](https://our.internmc.facebook.com/intern/diff/D93635656/)

…vice mapping (#19497) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #18474 * #18473 * #18472 * #18375 * __->__ #19497 * #19496 Adds the NonConstBufferDevice table to the FlatBuffer schema (program.fbs) and the corresponding Python dataclass to schema.py. This enables mapping each non-constant planned memory buffer to a specific device type (CPU, CUDA, etc.). The field is optional and absent for CPU-only programs, ensuring zero binary size regression. Differential Revision: [D97335597](https://our.internmc.facebook.com/intern/diff/D97335597/)

Gasoonjia requested review from JacobSzwejbka and larryliu0820 as code owners March 20, 2026 18:22

This was referenced Mar 20, 2026

[ET Device Support] Schema changes: device info on Tensor and buffer-level device array #17533

Merged

[ET Device Support] TensorImpl carries device info #17534

Merged

This was referenced Mar 20, 2026

[executorch] Propagate device metadata from partitioner result onto TensorSpecs #18078

Merged

[ET Device Support] Propagate device info from TensorSpec into serialized Tensor #18079

Merged

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 20, 2026

meta-codesync Bot added fb-exported meta-exported labels Mar 20, 2026

Gasoonjia requested a review from mergennachin as a code owner March 24, 2026 21:24

Gasoonjia mentioned this pull request Mar 24, 2026

[ET Device Support] CudaAllocator: device memory allocator for CUDA backend #18477

Merged

This was referenced Apr 6, 2026

[ET Device Support] Define AOT device copy ops registry #18728

Merged

[ET Device Support] Define et_copy runtime h2d and d2h copy ops #18729

Merged

[ET Device Support] PropagateDevicePass inserts H2D/D2H copy ops at delegate boundaries #18730

Open

digantdesai approved these changes Apr 15, 2026

View reviewed changes

JacobSzwejbka requested changes Apr 29, 2026

View reviewed changes

JacobSzwejbka approved these changes May 8, 2026

View reviewed changes

Gasoonjia added 3 commits May 8, 2026 19:15

This was referenced May 12, 2026

[ET Device Support] DeviceAllocator interface and DeviceAllocatorRegistry #19496

Merged

[ET Device Support] Add NonConstBufferDevice schema for per-buffer device mapping #19497

Merged

Gasoonjia merged commit 6bb6ab3 into gh/gasoonjia/145/base May 12, 2026
165 of 171 checks passed

Gasoonjia deleted the gh/gasoonjia/145/head branch May 12, 2026 05:21

Gasoonjia had a problem deploying to cherry-pick-bot May 12, 2026 05:21 — with GitHub Actions Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ET Device Support] Device-aware memory planning: separate buffers per device type#18375

[ET Device Support] Device-aware memory planning: separate buffers per device type#18375
Gasoonjia merged 8 commits into
gh/gasoonjia/145/basefrom
gh/gasoonjia/145/head

Gasoonjia commented Mar 20, 2026 •

edited

Loading

Uh oh!

pytorch-bot Bot commented Mar 20, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Mar 20, 2026

Uh oh!

digantdesai left a comment

Uh oh!

JacobSzwejbka left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Gasoonjia commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot Bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18375

❗ 1 Active SEVs

Uh oh!

github-actions Bot commented Mar 20, 2026

This PR needs a release notes: label

Uh oh!

digantdesai left a comment

Choose a reason for hiding this comment

Uh oh!

JacobSzwejbka left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Gasoonjia commented Mar 20, 2026 •

edited

Loading

pytorch-bot Bot commented Mar 20, 2026 •

edited

Loading

This PR needs a `release notes:` label