From a0e7458a1e722d2ffe905a73c1c123eb97718206 Mon Sep 17 00:00:00 2001
From: ssjia
Date: Fri, 10 Apr 2026 07:25:54 -0700
Subject: [PATCH] [ET-VK] Lower reduce_peak_memory threshold from 500 MB to 10 MB

During prepack, staging buffers accumulate in buffers_to_clear_ until
flush() is called. Previously, the reduce_peak_memory path (which calls
submit_and_wait + flush to free staging buffers incrementally) only
triggered when total constant data exceeded 500 MB. This meant models
with moderate weight sizes (e.g. 42 MB) never benefited from incremental
cleanup, causing all staging buffers to coexist in memory until the
final flush.

Lowering the threshold to 10 MB enables incremental staging buffer
cleanup for most models. On SceneX V9 FP16 (42 MB weights, Samsung S24
Adreno 750), this reduces transient VMA peak during prepack from 89.6 MB
to 57.3 MB (-36%) at a cost of ~15 ms additional load latency (+4.4%).
Steady-state memory and inference performance are unaffected.

Authored with Claude.

Differential Revision: [D100332227](https://our.internmc.facebook.com/intern/diff/D100332227/)

[ghstack-poisoned]
---
 backends/vulkan/runtime/graph/ComputeGraph.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/backends/vulkan/runtime/graph/ComputeGraph.cpp b/backends/vulkan/runtime/graph/ComputeGraph.cpp
index 5e9c7b7ad2a..b14d0f6ab0b 100644
--- a/backends/vulkan/runtime/graph/ComputeGraph.cpp
+++ b/backends/vulkan/runtime/graph/ComputeGraph.cpp
@@ -1134,8 +1134,9 @@ void ComputeGraph::clear_deferred_cmds() {
 void ComputeGraph::prepack() {
   int i = 0;
   bool submitted = false;
-  const bool reduce_peak_memory = total_constant_nbytes_ > 500 * MB;
+  const bool reduce_peak_memory = total_constant_nbytes_ > 10 * MB;
   // int count = 0;
+  context_->set_cmd();
   for (std::unique_ptr& node : prepack_nodes_) {
     // Do not trigger on the first or last prepack node.