Fix VisualWakeWords on GAP9 #187
Conversation
`PULP2DDWConvTemplate` inherited `computeTransientBuffersSize` from the regular-conv template, which sizes the scratch as `2 * 8 * ch_in * KH * KW` — independent of `H_in`. The upstream `pulp_nn_depthwise_*` kernel actually walks a column-shaped im2col of `dim_kernel_x * (dim_in_y + pad_top + pad_bot) + dim_kernel_x` bytes per core, so for any input taller than the inherited bound the kernel writes past its scratch into the next L1 tensor. With 8 channels and a 3×3 kernel the two formulas coincide at H=45 and diverge from H=46 onward (e.g. VWW PASS_1's 1×8×48×48 DW reproduces the corruption with ~74% of outputs wrong). Override `computeTransientBuffersSize` in `PULP2DDWConvTemplate` to use the kernel's actual per-core formula × NUM_CORES. Add a `Kernels/Integer/Conv/DW_2D_RQ_8x16x16` repro test exercising the previously-overflowing geometry, and wire VWW + the new kernel into the GAP9 L2 single- and double-buffer tiling rosters so CI catches any future regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
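A hedged sketch of what that override can look like; the `operatorRepresentation` keys, the buffer naming, and the fixed core count are assumptions, not the repo's exact code:

```python
# Sketch of the depthwise override; the key names (dim_kernel_x, dim_im_in_y,
# padding_y_top/bottom, nodeName) and NUM_CORES = 8 are assumptions.
class PULP2DDWConvTemplate(PULP2DConvTemplate):

    def computeTransientBuffersSize(self, ctxt, operatorRepresentation):
        kx = operatorRepresentation['dim_kernel_x']
        h_in = operatorRepresentation['dim_im_in_y']
        pad = operatorRepresentation['padding_y_top'] + operatorRepresentation['padding_y_bottom']
        # Per-core column-shaped im2col actually walked by pulp_nn_depthwise_*:
        #   dim_kernel_x * (dim_in_y + pad_top + pad_bot) + dim_kernel_x
        perCore = kx * (h_in + pad) + kx
        im2colName = operatorRepresentation['nodeName'] + "_buffer"
        return [(im2colName, 8 * perCore)]  # × NUM_CORES (8 on the PULP cluster)
```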
Bump pulp-nn-mixed to 1d8eeee0 (PR pulp-platform#11), which fixes the `pulp_nn_linear_*` family: when `num_o_neurons` is not a multiple of 2*NUM_CORES, the trailing odd neuron's `if (lft_neurons …)` branch called `pulp_nn_bn_quant_*(sum, *pKappa, *pLambda, …)` instead of the per-core advanced `*k1, *lambda1` pointers, so every core wrote element 0's RQS parameters into its own neuron. Any layer with an odd channel count (e.g. VWW's 2-class classifier) ended up with the wrong kappa/lambda. Rebuild the prebuilt GAP9 library against the fixed source. Split the dropped Clang-only `-Wno-incompatible-pointer-types-discards-qualifiers` on PULPOpen into the two flags GCC accepts (`-Wno-incompatible-pointer-types`, `-Wno-discards-qualifiers`) and add the same `-Wno-incompatible-pointer-types` plus `-Wno-implicit-function-declaration` to the GAP9 target so `pulp-nn-mixed` builds cleanly from source under the GCC GAP9 toolchain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 Walkthrough

Converted several transient-buffer sizing helpers from static to instance methods, added depthwise-specific per-core im2col sizing in the PULP DW conv templates, updated tiled test configs to include a DW kernel and VisualWakeWords model entries, tweaked compiler warning flags, and advanced the pulp-nn-mixed submodule pointer.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Deeploy/Targets/PULPOpen/Templates/ConvTemplate.py`:
- Around lines 65-80: the hoisting path still calls `PULP2DConvTemplate.computeTransientBuffersSize` directly, so the depthwise override in `PULP2DDWConvTemplate` is never used. Update `PULP2DConvTemplate.hoistTransientBuffers` to call `computeTransientBuffersSize` dynamically (e.g., `self.computeTransientBuffersSize` or `type(obj).computeTransientBuffersSize`) and pass each returned tuple into `ctxt.hoistTransientBuffer`, so that the `PULP2DDWConvTemplate` override's results are actually hoisted.
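A minimal sketch of the suggested fix; the method body and signature are simplified stand-ins for the real Deeploy template, not its exact code:

```python
# Simplified sketch: the (ctxt, operatorRepresentation) signature, the
# NodeTemplate base class, and the (name, size) return tuples are assumptions.
class PULP2DConvTemplate(NodeTemplate):

    def hoistTransientBuffers(self, ctxt, operatorRepresentation):
        # Before: PULP2DConvTemplate.computeTransientBuffersSize(ctxt, ...)
        # pinned the lookup to the base class and bypassed subclass overrides.
        # After: dispatching through self reaches PULP2DDWConvTemplate's
        # depthwise-specific sizing when the template is the DW variant.
        transientBuffers = self.computeTransientBuffersSize(ctxt, operatorRepresentation)
        for name, size in transientBuffers:
            ctxt.hoistTransientBuffer(name, size)
        return ctxt, operatorRepresentation, [name for name, _ in transientBuffers]
```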
📒 Files selected for processing (11)
- Deeploy/Targets/PULPOpen/Templates/ConvTemplate.py
- DeeployTest/Tests/Kernels/Integer/Conv/DW_2D_RQ_8x16x16/activations.npz
- DeeployTest/Tests/Kernels/Integer/Conv/DW_2D_RQ_8x16x16/inputs.npz
- DeeployTest/Tests/Kernels/Integer/Conv/DW_2D_RQ_8x16x16/network.onnx
- DeeployTest/Tests/Kernels/Integer/Conv/DW_2D_RQ_8x16x16/outputs.npz
- DeeployTest/test_gap9_tiled_config.py
- DeeployTest/test_siracusa_tiled_config.py
- TargetLibraries/GAP9/CMakeLists.txt
- TargetLibraries/GAP9/prebuilt/libpulp-nn-mixed.a
- TargetLibraries/PULPOpen/CMakeLists.txt
- TargetLibraries/third_party/pulp-nn-mixed
`hoistTransientBuffers` called `computeTransientBuffersSize` via the hard-coded class name (e.g. `PULP2DConvTemplate.computeTransient...`), which bypassed any subclass override regardless of the actual `self` type. As a result the `PULP2DDWConvTemplate` override added in 65eda17 was dead code: VWW codegen kept using the regular-conv formula instead of the depthwise per-core size. Convert `computeTransientBuffersSize` from `@staticmethod` to an instance method on each affected template (PULP 1D/2D conv, PULP 2D float conv + DW, PULP iSoftmax, generic iSoftmax pre-allocated) and call it via `self.computeTransientBuffersSize(...)` so dispatch goes through the actual class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`PULP1DDWConvTemplate` previously inherited `computeTransientBuffersSize` from the regular 1D conv template, which sizes scratch as `8 * 2 * ch_in * KY` — independent of `H_in`. The upstream `pulp_nn_depthwise_*` kernel walks a column-shaped im2col of `dim_kernel_y * (dim_in_y + pad_top + pad_bot) + dim_kernel_y` bytes per core, so for tall enough inputs the kernel overflows the inherited buffer. Add the depthwise per-core formula × NUM_CORES, mirroring the 2D fix in 65eda17. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NITPICK: Update the link #11 to point at the pulp-nn-mixed repo PR.
Victor-Jung
left a comment
Changes look good to me. Thanks for the fixes! Why is the CI failing? Hasn't the Docker been updated with the new pulp-nn-mixed version? As soon as it passes, we can merge.
Summary
Two independent correctness bugs surfaced while bringing up the MLPerf VisualWakeWords (VWW) model on GAP9 with tiling + double buffering. Each is fixable on its own, and either alone is enough to corrupt VWW outputs. Both are fixed here, and VWW is wired into CI on GAP9 so we don't regress.
1. DWConv im2col scratch buffer is undersized on PULPOpen
`PULP2DDWConvTemplate` was inheriting `computeTransientBuffersSize` from the regular-conv template, which sizes the scratch as `2 * 8 * ch_in * KH * KW` — independent of `H_in`. The upstream `pulp_nn_depthwise_*` kernel actually walks a column-shaped im2col of `dim_kernel_x * (dim_in_y + pad_top + pad_bot) + dim_kernel_x` bytes per core, so for any input taller than the inherited bound the kernel writes past its scratch into the next L1 tensor. With 8 channels and a 3×3 kernel the two formulas coincide at `H=45` and diverge from `H=46` onward — VWW PASS_1's `1×8×48×48` DW reproduces the corruption with ~74% of outputs wrong.

Fix: override `computeTransientBuffersSize` in `PULP2DDWConvTemplate` to use the kernel's actual per-core formula × `NUM_CORES`. Also adds a `Kernels/Integer/Conv/DW_2D_RQ_8x16x16` repro test exercising the previously-overflowing geometry, so CI catches it.
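The crossover claim is easy to check numerically, assuming `pad_top = pad_bot = 1` and `NUM_CORES = 8` (both implied by the geometry above but not spelled out):

```python
# Both sizes in bytes; the padding and core count are assumptions.
ch_in, KH, KW, NUM_CORES = 8, 3, 3, 8
inherited = 2 * 8 * ch_in * KH * KW               # 1152, independent of H_in

def dw_scratch(H, pad_top=1, pad_bot=1):
    return NUM_CORES * (KW * (H + pad_top + pad_bot) + KW)

assert dw_scratch(45) == 1152 == inherited        # formulas coincide at H=45
assert dw_scratch(46) == 1176 > inherited         # from H=46 the kernel overflows
```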
2. `pulp_nn_linear_*` leftover-neuron RQS parameter aliasing

When `num_o_neurons` is not a multiple of `2*NUM_CORES`, the trailing odd-neuron `if (lft_neurons …)` branch in the `pulp_nn_linear_*` family called `pulp_nn_bn_quant_*(sum, *pKappa, *pLambda, …)` instead of the per-core advanced `*k1, *lambda1` pointers. Every core ended up writing element-0's RQS parameters into its own neuron, so any layer with an odd channel count (e.g. VWW's 2-class classifier) was silently quantized with the wrong kappa/lambda.

Fix lives in `pulp-nn-mixed` (PR #11 — `1d8eeee0`); this PR bumps the submodule and rebuilds the prebuilt GAP9 library against the fixed source.
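A small Python model of the aliasing (illustrative only; the real kernel is C in `pulp-nn-mixed`, and the chunking here is simplified):

```python
# Models one core's neuron chunk; requantize() stands in for pulp_nn_bn_quant_*.
def requantize(acc, kappa, lamb):
    return (acc * kappa + lamb) >> 16

def core_chunk(acc, kappa, lamb, start, stop, buggy):
    out = {}
    k1 = start                              # per-core advanced pointer (*k1/*lambda1)
    for n in range(start, stop - 1, 2):     # main loop handles 2 neurons per step
        out[n] = requantize(acc[n], kappa[k1], lamb[k1])
        out[n + 1] = requantize(acc[n + 1], kappa[k1 + 1], lamb[k1 + 1])
        k1 += 2
    if (stop - start) % 2:                  # lft_neurons: the trailing odd neuron
        n = stop - 1
        if buggy:
            # Bug: *pKappa/*pLambda dereference the base pointers, i.e.
            # element 0's RQS parameters, regardless of which neuron this is.
            out[n] = requantize(acc[n], kappa[0], lamb[0])
        else:
            # Fix: use the per-core advanced pointers.
            out[n] = requantize(acc[n], kappa[k1], lamb[k1])
    return out
```

Split one neuron per core, the core that owns neuron 1 requantizes it with neuron 0's parameters, which is exactly the VWW 2-class classifier symptom described above.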
While in there: split the dropped Clang-only `-Wno-incompatible-pointer-types-discards-qualifiers` on PULPOpen into the two flags GCC accepts (`-Wno-incompatible-pointer-types`, `-Wno-discards-qualifiers`), and add the same `-Wno-incompatible-pointer-types` plus `-Wno-implicit-function-declaration` to GAP9 so `pulp-nn-mixed` builds cleanly from source under the GCC GAP9 toolchain.

PR Merge Checklist
- [ ] Rebased on the latest `devel` commit and pointing to `devel`.
- [ ] The `CHANGELOG.md` file has been updated.