
Fix VisualWakeWords on GAP9 #187

Open
Xeratec wants to merge 5 commits into pulp-platform:devel from Xeratec:pr/fix_vww

Conversation

Xeratec (Member) commented Apr 30, 2026

Summary

Two independent correctness bugs surfaced while bringing up the MLPerf VisualWakeWords (VWW) model on GAP9 with tiling + double buffering. Each is fixable on its own, and either alone is enough to corrupt VWW outputs. Both are fixed here, and VWW is wired into CI on GAP9 so we don't regress.

1. DWConv im2col scratch buffer is undersized on PULPOpen

PULP2DDWConvTemplate was inheriting computeTransientBuffersSize from the regular-conv template, which sizes the scratch as 2 * 8 * ch_in * KH * KW — independent of H_in. The upstream pulp_nn_depthwise_* kernel actually walks a column-shaped im2col of dim_kernel_x * (dim_in_y + pad_top + pad_bot) + dim_kernel_x bytes per core, so for any input taller than the inherited bound the kernel writes past its scratch into the next L1 tensor.

With 8 channels and a 3×3 kernel the two formulas coincide at H=45 and diverge from H=46 onward — VWW PASS_1's 1×8×48×48 DW reproduces the corruption with ~74% of outputs wrong.

Fix: override computeTransientBuffersSize in PULP2DDWConvTemplate to use the kernel's actual per-core formula × NUM_CORES.
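The crossover between the two sizing formulas can be checked numerically. A minimal sketch, assuming pad_top = pad_bot = 1 for the 48×48 case; the function names and NUM_CORES constant are illustrative stand-ins, not Deeploy's actual API:

```python
# Schematic comparison of the two scratch-sizing formulas described above.
NUM_CORES = 8

def inherited_size(ch_in: int, kh: int, kw: int) -> int:
    """Regular-conv bound inherited by the DW template: 2 * 8 * ch_in * KH * KW."""
    return 2 * 8 * ch_in * kh * kw

def depthwise_size(kw: int, h_in: int, pad_top: int, pad_bot: int) -> int:
    """Per-core im2col the pulp_nn_depthwise_* kernel actually walks, x NUM_CORES."""
    per_core = kw * (h_in + pad_top + pad_bot) + kw
    return per_core * NUM_CORES

# 8 channels, 3x3 kernel, pad 1/1: the bounds meet at H=45 and diverge from H=46 on.
assert inherited_size(8, 3, 3) == depthwise_size(3, 45, 1, 1) == 1152
assert depthwise_size(3, 46, 1, 1) > inherited_size(8, 3, 3)  # scratch overflow
```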

Also adds a Kernels/Integer/Conv/DW_2D_RQ_8x16x16 repro test exercising the previously-overflowing geometry, so CI catches it.

2. pulp_nn_linear_* leftover-neuron RQS parameter aliasing

When num_o_neurons is not a multiple of 2*NUM_CORES, the trailing odd-neuron if (lft_neurons …) branch in the pulp_nn_linear_* family called pulp_nn_bn_quant_*(sum, *pKappa, *pLambda, …) instead of the per-core advanced *k1, *lambda1 pointers. Every core ended up writing element-0's RQS parameters into its own neuron, so any layer with an odd channel count (e.g. VWW's 2-class classifier) was silently quantized with the wrong kappa/lambda.
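The aliasing can be modeled schematically. The real kernel is C; the names below (requantize, linear_leftover) are hypothetical stand-ins for the pulp_nn_bn_quant_* requantization path, and the shift-based formula is a simplification:

```python
# Schematic model of the leftover-neuron RQS aliasing described above.
def requantize(acc: int, kappa: int, lam: int, shift: int = 8) -> int:
    # Simplified batch-norm-style requantization: (acc * kappa + lambda) >> shift
    return (acc * kappa + lam) >> shift

def linear_leftover(acc, kappa, lam, buggy: bool):
    out = []
    for neuron, sum_ in enumerate(acc):
        # Buggy path: always dereference the *base* pointers (element 0)
        # instead of the per-core advanced k1/lambda1 pointers.
        k, l = (kappa[0], lam[0]) if buggy else (kappa[neuron], lam[neuron])
        out.append(requantize(sum_, k, l))
    return out

acc, kappa, lam = [100, 100], [3, 7], [10, 50]
assert linear_leftover(acc, kappa, lam, buggy=False) == [1, 2]  # per-neuron params
assert linear_leftover(acc, kappa, lam, buggy=True) == [1, 1]   # element-0 params reused
```

With distinct kappa/lambda per neuron, the buggy path visibly requantizes every leftover neuron with element 0's parameters, which is exactly why an odd-channel classifier head silently degrades.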

Fix lives in pulp-nn-mixed (PR #11, commit 1d8eeee0); this PR bumps the submodule and rebuilds the prebuilt GAP9 library against the fixed source.

While in there: split the dropped Clang-only -Wno-incompatible-pointer-types-discards-qualifiers on PULPOpen into the two flags GCC accepts (-Wno-incompatible-pointer-types, -Wno-discards-qualifiers), and add the same -Wno-incompatible-pointer-types plus -Wno-implicit-function-declaration to GAP9 so pulp-nn-mixed builds cleanly from source under the GCC GAP9 toolchain.

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, change its link back after review.

Xeratec and others added 3 commits April 30, 2026 11:22
`PULP2DDWConvTemplate` inherited `computeTransientBuffersSize` from
the regular-conv template, which sizes the scratch as
`2 * 8 * ch_in * KH * KW` — independent of `H_in`. The upstream
`pulp_nn_depthwise_*` kernel actually walks a column-shaped im2col of
`dim_kernel_x * (dim_in_y + pad_top + pad_bot) + dim_kernel_x` bytes
per core, so for any input taller than the inherited bound the kernel
writes past its scratch into the next L1 tensor. With 8 channels and
a 3×3 kernel the two formulas coincide at H=45 and diverge from
H=46 onward (e.g. VWW PASS_1's 1×8×48×48 DW reproduces the corruption
with ~74% of outputs wrong).

Override `computeTransientBuffersSize` in `PULP2DDWConvTemplate` to
use the kernel's actual per-core formula × NUM_CORES.

Add a `Kernels/Integer/Conv/DW_2D_RQ_8x16x16` repro test exercising
the previously-overflowing geometry, and wire VWW + the new kernel
into the GAP9 L2 single- and double-buffer tiling rosters so CI
catches any future regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump pulp-nn-mixed to 1d8eeee0 (PR pulp-platform#11), which fixes the
`pulp_nn_linear_*` family: when `num_o_neurons` is not a multiple of
2*NUM_CORES, the trailing odd neuron's `if (lft_neurons …)` branch
called `pulp_nn_bn_quant_*(sum, *pKappa, *pLambda, …)` instead of the
per-core advanced `*k1, *lambda1` pointers, so every core wrote
element 0's RQS parameters into its own neuron. Any layer with an
odd channel count (e.g. VWW's 2-class classifier) ended up with the
wrong kappa/lambda. Rebuild the prebuilt GAP9 library against the
fixed source.

Split the dropped Clang-only
`-Wno-incompatible-pointer-types-discards-qualifiers` on PULPOpen
into the two flags GCC accepts (`-Wno-incompatible-pointer-types`,
`-Wno-discards-qualifiers`) and
add the same `-Wno-incompatible-pointer-types` plus
`-Wno-implicit-function-declaration` to the GAP9 target so
`pulp-nn-mixed` builds cleanly from source under the GCC GAP9
toolchain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Xeratec Xeratec self-assigned this Apr 30, 2026
@Xeratec Xeratec added the Bug Something isn't working label Apr 30, 2026
@Xeratec Xeratec added this to Deeploy Apr 30, 2026
@Xeratec Xeratec moved this to In progress in Deeploy Apr 30, 2026
@Xeratec Xeratec moved this from In progress to Need Reviewer in Deeploy Apr 30, 2026
@Xeratec Xeratec added this to the Release 0.2.2 milestone Apr 30, 2026
@Xeratec Xeratec moved this from Need Reviewer to In progress in Deeploy Apr 30, 2026
@Xeratec Xeratec changed the title from "Fix VisualWakeWords on GAP9: DWConv scratch sizing + pulp-nn linear RQS aliasing" to "Fix VisualWakeWords on GAP9" Apr 30, 2026
@Xeratec Xeratec marked this pull request as ready for review April 30, 2026 11:04
coderabbitai (bot, Contributor) commented Apr 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8daff050-e44d-4101-9957-52106424d1ac

📥 Commits

Reviewing files that changed from the base of the PR and between 06e2250 and 8c769c7.

📒 Files selected for processing (4)
  • Deeploy/Targets/Generic/Templates/iSoftmaxPreAllocatedBuffTemplate.py
  • Deeploy/Targets/PULPOpen/Templates/ConvTemplate.py
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
  • Deeploy/Targets/PULPOpen/Templates/iSoftmaxTemplate.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • Deeploy/Targets/PULPOpen/Templates/ConvTemplate.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added VisualWakeWords model to test configurations.
  • Improvements

    • Improved transient buffer sizing for depthwise convolution, reducing per-core scratch usage and improving memory behavior.
    • Adjusted compiler warnings to improve build stability and diagnostics.
  • Chores

    • Updated third-party library revision.

Walkthrough

Converted several transient-buffer sizing helpers from static to instance methods, added depthwise-specific per-core im2col sizing in PULP DW conv templates, updated tiled test configs to include a DW kernel and VisualWakeWords model entries, tweaked compiler warning flags, and advanced a pulp-nn-mixed submodule pointer.

Changes

  • PULPOpen Conv Templates — Deeploy/Targets/PULPOpen/Templates/ConvTemplate.py
    Converted computeTransientBuffersSize to instance methods in PULP2DConvTemplate and PULP1DConvTemplate; added depthwise overrides in PULP2DDWConvTemplate and PULP1DDWConvTemplate that return a per-core im2col ${nodeName}_buffer sized with a depthwise-specific formula (im2col_dim = 8 * per_core). Updated callers to use self.computeTransientBuffersSize(...).
  • Float / iSoftmax Templates — Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py, Deeploy/Targets/PULPOpen/Templates/iSoftmaxTemplate.py, Deeploy/Targets/Generic/Templates/iSoftmaxPreAllocatedBuffTemplate.py
    Replaced several @staticmethod buffer-size helpers with instance methods and updated hoistTransientBuffers call sites to invoke self.computeTransientBuffersSize(...) instead of class-qualified static calls.
  • Test Configs — DeeployTest/test_gap9_tiled_config.py, DeeployTest/test_siracusa_tiled_config.py
    Added specialized DW kernel entry Kernels/Integer/Conv/DW_2D_RQ_8x16x16 and expanded Kernels/Integer/Conv/DW_2D_RQ values; added Models/MLPerf/VisualWakeWords to single- and double-buffer model lists with platform-specific L2 sizes.
  • CMake warning flags — TargetLibraries/GAP9/CMakeLists.txt, TargetLibraries/PULPOpen/CMakeLists.txt
    Adjusted compiler warning suppression flags: added -Wno-incompatible-pointer-types and -Wno-implicit-function-declaration for GAP9; split a combined suppression into -Wno-incompatible-pointer-types and -Wno-discards-qualifiers for PULPOpen.
  • Third-party submodule — TargetLibraries/third_party/pulp-nn-mixed
    Advanced the git submodule pointer to a new commit hash (third-party revision update).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • Victor-Jung
🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4)

  • Title check ✅ — The title "Fix VisualWakeWords on GAP9" directly corresponds to the main objective of the PR: fixing VWW corruption bugs on GAP9. It is concise, clear, and accurately summarizes the primary change.
  • Description check ✅ — The description comprehensively documents two independent correctness bugs, their root causes, fixes, and CI integration. It directly relates to all file changes (template methods, test configurations, compiler flags, submodule bump) and provides clear context for the changeset.
  • Linked Issues check ✅ — Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ — Skipped because no linked issues were found for this pull request.


coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Deeploy/Targets/PULPOpen/Templates/ConvTemplate.py`:
- Around line 65-80: The hoisting path still calls
PULP2DConvTemplate.computeTransientBuffersSize directly so the depthwise
override in PULP2DDWConvTemplate is never used; update
PULP2DConvTemplate.hoistTransientBuffers to call the class/instance
computeTransientBuffersSize dynamically (e.g., self.computeTransientBuffersSize
or type(obj).computeTransientBuffersSize) and then pass each returned tuple into
ctxt.hoistTransientBuffer so that
PULP2DDWConvTemplate.computeTransientBuffersSize results are actually hoisted
via ctxt.hoistTransientBuffer.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7512f408-b7f3-4da9-8f16-434097f40bfb

📥 Commits

Reviewing files that changed from the base of the PR and between 3b011bb and 06e2250.

📒 Files selected for processing (11)
  • Deeploy/Targets/PULPOpen/Templates/ConvTemplate.py
  • DeeployTest/Tests/Kernels/Integer/Conv/DW_2D_RQ_8x16x16/activations.npz
  • DeeployTest/Tests/Kernels/Integer/Conv/DW_2D_RQ_8x16x16/inputs.npz
  • DeeployTest/Tests/Kernels/Integer/Conv/DW_2D_RQ_8x16x16/network.onnx
  • DeeployTest/Tests/Kernels/Integer/Conv/DW_2D_RQ_8x16x16/outputs.npz
  • DeeployTest/test_gap9_tiled_config.py
  • DeeployTest/test_siracusa_tiled_config.py
  • TargetLibraries/GAP9/CMakeLists.txt
  • TargetLibraries/GAP9/prebuilt/libpulp-nn-mixed.a
  • TargetLibraries/PULPOpen/CMakeLists.txt
  • TargetLibraries/third_party/pulp-nn-mixed

Comment thread Deeploy/Targets/PULPOpen/Templates/ConvTemplate.py Outdated
Xeratec and others added 2 commits April 30, 2026 17:09
`hoistTransientBuffers` called `computeTransientBuffersSize` via the
hard-coded class name (e.g. `PULP2DConvTemplate.computeTransient...`),
which bypassed any subclass override regardless of the actual `self`
type. As a result the `PULP2DDWConvTemplate` override added in
65eda17 was dead code: VWW codegen kept using the regular-conv
formula instead of the depthwise per-core size.

Convert `computeTransientBuffersSize` from `@staticmethod` to an
instance method on each affected template (PULP 1D/2D conv,
PULP 2D float conv + DW, PULP iSoftmax, generic iSoftmax pre-allocated)
and call it via `self.computeTransientBuffersSize(...)` so dispatch
goes through the actual class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
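The dispatch pitfall this commit fixes can be shown in a few lines of Python. A minimal sketch; the class names and return values below are stand-ins, not the real Deeploy templates:

```python
# Illustration of static dispatch via a hard-coded class name vs. self dispatch.
class ConvTemplate:

    @staticmethod
    def computeTransientBuffersSize():
        return "regular-conv size"

    def hoistTransientBuffers_buggy(self):
        # Hard-coded class name: subclass overrides are never consulted.
        return ConvTemplate.computeTransientBuffersSize()

    def hoistTransientBuffers_fixed(self):
        # Dispatch through self: the most-derived override wins.
        return self.computeTransientBuffersSize()


class DWConvTemplate(ConvTemplate):

    def computeTransientBuffersSize(self):
        return "depthwise per-core size"


dw = DWConvTemplate()
assert dw.hoistTransientBuffers_buggy() == "regular-conv size"      # override is dead code
assert dw.hoistTransientBuffers_fixed() == "depthwise per-core size"
```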
`PULP1DDWConvTemplate` previously inherited
`computeTransientBuffersSize` from the regular 1D conv template, which
sizes scratch as `8 * 2 * ch_in * KY` — independent of `H_in`. The
upstream `pulp_nn_depthwise_*` kernel walks a column-shaped im2col of
`dim_kernel_y * (dim_in_y + pad_top + pad_bot) + dim_kernel_y` bytes
per core, so for tall enough inputs the kernel overflows the inherited
buffer. Add the depthwise per-core formula × NUM_CORES, mirroring the
2D fix in 65eda17.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Victor-Jung (Member) commented:

NITPICK: Update the link #11 to point at the pulp-nn-mixed repo PR.

Victor-Jung (Member) left a comment

Changes look good to me. Thanks for the fixes! Why is the CI failing? Hasn't the Docker been updated with the new pulp-nn-mixed version? As soon as it passes, we can merge.


Labels

Bug Something isn't working

Projects

Status: In progress


2 participants