[PyTorch] Minor optimizations in fused grouped MLP #2888

ksivaman merged 2 commits into NVIDIA:main
Conversation
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
/te-ci pytorch L0
Greptile Summary

This PR introduces minor CPU-overhead and performance optimizations in the fused grouped MLP forward and backward paths. The key changes include (1) replacing two …

Confidence Score: 5/5 — Safe to merge. All changes are semantically equivalent refactors with no correctness impact: the optimizations are either no-ops being removed (backward casts on tensors already in the right dtype) or logically equivalent transformations (a fused kernel in place of cumsum + cat, hoisted casts). The C++ implementation of …

No files require special attention.

Important Files Changed
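The "fused kernel vs. cumsum + cat" refactor mentioned above can be illustrated with a minimal sketch. The pattern `torch.cat([zeros(1), torch.cumsum(sizes)])` builds per-group offsets for a grouped GEMM; a fused variant produces the same offsets in one pass. The function and variable names below are illustrative, not taken from the PR's actual code, and plain Python lists stand in for tensors:

```python
def group_offsets_cumsum_cat(group_sizes):
    """Two-step version: prefix sums, then prepend a leading zero
    (what cat([zeros(1), cumsum(sizes)]) computes)."""
    offsets = [0]
    for size in group_sizes:
        offsets.append(offsets[-1] + size)
    return offsets


def group_offsets_fused(group_sizes):
    """Single-pass version: one loop writes the same offsets directly,
    analogous to replacing cumsum + cat with one fused kernel
    (one launch, no intermediate tensor)."""
    offsets = [0] * (len(group_sizes) + 1)
    for i, size in enumerate(group_sizes):
        offsets[i + 1] = offsets[i] + size
    return offsets


# Both produce identical offsets; the fused form avoids the extra
# intermediate allocation and second kernel launch.
print(group_offsets_cumsum_cat([3, 2, 4]))  # [0, 3, 5, 9]
print(group_offsets_fused([3, 2, 4]))       # [0, 3, 5, 9]
```

On GPU, the win is mostly CPU-side: one kernel launch instead of two, and no intermediate tensor to allocate and concatenate.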
Reviews (1): Last reviewed commit: "Merge branch 'main' into minor_opts_fuse..."
Description
Small performance and CPU-overhead improvements in the fused grouped MLP.
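One of the CPU-overhead improvements the summary mentions is hoisting casts out of the per-group path. A minimal sketch of the idea, with hypothetical names and plain Python standing in for tensor dtype casts:

```python
def apply_scale_naive(groups, scale):
    """Cast `scale` inside the loop: one redundant conversion
    per group, pure CPU overhead on the hot path."""
    out = []
    for group in groups:
        s = float(scale)  # repeated cast, same result every iteration
        out.append([x * s for x in group])
    return out


def apply_scale_hoisted(groups, scale):
    """Cast hoisted out of the loop: the conversion happens once,
    before iterating over the groups."""
    s = float(scale)  # single cast
    return [[x * s for x in group] for group in groups]


# Identical results; the hoisted form does one cast instead of N.
print(apply_scale_hoisted([[1, 2], [3]], 2))  # [[2.0, 4.0], [6.0]]
```

The same reasoning applies to the removed backward casts noted in the review: casting a tensor that is already in the target dtype is a no-op, so dropping the call saves the dispatch cost with no behavioral change.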
Type of change
Changes
- … wherever possible
- … cumsum …

Checklist: