Pull requests: NVIDIA/TransformerEngine
[Common] Support scaled & clamped swiglu, srelu for BF16
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
#3132
opened Jun 16, 2026 by
zhongbozhu
Collaborator
13 tasks
[JAX] Remove shard_map from MoEBlock to support quant before FSDP AG using Grouped quant+GEMM custom partitioning rules
#3131
opened Jun 15, 2026 by
jberchtold-nvidia
Collaborator
Draft
13 tasks
[torch.compile] Bunch of small changes needed for enabling torch.compile
#3130
opened Jun 15, 2026 by
pggPL
Collaborator
8 of 13 tasks
[Common] Add dense router output for fused router
org-contribution
#3129
opened Jun 15, 2026 by
harryzhou2000
Member
Expert Parallelism: common C API + NCCL EP backend
2.17
enhancement
New feature or request
MoE
#3127
opened Jun 14, 2026 by
timmoon10
Member
feat: add SM_121 (GB10 consumer Blackwell) support for FA4
community-contribution
#3125
opened Jun 12, 2026 by
TyGu1
Avoid unpickling the extra state when not needed
#3123
opened Jun 12, 2026 by
ptrendx
Member
2 of 6 tasks
docs(readme): update latest news
#3121
opened Jun 11, 2026 by
sbhavani
Collaborator
6 of 13 tasks
TE EP integration to MoEBlock
#3116
opened Jun 10, 2026 by
tdophung
Collaborator
6 of 13 tasks
[JAX] Collective Gemm test fixes
#3115
opened Jun 10, 2026 by
jberchtold-nvidia
Collaborator
13 tasks
Current Scaling Group Quantization + Enabling Varying Last/Both Dims in Group Quantize
#3114
opened Jun 10, 2026 by
vthumbe1503
Collaborator
13 tasks
Abstract CUDA hardcodes into configurable te_device_type / te_platform
community-contribution
#3113
opened Jun 10, 2026 by
lxd-cumt
[JAX] Return max_logit and softmax aux stats from TE JAX fused attn
2.17
#3112
opened Jun 10, 2026 by
KshitijLakhani
Collaborator
Draft
13 tasks
Add entrypoint for flagos multi-backend plugin system
community-contribution
#3107
opened Jun 9, 2026 by
lxd-cumt
[PyTorch][torch.compile] Remove process group from quantizers
#3104
opened Jun 8, 2026 by
pggPL
Collaborator
3 of 12 tasks
Quantization support for GroupedTensor: FP8 per-tensor
community-contribution
#3102
opened Jun 7, 2026 by
int-smart
Contributor
11 of 13 tasks
Introduce Mega-C++ to reduce CPU overhead
community-contribution
#3099
opened Jun 6, 2026 by
zhongbozhu
Collaborator
3 of 16 tasks
increased a bit tolerance for pytorch/distributed/run_numerics.py
community-contribution
#3095
opened Jun 5, 2026 by
francesco-bertolotti
Contributor
6 of 13 tasks
NVFP4: cache GEMM-swizzled weight scale factors across micro-batches
community-contribution
#3093
opened Jun 5, 2026 by
cael-ling
Contributor
3 of 13 tasks
Added thd cudnn guard
community-contribution
#3092
opened Jun 5, 2026 by
francesco-bertolotti
Contributor
6 of 13 tasks
Make NVTE tensor handle pool size configurable
community-contribution
#3090
opened Jun 5, 2026 by
lhb8125
Contributor
[JAX] Fix norm workspace on global shapes
#3085
opened Jun 4, 2026 by
jberchtold-nvidia
Collaborator
Draft
8 of 13 tasks
[JAX] Hopper BF16 grouped GEMM v2 support
#3083
opened Jun 4, 2026 by
jberchtold-nvidia
Collaborator
Draft
8 of 13 tasks