Make NVTE tensor handle pool size configurable #3090
Signed-off-by: hongbinl <hongbinl@nvidia.com>
for more information, see https://pre-commit.ci
I am not opposed to creating such a variable, but I would really like to see an example of a legitimate use that goes over this limit. Could you run the experiment that is failing for you with https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/common/transformer_engine.cpp#L487 set to true and send me the log?
te_handle_pool_debug_fakepg_20260616-010629_2134338.pruned.log
Summary
Motivation
Large model initialization paths can legitimately create more TE tensor handles than the current fixed-size pool allows, even when GPU and CPU memory are otherwise sufficient. Exposing the pool size as an environment variable avoids downstream source patches for these scale-dependent cases.
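A minimal sketch of the pattern, assuming the pool size is read once from an environment variable and falls back to a built-in default when the variable is unset or invalid. The variable name `NVTE_TENSOR_HANDLE_POOL_SIZE`, the default of 1024, and the helpers `parse_pool_size` and `tensor_handle_pool_size` are all hypothetical; they are not taken from Transformer Engine and only illustrate the idea.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <limits>

// HYPOTHETICAL: neither this variable name nor the default below comes from
// Transformer Engine; they only illustrate the configuration pattern.
constexpr const char* kPoolSizeEnvVar = "NVTE_TENSOR_HANDLE_POOL_SIZE";
constexpr std::size_t kDefaultPoolSize = 1024;  // placeholder default

// Parse a positive decimal integer from `raw`. Returns `fallback` when `raw`
// is null, does not start with a digit (rejects "", " 5", "-5"), has trailing
// characters, is zero, or does not fit in std::size_t.
std::size_t parse_pool_size(const char* raw, std::size_t fallback) {
  if (raw == nullptr || !std::isdigit(static_cast<unsigned char>(*raw))) {
    return fallback;
  }
  errno = 0;
  char* end = nullptr;
  const unsigned long long value = std::strtoull(raw, &end, 10);
  if (errno == ERANGE || *end != '\0' || value == 0) return fallback;
  if (value > std::numeric_limits<std::size_t>::max()) return fallback;
  return static_cast<std::size_t>(value);
}

// Read the environment once and cache the result, so every caller sees the
// same size even if the environment changes after the pool is created.
std::size_t tensor_handle_pool_size() {
  static const std::size_t size =
      parse_pool_size(std::getenv(kPoolSizeEnvVar), kDefaultPoolSize);
  return size;
}
```

Falling back to the default on malformed input, rather than aborting, keeps existing runs unaffected when the variable is absent; whether a hard error is preferable for malformed values is a design choice this sketch does not settle.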
Testing