48 changes: 47 additions & 1 deletion docs/debug/1_getting_started.rst
@@ -149,10 +149,11 @@ Inspecting the logs
-------------------


Let's look at the files with the logs. Two files will be created:
Let's look at the files with the logs. At least two files will be created:

1. debug logs.
2. statistics logs.
3. optional feature-specific logs (for example, AutoswitchGemm metrics).

Let's look inside them!

@@ -214,6 +215,51 @@ The second log file (``nvdlfw_inspect_statistics_logs/nvdlfw_inspect_globalrank-
INFO - transformer_layer.self_attention.layernorm_qkv_activation_std iteration=000004 value=0.9996
INFO - transformer_layer.self_attention.layernorm_qkv_activation_l1_norm iteration=000004 value=130776.7969

AutoswitchGemm quick guide
--------------------------

``AutoswitchGemm`` monitors quantization quality and can dynamically switch selected GEMMs
to high precision when thresholds are exceeded.

Minimal config example:

.. code-block:: yaml

    autoswitch_fc_layers:
      enabled: True
      layers:
        layer_types: [fc1, fc2]
      transformer_engine:
        AutoswitchGemm:
          enabled: True
          gemms: [fprop, dgrad, wgrad]
          underflow_threshold_pct: 1.0
          mse_threshold: 1.0e-4
          # Needed only if the layer uses fp8 model parameters and
          # you want fprop/dgrad to be able to switch to high precision.
          allow_fp8_model_params_dequantized_weight: False
          freq: 1

Behavior summary:

1. For each ``(layer, gemm)``, AutoswitchGemm tracks the latest tensor metrics and applies
   OR logic across monitored tensors: if any tensor breaches thresholds, that GEMM switches.
2. Metrics computed in iteration ``n`` are consumed in iteration ``n`` only.
3. If thresholds are not breached in the current iteration, the GEMM stays quantized.
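
The OR logic above can be sketched in a few lines of Python. This is only an illustration, not the feature's implementation: ``TensorMetrics`` and ``should_switch`` are hypothetical names, and the sketch assumes that a tensor "breaches" when either of its metrics is strictly greater than the corresponding threshold.

```python
from dataclasses import dataclass


@dataclass
class TensorMetrics:
    """Hypothetical container for one tensor's metrics in one iteration."""

    underflow_pct: float
    mse: float


def should_switch(metrics_by_tensor, underflow_threshold_pct, mse_threshold):
    """Return True if any monitored tensor breaches either threshold (assumed rule)."""
    return any(
        m.underflow_pct > underflow_threshold_pct or m.mse > mse_threshold
        for m in metrics_by_tensor.values()
    )


# fprop monitors activation and weight; here only the activation MSE is too high.
fprop_metrics = {
    "activation": TensorMetrics(underflow_pct=0.2, mse=3.0e-4),
    "weight": TensorMetrics(underflow_pct=0.1, mse=1.0e-5),
}
switch_fprop = should_switch(
    fprop_metrics, underflow_threshold_pct=1.0, mse_threshold=1.0e-4
)
```

With the thresholds from the minimal config, a single breaching tensor is enough for ``switch_fprop`` to be ``True``, even though the weight tensor is well within limits.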

When AutoswitchGemm is enabled, an additional directory with one log file per rank is created under ``log_dir``:

``nvdlfw_inspect_autoswitchgemm_logs/nvdlfw_inspect_globalrank-<rank>.log``

It contains per-rank, per-iteration metrics such as:

- ``<layer>_<gemm>_<tensor>_underflow_pct``
- ``<layer>_<gemm>_<tensor>_mse``
- ``<layer>_<gemm>_quantized_enabled``
- ``<layer>_<gemm>_disable_until_iter``
- ``<layer>_<gemm>_switch_blocked_fp8_model_params``
- ``<layer>_<gemm>_fp8_model_params_dequantized_fallback``
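
For post-processing, a small parser can turn these entries into tuples. The sketch below assumes the AutoswitchGemm log uses the same ``<name> iteration=<n> value=<v>`` line layout as the statistics log shown earlier; that layout is an assumption for this file, and ``parse_metric_line`` and the sample layer name are hypothetical.

```python
import re

# Assumed line layout, borrowed from the statistics log:
#   INFO - <metric_name> iteration=<zero-padded int> value=<float>
LINE_RE = re.compile(
    r"(?P<name>\S+)\s+iteration=(?P<iteration>\d+)\s+value=(?P<value>\S+)"
)


def parse_metric_line(line):
    """Return (name, iteration, value), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return match["name"], int(match["iteration"]), float(match["value"])


# Hypothetical sample line; the layer name is made up for illustration.
sample = "INFO - decoder.1.mlp.fc1_fprop_activation_mse iteration=000004 value=0.0003"
parsed = parse_metric_line(sample)
```

Lines that do not follow the assumed layout return ``None``, so they can be skipped without failing the whole parse.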

Logging using TensorBoard
-------------------------

22 changes: 22 additions & 0 deletions docs/debug/2_config_file_structure.rst
@@ -220,6 +220,28 @@ We can use both structs for tensors and GEMMs. The tensors_struct should be nest
tensor_feature_param2: value
gemm_feature_param1: value

AutoswitchGemm notes
--------------------

``AutoswitchGemm`` supports both global and per-GEMM configuration.

- Use ``gemms: [...]`` for one shared policy.
- Use ``gemms_struct`` to set per-GEMM thresholds.

If ``tensors``/``tensors_struct`` are omitted, monitored tensors are inferred from GEMMs:

- ``fprop`` -> ``activation``, ``weight``
- ``dgrad`` -> ``gradient``, ``weight``
- ``wgrad`` -> ``activation``, ``gradient``
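
The mapping above is simple enough to express as a lookup table. The following is a sketch only: ``GEMM_TO_TENSORS`` and ``infer_monitored_tensors`` are hypothetical names, not part of the feature's API.

```python
# Default tensors monitored for each GEMM, as listed above.
GEMM_TO_TENSORS = {
    "fprop": ("activation", "weight"),
    "dgrad": ("gradient", "weight"),
    "wgrad": ("activation", "gradient"),
}


def infer_monitored_tensors(gemms):
    """Map each selected GEMM to the tensors it monitors by default."""
    return {gemm: GEMM_TO_TENSORS[gemm] for gemm in gemms}


inferred = infer_monitored_tensors(["fprop", "wgrad"])
# inferred == {"fprop": ("activation", "weight"),
#              "wgrad": ("activation", "gradient")}
```

Note that ``weight`` appears only under ``fprop`` and ``dgrad``, which is why the FP8 model parameter setting matters for those two GEMMs.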

Other important keys:

- ``underflow_threshold_pct``: switch trigger based on underflow percentage.
- ``mse_threshold``: switch trigger based on quantization MSE.
- ``allow_fp8_model_params_dequantized_weight``: allows ``fprop``/``dgrad`` switching
  for layers with FP8 model parameters by using dequantized temporary weights.

Metrics are consumed in the same iteration in which they are computed.
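
How ``allow_fp8_model_params_dequantized_weight`` interacts with a threshold breach can be sketched as a small decision function. This is an interpretation, not the actual implementation: ``resolve_switch`` and its return labels are hypothetical, and the sketch assumes that ``wgrad`` is unaffected because it does not monitor the weight tensor.

```python
def resolve_switch(gemm, breached, fp8_model_params, allow_dequantized_weight):
    """Decide what happens to one GEMM in one iteration (illustrative only).

    Returns one of the hypothetical labels:
    "quantized", "high_precision", "blocked", "dequantized_fallback".
    """
    if not breached:
        # No threshold breach: the GEMM stays quantized.
        return "quantized"
    # fprop and dgrad consume the weight, so FP8 model parameters matter there.
    if gemm in ("fprop", "dgrad") and fp8_model_params:
        if allow_dequantized_weight:
            # Switch by using a temporary dequantized copy of the weight.
            return "dequantized_fallback"
        # Switching is not allowed: keep the GEMM quantized.
        return "blocked"
    return "high_precision"


blocked = resolve_switch("fprop", True, fp8_model_params=True,
                         allow_dequantized_weight=False)
fallback = resolve_switch("dgrad", True, fp8_model_params=True,
                          allow_dequantized_weight=True)
```

Under these assumptions ``blocked`` is ``"blocked"`` and ``fallback`` is ``"dequantized_fallback"``; the two labels loosely mirror the ``switch_blocked_fp8_model_params`` and ``fp8_model_params_dequantized_fallback`` log entries.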

Enabling or Disabling Sections and Features
-------------------------------------------

1 change: 1 addition & 0 deletions docs/debug/3_api_features.rst
@@ -10,6 +10,7 @@ Debug features
.. autoapiclass:: transformer_engine.debug.features.log_fp8_tensor_stats.LogFp8TensorStats
.. autoapiclass:: transformer_engine.debug.features.log_nvfp4_tensor_stats.LogNvfp4TensorStats
.. autoapiclass:: transformer_engine.debug.features.disable_quantization_gemm.DisableQuantizationGEMM
.. autoapiclass:: transformer_engine.debug.features.autoswitch_gemm.AutoswitchGemm
.. autoapiclass:: transformer_engine.debug.features.disable_quantization_layer.DisableQuantizationLayer
.. autoapiclass:: transformer_engine.debug.features.per_tensor_scaling.PerTensorScaling
.. autoapiclass:: transformer_engine.debug.features.fake_quant.FakeQuant
72 changes: 72 additions & 0 deletions docs/debug/autoswitch_gemm_example.yaml
@@ -0,0 +1,72 @@
# Example config for transformer_engine.debug.features.autoswitch_gemm.AutoswitchGemm
#
# Usage:
#   import nvdlfw_inspect.api as debug_api
#   debug_api.initialize(
#       config_file="docs/debug/autoswitch_gemm_example.yaml",
#       feature_dirs=["transformer_engine/debug/features"],
#       log_dir="./log",
#   )
#   ...
#   debug_api.step()  # call once per training step

autoswitch_attention_blocks:
  enabled: True
  layers:
    # Match attention linear layers, e.g. *.qkv / *.proj
    layer_name_regex_pattern: ".*(qkv|proj).*"
  transformer_engine:
    AutoswitchGemm:
      enabled: True

      # Optional. If omitted, tensors are inferred from selected gemms:
      # fprop -> [activation, weight], dgrad -> [gradient, weight],
      # wgrad -> [activation, gradient].
      tensors: [activation, weight, gradient]

      # Per-GEMM switching policy.
      gemms_struct:
        - gemm: fprop
          underflow_threshold_pct: 1.0
          mse_threshold: 1.0e-4
        - gemm: dgrad
          underflow_threshold_pct: 1.5
          mse_threshold: 1.5e-4
        - gemm: wgrad
          underflow_threshold_pct: 2.0
          mse_threshold: 2.0e-4

      # For layers with fp8 model parameters:
      # - False: keep fprop/dgrad quantized
      # - True: allow high-precision switch via temporary dequantized weights
      allow_fp8_model_params_dequantized_weight: False

      # Collect metrics every step after warmup.
      freq: 1
      start_step: 10
      end_step: 5000


autoswitch_mlp_blocks:
  enabled: True
  layers:
    layer_types: [fc1, fc2]
  transformer_engine:
    AutoswitchGemm:
      enabled: True

      # Simpler global policy (shared by selected GEMMs).
      gemms: [fprop, wgrad]
      tensors: [activation, weight, gradient]

      underflow_threshold_pct: 3.0
      mse_threshold: 3.0e-4

      # Example sparse monitoring windows.
      freq: 2
      start_end_list:
        - [0, 300]
        - [800, 3000]

# Autoswitch per-rank metrics are written to:
# <log_dir>/nvdlfw_inspect_autoswitchgemm_logs/nvdlfw_inspect_globalrank-<rank>.log