37 changes: 37 additions & 0 deletions README.md
@@ -183,6 +183,43 @@ bitsandbytes has the following minimum requirements for all platforms:
* 🤗 [Diffusers](https://huggingface.co/docs/diffusers/quantization/bitsandbytes)
* 🤗 [PEFT](https://huggingface.co/docs/peft/developer_guides/quantization#quantize-a-model)

## Telemetry

`bitsandbytes` sends anonymous, aggregate feature-usage telemetry to the
Hugging Face Hub. This data is used to prioritize maintenance (which quantization
methods and optimizers are actually in use?) and to safely retire features that
are no longer called by anyone.

### What is collected

* A session fingerprint sent once per process: `bitsandbytes` version, OS
name/version, CPU architecture, Python/PyTorch versions, accelerator
vendor/name/arch/count (e.g. `nvidia`, `NVIDIA H100`, `sm_90`, `1`).
* One event per distinct feature used, with feature-specific flags. For
example: using `Linear4bit` sends `quant_type=nf4`, `blocksize=64`; using
`AdamW8bit.step()` sends `name=AdamW8bit`, `is_paged=false`.
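
For illustration, the combined metadata for a single `Linear4bit` event might look like the following sketch (all values hypothetical; key names follow the code in `bitsandbytes/_telemetry.py`):

```python
# Hypothetical sketch of one event's metadata: the session fingerprint
# plus the feature-specific flags. Keys are namespaced under
# "bitsandbytes." so they don't collide with other libraries' fields.
event = {
    "bitsandbytes.version": "x.y.z",  # placeholder version
    "bitsandbytes.os": "Linux",
    "bitsandbytes.arch": "x86_64",
    "bitsandbytes.accel": "nvidia",
    "bitsandbytes.accel_name": "NVIDIA H100",
    "bitsandbytes.accel_arch": "sm_90",
    "bitsandbytes.accel_count": "1",
    "bitsandbytes.feature": "linear_4bit",
    "bitsandbytes.quant_type": "nf4",
    "bitsandbytes.blocksize": "64",
}
# Every key is namespaced; no model names, paths, or user input appear.
assert all(key.startswith("bitsandbytes.") for key in event)
print(event["bitsandbytes.feature"])  # linear_4bit
```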

### What is never collected

Model names, file paths, tensor shapes, parameter values, user identifiers, or
anything derived from user input.

### How to opt out

Set any one of these environment variables:

| Variable | Scope |
| ---------------------------- | ---------------------------- |
| `BNB_DISABLE_TELEMETRY=1` | `bitsandbytes` only |
| `HF_HUB_DISABLE_TELEMETRY=1` | all Hugging Face libraries |
| `HF_HUB_OFFLINE=1` | all Hugging Face libraries |

Telemetry is also suppressed automatically when running under `pytest` (so
CI and local test runs don't pollute the stream), and it becomes a silent
no-op when `huggingface_hub` is not installed. The implementation lives in
[`bitsandbytes/_telemetry.py`](bitsandbytes/_telemetry.py), and each event
fires at most once per process.
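
For example, to opt out for the current shell session (a sketch; any of the variables in the table works the same way):

```shell
# Disable telemetry for bitsandbytes only, for this shell session
export BNB_DISABLE_TELEMETRY=1

# Or disable telemetry across all Hugging Face libraries
export HF_HUB_DISABLE_TELEMETRY=1

echo "$BNB_DISABLE_TELEMETRY"  # prints: 1
```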
> **Review comment (Member) on lines +186 to +221:** Let's simplify this:

Suggested change (the `## Telemetry` section above would be replaced with):

## Telemetry

bitsandbytes collects anonymous feature-usage data using the same telemetry
mechanism as other Hugging Face libraries (Transformers, Gradio, etc.). This
helps us understand which features are actively used so we can prioritize
maintenance and make informed decisions about deprecation.

### What is collected

Hardware and version info sent once per process (bitsandbytes version, OS, CPU
architecture, accelerator type and compute capability), plus one event per
distinct feature used per process.

### How to opt out

Set any of the following environment variables:

| Variable                     | Effect                              |
| ---------------------------- | ----------------------------------- |
| `HF_HUB_DISABLE_TELEMETRY=1` | Disables telemetry in all HF libs   |
| `HF_HUB_OFFLINE=1`           | Disables all outbound HF Hub calls  |
| `DO_NOT_TRACK=1`             | Standard cross-tool opt-out signal  |


## :heart: Sponsors
The continued maintenance and development of `bitsandbytes` is made possible thanks to the generous support of our sponsors. Their contributions help ensure that we can keep improving the project and delivering valuable updates to the community.

231 changes: 231 additions & 0 deletions bitsandbytes/_telemetry.py
@@ -0,0 +1,231 @@
# Copyright (c) Facebook, Inc. and its affiliates.
> **Review comment (Member):** minor nit: copyright here is wrong, let's take this out or replace with more appropriate

#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
"""Anonymous feature-usage telemetry for bitsandbytes.

Sends one HEAD request per distinct feature per process via
`huggingface_hub.utils.send_telemetry()`. Data lands in the Hugging Face
Hub telemetry index under `path_prefix == "/api/telemetry/bitsandbytes/"`
and informs maintenance and deprecation decisions.

What is collected
- Session fingerprint (once per process, first feature use):
bnb version, OS name/version, CPU arch, glibc version, Python/torch
versions, accelerator vendor/name/arch/count.
- Per-feature events: feature name plus feature-specific metadata
(e.g. `quant_type="nf4"`, `bits="8"`, `paged="true"`).

What is NOT collected
Model names, file paths, parameter shapes, user identifiers, training
data, gradient values, or any value derived from user input.

Automatically disabled when running under pytest (detected via
`pytest` in `sys.modules` or `PYTEST_CURRENT_TEST` env var) so that test
runs in CI and locally do not pollute the real-usage stream.

Opt-out (any of the following env vars disables all telemetry):
- BNB_DISABLE_TELEMETRY=1 (bitsandbytes only)

> **Review comment (Member):** I am not sure we need to roll our own; it's cleaner to just reuse the existing? I don't see a use case for e.g. opting out of HF Hub telemetry but still opting in for BNB.


- HF_HUB_DISABLE_TELEMETRY=1 (all HF libraries)
- HF_HUB_OFFLINE=1 (all HF libraries)

End-to-end verification:
Set `BNB_TELEMETRY_TAG=<some-id>` before importing bitsandbytes and the
value is attached as `bitsandbytes.tag` on every event. Use this to
correlate a single run's events in ES.
> **Review comment (Member) on lines +32 to +35:** Seems unnecessary/overkill?


No-ops silently if `huggingface_hub` is not installed, and never raises.

Keys are namespaced under `bitsandbytes.*` in the resulting
`metadata.bitsandbytes.*` fields so they do not collide with fields logged
by other libraries in the shared telemetry index.
"""

from __future__ import annotations

import logging
import os
import platform
import sys
from typing import Optional

logger = logging.getLogger(__name__)

_REPORTED: set[str] = set()
_FINGERPRINT: Optional[dict[str, str]] = None

_TRUTHY = frozenset({"1", "true", "yes", "on"})


def _is_pytest() -> bool:
"""Detect whether we are running inside a pytest process.

Telemetry is suppressed during test runs so that CI and local test
invocations don't pollute the real-usage stream. Tests that want to
assert on telemetry behavior monkey-patch this function to return False.
"""
return "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
> **Review comment (Member) on lines +60 to +67:** I would consider looking at other env variables and not bother with the `"pytest" in sys.modules` condition. Most CI platforms will have an env var like `CI` for this.



def _is_disabled() -> bool:
for var in ("BNB_DISABLE_TELEMETRY", "HF_HUB_DISABLE_TELEMETRY", "HF_HUB_OFFLINE"):
if os.environ.get(var, "").strip().lower() in _TRUTHY:
return True
if _is_pytest():
return True
return False


def _os_info() -> tuple[str, str]:
os_name = platform.system()
os_name = {"Darwin": "macOS"}.get(os_name, os_name)
if os_name == "Windows":
try:
build = sys.getwindowsversion().build
os_version = f"11 (build {build})" if build >= 22000 else f"10 (build {build})"
> **Review comment (Member):** This seems fragile and also ignores Windows Server etc.

except Exception:
os_version = platform.release()
elif os_name == "macOS":
os_version = platform.mac_ver()[0] or platform.release()
else:
os_version = platform.release()
return os_name, os_version


def _accel_info() -> dict[str, str]:
info: dict[str, str] = {}
try:
import torch
except ImportError:
> **Review comment (Member) on lines +97 to +99:** torch is already a pretty hard dep; this shouldn't need to be caught.

info["bitsandbytes.accel"] = "unknown"
return info

try:
if torch.cuda.is_available():
vendor = "amd" if getattr(torch.version, "hip", None) else "nvidia"
info["bitsandbytes.accel"] = vendor
info["bitsandbytes.accel_count"] = str(torch.cuda.device_count())
props = torch.cuda.get_device_properties(0)
info["bitsandbytes.accel_name"] = props.name
if vendor == "nvidia":
info["bitsandbytes.accel_arch"] = f"sm_{props.major}{props.minor}"
else:
info["bitsandbytes.accel_arch"] = getattr(props, "gcnArchName", "unknown")
return info

> **Review comment (Member) on lines +104 to +115:** This only looks at the first device; I'm not sure, but we may be interested when there are multiple devices and they're different. I'm wondering if for that we just add some sort of flag to tell us whether it is a heterogeneous system or not. Likely it is, but it may be valuable to find out otherwise.
>
> Let's grab device 0's SM count and memory. We don't really need the name. So this should be, for both AMD and NVIDIA, the `multi_processor_count` and `total_memory` properties. Keep `gcnArchName` and major/minor.

if hasattr(torch, "xpu") and torch.xpu.is_available():
info["bitsandbytes.accel"] = "xpu"
info["bitsandbytes.accel_count"] = str(torch.xpu.device_count())
try:
info["bitsandbytes.accel_name"] = torch.xpu.get_device_properties(0).name
except Exception:
pass
return info

if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
info["bitsandbytes.accel"] = "mps"
return info

if hasattr(torch, "hpu") and torch.hpu.is_available():
info["bitsandbytes.accel"] = "hpu"
return info
except Exception:
pass

info["bitsandbytes.accel"] = "cpu"
return info


def _fingerprint() -> dict[str, str]:
global _FINGERPRINT
if _FINGERPRINT is not None:
return _FINGERPRINT

try:
import bitsandbytes

version = bitsandbytes.__version__
except Exception:
version = "unknown"

os_name, os_version = _os_info()
info = {
"bitsandbytes.version": version,
"bitsandbytes.os": os_name,
"bitsandbytes.os_version": os_version,
"bitsandbytes.arch": platform.machine(),
"bitsandbytes.python": platform.python_version(),
> **Review comment (Member):** I think this is redundant too; huggingface_hub likely includes the Python version already.

}
if os_name == "Linux":
try:
libc_name, libc_ver = platform.libc_ver()
if libc_name:
info["bitsandbytes.libc"] = f"{libc_name}-{libc_ver}"
except Exception:
pass
try:
import torch

info["bitsandbytes.torch"] = torch.__version__
except ImportError:
pass
> **Review comment (Member) on lines +166 to +171:** I think this is redundant; does hf hub automatically collect this?


info.update(_accel_info())

_FINGERPRINT = info
return info


def report_feature(feature: str, details: Optional[dict[str, object]] = None) -> None:
> **Review comment (Member):** I think for more clarity we should just name this `_report_feature` as well.

"""Report that a bitsandbytes feature was used.

Fires at most once per `feature` per process. Subsequent calls with the
same `feature` are O(1) no-ops.

Args:
feature: Short feature name. Becomes the final URL path segment:
`/api/telemetry/bitsandbytes/{feature}` (so it appears as
`path_filename` in ES queries).
details: Optional feature-specific key/value metadata. Keys without a
`bitsandbytes.` prefix are prefixed automatically.
"""
if feature in _REPORTED:
return
_REPORTED.add(feature)

if _is_disabled():
return
> **Review comment (matthewdouglas, Apr 20, 2026) on lines +192 to +197:** We may want to do de-duping more granular than just the "feature" name as it is. But maybe we just name the features differently in that case, so that's more of a minor nit.
>
> Should we add to `_REPORTED` even when disabled? Seems to me we should just exit right away.


try:
from huggingface_hub.utils import send_telemetry
except ImportError:
return

fingerprint = _fingerprint()
user_agent = dict(fingerprint)
user_agent["bitsandbytes.feature"] = feature
if details:
for k, v in details.items():
key = k if k.startswith("bitsandbytes.") else f"bitsandbytes.{k}"
user_agent[key] = str(v)

tag = os.environ.get("BNB_TELEMETRY_TAG", "").strip()
if tag:
user_agent["bitsandbytes.tag"] = tag
> **Review comment (Member) on lines +212 to +214:** Same as the comment earlier, seems unnecessary.


try:
send_telemetry(
topic=f"bitsandbytes/{feature}",
library_name="bitsandbytes",
library_version=fingerprint.get("bitsandbytes.version", "unknown"),
user_agent=user_agent,
)
except Exception as e:
logger.debug("bitsandbytes telemetry send failed: %s", e)


def _reset_for_testing() -> None:
"""Clear module state. Intended for use in test fixtures only."""
global _FINGERPRINT
_REPORTED.clear()
_FINGERPRINT = None
3 changes: 3 additions & 0 deletions bitsandbytes/functional.py
@@ -12,6 +12,7 @@
import torch
from torch import Tensor

from bitsandbytes._telemetry import report_feature
from bitsandbytes.utils import pack_dict_to_tensor, unpack_tensor_to_dict

from .cextension import lib
@@ -1593,6 +1594,8 @@ def int8_double_quant(
- `torch.Tensor` with dtype `torch.int32`, *optional*: A list of column indices which contain outlier features.
"""

report_feature("int8_double_quant")

if row_stats is not None:
raise ValueError("row_stats must be None. int8_double_quant() does not support pre-allocated row_stats.")
if col_stats is not None:
34 changes: 34 additions & 0 deletions bitsandbytes/nn/modules.py
@@ -11,6 +11,7 @@
import torch.nn.functional as F

import bitsandbytes as bnb
from bitsandbytes._telemetry import report_feature
from bitsandbytes.functional import (
QuantState,
_convert_weight_packed_for_cpu,
@@ -97,6 +98,7 @@ def __init__(
)
self.norm = torch.nn.LayerNorm(embedding_dim, device=device)
GlobalOptimManager.get_instance().register_module_override(self, "weight", {"optim_bits": 32})
report_feature("embedding", {"variant": "stable"})

def reset_parameters(self) -> None:
torch.nn.init.xavier_uniform_(self.weight)
@@ -179,6 +181,7 @@ def __init__(
device=device,
)
GlobalOptimManager.get_instance().register_module_override(self, "weight", {"optim_bits": 32})
report_feature("embedding", {"variant": "standard"})

def reset_parameters(self) -> None:
torch.nn.init.xavier_uniform_(self.weight)
@@ -239,6 +242,15 @@ def __new__(
self.bnb_quantized = bnb_quantized
self.data = data
self.module = module
report_feature(
"params_4bit",
{
"quant_type": quant_type,
"blocksize": blocksize,
"compress_statistics": compress_statistics,
"quant_storage": str(quant_storage).replace("torch.", ""),
},
> **Review comment (Member) on lines +245 to +252:** Starting to think we don't need this here; would prefer we just keep Linear4bit and Linear8bitLt but remove this on Params4bit/Int8Params.
)
return self

def __getstate__(self):
@@ -607,6 +619,16 @@ def _save_to_state_dict(self, destination, prefix, keep_vars):
destination[prefix + "weight." + k] = v if keep_vars else v.detach()

def forward(self, x: torch.Tensor):
report_feature(
"linear_4bit",
{
"quant_type": getattr(self.weight, "quant_type", "unknown"),
"blocksize": getattr(self.weight, "blocksize", 0),
"compress_statistics": getattr(self.weight, "compress_statistics", False),
"input_dtype": str(x.dtype).replace("torch.", ""),
"compute_dtype": (str(self.compute_dtype).replace("torch.", "") if self.compute_dtype else "auto"),
},
)
> **Review comment (Member) on lines 621 to +631:** I would prefer we do this in `__init__` rather than add unnecessary overhead in the `forward()` hot path. Plus most of this is in `__init__`; you don't need all these getattr calls.

fix_4bit_weight_quant_state_from_module(self)
quant_state = self.weight.quant_state

@@ -732,6 +754,7 @@ def __new__(
obj.CB = CB
obj.SCB = SCB
obj.has_fp16_weights = has_fp16_weights
report_feature("int8_params", {"has_fp16_weights": has_fp16_weights})
return obj

def _quantize(self, device):
@@ -855,6 +878,7 @@ def __init__(self, num_embeddings, embedding_dim, device=None, dtype=None):
self.dtype = self.weight.data.dtype

self.weight = Int8Params(self.weight.data, has_fp16_weights=False, requires_grad=False)
report_feature("embedding", {"variant": "8bit"})

def _save_to_state_dict(self, destination, prefix, keep_vars):
raise NotImplementedError("Saving Embedding8bit module is not implemented")
@@ -926,6 +950,7 @@ def __init__(
f"Embedding size {embedding_dim} is not divisible by block size {blocksize}. "
"This will lead to slow inference.",
)
report_feature("embedding", {"variant": "4bit", "quant_type": quant_type})

def _forward_with_partial_dequantize(self, input: Tensor):
assert self.embedding_dim % self.weight.quant_state.blocksize == 0
@@ -1178,6 +1203,14 @@ def to(self, *args, **kwargs):
return result

def forward(self, x: torch.Tensor):
report_feature(
"linear_8bit",
{
"has_fp16_weights": self.state.has_fp16_weights,
"threshold": self.state.threshold,
"input_dtype": str(x.dtype).replace("torch.", ""),
},
)
> **Review comment (Member) on lines +1206 to +1213:** Same comment as with Linear4bit; this is best in `__init__` and not in `forward`.
self.state.is_training = self.training
if self.weight.CB is not None:
self.init_8bit_state()
@@ -1199,6 +1232,7 @@ def __init__(self, input_features, output_features, bias=True, device=None):
super().__init__(input_features, output_features, bias, device)
self.outlier_dim = None
self.is_quantized = False
report_feature("outlier_aware_linear")

def forward_with_outliers(self, x, outlier_idx):
raise NotImplementedError("Please override the `forward_with_outliers(self, x, outlier_idx)` function")