37 changes: 37 additions & 0 deletions README.md
@@ -183,6 +183,43 @@ bitsandbytes has the following minimum requirements for all platforms:
* 🤗 [Diffusers](https://huggingface.co/docs/diffusers/quantization/bitsandbytes)
* 🤗 [PEFT](https://huggingface.co/docs/peft/developer_guides/quantization#quantize-a-model)

## Telemetry

`bitsandbytes` sends anonymous, aggregate feature-usage telemetry to the
Hugging Face Hub. This data is used to prioritize maintenance (which quantization
methods and optimizers are actually in use?) and to safely retire features that
are no longer called by anyone.

### What is collected

* A session fingerprint sent once per process: `bitsandbytes` version, OS
name/version, CPU architecture, Python/PyTorch versions, accelerator
vendor/name/arch/count (e.g. `nvidia`, `NVIDIA H100`, `sm_90`, `1`).
* One event per distinct feature used, with feature-specific flags. For
example: using `Linear4bit` sends `quant_type=nf4`, `blocksize=64`; using
`AdamW8bit.step()` sends `name=AdamW8bit`, `is_paged=false`.
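
For illustration, the combined metadata for a single `Linear4bit` event might look like the following sketch (all values hypothetical; key names follow the code in `bitsandbytes/_telemetry.py`):

```python
# Hypothetical sketch of one event's metadata: the session fingerprint
# plus the feature-specific flags. Keys are namespaced under
# "bitsandbytes." so they don't collide with other libraries' fields.
event = {
    "bitsandbytes.version": "x.y.z",  # placeholder version
    "bitsandbytes.os": "Linux",
    "bitsandbytes.arch": "x86_64",
    "bitsandbytes.accel": "nvidia",
    "bitsandbytes.accel_name": "NVIDIA H100",
    "bitsandbytes.accel_arch": "sm_90",
    "bitsandbytes.accel_count": "1",
    "bitsandbytes.feature": "linear_4bit",
    "bitsandbytes.quant_type": "nf4",
    "bitsandbytes.blocksize": "64",
}
# Every key is namespaced; no model names, paths, or user input appear.
assert all(key.startswith("bitsandbytes.") for key in event)
print(event["bitsandbytes.feature"])  # linear_4bit
```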

### What is never collected

Model names, file paths, tensor shapes, parameter values, user identifiers, or
anything derived from user input.

### How to opt out

Set any one of these environment variables:

| Variable | Scope |
| ---------------------------- | ---------------------------- |
| `BNB_DISABLE_TELEMETRY=1` | `bitsandbytes` only |
| `HF_HUB_DISABLE_TELEMETRY=1` | all Hugging Face libraries |
| `HF_HUB_OFFLINE=1` | all Hugging Face libraries |

Telemetry is also suppressed automatically when running under `pytest` (so
CI and local test runs don't pollute the stream), and it becomes a silent
no-op when `huggingface_hub` is not installed. The implementation lives in
[`bitsandbytes/_telemetry.py`](bitsandbytes/_telemetry.py), and each event
fires at most once per process.
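
For example, to opt out for the current shell session (a sketch; any of the variables in the table works the same way):

```shell
# Disable telemetry for bitsandbytes only, for this shell session
export BNB_DISABLE_TELEMETRY=1

# Or disable telemetry across all Hugging Face libraries
export HF_HUB_DISABLE_TELEMETRY=1

echo "$BNB_DISABLE_TELEMETRY"  # prints: 1
```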
> **Review comment (Member) on lines +186 to +221:** Let's simplify this:

Suggested change (the `## Telemetry` section above would be replaced with):

## Telemetry

bitsandbytes collects anonymous feature-usage data using the same telemetry
mechanism as other Hugging Face libraries (Transformers, Gradio, etc.). This
helps us understand which features are actively used so we can prioritize
maintenance and make informed decisions about deprecation.

### What is collected

Hardware and version info sent once per process (bitsandbytes version, OS, CPU
architecture, accelerator type and compute capability), plus one event per
distinct feature used per process.

### How to opt out

Set any of the following environment variables:

| Variable                     | Effect                              |
| ---------------------------- | ----------------------------------- |
| `HF_HUB_DISABLE_TELEMETRY=1` | Disables telemetry in all HF libs   |
| `HF_HUB_OFFLINE=1`           | Disables all outbound HF Hub calls  |
| `DO_NOT_TRACK=1`             | Standard cross-tool opt-out signal  |


## :heart: Sponsors
The continued maintenance and development of `bitsandbytes` is made possible thanks to the generous support of our sponsors. Their contributions help ensure that we can keep improving the project and delivering valuable updates to the community.

231 changes: 231 additions & 0 deletions bitsandbytes/_telemetry.py
@@ -0,0 +1,231 @@
# Copyright (c) Facebook, Inc. and its affiliates.
> **Review comment (Member):** minor nit: copyright here is wrong, let's take this out or replace with more appropriate

#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
"""Anonymous feature-usage telemetry for bitsandbytes.

Sends one HEAD request per distinct feature per process via
`huggingface_hub.utils.send_telemetry()`. Data lands in the Hugging Face
Hub telemetry index under `path_prefix == "/api/telemetry/bitsandbytes/"`
and informs maintenance and deprecation decisions.

What is collected
- Session fingerprint (once per process, first feature use):
bnb version, OS name/version, CPU arch, glibc version, Python/torch
versions, accelerator vendor/name/arch/count.
- Per-feature events: feature name plus feature-specific metadata
(e.g. `quant_type="nf4"`, `bits="8"`, `paged="true"`).

What is NOT collected
Model names, file paths, parameter shapes, user identifiers, training
data, gradient values, or any value derived from user input.

Automatically disabled when running under pytest (detected via
`pytest` in `sys.modules` or `PYTEST_CURRENT_TEST` env var) so that test
runs in CI and locally do not pollute the real-usage stream.

Opt-out (any of the following env vars disables all telemetry):
- BNB_DISABLE_TELEMETRY=1 (bitsandbytes only)

> **Review comment (Member):** I am not sure we need to roll our own; it's cleaner to just reuse the existing? I don't see a use case for e.g. opting out of HF Hub telemetry but still opting in for BNB.


- HF_HUB_DISABLE_TELEMETRY=1 (all HF libraries)
- HF_HUB_OFFLINE=1 (all HF libraries)

End-to-end verification:
Set `BNB_TELEMETRY_TAG=<some-id>` before importing bitsandbytes and the
value is attached as `bitsandbytes.tag` on every event. Use this to
correlate a single run's events in ES.
> **Review comment (Member) on lines +32 to +35:** Seems unnecessary/overkill?


No-ops silently if `huggingface_hub` is not installed, and never raises.

Keys are namespaced under `bitsandbytes.*` in the resulting
`metadata.bitsandbytes.*` fields so they do not collide with fields logged
by other libraries in the shared telemetry index.
"""

from __future__ import annotations

import logging
import os
import platform
import sys
from typing import Optional

logger = logging.getLogger(__name__)

_REPORTED: set[str] = set()
_FINGERPRINT: Optional[dict[str, str]] = None

_TRUTHY = frozenset({"1", "true", "yes", "on"})


def _is_pytest() -> bool:
"""Detect whether we are running inside a pytest process.

Telemetry is suppressed during test runs so that CI and local test
invocations don't pollute the real-usage stream. Tests that want to
assert on telemetry behavior monkey-patch this function to return False.
"""
return "pytest" in sys.modules or "PYTEST_CURRENT_TEST" in os.environ
> **Review comment (Member) on lines +60 to +67:** I would consider looking at other env variables and not bother with the `"pytest" in sys.modules` condition. Most CI platforms will have an env var like `CI` for this.



def _is_disabled() -> bool:
for var in ("BNB_DISABLE_TELEMETRY", "HF_HUB_DISABLE_TELEMETRY", "HF_HUB_OFFLINE"):
if os.environ.get(var, "").strip().lower() in _TRUTHY:
return True
if _is_pytest():
return True
return False


def _os_info() -> tuple[str, str]:
os_name = platform.system()
os_name = {"Darwin": "macOS"}.get(os_name, os_name)
if os_name == "Windows":
try:
build = sys.getwindowsversion().build
os_version = f"11 (build {build})" if build >= 22000 else f"10 (build {build})"
> **Review comment (Member):** This seems fragile and also ignores Windows Server etc.

except Exception:
os_version = platform.release()
elif os_name == "macOS":
os_version = platform.mac_ver()[0] or platform.release()
else:
os_version = platform.release()
return os_name, os_version


def _accel_info() -> dict[str, str]:
info: dict[str, str] = {}
try:
import torch
except ImportError:
> **Review comment (Member) on lines +97 to +99:** torch is already a pretty hard dep; this shouldn't need to be caught.

info["bitsandbytes.accel"] = "unknown"
return info

try:
if torch.cuda.is_available():
vendor = "amd" if getattr(torch.version, "hip", None) else "nvidia"
info["bitsandbytes.accel"] = vendor
info["bitsandbytes.accel_count"] = str(torch.cuda.device_count())
props = torch.cuda.get_device_properties(0)
info["bitsandbytes.accel_name"] = props.name
if vendor == "nvidia":
info["bitsandbytes.accel_arch"] = f"sm_{props.major}{props.minor}"
else:
info["bitsandbytes.accel_arch"] = getattr(props, "gcnArchName", "unknown")
return info

> **Review comment (Member) on lines +104 to +115:** This only looks at the first device; I'm not sure, but we may be interested when there are multiple devices and they're different. I'm wondering if for that we just add some sort of flag to tell us whether it is a heterogeneous system or not. Likely it is, but it may be valuable to find out otherwise.
>
> Let's grab device 0's SM count and memory. We don't really need the name. So this should be, for both AMD and NVIDIA, the `multi_processor_count` and `total_memory` properties. Keep `gcnArchName` and major/minor.

if hasattr(torch, "xpu") and torch.xpu.is_available():
info["bitsandbytes.accel"] = "xpu"
info["bitsandbytes.accel_count"] = str(torch.xpu.device_count())
try:
info["bitsandbytes.accel_name"] = torch.xpu.get_device_properties(0).name
except Exception:
pass
return info

if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
info["bitsandbytes.accel"] = "mps"
return info

if hasattr(torch, "hpu") and torch.hpu.is_available():
info["bitsandbytes.accel"] = "hpu"
return info
except Exception:
pass

info["bitsandbytes.accel"] = "cpu"
return info


def _fingerprint() -> dict[str, str]:
global _FINGERPRINT
if _FINGERPRINT is not None:
return _FINGERPRINT

try:
import bitsandbytes

version = bitsandbytes.__version__
except Exception:
version = "unknown"

os_name, os_version = _os_info()
info = {
"bitsandbytes.version": version,
"bitsandbytes.os": os_name,
"bitsandbytes.os_version": os_version,
"bitsandbytes.arch": platform.machine(),
"bitsandbytes.python": platform.python_version(),
> **Review comment (Member):** I think this is redundant too; huggingface_hub likely includes the Python version already.

}
if os_name == "Linux":
try:
libc_name, libc_ver = platform.libc_ver()
if libc_name:
info["bitsandbytes.libc"] = f"{libc_name}-{libc_ver}"
except Exception:
pass
try:
import torch

info["bitsandbytes.torch"] = torch.__version__
except ImportError:
pass
> **Review comment (Member) on lines +166 to +171:** I think this is redundant; does hf hub automatically collect this?


info.update(_accel_info())

_FINGERPRINT = info
return info


def report_feature(feature: str, details: Optional[dict[str, object]] = None) -> None:
> **Review comment (Member):** I think for more clarity we should just name this `_report_feature` as well.

"""Report that a bitsandbytes feature was used.

Fires at most once per `feature` per process. Subsequent calls with the
same `feature` are O(1) no-ops.

Args:
feature: Short feature name. Becomes the final URL path segment:
`/api/telemetry/bitsandbytes/{feature}` (so it appears as
`path_filename` in ES queries).
details: Optional feature-specific key/value metadata. Keys without a
`bitsandbytes.` prefix are prefixed automatically.
"""
if feature in _REPORTED:
return
_REPORTED.add(feature)

if _is_disabled():
return
> **Review comment (matthewdouglas, Apr 20, 2026) on lines +192 to +197:** We may want to do de-duping more granular than just the "feature" name as it is. But maybe we just name the features differently in that case, so that's more of a minor nit.
>
> Should we add to `_REPORTED` even when disabled? Seems to me we should just exit right away.


try:
from huggingface_hub.utils import send_telemetry
except ImportError:
return

fingerprint = _fingerprint()
user_agent = dict(fingerprint)
user_agent["bitsandbytes.feature"] = feature
if details:
for k, v in details.items():
key = k if k.startswith("bitsandbytes.") else f"bitsandbytes.{k}"
user_agent[key] = str(v)

tag = os.environ.get("BNB_TELEMETRY_TAG", "").strip()
if tag:
user_agent["bitsandbytes.tag"] = tag
> **Review comment (Member) on lines +212 to +214:** Same as the comment earlier, seems unnecessary.


try:
send_telemetry(
topic=f"bitsandbytes/{feature}",
library_name="bitsandbytes",
library_version=fingerprint.get("bitsandbytes.version", "unknown"),
user_agent=user_agent,
)
except Exception as e:
logger.debug("bitsandbytes telemetry send failed: %s", e)


def _reset_for_testing() -> None:
"""Clear module state. Intended for use in test fixtures only."""
global _FINGERPRINT
_REPORTED.clear()
_FINGERPRINT = None
3 changes: 3 additions & 0 deletions bitsandbytes/functional.py
@@ -12,6 +12,7 @@
import torch
from torch import Tensor

from bitsandbytes._telemetry import report_feature
from bitsandbytes.utils import pack_dict_to_tensor, unpack_tensor_to_dict

from .cextension import lib
@@ -1593,6 +1594,8 @@ def int8_double_quant(
- `torch.Tensor` with dtype `torch.int32`, *optional*: A list of column indices which contain outlier features.
"""

report_feature("int8_double_quant")

if row_stats is not None:
raise ValueError("row_stats must be None. int8_double_quant() does not support pre-allocated row_stats.")
if col_stats is not None:
34 changes: 34 additions & 0 deletions bitsandbytes/nn/modules.py
@@ -11,6 +11,7 @@
import torch.nn.functional as F

import bitsandbytes as bnb
from bitsandbytes._telemetry import report_feature
from bitsandbytes.functional import (
QuantState,
_convert_weight_packed_for_cpu,
@@ -97,6 +98,7 @@ def __init__(
)
self.norm = torch.nn.LayerNorm(embedding_dim, device=device)
GlobalOptimManager.get_instance().register_module_override(self, "weight", {"optim_bits": 32})
report_feature("embedding", {"variant": "stable"})

def reset_parameters(self) -> None:
torch.nn.init.xavier_uniform_(self.weight)
@@ -179,6 +181,7 @@ def __init__(
device=device,
)
GlobalOptimManager.get_instance().register_module_override(self, "weight", {"optim_bits": 32})
report_feature("embedding", {"variant": "standard"})

def reset_parameters(self) -> None:
torch.nn.init.xavier_uniform_(self.weight)
@@ -239,6 +242,15 @@ def __new__(
self.bnb_quantized = bnb_quantized
self.data = data
self.module = module
report_feature(
"params_4bit",
{
"quant_type": quant_type,
"blocksize": blocksize,
"compress_statistics": compress_statistics,
"quant_storage": str(quant_storage).replace("torch.", ""),
},
> **Review comment (Member) on lines +245 to +252:** Starting to think we don't need this here; would prefer we just keep Linear4bit and Linear8bitLt but remove this on Params4bit/Int8Params.
)
return self

def __getstate__(self):
@@ -607,6 +619,16 @@ def _save_to_state_dict(self, destination, prefix, keep_vars):
destination[prefix + "weight." + k] = v if keep_vars else v.detach()

def forward(self, x: torch.Tensor):
report_feature(
"linear_4bit",
{
"quant_type": getattr(self.weight, "quant_type", "unknown"),
"blocksize": getattr(self.weight, "blocksize", 0),
"compress_statistics": getattr(self.weight, "compress_statistics", False),
"input_dtype": str(x.dtype).replace("torch.", ""),
"compute_dtype": (str(self.compute_dtype).replace("torch.", "") if self.compute_dtype else "auto"),
},
)
> **Review comment (Member) on lines 621 to +631:** I would prefer we do this in `__init__` rather than add unnecessary overhead in the `forward()` hot path. Plus most of this is in `__init__`; you don't need all these getattr calls.

fix_4bit_weight_quant_state_from_module(self)
quant_state = self.weight.quant_state

@@ -732,6 +754,7 @@ def __new__(
obj.CB = CB
obj.SCB = SCB
obj.has_fp16_weights = has_fp16_weights
report_feature("int8_params", {"has_fp16_weights": has_fp16_weights})
return obj

def _quantize(self, device):
@@ -855,6 +878,7 @@ def __init__(self, num_embeddings, embedding_dim, device=None, dtype=None):
self.dtype = self.weight.data.dtype

self.weight = Int8Params(self.weight.data, has_fp16_weights=False, requires_grad=False)
report_feature("embedding", {"variant": "8bit"})

def _save_to_state_dict(self, destination, prefix, keep_vars):
raise NotImplementedError("Saving Embedding8bit module is not implemented")
@@ -926,6 +950,7 @@ def __init__(
f"Embedding size {embedding_dim} is not divisible by block size {blocksize}. "
"This will lead to slow inference.",
)
report_feature("embedding", {"variant": "4bit", "quant_type": quant_type})

def _forward_with_partial_dequantize(self, input: Tensor):
assert self.embedding_dim % self.weight.quant_state.blocksize == 0
@@ -1178,6 +1203,14 @@ def to(self, *args, **kwargs):
return result

def forward(self, x: torch.Tensor):
report_feature(
"linear_8bit",
{
"has_fp16_weights": self.state.has_fp16_weights,
"threshold": self.state.threshold,
"input_dtype": str(x.dtype).replace("torch.", ""),
},
)
> **Review comment (Member) on lines +1206 to +1213:** Same comment as with Linear4bit; this is best in `__init__` and not in `forward`.
self.state.is_training = self.training
if self.weight.CB is not None:
self.init_8bit_state()
@@ -1199,6 +1232,7 @@ def __init__(self, input_features, output_features, bias=True, device=None):
super().__init__(input_features, output_features, bias, device)
self.outlier_dim = None
self.is_quantized = False
report_feature("outlier_aware_linear")

def forward_with_outliers(self, x, outlier_idx):
raise NotImplementedError("Please override the `forward_with_outliers(self, x, outlier_idx)` function")