feat: standardize logits in mot by AngeloDanducci · Pull Request #1261 · generative-computing/mellea

AngeloDanducci · 2026-06-12T06:13:51Z

Pull Request

Issue

Fixes #123

Description

standardize logits in MOT

Testing

Tests added to the respective file if code was changed
New code has 100% coverage if code was added
Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

Component
Requirement
Sampling Strategy
Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>

psschwei · 2026-06-12T11:50:27Z

cc @ajbozarth as it relates to your MOT redesign work

ajbozarth · 2026-06-12T15:02:45Z

While reviewing, I realized that when I reworked #909 into an epic, it called out an intent to put logits into generation when implementing #123, but I never added a comment on that issue saying so.

when implemented, logits go in mot.generation (e.g. generation.logprobs), not mot.raw. Per #793 precedent.

ajbozarth

A couple of follow-up nits to my comment above.

ajbozarth · 2026-06-12T15:08:34Z

+        """Per-token logit scores from the backend, or ``None`` if not requested or unavailable.
+
+        Populated when ``ModelOption.LOGITS=True`` and the backend supports it.
+        For the HuggingFace backend this is a tuple of 1-D tensors of shape
+        ``(vocab_size,)``, one per generated token.
+        """


I'm not sure putting a long docstring here is good practice, though if this moves into generation it would belong there.

ajbozarth · 2026-06-12T15:10:38Z

        # Additional fields that should be standardized across apis.
        self.tool_calls = tool_calls
        self._thinking: str | None = None
+        self.logits: Any | None = None


As noted in my previous comment (and in #909), we should consider moving this into generation instead of keeping it on the MOT directly.

ajbozarth

Some feedback from Claude:

Main ask is to land the field on GenerationMetadata rather than as a top-level attr on ModelOutputThunk — per #909 and the #793 precedent, that's the standardized home for backend-execution metadata. The implementation logic (squeeze, clone-per-batch-item, cached vs. non-cached branches) looks correct; the rest of my comments are smaller items.

ajbozarth · 2026-06-12T15:33:08Z

        # Additional fields that should be standardized across apis.
        self.tool_calls = tool_calls
        self._thinking: str | None = None
+        self.logits: Any | None = None


Move this onto GenerationMetadata as generation.logits: tuple[torch.Tensor, ...] | None = None, and drop self.logits along with its _copy_from, __copy__, and __deepcopy__ lines. GenerationMetadata is already deep-copied as a unit.

Also: the type should be the concrete tuple[torch.Tensor, ...] | None, not Any | None. Use TYPE_CHECKING to keep torch out of the runtime import path.
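A minimal sketch of the suggested shape, assuming GenerationMetadata is a plain dataclass; the real class in mellea may be structured differently and carries other fields. The name `GenerationMetadataSketch` is hypothetical. Postponed annotation evaluation plus the `TYPE_CHECKING` guard means `torch` is only seen by static type checkers:

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers; never imported at runtime.
    import torch


@dataclass
class GenerationMetadataSketch:
    """Hypothetical stand-in for GenerationMetadata (other fields omitted)."""

    # One 1-D tensor of shape (vocab_size,) per generated token, or None.
    logits: tuple[torch.Tensor, ...] | None = None


meta = GenerationMetadataSketch()
meta_copy = copy.deepcopy(meta)  # the metadata object is copied as a unit
```

Because the field lives on the metadata object, no per-field copy logic is needed on the thunk itself in this sketch.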

ajbozarth · 2026-06-12T15:33:08Z

+                # squeeze(0): hf_output.scores is (1, vocab_size) per token; normalise to (vocab_size,)
+                mot.logits = tuple(s.squeeze(0) for s in hf_output.scores)
+
            # Clear KV cache and scores from HF output - they're now owned by the LRU cache


Stale comment now: when LOGITS=True, the views in mot.logits also pin these tensors, so they aren't solely owned by the LRU cache. Worth tweaking the wording.
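The pinning effect can be illustrated without torch. This is only a sketch: `FakeStorage` and `FakeTensor` are hypothetical stand-ins that mimic how a squeezed view shares memory with its base tensor, and the release check relies on CPython reference counting.

```python
import weakref
from types import SimpleNamespace


class FakeStorage:
    """Hypothetical stand-in for the memory backing a tensor."""


class FakeTensor:
    """Hypothetical stand-in for a tensor; squeeze() returns a view on the same storage."""

    def __init__(self, storage):
        self.storage = storage

    def squeeze(self, dim):
        # dim is ignored here; a view is a new object sharing the same storage.
        return FakeTensor(self.storage)


storage = FakeStorage()
storage_ref = weakref.ref(storage)

hf_output = SimpleNamespace(scores=(FakeTensor(storage),))
mot = SimpleNamespace(logits=tuple(s.squeeze(0) for s in hf_output.scores))

del storage              # drop the local name
hf_output.scores = None  # mirrors clearing scores from the HF output

still_pinned = storage_ref() is not None  # the view in mot.logits keeps it alive

mot.logits = None
released = storage_ref() is None  # freed only once the last view is gone
```

Clearing `hf_output.scores` alone does not release the memory here; it is released only after `mot.logits` is dropped as well.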

ajbozarth · 2026-06-12T15:33:08Z

+    Only supported by the HuggingFace local backend. Ignored silently by
+    backends that cannot return logits (OpenAI, Ollama, LiteLLM, WatsonX).
+
+    **Streaming not supported**: when ``ModelOption.STREAM=True``, logit


Suggested change

-    **Streaming not supported**: when ``ModelOption.STREAM=True``, logit
+    **Streaming not supported**: when ``ModelOption.STREAM=True``, logit
+    scores are not available and ``ModelOutputThunk.generation.logits`` will be ``None``.
+    Backends that cannot return logits (OpenAI, Ollama, LiteLLM, WatsonX) log
+    a warning when this option is set and leave ``generation.logits`` as ``None``.

Pair this with adding the warning in each non-HF backend's options handler — happy to take that as a follow-up if you'd rather not touch four backends in this PR. A silent None is hard to debug.
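A rough sketch of what such a warning could look like, assuming options arrive as a plain dict. The helper name `warn_if_logits_unsupported`, the logger name, and the `"logits"` key are all hypothetical; the real options handlers and the actual `ModelOption.LOGITS` constant may differ.

```python
import logging

logger = logging.getLogger("logits_sketch")

# Hypothetical option key; the real ModelOption.LOGITS constant may differ.
LOGITS_KEY = "logits"


def warn_if_logits_unsupported(model_options: dict, backend_name: str) -> bool:
    """Log a warning when logits are requested from a backend that cannot return them.

    Returns True if a warning was emitted, False otherwise.
    """
    if model_options.get(LOGITS_KEY):
        logger.warning(
            "%s backend cannot return logits; generation.logits will be None.",
            backend_name,
        )
        return True
    return False
```

Each non-HF backend could call a helper like this once while processing options, so the user sees why `generation.logits` is `None`.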

ajbozarth · 2026-06-12T15:33:08Z

    assert captured["generate_input"]["do_sample"] is False
    assert "temperature" not in captured["generate_input"]
+
+


This only covers the elif (caching-disabled) branch. Add a sibling test with _use_caches=True and a fake past_key_values so the first branch — which sets mot.logits before hf_output.scores = None — also has coverage. That's the production hot path.
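The shape of the two sibling tests might look like the sketch below. `attach_logits` is a hypothetical stand-in for the backend step under review, not the real code, and plain lists stand in for tensors; a real test would drive the HuggingFace backend with `_use_caches=True` and a fake `past_key_values` as described above.

```python
from types import SimpleNamespace


def attach_logits(mot, hf_output, use_caches):
    """Hypothetical stand-in for the backend step under review (not the real code)."""
    if use_caches and hf_output.past_key_values is not None:
        # Cached branch: set logits on the thunk before clearing the scores.
        mot.logits = tuple(list(s) for s in hf_output.scores)
        hf_output.scores = None
    elif hf_output.scores is not None:
        # Caching-disabled branch.
        mot.logits = tuple(list(s) for s in hf_output.scores)


def run_case(use_caches, past_key_values):
    mot = SimpleNamespace(logits=None)
    hf_output = SimpleNamespace(
        scores=([0.1, 0.9], [0.7, 0.3]), past_key_values=past_key_values
    )
    attach_logits(mot, hf_output, use_caches)
    return mot, hf_output


def test_logits_set_when_caching_enabled():
    mot, hf_output = run_case(use_caches=True, past_key_values=object())
    assert mot.logits == ([0.1, 0.9], [0.7, 0.3])
    assert hf_output.scores is None  # cleared after logits were set


def test_logits_set_when_caching_disabled():
    mot, hf_output = run_case(use_caches=False, past_key_values=None)
    assert mot.logits == ([0.1, 0.9], [0.7, 0.3])
    assert hf_output.scores is not None
```

The cached-branch test is the one that guards the ordering concern: logits must be captured before the scores are cleared.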

ajbozarth · 2026-06-12T15:33:08Z

+
+
+if __name__ == "__main__":
+    pytest.main([__file__])


Drop the if __name__ == "__main__" block — uncommon in this repo, pytest discovers the file via test_*.py naming.

Also missing: a test that generation.logits stays None when both LOGITS=True and STREAM=True are set, since that's a documented contract.

standardize logits in mot

0ac66d2

Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>

AngeloDanducci requested review from a team, jakelorocco and nrfulton as code owners June 12, 2026 06:13

AngeloDanducci requested a review from akihikokuroda June 12, 2026 06:13

github-actions Bot added the enhancement New feature or request label Jun 12, 2026

AngeloDanducci enabled auto-merge June 12, 2026 06:16

ajbozarth reviewed Jun 12, 2026

AngeloDanducci wants to merge 1 commit into generative-computing:main from AngeloDanducci:ad-123

3 participants
