feat: standardize logits in mot #1261

Open
AngeloDanducci wants to merge 1 commit into generative-computing:main from AngeloDanducci:ad-123

Conversation

@AngeloDanducci

Contributor

Pull Request

Issue

Fixes #123

Description

standardize logits in MOT

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

Signed-off-by: AngeloDanducci <angelo.danducci.ii@ibm.com>
AngeloDanducci requested review from a team, jakelorocco and nrfulton as code owners June 12, 2026 06:13
github-actions bot added the enhancement label Jun 12, 2026
AngeloDanducci enabled auto-merge June 12, 2026 06:16
@psschwei

Member

cc @ajbozarth as it relates to your MOT redesign work

@ajbozarth

Contributor

While reviewing, I realized that when I reworked #909 into an epic, it called out an intent to put logits into generation when implementing #123, but I never added a comment on that issue saying so.

when implemented, logits go in mot.generation (e.g. generation.logprobs), not mot.raw. Per #793 precedent.

@ajbozarth ajbozarth left a comment

Contributor

A couple of follow-up nits to my comment above.

Comment thread mellea/core/base.py
Comment on lines +367 to +372
"""Per-token logit scores from the backend, or ``None`` if not requested or unavailable.

Populated when ``ModelOption.LOGITS=True`` and the backend supports it.
For the HuggingFace backend this is a tuple of 1-D tensors of shape
``(vocab_size,)``, one per generated token.
"""

Contributor

I'm not sure putting a long docstring here is good practice. If this field moves into generation, though, the docstring would belong there.

Comment thread mellea/core/base.py
# Additional fields that should be standardized across apis.
self.tool_calls = tool_calls
self._thinking: str | None = None
self.logits: Any | None = None

Contributor

As noted in my previous comment (and in #909), we should consider moving this into generation instead of keeping it directly on the MOT.

@ajbozarth ajbozarth left a comment

Contributor

Some feedback from Claude:

Main ask is to land the field on GenerationMetadata rather than as a top-level attr on ModelOutputThunk — per #909 and the #793 precedent, that's the standardized home for backend-execution metadata. The implementation logic (squeeze, clone-per-batch-item, cached vs. non-cached branches) looks correct; the rest of my comments are smaller items.

Comment thread mellea/core/base.py
# Additional fields that should be standardized across apis.
self.tool_calls = tool_calls
self._thinking: str | None = None
self.logits: Any | None = None

Contributor

Move this onto GenerationMetadata as generation.logits: tuple[torch.Tensor, ...] | None = None, and drop self.logits along with the lines that handle it in _copy_from, __copy__, and __deepcopy__. GenerationMetadata is already deep-copied as a unit.

Also: the type should be the concrete tuple[torch.Tensor, ...] | None, not Any | None. Use TYPE_CHECKING to keep torch out of the runtime import path.
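
A minimal sketch of what that could look like, assuming GenerationMetadata is (or can be treated as) a dataclass. `GenerationMetadataSketch` is a hypothetical stand-in, not the real class, which carries other fields. With postponed annotation evaluation the `torch` name is only needed by type checkers, so the module imports cleanly even where torch is not installed:

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers; torch is never imported at runtime here.
    import torch


@dataclass
class GenerationMetadataSketch:
    """Hypothetical stand-in for GenerationMetadata (real class has more fields)."""

    # One 1-D tensor of shape (vocab_size,) per generated token, or None.
    logits: tuple[torch.Tensor, ...] | None = None


meta = GenerationMetadataSketch()
clone = copy.deepcopy(meta)  # the field is copied along with the rest of the object
```

Because the field lives on the metadata object, any existing deep copy of that object picks it up without per-field copy code on the thunk.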

# squeeze(0): hf_output.scores is (1, vocab_size) per token; normalise to (vocab_size,)
mot.logits = tuple(s.squeeze(0) for s in hf_output.scores)

# Clear KV cache and scores from HF output - they're now owned by the LRU cache

Contributor

Stale comment now: when LOGITS=True, the views in mot.logits also pin these tensors, so they aren't solely owned by the LRU cache. Worth tweaking the wording.

Only supported by the HuggingFace local backend. Ignored silently by
backends that cannot return logits (OpenAI, Ollama, LiteLLM, WatsonX).

**Streaming not supported**: when ``ModelOption.STREAM=True``, logit

Contributor

Suggested change
- **Streaming not supported**: when ``ModelOption.STREAM=True``, logit
+ **Streaming not supported**: when ``ModelOption.STREAM=True``, logit
+ scores are not available and ``ModelOutputThunk.generation.logits`` will be ``None``.
+ Backends that cannot return logits (OpenAI, Ollama, LiteLLM, WatsonX) log
+ a warning when this option is set and leave ``generation.logits`` as ``None``.

Pair this with adding the warning in each non-HF backend's options handler — happy to take that as a follow-up if you'd rather not touch four backends in this PR. A silent None is hard to debug.
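
For illustration, the per-backend warning could be as small as the helper below. This is a sketch under assumptions: the `LOGITS` key, the logger name, and `warn_if_logits_unsupported` are all hypothetical, and the real option handlers may be structured differently.

```python
import logging

logger = logging.getLogger("mellea.sketch")  # hypothetical logger name

LOGITS = "logits"  # hypothetical key; the PR refers to ModelOption.LOGITS


def warn_if_logits_unsupported(model_options: dict, backend_name: str) -> bool:
    """Warn when logits are requested from a backend that cannot return them.

    Returns True if a warning was emitted, which makes the helper easy to test.
    """
    if model_options.get(LOGITS):
        logger.warning(
            "%s backend cannot return logits; generation.logits will be None.",
            backend_name,
        )
        return True
    return False
```

Each non-HF backend would call something like this once while processing options, so the user sees one warning per request rather than a silent None.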

assert captured["generate_input"]["do_sample"] is False
assert "temperature" not in captured["generate_input"]


Contributor

This only covers the elif (caching-disabled) branch. Add a sibling test with _use_caches=True and a fake past_key_values so the first branch — which sets mot.logits before hf_output.scores = None — also has coverage. That's the production hot path.

Comment thread test/core/test_logits.py


if __name__ == "__main__":
    pytest.main([__file__])

Contributor

Drop the if __name__ == "__main__" block — uncommon in this repo, pytest discovers the file via test_*.py naming.

Also missing: a test that generation.logits stays None when both LOGITS=True and STREAM=True are set, since that's a documented contract.
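
Roughly the shape of the missing test, written against a stand-in rather than the real backend. `attach_logits` and `make_mot` are hypothetical helpers invented for this sketch; the actual test would drive the HuggingFace backend with the repo's existing fakes.

```python
from types import SimpleNamespace


def attach_logits(mot, scores, *, logits_requested: bool, streaming: bool) -> None:
    """Hypothetical stand-in for the backend step that populates generation.logits."""
    if logits_requested and not streaming and scores is not None:
        mot.generation.logits = tuple(scores)


def make_mot():
    # Minimal fake thunk exposing only generation.logits.
    return SimpleNamespace(generation=SimpleNamespace(logits=None))


def test_logits_stay_none_when_streaming():
    mot = make_mot()
    # Both LOGITS and STREAM are set; the documented contract is that logits stay None.
    attach_logits(mot, [[0.1, 0.9]], logits_requested=True, streaming=True)
    assert mot.generation.logits is None
```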

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Standardize logits in MoT

3 participants