feat: Add lightonai/LateOn model #643
📝 Walkthrough
This PR adds support for the LateOn late-interaction embedding model from lightonai. The change introduces a new
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🧹 Nitpick comments (1)
fastembed/late_interaction/lateon.py (1)
81-81: ⚡ Quick win
Consider adding `strict=True` to the `zip` call to catch batch-size mismatches.
Line 81 assumes `output.model_output` and `output.attention_mask` have the same length. Since the project requires Python >=3.10.0, adding `strict=True` is compatible and will fail fast on mismatches.
🔒 Proposed improvement
- for embedding, attention_mask in zip(output.model_output, output.attention_mask):
+ for embedding, attention_mask in zip(output.model_output, output.attention_mask, strict=True):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@fastembed/late_interaction/lateon.py` at line 81, The loop zipping output.model_output and output.attention_mask should fail fast on length mismatches; update the zip call in lateon.py (the line iterating "for embedding, attention_mask in zip(output.model_output, output.attention_mask):") to use zip(..., strict=True) so Python raises an error if batch sizes differ, ensuring mismatched output.model_output and output.attention_mask are caught immediately.
📒 Files selected for processing (3)
fastembed/late_interaction/late_interaction_text_embedding.py
fastembed/late_interaction/lateon.py
tests/test_late_interaction_embeddings.py
Resolves #641
See these commits (which have been reverted so as not to pollute the codebase):
pylate canonical values script
compare encoding and rerank with pylate
All Submissions:
New Feature Submissions:
`pre-commit` with `pip3 install pre-commit` and set up hooks with `pre-commit install`?
New models submission: