Add pluggable embedding backends#369

Open
aminsmd wants to merge 1 commit into Watts-Lab:main from aminsmd:feat/pluggable-embedding-backend

Conversation


aminsmd commented Apr 21, 2026

Summary

  • add a pluggable embedding_fn interface to FeatureBuilder
  • lazy-load the default sentence-transformers and RoBERTa models instead of initializing them at import time
  • keep custom vector caches backend-specific via embedding_backend_id / embedding_dim
  • fix Discursive Diversity fallback handling for non-default embedding dimensions
  • add focused regression tests and a README usage example
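The lazy-loading change in the bullets above can be sketched roughly as follows. The class and attribute names here are illustrative only, not the package's actual internals; the point is that the heavy default models are no longer built at import time:

```python
# Illustrative sketch of lazy model loading (hypothetical names,
# not the actual FeatureBuilder internals).
class LazyBackend:
    def __init__(self, loader):
        self._loader = loader   # callable that builds the heavy model
        self._model = None      # nothing is loaded at construction time

    @property
    def model(self):
        # Build the default model only on first use, then cache it.
        if self._model is None:
            self._model = self._loader()
        return self._model

# The expensive load runs once, on first access, not at import.
calls = []
backend = LazyBackend(lambda: calls.append(1) or "sbert-model")
assert calls == []       # nothing loaded yet
_ = backend.model
_ = backend.model
assert calls == [1]      # loader ran exactly once
```

With this pattern, constructing a FeatureBuilder that supplies a custom `embedding_fn` never pays the cost of downloading or initializing the default sentence-transformers and RoBERTa models.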

Details

This keeps the default behavior unchanged when no custom encoder is provided.

For custom backends, users can now do:

fb = FeatureBuilder(
    ...,
    embedding_fn=my_encoder,
    embedding_backend_id="openai-text-embedding-3-small",
    embedding_dim=1536,
)

The vector cache path is namespaced for custom backends so switching embedding sources does not silently reuse incompatible cached vectors.
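A rough sketch of that namespacing idea, with a hypothetical path layout (the actual cache structure in the PR may differ):

```python
from pathlib import Path

def vector_cache_dir(base, backend_id=None):
    # Default backend keeps the legacy cache location; custom backends
    # get their own subdirectory keyed by embedding_backend_id, so
    # switching encoders never reuses incompatible cached vectors.
    base = Path(base)
    if backend_id is None:
        return base / "default"
    safe = backend_id.replace("/", "_")  # keep the id filesystem-safe
    return base / "custom" / safe

assert vector_cache_dir("cache") == Path("cache/default")
assert vector_cache_dir("cache", "openai-text-embedding-3-small") \
    == Path("cache/custom/openai-text-embedding-3-small")
```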

Validation

  • python -m pytest tests/test_pluggable_embeddings.py tests/test_discursive_diversity_custom_embeddings.py -q
  • full FeatureBuilder run completed on a real dataset with:
    • OpenAI text-embedding-3-small for vector-based features
    • default CardiffNLP RoBERTa sentiment model for sentiment features

Notes

  • no new required dependency on openai was added to the package; the OpenAI example remains user-land code supplied via embedding_fn
  • the Discursive Diversity (DD) fix was necessary to support custom embedding dimensions when chunk-level fallback vectors are used
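The dimension issue behind the DD fix can be illustrated as follows. This is a sketch, not the package's actual code; it assumes (plausibly, given the note above) that fallback chunk vectors previously took the default 768-dim shape regardless of the configured backend:

```python
import numpy as np

def fallback_chunk_vector(dim=768):
    # Hypothetical illustration: a fallback vector parameterized by the
    # configured embedding_dim. If the fallback were fixed at the
    # 768-dim default, chunks from a 1536-dim backend (e.g. OpenAI
    # text-embedding-3-small) would be shape-incompatible with real
    # chunk vectors in the Discursive Diversity computation.
    return np.zeros(dim)

assert fallback_chunk_vector().shape == (768,)
assert fallback_chunk_vector(1536).shape == (1536,)
```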
